The GDELT Project
www.gdeltproject.org
先声明一下, 这个大数据研究项目, 我今天早晨刚看到,
第一印象不错
Querying, Analyzing and Downloading
The entire GDELT database is 100% free and open and you can
download the raw datafiles, visualize it using the
GDELT Analysis Service, or analyze it at limitless scale with Google BigQuery.
The GDELT Project is the largest, most comprehensive, and highest resolution open database of human society ever created. Just the 2015 data alone records nearly three quarters of a trillion emotional snapshots and more than 1.5 billion location references, while its total archives span more than 215 years, making it one of the largest open-access spatio-temporal datasets in existance and pushing the boundaries of "big data" study of global human society. Its Global Knowledge Graph connects the world's people, organizations, locations, themes, counts, images and emotions into a single holistic network over the entire planet. How can you query, explore, model, visualize, interact, and even forecast this vast archive of human society?
Born Out Of The 2014 Ebola Epidemic GDELT's Mass Translation Infrastructure Helped BlueDot Identify 2019's Coronavirus
February 5, 2020
On
March 13, 2014, GDELT's global monitoring infrastructure detected the first local reports of what would go on to become the
Ebola epidemic of 2014-2016. Unfortunately, they were in French and GDELT's English-only processing at the time never sent an alert. At the time, GDELT monitored global news media in more than 100 languages (which has grown today to more than 150), but the immense computational demands of high quality robust machine translation at the scale of even a fraction of the totality of global news output each day was beyond tractability of the day. Experimental
work supported by Google Translate at the time reinforced just how much of global events and narratives were missing from the world's English language press and that to truly understand the world, GDELT must find a way to machine translate everything it monitored around the globe in as many languages as possible.
The end result later that year was
GDELT Translingual, a pioneering infrastructure that first introduced the world to the concept of at-scale mass machine translation of the news, a model which GDELT has helped bring to countless industries in the years since. Powered by Translingual's global infrastructure, GDELT today translates absolutely everything it monitors globally in
65 languages, allowing it to surface the most nuanced narratives and subtle indicators about the least expected events.
Fast forward to this past December when the Chinese Coronavirus first emerged and this vision of mass machine translation of the planet made it possible in December 2019 for BlueDot Global to use its machine learning algorithms to
identify the earliest reports of the Chinese Coronavirus from GDELT's feeds when it was still just a handful of cases of "
viral pneumonia … of unknown cause", with BlueDot sending out an alert
nearly a full week before the CDC's and WHO's official warnings.
GDELT's mass machine translation initiative was born out of its inability in 2014 of its event and knowledge graph systems to identify the first French-language domestic reports of the Ebola outbreak that it had monitored due to its inability to process content beyond English.
Fast forward to this past December and the ability of GDELT's immense translation infrastructure that resulted from that outbreak allowed BlueDot Global's machine learning algorithms to flag and send out alerts of the Chinese Coronavirus outbreak almost a week before official sources.
That's a pretty incredible outcome.