Data sources

The GDELT Database

Subject:
Other data:
  • News and news sentiment
Keywords: open data, social sciences, big data, news, trends, event classification, network analysis
Accessible to: Aalto Students & Staff, Aalto users, Everybody

Data source type

Open source: Other

License

Open access

Analysis unit

Other

Geographical coverage

Global

Time period coverage

Other

Frequency

Other

Format

CSV

Description

The GDELT Database (Global Database of Events, Language, and Tone) is a large, continually updated repository of global news media data. It captures and analyzes news articles, broadcasts, and online sources from around the world to support the study of events, emotions, and trends.

GDELT categorizes and tracks events across political, social, economic, and cultural domains. It offers geospatial information for analyzing events by location, supports sentiment analysis for emotional context, and provides temporal trends for tracking changes over time.
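As an illustration of the location, sentiment, and time dimensions described above, the following minimal sketch averages a tone value per day and per country. It assumes a tab-delimited text layout; the column names (day, country, tone) and the sample rows are hypothetical and invented for this example, so the actual GDELT column layout should be checked against the official documentation before use.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Made-up sample rows for illustration only; these are not real GDELT records.
# The column names (day, country, tone) are hypothetical.
SAMPLE = (
    "day\tcountry\ttone\n"
    "20160101\tFI\t2.0\n"
    "20160101\tFI\t-1.0\n"
    "20160102\tFI\t-3.0\n"
    "20160102\tSE\t1.5\n"
)


def mean_tone_by(rows, key):
    """Average the tone column over all rows that share the same value of `key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(float(row["tone"]))
    return {value: mean(tones) for value, tones in groups.items()}


# Parse the tab-delimited text into dictionaries keyed by column name.
rows = list(csv.DictReader(io.StringIO(SAMPLE), delimiter="\t"))

tone_by_day = mean_tone_by(rows, "day")          # temporal trend
tone_by_country = mean_tone_by(rows, "country")  # geographic comparison
```

The same grouping function serves both the temporal and the geographic view; only the grouping key changes.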

Major datasets available through GDELT as of 2016:

GDELT 2.0 EVENT DATABASE

  • Type: Event Data
  • Time Range: February 2015 – Present (Will extend back to 1979 by Fall 2016)
  • Update Interval: Every 15 minutes
  • Source: Worldwide news coverage in 100 languages, 65 of which are live machine translated at 100% volume
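To make the 15-minute update interval concrete, the sketch below lists the quarter-hour update slots that fall inside a given time window, which is a typical first step when planning which updates to fetch. The function names and the YYYYMMDDHHMMSS label format are assumptions made for this example, not a confirmed GDELT naming scheme.

```python
from datetime import datetime, timedelta

UPDATE_INTERVAL = timedelta(minutes=15)


def update_slots(start, end):
    """Yield every quarter-hour boundary t with start <= t < end."""
    # Round the start down to the previous quarter-hour boundary.
    floored = start.replace(
        minute=start.minute - start.minute % 15, second=0, microsecond=0
    )
    # If rounding moved the time backwards, begin at the next boundary instead.
    t = floored if floored == start else floored + UPDATE_INTERVAL
    while t < end:
        yield t
        t += UPDATE_INTERVAL


def slot_label(t):
    """Format a slot as a compact timestamp (hypothetical label format)."""
    return t.strftime("%Y%m%d%H%M%S")


# Slots between 00:07 and 01:00 on 1 January 2016.
slots = list(update_slots(datetime(2016, 1, 1, 0, 7), datetime(2016, 1, 1, 1, 0)))
labels = [slot_label(t) for t in slots]
```

A full day at this interval contains 96 slots, which gives a rough sense of how many updates a daily batch job would need to process.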

GDELT 2.0 GLOBAL KNOWLEDGE GRAPH

  • Type: Global Knowledge Graph
  • Time Range: February 2015 – Present (Will extend back to 1979 by Fall 2016)
  • Update Interval: Every 15 minutes
  • Source: Worldwide news outlets in 100 languages, 65 of which are live machine translated at 100% volume

VISUAL GLOBAL KNOWLEDGE GRAPH

  • Type: Visual Global Knowledge Graph
  • Time Range: December 2015 – Present
  • Update Interval: Every 15 minutes
  • Source: Worldwide news imagery, with all annotations performed by Google Cloud Vision API

GDELT 1.0

  • Type: Event Data
  • Time Range: January 1979 – Present
  • Update Interval: Daily
  • Source: Worldwide news outlets in 100 languages, selected sample of foreign language content

AMERICAN TELEVISION GKG

  • Type: Global Knowledge Graph
  • Time Range: July 2009 – Present
  • Update Interval: Daily, with a 48-hour embargo
  • Source: American English language television news stations monitored by the Internet Archive’s Television News Archive

AFRICA AND THE MIDDLE EAST ACADEMIC LITERATURE GKG

  • Type: Global Knowledge Graph
  • Time Range: 1950-2012 (some material dating back to 1906 and newer than 2012 may be present)
  • Update Interval: None at this time
  • Source: More than 21 billion words of academic literature covering Africa and the Middle East, including all JSTOR holdings mentioning locations in the region 1950-2007, all identifiable academic literature found online since 1996 and still accessible in 2014 in the Internet Archive’s Wayback Machine holdings of 1.6 billion PDFs, all unclassified/declassified Defense Technical Information Center (DTIC) holdings 1906-2012, all unclassified/declassified CIA holdings 1946 to 2012, all CORE fulltext research collection holdings 1940 to 2012 and CiteSeerX holdings 1947 to 2011.

HUMAN RIGHTS KNOWLEDGE GRAPH

  • Type: Global Knowledge Graph
  • Time Range: 1960-2014
  • Update Interval: None at this time
  • Source: More than 110,000 documents from Amnesty International, FIDH, Human Rights Watch, ICC, ICG, US Department of State and the United Nations dating back to 1960 documenting human rights abuses across the world

HISTORICAL AMERICAN BOOKS ARCHIVE

  • Type: Global Knowledge Graph
  • Time Range: 1800-2015
  • Update Interval: None at this time
  • Source: Complete set of English-language public domain books digitized by the Internet Archive and HathiTrust from 1800 to Fall 2015 totaling 3.5 million volumes

In addition to these, there are many more datasets on different topics.

Distributor

The GDELT Project

Cite as

Varies

Contact

The GDELT Project
