Linked Data Journalism

Wouter Beek

Knowledge Representation & Reasoning Group
VU University Amsterdam

Slides at

Existing paradigms

  • Computer-Assisted Reporting
    • Methodology: Social-science empiricism
    • Purpose: Story creation
    • Professional ethic: Investigative Journalism
  • Data Journalism
    • Methodology: Explorative search
    • Purpose: Information overview / Participatory transparency
    • Professional ethic: Open Source

Problems with existing paradigms

  • Computer-Assisted Reporting
    • Labor-intensive
    • Questionable applicability of the empirical approach
  • Data Journalism
    • Focus on numeric data
    • Visualization-centric
    • Far removed from the traditional (narrative) journalistic product

Data ≠ digitally stored numbers

"As more information has become ones and zeroes at its most elemental level, more journalism has involved gathering, analyzing, and computing that information as quantitative data as well."
(Coddington 2015)

Data can be:

  • Facts (some facts are numeric)
  • Arguments
  • Taxonomies
  • Ontologies

Let's integrate data into journalism

  • "A new type of reporter is needed."
  • Data Journalism will lead to "[l]ess guessing, less looking for quotes — instead, a journalist can build a strong position supported by data" (Gray 2012)

Let's integrate data into the journalist's workflow

Goal: better integration between the journalistic product (natural language text) and data.

  1. Big Data platform
  2. Entity Detection: the automatic identification of concepts, people, places, ...
  3. Linking: connecting concepts, people, places, etc. to the data about them
  4. Aggregation: Data → Information
  5. Contextualization: Information → Knowledge
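Steps 4 and 5 are only named here. As one possible reading of step 4 (Data → Information), here is a minimal Python sketch: raw triples, assumed to be plain (subject, predicate, object) string tuples with hypothetical ex: identifiers, are grouped into one record per entity. The function name aggregate and the record layout are illustrative assumptions, not part of the described system.

```python
from collections import defaultdict

# Hypothetical representation: a triple is a (subject, predicate, object) tuple of strings.
Triple = tuple[str, str, str]


def aggregate(triples: list[Triple]) -> dict[str, dict[str, list[str]]]:
    """Group triples into one record per subject, mapping each predicate to its objects."""
    records = defaultdict(lambda: defaultdict(list))
    for subject, predicate, obj in triples:
        # Objects are kept in input order; repeated predicates collect several values.
        records[subject][predicate].append(obj)
    # Convert the nested defaultdicts to plain dicts so lookups of missing keys fail loudly.
    return {s: dict(props) for s, props in records.items()}
```

For example, two label triples and one country triple about ex:Amsterdam become a single record with two keys, rdfs:label holding both labels.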

[1] Big Data platform

LOD Laundromat
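LOD Laundromat aims to republish Linked Open Data as cleaned, uniformly formatted files. The sketch below assumes a locally downloaded, gzip-compressed N-Triples file (one triple per line) and streams its triples in Python. The regular expressions cover only a simplified, hypothetical subset of the W3C N-Triples grammar (IRIs, blank nodes, and literals with an optional language tag or datatype); this is not a complete parser, and read_ntriples is an illustrative name.

```python
import gzip
import re
from typing import Iterator

# Simplified N-Triples term patterns (an assumption, not the full W3C grammar).
IRI = r"<[^>]*>"
BNODE = r"_:[A-Za-z0-9_\-]+"
LITERAL = r'"(?:[^"\\]|\\.)*"(?:@[A-Za-z]+(?:-[A-Za-z0-9]+)*|\^\^<[^>]*>)?'
TRIPLE = re.compile(
    rf"^\s*({IRI}|{BNODE})\s+({IRI})\s+({IRI}|{BNODE}|{LITERAL})\s*\.\s*$"
)


def read_ntriples(path: str) -> Iterator[tuple[str, str, str]]:
    """Yield (subject, predicate, object) terms from a gzip-compressed N-Triples file."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comment lines
            m = TRIPLE.match(line)
            if m is None:
                raise ValueError(f"unsupported or malformed line: {line!r}")
            yield m.group(1), m.group(2), m.group(3)
```

Terms are returned in their serialized form (angle brackets and quotes included), which keeps the sketch short; a production pipeline would use a full RDF parser instead.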

[2] Entity Detection

For a natural language text, find the objects denoted by names and the relations denoted by adjectives, nouns and verbs.

E.g., DBpedia Spotlight
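DBpedia Spotlight annotates text with DBpedia resources. As an offline stand-in, the sketch below swaps in a much simpler technique: dictionary (gazetteer) lookup with greedy longest match. It does not reproduce Spotlight's disambiguation, and it detects only names, not relations. The gazetteer contents and the ex: IRIs used with it are hypothetical.

```python
import re


def detect_entities(text: str, gazetteer: dict[str, str]) -> list[tuple[str, int, str]]:
    """Return (surface form, character offset, IRI) for each gazetteer name found in text."""
    if not gazetteer:
        return []
    # Longest names first: Python's regex alternation takes the first alternative
    # that matches, so this makes the longest known name win at each position.
    names = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, names)) + r")\b")
    # finditer resumes after each match, so a name nested inside a longer match
    # (e.g. "Amsterdam" inside "VU University Amsterdam") is not reported twice.
    return [(m.group(0), m.start(), gazetteer[m.group(0)]) for m in pattern.finditer(text)]
```

Matching is case-sensitive and exact; a real system would also need normalization and a way to choose between several candidate entities for the same name.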

[3] Linking

Given a logical model of the natural language text, pull in data about the entities described in that text.
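A minimal sketch of the linking step, assuming the logical model has already been reduced to a set of entity identifiers and that the available data is an in-memory list of (subject, predicate, object) triples. The function link and its hops parameter are hypothetical: hops=1 pulls in only triples whose subject is a detected entity; larger values also follow object links breadth-first, visiting each node once so that cycles terminate.

```python
from collections import defaultdict, deque

# Hypothetical representation: a triple is a (subject, predicate, object) tuple of strings.
Triple = tuple[str, str, str]


def link(entities: set[str], graph: list[Triple], hops: int = 1) -> list[Triple]:
    """Collect triples describing the given entities, following object links up to `hops` levels."""
    # Index the graph by subject for constant-time lookups.
    by_subject: dict[str, list[Triple]] = defaultdict(list)
    for t in graph:
        by_subject[t[0]].append(t)

    seen = set(entities)
    # Sorted start order makes the output order deterministic.
    frontier = deque((e, 0) for e in sorted(entities))
    result: list[Triple] = []
    while frontier:
        node, depth = frontier.popleft()
        for t in by_subject.get(node, []):
            result.append(t)
            obj = t[2]
            # Only expand to the object if another hop is still allowed.
            if depth + 1 < hops and obj not in seen:
                seen.add(obj)
                frontier.append((obj, depth + 1))
    return result
```

Because each node is expanded at most once and every triple has exactly one subject, no triple is returned twice.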


  • [4] Aggregation
  • [5] Contextualization
  • Domain expert involvement & field testing