Linked Data


Wouter Beek (w.g.j.beek@vu.nl, wouter@triply.cc)

January 24th, 2018

SPARQL

The first query


select * {
  ?s ?p ?o .
}
limit 10
            

Search by label


prefix wp: <http://vocabularies.wikipathways.org/wp#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
select ?s {
  ?s wp:organismName "Homo sapiens"^^xsd:string .
}
limit 10
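The query above matches only the exact literal "Homo sapiens". Below is a sketch of a looser, case-insensitive substring search. It assumes the same WikiPathways data and vocabulary, and the search term "sapiens" is only an illustration. Because the FILTER has to test every organism name, this kind of search can be noticeably slower than an exact match on a large endpoint.

```sparql
prefix wp: <http://vocabularies.wikipathways.org/wp#>
select ?s ?name {
  ?s wp:organismName ?name .
  # Case-insensitive substring match on the lexical form of the name.
  filter(contains(lcase(str(?name)), "sapiens"))
}
limit 10
```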
            

Property path


prefix dct: <http://purl.org/dc/terms/>
prefix wp: <http://vocabularies.wikipathways.org/wp#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
select ?s {
  ?s dct:isPartOf/wp:organismName "Homo sapiens"^^xsd:string .
}
limit 10
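A sequence path such as `dct:isPartOf/wp:organismName` is shorthand for two triple patterns joined on an unnamed intermediate node. The query below writes that intermediate node out explicitly, so it should return the same results as the property-path query above. The variable name `?parent` is an illustrative choice and is not taken from the WikiPathways vocabulary.

```sparql
prefix dct: <http://purl.org/dc/terms/>
prefix wp: <http://vocabularies.wikipathways.org/wp#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
select ?s {
  # ?parent is the node that the path dct:isPartOf/wp:organismName leaves implicit.
  ?s dct:isPartOf ?parent .
  ?parent wp:organismName "Homo sapiens"^^xsd:string .
}
limit 10
```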
            

Match any graph structure


prefix dct: <http://purl.org/dc/terms/>
prefix obo: <http://purl.obolibrary.org/obo/>
prefix path: <http://identifiers.org/wikipathways/>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix wp: <http://vocabularies.wikipathways.org/wp#>
select * {
  ?interaction dct:isPartOf ?organism ;
               wp:source ?source ;
               wp:target ?target .
  ?source rdfs:label ?sourceName .
  ?target rdfs:label ?targetName ;
          wp:bdbWikidata ?wikidata .
  ?organism wp:organism obo:NCBITaxon_9606 ;
            wp:organismName ?organismName .
}
limit 1
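Every triple pattern in the query above is required, so an interaction is returned only if its target has a `wp:bdbWikidata` link. Below is a sketch that keeps interactions without such a link by making that one pattern OPTIONAL. It assumes the same WikiPathways data, and for unlinked targets `?wikidata` is simply left unbound.

```sparql
prefix dct: <http://purl.org/dc/terms/>
prefix obo: <http://purl.obolibrary.org/obo/>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix wp: <http://vocabularies.wikipathways.org/wp#>
select * {
  ?interaction dct:isPartOf ?organism ;
               wp:source ?source ;
               wp:target ?target .
  ?source rdfs:label ?sourceName .
  ?target rdfs:label ?targetName .
  ?organism wp:organism obo:NCBITaxon_9606 ;
            wp:organismName ?organismName .
  # Keep the interaction even if its target has no Wikidata link.
  optional { ?target wp:bdbWikidata ?wikidata . }
}
limit 10
```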
            

Link to Wikidata


prefix direct: <http://www.wikidata.org/prop/direct/>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix wikidata: <http://www.wikidata.org/entity/>
select * {
  # P274 = chemical formula; P2067 = mass
  wikidata:Q27102800 direct:P274 ?formula ;
                     direct:P2067 ?mass ;
                     rdfs:label ?label .
}
limit 10
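Wikidata stores a label in many languages. Each language therefore adds its own result row to the query above, and that row repeats the chemical formula and mass. Below is a sketch that keeps only English labels, assuming the public Wikidata SPARQL endpoint. `langMatches` with the range "en" also accepts regional tags such as "en-gb".

```sparql
prefix direct: <http://www.wikidata.org/prop/direct/>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix wikidata: <http://www.wikidata.org/entity/>
select * {
  # P274 = chemical formula; P2067 = mass
  wikidata:Q27102800 direct:P274 ?formula ;
                     direct:P2067 ?mass ;
                     rdfs:label ?label .
  # Keep only English (and regional English) labels.
  filter(langMatches(lang(?label), "en"))
}
limit 10
```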
            

Aggregate query


prefix dct: <http://purl.org/dc/terms/>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?s ?label (count(?article) as ?n) {
  ?s dct:bibliographicCitation ?article ;
     rdfs:label ?label
}
group by ?s ?label
order by desc(?n)
limit 10
            

What is LOD?

WWW (Web of documents)

LOD (Web of data)

The 5 stars of LOD

Source: Tim Berners-Lee (http://5stardata.info)

Let's look at some Linked Open Data

What is the LOD Cloud?

Who uses Linked Data? (1/2)

Who uses Linked Data? (2/2)

Schema.org:

  • 20M web sites
  • 35% of pages in search index
  • 50% of US/EU eCommerce emails
  • 800B small graphs of ~25 statements
Source: A.W. Moore & R.V. Guha, Google Research

LOD Cloud: 2014

LOD Cloud: 2017

LOD Laundromat

lodlaundromat.org



L. Rietveld, W. Beek, S. Schlobach, “LOD Lab: Experiments at LOD Scale”, International Semantic Web Conference (ISWC), 2015 (Best Paper Award).
Best Linked Open Data Application, 2015

lodsearch.org

F. Ilievski, W. Beek, M. van Erp, L. Rietveld, S. Schlobach, “LOTUS: Adaptive Text Search for Big Linked Data”, Extended Semantic Web Conference (ESWC), 2016.

Nice, but expensive to scale to the full web.

Scalable Knowledge Graph Solutions

Query a Large Knowledge Graph Using Commodity Hardware

Lowering the cost of access

  • Store Linked Data in a single file
  • 28,362,198,927 triples (>650K data documents)
  • €305 hardware cost (524 GB disk; 15.7 GB RAM)

J.D. Fernández, W. Beek, M.A. Martínez-Prieto, M. Arias, “LOD-a-lot”, International Semantic Web Conference (ISWC), 2017.

Up to 1K HDT files

Header Dictionary Triples (HDT)

J.D. Fernández, M.A. Martínez-Prieto, C. Gutiérrez, A. Polleres, M. Arias, “Binary RDF Representation for Publication and Exchange (HDT)”, Journal of Web Semantics, 2013.

Empirical Semantics

(Measuring Meaning)

‘Barack Obama’ in the LOD Cloud

owl:sameAs links

Data cleaning (owl:sameAs)

http://als.dbpedia.org/resource/Barack_Obama
http://am.dbpedia.org/resource/ባራክ_ኦባማ
http://data.nytimes.com/obama_barack_per
http://nl.dbpedia.org/resource/Barack_Obama
http://rdf.freebase.com/ns/m.02mjmr
http://viaf.org/viaf/52010985
http://yago-knowledge.org/resource/Barack_Obama
http://dbpedia.org/resource/Administration_of_Barack_Obama
http://dbpedia.org/resource/Barack_Obama_Cabinet
http://dbpedia.org/resource/Barack_Obama_presidency
http://rdf.freebase.com/ns/m.05b6w1g
http://wikidata.dbpedia.org/resource/Q1379733
http://yago-knowledge.org/resource/Presidency_of_Barack_Obama
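Because owl:sameAs is symmetric and transitive, a set like the one above can be collected by following sameAs links in both directions, any number of times. Below is a sketch of such a closure query. It assumes the owl:sameAs statements are loaded into a SPARQL 1.1 endpoint, and the starting IRI is taken from the list above. If any link connects the person to an IRI for the administration or presidency, the closure pulls those in as well, which is why the result needs cleaning. On very large datasets, arbitrary-length paths like this one can be expensive to evaluate.

```sparql
prefix owl: <http://www.w3.org/2002/07/owl#>
select distinct ?x {
  # Follow owl:sameAs forwards or backwards, zero or more steps;
  # the zero-step case includes the starting IRI itself.
  <http://nl.dbpedia.org/resource/Barack_Obama> (owl:sameAs|^owl:sameAs)* ?x .
}
```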

‘Barack Obama’ after community detection

purple: person; orange: government; green: president; blue: senator

Thank you!

Mail: w.g.j.beek@vu.nl
Mail: wouter@triply.cc