Digging Into the Knowledge Graph


Wouter Beek w.g.j.beek@vu.nl

December 1st, 2017

What is LOD?

Web of documents

Web of data / Semantic Web / LOD

The 5 stars of LOD

Source: Tim Berners-Lee (http://5stardata.info)

  1. Data is available on the Web under an open license
  2. Data is available as structured data
  3. Data is available in a non-proprietary open format
  4. Data uses URIs to denote things
  5. Data links to other data
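As a rough illustration of stars 4 and 5, the sketch below stores statements as (subject, predicate, object) triples of URIs and selects the ones whose object lies outside the dataset's own namespace. The `example.org` namespace, the sample triples, and the helper name `outgoing_links` are assumptions made for this example.

```python
# Illustrative only: the example.org namespace and the triples are made up.
OWN_NAMESPACE = "http://example.org/"

triples = [
    ("http://example.org/amsterdam",
     "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
     "http://example.org/City"),
    ("http://example.org/amsterdam",
     "http://www.w3.org/2002/07/owl#sameAs",
     "http://dbpedia.org/resource/Amsterdam"),
]

def outgoing_links(triples, own_namespace):
    """Return triples whose object is a URI outside the dataset's own namespace."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if o.startswith("http") and not o.startswith(own_namespace)
    ]

links = outgoing_links(triples, OWN_NAMESPACE)
print(len(links))
```

A dataset with no outgoing links can still reach four stars; the fifth star needs at least one.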

LOD in practice

Infinite domains: XSD datatypes
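One reading of this slide title is that datatypes such as xsd:integer have an infinite value space, so their members cannot be listed one by one. A minimal sketch of the lexical-to-value mapping for xsd:integer follows; the function name `parse_xsd_integer` is made up for this example.

```python
import re

# Lexical space of xsd:integer: an optional sign followed by one or more digits.
_XSD_INTEGER = re.compile(r"[+-]?[0-9]+")

def parse_xsd_integer(lexical):
    """Map an xsd:integer lexical form to its value, or raise ValueError."""
    if not _XSD_INTEGER.fullmatch(lexical):
        raise ValueError(f"not a valid xsd:integer lexical form: {lexical!r}")
    return int(lexical)

# Different lexical forms can denote the same value.
print(parse_xsd_integer("01") == parse_xsd_integer("+1"))

# Python integers are unbounded, which fits an infinite value space.
print(parse_xsd_integer("9" * 40) > 2**64)
```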

Let's Publish LOD

Publishing LOD

Some simple steps:

  1. Obtain or create an RDF file (BCC, UDC)
  2. Log in to the KR&R web server
  3. Create a dataset (name, description, access, upload)
  4. Add dataset metadata (license, example)
  5. Browse the data
  6. Query the data
  7. Optional: federated query (local, remote)
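Steps 6 and 7 can be sketched as follows, assuming the server exposes a standard SPARQL endpoint. Both endpoint URLs are hypothetical placeholders, and the helper name `sparql_get_url` is made up; only the `query` parameter and the `SERVICE` keyword come from the SPARQL 1.1 standards.

```python
from urllib.parse import urlencode

# Hypothetical endpoints; replace with the SPARQL endpoints of your datasets.
LOCAL_ENDPOINT = "https://example.org/sparql"
REMOTE_ENDPOINT = "https://remote.example.org/sparql"

# SERVICE delegates part of the query to another (remote) endpoint.
FEDERATED_QUERY = f"""
SELECT ?s ?label WHERE {{
  ?s a ?class .
  SERVICE <{REMOTE_ENDPOINT}> {{
    ?s <http://www.w3.org/2000/01/rdf-schema#label> ?label .
  }}
}} LIMIT 10
"""

def sparql_get_url(endpoint, query):
    """Build the URL for a SPARQL Protocol query sent via HTTP GET."""
    return endpoint + "?" + urlencode({"query": query})

url = sparql_get_url(LOCAL_ENDPOINT, FEDERATED_QUERY)
print(url)
```

The sketch only builds the request URL; sending it (for example with `urllib.request.urlopen`) requires a live endpoint.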

LOD Cloud

Who uses Linked Data? (1/2)

Who uses Linked Data? (2/2)

Schema.org:

  • 20M web sites
  • 35% of pages in search index
  • 50% of US/EU eCommerce emails
  • 800B small graphs of ~25 statements
Source: A.W. Moore & R.V. Guha, Google Research

LOD Cloud: 2014

LOD Cloud: 2017

Vocabularies

LOV

Ontologies

SUMO

LOD Laundromat

lodlaundromat.org



Beek & Rietveld & Bazoobandi & Wielemaker & Schlobach, “LOD Laundromat: A Uniform Way of Publishing Other People’s Dirty Data”, ISWC, 2014.

How generalizable is SW research?

L. Rietveld & W. Beek & S. Schlobach, “LOD Lab: Experiments at LOD Scale”, International Semantic Web Conference, 2015 (Best Paper Award).

Evaluation results for ±600,000 datasets

De Rooij & Beek & Bloem & Schlobach & Van Harmelen, “Are Names Meaningful? Quantifying Social Meaning on the Semantic Web”, ISWC, 2016.

Semantic Search Engine

Ilievski & Beek & Van Erp & Rietveld & Schlobach, “LOTUS: Adaptive Text Search for Big Linked Data”, ESWC, 2016.

lodsearch.org

LOD-a-lot

The cost of access

  • 1 file
  • 28,362,198,927 unique triples
  • >650K data documents
  • 524 GB of disk space
  • 15.7 GB of RAM
  • €305,- hardware cost
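The figures above can be turned into per-triple costs with some simple arithmetic. The sketch assumes that GB means 10**9 bytes; the variable names are made up for this example.

```python
# Figures from the slide; GB is assumed to mean 10**9 bytes.
TRIPLES = 28_362_198_927
DISK_BYTES = 524 * 10**9
RAM_BYTES = 15.7 * 10**9
HARDWARE_EUR = 305

bytes_per_triple_disk = DISK_BYTES / TRIPLES
bytes_per_triple_ram = RAM_BYTES / TRIPLES
triples_per_euro = TRIPLES / HARDWARE_EUR

print(f"{bytes_per_triple_disk:.1f} bytes of disk per triple")
print(f"{bytes_per_triple_ram:.2f} bytes of RAM per triple")
print(f"{triples_per_euro / 1e6:.0f} million triples per euro")
```

Under that assumption, each triple costs roughly 18.5 bytes of disk and about half a byte of RAM.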

Empirical Semantics

[Figure: bbc:sameAs, bbc:sameAs, owl:sameAs, ?]

owl:sameAs has 2 meanings

Formal meaning

$$a = b \,\longleftrightarrow\, (\forall P)(Pa = Pb)$$

Social meaning

“Include links to other URIs, to discover more things.”

Example: Identity closure

558,943,116 owl:sameAs triples
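Because owl:sameAs is reflexive, symmetric, and transitive, explicit links imply further identities: from a = b and b = c it follows that a = c. A minimal sketch of computing such a closure with a union-find structure is given below. It is an illustration under that assumption, not necessarily the method used for the figure above; the `ex:` identifiers are made up.

```python
def identity_closure(same_as_pairs):
    """Group terms into identity sets: the equivalence classes of owl:sameAs."""
    parent = {}

    def find(x):
        # Follow parent pointers to the representative of x's class.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    # Merge the classes of the two terms in every owl:sameAs pair.
    for a, b in same_as_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    # Collect the terms of each class under its representative.
    classes = {}
    for term in list(parent):
        classes.setdefault(find(term), set()).add(term)
    return list(classes.values())

# Made-up identifiers: a=b and b=c imply a=c; d=e forms a separate class.
pairs = [("ex:a", "ex:b"), ("ex:b", "ex:c"), ("ex:d", "ex:e")]
for identity_set in identity_closure(pairs):
    print(sorted(identity_set))
```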

Cleaning owl:sameAs

Community 1: Obama, the person

              http://als.dbpedia.org/resource/Barack_Obama
              http://am.dbpedia.org/resource/ባራክ_ኦባማ
              http://data.nytimes.com/obama_barack_per
              http://nl.dbpedia.org/resource/Barack_Obama
              http://rdf.freebase.com/ns/m.02mjmr
              http://viaf.org/viaf/52010985
              http://yago-knowledge.org/resource/Barack_Obama

Community 2: Obama, the administration

                http://dbpedia.org/resource/Administration_of_Barack_Obama
                http://dbpedia.org/resource/Barack_Obama_Cabinet
                http://dbpedia.org/resource/Barack_Obama_presidency
                http://rdf.freebase.com/ns/m.05b6w1g
                http://wikidata.dbpedia.org/resource/Q1379733
                http://yago-knowledge.org/resource/Presidency_of_Barack_Obama
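Once terms are assigned to communities, owl:sameAs links that cross a community boundary become candidates for inspection. The sketch below takes the community assignment as given (abbreviated from the two lists above) and does not show how communities are detected. The links themselves are hypothetical, since the slides do not list them, and the helper name `cross_community_links` is made up.

```python
# Community assignment copied from the two lists above (abbreviated).
community = {
    "http://nl.dbpedia.org/resource/Barack_Obama": 1,
    "http://viaf.org/viaf/52010985": 1,
    "http://dbpedia.org/resource/Barack_Obama_presidency": 2,
    "http://rdf.freebase.com/ns/m.05b6w1g": 2,
}

# Hypothetical owl:sameAs links; the slides do not list the actual links.
same_as_links = [
    ("http://nl.dbpedia.org/resource/Barack_Obama",
     "http://viaf.org/viaf/52010985"),
    ("http://dbpedia.org/resource/Barack_Obama_presidency",
     "http://rdf.freebase.com/ns/m.05b6w1g"),
    ("http://nl.dbpedia.org/resource/Barack_Obama",
     "http://dbpedia.org/resource/Barack_Obama_presidency"),
]

def cross_community_links(links, community):
    """Return links whose two ends fall in different communities."""
    return [(a, b) for a, b in links if community[a] != community[b]]

suspicious = cross_community_links(same_as_links, community)
print(suspicious)
```

In this made-up input, only the link between the person and the presidency crosses communities.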

UDC ↔ LOD

| UDC | LOD |
|---|---|
| Coordination/addition (+) | Concept intersection (C ⊓ D) |
| Subgrouping […] | Precedence |
| Common auxiliaries of language (=…) | Language tag ("Բարև ձեզ"@hy-am) |
| Date/time "YYYY.MM.DD" | Datatype xsd:dateTime |
| Reduced precision: 2017-12 | Datatype: "2017-12"^^xsd:gYearMonth |
| Non-UDC notation (*) | Vocabulary reuse, OWL import |
| Relation between 2 or more subjects (:) | Reification (n-ary relationship) |
| Concept extension: (DUTCH) = DUTCH-SPEAKERS | Range restriction: Person ⊓ (∃ speaks.{dutch}) |
| Order fixing: 77.04::355.4 (“war photography”) | Range restriction: Photograph ⊓ (∃ subject.{war}) |
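The date rows of this mapping can be sketched as a small conversion function. Several choices here are assumptions: the function name `udc_date_to_literal` is made up, both `.` and `-` are accepted as separators, and a full date is emitted as xsd:date, whereas the mapping names xsd:dateTime, which would additionally need a time of day.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"

# Assumed datatype per precision level (year, year-month, full date).
_DATATYPES = {1: "gYear", 2: "gYearMonth", 3: "date"}

def udc_date_to_literal(udc_date):
    """Convert 'YYYY', 'YYYY.MM' or 'YYYY.MM.DD' to a typed literal string."""
    if not re.fullmatch(r"[0-9]{4}([.-][0-9]{2}){0,2}", udc_date):
        raise ValueError(f"unsupported date notation: {udc_date!r}")
    parts = re.split(r"[.-]", udc_date)
    lexical = "-".join(parts)
    datatype = _DATATYPES[len(parts)]
    return f'"{lexical}"^^<{XSD}{datatype}>'

print(udc_date_to_literal("2017.12"))
print(udc_date_to_literal("2017.12.01"))
```

The sketch checks only the shape of the notation; it does not reject impossible values such as month 13.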

UDC features not in LOD

  1. Order carries meaning (x.y.z); grouping of documents; used for indexing/shelving
  2. ‘Phase’: 37-042.4:004, “use of computers in education”
  3. Distinction between common & special auxiliaries
    • Common: apply to all classes
    • Special: apply to certain classes
  4. Special notation for certain properties, e.g., form (0…), country (n…)
  5. Special notation for temporal ranges "BEGIN/END"

Thank you!

Mail: w.g.j.beek@vu.nl
WWW: wouterbeek.com