LOD Laundromat
@CLARIAH Tech Dag

October 7th, 2016

Wouter Beek (w.g.j.beek@vu.nl)

Metcalfe's Law

The value of a network is proportional to the square of the number of connected nodes
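The usual reasoning behind the square: among n nodes there are n(n-1)/2 possible pairwise links, which grows like n². The small Python helper below is our own illustration, not part of the talk. It shows that doubling the number of nodes roughly quadruples the number of possible links:

```python
def possible_links(n: int) -> int:
    """Distinct pairwise (undirected) links among n nodes: n(n-1)/2."""
    return n * (n - 1) // 2


# Doubling the number of nodes roughly quadruples the possible links,
# which is the n^2 growth behind Metcalfe's Law.
for n in (10, 20, 40):
    print(n, possible_links(n))
```

This prints 45, 190 and 780 possible links for 10, 20 and 40 nodes.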

So... how many connected nodes does the Semantic Web (SW) have?

Data growth is exponential

SW growth is linear

How to solve this?

lodlaundromat.org



Beek & Rietveld & Bazoobandi & Wielemaker & Schlobach, “LOD Laundromat: A Uniform Way of Publishing Other People’s Dirty Data”, ISWC 2014.

LOD Laundromat uses the ClioPatria triple store, written in SWI-Prolog

github.com/ClioPatria/ClioPatria

github.com/SWI-Prolog/swipl-devel

Wielemaker & Beek & Hildebrand & Van Ossenbruggen, “ClioPatria: A SWI-Prolog Infrastructure for the Semantic Web”, Semantic Web Journal, 2016.

Header Dictionary Triples (HDT)

rdfhdt.org

Fernández & Martínez-Prieto & Gutiérrez & Polleres & Arias, “Binary RDF Representation for Publication and Exchange (HDT)”, Web Semantics: Science, Services and Agents on the World Wide Web, Vol. 19, pp. 22-41, 2013.
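HDT splits an RDF graph into a Header (metadata), a Dictionary (each RDF term mapped to an integer ID) and Triples (the graph stored as ID triples). The toy Python sketch below illustrates only the Dictionary/Triples idea. It is not the HDT binary format, which uses compressed dictionary sections and bitmap-encoded triples, and `dictionary_encode` is a hypothetical helper. For real HDT files, use the libraries listed on rdfhdt.org.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def dictionary_encode(
    triples: List[Triple],
) -> Tuple[Dict[str, int], List[Tuple[int, int, int]]]:
    """Toy Dictionary + Triples split (illustration only, not real HDT).

    Each distinct RDF term gets an integer ID; the graph is then kept as a
    sorted list of ID triples, which compresses and indexes far better than
    repeated strings.
    """
    term_ids: Dict[str, int] = {}

    def term_id(term: str) -> int:
        # IDs are assigned in order of first appearance; real HDT instead
        # assigns them from sorted, role-specific dictionary sections.
        if term not in term_ids:
            term_ids[term] = len(term_ids) + 1
        return term_ids[term]

    id_triples = sorted((term_id(s), term_id(p), term_id(o)) for s, p, o in triples)
    return term_ids, id_triples
```

For the triples `ex:a foaf:knows ex:b`, `ex:b foaf:knows ex:a` and `ex:a foaf:name "A"`, this yields five dictionary entries and the ID triples `[(1, 2, 3), (1, 4, 5), (3, 2, 1)]`.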

How to query >30B statements (1/2)

How to query >30B statements (2/2)

Rietveld & Verborgh & Beek & Vander Sande & Schlobach, “Linked Data-as-a-Service: The Semantic Web Redeployed”, ESWC 2015.

SW layer cake

Alternative SW layer cake

Beek & Rietveld & Schlobach & Van Harmelen, “LOD Laundromat: Why the Semantic Web Needs Centralization (Even If We Don't Like It)”, IEEE Internet Computing, Vol. 20(2), pp. 78-81, 2016.

Hands-on

Find IRIs with LOD Search

State-of-the-art findability is comparable to the 1995 Yahoo! index

Semantic Search Engine

Ilievski & Beek & Van Erp & Rietveld & Schlobach, “LOTUS: Adaptive Text Search for Big Linked Data”, ESWC 2016.

lodsearch.org

Find statements with Frank

github.com/LOD-Laundromat/Frank

Beek & Rietveld, “Frank: The LOD Cloud at your Fingertips”, Extended Semantic Web Conference: Developers Workshop, 2015.

With Frank

Info about ‘dbr:Monkey’ from any document:

frank statements -s dbr:Monkey


Also show the document each statement comes from:

frank statements -s dbr:Monkey -g

Combine multiple Frank calls

frank documents --namespace void --minTriples 1000 |
frank statements --predicate foaf:name |
head -n 5


europa:Eurostat foaf:name "Eurostat".
tw:ReviewCommission foaf:name "Review Commission"^^xsd:string.
sw:gianluca-demartini foaf:name "Gianluca Demartini".
sw:mohammad-mannan foaf:name "Mohammad Mannan".
sw:tom-minka foaf:name "Tom Minka".

Combine Frank with external programs

frank statements -p foaf:knows |
grep last-fm | ./ntriplesToGml > last-fm.gml
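The slide does not show `./ntriplesToGml`. Below is a minimal Python sketch of what such a filter could look like, assuming Frank emits N-Triples with full IRIs. The function name `ntriples_to_gml` is hypothetical. Each IRI or blank node in subject or object position becomes a GML node, each matching triple becomes a directed edge, and triples with literal objects are skipped.

```python
import re
from typing import Dict, Iterable, List, Tuple

# An IRI (<...>) or a blank node label (_:...) in N-Triples syntax.
TERM = r"(<[^>]*>|_:\S+)"
# Subject, predicate, object, then the terminating dot.
LINE = re.compile(rf"^\s*{TERM}\s+{TERM}\s+{TERM}\s*\.\s*$")


def ntriples_to_gml(lines: Iterable[str]) -> str:
    """Convert N-Triples lines into a directed GML graph (subject -> object)."""
    node_ids: Dict[str, int] = {}
    edges: List[Tuple[int, int]] = []

    def node(term: str) -> int:
        # Number nodes in order of first appearance.
        if term not in node_ids:
            node_ids[term] = len(node_ids)
        return node_ids[term]

    for line in lines:
        m = LINE.match(line)
        if m is None:
            continue  # comments, blank lines and literal-valued triples
        s, _p, o = m.groups()
        edges.append((node(s), node(o)))

    out = ["graph [", "  directed 1"]
    for term, i in node_ids.items():
        # GML strings cannot contain double quotes, so escape them.
        label = term.strip("<>").replace('"', "&quot;")
        out.append(f'  node [ id {i} label "{label}" ]')
    for s, o in edges:
        out.append(f"  edge [ source {s} target {o} ]")
    out.append("]")
    return "\n".join(out) + "\n"
```

To use it in place of `./ntriplesToGml` in the pipeline, save it as a script that ends with `sys.stdout.write(ntriples_to_gml(sys.stdin))` (after `import sys`).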

Find documents

Find documents by resource (index)

“Documents with resource ‘owl:InverseFunctionalProperty’.”

http://index.lodlaundromat.org/r2d/http%3A%2F%2Fwww.w3.org%2F2002%2F07%2Fowl%23InverseFunctionalProperty

Find documents by namespace (index)

“Documents with namespace ‘owl’.”

http://index.lodlaundromat.org/ns2d/http%3A%2F%2Fwww.w3.org%2F2002%2F07%2Fowl%23
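Both index URLs above follow the same pattern: the index host, an endpoint name (`r2d` for resource-to-documents, `ns2d` for namespace-to-documents), and then the full IRI percent-encoded as a single path segment. The Python sketch below rebuilds them. The helper `index_url` is hypothetical, and the shape of the index's responses is not described here.

```python
from urllib.parse import quote

INDEX = "http://index.lodlaundromat.org"


def index_url(endpoint: str, iri: str) -> str:
    """Build an index lookup URL such as .../r2d/<encoded IRI>.

    safe='' makes quote() escape '/' as well as ':' and '#', so the whole
    IRI ends up as one percent-encoded path segment.
    """
    return f"{INDEX}/{endpoint}/{quote(iri, safe='')}"


print(index_url("r2d", "http://www.w3.org/2002/07/owl#InverseFunctionalProperty"))
print(index_url("ns2d", "http://www.w3.org/2002/07/owl#"))
```

The two calls reproduce exactly the `r2d` and `ns2d` URLs shown on the slides.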

Find documents by metadata (SPARQL)

“Documents of size 880-900.”

PREFIX llo: <http://lodlaundromat.org/ontology/>  # assumed namespace IRI
SELECT ?doc WHERE {
  ?doc llo:triples ?n
  FILTER(?n >= 880)
  FILTER(?n <= 900)
}
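Queries like this can also be sent from a program using the standard SPARQL 1.1 Protocol: an HTTP GET with the query in the `query` parameter, requesting JSON results. The Python sketch below is only an illustration. The endpoint URL is not given on the slide, so it is a parameter here, and the `llo:` namespace IRI in `QUERY` is an assumption. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# The documents-of-size-880-900 query; the llo: namespace IRI is assumed.
QUERY = """PREFIX llo: <http://lodlaundromat.org/ontology/>
SELECT ?doc WHERE { ?doc llo:triples ?n FILTER(?n >= 880 && ?n <= 900) }"""


def sparql_get_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL 1.1 Protocol GET request that asks for JSON results."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def select(endpoint: str, query: str) -> list:
    """Run a SELECT query and return its result bindings (needs network)."""
    with urlopen(sparql_get_request(endpoint, query)) as resp:
        return json.load(resp)["results"]["bindings"]
```

Calling `select(endpoint, QUERY)` against the LOD Laundromat SPARQL endpoint should return one binding per document, with the document IRI under `["doc"]["value"]`.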


Find documents by metadata (Frank)

“Documents of size 880-900.”

frank documents --minTriples 880 --maxTriples 900

Find documents by degree (SPARQL)

“Documents with average in-degree 3 or higher.”

PREFIX llm: <http://lodlaundromat.org/metrics/ontology/>  # assumed namespace IRI
SELECT ?doc ?x WHERE {
  ?doc llm:metrics/llm:inDegree/llm:mean ?x
  FILTER(?x >= 3)
} LIMIT 100
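The slides do not define how LOD Laundromat computes `llm:inDegree/llm:mean`. The Python sketch below shows one common graph-theoretic reading, which may differ from the actual metric. It treats every triple as an edge from subject to object and averages incoming edges over all distinct subject and object terms. The function `mean_in_degree` is hypothetical.

```python
from collections import Counter
from typing import Iterable, Set, Tuple


def mean_in_degree(triples: Iterable[Tuple[str, str, str]]) -> float:
    """Mean number of incoming edges per node (one possible definition).

    Assumptions: an RDF graph is a set of triples, each triple is one edge
    from subject to object, and the mean is taken over every distinct term
    occurring as a subject or object (nodes without incoming edges count 0).
    """
    in_degree: Counter = Counter()
    nodes: Set[str] = set()
    for s, _p, o in set(triples):  # duplicate triples count as one edge
        nodes.update((s, o))
        in_degree[o] += 1
    if not nodes:
        return 0.0
    return sum(in_degree[n] for n in nodes) / len(nodes)
```

Under this reading, the mean in-degree equals the number of edges divided by the number of nodes. A document passes the query's filter when it has at least three times as many distinct triples as distinct nodes.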


Find documents by degree (Frank)

“Documents with average in-degree 3 or higher.”

frank documents --minAvgInDegree 3

Find documents by namespace (Frank)

frank documents --namespace http://www.w3.org/2006/vcard/ns#

Thank you!

Mail: w.g.j.beek@vu.nl

WWW: wouterbeek.com

Triply: triply.cc