LOD Lab Tutorial

May 30th, 2016

Wouter Beek & Javier Fernández & Michael Lauruhn & Filip Ilievski

frank statements -p foaf:knows |
grep last-fm | nt2gml > last-fm.gml

Program

  • 14:00-14:10 Introduction round
  • 14:10-14:30 LOD Lab (presentation)
  • 14:30-14:50 Find IRIs with LOTUS (hands-on)
  • 14:50-15:00 break
  • 15:00-15:20 Scalable storage with HDT (presentation)
  • 15:20-15:40 Find statements with HDT, LDF and Frank (hands-on)
  • 15:40-15:50 break
  • 15:50-16:10 Reproducibility (presentation)
  • 16:10-16:40 Find docs with LOD Laundromat (hands-on)
  • 16:40-17:00 Future infra for LOD evaluations (discussion)

Introduction round

Javier Fernández
Michael Lauruhn
Wouter Beek & Filip Ilievski

LOD Lab overview

Find IRIs with LOTUS

lotus.lodlaundromat.org

Find ‘monkey’

http://lotus.lodlaundromat.org/retrieve?string=monkey

“Exclude blank nodes”

http://lotus.lodlaundromat.org/retrieve?string=monkey&noblank=true

Filter by subject: “Only OpenCyc subjects”

http://lotus.lodlaundromat.org/retrieve?string=monkey&noblank=true&subject=sw.opencyc.org

Filter by predicate: “Exclude predicates containing ‘label’”

http://lotus.lodlaundromat.org/retrieve?string=monkey&noblank=true&subject=sw.opencyc.org&predicate=NOT%20label

Filter by language tag: “Only literals with ‘en’ tag”

http://lotus.lodlaundromat.org/retrieve?string=monkey&noblank=true&subject=sw.opencyc.org&predicate=NOT%20label&langtag=en
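
Each filter above adds one query parameter to the retrieve URL. A minimal Python sketch of how such a URL could be assembled; the helper name `lotus_url` is hypothetical (not part of LOTUS), and only the parameters shown in this tutorial are used.

```python
from urllib.parse import quote, urlencode

LOTUS_RETRIEVE = "http://lotus.lodlaundromat.org/retrieve"

def lotus_url(string, **filters):
    """Build a LOTUS retrieve URL (hypothetical helper, not part of LOTUS)."""
    params = {"string": string, **filters}
    # quote_via=quote encodes spaces as %20, matching the URLs in this tutorial.
    return LOTUS_RETRIEVE + "?" + urlencode(params, quote_via=quote)

url = lotus_url(
    "monkey",
    noblank="true",
    subject="sw.opencyc.org",
    predicate="NOT label",
    langtag="en",
)
print(url)
```

Dropping a keyword argument drops the corresponding filter, so the earlier, less restrictive URLs can be produced with the same helper.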

Overview of all options

http://lotus.lodlaundromat.org/docs

Header Dictionary Triples (HDT)

Scalable storage & querying

Find statements with HDT, LDF and Frank

Sustainable querying with HDT & LDF

L. Rietveld & R. Verborgh & W. Beek & M. Vander Sande & S. Schlobach, “Linked Data-as-a-Service: The Semantic Web Redeployed”, ESWC 2015

Authority

AAA: Anyone can say anything about anything

W. Beek & L. Rietveld & S. Schlobach & F. Van Harmelen, “LOD Laundromat: Why the Semantic Web Needs Centralization (Even If We Don't Like It)”, IEEE Internet Computing, 20 (2), p.78-81, 2016

Frank

Federated Resource Architecture for Networked Knowledge

https://github.com/LOD-Laundromat/Frank

W. Beek & L. Rietveld, “Frank: The LOD Cloud at your Fingertips”, Extended Semantic Web Conference: Developers Workshop, 2015

LDF: Info about monkeys from this document

http://ldf.lodlaundromat.org/2642254ff835bbae48848958a7a9a19f?subject=http%3A%2F%2Fs211.photobucket.com%2Falbums%2Fbb151%2Fsilverhog01%2F%3Faction%3Dview%26current%3Dmonkey.gif%26sort%3Dascending
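
The request above consists of a document identifier followed by a `subject` parameter whose value is a fully percent-encoded IRI. A small Python sketch of that encoding step; the function name `ldf_subject_url` is hypothetical, and the document identifier and subject IRI are the ones from the URL above.

```python
from urllib.parse import quote

LDF_BASE = "http://ldf.lodlaundromat.org"

def ldf_subject_url(doc_id, subject_iri):
    """Build an LDF request for the statements about one subject in one document."""
    # safe="" percent-encodes ':', '/', '?', '&' and '=' inside the IRI, so the
    # IRI's own query string cannot be confused with that of the LDF request.
    return f"{LDF_BASE}/{doc_id}?subject={quote(subject_iri, safe='')}"

url = ldf_subject_url(
    "2642254ff835bbae48848958a7a9a19f",
    "http://s211.photobucket.com/albums/bb151/silverhog01/"
    "?action=view&current=monkey.gif&sort=ascending",
)
print(url)
```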

Frank: Info about monkeys from everywhere

frank statements -s dbr:Monkey


+ show the document

frank statements -s dbr:Monkey -g

Reproducibility

Find documents with LOD Laundromat

LOD Laundromat Index

http://index.lodlaundromat.org


LOD Laundromat SPARQL endpoint

http://lodlaundromat.org/sparql

Find documents by size

SPARQL

SELECT ?doc WHERE {
  ?doc llo:triples ?n
  FILTER(?n >= 880)
  FILTER(?n <= 900)
}
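
The query above can also be sent programmatically. A sketch in Python that builds the query text for arbitrary bounds and the matching GET request URL. It assumes the endpoint follows the SPARQL 1.1 Protocol (query passed in the `query` parameter) and that `llo:` expands to the namespace shown; both are assumptions to verify against the endpoint, and the function names are hypothetical.

```python
from urllib.parse import urlencode

ENDPOINT = "http://lodlaundromat.org/sparql"
# Assumed namespace for the llo: prefix; check it against the endpoint.
LLO = "http://lodlaundromat.org/ontology/"

def size_query(min_triples, max_triples):
    """Return a SPARQL query for documents with a triple count in [min, max]."""
    return (
        f"PREFIX llo: <{LLO}>\n"
        "SELECT ?doc WHERE {\n"
        "  ?doc llo:triples ?n\n"
        f"  FILTER(?n >= {int(min_triples)})\n"
        f"  FILTER(?n <= {int(max_triples)})\n"
        "}"
    )

def request_url(query):
    """Build a SPARQL 1.1 Protocol GET URL carrying the query."""
    return ENDPOINT + "?" + urlencode({"query": query})

query = size_query(880, 900)
print(query)
print(request_url(query))
```

The resulting URL can be fetched with any HTTP client; no request is made by the sketch itself.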



Frank

frank documents --minTriples 880 --maxTriples 900


Find documents by degree

SPARQL

SELECT ?doc ?x WHERE {
  ?doc llm:metrics/llm:inDegree/llm:mean ?x
  FILTER(?x >= 3)
}



Frank

frank documents --minAvgInDegree 3


Find documents by namespace


http://index.lodlaundromat.org/ns2d/http%3A%2F%2Fwww.w3.org%2F2006%2Fvcard%2Fns%23




Frank

frank documents --namespace http://www.w3.org/2006/vcard/ns#


Combine multiple Frank calls

frank documents --namespace void --minTriples 1000 |
frank statements --predicate foaf:name |
head -n 5


europa:Eurostat foaf:name "Eurostat".
tw:ReviewCommission foaf:name "Review Commission"^^xsd:string.
sw:gianluca-demartini foaf:name "Gianluca Demartini".
sw:mohammad-mannan foaf:name "Mohammad Mannan".
sw:tom-minka foaf:name "Tom Minka".

Combine Frank with external programs

frank statements -p foaf:knows |
grep last-fm | ./ntriplesToGml > last-fm.gml
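
The source of `./ntriplesToGml` is not shown in this tutorial. As an illustration of what the last step of such a pipeline might do, here is a minimal Python sketch that turns N-Triples lines into a GML graph. The function name `nt_to_gml` and the sample data are hypothetical, and the sketch only handles triples whose subject and object are both IRIs.

```python
import re

# Matches an N-Triples statement whose subject and object are both IRIs.
# Statements with blank nodes or literal objects are skipped in this sketch.
TRIPLE = re.compile(r'^\s*<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$')

def nt_to_gml(lines):
    """Convert an iterable of N-Triples lines to a GML graph string."""
    ids = {}    # IRI -> integer node id, in order of first appearance
    edges = []  # (source id, target id, predicate IRI)
    for line in lines:
        m = TRIPLE.match(line)
        if not m:
            continue
        s, p, o = m.groups()
        for iri in (s, o):
            ids.setdefault(iri, len(ids))
        edges.append((ids[s], ids[o], p))
    out = ["graph [", "  directed 1"]
    for iri, i in ids.items():
        out.append(f'  node [ id {i} label "{iri}" ]')
    for s, t, p in edges:
        out.append(f'  edge [ source {s} target {t} label "{p}" ]')
    out.append("]")
    return "\n".join(out)

# Hypothetical sample input: one IRI-to-IRI triple and one literal triple.
sample = [
    '<http://example.org/a> <http://xmlns.com/foaf/0.1/knows> <http://example.org/b> .',
    '<http://example.org/a> <http://xmlns.com/foaf/0.1/name> "A" .',
]
gml = nt_to_gml(sample)
print(gml)
```

To use it as a filter in a pipeline like the one above, pass `sys.stdin` to `nt_to_gml` and print the result.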

Discussion

  • Is centralization necessary to make the Web of Data (WoD) a reproducibility platform?
  • What about complex queries & reasoning?
  • Evaluations over real-time (e.g. sensor) data
  • Current tooling is optimized for files, but most of the time the file is not the right granularity level
  • The WoD is optimized for human interaction patterns