Overview

May 30th, 2016

Wouter Beek & Filip Ilievski
http://wouterbeek.github.io

How generalizable is SW research?

ISWC 2014 Research Track

The ‘economics’ of LOD evaluations


  • Dataset cleaning is manual and wasteful
  • Cleaning the next dataset requires a similar amount of work




  • High cost per evaluation: Disincentivizes reproducibility
  • High cost per dataset: Disincentivizes generalizability

Today's WoD unsuitable as evaluation platform

  • Datasets cannot be found
  • IRI dereferencing is broken
  • SPARQL endpoints enforce restrictions
  • Bulk downloads cannot be queried online
  • Bulk downloads are not standards-compliant
  • Corrections cannot be written to the WoD

Result

  • Evaluations are run locally
  • Data cleaning is performed on a local copy
  • Results of data cleaning are removed with the local copy

LOD Laundromat



http://lodlaundromat.org
Beek & Rietveld & Bazoobandi & Wielemaker & Schlobach, “LOD Laundromat: A Uniform Way of Publishing Other People’s Dirty Data”, ISWC 2014

SW layer cake

Alt. SW layer cake

Reproducing “RDF Vault”
(Bazoobandi 2015)

RDF Vault LOD Lab
    frank documents --downloadUri \
      --minTriples 1000 --maxTriples 100000 |
      ./runVaultExperimentForFile
            
L. Rietveld & W. Beek & S. Schlobach, “LOD Lab: Experiments at LOD Scale”, International Semantic Web Conference, 2015 (Best Paper Award)

Reproducing “Linked Data Best Practices”
(Schmachtenberg 2014)

Original                         LOD Lab
Prefix  #datasets  %datasets     Prefix     #documents  %documents
rdf     996        98.22%        rdf        639,575     98.40%
rdfs    736        72.58%        time       443,222     68.19%
foaf    701        69.13%        cube       155,460     23.92%
dcterm  568        56.01%        sdmxdim    154,940     23.84%
owl     370        36.49%        worldbank  147,362     22.67%
    frank documents --downloadUri |
      ./countNamespacesForDocument
            
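The slides do not show what ./countNamespacesForDocument does internally. As a rough illustration only, a per-document namespace count like the one in the LOD Lab column could be sketched in Python as follows. All function names are hypothetical, the input is assumed to be N-Triples text that has already been downloaded, and splitting an IRI at its last '#' or '/' is a common heuristic, not necessarily the rule used in the paper.

```python
import re
from collections import Counter

# Matches anything written in angle brackets, as IRIs are in N-Triples.
# Simplification: this also picks up datatype IRIs and any <...> text
# that happens to occur inside a literal.
IRI_PATTERN = re.compile(r"<([^<>\s]+)>")


def namespace_of(iri):
    """Return the IRI up to and including its last '#' or '/' (heuristic)."""
    cut = max(iri.rfind("#"), iri.rfind("/"))
    return iri[: cut + 1] if cut >= 0 else iri


def namespaces_in_document(lines):
    """Collect the set of namespaces used by the IRIs in one document."""
    found = set()
    for line in lines:
        for iri in IRI_PATTERN.findall(line):
            found.add(namespace_of(iri))
    return found


def count_documents_per_namespace(documents):
    """For each namespace, count the number of documents that use it."""
    counts = Counter()
    for lines in documents:
        # A namespace is counted once per document, however often it occurs.
        counts.update(namespaces_in_document(lines))
    return counts
```

Counting each namespace once per document, not once per triple, is what makes the result comparable to a "#documents" column.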

Large-scale Data Quality Improvement

Datatypes

Language tags

LOTUS

Natural language entry point to LOD Laundromat

Large-scale: 4,334,672,073 natural language literals

Configurable: filtering by original language, auto-detected language, subject, and predicate; 32 retrieval options
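To make the filter dimensions concrete, here is a minimal Python sketch of filtering literal records by the four criteria named above. It is not the LOTUS API: the record type, the field names, and the function are hypothetical, and plain case-insensitive substring matching stands in for whichever of the retrieval options a real query would select.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class LiteralRecord:
    """Hypothetical record for one natural language literal."""
    subject: str
    predicate: str
    text: str
    language_tag: Optional[str]       # tag stated in the source data, if any
    detected_language: Optional[str]  # tag assigned by auto-detection, if any


def filter_literals(
    records: Iterable[LiteralRecord],
    query: str,
    language_tag: Optional[str] = None,
    detected_language: Optional[str] = None,
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
) -> List[LiteralRecord]:
    """Keep records whose text contains the query and that pass every given filter."""
    needle = query.lower()
    matches = []
    for record in records:
        if needle not in record.text.lower():
            continue
        # A filter left as None is not applied.
        if language_tag is not None and record.language_tag != language_tag:
            continue
        if detected_language is not None and record.detected_language != detected_language:
            continue
        if subject is not None and record.subject != subject:
            continue
        if predicate is not None and record.predicate != predicate:
            continue
        matches.append(record)
    return matches
```

Keeping the stated and the auto-detected language as separate fields lets a query fall back on detection for literals that carry no language tag in the source data.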


F. Ilievski & W. Beek & M. Van Erp & L. Rietveld & S. Schlobach, “LOTUS: Adaptive Text Search for Big Linked Data”, ESWC 2016