Linked Data as-a-Service

The Semantic Web Redeployed

Laurens Rietveld, Ruben Verborgh, Wouter Beek, Miel van der Sande

Slides at http://wouterbeek.github.io

Problem 1

The LOD Cloud cannot be uniformly queried today.

  • Most Semantic Web datasets are not available online
  • Those that are online are often data dumps
  • Many of those are not standards-compliant
  • Many datasets that can be queried live have a custom API
  • Most custom APIs are not self-describing
  • Many live-queryable datasets pose restrictions
  • Others have low availability
  • Different SPARQL endpoints pose different restrictions

Problem 2

Existing deployment techniques are unable to close the gap between downloadable data dumps and live-queryable data.

Query endpoint availability

Problem 3: Service level

Even though a technology stack for publishing Semantic Web data exists today, there is currently no simple Web service that offers the same functionality at Web scale.

Problem 4: Federation

There are no LOD Cloud-wide guarantees as to whether, and if so how, sub-queries will be evaluated by different endpoints.

Related work

Large Linked Datasets: BTC, Freebase

Large Linked Data Indexes: Sindice, LODCache, DyLDO

Cloud-based triple store: Dydra

Solution 1: Machine readability

Clean & republish all data documents.

LOD Laundromat



https://lodlaundromat.org

Open source (of course)
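
As a rough illustration of consuming a cleaned and republished document, the Python sketch below streams one file and counts its statements. The download URL is a placeholder and the gzipped N-Quads serialization is an assumption; see lodlaundromat.org for the actual document locations and formats.

    import gzip
    import requests

    # Placeholder: replace with the download URL of an actual cleaned
    # document republished by the LOD Laundromat (the URL and the gzipped
    # N-Quads format are assumptions made for this sketch).
    DOC_URL = "https://example.org/cleaned-document.nq.gz"

    def count_statements(url):
        """Stream a gzipped N-Quads document and count its statements."""
        resp = requests.get(url, stream=True)
        resp.raise_for_status()
        count = 0
        with gzip.open(resp.raw, mode="rt", encoding="utf-8") as lines:
            for line in lines:
                line = line.strip()
                if line and not line.startswith("#"):  # skip blanks/comments
                    count += 1
        return count

    if __name__ == "__main__":
        print(count_statements(DOC_URL), "statements")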

Solution 2: Availability

Strike a balance between server- and client-side processing.

Server-side

Client- and server-side (Linked Data Fragments; sketched below)
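
The sketch below shows, under assumptions, what the client's share of the work looks like against a Linked Data Fragments (Triple Pattern Fragments) interface: the client asks only for the triples matching one pattern, and the server answers with a simple paged fragment. The fragment URL is a placeholder, and the subject/predicate/object query parameters assume the common TPF URI template.

    import requests

    # Placeholder fragment endpoint; a real Triple Pattern Fragments server
    # advertises its own URI template (assumed here to use the common
    # subject/predicate/object query parameters).
    FRAGMENT_URL = "https://example.org/dataset/fragments"

    def fetch_fragment(subject=None, predicate=None, obj=None):
        """Request the triples matching a single pattern from a TPF server."""
        params = {}
        if subject:
            params["subject"] = subject
        if predicate:
            params["predicate"] = predicate
        if obj:
            params["object"] = obj
        resp = requests.get(FRAGMENT_URL, params=params,
                            headers={"Accept": "text/turtle"})
        resp.raise_for_status()
        return resp.text

    if __name__ == "__main__":
        # First page of all triples that use rdf:type as predicate.
        turtle = fetch_fragment(
            predicate="http://www.w3.org/1999/02/22-rdf-syntax-ns#type")
        print(turtle[:500])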

Solution 3: Serviceability

  • Integration with popular services (Dropbox)
  • Self-descriptive Web Services (Hydra; sketched after this list)
  • Command-line tool Frank
  • Libraries for popular programming languages

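Because the fragments are self-descriptive, a client library needs no out-of-band documentation to page through them: it can read the Hydra hypermedia controls embedded in each response. The sketch below follows those next-page links with rdflib; the fragment URL is a placeholder, and the exact control predicate is treated as an assumption, since its name has varied between Hydra vocabulary versions.

    import requests
    from rdflib import Graph, URIRef

    HYDRA = "http://www.w3.org/ns/hydra/core#"
    # The paging control name has varied across Hydra vocabulary versions,
    # so both candidates are checked here.
    NEXT_PAGE_PREDICATES = [URIRef(HYDRA + "next"), URIRef(HYDRA + "nextPage")]

    def iter_pages(first_page_url, max_pages=3):
        """Follow the hypermedia 'next page' controls of a paged fragment.

        Yields one rdflib Graph per page; stops when no next link is found
        or after max_pages pages (to keep the example bounded).
        """
        url, pages = first_page_url, 0
        while url and pages < max_pages:
            resp = requests.get(url, headers={"Accept": "text/turtle"})
            resp.raise_for_status()
            graph = Graph()
            graph.parse(data=resp.text, format="turtle")
            yield graph
            pages += 1
            url = None
            for predicate in NEXT_PAGE_PREDICATES:
                for nxt in graph.objects(None, predicate):
                    url = str(nxt)
                    break
                if url:
                    break

    if __name__ == "__main__":
        # Placeholder fragment URL (an assumption, not an actual endpoint).
        for page in iter_pages("https://example.org/dataset/fragments"):
            print(len(page), "triples on this page (data + hypermedia controls)")
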
Scalable

  • 38B ground statements
  • 650K data documents (265GB HDT, 193GB raw)

Usage numbers (April stats)

  • 2,150 users
  • 2,119,218 downloads
  • 8,586,193 queries

Solution 4: Federated querying
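
A full federation engine decomposes a SPARQL query and plans sub-queries across endpoints; the sketch below only illustrates the simplest case, under assumptions: the same triple pattern is evaluated against several placeholder fragment endpoints and the results are merged client-side, so nothing is required of an endpoint beyond serving triple pattern fragments.

    import requests

    # Placeholder endpoints; in practice these would be fragment interfaces
    # for different datasets in the LOD Cloud.
    ENDPOINTS = [
        "https://example.org/dataset-a/fragments",
        "https://example.org/dataset-b/fragments",
    ]

    def evaluate_pattern(endpoint, predicate):
        """Fetch the first fragment page for one triple pattern
        (assuming the common subject/predicate/object URI template)."""
        resp = requests.get(endpoint, params={"predicate": predicate},
                            headers={"Accept": "text/turtle"})
        resp.raise_for_status()
        return resp.text

    def federated_pattern(predicate):
        """Naive federation: ask every endpoint and merge whatever comes back."""
        results = {}
        for endpoint in ENDPOINTS:
            try:
                results[endpoint] = evaluate_pattern(endpoint, predicate)
            except requests.RequestException as error:
                # Low availability of one endpoint should not break the query.
                results[endpoint] = None
                print(f"skipping {endpoint}: {error}")
        return results

    if __name__ == "__main__":
        merged = federated_pattern(
            "http://www.w3.org/1999/02/22-rdf-syntax-ns#type")
        for endpoint, turtle in merged.items():
            size = len(turtle) if turtle else 0
            print(endpoint, "->", size, "bytes of Turtle")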

Questions?