Slides at http://wouterbeek.github.io
Probably comparable to HTML/WWW, but that has:
There are at least millions of data documents but only hundreds of live query endpoints with reasonable availability.
Existing deployment techniques are unable to close the gap between downloadable data dumps and live queryable data: data is growing faster than SPARQL endpoint deployment.
Targeted towards human data publishers
Inherently slow approaches
Mixed results after the first 15 years of deployment.
(1) Automate conformity to standards
"Days not decades"
(2) Tools → Web Service
Look at email
The cost/benefit model of Semantic Web (SW) data publishing is broken
The incentive model for data publishing is the wrong way around:
(1) Publishing data should be (near) free
(2a) Asking more questions should cost more (CPU)
(2b) Asking more complex questions should cost more (CPU)
HDT: Disk-based, efficient yet queryable storage
SSD: Disks become faster and cheaper
LDF: BGP queries require client-side joins
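To show why, here is a minimal sketch. It assumes a server that, like a Linked Data Fragments triple pattern interface, answers only single triple patterns, so the client must join the results of a basic graph pattern (BGP) itself. The names `DATA`, `fragment`, and `evaluate_bgp` are hypothetical, the data reuses the example triples from later in these slides, and real clients page through fragments and choose join orders much more carefully than this nested-loop join.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[str, str, str]  # terms starting with "?" are variables
Binding = Dict[str, str]

# Hypothetical in-memory stand-in for a server that only answers
# single triple patterns, never whole BGPs.
DATA: List[Triple] = [
    ("abox:item1024", "rdf:type", "tbox:Tent"),
    ("abox:item1024", "tbox:soldAt", "abox:shop72"),
    ("abox:shop72", "rdf:type", "tbox:Store"),
]

def is_var(term: str) -> bool:
    return term.startswith("?")

def fragment(pattern: Pattern) -> Iterator[Triple]:
    """Server side: return every triple that matches one triple pattern."""
    for triple in DATA:
        if all(is_var(p) or p == t for p, t in zip(pattern, triple)):
            yield triple

def substitute(pattern: Pattern, binding: Binding) -> Pattern:
    """Replace variables that are already bound by their values."""
    return tuple(binding.get(term, term) for term in pattern)

def extend(binding: Binding, pattern: Pattern, triple: Triple) -> Optional[Binding]:
    """Bind the pattern's remaining variables to the matching triple's terms."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p) and new.setdefault(p, t) != t:
            return None  # same variable would be bound to two different terms
    return new

def evaluate_bgp(bgp: List[Pattern]) -> List[Binding]:
    """Client side: nested-loop join, one triple pattern request at a time."""
    bindings: List[Binding] = [{}]
    for pattern in bgp:
        next_bindings: List[Binding] = []
        for binding in bindings:
            bound = substitute(pattern, binding)
            for triple in fragment(bound):  # one server request per call
                extended = extend(binding, bound, triple)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings

if __name__ == "__main__":
    query = [
        ("?item", "rdf:type", "tbox:Tent"),
        ("?item", "tbox:soldAt", "?shop"),
        ("?shop", "rdf:type", "tbox:Store"),
    ]
    print(evaluate_bgp(query))
```

Each intermediate binding triggers a further pattern request, which is where the client-side CPU (and network) cost of BGP queries comes from.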
| Statistic | Value |
|---|---|
| # literals encountered | 12,018,939,378 |
| # integers and dates | 6,699,148,542 |
| # indexed lexical strings | 5,319,790,836 |
| # distinct sources | 508,244 |
| # distinct language tags | 713 |
| # hours to create index | 56 |
| disk space used | 484.77 GB |
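The table's figures can be cross-checked with a little arithmetic. The integer/date count and the indexed-string count add up exactly to the number of literals encountered. The per-string size and throughput below are rough averages under stated assumptions: 1 GB is taken as 10^9 bytes, and all disk use is attributed to the indexed strings. The constant and function names are illustrative only.

```python
# Figures copied from the literal-index table above.
LITERALS_ENCOUNTERED = 12_018_939_378
INTEGERS_AND_DATES = 6_699_148_542
INDEXED_LEXICAL_STRINGS = 5_319_790_836
HOURS_TO_CREATE_INDEX = 56
DISK_SPACE_GB = 484.77

def bytes_per_indexed_string(disk_gb: float, strings: int) -> float:
    """Average disk use per indexed string (assumes 1 GB = 10**9 bytes)."""
    return disk_gb * 10**9 / strings

def strings_indexed_per_second(strings: int, hours: float) -> float:
    """Average indexing throughput over the whole index build."""
    return strings / (hours * 3600)

# The two literal categories add up exactly to the total.
assert INTEGERS_AND_DATES + INDEXED_LEXICAL_STRINGS == LITERALS_ENCOUNTERED

print(f"{bytes_per_indexed_string(DISK_SPACE_GB, INDEXED_LEXICAL_STRINGS):.1f} bytes per indexed string")
print(f"{strings_indexed_per_second(INDEXED_LEXICAL_STRINGS, HOURS_TO_CREATE_INDEX):,.0f} strings indexed per second")
```

Under these assumptions this comes to roughly 91 bytes per indexed string and roughly 26,400 strings indexed per second.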
Datasets used in ISWC 2014 research track papers
17 datasets are used in total
1-6 datasets per article
2 datasets per article on average
| Triples (M) | Docs | Size (MB) | Compr. rate | Docs | Size (MB) | Compr. rate |
|---|---|---|---|---|---|---|
Relate HDT compression rate to average degree
| Avg. degree | Docs | Compr. rate |
|---|---|---|
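Relating the two quantities requires fixing what they mean. The sketch below uses assumed definitions, not necessarily the ones behind the table: each triple is one directed edge between its subject and object, average degree is 2 × |triples| / |distinct subject and object terms|, and compression rate is compressed size divided by uncompressed size. Both function names are hypothetical.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]

def average_degree(triples: Iterable[Triple]) -> float:
    """Average (in + out) degree over subject/object nodes.

    Assumption: every triple is one directed edge, so the total degree
    is 2 * |edges|, spread over the distinct subject and object terms.
    """
    edges = set(triples)  # an RDF graph is a set, so duplicates count once
    nodes: Set[str] = set()
    for subject, _, obj in edges:
        nodes.add(subject)
        nodes.add(obj)
    return 2 * len(edges) / len(nodes) if nodes else 0.0

def compression_rate(compressed_bytes: int, original_bytes: int) -> float:
    """Assumption: rate = compressed size / uncompressed size (lower is better)."""
    return compressed_bytes / original_bytes

if __name__ == "__main__":
    graph = [
        ("abox:item1024", "rdf:type", "tbox:Tent"),
        ("abox:item1024", "tbox:soldAt", "abox:shop72"),
        ("abox:shop72", "rdf:type", "tbox:Store"),
    ]
    print(average_degree(graph))  # 3 edges over 4 nodes
```

With per-document values for both, documents can then be bucketed by average degree and their compression rates compared, as in the table.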
abox:item1024 rdf:type tbox:Tent .
abox:item1024 tbox:soldAt abox:shop72 .
abox:shop72 rdf:type tbox:Store .
fy:jufn1024 pe:ko9sap_ fyufnt:Ufou .
fy:jufn1024 fyufnt:tmffqt fy:aHup .
fy:aHup pe:ko9sap_ fyufnt:70342 .
These graphs denote the same models.
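One way to see why: the second graph is an injective renaming of the first graph's IRIs, and under simple interpretations, where no IRI has a fixed meaning, such a renaming preserves the set of models. The minimal sketch below checks this. It assumes the triples are listed in corresponding order; a full isomorphism check would also search over alignments. The function name `renaming` is hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]

def renaming(g1: List[Triple], g2: List[Triple]) -> Optional[Dict[str, str]]:
    """Return an injective term renaming that maps g1 onto g2 triple by
    triple, or None if the given alignment admits no such renaming."""
    if len(g1) != len(g2):
        return None
    forward: Dict[str, str] = {}   # name in g1 -> name in g2
    backward: Dict[str, str] = {}  # name in g2 -> name in g1 (injectivity)
    for t1, t2 in zip(g1, g2):
        for a, b in zip(t1, t2):
            if forward.setdefault(a, b) != b or backward.setdefault(b, a) != a:
                return None
    return forward

G1 = [
    ("abox:item1024", "rdf:type", "tbox:Tent"),
    ("abox:item1024", "tbox:soldAt", "abox:shop72"),
    ("abox:shop72", "rdf:type", "tbox:Store"),
]
G2 = [
    ("fy:jufn1024", "pe:ko9sap_", "fyufnt:Ufou"),
    ("fy:jufn1024", "fyufnt:tmffqt", "fy:aHup"),
    ("fy:aHup", "pe:ko9sap_", "fyufnt:70342"),
]

if __name__ == "__main__":
    for old, new in renaming(G1, G2).items():
        print(f"{old} -> {new}")
```

Every IRI in the first graph maps consistently to exactly one IRI in the second, and no two IRIs collapse into one.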
According to model theory, IRIs are individual constants or predicate letters whose names are chosen arbitrarily and thus carry no meaning.
Try to refute the hypothesis that names and meanings are independent.