LOD-a-lot
A Single-File Enabler for Data Science
What is the cost of access?
Datasets used by us
You need a €10K+ cluster!
Low-cost LOD access
- 1 file
- 28,362,198,927 unique triples
- >650K data documents
- 524 GB of disk space
- 15.7 GB of RAM
- €305 hardware cost
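The figures above imply a rough per-triple footprint. The following is a back-of-the-envelope sketch only; it assumes that "GB" means 10^9 bytes, which the slide does not state, and the variable names are made up for illustration.

```python
# Figures taken from the slide; the GB = 10**9 bytes convention is an assumption.
TRIPLES = 28_362_198_927
DISK_GB = 524
RAM_GB = 15.7

BYTES_PER_GB = 10**9  # assumed decimal gigabytes

# Average on-disk and in-memory footprint per triple.
disk_bytes_per_triple = DISK_GB * BYTES_PER_GB / TRIPLES
ram_bytes_per_triple = RAM_GB * BYTES_PER_GB / TRIPLES

print(f"disk: {disk_bytes_per_triple:.2f} bytes per triple")
print(f"RAM:  {ram_bytes_per_triple:.2f} bytes per triple")
```

Under that assumption the file costs roughly 18.5 bytes of disk per triple and well under one byte of RAM per triple.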
(1/3) LOD Laundromat
(2/3) Header Dictionary Triples (HDT)
A Single-File Enabler for Data Science
Capabilities
- Enumerate terms
- Query for Triple Patterns
- Retrieve metrics
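The three capabilities can be pictured with a toy in-memory example. This is a minimal sketch in plain Python, not the HDT API: the sample triples, the `match` helper, and the use of `None` as a wildcard are all assumptions made for illustration.

```python
from typing import Iterator, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical toy data; a real query would run against the HDT file instead.
TRIPLES = [
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:bob", "rdf:type", "ex:Person"),
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:bob", "owl:sameAs", "ex:robert"),
]

def match(s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Triple]:
    """Yield every triple matching the pattern; None acts as a wildcard."""
    for t in TRIPLES:
        if (s is None or t[0] == s) and \
           (p is None or t[1] == p) and \
           (o is None or t[2] == o):
            yield t

# Enumerate terms: here, the distinct predicates.
predicates = sorted({p for _, p, _ in TRIPLES})

# Query for a triple pattern: (?, rdf:type, ex:Person).
people = [s for s, _, _ in match(p="rdf:type", o="ex:Person")]

# Retrieve a metric: the total number of triples.
n_triples = len(TRIPLES)

print(predicates, people, n_triples)
```

A linear scan like this only works for toy data; the point of a compressed, indexed single file is to answer the same kinds of requests without scanning.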
Data Science use cases
- obtaining statistics
- enumerating schema
- identity closure
- graph navigation
- query planning
- random sampling for Machine Learning
- generating specialized indexes
- versioning
- analyzing inconsistencies
Use case 1/3: Obtaining statistics
triples | 28,362,198,927
subjects | 3,214,3.8,198
predicates | 1,168,932
objects | 3,178,409,386
subjects & objects | 1,298,808,567
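The same kinds of counts can be computed on a toy dataset to show what each row means. This is a sketch over hypothetical in-memory triples, not the way the numbers above were produced; "subjects & objects" is read here as terms that occur in both positions, which is an assumption about the slide.

```python
# Hypothetical toy triples (subject, predicate, object).
TRIPLES = [
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:bob", "rdf:type", "ex:Person"),
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:bob", "owl:sameAs", "ex:robert"),
]

# Collect the distinct terms seen in each position.
subjects = {s for s, _, _ in TRIPLES}
predicates = {p for _, p, _ in TRIPLES}
objects = {o for _, _, o in TRIPLES}

stats = {
    "triples": len(set(TRIPLES)),                   # unique triples
    "subjects": len(subjects),
    "predicates": len(predicates),
    "objects": len(objects),
    "subjects & objects": len(subjects & objects),  # terms in both roles
}

for name, count in stats.items():
    print(f"{name} | {count:,}")
```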
Use case 3/3: Identity closure
558,943,116 owl:sameAs triples
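Computing an identity closure means grouping all terms that are connected by owl:sameAs links, directly or through intermediate terms. One common way to do this is a union-find structure; the sketch below uses that technique on hypothetical pairs and is not necessarily the method used for the slide's results. The function name `identity_closure` is made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

def identity_closure(pairs: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Group terms into equivalence classes from (subject, object) sameAs pairs."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        # Register unseen terms as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two classes

    classes = defaultdict(set)
    for term in list(parent):
        classes[find(term)].add(term)
    return sorted(sorted(c) for c in classes.values())

# Hypothetical owl:sameAs statements.
same_as = [("ex:a", "ex:b"), ("ex:b", "ex:c"), ("ex:d", "ex:e")]
closure = identity_closure(same_as)
print(closure)
```

Here `ex:a` and `ex:c` end up in the same class even though no single statement links them, which is the transitive part of the closure.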