June 11th, 2018
Wouter Beek (w.g.j.beek@vu.nl), Joe Raad (joe.raad@agroparistech.fr), Jan Wielemaker, and Frank van Harmelen
owl:sameAs
〈x, owl:sameAs, y〉 means that (∀P)(Px ↔ Py)
“Include links to other URIs, to discover more things.”
[4th Linked Data principle]
owl:sameAs
owl:sameAs
is
‘clever’ replacements for owl:sameAs.
lexvo:nearlySameAs lexvo:somewhatSameAs owl:sameAs.
lexvo:nearlySameAs lexvo:nearlySameAs lexvo:somewhatSameAs?
owl:sameAs lexvo:somewhatSameAs bbc:sameAs?
We need an enabler for empirical research into how owl:sameAs is actually being used.
The analytic approach: “people make mistakes” / “it's just noise” is not enough.
| | sameas.org | www.sameas.cc |
|---|---|---|
| № terms | 203M | 180M |
| № statements | 345M | 559M |
| № identity sets | 63M | 49M |
www.sameAs.cc requirements
skos:exactMatch, rdfs:seeAlso).
Identity is the smallest equivalence relation; it is reflexive, symmetric, and transitive.
{:a,:b,:c,:d})
:a owl:sameAs :b.
:d owl:sameAs :b.
:a owl:sameAs :a.
:a owl:sameAs :b.
:a owl:sameAs :d.
:b owl:sameAs :a.
:b owl:sameAs :b.
:b owl:sameAs :d.
:c owl:sameAs :c.
:d owl:sameAs :a.
:d owl:sameAs :b.
:d owl:sameAs :d.
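The step from the two explicit statements to the ten implicit ones can be sketched with a union-find structure. This is a minimal illustration only, not the sameAs.cc implementation; the function name `identity_closure` and its arguments are hypothetical.

```python
from itertools import product


def identity_closure(terms, explicit_pairs):
    """Return the smallest equivalence relation over `terms`
    that contains every pair in `explicit_pairs` (hypothetical helper)."""
    parent = {t: t for t in terms}

    def find(t):
        # Walk up to the set representative, halving the path on the way.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    # Merge the identity sets of the two terms in each explicit statement.
    for x, y in explicit_pairs:
        parent[find(x)] = find(y)

    # Group the terms by representative to obtain the identity sets.
    sets = {}
    for t in terms:
        sets.setdefault(find(t), []).append(t)

    # Every identity set contributes all of its ordered pairs.
    return sorted(
        pair for members in sets.values() for pair in product(members, repeat=2)
    )


terms = [":a", ":b", ":c", ":d"]
explicit = [(":a", ":b"), (":d", ":b")]
closure = identity_closure(terms, explicit)
for s, o in closure:
    print(f"{s} owl:sameAs {o}.")
```

Storing only the identity sets and expanding pairs on demand avoids materializing the closure, which grows quadratically with set size.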
http://lod-a-lot.lod.labs.vu.nl
Fernández et al. 2017
prefix owl: <http://www.w3.org/2002/07/owl#>
construct {
  ?s owl:sameAs ?o
} where {
  {
    select distinct ?s ?o {
      ?s owl:sameAs ?o
      filter(?s < ?o)
    }
  }
}
Result set size: 558.9M
Create an HDT file in 4 hours (1 CPU core); 4.5GB + 2.2GB index
For calculating the implicit identity relation we do not need the full explicit identity relation (558.9M):
Compaction reduces size by 42% (311M triples).
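Reflexive statements and mirrored duplicates add nothing to the closure, so one way to compact an explicit relation is to drop both. The sketch below assumes that reading of "compaction"; the source does not spell out the exact steps, and the function name `compact` is hypothetical.

```python
def compact(pairs):
    """Drop reflexive pairs and keep a single orientation of each
    symmetric pair (assumed meaning of compaction, not confirmed by the source)."""
    kept = set()
    for s, o in pairs:
        if s == o:
            # x owl:sameAs x holds for every term; it carries no information.
            continue
        # Normalize the orientation so (x, y) and (y, x) collapse to one entry.
        kept.add((s, o) if s < o else (o, s))
    return kept


explicit = [(":a", ":a"), (":a", ":b"), (":b", ":a"), (":d", ":b")]
compacted = compact(explicit)
print(sorted(compacted))
```

The equivalence closure of the compacted relation is the same as that of the input, because reflexivity and symmetry are restored by the closure itself.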
Run time: 5 hours (2 CPU cores); 9.3GB disk (RocksDB)
Relatively few namespaces have internal links. (Indicator that datasets enforce UNA internally.)
Domain-specific identity hubs:
www.bibsonomy.org
geonames.org
bio2rdf.org
revyu.com
rdf:type
(639,478 documents, 3,321,354,308 triples)
31,3.8,556 identity sets (63.96%) have cardinality 2.
The largest identity set has cardinality 177,794. It includes Albert Einstein, the countries of the world, and the empty string. It accounts for 31,610,706,436 pairs (90%) of the implicit identity relation.
The size of a minimal explicit identity relation that denotes the same implicit identity relation.
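Both sizes follow from the cardinalities of the identity sets alone. An identity set with n members contributes n² pairs to the implicit relation, and n − 1 statements (a spanning tree) are enough to state it explicitly. The sketch below uses hypothetical helper names and is meant only to make the arithmetic checkable.

```python
def implicit_size(cardinalities):
    """Number of pairs in the implicit identity relation:
    each identity set of size n contributes n * n ordered pairs."""
    return sum(n * n for n in cardinalities)


def minimal_explicit_size(cardinalities):
    """Fewest explicit statements whose closure yields the same sets:
    a set of size n needs n - 1 links (a spanning tree)."""
    return sum(n - 1 for n in cardinalities)


# The largest identity set reported above has 177,794 members.
largest = implicit_size([177_794])
print(largest)

# The small example with identity sets {:a, :b, :d} and {:c}.
print(implicit_size([3, 1]), minimal_explicit_size([3, 1]))
```

For the largest set this reproduces the 31,610,706,436 figure, and for the four-term example it gives 10 implicit pairs from 2 explicit statements.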
http://als.dbpedia.org/resource/Barack_Obama
http://am.dbpedia.org/resource/ባራክ_ኦባማ
http://data.nytimes.com/obama_barack_per
http://viaf.org/viaf/52010985
http://yago-knowledge.org/resource/Barack_Obama
http://rdf.freebase.com/ns/m.02mjmr
http://dbpedia.org/resource/Administration_of_Barack_Obama
http://dbpedia.org/resource/Barack_Obama_Cabinet
http://dbpedia.org/resource/Barack_Obama_presidency
http://yago-knowledge.org/resource/Presidency_of_Barack_Obama
http://rdf.freebase.com/ns/m.05b6w1g
Communities correspond to roles:
http://dbpedia.org/resource/Crete
https://dbpedia.org/resource/Crete