Empirical Semantics

Wouter Beek (wouter@triply.cc),
Frank van Harmelen (Frank.van.Harmelen@vu.nl)

2 notions of meaning

What formal semantics prescribes

What people do with it in practice

Jim Hendler, ESWC 2016

What is Empirical Semantics?

The empirical (i.e., non-analytic) analysis of meaning.

(We still use model theory and other formalisms to describe the outcomes of our analyses, but we do not use formalisms to prescribe what a given expression ought to mean.)

Why is Empirical Semantics needed? (1/2)

Some aspects of meaning cannot be captured by formal meaning, but we still want to study them.

(We must observe these non-formal aspects of meaning empirically.)

Formal semantics cannot capture all aspects of meaning

Graph G₁

id:store def:sells id:tent.
id:tent  def:costs "¥150,000".
id:tent  rdf:type  id:Product.

Graph G₂

fy:aHup   pe:ko9sap_ fy:jufn12.
fy:jufn12 pe:oao9_   "Ufou".
fy:jufn12 rdf:type   fyufnt:tmffqt.

Graphs G₁ and G₂ are true in the same models.
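One way to make this concrete: once the spelling of the names is ignored, G₁ and G₂ have the same shape. The sketch below replaces every term by the index of its first occurrence and compares the results. The helper name `shape` is ours, and the check depends on the order in which the triples are listed, so it is only an illustration for this pair of graphs, not a general graph-isomorphism or model-theoretic test.

```python
def shape(triples):
    """Replace every term by the index of its first occurrence.

    Two graphs with the same shape differ only in how their terms are spelled
    (given the same triple order; this is not a general isomorphism check).
    """
    index = {}
    result = []
    for triple in triples:
        # setdefault assigns the next free index to a term not seen before
        result.append(tuple(index.setdefault(term, len(index)) for term in triple))
    return result


G1 = [
    ("id:store", "def:sells", "id:tent"),
    ("id:tent", "def:costs", '"¥150,000"'),
    ("id:tent", "rdf:type", "id:Product"),
]

G2 = [
    ("fy:aHup", "pe:ko9sap_", "fy:jufn12"),
    ("fy:jufn12", "pe:oao9_", '"Ufou"'),
    ("fy:jufn12", "rdf:type", "fyufnt:tmffqt"),
]

same_shape = shape(G1) == shape(G2)
print(same_shape)
```

A human reader still gets far more out of G₁ than out of G₂, which is exactly the part of meaning that the formal comparison cannot see.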

Social Meaning

In the early days of the Semantic Web (2003), the non-formal aspects of meaning were actively discussed.


“An RDF graph may contain "defining information" that is opaque to logical reasoners. This information may be used by human interpreters of RDF information.”
“Human publishers of RDF content commit themselves to the mechanically-inferred social obligations.”
“The meaning of an RDF document includes the social meaning, the formal meaning, and the social meaning of the formal entailments.”

Why is Empirical Semantics needed? (2/2)

Some aspects of meaning could (theoretically) have been captured by formal meaning, but are observed to not be captured as such in common practice.

(We must observe what ‘common practice’ is empirically.)

Formally incorrect, but not meaningless


bpo:has_event rdfs:domain bpo:person.
bpo:has_event rdfs:domain bpo:event.
bpo:has_event rdfs:domain bpo:disease.
Examples from BioPortal.
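Why these three statements are formally problematic can be sketched with the RDFS domain rule (rdfs2): if a property has domain C, every subject that uses the property is inferred to be an instance of C. With three domains, every subject is inferred to be a person, an event and a disease at the same time. The function name `apply_rdfs2` and the data triple about `ex:alice` are hypothetical and only serve the illustration; they are not taken from BioPortal.

```python
DOMAIN = "rdfs:domain"
TYPE = "rdf:type"


def apply_rdfs2(triples):
    """Apply the RDFS domain rule once: (p domain C), (s p o) => (s type C)."""
    domains = {}
    for s, p, o in triples:
        if p == DOMAIN:
            domains.setdefault(s, set()).add(o)
    inferred = set()
    for s, p, o in triples:
        for cls in domains.get(p, ()):
            inferred.add((s, TYPE, cls))
    return inferred


schema = [
    ("bpo:has_event", DOMAIN, "bpo:person"),
    ("bpo:has_event", DOMAIN, "bpo:event"),
    ("bpo:has_event", DOMAIN, "bpo:disease"),
]

# Hypothetical instance data, not from BioPortal.
data = [("ex:alice", "bpo:has_event", "ex:checkup")]

inferred = apply_rdfs2(schema + data)
for triple in sorted(inferred):
    print(triple)
```

The author most likely intended a union of domains ("a person, an event or a disease can have an event"), which is the meaning a human reader recovers even though the formal reading is an intersection.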

Empirical research fields require infrastructure

Like other empirical research fields, Empirical Semantics requires a serious investment in infrastructure.

LOD Observatories are needed to observe and analyse the large-scale use of Knowledge Graphs in practice.

LOD Laundromat


Published at DANS

https://doi.org/10.17026/dans-znh-bcg3

>65K datasets, >38B facts

The problem of identity

AAA:

“Anyone can say anything about anything.”

AAA adapted for identity:

Anyone can say that anything is identical to anything (and they do).

bbc:sameAs
bbc:sameAs
owl:sameAs
?

Leibniz's Law

$$a = b \leftrightarrow (\forall \phi \in \Psi)(\phi(a) = \phi(b))$$

Pragmatics of owl:sameAs

Include links to other URIs, so that they can discover more things.
Tim Berners-Lee, Linked Data, 2016

Relatedness cannot replace identity

SKOS exactMatch indicates a high degree of confidence that two concepts can be used interchangeably across a wide range of information retrieval applications.
From the SKOS standard

‘Barack Obama’ in LOD

But are these links correct?

http://als.dbpedia.org/resource/Barack_Obama
http://am.dbpedia.org/resource/ባራክ_ኦባማ
http://data.nytimes.com/obama_barack_per
http://viaf.org/viaf/52010985
http://yago-knowledge.org/resource/Barack_Obama
http://rdf.freebase.com/ns/m.02mjmr
http://dbpedia.org/resource/Administration_of_Barack_Obama
http://dbpedia.org/resource/Barack_Obama_Cabinet
http://dbpedia.org/resource/Barack_Obama_presidency
http://yago-knowledge.org/resource/Presidency_of_Barack_Obama
http://rdf.freebase.com/ns/m.05b6w1g
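Because identity is reflexive, symmetric and transitive, a single questionable link is enough to pull a person and a presidency into the same identity set. The sketch below computes such sets with a small union-find structure. The function name `same_as_closure` is ours, and the two links are hypothetical: they reuse URIs from the list above but are not claimed to exist in any dataset. Union-find is used here as a convenient illustration, not as a description of how the list above was produced.

```python
def same_as_closure(links):
    """Group terms into identity sets: the equivalence closure of the links."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    clusters = {}
    for term in list(parent):
        clusters.setdefault(find(term), set()).add(term)
    return list(clusters.values())


# Hypothetical owl:sameAs links, for illustration only.
links = [
    ("http://data.nytimes.com/obama_barack_per",
     "http://viaf.org/viaf/52010985"),
    ("http://viaf.org/viaf/52010985",
     "http://dbpedia.org/resource/Barack_Obama_presidency"),
]

clusters = same_as_closure(links)
for cluster in clusters:
    print(sorted(cluster))
```

In this toy input the second link merges a presidency into the set of person URIs, so every statement about the presidency would formally also hold of the person.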

Cluster detection for ‘Barack Obama’

  • person
  • senator
  • president
  • government

Naming

Empirical Semantics Approach

Take a meta-assertion from Analytic Semantics and evaluate it as an empirical hypothesis.

Our semantic meta-assertion / hypothesis for naming

“Names are chosen arbitrarily and have no meaning.”

Names on the Web

3,543,226,266 unique IRIs (names) on the logarithmic X-axis; the corresponding number of documents in which each IRI occurs on the logarithmic Y-axis.

Quantifying the meaning of names

Mutual Information = encode(FORMAL_MEANING) +
                     encode(NAMES) -
                     encode(FORMAL_MEANING + NAMES)
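The formula can be approximated with any general-purpose compressor: if names and formal meaning share information, encoding them together is cheaper than encoding them separately. The sketch below uses `zlib` as a stand-in encoder and byte concatenation as the joint encoding; both are our assumptions, and the function names and sample strings are hypothetical. The cited study may use a different encoding.

```python
import zlib


def encoded_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (stand-in for encode)."""
    return len(zlib.compress(data, 9))


def mutual_information(formal_meaning: bytes, names: bytes) -> int:
    """Approximate shared information: size(A) + size(B) - size(A joined with B)."""
    return (
        encoded_size(formal_meaning)
        + encoded_size(names)
        - encoded_size(formal_meaning + names)
    )


# Hypothetical serialisations of the two sides, for illustration only.
formal = b"Product Product Store Product Store Store Product " * 20
names = b"id:tent id:tent2 id:store id:tent3 id:store2 id:store3 id:tent4 " * 20

print(mutual_information(formal, names))
```

A value clearly above zero suggests that the names carry information about the formal meaning; a value near zero is what the hypothesis "names are chosen arbitrarily" would predict. Small inputs are dominated by compressor overhead, so the estimate is only meaningful on larger data.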

Two hypotheses

$H_X$
Names do not encode predicate information.
$H_Y$
Names do not encode type information.

Evaluated over ≥600,000 datasets

S. De Rooij, W. Beek, P. Bloem, S. Schlobach & F. Van Harmelen, “Are Names Meaningful? Quantifying Social Meaning on the Semantic Web”, ISWC 2016.

Network Structure as a Proxy for Meaning

Network structure visually corresponds to aspects of meaning.

skos:exactMatch
foaf:knows
osspr:contains
geopolitics:hasBorderWidth

Thank you for your attention!


Further reading

Reproducible research
  • L. Rietveld, W. Beek & S. Schlobach, “LOD Lab: Experiments at LOD Scale”, ISWC 2015. Best Paper Award.
Large-scale data cleaning
  • W. Beek, F. Ilievski, J. Debattista, S. Schlobach & J. Wielemaker, “Literally better: Analyzing and Improving the Quality of Literals”, Semantic Web Journal 2017.
Semantic search engines
  • F. Ilievski, W. Beek, M. Van Erp, L. Rietveld & S. Schlobach, “LOTUS: Adaptive Text Search for Big Linked Data”, ESWC 2016. Best LOD Application Award.
Large-scale querying
  • J. Fernández, W. Beek, M. Martínez-Prieto & M. Arias, “LOD-a-lot: A Queryable Dump of the LOD Cloud”, ISWC 2017.
  • W. Beek, J. Fernández & R. Verborgh, “LOD-a-lot: A Single-file Enabler for Data Science”, 13th Int. Conf. on Semantic Systems 2017.
  • W. Beek, L. Rietveld, S. Schlobach & F. Van Harmelen, “LOD Laundromat: Why the Semantic Web Needs Centralization (Even If We Don't Like It)”, IEEE Internet Computing 2016.
  • L. Rietveld, R. Verborgh, W. Beek, M. Vander Sande & S. Schlobach, “Linked Data-as-a-Service: The Semantic Web Redeployed”, ESWC 2015.
Erroneous link detection
  • W. Beek, J. Raad, J. Wielemaker & F. van Harmelen, “sameAs.cc: The Closure of 500M owl:sameAs Statements”, ESWC 2018. Best Resource Paper Award.
  • J. Raad, W. Beek, F. Van Harmelen, N. Pernelle & F. Saïs, “Detecting Erroneous Identity Links on the Web using Network Metrics”, ISWC 2018.