SPARQL By Example


Wouter Beek w.g.j.beek@vu.nl

Stefan Schlobach k.s.schlobach@vu.nl

April 10th, 2018

Part I: The first SPARQL query

The first SPARQL query


select ?s ?p ?o {
  ?s ?p ?o
}
limit 5
            

Components:
projection: select ?s ?p ?o
graph pattern: ?s ?p ?o .
limit: limit 5

Change the projection


select ?o ?p ?s {
  ?s ?p ?o
}
limit 5
            

Change the limit


select ?s ?p ?o {
  ?s ?p ?o
}
limit 10
            

Add an offset


select ?s ?p ?o {
  ?s ?p ?o
}
limit 5
offset 5
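
The effect of projection, limit, and offset can be mimicked on a small in-memory list of solutions. The sketch below is plain Python rather than SPARQL; the triples and the helper name `select` are made up for illustration. It also assumes a fixed solution order, which a real endpoint only promises in combination with `order by`.

```python
# Hypothetical data: each triple is a (subject, predicate, object) tuple.
TRIPLES = [("s%d" % i, "p", "o%d" % i) for i in range(12)]

def select(variables, limit=None, offset=0):
    """Mimic `select <variables> { ?s ?p ?o } limit <limit> offset <offset>`."""
    # The pattern ?s ?p ?o matches every triple, giving one solution per triple.
    solutions = [{"s": s, "p": p, "o": o} for (s, p, o) in TRIPLES]
    # offset skips solutions, limit caps how many are returned.
    end = None if limit is None else offset + limit
    window = solutions[offset:end]
    # Projection keeps only the requested variables, in the requested order.
    return [tuple(sol[v] for v in variables) for sol in window]

first_page = select(["s", "p", "o"], limit=5)
second_page = select(["s", "p", "o"], limit=5, offset=5)
swapped = select(["o", "p", "s"], limit=5)
```

Here `first_page` and `second_page` hold consecutive, non-overlapping windows of five solutions, which is how limit and offset are typically combined for paging.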
            

Part II: Naming & Ambiguity

In Linked Data, everything gets a name

But sometimes two different things carry the same name

Which things are called ‘Amsterdam’?


select ?s {
  ?s ?p "Amsterdam" .
}
limit 100
            

"Amsterdam" is a literal.

There are more things called ‘Amsterdam’…


select ?s ?p {
  ?s ?p "Amsterdam"@nl .
}
limit 100
            

"Amsterdam"@nl is a language-tagged string. This means that "Amsterdam" must be interpreted in the Dutch language (nl).

Human-readable labels


select ?s {
  ?s <http://www.w3.org/2000/01/rdf-schema#label> "Amsterdam"@nl .
}
limit 100
            

The IRI http://www.w3.org/2000/01/rdf-schema#label is commonly used to denote human-readable labels.

Abbreviate IRIs


prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?s {
  ?s rdfs:label "Amsterdam"@nl .
}
limit 100
            

IRIs can be optionally abbreviated by aliases declared with the prefix keyword.

Query both patterns


prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?s {
  {
    ?s ?p "Amsterdam" .
  } union {
    ?s rdfs:label "Amsterdam"@nl .
  }
}
limit 100
            

{ A } union { B } gives the results of Basic Graph Pattern (BGP) A together with the results of BGP B.
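
As a rough illustration of what union does to the two result sets, the following Python sketch evaluates both branches over a tiny made-up graph and concatenates the outcomes. The graph, the `ex:` names, and the helper `subjects` are assumptions for the example; literals are modelled as (lexical form, language tag) pairs.

```python
# Hypothetical mini-graph; literals are (lexical form, language tag) pairs.
RDFS_LABEL = "rdfs:label"
GRAPH = [
    ("ex:city", RDFS_LABEL, ("Amsterdam", "nl")),
    ("ex:ship", "ex:name", ("Amsterdam", None)),
    ("ex:town", RDFS_LABEL, ("Apeldoorn", "nl")),
]

def subjects(predicate, obj):
    """Solutions for `?s <predicate> <obj>`; predicate=None acts like ?p."""
    return [s for (s, p, o) in GRAPH
            if o == obj and (predicate is None or p == predicate)]

# { ?s ?p "Amsterdam" } union { ?s rdfs:label "Amsterdam"@nl }
left = subjects(None, ("Amsterdam", None))
right = subjects(RDFS_LABEL, ("Amsterdam", "nl"))
union = left + right  # union concatenates; it does not remove duplicates
```

Note that the plain literal and the language-tagged string are different terms, so each branch contributes a different subject.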

Other labels for the same thing


prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?altLabel {
  ?s rdfs:label "Amsterdam"@nl .
  ?s rdfs:label ?altLabel .
}
limit 100
            

This is the first BGP that consists of more than a single triple pattern.

Do not repeat subject


prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?altLabel {
  ?s rdfs:label "Amsterdam"@nl ;
     rdfs:label ?altLabel .
}
limit 100
            

BGPs with repeating subject terms can be optionally abbreviated (;).

Do not repeat subject & predicate


prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?altLabel {
  ?s rdfs:label "Amsterdam"@nl , ?altLabel .
}
limit 100
            

BGPs with repeating subject/predicate pairs can be optionally abbreviated (,).

Find a specific translation


prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?altLabel {
  ?s rdfs:label "Amsterdam"@nl , ?altLabel .
  filter (lang(?altLabel) = "ja")
}
limit 100
            

Solutions for a BGP are excluded when they do not satisfy the expression in the filter clause.
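
The filtering step can be pictured as a simple list comprehension over candidate solutions. This Python sketch uses invented labels, and the local function `lang` is only a stand-in for SPARQL's `lang()`, not an existing library call.

```python
# Hypothetical labels of one resource, as (lexical form, language tag) pairs.
LABELS = [("Amsterdam", "nl"), ("Amsterdam", "en"), ("アムステルダム", "ja")]

def lang(literal):
    """Stand-in for SPARQL's lang(): the language tag, or "" if there is none."""
    return literal[1] or ""

# Each candidate solution binds ?altLabel; the filter keeps those where the test holds.
candidates = [{"altLabel": label} for label in LABELS]
japanese = [sol for sol in candidates if lang(sol["altLabel"]) == "ja"]
```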

Part III: Classes & Ambiguity

Instances & classes


prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
select ?class {
  ?instance ?p "Amsterdam"@nl ;
            rdf:type ?class .
}
limit 100
            

rdf:type denotes the relationship between an instance and one of its classes.

Abbreviation ‘a’


select ?class {
  ?instance ?p "Amsterdam"@nl ;
            a ?class .
}
limit 100
            

Optionally abbreviate rdf:type with ‘a’.

Distinct classes


select distinct ?class {
  ?instance ?p "Amsterdam"@nl ;
            a ?class .
}
limit 100
            

Only include unique results in the projection by using the distinct keyword.

Distinct class labels


prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select distinct ?classLabel {
  ?instance ?p "Amsterdam"@nl ;
            a ?class .
  ?class rdfs:label ?classLabel .
}
limit 100
            

Notice that bindings for ?class are not included in the projection.

Skip unused nodes


prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select distinct ?classLabel {
  ?instance ?p "Amsterdam"@nl ;
            a/rdfs:label ?classLabel .
}
limit 100
            

Use Property Path notation to skip the ?class node.
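
A sequence path such as a/rdfs:label behaves like two hops where the intermediate node is never named. The Python sketch below shows that idea on a made-up graph; the `ex:` resources and both helper functions are illustrative assumptions.

```python
# Hypothetical triples; "a" stands for rdf:type.
GRAPH = [
    ("ex:amsterdam", "a", "ex:City"),
    ("ex:amsterdam", "a", "ex:Capital"),
    ("ex:City", "rdfs:label", "city"),
    ("ex:Capital", "rdfs:label", "capital"),
]

def objects(subject, predicate):
    """All objects reachable from `subject` over one `predicate` edge."""
    return [o for (s, p, o) in GRAPH if s == subject and p == predicate]

def sequence_path(subject, first, second):
    """Evaluate `<subject> first/second ?x`: follow `first`, then `second`."""
    return [end for mid in objects(subject, first) for end in objects(mid, second)]

class_labels = sequence_path("ex:amsterdam", "a", "rdfs:label")
```

The intermediate value `mid` plays the role of ?class: it is used to connect the two hops but never appears in the result.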

Ambiguity + classes + geography


prefix geo: <http://www.opengis.net/ont/geosparql#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?wkt ?wktLabel {
  ?s rdfs:label "Amsterdam"@nl ;
     geo:hasGeometry/geo:asWKT ?wkt ;
     a/rdfs:label ?wktLabel .
}
limit 100
            

Properties geo:hasGeometry and geo:asWKT are standardized by the Open Geospatial Consortium (OGC).

Well-Known Text (WKT) is a standardized serialization for shapes. Literals that encode WKT shapes are not plain strings; they have the datatype IRI geo:wktLiteral.

Notice that we use different vocabularies/ontologies within one query.

🏙 Which Amsterdam do you mean?


prefix brt: <http://brt.basisregistraties.overheid.nl/def/top10nl#>
prefix geo: <http://www.opengis.net/ont/geosparql#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?wkt ?wktLabel {
  ?instantie rdfs:label "Apeldoorn"@nl ;
             geo:hasGeometry/geo:asWKT ?wkt ;
             a brt:RegistratiefGebied .
}
limit 100
            

I mean the ‘registrational area’ (‘registratief gebied’ in Dutch).

eGov: Buildings Base Registry

With these ingredients, you can already write queries that are useful.

Another example

Part IV: Federation

LOD Cloud: 2014

There are many SPARQL endpoints out there.

LOD Cloud: 2017

And more SPARQL endpoints are becoming available each year.

Dutch municipality → DBpedia 🕸


prefix brt: <http://brt.basisregistraties.overheid.nl/def/top10nl#>
prefix foaf: <http://xmlns.com/foaf/0.1/>
prefix geo: <http://www.opengis.net/ont/geosparql#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
select ?wkt ?wktLabel {
  ?plaats1 rdfs:label "Swalmen"^^xsd:string ;
           brt:isBAGwoonplaats true ;
           geo:hasGeometry/geo:asWKT ?wkt .
  service <https://dbpedia.org/sparql> {
    ?plaats2 rdfs:label "Swalmen"@nl ; foaf:depiction ?vlag .
  }
}
limit 1
            

service <URL> { A } means that pattern A is executed on a different SPARQL endpoint. The results are received from that endpoint and integrated into the overall query results.

Apeldoorn does not have a flag


prefix brt: <http://brt.basisregistraties.overheid.nl/def/top10nl#>
prefix foaf: <http://xmlns.com/foaf/0.1/>
prefix geo: <http://www.opengis.net/ont/geosparql#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
select ?wkt ?wktLabel {
  ?plaats1 rdfs:label "Apeldoorn"^^xsd:string ;
           brt:isBAGwoonplaats true ;
           geo:hasGeometry/geo:asWKT ?wkt .
  service <https://dbpedia.org/sparql> {
    ?plaats2 rdfs:label "Apeldoorn"@nl .
    optional { ?plaats2 foaf:depiction ?vlag . }
  }
}
limit 1
            

When you query the web, not all information is there all the time. With optional { A } you make your queries resilient against potentially missing information.

Also useful for local databases that are incomplete and/or have missing values.
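
The behaviour of optional resembles a left join: the required part always survives, and the optional part only adds bindings when it matches. A minimal Python sketch, assuming invented remote results (the `dbr:` names and the image URL are placeholders, not actual DBpedia data):

```python
# Hypothetical results from the remote endpoint: place -> list of depictions.
DEPICTIONS = {
    "dbr:Swalmen": ["http://example.org/swalmen-flag.svg"],
    "dbr:Apeldoorn": [],
}

def with_optional_depiction(place):
    """Mimic `?plaats2 rdfs:label ... . optional { ?plaats2 foaf:depiction ?vlag }`."""
    required = {"plaats2": place}
    extensions = [dict(required, vlag=img) for img in DEPICTIONS.get(place, [])]
    # Without a match the required part survives, with ?vlag left unbound.
    return extensions or [required]

swalmen = with_optional_depiction("dbr:Swalmen")
apeldoorn = with_optional_depiction("dbr:Apeldoorn")
```

Without optional, the solution for the place lacking a depiction would disappear entirely instead of being returned with ?vlag unbound.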

Part V: Aggregation

Oldest buildings in Apeldoorn 👵👴


prefix bag: 
prefix geo: <http://www.opengis.net/ont/geosparql#>
select ?wkt ?wktLabel {
  ?verblijfsobject bag:hoofdadres/bag:bijbehorendeOpenbareRuimte/bag:bijbehorendeWoonplaats/bag:naamWoonplaats "Apeldoorn" ;
                   bag:pandrelatering ?pand .
  ?pand bag:oorspronkelijkBouwjaar ?wktLabel ;
        bag:geometriePand/geo:asWKT ?wkt .
}
order by asc(?wktLabel)
limit 50
            

You can try out ‘Amsterdam’, but the results…

Part VI: Semantics is important for complex query writing

RDF semantics

  • IRIs I
  • Literals L
  • Names N := I ∪ L
  • Blank nodes B
  • Terms T := B ∪ N
  • Graph G ⊆ (B ∪ I) ⨯ I ⨯ T
  • Instance mapping σ: B → T

SPARQL algebra

  • Variables V
  • Solution mapping μ:V → T
  • Pattern instance mapping P := μ ∘ σ
  • Basic Graph Pattern (BGP) x
  • μ is a solution for x from G if (∃P)(P(x) ⊆ G and μ is the restriction of P to V(x))

Example 1

Dataset


prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
rdfs:Class a rdfs:Class .
            

BGP

?x a ?y .

Solution μ = {(?x,rdfs:Class),(?y,rdfs:Class)}

Example 2

Dataset


prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
rdfs:Class a rdfs:Class .
            

BGP

_:1 a _:2 .

Solution μ = ∅

σ = {(_:1,rdfs:Class), (_:2,rdfs:Class)}
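
Both examples can be reproduced with a brute-force matcher that follows the definition of a solution literally: try every assignment P of graph terms to variables and blank nodes, keep those where P(x) ⊆ G, and restrict P to the variables. This is a sketch for tiny graphs only, with terms modelled as plain strings and the function names chosen for illustration; it ignores how often a solution occurs and is not how real engines evaluate BGPs.

```python
from itertools import product

# The dataset of both examples: a single triple.
GRAPH = {("rdfs:Class", "rdf:type", "rdfs:Class")}

def is_var(term):
    return term.startswith("?")

def is_bnode(term):
    return term.startswith("_:")

def solutions(bgp, graph):
    """Return every μ such that some P = μ ∘ σ maps the BGP into the graph."""
    graph_terms = sorted({t for triple in graph for t in triple})
    # Variables and blank nodes are the placeholders that P must instantiate.
    slots = sorted({t for triple in bgp for t in triple if is_var(t) or is_bnode(t)})
    found = []
    for values in product(graph_terms, repeat=len(slots)):
        p = dict(zip(slots, values))
        instance = {tuple(p.get(t, t) for t in triple) for triple in bgp}
        if instance <= graph:  # P(x) ⊆ G
            # μ is the restriction of P to the variables of the BGP.
            mu = {k: v for k, v in p.items() if is_var(k)}
            if mu not in found:
                found.append(mu)
    return found

example_1 = solutions([("?x", "rdf:type", "?y")], GRAPH)
example_2 = solutions([("_:1", "rdf:type", "_:2")], GRAPH)
```

In the second example the result is a list holding one empty mapping: the pattern matches, but since it contains no variables there is nothing to report in μ.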

Classes of instances called ‘Amsterdam’


prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?instance ?class ?classLabel {
  ?instance rdfs:label "Amsterdam"@nl ; a ?class .
  ?class rdfs:label ?classLabel .
}
limit 100
            

It is not so useful to learn that something called ‘Amsterdam’ is “a thing” (owl:Thing). We only want the most specific classes.

Negation As Failure (NAF)


prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?i ?c ?label {
  ?i ?p "Amsterdam"@en ;
    a ?c .
  optional {
    ?d rdfs:subClassOf ?c .
  }
  filter (!bound(?d))
  ?c rdfs:label ?label .
}
limit 100

Only retrieve the minimal classes, i.e. the classes for which no subclass is known, for instances called ‘Amsterdam’.
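
The combination of optional and filter (!bound(...)) can be read as "try to find a counterexample, and keep the solution only if none exists". The Python sketch below walks through that logic on an invented class hierarchy; all `ex:` names and the function `minimal_classes` are assumptions made for illustration.

```python
# Hypothetical class hierarchy and instance typing.
SUBCLASS_OF = [("ex:City", "ex:Place"), ("ex:Place", "owl:Thing")]
TYPES = [
    ("ex:amsterdam", "ex:City"),
    ("ex:amsterdam", "ex:Place"),
    ("ex:amsterdam", "owl:Thing"),
]

def minimal_classes(instance):
    """Keep the classes of `instance` for which no subclass is asserted."""
    kept = []
    for (i, c) in TYPES:
        if i != instance:
            continue
        # optional { ?d rdfs:subClassOf ?c }: collect bindings for ?d, if any.
        d_bindings = [d for (d, parent) in SUBCLASS_OF if parent == c]
        # filter (!bound(?d)): the solution survives only when ?d stayed unbound.
        if not d_bindings:
            kept.append(c)
    return kept

result = minimal_classes("ex:amsterdam")
```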

Use Case A: Real Estate Market

Average value of real estate in Amsterdam

The Dutch government publishes the average value of real estate throughout the Netherlands each year. The statistics are aggregated to the level of municipalities (level 1), larger neighborhoods (level 2), and smaller neighborhoods (level 3).

Determine min/max

In order to properly compare these statistics, we must first obtain the minimum and maximum average real estate value.

This can be obtained with an aggregate query, using the functions min/1 and max/1.

Example query

Putting it all together

We can now create a so-called ‘thematic map’, where the theme is real estate value and the coloring of polygons is based on the average real estate value statistic. Notice how the min/max values are calculated in a sub-query (a select query nested inside another select query). This query looks (relatively) complex, but it basically integrates the two previous queries.
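
One plausible way to turn the statistic into a color is to rescale each value to the range [0, 1] using the overall minimum and maximum. The Python sketch below illustrates that two-step structure; the area names and euro amounts are invented, and linear scaling is an assumption, not necessarily what the example query does.

```python
# Hypothetical average values (in euros) per neighborhood; not real statistics.
AVERAGE_VALUE = {"area-a": 180_000, "area-b": 320_000, "area-c": 250_000}

def color_scale(values):
    """Map each value to a number in [0, 1] relative to the overall min and max."""
    # Inner step (the sub-query): one min and one max over all areas.
    lowest, highest = min(values.values()), max(values.values())
    span = highest - lowest
    # Outer step: combine every area with the aggregates to place it on the scale.
    return {area: (v - lowest) / span if span else 0.0 for area, v in values.items()}

scale = color_scale(AVERAGE_VALUE)
```

The aggregates are computed once over all areas and then reused for each area, which mirrors why the min/max calculation sits in a sub-query.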

Example query