June 3rd, 2018
Wouter Beek (w.g.j.beek@vu.nl), VU University Amsterdam (VUA), Triply B.V.
Richard Zijdeman (richard.zijdeman@iisg.nl), International Institute for Social History (IISH)
Example of an artifact with poor geodata: link
Dataset | № statements | Main concepts | № geometries | Timeframe |
---|---|---|---|---|
CShapes | 6,120 | countries, cities | 510 | 1920-present |
Mint Authorities | 6,987 | authorities, houses | 950 | 565-present |
Gemeentegeschiedenis | 46,929 | municipalities, provinces | 3,219 | 1813-present |
nlGis | 60,036 | features, geometries | 4,679 | |
:Greece
  iisg:cowStartDay "1"^^xsd:gDay;
  iisg:cowStartMonth "1"^^xsd:gMonth;
  iisg:cowStartYear "1946"^^xsd:gYear.
:Greece iisg:cowStart "1946-01-01"^^xsd:date.
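As a minimal sketch of collapsing the three split triples into the single xsd:date literal, the Python below assumes the day, month and year values have already been read out of the data as strings. The helper name `to_xsd_date` is hypothetical. Python's `datetime.date` only covers years 1 to 9999, so years before the common era would need a different route.

```python
from datetime import date

def to_xsd_date(year: str, month: str, day: str) -> str:
    """Combine separate year, month and day values into one xsd:date lexical form."""
    # date() rejects impossible combinations such as 31 February.
    return date(int(year), int(month), int(day)).isoformat()

# Values from the :Greece example above.
literal = f'"{to_xsd_date("1946", "1", "1")}"^^xsd:date'
print(literal)  # "1946-01-01"^^xsd:date
```

A side benefit of going through `date` is validation: the split representation happily stores a day/month/year combination that does not exist.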
(In CShapes, ‘cow’ stands for Correlates of War.)
Longitude/latitude
:somewhere
  wgs84:lat "…";
  wgs84:lat "…";
  wgs84:long "…";
  wgs84:long "…".
There is also wgs84:lat_long, but it is almost never used.
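The :somewhere example is ambiguous because RDF gives no way to say which latitude belongs with which longitude. The Python sketch below enumerates the readings a consumer is left with. The coordinate values are hypothetical stand-ins for the elided "…" literals.

```python
from itertools import product

# Hypothetical values standing in for the elided "…" literals.
lats = ["52.37", "37.98"]
longs = ["4.89", "23.73"]

# An RDF graph is an unordered set of triples, so nothing links a given
# wgs84:lat value to a given wgs84:long value: every combination is an
# equally valid reading.
candidates = list(product(lats, longs))
for lat, lon in candidates:
    print(f"lat={lat} long={lon}")
print(len(candidates), "candidate points for 2 actual locations")
```

A single structured value, such as a WKT point, keeps both coordinates in one literal and avoids this ambiguity.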
OCLC VIAF example
:EmmaGoldman
  schema:givenName "Ema";
  schema:givenName "Ėmma";
  schema:familyName "Gol'dman";
  schema:familyName "Gōrudoman".
There are many other instances of this problem, e.g., foaf:firstName and foaf:lastName.
CShapes uses -1 to denote an unknown year.
In the context of CShapes (countries after 1920) this makes sense.
But on the web we can query both CShapes and Pleiades, a gazetteer of ancient places, where a negative year can be a real date.
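The Python below is a minimal sketch of how the -1 sentinel goes wrong once datasets are merged. It assumes records arrive as (source, place, start year) tuples. The place names, the Pleiades-like record and the helper `normalise` are all hypothetical. The fix shown is simply to map the source-specific sentinel to an explicit "unknown" (`None`) before merging.

```python
from typing import Optional

CSHAPES_UNKNOWN = -1  # CShapes uses -1 to mean "year unknown"

# Hypothetical (source, place, start year) records after merging datasets.
merged = [
    ("cshapes", "Greece", 1946),
    ("cshapes", "SomeCountry", -1),         # unknown year, encoded as -1
    ("pleiades", "SomeAncientPlace", -1),   # a genuine negative year
]

# Naive query: places whose start year is negative. The CShapes record
# matches too, because its sentinel is indistinguishable from a real year.
naive = [place for _, place, year in merged if year < 0]

def normalise(source: str, year: int) -> Optional[int]:
    """Map the source-specific sentinel to None before merging."""
    if source == "cshapes" and year == CSHAPES_UNKNOWN:
        return None
    return year

cleaned = [(place, normalise(src, year)) for src, place, year in merged]
fixed = [place for place, year in cleaned if year is not None and year < 0]

print(naive)  # ['SomeCountry', 'SomeAncientPlace']
print(fixed)  # ['SomeAncientPlace']
```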
No currently available data transformation tool implements these two core requirements.
Proprietary formats, e.g., ESRI ShapeFile, can sometimes be transformed into open formats.
Be able to stream through the data at the required granularity level.
# , name , population , shape
1 , Amsterdam , 1.3M , MultiPolygon((…))
2 , Athens , 3.1M , MultiPolygon((…))
…
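The Python below sketches streaming at the granularity of one feature, using the standard `csv` module. The sample rows mirror the table above. The `id` column name, the quoting of the shape column (real WKT geometries contain commas) and the in-memory `io.StringIO` stand-in for a file are assumptions of this sketch. The coordinates stay elided.

```python
import csv
import io
from typing import Iterator, TextIO

# Stand-in for a large file on disk. Real WKT geometries contain commas,
# so the shape column is quoted; the coordinates stay elided as "…".
SAMPLE = """id,name,population,shape
1,Amsterdam,1.3M,"MultiPolygon((…))"
2,Athens,3.1M,"MultiPolygon((…))"
"""

def features(stream: TextIO) -> Iterator[dict]:
    """Yield one feature (one row) at a time instead of loading the whole file."""
    for row in csv.DictReader(stream):
        yield row

for feature in features(io.StringIO(SAMPLE)):
    # Each feature could be converted and written out here, before the
    # next row is read, so memory use is bounded by the size of one row.
    print(feature["name"], feature["shape"])
```

Because `features` is a generator, swapping the `StringIO` for an open file handle keeps the same one-row-at-a-time behaviour.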
GeoSPARQL support is either absent, not standards-compliant, or not performant.
prefix geo: <http://data.ordnancesurvey.co.uk/ontology/geometry/>
place:Athens a lawd:Place;
geo:hasGeometry [ geo:asWKT "LineString(5.16 52.05,…)"].
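The Python below is a minimal sketch of turning a wgs84 lat/long pair into the geo:hasGeometry / geo:asWKT pattern shown above. Unlike the slide, which binds geo: to the Ordnance Survey geometry namespace, this sketch assumes the OGC GeoSPARQL namespace. In GeoSPARQL, a WKT literal without an explicit CRS is read in CRS84, which puts longitude before latitude. The subject IRI and coordinates are hypothetical.

```python
GEO = "http://www.opengis.net/ont/geosparql#"  # OGC GeoSPARQL namespace

def wkt_point(lat: float, lon: float) -> str:
    # WKT literals without an explicit CRS default to CRS84 in GeoSPARQL,
    # whose axis order is longitude first, then latitude.
    return f"Point({lon} {lat})"

def to_geosparql(subject_iri: str, lat: float, lon: float) -> str:
    """Render a Turtle snippet using the geo:hasGeometry / geo:asWKT pattern."""
    return (
        f"@prefix geo: <{GEO}> .\n"
        f"<{subject_iri}> geo:hasGeometry [\n"
        f'  geo:asWKT "{wkt_point(lat, lon)}"^^geo:wktLiteral\n'
        "] ."
    )

# Hypothetical IRI and coordinates.
print(to_geosparql("https://example.org/place/somewhere", 37.98, 23.73))
```

Swapping the axis order is an easy mistake when converting from wgs84:lat/wgs84:long, which list latitude first.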
Without interoperable representations:
Large-scale empirical analyses (lod-a-lot.lod.labs.vu.nl).
Property | № statements | № documents |
---|---|---|
wgs84:alt | 2,349,607 | 9,843 |
wgs84:lat | 42,883,363 | 11,134 |
wgs84:lat_long | 283 | 173 |
wgs84:location | 14,688,561 | 117 |
wgs84:long | 42,916,785 | 11,134 |
geo:asGML | 0 | 1 |
geo:asWKT | 188,427,329 | 50 |
geo:hasGeometry | 28,366,268 | 7 |
Based on the LOD-a-lot data collection (Fernández et al. 2017).
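The counts in this table hint at the lat/long pairing problem discussed earlier: wgs84:long occurs more often than wgs84:lat. The arithmetic below uses only the two figures from the table. It assumes a point needs one latitude and one longitude on the same resource, with each latitude used in at most one point.

```python
# Statement counts for the two wgs84 properties, copied from the table above.
lat_statements = 42_883_363
long_statements = 42_916_785

# If each latitude is used in at most one point, at most lat_statements
# longitudes can be paired with a latitude, summed over all resources.
unpaired_long_at_least = long_statements - lat_statements
print(unpaired_long_at_least)  # 33422
```

So at least 33,422 wgs84:long statements in this collection cannot be part of a complete lat/long point. The true number may be higher, because mismatches on individual resources can cancel out in the totals.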
Unfortunately, these two popular formats are incompatible:
This may be fixed in a future version of the JSON-LD standard.
https://druid.datalegend.net/nlgis
Dutch cultural heritage institutions already use this to annotate their collections (example).