Prolog API for HDT (rewrite) v0.2

This document proposes a new API for using HDT storage in SWI-Prolog. The code for the Prolog API for HDT is hosted on GitHub.

The new API will later be used to expose advanced HDT features, such as translation between terms and IDs, retrieval of terms by kind, and retrieval of random triples and terms. The code for the HDT-Server is hosted on GitHub.

The new functionality will directly enable Data Science and Machine Learning use cases such as those formulated in our recent SEMANTiCS paper on LOD-a-lot. In particular, it will unblock several use cases in MaestroGraph that are currently blocked.

1 API proposal

1.1 Predicates for working with HDT files

Old API:

hdt_open/2,		% -HDT, +Path
hdt_open/3,		% -HDT, +Path, +Options
hdt_close/1,		% +HDT
hdt_create_from_file/3, % +HDTFile, +RDFFile, +Options

New API:

hdt_create/2,	% +RDFFile, +HDTFile
hdt_create/3,	% +RDFFile, +HDTFile, +Options

hdt_open/2,	% +File, -HDT
hdt_open/3,	% +File, -HDT, +Options
hdt_close/1,	% +HDT

Changes:

  1. Renamed hdt_create_from_file/3 to hdt_create/3.
  2. Changed the argument order for creating an HDT file: the RDF file is now the first argument and the HDT file the second, instead of the other way around.
  3. Added the wrapper predicate hdt_create/2, which uses the default options.
  4. Changed the argument order for opening an HDT file: the file is now the first argument and the HDT handle the second, instead of the other way around.
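The following sketch shows the new argument orders in use. It assumes that hdt_create/2, hdt_open/2, hdt_count/5, and hdt_close/1 are loaded with the signatures proposed in this document; the helper hdt_convert_count/3 is hypothetical. setup_call_cleanup/3 closes the handle even when the goal fails or raises an exception.

```prolog
% Assumes the proposed predicates hdt_create/2, hdt_open/2,
% hdt_count/5 and hdt_close/1 are loaded (e.g. from the hdt pack).

%! hdt_convert_count(+RdfFile, +HdtFile, -Count) is det.
%
%  Hypothetical helper: compress RdfFile into HdtFile, then report
%  the count for the all-variables triple pattern.
hdt_convert_count(RdfFile, HdtFile, Count) :-
    hdt_create(RdfFile, HdtFile),          % source first, target second
    setup_call_cleanup(
        hdt_open(HdtFile, Hdt),            % file first, handle second
        hdt_count(Hdt, _, _, _, Count),
        hdt_close(Hdt)).
```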

1.2 Predicates for converting between terms and IDs

Old API:

hdt_subject_id/3,	% +HDT, ?Subject, ?Id
hdt_predicate_id/3,	% +HDT, ?Predicate, ?Id
hdt_object_id/3,	% +HDT, ?Object, ?Id

New API:

hdt_term_id/4,	% +HDT, +Role, ?Term, ?ID

Changes:

  1. The role is now given as an argument.
  2. The role can now also be node or shared.
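Because the role is now an argument, each old predicate becomes a one-line call to hdt_term_id/4. The shims below are a hypothetical compatibility layer, assuming hdt_term_id/4 accepts the role names subject, predicate, and object as listed; they are not part of the proposed API.

```prolog
% Compatibility shims: the old per-role predicates expressed with the
% proposed hdt_term_id/4 (+HDT, +Role, ?Term, ?ID).
hdt_subject_id(Hdt, Subject, Id) :-
    hdt_term_id(Hdt, subject, Subject, Id).
hdt_predicate_id(Hdt, Predicate, Id) :-
    hdt_term_id(Hdt, predicate, Predicate, Id).
hdt_object_id(Hdt, Object, Id) :-
    hdt_term_id(Hdt, object, Object, Id).

% The new roles need no extra predicates, e.g. the ID of a term that
% occurs both as a subject and as an object:
%   ?- hdt_term_id(Hdt, shared, Term, Id).
```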

1.3 Predicates for working with triples

Old API:

hdt_search/4,		% +HDT, ?S,?P,?O
hdt_search_cost/5,	% +HDT, ?S,?P,?O, -Cost
hdt_search_id/4,	% +HDT, ?S,?P,?O

New API:

hdt/4,		% +HDT, ?S,?P,?O
hdt_id/4,	% +HDT, ?SID,?PID,?OID

hdt_count/5,	% +HDT, ?S,?P,?O, -Count
hdt_count_id/5,	% +HDT, ?SID,?PID,?OID, -Count

hdt_rnd/4,	% +HDT, ?S,?P,?O
hdt_rnd_id/4,	% +HDT, ?SID,?PID,?OID

Changes:

  1. Estimates/costs are now called counts.
  2. Renamed hdt_search/4 to hdt/4.
  3. Renamed hdt_search_cost/5 to hdt_count/5.
  4. Renamed hdt_search_id/4 to hdt_id/4.
  5. Added a predicate for retrieving counts based on a triple pattern specified by IDs (hdt_count_id/5).
  6. Added predicates for retrieving random triples (hdt_rnd_id/4 for IDs; hdt_rnd/4 for terms).
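A sketch of the renamed predicates as compatibility shims, followed by a hypothetical helper hdt_sample/6 that draws random triples. It assumes that each call to hdt_rnd/4 yields one randomly chosen triple matching the pattern and fails if no triple matches; the exact determinism of hdt_rnd/4 is not fixed by this proposal.

```prolog
% Old triple predicates expressed with the proposed new names.
hdt_search(Hdt, S, P, O)            :- hdt(Hdt, S, P, O).
hdt_search_cost(Hdt, S, P, O, Cost) :- hdt_count(Hdt, S, P, O, Cost).
hdt_search_id(Hdt, S, P, O)         :- hdt_id(Hdt, S, P, O).

%! hdt_sample(+Hdt, ?S, ?P, ?O, +N, -Triples) is det.
%
%  Hypothetical helper: draw N random triples matching the pattern,
%  with replacement. Triples is [] if no triple matches.
hdt_sample(Hdt, S, P, O, N, Triples) :-
    findall(rdf(S, P, O),
            ( between(1, N, _),
              once(hdt_rnd(Hdt, S, P, O)) ),   % one draw per iteration
            Triples).
```

Because every draw is independent, the resulting list may contain the same triple more than once.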

1.4 Predicates for working with terms

Old API:

hdt_subject/2,		% +HDT, -Subject
hdt_predicate/2,	% +HDT, -Predicate
hdt_shared/2,		% +HDT, -Shared
hdt_object/2,		% +HDT, -Object

New API:

hdt_term/3,		% +HDT, +Role, -Term
hdt_term_id/3,		% +HDT, +Role, -ID

hdt_term_count/3,	% +HDT, +Role, -Count

hdt_term_rnd_id/3,	% +HDT, +Role, -ID
hdt_term_rnd/3,		% +HDT, +Role, -Term

Changes:

  1. The four old predicates are replaced by hdt_term/3, where the kind of term is given by the argument Role.
  2. Role can be object, predicate, shared, or subject, as before, but also bnode, iri, literal, name, node, or term.
  3. Added a simple way to retrieve the number of terms of a given Role (hdt_term_count/3).
  4. Added a variant of hdt_term/3 that enumerates IDs (hdt_term_id/3).
  5. Added predicates for retrieving random terms (hdt_term_rnd/3) and random term IDs.
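A short sketch of the Role-based term predicates, assuming hdt_term/3 and hdt_term_count/3 behave as listed and that hdt_term/3 enumerates terms on backtracking. The helpers role_counts/2 and first_literals/3 are hypothetical; limit/2 comes from SWI-Prolog's library(solution_sequences).

```prolog
:- use_module(library(solution_sequences)).   % limit/2

% Assumes the proposed hdt_term/3 and hdt_term_count/3 are loaded.

%! role_counts(+Hdt, -Pairs) is det.
%
%  Hypothetical helper: pair each classic role with the number of
%  terms that hdt_term_count/3 reports for it.
role_counts(Hdt, Pairs) :-
    findall(Role-Count,
            ( member(Role, [subject, predicate, object, shared]),
              hdt_term_count(Hdt, Role, Count) ),
            Pairs).

%! first_literals(+Hdt, +Max, -Literals) is det.
%
%  Hypothetical helper: at most Max literals, in the order in which
%  hdt_term/3 enumerates them.
first_literals(Hdt, Max, Literals) :-
    findall(L, limit(Max, hdt_term(Hdt, literal, L)), Literals).
```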

For the ID-based predicates, we do not allow Role to have the value term, because a term that is both a predicate and a node has two IDs.
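The two-IDs problem can be made concrete with a toy model of the four-section dictionary that HDT uses by default: terms occurring as both subject and object (shared), subject-only terms, object-only terms, and predicates. As we understand the layout, subjects and objects share one ID space in which the shared section comes first, whereas predicates have a separate ID space starting at 1. The model below is an in-memory illustration with hypothetical data and predicate names (toy_section/2, toy_term_id/3, toy_node_id/3), not the hdt-cpp or hdt4swipl implementation.

```prolog
% Toy model (hypothetical data): each dictionary section is a sorted
% list of terms; a term's ID is derived from its position.
toy_section(shared,    [ex_b, ex_c]).   % subject and object
toy_section(subject,   [ex_s]).         % subject only
toy_section(object,    ['"hello"']).    % object only
toy_section(predicate, [ex_a, ex_b]).   % own ID space

%! toy_term_id(?Role, ?Term, ?ID) is nondet.
toy_term_id(shared, Term, ID) :-
    toy_section(shared, SO),
    nth1(ID, SO, Term).
toy_term_id(subject, Term, ID) :-
    toy_node_id(subject, Term, ID).
toy_term_id(object, Term, ID) :-
    toy_node_id(object, Term, ID).
toy_term_id(predicate, Term, ID) :-
    toy_section(predicate, P),
    nth1(ID, P, Term).

% Node IDs: shared terms get 1..|SO|; subject-only and object-only
% terms are numbered after the shared section.
toy_node_id(_, Term, ID) :-
    toy_term_id(shared, Term, ID).
toy_node_id(Role, Term, ID) :-
    toy_section(shared, SO),
    length(SO, NShared),
    toy_section(Role, Only),
    nth1(K, Only, Term),
    ID is NShared + K.
```

In this model, `findall(R-ID, toy_term_id(R, ex_b, ID), L)` gives `L = [shared-1, subject-1, object-1, predicate-2]`: ex_b has node ID 1 but predicate ID 2, while ID 1 denotes ex_b as a node and ex_a as a predicate. An ID-based call with Role term would therefore have no single answer.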

1.5 Predicates for working with terms based on a prefix

Old API:

hdt_suggestions/5, % +HDT, +Base, +Role, +MaxCount, -List

New API:

hdt_term/4,		% +HDT, +Role, +Prefix, -Term
hdt_term_id/4,		% +HDT, +Role, +Prefix, -ID

hdt_term_count/4,	% +HDT, +Role, +Prefix, -Count

hdt_term_rnd/4,		% +HDT, +Role, +Prefix, -Term
hdt_term_rnd_id/4,	% +HDT, +Role, +Prefix, -ID

Changes:

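  1. HDT suggestions are now handled by the prefix variants of the term predicates from the previous section. Instead of returning a list of at most MaxCount terms, these predicates enumerate matching terms one at a time.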
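A sketch of how the old bounded suggestion list can be rebuilt from the nondeterministic prefix predicates, assuming hdt_term/4 and hdt_term_count/4 behave as listed above. The names suggestions/5 and suggestions_page/6 are hypothetical; limit/2 comes from SWI-Prolog's library(solution_sequences).

```prolog
:- use_module(library(solution_sequences)).   % limit/2

% Assumes the proposed hdt_term/4 and hdt_term_count/4 are loaded.

%! suggestions(+Hdt, +Role, +Prefix, +MaxCount, -List) is det.
%
%  Hypothetical helper: the bounded list of the old hdt_suggestions/5,
%  rebuilt from the nondeterministic hdt_term/4.
suggestions(Hdt, Role, Prefix, MaxCount, List) :-
    findall(Term,
            limit(MaxCount, hdt_term(Hdt, Role, Prefix, Term)),
            List).

%! suggestions_page(+Hdt, +Role, +Prefix, +MaxCount, -List, -Total) is det.
%
%  As suggestions/5, but also reports how many terms match in total,
%  e.g. to render "showing 10 of 1,234" in an autocomplete widget.
suggestions_page(Hdt, Role, Prefix, MaxCount, List, Total) :-
    suggestions(Hdt, Role, Prefix, MaxCount, List),
    hdt_term_count(Hdt, Role, Prefix, Total).
```

Leaving the bound to the caller lets the same predicate serve both streaming use (backtracking over hdt_term/4) and fixed-size lists.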

1.6 TODO A mapping between HDT files and named graphs

Jan already has an API and an implementation for this.

1.7 Other predicates

Old API:

hdt_header/4,	% +HDT, ?S,?P,?O
hdt_property/2,	% +HDT, -Property

These will not be changed in the new API.

2 TO-DO list for hdt-cpp

2.1 DONE iterator-based suggestions function

2.2 TODO test goTo function, used by random functions

2.3 TODO store the offset of literals in the header

2.4 TODO store the offsets (S, P, O, and SO) of IRIs in the header

2.5 TODO how to guarantee uniqueness for IRI enumeration?

2.6 TODO store the offsets (S, O, and SO) of blank nodes in the header

3 TO-DO list for hdt4swipl

3.1 TODO non-deterministic reimplementation of suggestions function

3.2 TODO test random function

3.3 TODO random function with IDs

4 TO-DO list for hdt.pl

4.1 TODO hdt_rnd/4

4.2 TODO hdt_rnd_id/4

4.3 TODO hdt_term/3

4.3.1 TODO role bnode

4.3.2 TODO role iri

4.3.3 TODO role literal

4.3.4 TODO role term

4.4 TODO hdt_term/4

4.5 TODO hdt_term_count/4

4.6 TODO hdt_term_id/3

4.7 TODO hdt_term_rnd/3

4.8 TODO hdt_term_rnd/4

4.9 TODO hdt_term_rnd_id/3

4.10 TODO hdt_term_rnd_id/4

Author: Wouter Beek, Jan Wielemaker, Javier Fernández

Created: 2017-09-10 Sun 14:36
