Provenance Report #
To make it possible to trace the evidence behind reported results, the data registry can produce a provenance report for each data product.
Provenance is the documented history of processes in a digital object’s lifecycle.
The provenance reports generated by the registry are based on the concepts
of activities, agents and entities. For more information about these
concepts, see the PROV Ontology (PROV-O).
Provenance reports are only available for DataProducts and can be accessed via
the RESTful API, for example:
https://data.fairdatapipeline.org/api/prov-report/3/
Query parameters #
format
- api: an HTML representation of the report with media type of text/html
- json: a JSON representation of the report with media type of application/json
- json-ld: a JSON-LD representation of the report with media type of application/ld+json
- jpg: an image representing the provenance with media type of image/jpeg
- svg: an interactive image representing the provenance with media type of image/svg+xml
- xml: an XML representation of the report with media type of text/xml
- provn: a PROV-N representation of the report with media type of text/provenance-notation
aspect_ratio
- <float>: a float used to define the ratio for the JPEG and SVG images. The default is 0.71, which is equivalent to A4 landscape.
attributes
- True (default): show the attributes associated with an object on the image
- False: hide the attributes associated with an object on the image
dpi
- <float>: a float used to define the dpi for the JPEG and SVG images
depth
- <integer>: an integer used to determine how many levels of code runs to include. The default is 1.
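The query parameters above can be combined in a single request. The following is a minimal sketch of building such a URL in Python. The helper name prov_report_url is hypothetical, and it assumes the parameters are passed as an ordinary query string with booleans written as True or False, as listed above.

```python
from urllib.parse import urlencode

# Base URL taken from the example request earlier on this page.
BASE_URL = "https://data.fairdatapipeline.org/api/prov-report/"


def prov_report_url(data_product_id, **params):
    """Build a provenance report URL (hypothetical helper, not part of the registry).

    Keyword arguments are assumed to map directly onto the query
    parameters described above: format, aspect_ratio, attributes, dpi, depth.
    """
    url = f"{BASE_URL}{data_product_id}/"
    if params:
        # urlencode converts each value with str(), so False becomes "False".
        url += "?" + urlencode(params)
    return url


# Request an SVG image without attributes, following two levels of code runs.
print(prov_report_url(3, format="svg", attributes=False, depth=2))
```

The resulting URL can then be fetched with any HTTP client.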
Prefixes #
All activities, agents and entities have a URI. Prefixes are used to
represent the base component of these URIs. Two different prefixes are used,
reg and lreg, where reg is the prefix for the central registry and
lreg refers to a local registry. Hence, depending on where the object is
located, you may see:
reg:api/data_product/1
or
lreg:api/data_product/1
Using provn as an example, you may see a section similar to:
prefix lreg <http://192.168.20.10:8000/>
prefix fair <https://data.fairdatapipeline.org/vocab/#>
prefix dcat <http://www.w3.org/ns/dcat#>
prefix dcmitype <http://purl.org/dc/dcmitype/>
prefix dcterms <http://purl.org/dc/terms/>
prefix foaf <http://xmlns.com/foaf/spec/#>
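To illustrate how a prefixed identifier maps back to a full URI, here is a minimal Python sketch. The prefix table is copied from the PROV-N example above. The function name expand is hypothetical, and plain concatenation of the prefix URI and the local part is assumed.

```python
# Prefix table copied from the PROV-N example above. A report from the
# central registry would declare reg instead of lreg.
PREFIXES = {
    "lreg": "http://192.168.20.10:8000/",
    "fair": "https://data.fairdatapipeline.org/vocab/#",
    "dcat": "http://www.w3.org/ns/dcat#",
    "dcmitype": "http://purl.org/dc/dcmitype/",
    "dcterms": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/spec/#",
}


def expand(qualified_name, prefixes=PREFIXES):
    """Expand 'prefix:local' into a full URI (hypothetical helper)."""
    prefix, sep, local = qualified_name.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {qualified_name!r}")
    return prefixes[prefix] + local


print(expand("lreg:api/data_product/1"))
```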
Examples #
Basic Example #
For a simple case the report will contain two entities, a DataProduct and
an ExternalObject, where the DataProduct is a specializationOf an
ExternalObject.
An example of a basic provenance diagram #

And this is an example of the XML that is produced:
<prov:document xmlns:lreg="http://192.168.20.10:8000/" xmlns:fair="https://data.fairdatapipeline.org/vocab/#" xmlns:dcat="http://www.w3.org/ns/dcat#" xmlns:dcmitype="http://purl.org/dc/dcmitype/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:foaf="http://xmlns.com/foaf/spec/#" xmlns:prov="http://www.w3.org/ns/prov#" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<prov:entity prov:id="lreg:api/data_product/1">
<prov:type xsi:type="xsd:QName">dcat:Dataset</prov:type>
<dcat:hasVersion>0.20210915.0</dcat:hasVersion>
<dcterms:description>Static parameters of the model</dcterms:description>
<dcterms:format>Comma-Separated Values File</dcterms:format>
<dcterms:modified xsi:type="xsd:dateTime">2021-09-15T14:16:42.899768+00:00</dcterms:modified>
<dcterms:title>
disease/sars_cov2/SEIRS_model/parameters/static_params
</dcterms:title>
<fair:namespace>PSU</fair:namespace>
<prov:atLocation>
file:///var/folders/0f/fj5r_1ws15x4jzgnm27h_y6h0000gr/T/tmpukqzlyig/data_store//PSU/disease/sars_cov2/SEIRS_model/parameters/static_params/0.20210915.0.csv
</prov:atLocation>
</prov:entity>
<prov:person prov:id="lreg:api/author/1">
<foaf:name>Interface Test</foaf:name>
</prov:person>
<prov:wasAttributedTo>
<prov:entity prov:ref="lreg:api/data_product/1"/>
<prov:agent prov:ref="lreg:api/author/1"/>
<prov:role xsi:type="xsd:QName">dcterms:creator</prov:role>
</prov:wasAttributedTo>
<prov:entity prov:id="lreg:api/external_object/1">
<prov:type xsi:type="xsd:QName">dcat:Dataset</prov:type>
<dcat:hasVersion>0.20210915.0</dcat:hasVersion>
<dcterms:issued xsi:type="xsd:dateTime">2021-09-15T15:16:42+00:00</dcterms:issued>
<dcterms:title>Static parameters of the model</dcterms:title>
<fair:alternate_identifier>
SEIRS model parameters - Static parameters of the model
</fair:alternate_identifier>
<fair:alternate_identifier_type>SEIRS_model_params</fair:alternate_identifier_type>
</prov:entity>
<prov:specializationOf>
<prov:specificEntity prov:ref="lreg:api/external_object/1"/>
<prov:generalEntity prov:ref="lreg:api/data_product/1"/>
</prov:specializationOf>
</prov:document>
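A report in this format can be read with a standard XML parser. The sketch below uses Python's xml.etree.ElementTree on a trimmed copy of the document above, listing each entity with its title and each specializationOf pair as it appears in the XML. The function name summarise and the trimmed sample are illustrative only.

```python
import xml.etree.ElementTree as ET

PROV = "http://www.w3.org/ns/prov#"
NS = {"prov": PROV, "dcterms": "http://purl.org/dc/terms/"}

# Trimmed copy of the report above, kept short for illustration.
SAMPLE = """<prov:document xmlns:lreg="http://192.168.20.10:8000/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:prov="http://www.w3.org/ns/prov#">
  <prov:entity prov:id="lreg:api/data_product/1">
    <dcterms:title>
      disease/sars_cov2/SEIRS_model/parameters/static_params
    </dcterms:title>
  </prov:entity>
  <prov:entity prov:id="lreg:api/external_object/1">
    <dcterms:title>Static parameters of the model</dcterms:title>
  </prov:entity>
  <prov:specializationOf>
    <prov:specificEntity prov:ref="lreg:api/external_object/1"/>
    <prov:generalEntity prov:ref="lreg:api/data_product/1"/>
  </prov:specializationOf>
</prov:document>"""


def summarise(xml_text):
    """Return ({entity id: title}, [(specific id, general id), ...])."""
    root = ET.fromstring(xml_text)
    entities = {}
    for entity in root.findall("prov:entity", NS):
        title = entity.findtext("dcterms:title", default="", namespaces=NS)
        # prov:id is a namespaced attribute, so the key uses Clark notation.
        entities[entity.get(f"{{{PROV}}}id")] = title.strip()
    pairs = []
    for rel in root.findall("prov:specializationOf", NS):
        specific = rel.find("prov:specificEntity", NS).get(f"{{{PROV}}}ref")
        general = rel.find("prov:generalEntity", NS).get(f"{{{PROV}}}ref")
        pairs.append((specific, general))
    return entities, pairs


entities, pairs = summarise(SAMPLE)
for identifier, title in entities.items():
    print(identifier, "->", title)
for specific, general in pairs:
    print(specific, "specializationOf", general)
```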
A Data Product Generated from a Code Run #
In a complete example, a DataProduct entity would have a wasGeneratedBy
relationship with a CodeRun activity, a wasAttributedTo relationship with an
Author agent, and a wasDerivedFrom relationship with one or more DataProduct
entities.
In turn, the CodeRun would have a wasStartedBy relationship with an
Author, and it would have used a model_configuration, a submission_script,
a CodeRepoRelease and one or more DataProducts.
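The relationships described above can be written down as simple (subject, relation, object) triples. The Python sketch below is only an illustration of that structure. All identifiers are hypothetical placeholders and do not come from registry output.

```python
# Hypothetical identifiers, used only to illustrate the relationships
# described above. They are not real registry objects.
RELATIONS = [
    ("data_product/2", "wasGeneratedBy", "code_run/1"),
    ("data_product/2", "wasAttributedTo", "author/1"),
    ("data_product/2", "wasDerivedFrom", "data_product/1"),
    ("code_run/1", "wasStartedBy", "author/1"),
    ("code_run/1", "used", "model_configuration"),
    ("code_run/1", "used", "submission_script"),
    ("code_run/1", "used", "code_repo_release/1"),
    ("code_run/1", "used", "data_product/1"),
]


def related(subject, relation, relations=RELATIONS):
    """Return every object linked to subject by relation, in listed order."""
    return [obj for subj, rel, obj in relations if subj == subject and rel == relation]


# Everything the code run used as input.
print(related("code_run/1", "used"))
```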
An example of a provenance diagram #

