Provenance Report #
To make it possible to trace the evidence behind reported results, the data registry can produce a provenance report for each data product.
Provenance is the documented history of processes in a digital object’s lifecycle.
The provenance reports generated by the registry are based around the concepts of activities, agents and entities. For more information about these concepts see the PROV Ontology (PROV-O).
Provenance reports are only available for DataProducts and can be accessed via the RESTful API, for example:
https://data.fairdatapipeline.org/api/prov-report/3/
Query parameters #
format
- api: an HTML representation of the report with media type of text/html
- json: a JSON representation of the report with media type of application/json
- json-ld: a JSON-LD representation of the report with media type of application/ld+json
- jpg: an image representing the provenance with media type of image/jpeg
- svg: an interactive image representing the provenance with media type of image/svg+xml
- xml: an XML representation of the report with media type of text/xml
- provn: a PROV-N representation of the report with media type of text/provenance-notation
aspect_ratio
- <float>: a float used to define the ratio for the JPEG and SVG images. The default is 0.71, which is equivalent to A4 landscape.
attributes
- True (default): show the attributes associated with an object on the image
- False: hide the attributes associated with an object on the image
dpi
- <float>: a float used to define the dpi for the JPEG and SVG images
depth
- <integer>: an integer used to determine how many levels of code runs to include. The default is 1.
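The query parameters above are appended to the report URL in the usual way. The following is a minimal sketch of how such a URL might be assembled in Python. The helper `prov_report_url` and the `MEDIA_TYPES` lookup are hypothetical names introduced for illustration and are not part of the registry. The sketch also assumes that boolean values are passed as the literal strings True and False, as they are written in the list above.

```python
from urllib.parse import urlencode

# Base URL taken from the example earlier on this page.
BASE_URL = "https://data.fairdatapipeline.org/api/prov-report/"

# Media type documented for each value of the `format` parameter.
MEDIA_TYPES = {
    "api": "text/html",
    "json": "application/json",
    "json-ld": "application/ld+json",
    "jpg": "image/jpeg",
    "svg": "image/svg+xml",
    "xml": "text/xml",
    "provn": "text/provenance-notation",
}


def prov_report_url(data_product_id, **params):
    """Hypothetical helper: build a report URL with optional query parameters."""
    fmt = params.get("format")
    if fmt is not None and fmt not in MEDIA_TYPES:
        raise ValueError(f"unsupported format: {fmt!r}")
    url = f"{BASE_URL}{data_product_id}/"
    if params:
        # urlencode converts each value with str(), so False becomes "False".
        url += "?" + urlencode(params)
    return url


url = prov_report_url(3, format="svg", aspect_ratio=0.71, attributes=False, depth=2)
print(url)
print(MEDIA_TYPES["svg"])
```

The resulting URL can then be requested with any HTTP client. The media type of the response should match the entry for the chosen format.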
Prefixes #
All activities, agents and entities have a URI. Prefixes are used to represent the base component of these URIs. Two different prefixes are used, reg and lreg, where reg is used as the prefix for the central registry and lreg is used to refer to a local registry. Hence, depending on the location of the object, you may see:
reg:api/data_product/1
or
lreg:api/data_product/1
Using provn as an example, you may see a section similar to:
prefix lreg <http://192.168.20.10:8000/>
prefix fair <https://data.fairdatapipeline.org/vocab/#>
prefix dcat <http://www.w3.org/ns/dcat#>
prefix dcmitype <http://purl.org/dc/dcmitype/>
prefix dcterms <http://purl.org/dc/terms/>
prefix foaf <http://xmlns.com/foaf/spec/#>
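To illustrate how a prefix relates to a full URI, the sketch below reads prefix declarations like those above and expands a prefixed identifier such as lreg:api/data_product/1. It is an illustrative reading of the declarations only, not a full PROV-N parser. The function names `parse_prefixes` and `expand` are hypothetical.

```python
import re

# A few of the prefix declarations from the PROV-N example above.
PROVN_PREFIXES = """\
prefix lreg <http://192.168.20.10:8000/>
prefix fair <https://data.fairdatapipeline.org/vocab/#>
prefix dcat <http://www.w3.org/ns/dcat#>
"""

# Matches lines of the form: prefix <name> <<uri>>
PREFIX_LINE = re.compile(r"^\s*prefix\s+(\w+)\s+<([^>]*)>\s*$")


def parse_prefixes(text):
    """Return a dict mapping each declared prefix to its base URI."""
    prefixes = {}
    for line in text.splitlines():
        match = PREFIX_LINE.match(line)
        if match:
            prefixes[match.group(1)] = match.group(2)
    return prefixes


def expand(qualified_name, prefixes):
    """Replace the prefix of a name such as 'lreg:api/data_product/1' with its base URI."""
    prefix, _, local = qualified_name.partition(":")
    return prefixes[prefix] + local


prefixes = parse_prefixes(PROVN_PREFIXES)
print(expand("lreg:api/data_product/1", prefixes))
```

With the declarations shown, the identifier expands to a URL on the local registry at http://192.168.20.10:8000/.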
Examples #
Basic Example #
For a simple case the report will contain two entities, a DataProduct and an ExternalObject, where the DataProduct is a specializationOf an ExternalObject.
An example of a basic provenance diagram #
And this is an example of the XML that is produced:
<prov:document
    xmlns:lreg="http://192.168.20.10:8000/"
    xmlns:fair="https://data.fairdatapipeline.org/vocab/#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dcmitype="http://purl.org/dc/dcmitype/"
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:foaf="http://xmlns.com/foaf/spec/#"
    xmlns:prov="http://www.w3.org/ns/prov#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <prov:entity prov:id="lreg:api/data_product/1">
    <prov:type xsi:type="xsd:QName">dcat:Dataset</prov:type>
    <dcat:hasVersion>0.20210915.0</dcat:hasVersion>
    <dcterms:description>Static parameters of the model</dcterms:description>
    <dcterms:format>Comma-Separated Values File</dcterms:format>
    <dcterms:modified xsi:type="xsd:dateTime">2021-09-15T14:16:42.899768+00:00</dcterms:modified>
    <dcterms:title>disease/sars_cov2/SEIRS_model/parameters/static_params</dcterms:title>
    <fair:namespace>PSU</fair:namespace>
    <prov:atLocation>file:///var/folders/0f/fj5r_1ws15x4jzgnm27h_y6h0000gr/T/tmpukqzlyig/data_store//PSU/disease/sars_cov2/SEIRS_model/parameters/static_params/0.20210915.0.csv</prov:atLocation>
  </prov:entity>
  <prov:person prov:id="lreg:api/author/1">
    <foaf:name>Interface Test</foaf:name>
  </prov:person>
  <prov:wasAttributedTo>
    <prov:entity prov:ref="lreg:api/data_product/1"/>
    <prov:agent prov:ref="lreg:api/author/1"/>
    <prov:role xsi:type="xsd:QName">dcterms:creator</prov:role>
  </prov:wasAttributedTo>
  <prov:entity prov:id="lreg:api/external_object/1">
    <prov:type xsi:type="xsd:QName">dcat:Dataset</prov:type>
    <dcat:hasVersion>0.20210915.0</dcat:hasVersion>
    <dcterms:issued xsi:type="xsd:dateTime">2021-09-15T15:16:42+00:00</dcterms:issued>
    <dcterms:title>Static parameters of the model</dcterms:title>
    <fair:alternate_identifier>SEIRS model parameters - Static parameters of the model</fair:alternate_identifier>
    <fair:alternate_identifier_type>SEIRS_model_params</fair:alternate_identifier_type>
  </prov:entity>
  <prov:specializationOf>
    <prov:specificEntity prov:ref="lreg:api/external_object/1"/>
    <prov:generalEntity prov:ref="lreg:api/data_product/1"/>
  </prov:specializationOf>
</prov:document>
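A report in this XML form can be read with a standard XML parser. The sketch below uses Python's built-in xml.etree.ElementTree on a trimmed copy of the report above (only the titles and the specializationOf relation are kept) to list the entities and the relation between them. The variable names are illustrative. In practice the XML would come from a request with the format query parameter set to xml.

```python
import xml.etree.ElementTree as ET

PROV_NS = "http://www.w3.org/ns/prov#"

# Trimmed copy of the report above, kept short for illustration.
REPORT = """\
<prov:document xmlns:lreg="http://192.168.20.10:8000/"
               xmlns:dcterms="http://purl.org/dc/terms/"
               xmlns:prov="http://www.w3.org/ns/prov#">
  <prov:entity prov:id="lreg:api/data_product/1">
    <dcterms:title>disease/sars_cov2/SEIRS_model/parameters/static_params</dcterms:title>
  </prov:entity>
  <prov:entity prov:id="lreg:api/external_object/1">
    <dcterms:title>Static parameters of the model</dcterms:title>
  </prov:entity>
  <prov:specializationOf>
    <prov:specificEntity prov:ref="lreg:api/external_object/1"/>
    <prov:generalEntity prov:ref="lreg:api/data_product/1"/>
  </prov:specializationOf>
</prov:document>
"""

# Prefix-to-namespace map used when searching for elements.
ns = {"prov": PROV_NS, "dcterms": "http://purl.org/dc/terms/"}
root = ET.fromstring(REPORT)

# Map each entity id to its title. Attribute names use the {namespace}name form.
entities = {}
for entity in root.findall("prov:entity", ns):
    title = entity.findtext("dcterms:title", default="", namespaces=ns).strip()
    entities[entity.get(f"{{{PROV_NS}}}id")] = title

# Collect (specific, general) pairs from each specializationOf relation.
specializations = [
    (
        rel.find("prov:specificEntity", ns).get(f"{{{PROV_NS}}}ref"),
        rel.find("prov:generalEntity", ns).get(f"{{{PROV_NS}}}ref"),
    )
    for rel in root.findall("prov:specializationOf", ns)
]

for entity_id, title in entities.items():
    print(entity_id, "-", title)
for specific, general in specializations:
    print(specific, "specializationOf", general)
```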
A Data Product Generated from a Code Run #
In a complete example a DataProduct entity would have a wasGeneratedBy relationship with a CodeRun activity, a wasAttributedTo relationship with an Author agent, and a wasDerivedFrom relationship with one or more DataProduct entities.
In turn the CodeRun would have a wasStartedBy relationship with an Author, and it would have used a model_configuration, submission_script, CodeRepoRelease and one or more DataProducts