fair
CLI functions
#
Note that this is a living document and the following is subject to change.
A simple example of how the data pipline should run from the command line:
fair pull config.yaml
fair run config.yaml
fair push config.yaml
fair pull
#
- download any data required by
read:
from the remote data store and record metadata in the data registry (whilst editing relevant entries, e.g.storage_root
) - pull meta data associated with all previous versions of these objects listed in
write:
from the remote data registry - download any data listed in
register:
from the original source and record metadata in the data registry
fair run
#
- read (and validate) the config.yaml file
- generate a working config.yaml file (see
Working example)
-
globbing is used to interpret
*
as all matching objects as well as the original string returned, e.g. ifreal/data/1
version 0.0.1 andreal/data/thing/1
version 0.0.1 already exist in the registry, the user-written config:write: - data_product: real/data/* description: general description for all data products use: namespace: someone version: ${{MINOR}}
should return:
write: - data_product: real/data/1 use: data_product: real/data/1 description: general description for all data products version: 0.1.0 namespace: someone - data_product: real/data/thing/1 use: data_product: real/data/thing/1 description: general description for all data products version: 0.1.0 namespace: someone - data_product: real/data/* use: data_product: real/data/* description: general description for all data products version: 0.0.1 namespace: someone
-
specific version numbers and any variables in
run_metadata:
,register:
,read:
, andwrite:
are replaced with true values, e.g.${{CONFIG_DIR}}
is replaced by the directory within which the working config.yaml file residesrelease_date: ${{DATETIME}}
is replaced byrelease_date: 2021-04-14T11:34:37
which is a valid form for the registry.version: 0.${{DATE}}.0
is replaced byversion: 0.20210414.0
version: ${{PATCH}}
should increment version by patch; andversion: 0.${{DATETIME-%Y%m%d}}.0
or any variants thereof are replaced by an appropriately formatted string.
-
if no version is given, then one should be written such that patch is incremented if the data product already exists, otherwise version should be set to 0.0.1.
-
register:
is removed andexternal_object
s are written toread:
asdata_product
s -
populate
public:
field inrun_metadata:
section (default istrue
) -
populate
version:
field inuse:
section of whether the user-written config contained the field or not
-
local_repo:
must always be given in the config.yaml file- ensure the repo is clean
- get the hash of the latest commit and add to the working config.yaml file in
run_metadata: latest_commit:
- if
run_metadata: remote_repo:
isfalse
, thenfair push
should copy the repo to the file store - if
run_metadata: remote_repo:
is absent or doesn’t contain a URL, thenfair run
should try to get the remote repo url from the local repo - note that there are exceptions and the user may reference a script located outside of a repository
- save the working config.yaml file in the local data store, in <local_store>/coderun/<date>T<time>/config.yaml, e.g. datastore/coderun/20210625T165552/config.yaml
- save the submission script to the local data store in <local_store>/coderun/<date>T<time>/script.sh
- note that config.yaml should contain either
script:
that should be saved as the submission script, orscript_path:
that points to the file that should be saved as the submission script
- note that config.yaml should contain either
- save the path to <local_store>/coderun/<date>T<time>/ in the global environment as
$FDP_CONFIG_DIR
so that it can be picked up by the script that is run after this has been completed - execute the submission script
fair push
#
- push new files (generated from
write:
andregister:
) to the remote data store - record metadata in the data registry (whilst editing relevant entries, e.g.
storage_root
)