The user should write a config.yaml file describing the data products used in the code run. The example config.yaml file below describes a code run with a single input, SEIRS_model/parameters. This input is listed in the register block, meaning that it should be downloaded from an external source to the local data store, with its associated metadata stored in the local registry. fair run automatically converts register entries into a read block (when data products are already present in the local registry, inputs should be listed in the read block instead).
A code run usually also has outputs, which are listed in the write block. In the example below, the outputs are SEIRS_model/results/model_output/R (the model results, as a csv file) and SEIRS_model/results/figure/R (a pdf plot). Each write entry gives the name the submission script uses for the output (model_output, figure), while its use field gives the name under which the output is registered.
SEIRSconfig.yaml:
run_metadata:
  default_input_namespace: sonia
  description: SEIRS Model R
  script: R -f inst/extdata/SEIRS.R
register:
- namespace: PSU
  full_name: Pennsylvania State University
  website: https://ror.org/04p491231
- external_object: SEIRS_model/parameters
  namespace_name: PSU
  root: https://raw.githubusercontent.com/
  path: FAIRDataPipeline/rSimpleModel/main/inst/extdata/static_params_SEIRS.csv
  title: Parameters of SEIRS model
  description: Static parameters of SEIRS model from Figure 1
  identifier: https://doi.org/10.1038/s41592-020-0856-2
  file_type: csv
  release_date: 2020-06-01T12:00
  version: "1.0.0"
  primary: False
write:
- data_product: model_output
  description: SEIRS model results
  file_type: csv
  use:
    data_product: SEIRS_model/results/model_output/R
- data_product: figure
  description: SEIRS output plot
  file_type: pdf
  use:
    data_product: SEIRS_model/results/figure/R
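For comparison, if SEIRS_model/parameters were already present in the local registry (for example, after an earlier fair pull), the register block could be replaced by a read block. The fragment below is a sketch only: it assumes that read entries accept a use field with namespace and version keys, and it copies those values from namespace_name and version in the register entry above. The run_metadata and write blocks would stay the same.

```yaml
# Hypothetical read block, assuming SEIRS_model/parameters is already
# in the local registry; namespace and version mirror the register entry
read:
- data_product: SEIRS_model/parameters
  use:
    namespace: PSU
    version: "1.0.0"
```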
The submission script should call initialise() to set up the code run, and may then read in data using one of the read_*() functions (for internal file formats) or link_read() (for external file formats, such as CSV files). The data might then be processed, or a model or analysis carried out, after which the results should be saved to the local data store via one of the write_*() functions or link_write(). When the code run is complete, finalise() should be called to register all metadata with the local registry. In this example the submission script is run with R -f inst/extdata/SEIRS.R, as given in the script field of run_metadata, and its contents are echoed in the fair run output below.
fair pull
Using the FAIR CLI tool, fair pull identifies any data products listed in the register block of config.yaml. These data products are downloaded to the local data store, and their associated metadata is registered in the local registry. Before pulling, fair init --ci initialises the FAIR repository if this has not already been done.
fair init --ci
fair pull inst/extdata/SEIRSconfig.yaml
#> FAIR repository is already initialised.
#> Updating registry from inst/extdata/SEIRSconfig.yaml
#> WARNING:FAIRDataPipeline.ConfigYAML:Remote registry pulls are not yet implemented
#> WARNING:FAIRDataPipeline.ConfigYAML:Remote registry pulls are not yet implemented
The warnings indicate that only the local registry is updated, since pulling from a remote registry is not yet implemented. The local registry should now contain the SEIRS_model/parameters data product.
fair run
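Again using the FAIR CLI tool, fair run performs the code run by executing the submission script given in the script field of run_metadata. In preparation for this, it translates the user-written config.yaml file into a form usable by the Data Pipeline API. Any variables or wildcards specified by the user in the config file are cross-referenced with the registry, and any data products registered by fair pull are made available for the current code run to read. The translated config.yaml and a script.sh wrapper are written to a job directory in the data store, whose location is passed to the submission script in the FDP_CONFIG_DIR environment variable.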
fair run inst/extdata/SEIRSconfig.yaml
#> Updating registry from inst/extdata/SEIRSconfig.yaml
#>
#> R version 4.2.0 (2022-04-22) -- "Vigorous Calisthenics"
#> Copyright (C) 2022 The R Foundation for Statistical Computing
#> Platform: x86_64-pc-linux-gnu (64-bit)
#>
#> R is free software and comes with ABSOLUTELY NO WARRANTY.
#> You are welcome to redistribute it under certain conditions.
#> Type 'license()' or 'licence()' for distribution details.
#>
#> Natural language support but running in an English locale
#>
#> R is a collaborative project with many contributors.
#> Type 'contributors()' for more information and
#> 'citation()' on how to cite R or R packages in publications.
#>
#> Type 'demo()' for some demos, 'help()' for on-line help, or
#> 'help.start()' for an HTML browser interface to help.
#> Type 'q()' to quit R.
#>
#> > library(rSimpleModel)
#> > library(rDataPipeline)
#> > library(ggplot2)
#> > library(dplyr)
#>
#> Attaching package: ‘dplyr’
#>
#> The following objects are masked from ‘package:stats’:
#>
#> filter, lag
#>
#> The following objects are masked from ‘package:base’:
#>
#> intersect, setdiff, setequal, union
#>
#> > library(magrittr)
#> >
#> > # Initialise code run
#> > conf.dir <- Sys.getenv("FDP_CONFIG_DIR")
#> > config <- file.path(conf.dir, "config.yaml")
#> > script <- file.path(conf.dir, "script.sh")
#> > if(.Platform$OS.type != "unix") {
#> + script <- file.path(conf.dir, "script.bat")
#> + }
#> > handle <- initialise(config, script)
#> ℹ Reading config.yaml from data store
#> ✔ Writing /home/runner/work/rSimpleModel/rSimpleModel/data_store/jobs/2022-06-21_14_30_51_305922/config.yaml to local registry
#> ✔ Writing /home/runner/work/rSimpleModel/rSimpleModel/data_store/jobs/2022-06-21_14_30_51_305922/script.sh to local registry
#> ✔ Writing FAIRDataPipeline/rSimpleModel to local registry
#> ✔ Writing new code_run to local registry
#> >
#> > # Read model parameters
#> > params <- handle %>% link_read("SEIRS_model/parameters") %>% read.csv
#> ℹ Locating 'SEIRS_model/parameters'
#> > a <- params %>% filter(param == "alpha") %$% value
#> > b <- params %>% filter(param == "beta") %$% value
#> > ig <- params %>% filter(param == "inv_gamma") %$% value
#> > io <- params %>% filter(param == "inv_omega") %$% value
#> > im <- params %>% filter(param == "inv_mu") %$% value
#> > is <- params %>% filter(param == "inv_sigma") %$% value
#> >
#> > # Set initial state
#> > initial.state <- data.frame(S = 0.999, E = 0.001, I = 0, R = 0)
#> >
#> > # Run the model
#> > results <- SEIRS_model(initial.state, timesteps = 1000, years = 5,
#> + alpha = a, beta = b, inv_gamma = ig,
#> + inv_omega = io, inv_mu = im, inv_sigma = is)
#> > g <- plot_SEIRS(results)
#> >
#> > # Save outputs to data store
#> > results %>% write.csv(link_write(handle, "model_output"), row.names = FALSE)
#> >
#> > handle %>% link_write("figure") %>% ggsave(g, width=20, height=10, units="cm")
#> >
#> > # Register code run in local registry
#> > finalise(handle)
#> ✔ Writing 'SEIRS_model/results/model_output/R' to local registry
#> ✔ Writing 'SEIRS_model/results/figure/R' to local registry
#> -> PATCH /api/code_run/2/ HTTP/1.1
#> -> Host: 127.0.0.1:8000
#> -> User-Agent: libcurl/7.68.0 r-curl/4.3.2 httr/1.4.3
#> -> Accept-Encoding: deflate, gzip, br
#> -> Accept: application/json, text/xml, application/xml, */*
#> -> Content-Type: application/json
#> -> Authorization: token d946655533485fed81ce0d9710815a6441f93adb
#> -> Content-Length: 200
#> ->
#> >> {
#> >> "inputs": [
#> >> "http://127.0.0.1:8000/api/object_component/9/"
#> >> ],
#> >> "outputs": [
#> >> "http://127.0.0.1:8000/api/object_component/13/",
#> >> "http://127.0.0.1:8000/api/object_component/14/"
#> >> ]
#> >> }
#>
#> <- HTTP/1.1 200 OK
#> <- Date: Tue, 21 Jun 2022 14:30:58 GMT
#> <- Server: WSGIServer/0.2 CPython/3.9.13
#> <- Content-Type: application/json
#> <- Vary: Accept, Cookie
#> <- Allow: GET, PUT, PATCH, DELETE, HEAD, OPTIONS
#> <- X-Frame-Options: DENY
#> <- Content-Length: 585
#> <- X-Content-Type-Options: nosniff
#> <- Referrer-Policy: same-origin
#> <-
#> No encoding supplied: defaulting to UTF-8.
#> >