DPAPI examples #
Note that this is a living document and the following is subject to change.
This page gives simple examples of the user-written config.yaml file alongside the working config file generated by `fair run`. Note that the Data Pipeline API takes the working config file as its input.
Empty code run #
User written config.yaml #
```yaml
run_metadata:
  description: An empty code run
  local_data_registry_url: https://localhost:8000/api/
  remote_data_registry_url: https://data.fairdatapipeline.org/api/
  default_input_namespace: soniamitchell
  default_output_namespace: soniamitchell
  write_data_store: /Users/SoniaM/datastore/
  local_repo: /Users/Soniam/Desktop/git/FAIRDataPipeline/FDP_validation/
  script: |-
    R -f simple_working_examples/empty_script.R ${{CONFIG_DIR}}
```
Working config.yaml #
`fair run` should create a working config.yaml file, which is read by the Data Pipeline API. In this example, the working config.yaml file is almost identical to the original config.yaml file; only ${{CONFIG_DIR}} is replaced by the directory in which the working config.yaml file resides.
```yaml
run_metadata:
  description: An empty code run
  local_data_registry_url: https://localhost:8000/api/
  remote_data_registry_url: https://data.fairdatapipeline.org/api/
  default_input_namespace: soniamitchell
  default_output_namespace: soniamitchell
  write_data_store: /Users/SoniaM/datastore/
  local_repo: /Users/Soniam/Desktop/git/FAIRDataPipeline/FDP_validation/
  script: |-
    R -f simple_working_examples/empty_script.R /Users/SoniaM/datastore/coderun/20210511-231444/
  latest_commit: 221bfe8b52bbfb3b2dbdc23037b7dd94b49aaa70
  remote_repo: https://github.com/FAIRDataPipeline/FDP_validation
```
Submission script (R) #
```r
library(rDataPipeline)

# Initialise Code Run
config <- file.path(Sys.getenv("FDP_CONFIG_DIR"), "config.yaml")
script <- file.path(Sys.getenv("FDP_CONFIG_DIR"), "script.sh")
handle <- initialise(config, script)

finalise(handle)
```
Write data product (HDF5) #
User written config.yaml #
```yaml
run_metadata:
  description: Write an array
  local_data_registry_url: https://localhost:8000/api/
  remote_data_registry_url: https://data.fairdatapipeline.org/api/
  default_input_namespace: soniamitchell
  default_output_namespace: soniamitchell
  write_data_store: /Users/SoniaM/datastore/
  local_repo: /Users/Soniam/Desktop/git/FAIRDataPipeline/FDP_validation/
  script: |-
    R -f simple_working_examples/write_array.R ${{CONFIG_DIR}}
write:
- data_product: test/array
  description: test array with simple data
```
Working config.yaml #
`fair run` should create a working config.yaml file, which is read by the Data Pipeline API. In this example, the working config.yaml file is almost identical to the original config.yaml file; only ${{CONFIG_DIR}} is replaced by the directory in which the working config.yaml file resides.
```yaml
run_metadata:
  description: Write an array
  local_data_registry_url: https://localhost:8000/api/
  remote_data_registry_url: https://data.fairdatapipeline.org/api/
  default_input_namespace: soniamitchell
  default_output_namespace: soniamitchell
  write_data_store: /Users/SoniaM/datastore/
  local_repo: /Users/Soniam/Desktop/git/FAIRDataPipeline/FDP_validation/
  script: |-
    R -f simple_working_examples/write_array.R /Users/SoniaM/datastore/coderun/20210511-231444/
  public: true
  latest_commit: 221bfe8b52bbfb3b2dbdc23037b7dd94b49aaa70
  remote_repo: https://github.com/FAIRDataPipeline/FDP_validation
write:
- data_product: test/array
  description: test array with simple data
  use:
    version: 0.1.0
```
Note that, although `use:` is reserved for aliasing in the user-written config, for simplicity the CLI will always write `version` here.
Note also that by default, the CLI will write `public: true` to `run_metadata:`. The user is, however, free to specify `public: false` for individual writes.
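For example, a write entry that opts a single output out of the public record might look like the following sketch. The data product name is illustrative, and the placement of `public:` directly on the write entry is an assumption:

```yaml
write:
- data_product: test/private-array  # illustrative name
  description: an array that should not be made public
  public: false  # assumed placement; overrides the run_metadata default
```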
Submission script (R) #
```r
library(rDataPipeline)

# Initialise Code Run
config <- file.path(Sys.getenv("FDP_CONFIG_DIR"), "config.yaml")
script <- file.path(Sys.getenv("FDP_CONFIG_DIR"), "script.sh")
handle <- initialise(config, script)

df <- data.frame(a = 1:2, b = 3:4)
rownames(df) <- 1:2

write_array(array = as.matrix(df),
            handle = handle,
            data_product = "test/array",
            component = "component1/a/s/d/f/s",
            description = "Some description",
            dimension_names = list(rowvalue = rownames(df),
                                   colvalue = colnames(df)),
            dimension_values = list(NA, 10),
            dimension_units = list(NA, "km"),
            units = "s")

finalise(handle)
```
Read data product (HDF5) #
User written config.yaml #
```yaml
run_metadata:
  description: Read an array
  local_data_registry_url: https://localhost:8000/api/
  remote_data_registry_url: https://data.fairdatapipeline.org/api/
  default_input_namespace: soniamitchell
  default_output_namespace: soniamitchell
  write_data_store: /Users/SoniaM/datastore/
  local_repo: /Users/Soniam/Desktop/git/FAIRDataPipeline/FDP_validation/
  script: |-
    R -f simple_working_examples/read_array.R ${{CONFIG_DIR}}
read:
- data_product: test/array
```
Working config.yaml #
`fair run` should create a working config.yaml file, which is read by the Data Pipeline API. In this example, the working config.yaml file is almost identical to the original config.yaml file; only ${{CONFIG_DIR}} is replaced by the directory in which the working config.yaml file resides.
```yaml
run_metadata:
  description: Read an array
  local_data_registry_url: https://localhost:8000/api/
  remote_data_registry_url: https://data.fairdatapipeline.org/api/
  default_input_namespace: soniamitchell
  default_output_namespace: soniamitchell
  write_data_store: /Users/SoniaM/datastore/
  local_repo: /Users/Soniam/Desktop/git/FAIRDataPipeline/FDP_validation/
  script: |-
    R -f simple_working_examples/read_array.R /Users/SoniaM/datastore/coderun/20210511-231444/
  latest_commit: 221bfe8b52bbfb3b2dbdc23037b7dd94b49aaa70
  remote_repo: https://github.com/FAIRDataPipeline/FDP_validation
read:
- data_product: test/array
  use:
    version: 0.1.0
```
Submission script (R) #
```r
library(rDataPipeline)

# Open the connection to the local registry with a given config file
config <- file.path(Sys.getenv("FDP_CONFIG_DIR"), "config.yaml")
script <- file.path(Sys.getenv("FDP_CONFIG_DIR"), "script.sh")
handle <- initialise(config, script)

data_product <- "test/array"
component <- "component1/a/s/d/f/s"
dat <- read_array(handle = handle,
                  data_product = data_product,
                  component = component)

finalise(handle)
```
Write data product (csv) #
User written config.yaml #
```yaml
run_metadata:
  description: Write csv file
  local_data_registry_url: https://localhost:8000/api/
  remote_data_registry_url: https://data.fairdatapipeline.org/api/
  default_input_namespace: soniamitchell
  default_output_namespace: soniamitchell
  write_data_store: /Users/SoniaM/datastore/
  local_repo: /Users/Soniam/Desktop/git/FAIRDataPipeline/FDP_validation/
  script: |-
    R -f simple_working_examples/write_csv.R ${{CONFIG_DIR}}
write:
- data_product: test/csv
  description: test csv file with simple data
  file_type: csv
```
Working config.yaml #
```yaml
run_metadata:
  description: Write csv file
  local_data_registry_url: https://localhost:8000/api/
  remote_data_registry_url: https://data.fairdatapipeline.org/api/
  default_input_namespace: soniamitchell
  default_output_namespace: soniamitchell
  write_data_store: /Users/SoniaM/datastore/
  local_repo: /Users/Soniam/Desktop/git/FAIRDataPipeline/FDP_validation/
  script: |-
    R -f simple_working_examples/write_csv.R /Users/SoniaM/datastore/coderun/20210511-231444/
  public: true
  latest_commit: 221bfe8b52bbfb3b2dbdc23037b7dd94b49aaa70
  remote_repo: https://github.com/FAIRDataPipeline/FDP_validation
write:
- data_product: test/csv
  description: test csv file with simple data
  file_type: csv
  use:
    version: 0.0.1
```
Submission script (R) #
```r
library(rDataPipeline)

# Open the connection to the local registry with a given config file
config <- file.path(Sys.getenv("FDP_CONFIG_DIR"), "config.yaml")
script <- file.path(Sys.getenv("FDP_CONFIG_DIR"), "script.sh")
handle <- initialise(config, script)

df <- data.frame(a = 1:2, b = 3:4)
rownames(df) <- 1:2

path <- link_write(handle, "test/csv")
write.csv(df, path)

finalise(handle)
```
Read data product (csv) #
User written config.yaml #
```yaml
run_metadata:
  description: Read csv file
  local_data_registry_url: https://localhost:8000/api/
  remote_data_registry_url: https://data.fairdatapipeline.org/api/
  default_input_namespace: soniamitchell
  default_output_namespace: soniamitchell
  write_data_store: /Users/SoniaM/datastore/
  local_repo: /Users/Soniam/Desktop/git/FAIRDataPipeline/FDP_validation/
  script: |-
    R -f simple_working_examples/read_csv.R ${{CONFIG_DIR}}
read:
- data_product: test/csv
```
Working config.yaml #
```yaml
run_metadata:
  description: Read csv file
  local_data_registry_url: https://localhost:8000/api/
  remote_data_registry_url: https://data.fairdatapipeline.org/api/
  default_input_namespace: soniamitchell
  default_output_namespace: soniamitchell
  write_data_store: /Users/SoniaM/datastore/
  local_repo: /Users/Soniam/Desktop/git/FAIRDataPipeline/FDP_validation/
  script: |-
    R -f simple_working_examples/read_csv.R /Users/SoniaM/datastore/coderun/20210511-231444/
  latest_commit: 221bfe8b52bbfb3b2dbdc23037b7dd94b49aaa70
  remote_repo: https://github.com/FAIRDataPipeline/FDP_validation
read:
- data_product: test/csv
  use:
    version: 0.0.1
```
Submission script (R) #
```r
library(rDataPipeline)

# Open the connection to the local registry with a given config file
config <- file.path(Sys.getenv("FDP_CONFIG_DIR"), "config.yaml")
script <- file.path(Sys.getenv("FDP_CONFIG_DIR"), "script.sh")
handle <- initialise(config, script)

path <- link_read(handle, "test/csv")
df <- read.csv(path)

finalise(handle)
```
Write data product (point estimate) #
User written config.yaml #
```yaml
run_metadata:
  description: Write point estimate
  local_data_registry_url: https://localhost:8000/api/
  remote_data_registry_url: https://data.fairdatapipeline.org/api/
  default_input_namespace: soniamitchell
  default_output_namespace: soniamitchell
  write_data_store: /Users/SoniaM/datastore/
  local_repo: /Users/Soniam/Desktop/git/SCRC/SCRCdata/
  script: |-
    R -f simple_working_examples/write_point_estimate.R ${{CONFIG_DIR}}
write:
- data_product: test/estimate/asymptomatic-period
  description: asymptomatic period
```
Working config.yaml #
```yaml
run_metadata:
  description: Write point estimate
  local_data_registry_url: https://localhost:8000/api/
  remote_data_registry_url: https://data.fairdatapipeline.org/api/
  default_input_namespace: soniamitchell
  default_output_namespace: soniamitchell
  write_data_store: /Users/SoniaM/datastore/
  local_repo: /Users/Soniam/Desktop/git/SCRC/SCRCdata/
  script: |-
    R -f simple_working_examples/write_point_estimate.R /Users/SoniaM/datastore/coderun/20210511-231444/
  public: true
  latest_commit: 221bfe8b52bbfb3b2dbdc23037b7dd94b49aaa70
  remote_repo: https://github.com/FAIRDataPipeline/FDP_validation
write:
- data_product: test/estimate/asymptomatic-period
  description: asymptomatic period
  use:
    version: 0.0.1
```
Submission script (R) #
```r
library(rDataPipeline)

# Open the connection to the local registry with a given config file
config <- file.path(Sys.getenv("FDP_CONFIG_DIR"), "config.yaml")
script <- file.path(Sys.getenv("FDP_CONFIG_DIR"), "script.sh")
handle <- initialise(config, script)

write_estimate(value = 9,
               handle = handle,
               data_product = "test/estimate/asymptomatic-period",
               component = "asymptomatic-period",
               description = "asymptomatic period")

finalise(handle)
```
Read data product (point estimate) #
User written config.yaml #
```yaml
run_metadata:
  description: Read point estimate
  local_data_registry_url: https://localhost:8000/api/
  remote_data_registry_url: https://data.fairdatapipeline.org/api/
  default_input_namespace: soniamitchell
  default_output_namespace: soniamitchell
  write_data_store: /Users/SoniaM/datastore/
  local_repo: /Users/Soniam/Desktop/git/SCRC/SCRCdata/
  script: |-
    R -f simple_working_examples/read_point_estimate.R ${{CONFIG_DIR}}
read:
- data_product: test/estimate/asymptomatic-period
```
Working config.yaml #
```yaml
run_metadata:
  description: Read point estimate
  local_data_registry_url: https://localhost:8000/api/
  remote_data_registry_url: https://data.fairdatapipeline.org/api/
  default_input_namespace: soniamitchell
  default_output_namespace: soniamitchell
  write_data_store: /Users/SoniaM/datastore/
  local_repo: /Users/Soniam/Desktop/git/SCRC/SCRCdata/
  script: |-
    R -f simple_working_examples/read_point_estimate.R /Users/SoniaM/datastore/coderun/20210511-231444/
  latest_commit: 221bfe8b52bbfb3b2dbdc23037b7dd94b49aaa70
  remote_repo: https://github.com/FAIRDataPipeline/FDP_validation
read:
- data_product: test/estimate/asymptomatic-period
  use:
    version: 0.0.1
```
Submission script (R) #
```r
library(rDataPipeline)

# Open the connection to the local registry with a given config file
config <- file.path(Sys.getenv("FDP_CONFIG_DIR"), "config.yaml")
script <- file.path(Sys.getenv("FDP_CONFIG_DIR"), "script.sh")
handle <- initialise(config, script)

read_estimate(handle = handle,
              data_product = "test/estimate/asymptomatic-period",
              component = "asymptomatic-period")

finalise(handle)
```
Write data product (distribution) #
User written config.yaml #
```yaml
run_metadata:
  description: Write distribution
  local_data_registry_url: https://localhost:8000/api/
  remote_data_registry_url: https://data.fairdatapipeline.org/api/
  default_input_namespace: soniamitchell
  default_output_namespace: soniamitchell
  write_data_store: /Users/SoniaM/datastore/
  local_repo: /Users/Soniam/Desktop/git/SCRC/SCRCdata/
  script: |-
    R -f simple_working_examples/write_distribution.R ${{CONFIG_DIR}}
write:
- data_product: test/distribution/symptom-delay
  description: Estimate of symptom delay
```
Working config.yaml #
```yaml
run_metadata:
  description: Write distribution
  local_data_registry_url: https://localhost:8000/api/
  remote_data_registry_url: https://data.fairdatapipeline.org/api/
  default_input_namespace: soniamitchell
  default_output_namespace: soniamitchell
  write_data_store: /Users/SoniaM/datastore/
  local_repo: /Users/Soniam/Desktop/git/SCRC/SCRCdata/
  script: |-
    R -f simple_working_examples/write_distribution.R /Users/SoniaM/datastore/coderun/20210511-231444/
  public: true
  latest_commit: 221bfe8b52bbfb3b2dbdc23037b7dd94b49aaa70
  remote_repo: https://github.com/FAIRDataPipeline/FDP_validation
write:
- data_product: test/distribution/symptom-delay
  description: Estimate of symptom delay
  use:
    version: 0.0.1
```
Submission script (R) #
```r
library(rDataPipeline)

# Open the connection to the local registry with a given config file
config <- file.path(Sys.getenv("FDP_CONFIG_DIR"), "config.yaml")
script <- file.path(Sys.getenv("FDP_CONFIG_DIR"), "script.sh")
handle <- initialise(config, script)

write_distribution(handle = handle,
                   data_product = "test/distribution/symptom-delay",
                   component = "symptom-delay",
                   distribution = "Gaussian",
                   parameters = list(mean = -16.08, SD = 30),
                   description = "symptom delay")

finalise(handle)
```
Read data product (distribution) #
User written config.yaml #
```yaml
run_metadata:
  description: Read distribution
  local_data_registry_url: https://localhost:8000/api/
  remote_data_registry_url: https://data.fairdatapipeline.org/api/
  default_input_namespace: soniamitchell
  default_output_namespace: soniamitchell
  write_data_store: /Users/SoniaM/datastore/
  local_repo: /Users/Soniam/Desktop/git/SCRC/SCRCdata/
  script: |-
    R -f simple_working_examples/read_distribution.R ${{CONFIG_DIR}}
read:
- data_product: test/distribution/symptom-delay
```
Working config.yaml #
```yaml
run_metadata:
  description: Read distribution
  local_data_registry_url: https://localhost:8000/api/
  remote_data_registry_url: https://data.fairdatapipeline.org/api/
  default_input_namespace: soniamitchell
  default_output_namespace: soniamitchell
  write_data_store: /Users/SoniaM/datastore/
  local_repo: /Users/Soniam/Desktop/git/SCRC/SCRCdata/
  script: |-
    R -f simple_working_examples/read_distribution.R /Users/SoniaM/datastore/coderun/20210511-231444/
  latest_commit: 221bfe8b52bbfb3b2dbdc23037b7dd94b49aaa70
  remote_repo: https://github.com/FAIRDataPipeline/FDP_validation
read:
- data_product: test/distribution/symptom-delay
  use:
    version: 0.0.1
```
Submission script (R) #
```r
library(rDataPipeline)

# Open the connection to the local registry with a given config file
config <- file.path(Sys.getenv("FDP_CONFIG_DIR"), "config.yaml")
script <- file.path(Sys.getenv("FDP_CONFIG_DIR"), "script.sh")
handle <- initialise(config, script)

read_distribution(handle = handle,
                  data_product = "test/distribution/symptom-delay",
                  component = "symptom-delay")

finalise(handle)
```
Attach issue to component #
User written config.yaml #
```yaml
run_metadata:
  description: Register a file in the pipeline
  local_data_registry_url: http://localhost:8000/api/
  remote_data_registry_url: https://data.fairdatapipeline.org/api/
  default_input_namespace: username
  default_output_namespace: username
  write_data_store: /Users/username/datastore/
  local_repo: local_repo
  script: |-
    R -f simple_working_examples/attach_issue.R ${{CONFIG_DIR}}
write:
- data_product: test/array/issues/component
  description: a test array
```
Working config.yaml #
```yaml
run_metadata:
  description: Register a file in the pipeline
  local_data_registry_url: http://localhost:8000/api/
  remote_data_registry_url: https://data.fairdatapipeline.org/api/
  default_input_namespace: username
  default_output_namespace: username
  write_data_store: /Users/username/datastore/
  local_repo: local_repo
  script: |-
    R -f simple_working_examples/attach_issue.R /Users/SoniaM/datastore/coderun/20210511-231444/
  public: true
  latest_commit: e3c0ebdf5ae079bd72f601ec5eefdf998c4fc8ec
  remote_repo: https://github.com/fake_org/fake_repo
read: []
write:
- data_product: test/array/issues/component
  description: a test array
  use:
    version: 0.1.0
```
Submission script (R) #
In R, we can attach issues to components in a number of ways.
Attach an issue on the fly by referencing an index in the handle:
```r
library(rDataPipeline)

# Initialise Code Run
config <- file.path(Sys.getenv("FDP_CONFIG_DIR"), "config.yaml")
script <- file.path(Sys.getenv("FDP_CONFIG_DIR"), "script.sh")
handle <- initialise(config, script)

df <- data.frame(a = 1:2, b = 3:4)
rownames(df) <- 1:2

component_id <- write_array(array = as.matrix(df),
                            handle = handle,
                            data_product = "test/array/issues/component",
                            component = "component1/a/s/d/f/s",
                            description = "Some description",
                            dimension_names = list(rowvalue = rownames(df),
                                                   colvalue = colnames(df)),
                            dimension_values = list(NA, 10),
                            dimension_units = list(NA, "km"),
                            units = "s")

issue <- "some issue"
severity <- 7
raise_issue(index = component_id,
            handle = handle,
            issue = issue,
            severity = severity)

finalise(handle)
```
Attach an issue to a component of a data product that already exists in the data registry by referencing it explicitly:
```r
library(rDataPipeline)

# Initialise Code Run
config <- file.path(Sys.getenv("FDP_CONFIG_DIR"), "config.yaml")
script <- file.path(Sys.getenv("FDP_CONFIG_DIR"), "script.sh")
handle <- initialise(config, script)

issue <- "some issue"
severity <- 7
raise_issue(handle = handle,
            data_product = "test/array/issues/component",
            component = "component1/a/s/d/f/s",
            version = "0.1.0",
            namespace = "username",
            issue = issue,
            severity = severity)

finalise(handle)
```
Attach an issue to multiple components at the same time:
```r
raise_issue(index = c(component_id1, component_id2),
            handle = handle,
            issue = issue,
            severity = severity)
```
or
```r
raise_issue(handle = handle,
            data_product = "test/array/issues/component",
            component = c("component1/a/s/d/f/s", "component2/a/s/d/f/s"),
            version = "0.1.0",
            namespace = "username",
            issue = issue,
            severity = severity)
```
Attach issue to whole data product #
User written config.yaml #
```yaml
run_metadata:
  description: Register a file in the pipeline
  local_data_registry_url: http://localhost:8000/api/
  remote_data_registry_url: https://data.fairdatapipeline.org/api/
  default_input_namespace: username
  default_output_namespace: username
  write_data_store: /Users/username/datastore/
  local_repo: local_repo
  script: |-
    R -f simple_working_examples/attach_issue.R ${{CONFIG_DIR}}
write:
- data_product: "test/array/issues/whole"
  description: a test array
  file_type: csv
```
Working config.yaml #
```yaml
run_metadata:
  description: Register a file in the pipeline
  local_data_registry_url: http://localhost:8000/api/
  remote_data_registry_url: https://data.fairdatapipeline.org/api/
  default_input_namespace: username
  default_output_namespace: username
  write_data_store: /Users/username/datastore/
  local_repo: local_repo
  script: |-
    R -f simple_working_examples/attach_issue.R /Users/SoniaM/datastore/coderun/20210511-231444/
  public: true
  latest_commit: 40725b40252fd55ba355f7ed66f5a42387f1674f
  remote_repo: https://github.com/fake_org/fake_repo
read: []
write:
- data_product: test/array/issues/whole
  description: a test array
  file_type: csv
  use:
    version: 0.1.0
```
Submission script (R) #
In R, we can attach issues to data products in a number of ways.
Attach an issue on the fly by referencing an index in the handle:
```r
library(rDataPipeline)

# Initialise Code Run
config <- file.path(Sys.getenv("FDP_CONFIG_DIR"), "config.yaml")
script <- file.path(Sys.getenv("FDP_CONFIG_DIR"), "script.sh")
handle <- initialise(config, script)

df <- data.frame(a = 1:2, b = 3:4)
rownames(df) <- 1:2

index <- write_array(array = as.matrix(df),
                     handle = handle,
                     data_product = "test/array/issues/whole",
                     component = "component1/a/s/d/f/s",
                     description = "Some description",
                     dimension_names = list(rowvalue = rownames(df),
                                            colvalue = colnames(df)))

write_array(array = as.matrix(df),
            handle = handle,
            data_product = "test/array/issues/whole",
            component = "component2/a/s/d/f/s",
            description = "Some description",
            dimension_names = list(rowvalue = rownames(df),
                                   colvalue = colnames(df)))

issue <- "some issue"
severity <- 7
raise_issue(index = index,
            handle = handle,
            issue = issue,
            severity = severity,
            whole_object = TRUE)

finalise(handle)
```
Attach an issue to a data product that already exists in the data registry by referencing it explicitly:
```r
library(rDataPipeline)

# Initialise Code Run
config <- file.path(Sys.getenv("FDP_CONFIG_DIR"), "config.yaml")
script <- file.path(Sys.getenv("FDP_CONFIG_DIR"), "script.sh")
handle <- initialise(config, script)

issue <- "some issue"
severity <- 7
raise_issue(handle = handle,
            data_product = "test/array/issues/whole",
            version = "0.1.0",
            namespace = "username",
            issue = issue,
            severity = severity)

finalise(handle)
```
Attach an issue to multiple data products at the same time:
```r
raise_issue(index = c(index1, index2),
            handle = handle,
            issue = issue,
            severity = severity,
            whole_object = TRUE)
```
or
```r
raise_issue(handle = handle,
            data_product = c("test/array/issues/whole", "test/array/issues/whole/2"),
            version = c("0.1.0", "0.1.0"),
            namespace = "username",
            issue = issue,
            severity = severity)
```
Attach issue to config #
User written config.yaml #
```yaml
run_metadata:
  description: Register a file in the pipeline
  local_data_registry_url: http://localhost:8000/api/
  remote_data_registry_url: https://data.fairdatapipeline.org/api/
  default_input_namespace: username
  default_output_namespace: username
  write_data_store: /Users/username/datastore/
  local_repo: local_repo
  script: |-
    R -f simple_working_examples/attach_issue.R ${{CONFIG_DIR}}
```
Working config.yaml #
```yaml
run_metadata:
  description: Register a file in the pipeline
  local_data_registry_url: http://localhost:8000/api/
  remote_data_registry_url: https://data.fairdatapipeline.org/api/
  default_input_namespace: username
  default_output_namespace: username
  write_data_store: /Users/username/datastore/
  local_repo: local_repo
  script: |-
    R -f simple_working_examples/attach_issue.R /Users/SoniaM/datastore/coderun/20210511-231444/
  latest_commit: 0d98e732b77e62a6cd390c6aec655f260f5f9b33
  remote_repo: https://github.com/fake_org/fake_repo
read: []
write: []
```
Submission script (R) #
```r
library(rDataPipeline)

# Initialise Code Run
config <- file.path(Sys.getenv("FDP_CONFIG_DIR"), "config.yaml")
script <- file.path(Sys.getenv("FDP_CONFIG_DIR"), "script.sh")
handle <- initialise(config, script)

config_issue <- "issue with config"
config_severity <- 7
raise_issue_config(handle = handle,
                   issue = config_issue,
                   severity = config_severity)

finalise(handle)
```
Attach issue to submission script #
User written config.yaml #
```yaml
run_metadata:
  description: Register a file in the pipeline
  local_data_registry_url: http://localhost:8000/api/
  remote_data_registry_url: https://data.fairdatapipeline.org/api/
  default_input_namespace: username
  default_output_namespace: username
  write_data_store: /Users/username/datastore/
  local_repo: local_repo
  script: |-
    R -f simple_working_examples/attach_issue.R ${{CONFIG_DIR}}
```
Working config.yaml #
```yaml
run_metadata:
  description: Register a file in the pipeline
  local_data_registry_url: http://localhost:8000/api/
  remote_data_registry_url: https://data.fairdatapipeline.org/api/
  default_input_namespace: username
  default_output_namespace: username
  write_data_store: /Users/username/datastore/
  local_repo: local_repo
  script: |-
    R -f simple_working_examples/attach_issue.R /Users/SoniaM/datastore/coderun/20210511-231444/
  latest_commit: 358f64c4044f3b3f761865ee8e9f4375cf41d155
  remote_repo: https://github.com/fake_org/fake_repo
read: []
write: []
```
Submission script (R) #
```r
library(rDataPipeline)

# Initialise Code Run
config <- file.path(Sys.getenv("FDP_CONFIG_DIR"), "config.yaml")
script <- file.path(Sys.getenv("FDP_CONFIG_DIR"), "script.sh")
handle <- initialise(config, script)

script_issue <- "issue with script"
script_severity <- 7
raise_issue_script(handle = handle,
                   issue = script_issue,
                   severity = script_severity)

finalise(handle)
```
Attach issue to GitHub repository #
User written config.yaml #
```yaml
run_metadata:
  description: Register a file in the pipeline
  local_data_registry_url: http://localhost:8000/api/
  remote_data_registry_url: https://data.fairdatapipeline.org/api/
  default_input_namespace: username
  default_output_namespace: username
  write_data_store: /Users/username/datastore/
  local_repo: local_repo
  script: |-
    R -f simple_working_examples/attach_issue.R ${{CONFIG_DIR}}
```
Working config.yaml #
```yaml
run_metadata:
  description: Register a file in the pipeline
  local_data_registry_url: http://localhost:8000/api/
  remote_data_registry_url: https://data.fairdatapipeline.org/api/
  default_input_namespace: username
  default_output_namespace: username
  write_data_store: /Users/username/datastore/
  local_repo: local_repo
  script: |-
    R -f simple_working_examples/attach_issue.R /Users/SoniaM/datastore/coderun/20210511-231444/
  latest_commit: 6b23ec822bfd7ea5f419c70ce18fb73b59c90754
  remote_repo: https://github.com/fake_org/fake_repo
read: []
write: []
```
Submission script (R) #
```r
library(rDataPipeline)

# Initialise Code Run
config <- file.path(Sys.getenv("FDP_CONFIG_DIR"), "config.yaml")
script <- file.path(Sys.getenv("FDP_CONFIG_DIR"), "script.sh")
handle <- initialise(config, script)

repo_issue <- "issue with repo"
repo_severity <- 7
raise_issue_repo(handle = handle,
                 issue = repo_issue,
                 severity = repo_severity)

finalise(handle)
```
Attach issue to external object #
This is not something we want to do.
Attach issue to code run #
This might be something we want to do in the future, but not now.
Delete DataProduct (optionally) if identical to previous version #
Delete CodeRun (optionally) if nothing happened #
That is, if no output was created and no issue was raised
CodeRun with aliases (`use:` block example) #
User written config.yaml #
```yaml
run_metadata:
  description: A test model
  local_data_registry_url: https://localhost:8000/api/
  remote_data_registry_url: https://data.fairdatapipeline.org/api/
  default_input_namespace: SCRC
  default_output_namespace: soniamitchell
  write_data_store: /Users/SoniaM/datastore/
  local_repo: /Users/Soniam/Desktop/git/SCRC/SCRCdata
  script: |-
    R -f inst/SCRC/scotgov_management/submission_script.R ${{CONFIG_DIR}}
read:
- data_product: test/data/alias
  use:
    namespace: johnsmith
    data_product: scotland/human/population
write:
- data_product: human/outbreak-timeseries
  description: data product description
  use:
    data_product: scotland/human/outbreak-timeseries
- data_product: human/outbreak/simulation_run
  description: another data product description
  use:
    data_product: human/outbreak/simulation_run-${{RUN_ID}}
```
Working config.yaml #
`fair run` should create a working config.yaml file, which is read by the Data Pipeline API. In this example, the working config.yaml file is almost identical to the original config.yaml file; only ${{CONFIG_DIR}} is replaced by the directory in which the working config.yaml file resides.
```yaml
run_metadata:
  description: A test model
  local_data_registry_url: https://localhost:8000/api/
  remote_data_registry_url: https://data.fairdatapipeline.org/api/
  default_input_namespace: soniamitchell
  default_output_namespace: soniamitchell
  write_data_store: /Users/SoniaM/datastore/
  public: true
  local_repo: /Users/Soniam/Desktop/git/SCRC/SCRCdata
  latest_commit: 221bfe8b52bbfb3b2dbdc23037b7dd94b49aaa70
  remote_repo: https://github.com/ScottishCovidResponse/SCRCdata
  script: |-
    R -f inst/SCRC/scotgov_management/submission_script.R /Users/SoniaM/datastore/coderun/20210511-231444/
read:
- data_product: human/population
  use:
    data_product: scotland/human/population
    version: 0.1.0
    namespace: johnsmith
write:
- data_product: human/outbreak-timeseries
  description: data product description
  use:
    data_product: scotland/human/outbreak-timeseries
    version: 0.1.0
- data_product: human/outbreak/simulation_run
  description: another data product description
  use:
    data_product: human/outbreak/simulation_run-${{RUN_ID}}
    version: 0.1.0
```
CodeRun with read globbing #
This example makes use of globbing in the `read:` block.
First we need to populate your local registry with something to read:
User written config.yaml #
```yaml
run_metadata:
  description: Register a file in the pipeline
  local_data_registry_url: http://localhost:8000/api/
  remote_data_registry_url: https://data.fairdatapipeline.org/api/
  default_input_namespace: username
  default_output_namespace: username
  write_data_store: /Users/username/datastore/
  local_repo: local_repo
  script: |-
    R -f simple_working_examples/input_globbing.R ${{CONFIG_DIR}}
write:
- data_product: real/data/1d06c1840618f1cd0ff29177b34fa68df939a9a8/1
  description: A csv file
  file_type: csv
- data_product: real/data/1d06c1840618f1cd0ff29177b34fa68df939a9a8/thing/1
  description: A csv file
  file_type: csv
```
Working config.yaml #
```yaml
run_metadata:
  description: Register a file in the pipeline
  local_data_registry_url: http://localhost:8000/api/
  remote_data_registry_url: https://data.fairdatapipeline.org/api/
  default_input_namespace: username
  default_output_namespace: username
  write_data_store: /Users/username/datastore/
  local_repo: local_repo
  script: |-
    R -f simple_working_examples/input_globbing.R /Users/SoniaM/datastore/coderun/20210511-231444/
  public: true
  latest_commit: 064e900b691e80058357a344f02cf73de0166fab
  remote_repo: https://github.com/fake_org/fake_repo
read: []
write:
- data_product: real/data/1d06c1840618f1cd0ff29177b34fa68df939a9a8/1
  description: A csv file
  file_type: csv
  use:
    version: 0.0.1
- data_product: real/data/1d06c1840618f1cd0ff29177b34fa68df939a9a8/thing/1
  description: A csv file
  file_type: csv
  use:
    version: 0.0.1
```
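No submission script is shown for this registration step; a minimal sketch, modelled on the csv-writing example above (the loop is just a convenience, and each product could equally be written separately):

```r
library(rDataPipeline)

# Initialise Code Run
config <- file.path(Sys.getenv("FDP_CONFIG_DIR"), "config.yaml")
script <- file.path(Sys.getenv("FDP_CONFIG_DIR"), "script.sh")
handle <- initialise(config, script)

df <- data.frame(a = 1:2, b = 3:4)

# Register both csv files listed in the write: block
for (dp in c("real/data/1d06c1840618f1cd0ff29177b34fa68df939a9a8/1",
             "real/data/1d06c1840618f1cd0ff29177b34fa68df939a9a8/thing/1")) {
  path <- link_write(handle, dp)
  write.csv(df, path)
}

finalise(handle)
```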
Now that our local registry is populated, we can try globbing:
User written config.yaml #
```yaml
run_metadata:
  description: Register a file in the pipeline
  local_data_registry_url: http://localhost:8000/api/
  remote_data_registry_url: https://data.fairdatapipeline.org/api/
  default_input_namespace: username
  default_output_namespace: username
  write_data_store: /Users/username/datastore/
  local_repo: local_repo
  script: |-
    R -f simple_working_examples/input_globbing.R ${{CONFIG_DIR}}
read:
- data_product: real/data/1d06c1840618f1cd0ff29177b34fa68df939a9a8/*
```
Working config.yaml #
```yaml
run_metadata:
  description: Register a file in the pipeline
  local_data_registry_url: http://localhost:8000/api/
  remote_data_registry_url: https://data.fairdatapipeline.org/api/
  default_input_namespace: username
  default_output_namespace: username
  write_data_store: /Users/username/datastore/
  local_repo: local_repo
  script: |-
    R -f simple_working_examples/input_globbing.R /Users/SoniaM/datastore/coderun/20210511-231444/
  latest_commit: b9e2187b3796f06ca33f92c3a82863215917ed0e
  remote_repo: https://github.com/fake_org/fake_repo
read:
- data_product: real/data/1d06c1840618f1cd0ff29177b34fa68df939a9a8/thing/1
  use:
    version: 0.0.1
- data_product: real/data/1d06c1840618f1cd0ff29177b34fa68df939a9a8/1
  use:
    version: 0.0.1
write: []
```
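The working config resolves the glob to concrete data products, so a submission script can read each match by its full name. A minimal sketch, following the csv-reading pattern used earlier on this page:

```r
library(rDataPipeline)

# Initialise Code Run
config <- file.path(Sys.getenv("FDP_CONFIG_DIR"), "config.yaml")
script <- file.path(Sys.getenv("FDP_CONFIG_DIR"), "script.sh")
handle <- initialise(config, script)

# Read one of the products matched by real/data/.../* in the user-written config
path <- link_read(handle, "real/data/1d06c1840618f1cd0ff29177b34fa68df939a9a8/1")
df <- read.csv(path)

finalise(handle)
```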
CodeRun with write globbing #
This example makes use of globbing in the `write:` block.
First we need to populate your local registry with some data:
User written config.yaml #
```yaml
run_metadata:
  description: Register a file in the pipeline
  local_data_registry_url: http://localhost:8000/api/
  remote_data_registry_url: https://data.fairdatapipeline.org/api/
  default_input_namespace: username
  default_output_namespace: username
  write_data_store: /Users/username/datastore/
  local_repo: local_repo
  script: |-
    R -f simple_working_examples/output_globbing.R ${{CONFIG_DIR}}
write:
- data_product: real/data/e8d7af00c8f8e24c2790e2a32241bc1bfc8cf011/1
  description: A csv file
  file_type: csv
- data_product: real/data/e8d7af00c8f8e24c2790e2a32241bc1bfc8cf011/thing/1
  description: A csv file
  file_type: csv
```
Working config.yaml #
```yaml
run_metadata:
  description: Register a file in the pipeline
  local_data_registry_url: http://localhost:8000/api/
  remote_data_registry_url: https://data.fairdatapipeline.org/api/
  default_input_namespace: username
  default_output_namespace: username
  write_data_store: /Users/username/datastore/
  local_repo: local_repo
  script: |-
    R -f simple_working_examples/output_globbing.R /Users/SoniaM/datastore/coderun/20210511-231444/
  public: true
  latest_commit: 2a8688677321b99e3a2545ce020992d136334b71
  remote_repo: https://github.com/fake_org/fake_repo
read: []
write:
- data_product: real/data/e8d7af00c8f8e24c2790e2a32241bc1bfc8cf011/1
  description: A csv file
  file_type: csv
  use:
    version: 0.0.1
- data_product: real/data/e8d7af00c8f8e24c2790e2a32241bc1bfc8cf011/thing/1
  description: A csv file
  file_type: csv
  use:
    version: 0.0.1
```
Now that our local registry is populated, we can try globbing:
User written config.yaml #
```yaml
run_metadata:
  description: Register a file in the pipeline
  local_data_registry_url: http://localhost:8000/api/
  remote_data_registry_url: https://data.fairdatapipeline.org/api/
  default_input_namespace: username
  default_output_namespace: username
  write_data_store: /Users/username/datastore/
  local_repo: local_repo
  script: |-
    R -f simple_working_examples/output_globbing.R ${{CONFIG_DIR}}
write:
- data_product: real/data/e8d7af00c8f8e24c2790e2a32241bc1bfc8cf011/*
  description: A csv file
  file_type: csv
  use:
    version: ${{MAJOR}}
```
Working config.yaml #
```yaml
run_metadata:
  description: Register a file in the pipeline
  local_data_registry_url: http://localhost:8000/api/
  remote_data_registry_url: https://data.fairdatapipeline.org/api/
  default_input_namespace: username
  default_output_namespace: username
  write_data_store: /Users/username/datastore/
  local_repo: local_repo
  script: |-
    R -f simple_working_examples/output_globbing.R /Users/SoniaM/datastore/coderun/20210511-231444/
  public: true
  latest_commit: f95815976cd4d93c062f94a48525fcec88b6ef34
  remote_repo: https://github.com/fake_org/fake_repo
read: []
write:
- data_product: real/data/e8d7af00c8f8e24c2790e2a32241bc1bfc8cf011/*
  description: A csv file
  file_type: csv
  use:
    version: 1.0.0
- data_product: real/data/e8d7af00c8f8e24c2790e2a32241bc1bfc8cf011/thing/1
  description: A csv file
  file_type: csv
  use:
    version: 1.0.0
- data_product: real/data/e8d7af00c8f8e24c2790e2a32241bc1bfc8cf011/1
  description: A csv file
  file_type: csv
  use:
    version: 1.0.0
```
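No submission script is shown for this example either; a minimal sketch, writing new versions of the products matched by the glob using the same csv-writing pattern as above (the loop is just a convenience):

```r
library(rDataPipeline)

# Initialise Code Run
config <- file.path(Sys.getenv("FDP_CONFIG_DIR"), "config.yaml")
script <- file.path(Sys.getenv("FDP_CONFIG_DIR"), "script.sh")
handle <- initialise(config, script)

df <- data.frame(a = 1:2, b = 3:4)

# Write new versions of the products matched by real/data/.../* in the user-written config
for (dp in c("real/data/e8d7af00c8f8e24c2790e2a32241bc1bfc8cf011/1",
             "real/data/e8d7af00c8f8e24c2790e2a32241bc1bfc8cf011/thing/1")) {
  path <- link_write(handle, dp)
  write.csv(df, path)
}

finalise(handle)
```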