Installation #
The remote registry is a (django)[https://www.djangoproject.com/] REST API and can easily be installed on any server which meets the following requirements.
Pre-requisites #
To install a remote registry you will require the following:
A server caple of running a webserver with the following installed:
git #
Available through package managers such as apt
, brew
or chocolatey
Python 3.8+ #
With pip
and the venv
module installed.
Available through package managers such as apt
, brew
or chocolatey
A database #
Such as PostgreSQL
(recommended), MariaDB
, MySQL
, Oracle
, SQLite
(not recommended for production). Full database support can be found on the
django site
N.B. You may also need to install the corresponding database connector
Graphviz
#
Graphiz
is needed to create provernance reports, full installations instructions can be found on the
Graphviz
site
An S3 Object Storage #
This can be and S3 capible storage e.g. AWS S3, Openstack or Minio
If you are self hosted we would recommend using Minio
Installation on a Debian based system e.g. Ubuntu #
This installation guide will focus on debian based systems but can be easily adapted.
Step 1. Clone the repository #
Clone the dataregistry repository using git
git clone https://github.com/FAIRDataPipeline/data-registry.git
Step 2. Create and activate a Python virtual environment (venv
) in the repository folder
#
python3 -m venv venv
source venv/bin/activate
Step 3. Install the python
requirements
#
pip3 install -r local-requirements.txt
N.B. the local requirements match the remote requirements
Step 4. Generate a secret key #
python3 -c 'from django.core.management.utils import get_random_secret_key; print(get_random_secret_key())'
Step 5. Edit drams/settings.py #
The settings.py
file within the drams
folder should be modified to match your requirements.
Add your secret key #
The secret key you generated in the previous step can either be added directly to the settings.py
file:
SECRET_KEY = '<generated secret key>'
N.B If you do this remember to remove the following lines:
with open('/home/ubuntu/secret_key.txt') as f:
SECRET_KEY = f.read().strip()
Alternatively the secret key can be stored in a file and read. To do this edit the following line with the file path a text file containing your secret key:
with open('/home/ubuntu/secret_key.txt') as f:
SECRET_KEY = f.read().strip()
Edit ALLOWED_HOSTS
#
Add your domain to ALLOWED_HOSTS
replacing data.fairdatapipeline.org
Edit DOMAIN_URL
#
Replace the DOMAIN_URL
with your fully qualified domain url followed by a trailing slash e.g. https://data.fairdatapipeline.org/
Edit DATABASES
#
The databases
dictionary (fields) should be modified to contain a single default
database containing the following fields:
ENGINE
the engine for your database see the
django documentation for more details.
NAME
the database names
USER
username of a user with read and write access to the database
PASSWORD
password of the above user
HOST
domain or ip of the database server
PORT
port the databas is running on
Edit BUCKETS
#
The BUCKETS
dictionary should be modified to contain a single default
S3 bucket containing the following fields:
url
the fully qualified URL of the S3 server e.g. https://s3.domain.com:port/
, the port is optional if not running on the default port. For AWS the endpoint URL for different regions can be found in the
AWS documentation
bucket_name
the name of the bucket
access_key
access key to bucket belonging to a user with read and write access
secret_key
matching secret key to the above
duration
how long urls generated for the bucket should be valid for in seconds (default: 600)
Set Up Authentication #
The remote registry is designed to be authenticated with GitHub or GitLab OAuth, to do this you need to create either a GitHub OAuth App or a GitLab Oauth application.
To configure these you will need the following callback URLs GitHub:
<registry_url>/complete/github/
GitLab:
<registry_url>/complete/gitlab/
Where <registry_url>
is the fully qualified url of the registry e.g. https://data.fairdatapipeline.org
Once the application has been setup the following fields will need to be added to settings.py:
GitHub:
SOCIAL_AUTH_GITHUB_KEY = '<key>'
SOCIAL_AUTH_GITHUB_SECRET = '<secret>'
AUTH_METHOD = "GitHub"
Replacing the key and secret from the GitHub OAuth App.
GitLab:
SOCIAL_AUTH_GITLAB_SCOPE = ['api']
SOCIAL_AUTH_GITLAB_KEY = '<key>'
SOCIAL_AUTH_GITLAB_SECRET = '<secret>'
AUTH_METHOD = "GitLab"
Replacing the key and secret from the GitLab OAuth Application
**N.B. if you are running your own instance of GitLab, you will also need to add the following field:
SOCIAL_AUTH_GITLAB_API_URL = 'https://example.com'
Where https://example.com
is the full url to your GitLab domain adding the port if not running on the default port e.g. https://example.com:1234
Authenticating Users #
With the above authentication, by default anyone with an account on the given provider can login to the data registry.
To restrict access: The registry can accept a yaml file containing users which are allowed to log into the registry. To do this create a yaml file containing the following:
username: GitHub Username [required string]
email: email address [optional string]
fullname: Full Name [required string]
orgs: organisation names [optional array]
Then add the following field to the settings.py
file with full path to the yaml file:
AUTHORISED_USER_FILE = 'authorised_users/authorised_users.yaml'
an example of this yaml file is located on the FAIRDataPipeline/authorised_users repository
Step 6: Initialise the registry #
The registry should now be initialised, to do this run the following commands from the folder where you cloned the registry:
source venv/bin/activate
export DJANGO_SETTINGS_MODULE="drams.settings"
export DJANGO_SUPERUSER_USERNAME=admin
export DJANGO_SUPERUSER_PASSWORD=password
export FAIR_USE_SUPERUSER="True"
cd scripts
chmod +x rebuild.sh
./rebuild.sh
Replacing admin
and password
with your desired superuser account.
Step 7: Configure the Service using gunicorn #
Create the following files:
/etc/systemd/system/gunicorn.socket
[Unit]
Description=gunicorn socket
[Socket]
ListenStream=/home/ryan/data-registry/gunicorn.sock
[Install]
WantedBy=sockets.target
Replacing /home/ryan/data-registry/
with the directory where you cloned the git repository
/etc/systemd/system/gunicorn.service
[Unit]
Description=gunicorn daemon
Requires=gunicorn.socket
After=network.target
[Service]
User=ryan
Group=ryan
WorkingDirectory=/home/ryan/data-registry
Environment="DJANGO_SETTINGS_MODULE=drams.settings"
ExecStart=/home/ryan/data-registry/venv/bin/gunicorn \
--access-logfile - \
--workers 3 \
--bind unix:/home/ryan/data-registry/gunicorn.sock \
drams.wsgi:application
[Install]
WantedBy=multi-user.target
Again replacing both instances of /home/ryan/data-registry/
with the directory where you cloned the git repository
Enable and Start the Service #
Run the following commands to enable and run the data registry service
sudo systemctl start gunicorn.socket
sudo systemctl enable gunicorn.socket
Step 8: Configure a reverse proxy with nginx #
Edit or add a site to your nginx sites-enabled (for this example we will use the default site at /etc/nginx/sites-enabled/default
). If you choose to add a site, remember to symlink the file to the sites-available directory.
server {
listen 80 default_server;
listen [::]:80 default_server;
location = /favicon.ico { access_log off; log_not_found off; }
location /static/ {
root /home/ryan/data-registry;
}
location / {
include proxy_params;
proxy_pass http://unix:/home/ryan/data-registry/gunicorn.sock;
}
}
Replacing both instances of /home/ryan/data-registry/
with the directory where you cloned the git repository
N.B. for multisite nginx servers you should add the server name to the file e.g.
server_name data.fairdatapipeline.org;
Finally restart nginx:
sudo systemctl restart nginx
You may also need to install and SSL certificate, which can be done using certbot.