Welcome to the official documentation of the Snow Owl Authoring Platform. If you want to learn how to install and provision Snow Owl AP, you've come to the right place. This guide shows you how to:
Select the appropriate hardware and software environment to host the service
Download, install and configure the entire technology stack necessary for operating the server
Handle release packages to upgrade to a newer version
Perform a data backup or a restore
Manage intermittent tasks, e.g. adding/revoking user access
Authoring Platform releases are shared with customers through custom download URLs. The downloaded server artifact is a Linux (tar.gz) archive that contains:
an initial folder structure
the configuration files for all services
a docker-compose.yml file that brings together the entire technology stack to run and manage the service
the credentials required to pull our proprietary docker images
As a best practice, it is advised to extract the content of the archive under /opt, so the deployment folder will be /opt/snow-owl. The docker-compose setup relies on this path; however, if required, it can be changed later by editing the ./snow-owl/docker/.env file (see the DEPLOYMENT_FOLDER environment variable).
When decompressing the archive it is important to use the --same-owner and --preserve-permissions options so the docker containers can access the files and folders appropriately.
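For example, assuming the release archive is named snow-owl.tar.gz (the actual file name may differ), the extraction could look like this:

```bash
# extract the release archive under /opt, keeping the original owners and permissions
sudo mkdir -p /opt
sudo tar --extract --gzip --same-owner --preserve-permissions \
     --file snow-owl.tar.gz --directory /opt
```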
The next page will describe the content of the release package in more detail.
Secure HTTP is a must if the Authoring Platform is a public-facing instance. For such cases, we provide a pre-configured environment and a convenience script to acquire the necessary SSL certificate.
SSL certificate retrieval and renewal are managed by certbot, the official ACME client recommended by Let's Encrypt.
To be able to obtain an SSL certificate the following requirements must be met:
docker and docker-compose are installed
the server instance has a public IP address
a DNS A record is configured for the desired domain name routing to the server's IP address
For the sake of example, let's say the target domain name is snow-owl.b2ihealthcare.com.
Go to the sub-folder called ./snow-owl/docker/configs/cert. Make sure the init-certificate.sh script is executable and check its parameters:
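A sketch of these steps, assuming the script prints its usage when invoked with -h (check the script itself if the flag differs):

```bash
cd /opt/snow-owl/docker/configs/cert
chmod +x init-certificate.sh
./init-certificate.sh -h
```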
As you can see, -d is used for specifying the domain name, and -e is used for specifying a contact email address (optional). Now execute the script with our example parameters:
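A possible invocation for the example domain (the email address below is a placeholder):

```bash
./init-certificate.sh -d snow-owl.b2ihealthcare.com -e admin@example.com
```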
Script execution will overwrite ./snow-owl/docker/docker-compose.yml and ./snow-owl/docker/configs/nginx/nginx.conf. If you have made local changes to these files, make a note of them beforehand.
After successful execution, a new folder ./snow-owl/cert is created, which contains all the certificate files required by NGINX. The docker-compose.yml file is also amended with a piece of code that guarantees automatic renewal of the certificate:
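The exact snippet is generated by the script, but the widely used certbot renewal pattern it is based on looks roughly like this (service name and volume path are illustrative, not the generated content):

```yaml
# illustrative sketch of an automatic certificate renewal loop for certbot
certbot:
  image: certbot/certbot
  volumes:
    - ../cert:/etc/letsencrypt
  entrypoint: "/bin/sh -c 'trap exit TERM; while :; do certbot renew; sleep 12h & wait $${!}; done;'"
```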
At this point everything is prepared for having secure HTTP, let's see what else needs to be configured before spinning up the service.
For installations where Snow Owl AP and Elasticsearch are co-located we recommend the following hardware specification:

| Snow Owl AP + ES | Cloud | Dedicated |
|---|---|---|
| vCPU | 8 | 8 |
| Memory | 32 GB | 32 GB |
| I/O performance | >= 5000 IOPS SSD | >= 5000 IOPS SSD |
| Disk space | 200 GB | 200 GB |

For installations where Snow Owl AP connects to a managed Elasticsearch cluster at elastic.co we recommend the following hardware specification:

| Snow Owl AP | Cloud | Dedicated |
|---|---|---|
| vCPU | 8 (compute optimized) | 8 |
| Memory | 16 GB | 16 GB |
| I/O performance | OS: balanced disk, TS file storage: local SSD | OS: HDD / SSD, TS file storage: SSD |
| Disk space | OS: 20 GB, TS file storage: 100 GB | OS: 20 GB, TS file storage: 100 GB |

| Elasticsearch @ elastic.co | Cloud |
|---|---|
| vCPU | 8 (compute optimized) |
| Memory | 4 GB |
| I/O performance | handled by elastic.co |
| Disk space | 180 GB |
Here are a few examples of which Virtual Machine types could be used for hosting the Terminology Server at the three most popular Cloud providers (including but not limited to):

| Cloud Provider | VM type |
|---|---|
| GCP | |
| AWS | |
| Azure | |
The technology stack behind the Authoring Platform consists of the following components:
The Snow Owl Terminology Server application
Elasticsearch as the data layer
An LDAP-compliant authentication and authorization service
Bugzilla Issue Tracker to back collaborative workflows
A MySQL database to store Bugzilla's data
The Snow Owl thick client
Optional: A reverse-proxy handling the requests towards the REST API or Bugzilla
Outgoing communication from the Terminology Server goes via:
HTTP(s) towards Elasticsearch and Bugzilla
LDAP(s) towards the A&A service
Incoming communication is handled through:
the HTTP port 8080 (for the REST API)
the TCP port 2036 (for the thick client)
A selected reverse proxy solution, such as NGINX, is recommended between the Terminology Server and either the intranet or the internet; it increases security and is responsible for channeling all incoming HTTP traffic, including REST API requests, through to the Terminology Server.
The Elasticsearch cluster can either be:
a co-located, single-node, self-hosted cluster
a managed Elasticsearch cluster hosted by elastic.co

The currently supported version of Elasticsearch is from the 7.x stream, which is upward compatible with any patch releases coming on the 7.x version stream. Elasticsearch v8 is not supported yet.

Having a co-located Elasticsearch service next to the Terminology Server has a direct impact on the hardware requirements. See our list of recommended hardware in the hardware recommendations section.

For authorization and authentication, the application supports any traditional LDAP Directory Servers. We recommend starting with OpenLDAP and evolving to other solutions later because it is easy to set up and maintain while keeping Snow Owl's user data isolated from any other A&A services.
To support the Authoring Platform's collaborative authoring features, certain workflow items are stored in a Bugzilla Issue Tracker. In this context, however, it is not used as an issue tracker but as a tracker for work items, so-called Tasks.
Bugzilla stores its internal data in a MySQL database, which is why the technology stack contains one.
The Snow Owl thick client is the desktop application that is used for doing the actual authoring work. It is distributed together with the Authoring Platform release files but as a separate downloadable zip archive. Currently, only 64 bit Windows operating systems are supported.
With a preconfigured domain name and DNS record, the default installation package can take care of requesting and maintaining the necessary certificates for secure HTTP. See the details of this in the Configuration section.
To simplify the initial setup process, the Terminology Server is shipped with a default configuration of a co-located Elasticsearch cluster, a pre-populated OpenLDAP server, Bugzilla, MySQL, and an NGINX reverse proxy with the ability to opt in for an SSL certificate.
Here is the list of files and folders extracted from the release package and their role described down below.
./snow-owl/docker contains every configuration file used for the docker stack, including docker-compose.yml.
This folder is considered to be the context by docker, which means that upon executing commands one must either address the compose file explicitly or execute docker-compose commands directly inside this folder.
E.g. to verify the status of the stack there are two approaches (both shown in the sketch below):
Execute the command inside ./snow-owl/docker
Execute the command from somewhere else than ./snow-owl/docker, addressing the compose file explicitly
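A minimal sketch of both approaches, assuming the deployment folder is /opt/snow-owl:

```bash
# approach 1: run docker-compose from inside the context folder
cd /opt/snow-owl/docker
docker-compose ps

# approach 2: run it from anywhere by addressing the compose file explicitly
docker-compose --file /opt/snow-owl/docker/docker-compose.yml ps
```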
This folder contains the files necessary to acquire an SSL certificate. None of the files should be changed here ideally.
There is one important file here, elasticsearch.yml, which can be used for fine-tuning the Elasticsearch cluster. However, changing it is not necessary by default, only if an advanced configuration is required.
This folder contains the files used upon the first start of the OpenLDAP server. The files within describe a set of groups and users to set up an initial user access model. User credentials for the test users can be found in the file called 200_users.ldif.
This folder contains the configuration file of the MySQL server.
Location of all configuration files for NGINX. By default, a non-secure HTTP configuration is assumed. If there is no need for an SSL certificate, then the files here will be used. If an SSL certificate was acquired, then the main configuration file of NGINX (nginx.conf) will be overwritten with the one under /docker/cert/nginx.conf.
snowowl.yml: this file is the default configuration file of the Terminology Server. It does not need any changes by default either.
users: list of users for file-based authentication. There is one default user called snowowl for which the credentials can be found under ./docker/.env.
The main configuration file for the docker stack. This file is replaced in case an SSL certificate was acquired (with the file /docker/cert/docker-compose.yml). This is where volumes, ports, or environment variables can be configured.
The credentials to use for authenticating with the B2i private docker registry.
The collection of environment variables for the docker-compose.yml file.
This is the file to configure most of the settings of the Terminology Server, including Java heap size, Snow Owl or Elasticsearch versions, passwords, and the folder structure.
./snow-owl/ldap is the location where the OpenLDAP server stores its data.
Log files of the Terminology Server
Location of Elasticsearch and Snow Owl resources.
./snow-owl/resources/indexes is the data folder of Elasticsearch. Datasets must be extracted to this directory.
Snow Owl's local file storage. Import and export artifacts are stored here.

Pro tip: in case the Terminology Server is deployed to the cloud, make sure this path is served by a fast SSD disk (local or ephemeral SSD is the best). This will make import or export processes even faster.
In case an SSL certificate is acquired, all the files used by certbot and NGINX are stored under ./snow-owl/cert. This folder is automatically created by the certificate retrieval script.
./snow-owl/backup is the initial folder of all backup artifacts. This should be configured as a network mount to achieve data redundancy.
The release package contains everything that is required to use a co-located Elasticsearch instance by default. Follow these steps only when there is a need for a remote Elasticsearch cluster.
To configure the Authoring Platform to work with a managed Elasticsearch cluster two settings require attention.
First, the local Elasticsearch container and all its configurations should be removed from the docker-compose.yml file. Once that is done, we have to tell the Terminology Server where to find the cluster. This can be set in the file ./snow-owl/docker/configs/snowowl/snowowl.yml:
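A minimal sketch of the relevant settings, assuming the standard Snow Owl cluster connection keys (clusterUrl, clusterUsername, clusterPassword) under repository.index; check the snowowl.yml shipped with the release for the exact key names, and the URL below is a placeholder:

```yaml
repository:
  index:
    # URL of the managed Elasticsearch cluster (placeholder value)
    clusterUrl: https://my-cluster.es.europe-west1.gcp.cloud.es.io:9243
    clusterUsername: elastic
    clusterPassword: <password>
```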
Snow Owl TS leverages Elasticsearch's synonym filters. To have this feature work properly with a managed Elasticsearch cluster, our custom dictionary has to be uploaded and configured. The synonym file can be found in the release package under ./snow-owl/docker/configs/elasticsearch/synonym.txt. This file needs to be compressed as a zip archive by following this structure:
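A sketch of creating the bundle, assuming the dictionaries/ top-level folder layout that Elastic Cloud expects for custom dictionary bundles (verify against the guide referenced below):

```bash
cd /opt/snow-owl/docker/configs/elasticsearch
mkdir -p dictionaries
cp synonym.txt dictionaries/
# the resulting archive contains a single top-level folder: dictionaries/synonym.txt
zip -r synonym.zip dictionaries
```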
For the managed Elasticsearch instance this zip file needs to be configured as a bundle extension. The steps required are covered in this guide in great detail:
Once the bundle is configured and the cluster is up, we can (re)start the docker stack. If there are any problems, the Terminology Server will refuse to initialize and report the cause in its log files.
In certain cases, a pre-built dataset is also shipped together with the Authoring Platform. This is to ease the initial setup procedure and get going fast.
This method is only applicable to deployments where the Elasticsearch cluster is co-located with the Terminology Server.
To load data into a managed Elasticsearch cluster, there are several options:
use
use
use Snow Owl to rebuild the data to the remote cluster
These datasets are the compressed form of the Elasticsearch data folder and follow the same structure, except for having a top folder called indexes. This is the same folder as ./snow-owl/resources/indexes, so to load the dataset one should just extract the contents of the dataset archive to this path.
Make sure to validate the file ownership of the indexes folder after decompression. Elasticsearch requires UID=1000 and GID=0 to be set for its data folder.
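A sketch of loading a dataset, assuming the dataset archive is named dataset.tar.gz (the actual file name may differ):

```bash
# extract the dataset; it already contains the top-level indexes folder
tar xzf dataset.tar.gz -C /opt/snow-owl/resources
# Elasticsearch requires UID=1000 and GID=0 on its data folder
chown -R 1000:0 /opt/snow-owl/resources/indexes
```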
Using the custom backup container it is possible to restore:
the Elasticsearch indices
the Bugzilla database
the OpenLDAP database (if present)
To restore any of the data, the following steps have to be performed (a consolidated command sketch follows the list):
stop Snow Owl, Elasticsearch, and the OpenLDAP containers (in the folder ./snow-owl/docker):
(re)move the contents of the old / corrupted Elasticsearch data folder:
restart the Elasticsearch container only (keep Snow Owl stopped):
use the backup container's terminal and execute the restore script:
without any parameters, if only the Elasticsearch indices have to be restored
with parameter -l in case the Elasticsearch indices and the OpenLDAP database have to be restored at the same time
with parameter -b in case the Elasticsearch indices and the Bugzilla database have to be restored at the same time
with parameters -b and -l if a full restore has to be performed
the script will list all available backups and prompts for selection:
enter the numerical identifier of the backup to restore and wait until the process finishes
exit the backup container and restart all containers:
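A consolidated sketch of the steps above; the service names (snowowl, elasticsearch, ldap) and the restore script name are assumptions, so adjust them to your docker-compose.yml:

```bash
cd /opt/snow-owl/docker

# 1. stop Snow Owl, Elasticsearch and OpenLDAP (service names may differ)
docker-compose stop snowowl elasticsearch ldap

# 2. move the old / corrupted Elasticsearch data folder out of the way
mv /opt/snow-owl/resources/indexes /opt/snow-owl/resources/indexes.old

# 3. restart Elasticsearch only (keep Snow Owl stopped)
docker-compose start elasticsearch

# 4. open a terminal in the backup container and run the restore script
docker exec -it snow-owl-backup sh
#    inside the container (the script name is an assumption): ./restore.sh -b -l

# 5. after the restore finished, exit the container and restart everything
docker-compose up -d
```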
In case only the contents of the OpenLDAP server have to be restored, it is sufficient to just extract the contents of the backup archive to ./snow-owl/ldap and restart the container.
When a new Snow Owl Authoring Platform release is available we recommend performing the following steps.
New releases are going to be distributed the same way: a docker stack and its configuration within an archive.
It is advised to decompress the new release files to a temporary folder and compare their contents with the existing ./snow-owl/docker folder.
The changes are usually restricted to version numbers in the .env file. In such cases, it is equally acceptable to overwrite the contents of the ./snow-owl/docker folder as is or to cherry-pick the necessary modifications by hand.
Once the new version of the files is in place, it is sufficient to just issue the following commands; an explicit stop of the service is not even required (in the folder ./snow-owl/docker):
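A typical sequence for this (standard docker-compose commands; the shipped instructions may list slightly different ones):

```bash
cd /opt/snow-owl/docker
# pull the new image versions referenced by the updated .env / docker-compose.yml
docker-compose pull
# recreate only the containers whose configuration or image changed
docker-compose up -d
```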
Do not use docker-compose restart because it won't pick up any .yml or .env file changes. See the official docker guide.
This method is only applicable to deployments where the Elasticsearch cluster is co-located with the Snow Owl Authoring Platform.
A managed Elasticsearch service will automatically configure a snapshot policy upon creation. See details here.
The Authoring Platform release package contains a built-in solution to perform rolling and permanent data backups. The docker stack has a specialized container (called snow-owl-backup) that is responsible for creating scheduled backups of:
the Elasticsearch indices
the OpenLDAP database (if present)
Bugzilla's configuration files and SQL database
For the Elasticsearch indices, the backup container uses the Snapshot API. Snapshots are labeled in a predefined format with timestamps. E.g. snowowl-daily-20220324030001
The OpenLDAP database is backed up by compressing the contents of the folder under ./snow-owl/ldap. Filenames are generated using the name of the corresponding Elasticsearch snapshot, e.g. snowowl-daily-20220324030001.tar.gz.
Bugzilla is backed up by compressing the configuration files and the MySQL database dump. Filenames are generated using the name of the corresponding Elasticsearch snapshot, e.g. snowowl-daily-20220324030001.tar.gz.
Backup Window: when a backup operation is running the Terminology Server blocks all write operations on the Elasticsearch indices. This is to prevent data loss and have consistent backups.
Backup Duration: the very first backup of an Elasticsearch cluster takes more time (depending on size and I/O performance, between 20 and 40 minutes); subsequent backups should take significantly less, around 1 to 5 minutes.
Daily backups are rolling backups, scheduled and cleaned up based on the settings specified in the ./snow-owl/docker/.env file. Here is a summary of the important settings that could be changed.
To store backups redundantly it is advised to mount a remote file share to a local path on the host. By default, this folder is configured to be at ./snow-owl/backup. It contains:
the snapshot files of the Elasticsearch cluster
the backup files of the OpenLDAP database
the backup files of Bugzilla
extra configuration files
Make sure the remote file share has enough free space to store around double the size of the ./snow-owl/resources/indexes folder.
Backup jobs are scheduled by crond, so cron-expressions can be defined here to specify the time a daily backup should happen.
This is used to tell the backup container how many daily backups must be kept.
Let's say we have an external file share mounted to /mnt/external_folder. There is a need to create daily backups after each working day, during the night at 2:00 am, and only the last two weeks' worth of data should be kept (assuming 5 working days each week).
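As an illustration, such a setup could be expressed in ./snow-owl/docker/.env roughly as follows; the variable names here are hypothetical, so use the ones actually present in the shipped .env file:

```bash
# hypothetical variable names - check the shipped .env for the real ones
# network share mounted on the host for redundant backups
BACKUP_FOLDER=/mnt/external_folder
# run the daily backup at 2:00 am after every working day (Tue-Sat covers Mon-Fri workdays)
BACKUP_SCHEDULE="0 2 * * 2-6"
# keep two weeks' worth of daily backups (5 working days per week)
BACKUP_KEEP_DAILY=10
```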
It is also possible to perform backups occasionally, e.g. before versioning an important SNOMED CT release or before a Terminology Server version upgrade. These backups are kept until manually removed.
To create such backups the following command needs to be executed using the backup container's terminal:
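A sketch of such an invocation; the backup script's name, location, and argument form inside the container are assumptions:

```bash
# open a terminal in the backup container
docker exec -it snow-owl-backup sh
# inside the container, run the backup script with a label, for example:
#   ./backup.sh my-backup-label
```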
The script will create a snapshot backup of the Elasticsearch data with a label like snowowl-my-backup-label-20220405030002, an archive that contains the database of the OpenLDAP server, and an archive that contains the configuration and database of Bugzilla with the name snowowl-my-backup-label-20220405030002.tar.gz.
The server-side components of the Authoring Platform are recommended to be installed on x86_64 / amd64 Linux operating systems where Docker Engine is available. See the list of supported distributions by Docker:
Here is the list of distributions that we suggest in the order of recommendation:
CentOS 7
Ubuntu 20.04 (or 18.04)
Debian 10 - Buster
Before starting the actual deployment of the Authoring Platform make sure that the following packages are installed and configured properly:
Docker Engine
ability to execute bash scripts
Regardless of having a reverse proxy, the following port must be opened to let the Snow Owl thick client communicate with the Terminology Server:
tcp:2036
In case a reverse proxy is used, the Terminology Server requires two ports to be opened either towards the intranet or the internet (depends on usage):
http:80
https:443
In case there is no reverse proxy installed, the following port must be opened to be able to access the server's REST API:
http:8080
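For example, on a host protected with ufw, the ports could be opened like this (adapt to your firewall tooling and to whether a reverse proxy is used):

```bash
# thick client traffic
sudo ufw allow 2036/tcp
# with a reverse proxy in front of the REST API
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
# without a reverse proxy, expose the REST API port directly instead
# sudo ufw allow 8080/tcp
```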
The Snow Owl Authoring Platform has two different ways to manage users. The primary authentication and authorization service is the LDAP Directory Server. The secondary option is a file-based database, used only for administrative purposes. Whenever user access has to be granted or revoked the following methods could be applied.
This is only applicable to the default deployment setup where a co-located OpenLDAP server is used alongside the Terminology Server.
While Bugzilla is configured to use the same OpenLDAP server for authentication, it also maintains its own user database in SQL. This user database is indirectly used by the Snow Owl thick client, so it is important to keep LDAP and Bugzilla in sync.
This means that whenever a change is made to the LDAP user database, an explicit Bugzilla container restart is required with docker-compose restart bugzilla. The container restart will take care of bringing over any changes made to the LDAP users.
There are several ways to access and manage an OpenLDAP server; here we will only describe one of them, using Apache Directory Studio.
Apache Directory Studio is an open-source, free application. It is available to download for different platforms (Windows, macOS, and Linux).
Before accessing the LDAP database there is one technical prerequisite to satisfy: the OpenLDAP server has to be accessible from the machine Apache Directory Studio is installed on. The best and most secure way to achieve that is to set up an SSH tunnel, for example by following an article that describes how to configure an SSH tunnel using PuTTY on Windows.

The OpenLDAP server uses port 389 for communication. This is the port that needs to be tunneled through the SSH connection. Here is what the final configuration looks like in PuTTY:
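If PuTTY is not available (e.g. on macOS or Linux), an equivalent OpenSSH command would be roughly the following, where the user and host are placeholders:

```bash
# forward the local port 389 to the OpenLDAP port on the server
# (binding a port below 1024 locally may require elevated privileges)
ssh -L 389:localhost:389 admin@snow-owl.b2ihealthcare.com
```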
Once the SSH tunnel works, it's time to set up our connection in Apache DS. Go to File -> New -> LDAP Connection and set the following:
Hit the "Check Network Parameter" button to verify the network connection.
Go to the next page of the wizard and provide your credentials. The default Bind DN and Bind password can be found in the Authoring Platform release package under ./snow-owl/docker/.env
.
Hit the "Check Authentication" button to verify your credentials. Hit Finish to complete the setup procedure.
All users and groups should be browseable now through the LDAP Browser view:
To grant access to a new user an LDAP entry has to be created. Go to the LDAP Browse view and right-click on the organization node, then New -> New Entry:
It is easiest to use an existing entry as a template:
Leave everything as is on the Object Classes page, then hit Next. Fill in the new user's credentials:
On the final page, double click on the userPassword row and provide the user's password:
Hit Finish to add the user to the database.
Now we need to assign a role to the user. Before going forward, get hold of the user's DN using the LDAP Browser view:
Select the desired role group in the Browser view and add a new attribute:
Select the attribute type uniqueMember
and hit Finish:
Paste the user's DN as the value of the attribute and hit Enter to make your changes permanent:
To revoke access the user has to be deleted from the list of users:
And also has to be removed from the role group:
To change either the first or last name, or the password of a user, just edit any of the attributes in the user editor:
To apply any changes made to the users file, the Terminology Server has to be restarted afterward.
To grant access, the users file has to be amended with the new user and its credentials. There are several ways to hash a password using the bcrypt algorithm, but here is one that is easy and available on most Linux variants. The htpasswd utility has to be installed:
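A sketch of adding a user this way; the package name differs per distribution and newuser is a placeholder:

```bash
# Debian/Ubuntu: the htpasswd utility is shipped in apache2-utils
sudo apt-get install apache2-utils
# append a new user with a bcrypt-hashed password to the users file
htpasswd -B /opt/snow-owl/docker/configs/snowowl/users newuser
```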
It will prompt for the password and will amend the file with the new user at the end.
Simply remove the user's line from the file and restart the service.
There is a configuration file ./snow-owl/docker/configs/snowowl/users that contains the list of users with their credentials. The passwords are hashed using the bcrypt algorithm (variant $2a$). This method of authentication should only be used for administration purposes (e.g. bot users), because any user added here will have admin privileges.
Remove the user's line from the file and regenerate the credentials as described in the section above.
Full list of steps to perform before spinning up the service:
Extract the Authoring Platform release archive to a folder. E.g. /opt/snow-owl
(Optional) Obtain an SSL certificate
Make sure a DNS A record is routed to the host's public IP address
Go into the folder ./snow-owl/docker/cert
Execute the ./init-certificate.sh script (see the SSL certificate section above)
(Optional) Configure access for managed Elasticsearch Cluster (elastic.co)
(Optional) Extract dataset to ./snow-owl/resources where the folder structure should look like ./snow-owl/resources/indexes/nodes/0 at the end.
Verify that the file ownership is UID=1000 and GID=0 (as described in the dataset section above)
Check any credentials or settings that need to be changed in ./snow-owl/docker/.env
Make sure the hostname is configured properly in the environment file. This is important for being able to reach Bugzilla through a web browser.
Authenticate with our private docker registry while in the folder ./snow-owl/docker (see the command sketch after this checklist)
Issue a pull (in the folder ./snow-owl/docker)
Spin up the service (in the folder ./snow-owl/docker)
Verify that the REST API of the Terminology Server is available at:
With SSL: https://snow-owl.example.com/snowowl
Without SSL: http://hostname:8080/snowowl
Verify that the server and cluster status is GREEN by querying the following REST API endpoint:
With SSL:
Without SSL:
Verify that Bugzilla's web UI is available at:
With SSL:
Without SSL:
Enjoy using the Snow Owl Authoring Platform
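For the registry authentication, pull, and spin-up steps in the checklist above, a minimal shell sequence could look like this; the registry URL is a placeholder for the one provided with the release credentials:

```bash
cd /opt/snow-owl/docker
# authenticate with the B2i private docker registry
docker login <registry-url>
# pull the images referenced by docker-compose.yml
docker-compose pull
# start the whole stack in the background
docker-compose up -d
```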
The Authoring Platform release package contains a download URL for the Snow Owl thick client.
There are no special requirements to install, it is as easy as extracting the contents of the archive to a folder.
To avoid having issues the following should be considered:
make sure to extract the files to a folder that has sufficient permissions for the logged-in user
try to avoid folders that are "too deep" in the file system tree. Certain Windows versions have trouble dealing with long file paths. Preferably do not use "Desktop" or "Program Files".
Upon the first start of the client make sure the correct remote repository address is configured for the given user: