For installations where Snow Owl TS and Elasticsearch are co-located, we recommend the following hardware specification:

| Snow Owl TS + ES | Cloud | Dedicated |
|---|---|---|
| vCPU | 8 | 8 |
| Memory | 32 GB | 32 GB |
| I/O performance | >= 5000 IOPS SSD | >= 5000 IOPS SSD |
| Disk space | 200 GB | 200 GB |

For installations where Snow Owl TS connects to a managed Elasticsearch cluster at elastic.co, we recommend the following hardware specification:

| Snow Owl TS | Cloud | Dedicated |
|---|---|---|
| vCPU | 8 (compute optimized) | 8 |
| Memory | 16 GB | 16 GB |
| I/O performance | OS: balanced disk<br>TS file storage: local SSD | OS: HDD / SSD<br>TS file storage: SSD |
| Disk space | OS: 20 GB<br>TS file storage: 100 GB | OS: 20 GB<br>TS file storage: 100 GB |

| Elasticsearch @ elastic.co | Cloud |
|---|---|
| vCPU | 8 (compute optimized) |
| Memory | 4 GB |
| I/O performance | handled by elastic.co |
| Disk space | 180 GB |

Here are a few examples of virtual machine types that could be used for hosting the Terminology Server at the three most popular cloud providers (the options are not limited to these):

| Cloud Provider | VM type |
|---|---|
| GCP | |
| AWS | |
| Azure | |
Terminology Server releases are shared with customers through custom download URLs. The downloaded artifact is a Linux (tar.gz) archive that contains:
an initial folder structure
the configuration files for all services
a docker-compose.yml file that brings together the entire technology stack to run and manage the service
the credentials required to pull our proprietary docker images
As a best practice, it is advised to extract the content of the archive under /opt, so the deployment folder will be /opt/snow-owl. The docker-compose setup relies on this path; if required, it can be changed later by editing the ./snow-owl/docker/.env file (see the DEPLOYMENT_FOLDER environment variable).
When decompressing the archive it is important to use the --same-owner and --preserve-permissions options so the docker containers can access the files and folders appropriately.
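For example, assuming the release archive is named snow-owl-&lt;version&gt;.tar.gz (the actual filename depends on your download), the extraction could look like this:

```bash
# create the target folder and extract the release archive into it,
# keeping the ownership and permission bits stored in the archive
sudo mkdir -p /opt
sudo tar -xzf snow-owl-<version>.tar.gz \
    --same-owner --preserve-permissions \
    -C /opt
```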
The next page will describe the content of the release package in more detail.
The technology stack behind the Terminology Server consists of the following components:
The Terminology Server application
Elasticsearch as the data layer
An LDAP-compliant authentication and authorization service
Optional: A reverse-proxy handling the requests towards the REST API
Outgoing communication from the Terminology Server goes via:
HTTP(s) towards Elasticsearch
LDAP(s) towards the A&A service
Incoming communication is handled through HTTP port 8080.
A selected reverse proxy solution is responsible for channeling all incoming traffic through to the Terminology Server.
The Elasticsearch cluster can either be:
a co-located, single-node, self-hosted cluster
a managed Elasticsearch cluster hosted by elastic.co
With a preconfigured domain name and DNS record, the default installation package can take care of requesting and maintaining the necessary certificates for secure HTTP. See the details of this in the Configuration section.
To simplify the initial setup process, we ship the Terminology Server with a default configuration consisting of a co-located Elasticsearch cluster, a pre-populated OpenLDAP server, and an NGINX reverse proxy with the ability to opt in for an SSL certificate.
Welcome to the official documentation of the Snow Owl Terminology Server: the search and authoring engine that powers the Snow Owl Authoring Platform and the Snowray Terminology Service. If you want to learn how to install and provision the Terminology Server, you've come to the right place. This guide shows you how to:
Select the appropriate hardware and software environment to host the service
Download, install and configure the entire technology stack necessary for operating the server
Handle release packages to upgrade to a newer version
Perform a data backup or a restore
Manage intermittent tasks, e.g. adding/revoking user access
In case you would like to skip ahead, here is a set of quick links leading to different sections of the guide.
Here is the list of files and folders extracted from the release package, with their roles described below.
Contains every configuration file used for the docker stack, including docker-compose.yml.
Docker treats this folder as its context, which means that commands must either reference the compose file explicitly or be executed directly inside this folder.
E.g. to verify the status of the stack there are two approaches, as shown below: execute the command inside ./snow-owl/docker, or execute the command from somewhere other than ./snow-owl/docker while addressing the compose file explicitly.
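A minimal sketch of both approaches, assuming the default /opt/snow-owl deployment folder (docker-compose ps prints the status of every container in the stack):

```bash
# approach 1: run from inside the docker folder
cd /opt/snow-owl/docker
docker-compose ps

# approach 2: run from anywhere by addressing the compose file explicitly
docker-compose -f /opt/snow-owl/docker/docker-compose.yml ps
```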
This folder contains the files necessary to acquire an SSL certificate. Ideally, none of the files here should be changed.
There is one important file here, elasticsearch.yml, which can be used for fine-tuning the Elasticsearch cluster. By default this is not necessary; only change it if an advanced configuration is required.
This folder contains the files used upon the first start of the OpenLDAP server. The files within describe a set of groups and users to set up an initial user access model. User credentials for the test users can be found in the file called 200_users.ldif.
Location of all configuration files for NGINX. By default, a non-secure HTTP configuration is assumed. If there is no need for an SSL certificate, then the files here will be used. If an SSL certificate was acquired, then the main configuration file of NGINX (nginx.conf) will be overwritten with the one under /docker/cert/nginx.conf.
snowowl.yml: this file is the default configuration file of the Terminology Server. It does not need any changes by default either.
users: list of users for file-based authentication. There is one default user called snowowl, for which the credentials can be found under ./docker/.env.
The main configuration file for the docker stack. This file is replaced in case an SSL certificate was acquired (with the file /docker/cert/docker-compose.yml). This is where volumes, ports, or environment variables can be configured.
The credentials to use for authenticating with the B2i private docker registry.
The collection of environment variables for the docker-compose.yml file.
This is the file to configure most of the settings of the Terminology Server, including Java heap size, Snow Owl or Elasticsearch version, passwords, and folder structure.
The location where the OpenLDAP server stores its data.
Log files of the Terminology Server
Location of Elasticsearch and Snow Owl resources.
This is the data folder of Elasticsearch. Datasets must be extracted to this directory.
Snow Owl's local file storage. Import and export artifacts are stored here.
In case an SSL certificate is acquired, all the files used by certbot and NGINX are stored here. This folder is automatically created by the certificate retrieval script.
This is the initial folder of all backup artifacts. This should be configured as a network mount to achieve data redundancy.
We recommend installing the Terminology Server on x86_64 / amd64 Linux operating systems where Docker Engine is available. See the list of supported distributions by Docker:
Here is the list of distributions that we suggest, in order of recommendation:
CentOS 7
Ubuntu 20.04 (or 18.04)
Debian 10 - Buster
Before starting the actual deployment of the Terminology Server make sure that the following packages are installed and configured properly:
Docker Engine
ability to execute bash scripts
In case a reverse proxy is used, the Terminology Server requires two ports to be opened, either towards the intranet or the internet (depending on usage):
http:80
https:443
In case there is no reverse proxy installed, the following port must be opened to be able to access the server's REST API:
http:8080
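For example, on hosts using ufw (an assumption; use your distribution's own firewall tooling), opening these ports could look like this:

```bash
# allow HTTP and HTTPS towards the reverse proxy
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp

# alternatively, without a reverse proxy, expose the REST API port directly
sudo ufw allow 8080/tcp
```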
Secure HTTP is a must in case the Terminology Server is a public-facing instance. For such cases, we provide a pre-configured environment and a convenience script to acquire the necessary SSL certificate.
SSL certificate retrieval and renewal are managed by Certbot, the official ACME client recommended by Let's Encrypt.
To be able to obtain an SSL certificate the following requirements must be met:
docker and docker-compose are installed
the server instance has a public IP address
a DNS A record is configured for the desired domain name routing to the server's IP address
For the sake of example, let's say the target domain name is snow-owl.b2ihealthcare.com.
Go to the sub-folder called ./snow-owl/docker/configs/cert. Make sure the init-certificate.sh script has permission to be executed and get some details about its parameters. As you can see, -d is used for specifying the domain name, and -e is used for specifying a contact email address (optional). Now execute the script with our example parameters, as sketched below.
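A sketch of these steps; the help flag -h and the email address are assumptions for illustration:

```bash
cd /opt/snow-owl/docker/configs/cert

# ensure the script is executable and inspect its parameters
chmod +x init-certificate.sh
./init-certificate.sh -h

# request a certificate for the example domain with a contact email
./init-certificate.sh -d snow-owl.b2ihealthcare.com -e admin@b2ihealthcare.com
```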
Script execution will overwrite the files ./snow-owl/docker/docker-compose.yml and ./snow-owl/docker/configs/nginx/nginx.conf. Make a note of any local changes beforehand if required.
After successful execution, a new folder is created at ./snow-owl/cert, which contains all the certificate files required by NGINX. The docker-compose.yml file is also amended with a piece of code that guarantees automatic renewal of the certificate.
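To check that renewal will work, a dry run can be executed through the stack; the service name certbot below is an assumption based on the standard certbot/NGINX compose setup:

```bash
# simulate a certificate renewal without actually renewing anything
docker-compose run --rm certbot renew --dry-run
```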
At this point everything is prepared for secure HTTP. Let's see what else needs to be configured before spinning up the service.
This method is only applicable to deployments where the Elasticsearch cluster is co-located with the Snow Owl Terminology Server.
A managed Elasticsearch service will automatically configure a snapshot policy upon creation. See the elastic.co documentation for details.
The Terminology Server release package contains a built-in solution to perform rolling and permanent data backups. The docker stack has a specialized container (called snow-owl-backup) that is responsible for creating scheduled backups of:
the Elasticsearch indices
the OpenLDAP database (if present)
For the Elasticsearch indices, the backup container uses Elasticsearch's snapshot and restore functionality. Snapshots are labeled in a predefined format with timestamps, e.g. snowowl-daily-20220324030001
The OpenLDAP database is backed up by compressing the contents of the folder under ./snow-owl/ldap. Filenames are generated using the name of the corresponding Elasticsearch snapshot, e.g. snowowl-daily-20220324030001.tar.gz.
Backup Window: when a backup operation is running, the Terminology Server blocks all write operations on the Elasticsearch indices. This prevents data loss and keeps backups consistent.
Backup Duration: the very first backup of an Elasticsearch cluster takes more time (depending on size and I/O performance, between 20 and 40 minutes); subsequent backups should take significantly less, around 1-5 minutes.
Daily backups are rolling backups, scheduled and cleaned up based on the settings specified in the ./snow-owl/docker/.env file. Here is a summary of the important settings that could be changed.
To store backups redundantly it is advised to mount a remote file share to a local path on the host. By default, this folder is configured to be at ./snow-owl/backup. It contains:
the snapshot files of the Elasticsearch cluster
the backup files of the OpenLDAP database
extra configuration files
Make sure the remote file share has enough free space to store around double the size of the ./snow-owl/resources/indexes folder.
Backup jobs are scheduled by crond, so cron expressions can be defined here to specify when a daily backup should happen.
This is used to tell the backup container how many daily backups must be kept.
Let's say we have an external file share mounted to /mnt/external_folder, and there is a need to create daily backups after each working day, during the night at 2:00 am. Only the last two weeks' worth of data should be kept (assuming 5 working days each week). A sketch of such a configuration follows below.
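A hypothetical .env excerpt matching this scenario; the variable names below are illustrative assumptions, so check the shipped .env file for the actual keys:

```bash
# hypothetical ./snow-owl/docker/.env excerpt -- key names are
# illustrative, the actual keys are defined in the shipped .env file
BACKUP_FOLDER=/mnt/external_folder    # remote share mounted on the host
BACKUP_SCHEDULE="0 2 * * 2-6"         # 2:00 am during the night after each working day (Mon-Fri)
BACKUP_KEEP_DAILY=10                  # two weeks' worth at 5 backups per week
```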
It is also possible to perform backups occasionally, e.g. before versioning an important SNOMED CT release or before a Terminology Server version upgrade. These backups are kept until manually removed.
To create such backups the following command needs to be executed using the backup container's terminal:
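A sketch of this, assuming the backup script inside the container accepts a custom label; the script name backup.sh and its -l flag are assumptions, while the container name snow-owl-backup comes from the stack:

```bash
# open a shell inside the backup container
docker exec -it snow-owl-backup /bin/sh

# inside the container: create a labeled, permanent backup
# (script name and flag are assumptions -- check the container's scripts)
./backup.sh -l my-backup-label
```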
The script will create a snapshot backup of the Elasticsearch data with a label like snowowl-my-backup-label-20220405030002 and an archive that contains the database of the OpenLDAP server with the name snowowl-my-backup-label-20220405030002.tar.gz.
In certain cases, a pre-built dataset is also shipped together with the Terminology Server to ease the initial setup procedure and get going quickly.
This method is only applicable to deployments where the Elasticsearch cluster is co-located with the Terminology Server.
To load data into a managed Elasticsearch cluster, there are several options:
use snapshot and restore
use reindexing from a remote cluster
use Snow Owl to rebuild the data on the remote cluster
These datasets are the compressed form of the Elasticsearch data folder and follow the same structure, with a top-level folder called indexes. This is the same folder as ./snow-owl/resources/indexes, so to load the dataset one should simply extract the contents of the dataset archive to this path.
Make sure to validate the file ownership of the indexes folder after decompression. Elasticsearch requires UID=1000 and GID=0 to be set for its data folder.
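For example, assuming the dataset archive is named dataset.tar.gz (a hypothetical filename), the extraction and ownership fix could look like this:

```bash
# extract the dataset so that ./snow-owl/resources/indexes is populated
tar -xzf dataset.tar.gz -C /opt/snow-owl/resources

# Elasticsearch requires UID=1000 and GID=0 on its data folder
sudo chown -R 1000:0 /opt/snow-owl/resources/indexes
```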
The release package contains everything that is required to use a co-located Elasticsearch instance by default. Follow these steps only when there is a need for a remote Elasticsearch cluster.
To configure the Terminology Server to work with a managed Elasticsearch cluster two settings require attention.
First, the local Elasticsearch container and all its configurations should be removed from the docker-compose.yml file. Once that is done, we have to tell the Terminology Server where to find the cluster. This can be set in the file ./snow-owl/docker/configs/snowowl/snowowl.yml:
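A sketch of the relevant configuration; the key names under repository.index should be verified against the shipped snowowl.yml, and the endpoint and credentials below are placeholders:

```bash
# append a hypothetical cluster configuration to snowowl.yml --
# key names and values are illustrative placeholders
cat >> ./snow-owl/docker/configs/snowowl/snowowl.yml <<'EOF'
repository:
  index:
    clusterUrl: https://my-cluster.es.us-central1.gcp.cloud.es.io:9243
    clusterUsername: elastic
    clusterPassword: <password>
EOF
```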
Snow Owl TS leverages Elasticsearch's synonym filters. To have this feature work properly with a managed Elasticsearch cluster, our custom dictionary has to be uploaded and configured. The synonym file can be found in the release package under ./snow-owl/docker/configs/elasticsearch/synonym.txt. This file needs to be compressed as a zip archive by following this structure:
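A sketch of creating the bundle, assuming Elastic Cloud's convention that dictionary files live under a top-level dictionaries folder inside the zip (verify against Elastic's custom bundle documentation):

```bash
cd /opt/snow-owl/docker/configs/elasticsearch

# place the synonym file under a top-level "dictionaries" folder
mkdir -p dictionaries
cp synonym.txt dictionaries/

# create the bundle archive to upload as a custom extension
zip -r synonyms.zip dictionaries
```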
For the managed Elasticsearch instance this zip file needs to be configured as a bundle extension. The steps required are covered in detail in Elastic's documentation on custom bundles.
Once the bundle is configured and the cluster is up, we can (re)start the docker stack. In case of any trouble, the Terminology Server will refuse to initialize and report the problem in its log files.
When a new Snow Owl Terminology Server release is available we recommend performing the following steps.
New releases are going to be distributed the same way: a docker stack and its configuration within an archive.
It is advised to decompress the new release files to a temporary folder and compare the contents with ./snow-owl/docker.
The changes are usually restricted to version numbers in the .env file. In such cases, it is equally acceptable to overwrite the contents of the ./snow-owl/docker folder as-is or to cherry-pick the necessary modifications by hand.
Once the new version of the files is in place, it is sufficient to issue the following commands in the folder ./snow-owl/docker; an explicit stop of the service is not even required:
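A sketch of the upgrade commands (docker-compose re-creates only the containers whose image or configuration changed):

```bash
cd /opt/snow-owl/docker

# fetch the new images referenced by the updated .env / compose files
docker-compose pull

# re-create any containers whose image or configuration changed
docker-compose up -d
```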
Do not use docker-compose restart because it won't pick up any .yml or .env file changes. See the docker-compose documentation for details.
Full list of steps to perform before spinning up the service (a consolidated command summary follows after this list):
Extract the Terminology Server release archive to a folder, e.g. /opt/snow-owl
(Optional) Obtain an SSL certificate
Make sure a DNS A record is routed to the host's public IP address
Go into the folder ./snow-owl/docker/cert
Execute the ./init-certificate.sh script (see the command summary after this list)
(Optional) Configure access for managed Elasticsearch Cluster (elastic.co)
(Optional) Extract the dataset to ./snow-owl/resources, where the folder structure should look like ./snow-owl/resources/indexes/nodes/0 at the end
Verify file ownership to be UID=1000 and GID=0 (see the command summary after this list)
Check any credentials or settings that need to be changed in ./snow-owl/docker/.env
Authenticate with our private docker registry while in the folder ./snow-owl/docker
Issue a pull (in the folder ./snow-owl/docker)
Spin up the service (in the folder ./snow-owl/docker)
Verify that the REST API of the Terminology Server is available at:
With SSL: https://snow-owl.example.com/snowowl
Without SSL: http://hostname:8080/snowowl
Verify that the server and cluster status is GREEN by querying the status endpoint of the REST API, with or without SSL (see the command summary after this list)
Enjoy using the Snow Owl Terminology Server
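A consolidated, hedged command summary of the checklist above. The domain, email, registry URL, and the status endpoint path are placeholders or assumptions; the actual values are provided with your release package:

```bash
# 1. extract the release archive (see the installation section)
sudo tar -xzf snow-owl-<version>.tar.gz --same-owner --preserve-permissions -C /opt

# 2. (optional) obtain an SSL certificate -- domain and email are examples
cd /opt/snow-owl/docker/configs/cert
./init-certificate.sh -d snow-owl.example.com -e admin@example.com

# 3. (optional) after extracting a dataset, verify its file ownership
sudo chown -R 1000:0 /opt/snow-owl/resources/indexes

# 4. authenticate with the private registry -- the registry URL and
# credentials are shipped in the release package
cd /opt/snow-owl/docker
docker login <registry-url>

# 5. pull the images and spin up the service
docker-compose pull
docker-compose up -d

# 6. verify the REST API -- the status endpoint path is an assumption,
# check the REST API documentation of your release
curl https://snow-owl.example.com/snowowl/info   # with SSL
curl http://localhost:8080/snowowl/info          # without SSL
```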
The currently supported major version of Elasticsearch is 7.x; the server is upward compatible with any patch releases coming on the 7.x version stream. Elasticsearch v8 is not supported yet.
Having a co-located Elasticsearch service next to the Terminology Server has a direct impact on the hardware requirements. See our list of recommended hardware on the hardware requirements page.
For authorization and authentication, the application supports any traditional LDAP directory server. We recommend starting with OpenLDAP and evolving to other solutions later, because it is easy to set up and maintain while keeping Snow Owl's user data isolated from any other A&A services.
A reverse proxy, such as NGINX, is recommended between the Terminology Server and either the intranet or the internet. This will increase security and help channel REST API requests appropriately.
Pro tip: in case the Terminology Server is deployed to the cloud, make sure the ./snow-owl/resources path is served by a fast SSD disk (local or ephemeral SSD is best). This will make import and export processes even faster.
Using the custom backup container it is possible to restore:
the Elasticsearch indices
the OpenLDAP database (if present)
To restore any of the data, the following steps have to be performed (a consolidated command sketch follows after this list):
stop Snow Owl, Elasticsearch, and the OpenLDAP containers (in the folder ./snow-owl/docker)
(re)move the contents of the old / corrupted Elasticsearch data folder
restart the Elasticsearch container only (keep Snow Owl stopped)
use the backup container's terminal and execute the restore script:
without any parameters, if only the Elasticsearch indices have to be restored
with the parameter -l, in case the Elasticsearch indices and the OpenLDAP database have to be restored at the same time
the script will list all available backups and prompt for a selection
enter the numerical identifier of the backup to restore and wait until the process finishes
exit the backup container and restart all containers
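A sketch of the restore flow under the default deployment. The service names (snowowl, elasticsearch, ldap) and the restore script name are assumptions; check your docker-compose.yml and the backup container for the actual names:

```bash
cd /opt/snow-owl/docker

# stop Snow Owl, Elasticsearch, and OpenLDAP -- service names are assumed
docker-compose stop snowowl elasticsearch ldap

# move the old / corrupted data folder out of the way
mv /opt/snow-owl/resources/indexes /opt/snow-owl/resources/indexes.bak

# restart Elasticsearch only (Snow Owl stays stopped)
docker-compose up -d elasticsearch

# run the restore script inside the backup container; add -l to restore
# the OpenLDAP database as well -- the script name is an assumption
docker exec -it snow-owl-backup ./restore.sh -l

# once the restore has finished, bring the whole stack back up
docker-compose up -d
```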
In case only the contents of the OpenLDAP server have to be restored, it is sufficient to extract the contents of the backup archive to ./snow-owl/ldap and restart the container.
The Snow Owl Terminology Server has two different ways to manage users. The primary authentication and authorization service is the LDAP Directory Server. The secondary option is a file-based database, used only for administrative purposes. Whenever user access has to be granted or revoked, the following methods can be applied.
This is only applicable to the default deployment setup where a co-located OpenLDAP server is used alongside the Terminology Server.
There are several ways to access and manage an OpenLDAP server; here we describe only one of them, using Apache Directory Studio.
Apache Directory Studio is an open-source, free application. It is available to download for different platforms (Windows, macOS, and Linux).
Before accessing the LDAP database there is one technical prerequisite to satisfy: the OpenLDAP server has to be accessible from the machine where Apache Directory Studio is installed. The best and most secure way to achieve that is to set up an SSH tunnel. Follow this link to an article that describes how to configure an SSH tunnel using PuTTY on Windows.
The OpenLDAP server uses port 389 for communication. This is the port that needs to be tunneled through the SSH connection. Here is what the final configuration looks like in PuTTY:
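For non-Windows clients, an equivalent tunnel can be opened with OpenSSH; the hostname and user below are placeholders:

```bash
# forward local port 389 to the OpenLDAP port on the server;
# binding a local port below 1024 may require elevated privileges,
# in which case a high local port such as 10389 works as well
ssh -L 389:localhost:389 user@snow-owl-host
```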
Once the SSH tunnel works, it's time to set up our connection in Apache Directory Studio. Go to File -> New -> LDAP Connection and set the following (the connection should point at localhost and the tunneled port 389):
Hit the "Check Network Parameter" button to verify the network connection.
Go to the next page of the wizard and provide your credentials. The default Bind DN and Bind password can be found in the Terminology Server release package under ./snow-owl/docker/.env.
Hit the "Check Authentication" button to verify your credentials. Hit Finish to complete the setup procedure.
All users and groups should be browsable now through the LDAP Browser view:
To grant access to a new user, an LDAP entry has to be created. Go to the LDAP Browser view and right-click on the organization node, then New -> New Entry:
It is easiest to use an existing entry as a template:
Leave everything as is on the Object Classes page, then hit Next. Fill in the new user's credentials:
On the final page, double click on the userPassword row and provide the user's password:
Hit Finish to add the user to the database.
Now we need to assign a role to the user. Before going forward, note the user's DN using the LDAP Browser view:
Select the desired role group in the Browser view and add a new attribute:
Select the attribute type uniqueMember
and hit Finish:
Paste the user's DN as the value of the attribute and hit Enter to make your changes permanent:
To revoke access the user has to be deleted from the list of users:
And also has to be removed from the role group:
To change either the first or last name, or the password of a user, just edit any of the attributes in the user editor:
There is a configuration file, ./snow-owl/docker/configs/snowowl/users, that contains the list of users with their credentials. The passwords are hashed using the bcrypt algorithm (variant $2a$). This method of authentication should be used for testing or internal purposes only; users added here will have elevated privileges.
To apply any changes made to the users file, the Terminology Server has to be restarted afterward.
To grant access, the users file has to be amended with the new user and its credentials. There are several ways to generate a bcrypt password hash, but here is one that is easy and available on most Linux variants: the htpasswd tool (shipped in the Apache utilities package) has to be installed, then invoked as sketched below:
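A sketch for Debian/Ubuntu; package names differ across distributions, and newuser is a placeholder:

```bash
# htpasswd ships with the Apache utilities package on Debian/Ubuntu
sudo apt-get install apache2-utils

# append "newuser" with a bcrypt-hashed password to the users file;
# note: htpasswd -B emits the $2y$ bcrypt variant -- verify it is
# accepted, or use a tool that emits the $2a$ variant instead
htpasswd -B ./snow-owl/docker/configs/snowowl/users newuser
```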
It will prompt for the password and will amend the file with the new user at the end.
Simply remove the user's line from the file and restart the service.
Remove the user's line from the file and regenerate the credentials according to the Grant user access section.