1 of 21

8.9.2 Introduction

Snow Owl® TS Admin Guide

Introduction

Welcome to the official documentation of the Snow Owl Terminology Server: the search and authoring engine that powers the Snow Owl Authoring Platform and the Snowray Terminology Service. If you want to learn how to install and provision the Terminology Server, you've come to the right place. This guide shows you how to:

Select the appropriate hardware and software environment to host the service
Download, install and configure the entire technology stack necessary for operating the server
Handle release packages to upgrade to a newer version
Perform a data backup or a restore
Manage intermittent tasks, e.g. adding/revoking user access

In case you would like to skip ahead, here is a set of quick links leading to different sections of the guide.

Plan your deployment

Technology stack

The technology stack behind the Terminology Server consists of the following components:

The Terminology Server application
Elasticsearch as the data layer
An LDAP-compliant authentication and authorization service
Optional: A reverse-proxy handling the requests towards the REST API

Terminology Server

Outgoing communication from the Terminology Server goes via:

HTTP(s) towards Elasticsearch
LDAP(s) towards the A&A service

Incoming communication is handled through the HTTP port of 8080.

A selected reverse proxy solution is responsible for channeling all incoming traffic through to the Terminology Server.

Elasticsearch

The currently supported version of Elasticsearch is v7.17.1 which is upward compatible with any patch releases coming on the 7.x version stream. Elasticsearch v8 is not supported yet.

The Elasticsearch cluster can either be:

a co-located, single-node, self-hosted cluster
a managed Elasticsearch cluster hosted by elastic.co

Having a co-located Elasticsearch service next to the Terminology Server has a direct impact on the hardware requirements. See our list of recommended hardware on the next page.

LDAP-compliant A&A service

For authorization and authentication, the application supports any traditional LDAP Directory Servers. We recommend starting with OpenLDAP and evolving to other solutions later because it is easy to set up and maintain while keeping Snow Owl's user data isolated from any other A&A services.

Reverse proxy

A reverse proxy, such as NGINX is recommended to be utilized between the Terminology Server and either the intranet or the internet. This will increase security and help with channeling REST API requests appropriately.

With a preconfigured domain name and DNS record, the default installation package can take care of requesting and maintaining the necessary certificates for secure HTTP. See the details of this in the Configuration section.

For simplifying the initial setup process we are shipping the Terminology Server with a default configuration of a co-located Elasticsearch cluster, a pre-populated OpenLDAP server, and an NGINX reverse proxy with the ability to opt-in for an SSL certificate.

Hardware requirements

Snow Owl TS and co-located Elasticsearch cluster

For installations where Snow Owl TS and Elasticsearch is co-located we recommend the following hardware specification:

Snow Owl TS + ES

Cloud

Dedicated

Snow Owl TS and managed Elasticsearch cluster

For installations where Snow Owl TS connects to a managed Elasticsearch cluster at we recommend the following hardware specification:

Snow Owl TS

Cloud

Dedicated

Elasticsearch @ elastic.co

Cloud

Cloud VMs

Here are a few examples of which Virtual Machine types could be used for hosting the Terminology Server at the three most popular Cloud providers (including but not limited to):

Cloud Provider

VM type

Software requirements

Operating System

The Terminology Server is recommended to be installed on x86_64 / amd64 Linux operating systems where Docker Engine is available. See the list of supported distributions by Docker:

Here is the list of distributions that we suggest in the order of recommendation:

CentOS 7
Ubuntu 20.04 (or 18.04)
Debian 10 - Buster

Software packages

Before starting the actual deployment of the Terminology Server make sure that the following packages are installed and configured properly:

Docker Engine
ability to execute bash scripts

Firewall

In case a reverse proxy is used, the Terminology Server requires two ports to be opened either towards the intranet or the internet (depends on usage):

http:80
https:443

In case there is no reverse proxy installed, the following port must be opened to be able to access the server's REST API:

http:8080

Configuration

Release package

Terminology Server releases are shared with customers through custom download URLs. The downloaded artifact is a Linux (tar.gz) archive that contains:

an initial folder structure
the configuration files for all services
a docker-compose.yml file that brings together the entire technology stack to run and manage the service
the credentials required to pull our proprietary docker images

As a best practice, it is advised to extract the content of the archive under /opt. So the deployment folder will be /opt/snow-owl. The docker-compose setup will rely on this path, however, if required it can be changed by editing the ./snow-owl/docker/.env file later on (see DEPLOYMENT_FOLDER environment variable).

When decompressing the archive it is important to use the --same-owner and --preserve-permissions options so the docker containers can access the files and folders appropriately.

The next page will describe the content of the release package in more detail.

Folder structure

Here is the list of files and folders extracted from the release package and their role described down below.

/docker

Contains every configuration file used for the docker stack, including docker-compose.yml.

This folder is considered to be the context by docker, which means that upon executing commands one must either address the config file explicitly or execute docker-compose commands directly inside here.

E.g. to verify the status of the stack there are two approaches:

Execute the command inside ./snow-owl/docker:

Execute the command from somewhere else then ./snow-owl/docker:

/docker/configs/cert

This folder contains the files necessary to acquire an SSL certificate. None of the files should be changed here ideally.

/docker/configs/elasticsearch

There is one important file here, elasticsearch.yml which can be used for fine-tuning the Elasticsearch cluster. However, this is not necessary by default, only if an advanced configuration is required.

/docker/configs/ldap-boostrap

This folder contains the files used upon the first start of the OpenLDAP server. The files within describe a set of groups and users to set up an initial user access model. User credentials for the test users can be found in the file called 200_users.ldif.

/docker/configs/nginx

Location of all configuration files for NGINX. By default, a non-secure HTTP configuration is assumed. If there is no need for an SSL certificate, then the files here will be used. If an SSL certificate was acquired, then the main configuration file of NGINX (nginx.conf) will be overwritten with the one under /docker/cert/nginx.conf.

/docker/configs/snowowl

snowowl.yml: this file is the default configuration file of the Terminology Server. It does not need any changes by default either.

users: list of users for file-based authentication. There is one default user called snowowl for which the credentials can be found under ./docker/.env.

/docker/docker-compose.yml

The main configuration file for the docker stack. This file is replaced in case an SSL certificate was acquired (with file /docker/cert/docker-compose.yml). This is where volumes, ports, or environment variables can be configured.

/docker/docker_login.txt

The credentials to use for authenticating with the B2i private docker registry.

/docker/.env

The collection of environment variables for the docker-compose.yml file.

This is the file to configure most of the settings of the Terminology Server. Including java heap size, Snow Owl or Elasticsearch version, passwords, or folder structure.

/ldap

The location where the OpenLDAP server stores its data.

/logs

Log files of the Terminology Server

/resources

Location of Elasticsearch and Snow Owl resources.

/resources/indexes

This is the data folder of Elasticsearch. Datasets must be extracted to this directory.

/resources/attachments

Snow Owl's local file storage. Import and export artifacts are stored here.

Pro tip💡: in case the Terminology Server is deployed to the cloud, make sure this path is served by a fast SSD disk (local or ephemeral SSD is the best). This will make import or export processes even faster.

/cert (optional)

In case an SSL certificate is acquired, all the files used by certbot and NGINX are stored here. This folder is automatically created by the certificate retrieval script.

/backup (optional)

This is the initial folder of all backup artifacts. This should be configured as a network mount to achieve data redundancy.

Get SSL certificate (optional)

Having secure HTTP in case the Terminology Server is a public-facing instance is definitely a must. For such cases, we are providing a pre-configured environment and a convenience script to acquire the necessary SSL certificate.

SSL certificate retrieval and renewal are managed by , the official ACME client recommended by .

To be able to obtain an SSL certificate the following requirements must be met:

docker and docker-compose are installed
the server instance has a public IP address
a DNS A record is configured for the desired domain name routing to the server's IP address

For the sake of example let's say the target domain name is snow-owl.b2ihealthcare.com .

Go to the sub-folder called ./snow-owl/docker/configs/cert. Make sure the init-certificate.sh script has permissions to be executable and get some details about its parameters:

As you can see -d is used for specifying the domain name, and -e is used for specifying a contact email address (optional). Now execute the script with our example parameters:

Script execution will overwrite the files under ./snow-owl/docker/docker-compose.yml and ./snow-owl/docker/configs/nginx/nginx.conf. Make a note of any changes if required.

After successful execution, a new folder is created ./snow-owl/cert which contains all the certificate files required by NGINX. The docker-compose.yml file is also amended with a piece of code that guarantees automatic renewal of the certificate:

At this point everything is prepared for having secure HTTP, let's see what else needs to be configured before spinning up the service.

Preload dataset (optional)

In certain cases, a pre-built dataset is also shipped together with the Terminology Server. This is to ease the initial setup procedure and get going fast.

This method is only applicable to deployments where the Elasticsearch cluster is co-located with the Terminology Server.

To load data into a managed Elasticsearch cluster, there are several options:

use
use
use Snow Owl to rebuild the data to the remote cluster

These datasets are the compressed form of the Elasticsearch data folder which follows the same structure. Except for having a top folder called indexes . This is the same folder as in ./snow-owl/resources/indexes . So to be able to load the dataset one should just extract the contents of the dataset archive to this path.

Make sure to validate the file ownership of the indexes folder after decompression. Elasticsearch requires UID=1000 and GID=0 to be set for its data folder.

Configure Elastic Cloud (optional)

The release package contains everything that is required to use a co-located Elasticsearch instance by default. Follow these steps only when there is a need for a remote Elasticsearch cluster.

To configure the Terminology Server to work with a managed Elasticsearch cluster two settings require attention.

Configure Terminology Server

First, the local Elasticsearch container and all its configurations should be removed from the docker-compose.yml file. Once that is done, we have to tell the Terminology Server where to find the cluster. This can be set in the file ./snow-owl/docker/configs/snowowl/snowowl.yml:

Configure Elastic Cloud

Snow Owl TS leverages Elasticssearch's synonym filters. To have this feature work properly with a managed Elasticsearch cluster our custom dictionary has to be uploaded and configured. The synonym file can be found in the release package under ./snow-owl/docker/configs/elasticsearch/synonym.txt. This file needs to be compressed as an zip archive by following this structure:

For the managed Elasticsearch instance this zip file needs to be configured as a bundle extension. The steps required are covered in this guide in great detail:

Once the bundle is configured and the cluster is up we can (re)start the docker stack. In case there are any troubles the Terminology Server will refuse to initialize and let you know what the problem is in its log files.

Spin up the service

Full list of steps to perform before spinning up the service:

Extract the Terminology Server release archive to a folder. E.g. /opt/snow-owl
(Optional) Obtain an SSL certificate
1. Make sure a DNS A record is routed to the host's public IP address
2. Go into the folder ./snow-owl/docker/cert
3. Execute the ./init-certificate.sh script:
(Optional) Configure access for managed Elasticsearch Cluster (elastic.co)
(Optional) Extract dataset to ./snow-owl/resources where folder structure should look like ./snow-owl/resources/indexes/nodes/0 at the end.
Verify file ownership to be UID=1000 and GID=0:
Check any credentials or settings that need to be changed in ./snow-owl/docker/.env
Authenticate with our private docker registry while in the folder ./snow-owl/docker:
Issue a pull (in folder ./snow-owl/docker)
Spin up the service (in the folder ./snow-owl/docker)
Verify that the REST API of the Terminology Server is available at:
1. With SSL: https://snow-owl.example.com/snowowl
2. Without SSL: http://hostname:8080/snowowl
Verify that the server and cluster status is GREEN by querying the following REST API endpoint:
1. With SSL:
2. Without SSL:
Enjoy using the Snow Owl Terminology Server 🎉

Software Upgrades

Perform an upgrade

When a new Snow Owl Terminology Server release is available we recommend performing the following steps.

New releases are going to be distributed the same way: a docker stack and its configuration within an archive.

It is advised to decompress the new release files to a temporary folder and compare the contents of ./snow-owl/docker .

[root@host]# diff /opt/snow-owl/docker/ /opt/new-snow-owl-release/snow-owl/docker/
Common subdirectories: /opt/snow-owl/docker/configs and /opt/new-snow-owl-release/snow-owl/docker/configs
diff /opt/snow-owl/docker/.env /opt/new-snow-owl-release/snow-owl/docker/.env
10c10
< ELASTICSEARCH_VERSION=7.16.3
---
> ELASTICSEARCH_VERSION=7.17.1
24c24
< SNOWOWL_VERSION=8.1.0
---
> SNOWOWL_VERSION=8.1.1

The changes usually are restricted to version numbers in the .env file. In such cases, it is equally acceptable to overwrite the contents of the ./snow-owl/docker folder as is or cherry-pick the necessary modifications by hand.

Once the new version of the files is in place it is sufficient to just issue the following commands, an explicit stop of the service is not even required (in the folder ./snow-owl/docker):

docker-compose pull
docker-compose up -d

Do not usedocker-compose restart because it won't pick up any .yml or .env file changes. See the official docker guider.

Backup and Restore

Backup

This method is only applicable to deployments where the Elasticsearch cluster is co-located with the Snow Owl Terminology Server.

A managed Elasticsearch service will automatically configure a snapshot policy upon creation. See details .

The Terminology Server release package contains a built-in solution to perform rolling and permanent data backups. The docker stack has a specialized container (called snow-owl-backup) that is responsible for creating scheduled backups of:

the Elasticsearch indices
the OpenLDAP database (if present)

For the Elasticsearch indices, the backup container uses the . Snapshots are labeled in a predefined format with timestamps. E.g. snowowl-daily-20220324030001

The OpenLDAP database is backed up by compressing the contents of the folder under ./snow-owl/ldap. Filenames are generated using the name of the corresponding Elasticsearch snapshot. E.g. snowowl-daily-20220324030001.tar.gz.

Backup Window: when a backup operation is running the Terminology Server blocks all write operations on the Elasticsearch indices. This is to prevent data loss and have consistent backups.

Backup Duration: the very first backup of an Elasticsearch cluster takes a bit more time (depends on the size and I/O performance but between 20 minutes - 40 minutes), subsequent backups should take significantly less: 1 - 5 minutes.

Daily backups

Daily backups are rolling backups, scheduled, and cleaned up based on the settings specified in the ./snow-owl/docker/.env file. Here is a summary of the important settings that could be changed.

BACKUP_FOLDER

To store backups redundantly it is advised to mount a remote file share to a local path on the host. By default, this folder is configured to be at ./snow-owl/backup. It contains:

the snapshot files of the Elasticsearch cluster
the backup files of the OpenLDAP database
extra configuration files

Make sure the remote file share has enough free space to store around the double of the ./snow-owl/resources/indexes folder.

CRON_DAYS, CRON_HOURS, CRON_MINUTES

Backup jobs are scheduled by crond, so cron-expressions can be defined here to specify the time a daily backup should happen.

NUMBER_OF_DAILY_BACKUPS_TO_KEEP

This is used to tell the backup container how many daily backups must be kept.

Example daily backup config

Let's say we have an external file share mounted to /mnt/external_folder. There is a need to create daily backups after each working day, during the night at 2:00 am. Only the last two-weeks-worth of data should be kept (assuming 5 working days each week).

One-off backups

It is also possible to perform backups occasionally, e.g. before versioning an important SNOMED CT release or before a Terminology Server version upgrade. These backups are kept until manually removed.

To create such backups the following command needs to be executed using the backup container's terminal:

The script will create a snapshot backup of the Elasticsearch data with a label snowowl-my-backup-label-20220405030002 and an archive that contains the database of the OpenLDAP server with the name snowowl-my-backup-label-20220405030002.tar.gz.

Restore

Using the custom backup container it is possible to restore:

the Elasticsearch indices
the OpenLDAP database (if present)

To restore any of the data the following steps have to be performed:

stop Snow Owl, Elasticsearch, and the OpenLDAP containers (in the folder ./snow-owl/docker):

docker-compose stop snowowl elasticsearch ldap

(re)move the contents of the old / corrupted Elasticsearch data folder:

mv -t /tmp ./snow-owl/resources/indexes/nodes

restart the Elasticsearch container only (keep Snow Owl stopped):

docker-compose start elasticsearch

use the backup container's terminal and execute the restore script:
- without any parameters, if only the Elasticsearch indices have to be restored
```
root@host:# docker exec -it backup bash
root@ad36cfb0448c:# /backup/restore.sh
```
- with parameter -l in case the Elasticsearch indices and the OpenLDAP database have to be restored at the same time
```
root@host:# docker exec -it backup bash
root@ad36cfb0448c:# /backup/restore.sh -l
```
the script will list all available backups and prompts for selection:

root@ad36cfb0448c:# /backup/restore.sh

################################
Snow Owl restore script STARTED.

#### Verify Elasticsearch snapshot repository ####

Checking existence of repository 'snowowl-snapshots' ...
Repository with name 'snowowl-snapshots' is present, verifying repository state ...
Repository 'snowowl-snapshots' is functional

#### Select backup to restore ####

Found 10 available backups under '/backup'
Please select the backup to restore by choosing the right number in the menu below (hit Enter when the selection was made)

 1) snowowl-daily-20220323030001
 2) snowowl-daily-20220324030001
 3) snowowl-daily-20220325030002
 4) snowowl-daily-20220326030002
 5) snowowl-daily-20220329030001
 6) snowowl-daily-20220330030001
 7) snowowl-daily-20220331030002
 8) snowowl-daily-20220401030002
 9) snowowl-daily-20220402030001
10) snowowl-daily-20220405030002

#?

enter the numerical identifier of the backup to restore and wait until the process finishes
exit the backup container and restart all containers:

root@ad36cfb0448c:# exit
root@host:# docker-compose up -d

In case only the contents of the OpenLDAP server has to be restored, it is sufficient to just extract the contents of the backup archive to ./snow-owl/ldap and restart the container.

Miscellaneous

User management

The Snow Owl Terminology Server has two different ways to manage users. The primary authentication and authorization service is the LDAP Directory Server. The secondary option is a file-based database, used only for administrative purposes. Whenever user access has to be granted or revoked the following methods could be applied.

LDAP based identity provider

This is only applicable to the default deployment setup where a co-located OpenLDAP server is used alongside the Terminology Server.

There are several ways to access and manage an OpenLDAP server, hereby we will only describe one of them, through the Apache Directory Studio.

Apache Directory Studio is an open-source, free application. It is available to download for different platforms (Windows, macOS, and Linux).

Before accessing the LDAP database there is one technical prerequisite to satisfy. The OpenLDAP server has to be accessible from the machine Apache Directory Studio is installed. The best and most secure way to achieve that is to set up an SSH tunnel. Follow this link to an article that describes how to configure an SSH tunnel using PuTTY and Windows.

The OpenLDAP server uses port 389 for communication. This is the port that needs to be tunneled through the SSH connection. Here is what the final configuration looks like in PuTTY:

Once the SSH tunnel works, it's time to set up our connection in Apache DS. Go to File -> New -> LDAP Connection and set the following:

Hit the "Check Network Parameter" button to verify the network connection.

Go to the next page of the wizard and provide your credentials. The default Bind DN and Bind password can be found in the Terminology Server release package under ./snow-owl/docker/.env.

Hit the "Check Authentication" button to verify your credentials. Hit Finish to complete the setup procedure.

All users and groups should be browseable now through the LDAP Browser view:

Grant user access

To grant access to a new user an LDAP entry has to be created. Go to the LDAP Browse view and right-click on the organization node, then New -> New Entry:

It is the easiest to use an existing entry as a template:

Leave everything as is on the Object Classes page, then hit Next. Fill in the new user's credentials:

On the final page, double click on the userPassword row and provide the user's password:

Hit Finish to add the user to the database.

Now we need to assign a role for the user. Before going forward, get ahold of the user's DN using the LDAP Browser view:

Select the desired role group in the Browser view and add a new attribute:

Select the attribute type uniqueMember and hit Finish:

Paste the user's DN as the value of the attribute and hit Enter to make your changes permanent:

Revoke user access

To revoke access the user has to be deleted from the list of users:

And also has to be removed from the role group:

Change credentials

To change either the first or last name, or the password of a user, just edit any of the attributes in the user editor:

File-based identity provider

There is a configuration file ./snow-owl/docker/configs/snowowl/users that contains the list of users with their credentials encrypted. The passwords are encrypted using the bcrypt hash algorithm (variant $2a$). This method of authentication should be used for testing or internal purposes only, users added here will have elevated privileges.

To apply any changes made to the users file the Terminology Server has to be restarted afterward.

Grant user access

To grant access the users file has to be amended with the new user and its credentials. There are several ways to encrypt a password using the bcrypt algorithm but here is one that is easy and available on most of the Linux variants. The package called htpasswd has to be installed:

htpasswd -nBC 10 my-new-username | head -n1 | sed 's/$2y/$2a/g' >> ./snow-owl/docker/configs/snowowl/users

It will prompt for the password and will amend the file with the new user at the end.

Revoke user access

Simply remove the user's line from the file and restart the service.

Change credentials

Remove the user's line from the file and regenerate the credentials according to the Grant user access section.

Content syndication

In version v8.9.0 a new content syndication feature has been introduced which allows data to be seamlessly moved between different Snow Owl Terminology Server deployments.

This functionality is useful when content created in a central deployment (upstream) needs to be distributed to one or more read-only downstream instances. The resource distribution is designed to be uni-directional and semi-automated where an actor has to configure any new downstream instances to be able to receive data from the central unit.

Configure upstream

To be able to access the upstream server and its content the following items are required:

the HTTP port of Elasticsearch has to be accessible for the downstream Snow Owl and Elasticsearch instances (configured via the http.port property, the default is 9200)
the REST API of Snow Owl has to be accessible for the downstream Snow Owl servers
an Elasticsearch API key with sufficient privileges for authentication and authorization
a Snow Owl API key with sufficient privileges for authentication and authorization
configure selected terminology resources as distributable

Access Elasticsearch

In case Snow Owl uses a self-hosted Elasticsearch instance the HTTP port can be opened by modifying the container settings in the docker-compose.yml file. Make sure to remove the localhost prefix from the port declaration (~~127.0.0.1:~~9200:9200)

docker-compose.yml

...
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:${ELASTICSEARCH_VERSION}
    container_name: elasticsearch
...
    ports:
      - "9200:9200"

When opening up a self-hosted Elasticsearch make sure to use strengthened security with secure HTTP and username/password access.

A detailed guide on Elasticsearch security can be found here:

In the case of a hosted Elasticsearch instance there is nothing to do, it will already be accessible from outside.

Access Snow Owl

The default reverse proxy configuration (shipped in the released package) exposes the Snow Owl REST API via the URL: http(s)://upstream-snow-owl-url/snowowl

Other than that no additional configuration is needed.

Obtain an Elasticsearch API key

Creating a new API key for Elasticsearch is either possible through its Api Key API or - in the case of a hosted instance - from within Kibana.

The content syndication operation requires the following permissions:

cluster privilege: monitor
index privilege: read

Here is an example request body for the Api Key API:

POST /_security/api_key
{
  "name": "syndication-api-key",
  "expiration": "30d",
  "role_descriptors": { 
    "syndicate-role": {
      "cluster": [
        "monitor"
      ],
      "indices": [
        {
          "names": [
            "*"
          ],
          "privileges": [
            "read"
          ]
        }
      ]
    }
  }
}

This request will return with the following response:

{
  "id" : "<token_id>",
  "name" : "syndication-api-key",
  "expiration" : 0,
  "api_key" : "<api_key>",
  "encoded" : "<encoded_api_key>"
}

Take note of the encoded API Key, which is the one that will be used later on.

To obtain an API key using Kibana, follow this guide with the same settings from above:

Obtain a Snow Owl API Key

To request an API key from the upstream Snow Owl Terminology Server the following REST API endpoint must be used:

To request an API key

POST https://upstream-snow-owl-url/snowowl/token

Request Body

Name

Type

Description

username*

String

The username to authenticate with

password*

String

The password belonging to the username

token

String

Previous token to re-new

expiration

String

Expiration interval, e.g. 1d or 2h

permissions

List<String>

List of permissions

{
    token: "<snow-owl-api-key>"
}

Select distributable resources

All three major terminology resource types can be configured as distributable. Resources have a settings map that can be updated via their specific REST API endpoints:

PUT /codesystems/{codeSystemId}
PUT /valuesets/{valueSetId}
PUT /conceptmaps/{conceptMapId}

A setting called distributable has to be set which either could be true or false. Here is an example update request to mark 'Example Code System' as distributable:

PUT /codesystems/example_codesystem_id
{
  "settings": {
    "distributable": true
  }
}

Configure downstream

Elasticsearch

There is one configuration property that must be set before provisioning a new downstream Snow Owl Terminology Server.

Any potential upstream Elasticsearch instance must be listed as an allowed source of information for the downstream Elasticsearch instances via a configuration parameter in the elasticsearch.yml file.

The property is called reindex.remote.whitelist :

elasticsearch.yml

...
http.port: 9200
...
reindex.remote.whitelist: ["upstream-elasticsearch-url.com:9200", "other-upstream-elasticsearch-url.com:9200"]

The whitelisted URL must contain the upstream HTTP port and must not contain the scheme.

Provision a new downstream server

Provisioning a new downstream server has the following prerequisites:

start with an empty dataset
collect all terminology resource identifiers that need to be syndicated
get all the necessary credentials to communicate with upstream
initiate the resource syndication and verify the result

Collect terminology resources for syndication

To populate a downstream server with terminology resources via an upstream source, one must collect the required resource identifiers or resource version identifiers beforehand.

Resource identifiers must be in their simple form, e.g.:

SNOMED-CT
ICD-10
LOINC

Resource version identifiers must be in the following form <resource_id>/<version_id>, e.g.:

SNOMED-CT/2020-01-31
ICD-10/v2019
LOINC/v2.72

To determine which resources are available for syndication, the following upstream REST API endpoint can be used. It returns an atom feed that consists of resource versions from where one can collect the required identifiers.

Retrieve syndication resource feed

GET https://upstream-snow-owl-url/snowowl/syndication/feed.xml

Retrieves the feed of all distributable resources

Query Parameters

Name

Type

Description

resource

List<String>

The resource identifier(s) to include in the feed

resourceType

List<String>

The types of resources to include in the feed (e.g. conceptmaps, valuesets, codesystems)

resourceUrl

List<String>

The URLs of the resources to include in the feed

packageTypes

List<String>

The types of packages to include in the feed. Only BINARY is supported at the moment

effectiveTime

String

The effective time value to match (yyyyMMdd) or an effective time range value to match (yyyyMMdd...yyyyMMdd), inclusive range

createdAt

Long

Exact match filter for the resource version created at field

createdAtFrom

Long

Greater than equal to filter for the resource version created at field

createdAtTo

String

Less than equal to filter for the resource version created at field

limit*

int

The maximum number of items to return

<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <id>urn:uuid:ddce3cd6-2efe-3142-9cce-62e73d3031ca</id>
  <title>Snow Owl® Terminology Server Syndication Feed</title>
  ...
  <entry>
    <id>valuesets/1234/V1.0</id>
    ...
    <title>Valueset example</title>
    <category term="BINARY" scheme="https://b2ihealthcare.com/snow-owl/syndication/binary/1.0.0" label="Binary index"/>
    ...
  </entry>
</feed>

It is not required to list all resource version identifiers for an already selected resource. E.g.:

If SNOMED-CT is selected as a resource, it is not required to select all its versions among the version resource identifiers.
If a specific version is selected (SNOMED-CT/2020-01-31) and the resource is not listed among the selected resources, then only versions created until 2020-01-31 will be syndicated

Syndicate resources

To kick off a syndication process the following parameters are required:

the list of resource identifiers
the list of resource version identifiers
the upstream Snow Owl URL without its REST API root context:
- e.g. https://upstream-snow-owl-url.com
the API key to authenticate with the upstream Snow Owl server
the upstream Elasticsearch URL, including the scheme and port:
- e.g. https://upstream-elasticsearch-url.com:9200
the API key to authenticate with the upstream Elasticsearch

When there are no existing resources on the downstream server yet, at least one resource identifier or one resource version identifier must be selected.

Snow Owl will resolve all resource dependencies and will handle syndication requests rigorously. If e.g. a Value Set depends on a specific SNOMED CT version and that version is not among the selected resources - or does not exist on the downstream server yet - the syndication run will fail to note that there is a missing dependency. It is always required to list all dependencies that the selected resources have for a given syndication run.

The above parameters should be fed to the following downstream Snow Owl REST API endpoint:

Syndicate resource(s)

POST https://downstream-snow-owl-url/snowowl/syndication/syndicate

Syndicate resources from a remote Snow Owl instance. In case no resource identifiers are provided, all existing resources will be syndicated to their latest version.

Request Body

Name

Type

Description

resource

List<String>

List of resource identifiers

version

List<String>

List of version resource identifiers

upstreamUrl*

String

The URL of the upstream Snow Owl

upstreamToken*

String

API key for the upstream Snow Owl

upstreamDataUrl*

String

The URL of the upstream Elasticsearch

upstreamDataToken*

String

API key for the upstream Elasticsearch

The syndication process starts in the background as an asynchronous job. It can be tracked by calling the following endpoint using the job identifier returned in the Location:

Retrieve syndication job

GET https://downstream-snow-owl-url/snowowl/syndication/{id}

Returns the specified syndication run's configuration and status.

Path Parameters

Name

Type

Description

id*

String

The identifier of a syndication run

{
    // Response
}

The returned result object will contain all information related to the given syndication run:

status of the run (RUNNING, FINISHED, FAILED)
list of successfully syndicated resource versions
additional details about created or updated Elasticsearch indices

Examples of resource selection

Code Systems

There is a need to syndicate the SNOMED-CT US extension. It depends on the SNOMED CT International version 2021-01-31. Provide the following resource identifier and resource version identifier configuration:

{
  "resource": "SNOMED-CT-US",
  "version": "SNOMED-CT/2021-01-31"
}

This will syndicate all versions of SNOMED-CT-US and all international versions until 2021-01-31.

If the configuration is changed to:

{
  "resource": "SNOMED-CT-US, SNOMED-CT"
  "version": ""
}

This will syndicate all versions of SNOMED-CT-US and SNOMED-CT international, including all international versions even after 2021-01-31.

Value Sets

There is a Value Set with an identifier of VS and members from SNOMED-CT/2020-07-31:

{
  "resource": "VS"
  "version": "SNOMED-CT/2020-07-31"
}

Concept Maps

There is a Concept Map with an identifier of CM mapping concepts between LOINC/v2.72 and ICD-10/v2019:

{
  "resource": "CM"
  "version": "LOINC/v2.72, ICD-10/v2019"
}

Keeping a downstream server up-to-date

If a given downstream server already contains the desired resources and the goal is to keep the content up-to-date, it is not required to fill in the resource and resource version identifiers for the syndication request.

One can call the POST /syndication/syndicate endpoint with all the credentials and URLs but without specifying any resource or version identifier. The server will automatically determine - based on the set of existing downstream resources - if there are any new resource versions available for syndication.

To check whether there are any updates available, there is an endpoint that can be called:

Retrieve a list of resource versions which are available for syndication

GET https://downstream-snow-owl-url/snowowl/syndication/list

Returns the full list of resource versions to be syndicated based on the search criteria. If no filters are provided updates are calculated for all existing resources.

Query Parameters

Name

Type

Description

resource

List<String>

The resource identifier(s) to syndicate, e.g. SNOMEDCT (== latest version)

version

List<String>

The version identifier(s) to syndicate, e.g. SNOMEDCT/2022-07-31

upstreamUrl*

String

The URL of the upstream Snow Owl server

upstreamToken*

String

The token to authenticate with the upstream Snow Owl server

limit*

int

The number of resource versions to return if there are any

{
    "items": [
        {
            "id": "SNOMED-CT/2022-01-31",
            "version": "2022-01-31",
            "description": "2022-01-31",
            "effectiveTime": "2022-01-31",
            "resource": "codesystems/SNOMED-CT"
        },
        {
            "id": "SNOMED-CT/2022-07-31",
            "version": "2022-07-31",
            "description": "2022-07-31",
            "effectiveTime": "2022-07-31",
            "resource": "codesystems/SNOMED-CT"
        }
    ]
    "limit": 50,
    "total": 2
}

If there are any updates this endpoint will return a list of versions, if there are none it will return an empty result.

Content syndication

In version v8.9.0 a new content syndication feature has been introduced which allows data to be seamlessly moved between different Snow Owl Terminology Server deployments.

Configure upstream

To be able to access the upstream server and its content the following items are required:

the HTTP port of Elasticsearch has to be accessible for the downstream Snow Owl and Elasticsearch instances (configured via the http.port property, the default is 9200)
the REST API of Snow Owl has to be accessible for the downstream Snow Owl servers
an Elasticsearch API key with sufficient privileges for authentication and authorization
a Snow Owl API key with sufficient privileges for authentication and authorization
configure selected terminology resources as distributable

Access Elasticsearch

docker-compose.yml

...
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:${ELASTICSEARCH_VERSION}
    container_name: elasticsearch
...
    ports:
      - "9200:9200"

When opening up a self-hosted Elasticsearch make sure to use strengthened security with secure HTTP and username/password access.

A detailed guide on Elasticsearch security can be found here:

In the case of a hosted Elasticsearch instance there is nothing to do, it will already be accessible from outside.

Access Snow Owl

The default reverse proxy configuration (shipped in the released package) exposes the Snow Owl REST API via the URL: http(s)://upstream-snow-owl-url/snowowl

Other than that no additional configuration is needed.

Obtain an Elasticsearch API key

Creating a new API key for Elasticsearch is either possible through its Api Key API or - in the case of a hosted instance - from within Kibana.

The content syndication operation requires the following permissions:

cluster privilege: monitor
index privilege: read

Here is an example request body for the Api Key API:

POST /_security/api_key
{
  "name": "syndication-api-key",
  "expiration": "30d",
  "role_descriptors": { 
    "syndicate-role": {
      "cluster": [
        "monitor"
      ],
      "indices": [
        {
          "names": [
            "*"
          ],
          "privileges": [
            "read"
          ]
        }
      ]
    }
  }
}

This request will return with the following response:

{
  "id" : "<token_id>",
  "name" : "syndication-api-key",
  "expiration" : 0,
  "api_key" : "<api_key>",
  "encoded" : "<encoded_api_key>"
}

Take note of the encoded API Key, which is the one that will be used later on.

To obtain an API key using Kibana, follow this guide with the same settings from above:

Obtain a Snow Owl API Key

To request an API key from the upstream Snow Owl Terminology Server the following REST API endpoint must be used:

To request an API key

POST https://upstream-snow-owl-url/snowowl/token

Request Body

Name

Type

Description

username*

String

The username to authenticate with

password*

String

The password belonging to the username

token

String

Previous token to re-new

expiration

String

Expiration interval, e.g. 1d or 2h

permissions

List<String>

List of permissions

{
    token: "<snow-owl-api-key>"
}

Select distributable resources

All three major terminology resource types can be configured as distributable. Resources have a settings map that can be updated via their specific REST API endpoints:

PUT /codesystems/{codeSystemId}
PUT /valuesets/{valueSetId}
PUT /conceptmaps/{conceptMapId}

A setting called distributable has to be set which either could be true or false. Here is an example update request to mark 'Example Code System' as distributable:

PUT /codesystems/example_codesystem_id
{
  "settings": {
    "distributable": true
  }
}

Configure downstream

Elasticsearch

There is one configuration property that must be set before provisioning a new downstream Snow Owl Terminology Server.

The property is called reindex.remote.whitelist :

elasticsearch.yml

...
http.port: 9200
...
reindex.remote.whitelist: ["upstream-elasticsearch-url.com:9200", "other-upstream-elasticsearch-url.com:9200"]

The whitelisted URL must contain the upstream HTTP port and must not contain the scheme.

Provision a new downstream server

Provisioning a new downstream server has the following prerequisites:

start with an empty dataset
collect all terminology resource identifiers that need to be syndicated
get all the necessary credentials to communicate with upstream
initiate the resource syndication and verify the result

Collect terminology resources for syndication

To populate a downstream server with terminology resources via an upstream source, one must collect the required resource identifiers or resource version identifiers beforehand.

Resource identifiers must be in their simple form, e.g.:

SNOMED-CT
ICD-10
LOINC

Resource version identifiers must be in the following form <resource_id>/<version_id>, e.g.:

SNOMED-CT/2020-01-31
ICD-10/v2019
LOINC/v2.72

Retrieve syndication resource feed

GET https://upstream-snow-owl-url/snowowl/syndication/feed.xml

Retrieves the feed of all distributable resources

Query Parameters

Name

Type

Description

resource

List<String>

The resource identifier(s) to include in the feed

resourceType

List<String>

The types of resources to include in the feed (e.g. conceptmaps, valuesets, codesystems)

resourceUrl

List<String>

The URLs of the resources to include in the feed

packageTypes

List<String>

The types of packages to include in the feed. Only BINARY is supported at the moment

effectiveTime

String

The effective time value to match (yyyyMMdd) or an effective time range value to match (yyyyMMdd...yyyyMMdd), inclusive range

createdAt

Long

Exact match filter for the resource version created at field

createdAtFrom

Long

Greater than equal to filter for the resource version created at field

createdAtTo

String

Less than equal to filter for the resource version created at field

limit*

int

The maximum number of items to return

<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <id>urn:uuid:ddce3cd6-2efe-3142-9cce-62e73d3031ca</id>
  <title>Snow Owl® Terminology Server Syndication Feed</title>
  ...
  <entry>
    <id>valuesets/1234/V1.0</id>
    ...
    <title>Valueset example</title>
    <category term="BINARY" scheme="https://b2ihealthcare.com/snow-owl/syndication/binary/1.0.0" label="Binary index"/>
    ...
  </entry>
</feed>

It is not required to list all resource version identifiers for an already selected resource. E.g.:

If SNOMED-CT is selected as a resource, it is not required to select all its versions among the version resource identifiers.
If a specific version is selected (SNOMED-CT/2020-01-31) and the resource is not listed among the selected resources, then only versions created until 2020-01-31 will be syndicated

Syndicate resources

To kick off a syndication process the following parameters are required:

the list of resource identifiers
the list of resource version identifiers
the upstream Snow Owl URL without its REST API root context:
- e.g. https://upstream-snow-owl-url.com
the API key to authenticate with the upstream Snow Owl server
the upstream Elasticsearch URL, including the scheme and port:
- e.g. https://upstream-elasticsearch-url.com:9200
the API key to authenticate with the upstream Elasticsearch

When there are no existing resources on the downstream server yet, at least one resource identifier or one resource version identifier must be selected.

The above parameters should be fed to the following downstream Snow Owl REST API endpoint:

Syndicate resource(s)

POST https://downstream-snow-owl-url/snowowl/syndication/syndicate

Syndicate resources from a remote Snow Owl instance. In case no resource identifiers are provided, all existing resources will be syndicated to their latest version.

Request Body

Name

Type

Description

resource

List<String>

List of resource identifiers

version

List<String>

List of version resource identifiers

upstreamUrl*

String

The URL of the upstream Snow Owl

upstreamToken*

String

API key for the upstream Snow Owl

upstreamDataUrl*

String

The URL of the upstream Elasticsearch

upstreamDataToken*

String

API key for the upstream Elasticsearch

The syndication process starts in the background as an asynchronous job. It can be tracked by calling the following endpoint using the job identifier returned in the Location:

Retrieve syndication job

GET https://downstream-snow-owl-url/snowowl/syndication/{id}

Returns the specified syndication run's configuration and status.

Path Parameters

Name

Type

Description

id*

String

The identifier of a syndication run

{
    // Response
}

The returned result object will contain all information related to the given syndication run:

status of the run (RUNNING, FINISHED, FAILED)
list of successfully syndicated resource versions
additional details about created or updated Elasticsearch indices

Examples of resource selection

Code Systems

{
  "resource": "SNOMED-CT-US",
  "version": "SNOMED-CT/2021-01-31"
}

This will syndicate all versions of SNOMED-CT-US and all international versions until 2021-01-31.

If the configuration is changed to:

{
  "resource": "SNOMED-CT-US, SNOMED-CT"
  "version": ""
}

This will syndicate all versions of SNOMED-CT-US and SNOMED-CT international, including all international versions even after 2021-01-31.

Value Sets

There is a Value Set with an identifier of VS and members from SNOMED-CT/2020-07-31:

{
  "resource": "VS"
  "version": "SNOMED-CT/2020-07-31"
}

Concept Maps

There is a Concept Map with an identifier of CM mapping concepts between LOINC/v2.72 and ICD-10/v2019:

{
  "resource": "CM"
  "version": "LOINC/v2.72, ICD-10/v2019"
}

Keeping a downstream server up-to-date

To check whether there are any updates available, there is an endpoint that can be called:

Retrieve a list of resource versions which are available for syndication

GET https://downstream-snow-owl-url/snowowl/syndication/list

Returns the full list of resource versions to be syndicated based on the search criteria. If no filters are provided updates are calculated for all existing resources.

Query Parameters

Name

Type

Description

resource

List<String>

The resource identifier(s) to syndicate, e.g. SNOMEDCT (== latest version)

version

List<String>

The version identifier(s) to syndicate, e.g. SNOMEDCT/2022-07-31

upstreamUrl*

String

The URL of the upstream Snow Owl server

upstreamToken*

String

The token to authenticate with the upstream Snow Owl server

limit*

int

The number of resource versions to return if there are any

{
    "items": [
        {
            "id": "SNOMED-CT/2022-01-31",
            "version": "2022-01-31",
            "description": "2022-01-31",
            "effectiveTime": "2022-01-31",
            "resource": "codesystems/SNOMED-CT"
        },
        {
            "id": "SNOMED-CT/2022-07-31",
            "version": "2022-07-31",
            "description": "2022-07-31",
            "effectiveTime": "2022-07-31",
            "resource": "codesystems/SNOMED-CT"
        }
    ]
    "limit": 50,
    "total": 2
}

If there are any updates this endpoint will return a list of versions, if there are none it will return an empty result.