Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
The technology stack behind the Terminology Server consists of the following components:
The Terminology Server application
Elasticsearch as the data layer
Optional: Authentication/Authorization service
Either an OpenID Connect/OAuth2.0 compatible external service with JSON Web Token support
Or an LDAP-compliant directory service
Optional: A reverse proxy handling the requests towards the REST API
Outgoing communication from the Terminology Server goes via:
HTTP(s) towards Elasticsearch and to the external OpenID Connect/OAuth2 authorization server
LDAP(s) towards the A&A service
Incoming communication is handled through the HTTP port 8080.
A selected reverse proxy channels all incoming traffic through to the Terminology Server.
Elasticsearch versions supported by each major version of Snow Owl:
The Elasticsearch cluster can either be:
a co-located, single-node, self-hosted cluster
a managed Elasticsearch cluster hosted by elastic.co
Having a co-located Elasticsearch service next to the Terminology Server directly impacts the hardware requirements. See our list of recommended hardware on the next page.
For authorization and authentication, the application supports external OpenID Connect/OAuth2 compatible authorization services (eg. Auth0) and any traditional LDAP Directory Servers. We recommend starting with OpenLDAP and evolving to other solutions later because it is easy to set up and maintain while keeping Snow Owl's user data isolated from any other A&A services.
A reverse proxy, such as NGINX is recommended to be utilized between the Terminology Server and either the intranet or the internet. This will increase security and help with channeling REST API requests appropriately.
With a preconfigured domain name and DNS record, the default installation package can take care of requesting and maintaining the necessary certificates for secure HTTP. See the details of this in the Configuration section.
For simplifying the initial setup process we are shipping the Terminology Server with a default configuration of a co-located Elasticsearch cluster, a pre-populated OpenLDAP server, and an NGINX reverse proxy with the ability to opt-in for an SSL certificate.
Snow Owl 7.x | Snow Owl 8.x | Snow Owl 9.x | |
---|---|---|---|
Elasticsearch 7.x
Elasticsearch 8.x
(deprecated)
For installations where Snow Owl and Elasticsearch are co-located, we recommend the following hardware specifications:
For installations where Snow Owl connects to a managed Elasticsearch cluster at elastic.co we recommend the following hardware specifications:
In case Snow Owl is planned to be used with resource-intensive workloads (large code system upgrades, frequent classification of terminologies, bulk authoring) an 8 vCPU / 4 GB Elasticsearch cluster might not be sufficient. Consider increasing the size of the hosted Elasticsearch instance gradually, so that finding the sweet spot will be straightforward.
Here are a few examples of which Virtual Machine types could be used for hosting the Terminology Server at the three most popular Cloud providers (including but not limited to):
Snow Owl & co-located ES | Cloud | Dedicated |
---|---|---|
Snow Owl | Cloud | Dedicated |
---|---|---|
Elasticsearch @ elastic.co | |
---|---|
Cloud Provider | VM type |
---|---|
vCPU
8
8
Memory
32 GB
32 GB
I/O performance
>= 5000 IOPS SSD
>= 5000 IOPS SSD
Disk space
200 GB
200 GB
vCPU
8 (compute optimized)
8
Memory
16 GB
16 GB
I/O performance
OS: balanced disk
TS file storage: local SSD
OS: HDD / SSD
TS file storage: SSD
Disk space
OS: 20 GB
TS file storage: 100 GB
OS: 20 GB
TS file storage: 100 GB
vCPU
8 (compute optimized)
Memory
4 GB
I/O performance
handled by elastic.co
Disk space
180 GB
GCP
AWS
Azure
Here is the list of distributions that we suggest in the order of recommendation:
Ubuntu LTS releases
Debian LTS releases
CentOS 7 (deprecated)
It is possible to install the server release package on other distributions but bear in mind that there might be limitations.
Before starting the production deployment of the Terminology Server make sure that the following packages are installed and configured properly:
Docker Engine
ability to execute bash scripts
In case a reverse proxy is used, the Terminology Server requires two ports to be opened either towards the intranet or the internet (depending on usage):
http
: port 80
https
: port 443
In case there is no reverse proxy installed, the following port must be opened to be able to access the server's REST API:
http
: port 8080
The Terminology Server is recommended to be installed on x86_64 / amd64 Linux operating systems where Docker Engine is available. See by Docker.
The release package contains everything that is required to use a co-located Elasticsearch instance by default. Only proceed with these steps if a remote Elasticsearch cluster is required.
To configure the Terminology Server to work with a managed Elasticsearch cluster two settings require attention.
First, the local Elasticsearch container and all its configurations should be removed from the docker-compose.yml file. Once that is done, we have to tell the Terminology Server where to find the cluster. This can be set in the file ./snow-owl/docker/configs/snowowl/snowowl.yml
:
The Snow Owl Terminology Server leverages Elasticssearch's synonym filters. To have this feature work properly with a managed Elasticsearch cluster our custom dictionary has to be uploaded and configured. The synonym file can be found in the release package under ./snow-owl/docker/configs/elasticsearch/synonym.txt
. This file needs to be compressed as an zip
archive by following this structure:
For the managed Elasticsearch instance this zip file needs to be configured as a bundle extension. The steps required are covered in this guide in great detail.
Once the bundle is configured and the cluster is up we can (re)start the docker stack. In case there are any troubles the Terminology Server will refuse to initialize and let you know what the problem is in its log files.
Terminology Server releases are shared with customers through custom download URLs. The downloaded artifact is a Linux (tar.gz) archive that contains:
an initial folder structure
the configuration files for all services
a docker-compose.yml
file that brings together the entire technology stack to run and manage the service
the credentials required to pull our proprietary docker images
As a best practice, it is advised to extract the content of the archive under /opt
. So the deployment folder will be /opt/snow-owl
. The docker-compose setup will rely on this path, however, if required it can be changed by editing the ./snow-owl/docker/.env
file later on (see DEPLOYMENT_FOLDER
environment variable).
When decompressing the archive it is important to use the --same-owner
and --preserve-permissions
options so the docker containers can access the files and folders appropriately.
The next page will describe the content of the release package in more detail.
In certain cases, a pre-built dataset is also shipped together with the Terminology Server. This is to ease the initial setup procedure and get going fast.
This method is only applicable to deployments where the Elasticsearch cluster is co-located with the Terminology Server.
To load data into a managed Elasticsearch cluster, there are several options:
use snapshot-restore
use Snow Owl to rebuild the data to the remote cluster
These datasets are the compressed form of the Elasticsearch data folder which follows the same structure. Except for having a top folder called indexes
. This is the same folder as in ./snow-owl/resources/indexes
. So to be able to load the dataset one should just extract the contents of the dataset archive to this path.
Make sure to validate the file ownership of the indexes folder after decompression. Elasticsearch requires UID=1000 and GID=0 to be set for its data folder.
Having secure HTTP in case the Terminology Server is a public-facing instance is definitely a must. For such cases, we are providing a pre-configured environment and a convenience script to acquire the necessary SSL certificate.
SSL certificate retrieval and renewal are managed by certbot, the official ACME client recommended by Let's Encrypt.
To be able to obtain an SSL certificate the following requirements must be met:
docker and docker compose are installed
the server instance has a public IP address
a DNS A record is configured for the desired domain name routing to the server's IP address
For the sake of example let's say the target domain name is snow-owl.b2ihealthcare.com
.
Go to the sub-folder called ./snow-owl/docker/configs/cert
. Make sure the init-certificate.sh
script has permission to be executable and get some details about its parameters:
As you can see -d
is used for specifying the domain name, and -e
is used for specifying a contact email address (optional). Now execute the script with our example parameters:
Script execution will overwrite the files under ./snow-owl/docker/docker-compose.yml
and ./snow-owl/docker/configs/nginx/nginx.conf
. Make a note of any changes if required.
After successful execution, a new folder is created ./snow-owl/cert
which contains all the certificate files required by NGINX. The docker-compose.yml file is also amended with a piece of code that guarantees automatic renewal of the certificate:
At this point everything is prepared for having secure HTTP, let's see what else needs to be configured before spinning up the service.
When a new Snow Owl Terminology Server release is available we recommend performing the following steps.
New releases are going to be distributed the same way: a docker stack and its configuration within an archive.
It is advised to decompress the new release files to a temporary folder and compare the contents of ./snow-owl/docker
.
The changes usually are restricted to version numbers in the .env
file. In such cases, it is equally acceptable to overwrite the contents of the ./snow-owl/docker
folder as is or cherry-pick the necessary modifications by hand.
Once the new version of the files is in place it is sufficient to just issue the following commands, an explicit stop of the service is not even required (in the folder ./snow-owl/docker
):
Do not usedocker compose restart
because it won't pick up any .yml or .env file changes. See the explanation in the official Docker guide.
Using the custom backup container it is possible to restore:
the Elasticsearch indices
the OpenLDAP database (if present)
To restore any of the data the following steps have to be performed:
stop Snow Owl, Elasticsearch, and the OpenLDAP containers (in the folder ./snow-owl/docker
):
(re)move the contents of the old Elasticsearch data folder:
restart the Elasticsearch container only (keep Snow Owl stopped):
use the backup container's terminal and execute the restore script:
without any parameters, if only the Elasticsearch indices have to be restored
with parameter -l
in case the Elasticsearch indices and the OpenLDAP database have to be restored at the same time
the script will list all available backups and prompts for selection:
enter the numerical identifier of the backup to restore and wait until the process finishes
exit the backup container and restart all containers:
In case only the contents of the OpenLDAP server have to be restored, it is sufficient to just extract the contents of the backup archive to ./snow-owl/ldap
and restart the container.
The Snow Owl Terminology Server employs two distinct methods for user management. The primary authentication and authorization service is the LDAP Directory Server, while a secondary option is a file-based database strictly utilized for administrative purposes. The following methods can be applied when granting or revoking user access.
This is only applicable to the default deployment setup where a co-located OpenLDAP server is used alongside the Terminology Server.
Apache Directory Studio is an open-source, free application. It is available for different platforms (Windows, macOS, and Linux).
The OpenLDAP server uses port 389 for communication. This is the port that needs to be tunneled through the SSH connection. Here is what the final configuration looks like in PuTTY:
Once the SSH tunnel works, it's time to set up our connection in Apache DS. Go to File -> New -> LDAP Connection and set the following:
Hit the "Check Network Parameter" button to verify the network connection.
Go to the next page of the wizard and provide your credentials. The default Bind DN and Bind password can be found in the Terminology Server release package under ./snow-owl/docker/.env
.
Hit the "Check Authentication" button to verify your credentials. Hit Finish to complete the setup procedure.
All users and groups should be browseable now through the LDAP Browser view:
To grant access to a new user an LDAP entry has to be created. Go to the LDAP Browse view and right-click on the organization node, then New -> New Entry:
It is the easiest to use an existing entry as a template:
Leave everything as is on the Object Classes page, then hit Next. Fill in the new user's credentials:
On the final page, double-click on the userPassword row and provide the user's password:
Hit Finish to add the user to the database.
Now we need to assign a role for the user. Before going forward, get ahold of the user's DN using the LDAP Browser view:
Select the desired role group in the Browser view and add a new attribute:
Select the attribute type uniqueMember
and hit Finish:
Paste the user's DN as the value of the attribute and hit Enter to make your changes permanent:
To revoke access the user has to be deleted from the list of users:
And also has to be removed from the role group:
To change either the first or last name, or the password of a user, just edit any of the attributes in the user editor:
There is a configuration file ./snow-owl/docker/configs/snowowl/users
that contains the list of users with their credentials encrypted. This method of authentication should be used for testing or internal purposes only, users added here will have elevated privileges.
To apply any changes made to the users
file the Terminology Server has to be restarted afterward.
To grant access the users
file has to be amended with the new user and its credentials. There are several ways to encrypt a password but here is one that is easy and available on most of the Linux variants. The package called htpasswd
has to be installed:
It will prompt for the password and will amend the file with the new user at the end.
Simply remove the user's line from the file and restart the service.
Full list of steps to perform before spinning up the service:
Extract the Terminology Server release archive to a folder. E.g. /opt/snow-owl
(Optional) Obtain an SSL certificate
Make sure a DNS A record is routed to the host's public IP address
Go into the folder ./snow-owl/docker/cert
Execute the ./init-certificate.sh
script:
(Optional) Configure access for managed Elasticsearch Cluster (elastic.co)
(Optional) Extract dataset to ./snow-owl/resources
where folder structure should look like ./snow-owl/resources/indexes/nodes/0
at the end.
Verify file ownership to be UID=1000 and GID=0:
Check any credentials or settings that need to be changed in ./snow-owl/docker/.env
Authenticate with our private docker registry while in the folder ./snow-owl/docker
:
Issue a pull (in folder ./snow-owl/docker
)
Spin up the service (in the folder ./snow-owl/docker
)
Verify that the REST API of the Terminology Server is available at:
With SSL: https://snow-owl.example.com/snowowl
Without SSL: http://hostname:8080/snowowl
Verify that the server and cluster status is GREEN by querying the following REST API endpoint:
With SSL:
Without SSL:
There are several ways to access and manage an OpenLDAP server, hereby we will only describe one of them, through the .
Before accessing the LDAP database there is one technical prerequisite to satisfy. The OpenLDAP server has to be accessible from the machine Apache Directory Studio is installed. The best and most secure way to achieve that is to set up an SSH tunnel. Follow to an article that describes how to configure an SSH tunnel using and Windows.
Remove the user's line from the file and regenerate the credentials according to the section.
Enjoy using the Snow Owl Terminology Server
This method is only applicable to deployments where the Elasticsearch cluster is co-located with the Snow Owl Terminology Server.
A managed Elasticsearch service will automatically configure a snapshot policy upon creation. See details here.
The Terminology Server release package contains a built-in solution to perform rolling and permanent data backups. The docker stack has a specialized container (called snow-owl-backup
) that is responsible for creating scheduled backups of:
the Elasticsearch indices
the OpenLDAP database (if present)
For the Elasticsearch indices, the backup container uses the Snapshot API. Snapshots are labeled in a predefined format with timestamps. E.g. snowowl-daily-20220324030001
The OpenLDAP database is backed up by compressing the contents of the folder under ./snow-owl/ldap
. Filenames are generated using the name of the corresponding Elasticsearch snapshot. E.g. snowowl-daily-20220324030001.tar.gz
.
Backup Window: when a backup operation is running the Terminology Server blocks all write operations on the Elasticsearch indices. This is to prevent data loss and have consistent backups.
Backup Duration: the very first backup of an Elasticsearch cluster takes a bit more time (depending on the size and I/O performance but between 20 minutes - 40 minutes), subsequent backups should take significantly less: 1 - 5 minutes.
Daily backups are rolling backups, scheduled, and cleaned up based on the settings specified in the ./snow-owl/docker/.env
file. Here is a summary of the important settings that could be changed.
To store backups redundantly it is advised to mount a remote file share to a local path on the host. By default, this folder is configured to be at ./snow-owl/backup
. It contains:
the snapshot files of the Elasticsearch cluster
the backup files of the OpenLDAP database
extra configuration files
Make sure the remote file share has enough free space to store around double the ./snow-owl/resources/indexes
folder.
Backup jobs are scheduled by cron, so cron expressions can be defined here to specify the time a daily backup should happen.
This is used to tell the backup container how many daily backups must be kept.
Let's say we have an external file share mounted to /mnt/external_folder
. There is a need to create daily backups after each working day, during the night at 2:00 am. Only the last two-weeks-worth of data should be kept (assuming 5 working days each week).
It is also possible to perform backups occasionally, e.g. before versioning an important SNOMED CT release or before a Terminology Server version upgrade. These backups are kept until manually removed.
To create such backups the following command needs to be executed using the backup container's terminal:
The script will create a snapshot backup of the Elasticsearch data with a label snowowl-my-backup-label-20220405030002
and an archive that contains the database of the OpenLDAP server with the name snowowl-my-backup-label-20220405030002.tar.gz
.
This section includes information on how to set up Snow Owl and get it running using standalone installation methods
Snow Owl is built using Java and requires at least Java 17 to run. Only Oracle’s Java and the OpenJDK are supported.
We recommend installing the latest version in the Java 17 release series. We recommend using a supported LTS version of Java.
The version of Java that Snow Owl will use can be configured by setting the JAVA_HOME environment variable.
Snow Owl is provided in the following package formats:
Package
Description
zip/tar.gz
The zip
and tar.gz
packages are suitable for installation on any system and are the easiest choice for getting started with Snow Owl on most systems.
Install Snow Owl with tar.gz
or zip
rpm
The rpm
package is suitable for installation on Red Hat, Centos, SLES, OpenSuSE and other RPM-based systems. RPMs may be downloaded from the Downloads section.
Install Snow Owl with RPM
deb
The deb
package is suitable for Debian, Ubuntu, and other Debian-based systems. Debian packages may be downloaded from the Downloads section.
Install Snow Owl with Debian Package
Snow Owl is provided as a .zip
and as a .tar.gz
package. These packages can be used to install Snow Owl on any system.
The latest stable version of Snow Owl can be found on the Snow Owl Releases page.
zip
packageThe .zip
archive for Snow Owl can be downloaded and installed as follows:
.tar.gz
packageThe .tar.gz
archive for Snow Owl can be downloaded and installed as follows:
Snow Owl can be started from the command line as follows:
By default, Snow Owl runs in the foreground, prints its logs to the standard output (stdout), and can be stopped by pressing Ctrl-C.
You can test that your instance is running by sending an HTTP request to Snow Owl's status endpoint:
which should give you a response like this:
You can send the Snow Owl process to the background using a combination of nohup
and the &
character:
Log messages can be found in the $SO_HOME/serviceability/logs/
directory.
To shut down Snow Owl, you can kill the process ID directly:
or using the provided shutdown script:
.zip
and .tar.gz
archives:The .zip
and .tar.gz
packages are entirely self-contained. All files and directories are, by default, contained within $SO_HOME
— the directory created when unpacking the archive.
This is very convenient because you don’t have to create any directories to start using Snow Owl, and uninstalling Snow Owl is as easy as removing the $SO_HOME
directory. However, it is advisable to change the default locations of the config directory, the data directory, and the logs directory so that you do not delete important data later on.
Type | Description | Default Location | Setting |
---|---|---|---|
home
Snow Owl home directory or $SO_HOME
Directory created by unpacking the archive
bin
Binary scripts including startup/shutdown to start/stop the instance
$SO_HOME/bin
conf
Configuration files including snowowl.yml
$SO_HOME/configuration
data
The location of the data files and resources.
$SO_HOME/resources
path.data
logs
Log files location.
$SO_HOME/serviceability/logs
This is only relevant if you are running Snow Owl with an embedded Elasticsearch and not connecting it to an existing cluster.
Snow Owl (with embedded Elasticsearch) uses a lot of file descriptors or file handles. Running out of file descriptors can be disastrous and will most probably lead to data loss. Make sure to increase the limit on the number of open files descriptors for the user running Snow Owl to 65,536 or higher.
For the .zip
and .tar.gz
packages, set ulimit -n 65536
as root before starting Snow Owl, or set nofile
to 65536
in /etc/security/limits.conf
.
RPM and Debian packages already default the maximum number of file descriptors to 65536
and do not require further configuration.
The Debian package for Snow Owl can be downloaded from the Downloads section. It can be used to install Snow Owl on any Debian-based system such as Debian and Ubuntu.
Use the update-rc.d command to configure Snow Owl to start automatically when the system boots up:
Snow Owl can be started and stopped using the service command:
If Snow Owl fails to start for any reason, it will print the reason for failure to STDOUT. Log files can be found in /var/log/snowowl/
.
To configure Snow Owl to start automatically when the system boots up, run the following commands:
Snow Owl can be started and stopped as follows:
These commands provide no feedback as to whether Snow Owl was started successfully or not. Instead, this information will be written in the log files located in /var/log/snowowl/
.
You can test that your Snow Owl instance is running by sending an HTTP request to:
which should give you a response something like this:
Snow Owl defaults to using /etc/snowowl
for runtime configuration. The ownership of this directory and all files in this directory are set to root:snowowl
on package installation and the directory has the setgid
flag set so that any files and subdirectories created under /etc/snowowl
are created with this ownership as well (e.g., if a keystore is created using the keystore tool). It is expected that this be maintained so that the Snow Owl process can read the files under this directory via the group permissions.
NOTE: Distributions that use systemd
require that system resource limits be configured via systemd
rather than via the /etc/sysconfig/snowowl
file.
The Debian package places config files, logs, and the data directory in the appropriate locations for a Debian-based system:
Snow Owl loads its configuration from the /etc/snowowl/snowowl.yml
file by default. The format of this config file is explained in .
Type | Description | Default Location | Setting |
---|
Snow Owl uses a number of thread pools for different types of operations. It is important that it is able to create new threads whenever needed. Make sure that the number of threads that the Snow Owl user can create is at least 4096
.
This can be done by setting ulimit -u 4096
as root before starting Snow Owl, or by setting nproc
to 4096
in /etc/security/limits.conf
.
The package distributions when run as services under systemd will configure the number of threads for the Snow Owl process automatically. No additional configuration is required.
home | Snow Owl home directory or |
|
bin | Binary scripts including startup/shutdown to start/stop the instance |
|
conf | Configuration files including |
|
data | The location of the data files and resources. | /var/lib/snowowl | path.data |
logs | Log files location. | /var/log/snowowl |
Snow Owl uses a mmapfs
directory by default to store its data. The default operating system limits on mmap counts is likely to be too low, which may result in out of memory exceptions.
On Linux, you can increase the limits by running the following command as root:
To set this value permanently, update the vm.max_map_count
setting in /etc/sysctl.conf
. To verify after rebooting, run sysctl vm.max_map_count
.
The RPM and Debian packages will configure this setting automatically. No further configuration is required.
By default, Snow Owl includes the OSS version of Elasticsearch and runs it in embedded mode to store terminology data and make it available for search. This is convenient for single-node environments (eg. for evaluation, testing and development), but it might not be sufficient when you go into production.
To configure Snow Owl to connect to an Elasticsearch cluster, change the clusterUrl
property in the snowowl.yml
configuration file:
The value for this setting should be a valid HTTP URL point to the HTTP API of your Elasticsearch cluster, which by default runs on port 9200
.
If you are using the .zip
or .tar.gz
archives, the data and logs directories are sub-folders of $SO_HOME
. If these important folders are left in their default locations, there is a high risk of them being deleted while upgrading Snow Owl to a new version.
In production use, you will almost certainly want to change the locations of the data and log folders.
The RPM and Debian distributions already use custom paths for data and logs.
To allow clients to connect to Snow Owl, make sure you open access to the following ports:
8080/TCP:: Used by Snow Owl Server's REST API for HTTP access
8443/TCP:: Used by Snow Owl Server's REST API for HTTPS access
2036/TCP:: Used by the Net4J binary protocol connecting Snow Owl clients to the server
By default, Snow Owl tells the JVM to use a heap with a minimum and maximum size of 2 GB. When moving to production, it is important to configure heap size to ensure that Snow Owl has enough heap available.
To configure the heap size settings, change the -Xms
and -Xmx
settings in the SO_JAVA_OPTS
environment variable.
The value for these setting depends on the amount of RAM available on your server and whether you are running Elasticsearch on the some node as Snow Owl (either embedded or as a service) or running it in its own cluster. Good rules of thumb are:
Set the minimum heap size (Xms
) and maximum heap size (Xmx
) to be equal to each other.
Too much heap can subject to long garbage collection pauses.
Set Xmx
to no more than 50% of your physical RAM, to ensure that there is enough physical RAM left for kernel file system caches.
Snow Owl connecting to a remote Elasticsearch cluster requires less memory, but make sure you still allocate enough for your use cases (classification, batch processing, etc.).
Snow Owl has three configuration files:
snowowl.yml
for configuring Snow Owl
serviceability.xml
for configuring Snow Owl logging
elasticsearch.yml
for configuring the underlying Elasticsearch instance in case of embedded deployments
These files are located in the config directory, whose default location depends on whether or not the installation is from an archive distribution (tar.gz
or zip
) or a package distribution (Debian or RPM packages).
For the archive distributions, the config directory location defaults to $SO_PATH_HOME/configuration
. The location of the config directory can be changed via the SO_PATH_CONF
environment variable as follows:
Alternatively, you can export the SO_PATH_CONF
environment variable via the command line or via your shell profile.
For the package distributions, the config directory location defaults to /etc/snowowl
. The location of the config directory can also be changed via the SO_PATH_CONF
environment variable, but note that setting this in your shell is not sufficient. Instead, this variable is sourced from /etc/default/snowowl
(for the Debian package) and /etc/sysconfig/snowowl
(for the RPM package). You will need to edit the SO_PATH_CONF=/etc/snowowl
entry in one of these files accordingly to change the config directory location.
The configuration format is YAML. Here is an example of changing the path of the data directory:
Settings can also be flattened as follows:
Environment variables referenced with the ${...}
notation within the configuration file will be replaced with the value of the environment variable, for instance:
An orderly shutdown of Snow Owl ensures that Snow Owl has a chance to clean up and close outstanding resources. For example, an instance that is shutdown in an orderly fashion will initiate an orderly shutdown of the embedded Elasticsearch instance, gracefully close and disconnect connections and perform other related cleanup activities. You can help ensure an orderly shutdown by properly stopping Snow Owl.
If you’re running Snow Owl as a service, you can stop Snow Owl via the service management functionality provided by your installation.
If you’re running Snow Owl directly, you can stop Snow Owl by sending Ctrl-C
if you’re running Snow Owl in the console, or by invoking the provided shutdown
script as follows:
Snow Owl uses SLF4J and Logback for logging.
The logging configuration file (serviceability.xml
) can be used to configure Snow Owl logging. The logging configuration file location depends on your installation method, by default it is located in the ${SO_HOME}/configuration
folder.
Extensive information on how to customize logging and all the supported appenders can be found on the Logback documentation.
You can configure security to communicate with a Lightweight Directory Access Protocol (LDAP) server to authenticate and authorize users.
To integrate with LDAP, you configure an ldap
realm in the snowowl.yml
configuration file.
The following configuration settings are supported:
The default configuration values are selected to support both OpenLDAP and Active Directory without needing to customize the default schema that comes with their default installation.
When users send their username and password with their request in the Authorization header, the LDAP security realm performs the following steps to authenticate the user:
Searches for a user entry in the configured baseDn
to get the DN
Authenticates with the LDAP instance using the received DN
and the provided password
If any of the above-mentioned steps fails for any reason, the user is not allowed to access the terminology server's content and the server will respond with HTTP 401 Unauthorized
.
To configure authentication, you need to configure the uri
, baseDn
, bindDn
, bindDnPassword
, userObjectClass
and userIdProperty
configuration settings.
To add a user in the LDAP realm, create an entry under the specified baseDn
using the configured userObjectClass
as class and the userIdProperty
as the property where the user's username/e-mail address is configured.
Example user entry:
On top of the authentication part, the LDAP realm provides configuration values to support full role-based access control and authorization.
When a user's request is successfully authenticated with the LDAP realm, Snow Owl authorizes the request using the user's currently set roles and permissions in the configured LDAP instance.
To add a role in the LDAP realm, create an entry under the specified baseDn
using the configured roleObjectClass
as class and the configured permissionProperty
and memberProperty
properties for permission and user mappings, respectively.
Example read-only role:
Configuration | Description |
---|---|
uri
The LDAP URI that points to the LDAP/AD server to connect to.
bindDn
The user's DN who has access to the entire baseDn
and roleBaseDn
and can read content from it.
bindDnPassword
The password of the bindDn
user.
baseDn
The base directory where all entries in the entire subtree will be considered as potential matches for all searches.
roleBaseDn
Alternative base directory where all role entries in the entire subtree will be considered. Defaults to the baseDn
value.
userFilter
The search filter to search for user entries under the configured baseDn
. Defaults to (objectClass={userObjectClass})
.
roleFilter
The search filter to search for role entries under the configured roleBaseDn
. Defaults to (objectClass={roleObjectClass})
.
userObjectClass
The user object's class to look for when searching for user entries. Defaults to inetOrgPerson
class.
roleObjectClass
The role object's class to look for when searching for role entries. Defaults to groupOfUniqueNames
class.
userIdProperty
The userId property to access and read for the user's unique identifier. Usually their username or email address. Defaults to uid
property.
permissionProperty
A multi-valued property that is used to store permission information on a role. Defaults to the description
property.
memberProperty
A multi-valued property that is used to store and retrieve user dn
s that belong to a given role. Defaults to the uniqueMember
property.
This guide contains all the necessary details for installing the Snow Owl Terminology Server in a production environment. The following sections will guide you on how to:
Here is the list of files and folders extracted from the release package and their role are described below.
Contains every configuration file used for the docker stack, including docker-compose.yml.
In Docker, this directory serves as the context, implying that when executing commands, one needs to either explicitly reference the configuration file or run docker compose commands directly within this directory.
E.g. to verify the status of the stack there are two approaches:
Execute the command inside ./snow-owl/docker
:
Execute the command from somewhere else then ./snow-owl/docker
:
This folder contains the files necessary to acquire an SSL certificate. None of the files should be changed here ideally.
There is one important file here, elasticsearch.yml
which can be used for fine-tuning the Elasticsearch cluster. However, this is not necessary by default, only if an advanced configuration is required.
This folder contains the files used upon the first start of the OpenLDAP server. The files within describe a set of groups and users to set up an initial user access model. User credentials for the test users can be found in the file called 200_users.ldif
.
Location of all configuration files for NGINX. By default, a non-secure HTTP configuration is assumed. If there is no need for an SSL certificate, then the files here will be used. If an SSL certificate was acquired, then the main configuration file of NGINX (nginx.conf
) will be overwritten with the one under /docker/cert/nginx.conf
.
snowowl.yml
: this file is the default configuration file of the Terminology Server. It does not need any changes by default either.
users
: list of users for file-based authentication. There is one default user called snowowl
for which the credentials can be found under ./docker/.env
.
The main configuration file for the docker stack. This file is replaced in case an SSL certificate was acquired (with file /docker/cert/docker-compose.yml
). This is where volumes, ports, or environment variables can be configured.
The credentials to use for authenticating with the B2i private docker registry.
The collection of environment variables for the docker-compose.yml file.
This is the file to configure most of the settings of the Terminology Server. Including java heap size, Snow Owl or Elasticsearch version, passwords, or folder structure.
The location where the OpenLDAP server stores its data.
Log files of the Terminology Server
Location of Elasticsearch and Snow Owl resources.
This directory serves as the data folder for Elasticsearch, where datasets should be extracted.
Snow Owl's local file storage. Import and export artifacts are stored here.
In case an SSL certificate is acquired, all the files used by certbot
and NGINX are stored here. This folder is automatically created by the certificate retrieval script.
This is the initial folder of all backup artifacts. This should be configured as a network mount to achieve data redundancy.
Some settings may require attention before moving to production. While the steps below may not necessitate any action, there are cases where the host running both Snow Owl and Elasticsearch will require fine-tuning.
Some system-level settings need to be checked before deploying your own Elasticsearch in production.
By default, Snow Owl runs inside the container as user snowowl
using uid:gid 1000:1000
.
If you are bind-mounting a local directory or file:
ensure it is readable by the user mentioned above
ensure that settings, data and log directories are writable as well
A good strategy is to grant group access to gid 1000
or 0
for the local directory.
In case the file system of the docker service on the host is different from what the Snow Owl deployment uses, it could be worthwhile to bind-mount Snow Owl's temporary working folder to a path that has excellent I/O performance. E.g.:
the root file system /
is backed by a block storage that purposefully has lower I/O performance, this is the file system used by the docker service.
the deployment folder /opt/snow-owl
is backed by a fast local SSD
The definition of the snowowl
service in the docker compose file should be amended like this:
Pro tip: in case the Terminology Server is deployed to the cloud, make sure this path is served by a fast SSD disk (local or ephemeral SSD is the best). This will make import or export processes even faster.
Following there is an extensive guide on what to verify, but usually, it comes down to these items:
Most operating systems try to use as much memory as possible for file system caches and eagerly swap out unused application memory. This can result in parts of the JVM heap or even its executable pages being swapped out to disk.
Swapping is very bad for performance, and should be avoided at all costs. It can cause garbage collections to last for minutes instead of milliseconds and can cause services to respond slowly or even time out.
There are two approaches to disabling swapping. The preferred option is to completely disable swap, but if this is not an option, you can minimize swappiness.
Usually Snow Owl is the only service running on a box, and its memory usage is controlled by the JVM options. There should be no need to have swap enabled.
On Linux systems, you can disable swap temporarily by running:
To disable it permanently, you will need to edit the /etc/fstab
file and comment out any lines that contain the word swap
.
Another option available on Linux systems is to ensure that the sysctl value vm.swappiness
is set to 1. This reduces the kernel’s tendency to swap and should not lead to swapping under normal circumstances, while still allowing the whole system to swap in emergency conditions.
The RPM for Snow Owl can be downloaded from the Downloads section. It can be used to install Snow Owl on any RPM-based system such as OpenSuSE, SLES, CentOS, Red Hat, and Oracle Enterprise.
RPM install is not supported on distributions with old versions of RPM, such as SLES 11 and CentOS 5. Please see Install Snow Owl with .zip or .tar.gz instead.
On systemd-based distributions, the installation scripts will attempt to set kernel parameters (e.g., vm.max_map_count
); you can skip this by masking the systemd-sysctl.service unit.
Use the chkconfig command to configure Snow Owl to start automatically when the system boots up:
Snow Owl can be started and stopped using the service command:
If Snow Owl fails to start for any reason, it will print the reason for failure to STDOUT. Log files can be found in /var/log/snowowl/
.
To configure Snow Owl to start automatically when the system boots up, run the following commands:
Snow Owl can be started and stopped as follows:
These commands provide no feedback as to whether Snow Owl was started successfully or not. Instead, this information will be written in the log files located in /var/log/snowowl/
.
You can test that your Snow Owl instance is running by sending an HTTP request to:
which should give you a response something like this:
Snow Owl defaults to using /etc/snowowl
for runtime configuration. The ownership of this directory and all files in this directory are set to root:snowowl
on package installation and the directory has the setgid
flag set so that any files and subdirectories created under /etc/snowowl
are created with this ownership as well (e.g., if a keystore is created using the keystore tool). It is expected that this be maintained so that the Snow Owl process can read the files under this directory via the group permissions.
Snow Owl loads its configuration from the /etc/snowowl/snowowl.yml
file by default. The format of this config file is explained in Configuring Snow Owl.
The RPM places config files, logs, and the data directory in the appropriate locations for an RPM-based system:
Ideally, Snow Owl should run alone on a server and use all of the resources available to it. To do so, you need to configure your operating system to allow the user running Snow Owl to access more resources than allowed by default.
The following settings must be considered before going to production:
Where to configure systems settings depends on which package you have used to install Snow Owl, and which operating system you are using.
When using the .zip
or .tar.gz
packages, system settings can be configured:
When using the RPM or Debian packages, most system settings are set in the system configuration file. However, systems that use systemd require that system limits are specified in a systemd configuration file.
On Linux systems, ulimit
can be used to change resource limits temporarily. Limits usually need to be set as root before switching to the user that will run Snow Owl. For example, to set the number of open file handles (ulimit -n
) to 65,536
, you can do the following:
The new limit is only applied during the current session.
You can consult all currently applied limits with ulimit -a
.
On Linux systems, persistent limits can be set for a particular user by editing the /etc/security/limits.conf
file. To set the maximum number of open files for the snowowl
user to 65,536
, add the following line to the limits.conf file:
This change will only take effect the next time the snowowl
user opens a new session.
When using the RPM or Debian packages, system settings and environment variables can be specified in the system configuration file, which is located in:
However, for systems that use systemd, system limits need to be specified via systemd.
When using the RPM or Debian packages on systems that use systemd, system limits must be specified via systemd.
The systemd service file (/usr/lib/systemd/system/snowowl.service) contains the limits that are applied by default.
To override them, add a file called /etc/systemd/system/snowowl.service.d/override.conf (alternatively, you may run sudo systemctl edit snowowl
which opens the file automatically inside your default editor). Set any changes in this file, such as:
Once finished, run the following command to reload units:
Type | Description | Default Location | Setting |
---|---|---|---|
temporarily with , or
permanently in .
home
Snow Owl home directory or $SO_HOME
/usr/share/snowowl
bin
Binary scripts including startup/shutdown to start/stop the instance
/usr/share/snowowl/bin
conf
Configuration files including snowowl.yml
/etc/snowowl
data
The location of the data files and resources.
/var/lib/snowowl
path.data
logs
Log files location.
/var/log/snowowl
Package | Location |
RPM | /etc/sysconfig/snowowl |
Debian | /etc/default/snowowl |
The method for starting Snow Owl varies depending on how you installed it.
If you installed Snow Owl with a .tar.gz
or zip
package, you can start Snow Owl from the command line.
Snow Owl can be started from the command line as follows:
By default, Snow Owl runs in the foreground, prints some of its logs to the standard output (stdout
), and can be stopped by pressing Ctrl-C
.
To run Snow Owl as a daemon, use the following command:
Log messages can be found in the $SO_HOME/serviceability/logs/
directory.
The startup scripts provided in the RPM and Debian packages take care of starting and stopping the Snow Owl process for you.
Snow Owl is not started automatically after installation. How to start and stop Snow Owl depends on whether your system uses SysV init
or systemd
(used by newer distributions). You can tell which is being used by running this command:
Use the chkconfig
command to configure Snow Owl to start automatically when the system boots up:
Snow Owl can be started and stopped using the service command:
If Snow Owl fails to start for any reason, it will print the reason for failure to STDOUT. Log files can be found in /var/log/snowowl/
.
To configure Snow Owl to start automatically when the system boots up, run the following commands:
Snow Owl can be started and stopped as follows:
These commands provide no feedback as to whether Snow Owl was started successfully or not. Instead, this information will be written in the log files located in /var/log/snowowl/
.
The preferred method of setting JVM options (including system properties and JVM flags) is via the SO_JAVA_OPTS
environment variable. For instance:
When using the RPM or Debian packages, SO_JAVA_OPTS
can be specified in the system configuration file.
By default, Snow Owl is starting and connecting to an embedded Elasticsearch
cluster available on http://localhost:9200
. This cluster has only a single node and its discovery method is set to single-node
, which means it is not able to connect to other Elasticsearch clusters and will be used exclusively by Snow Owl.
This single-node Elasticsearch cluster can easily serve Snow Owl in testing, evaluation and small authoring environments, but it is recommended to customize how Snow Owl connects to an Elasticsearch cluster in larger environments (especially when planning to scale with user demand).
You have two options to configure Elasticsearch used by Snow Owl.
The first option is to configure the underlying Elasticsearch instance by editing the configuration file elasticsearch.yml
, which depending on your installation is available in the configuration directory (you can create the file, if it is not available, Snow Owl will pick it up during the next startup).
The second option is to configure Snow Owl to use a remote Elasticsearch cluster without the embedded instance. To use this feature you need to set the repository.index.clusterUrl
configuration parameter to the remote address of your Elasticsearch cluster. When Snow Owl is configured to connect to a remote Elasticsearch cluster, it won't boot up the embedded instance, which reduces the memory requirements of Snow Owl.
You can manage and authenticate users with the built-in file realm. All the data about the users for the file realm is stored in the users
file. The file is located in SO_PATH_CONF
and is read on startup.
You need to explicitly select the file realm in the snowowl.yml
configuration file in order to use it for authentication.
In the above configuration the file realm is using the users
file to read your users from. Each row in the file represents a username and password delimited by :
character. The passwords are BCrypt encrypted hashes. The default users
file comes with a default snowowl
user with the default snowowl
password.
To simplify file realm configuration, the Snow Owl CLI comes with a command to add a user to the file realm (snowowl users add
). See the command help manual (-h
option) for further details.
Snow Owl security features enable you to easily secure your terminology server. You can password-protect your data as well as implement more advanced security measures such as role-based access control and auditing.
By default, Snow Owl comes without any security features enabled and all read and write operations are unprotected. To configure a security realm, you can choose from the following built-in identity providers:
NOTE: It is recommended in production environments that all communication between a client and Snow Owl is performed through a secure connection.
Snow Owl sends a HTTP 401 Unauthorized
response if a request needs to be authenticated.
If supported by the security realm, Snow Owl will also check whether an authenticated user is permitted to perform the requested action on a given resource.
Within an organization, roles are created for various job functions. The permissions to perform certain operations are assigned to specific roles. Members, staff, or other system users are assigned particular roles, and through those role assignments acquire the permissions needed to perform particular system functions. Since users are not assigned permissions directly, but only acquire them through their role (or roles), management of individual user rights becomes a matter of simply assigning appropriate roles to the user's account; this simplifies common operations, such as adding a user or changing a user's department.
Role assignment: A subject can exercise a permission only if the subject has selected or been assigned a role.
Permission authorization: A subject can exercise a permission only if the permission is authorized for the subject's active role.
With rules 1 and 2, it is ensured that users can exercise only permissions for which they are authorized.
S = Subject = A person or automated agent R = Role = Job function or title which defines an authority level P = Permissions = An approval of a mode of access to a resource
In Snow Owl a permission is a single value that represents both the operation the user would like to perform and the resource that is being accessed. The format is the following: <operation>:<resource>
Currently there are 7
operations supported by Snow Owl:
browse
- read the contents of a resource
edit
- write the contents of the resource, delete the resource
import
- import from external content and formats
export
- export to external content and formats
version
- create a version in a Code System, create a release
promote
- merge content from isolated branch environments to a Code System's development version
classify
- run classifiers and save their results
Resources represent the content that is being accessed by a client. A resource can be anything that can be resolved to a database entry. Currently, the following resource formats are allowed to be used in a permission:
<repositoryId>
- access the entire content available in a terminology repository
<repositoryId>/<branch>
- access the content available on a branch in a terminology repository
<codeSystemId>
- access all content of a Code System, including both the latest development and all previous releases
<codeSystemId>/<versionId>
- access a specific release of a Code System
There is a special *
wild card character that can be used for both the operation and resource parts in a permission value to allow any operation to be performed on any or selected resources, or to allow certain operations to be performed on any available resources.
Examples:
browse:snomedStore
- browse all SNOMED CT Code Systems and their content
edit:SNOMEDCT-UK-CL
- edit the SNOMEDCT-UK-CL
Code System
export:SNOMEDCT-US/2019-03-01
- export the 2019-03-01
US Extension release
*:SNOMEDCT
- allow any operations to be performed on the SNOMEDCT
Code System
browse:*
- allow read operations on all available resources
*:*
- administrator permission, the user can do anything with any of the available resources
The file security realm does NOT support the Authorization formats at the moment. If you are interested in configuring role-based access control for your users, it is recommended to switch to the .
After configuring at least one security realm, Snow Owl will authenticate all incoming requests to ensure that the sender of the request is allowed to access the terminology server and its contents. To authenticate a request, the client must send an HTTP Basic
or Bearer
Authorization header with the request. The value should be a user/pass pair in case of using Basic
authentication or a token generated by Snow Owl if using the Bearer
method.