Content syndication
With content syndication, data can be seamlessly moved between different Snow Owl Terminology Server deployments.
This functionality is useful when content created in a central deployment (upstream) needs to be distributed to one or more read-only downstream instances. Resource distribution is unidirectional and semi-automated: an operator has to configure each new downstream instance before it can receive data from the central unit.
To be able to access the upstream server and its content the following items are required:
the HTTP port of Elasticsearch has to be accessible for the downstream Snow Owl and Elasticsearch instances (configured via the http.port property, the default is 9200)
the REST API of Snow Owl has to be accessible for the downstream Snow Owl servers
an Elasticsearch API key with sufficient privileges for authentication and authorization
a Snow Owl API key with sufficient privileges for authentication and authorization
selected terminology resources have to be configured as distributable
In case Snow Owl uses a self-hosted Elasticsearch instance, the HTTP port can be opened by modifying the container settings in the docker-compose.yml file. Make sure to remove the localhost IP prefix from the port declaration:
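As an illustration of the change, the sketch below strips a loopback prefix from a port mapping. It assumes the declaration uses the common docker-compose form "127.0.0.1:9200:9200"; the actual entry in the shipped docker-compose.yml may look different, and the helper name is hypothetical.

```python
def expose_port(mapping: str) -> str:
    """Drop a leading loopback address from a docker-compose port mapping.

    Assumption: the mapping has the form "127.0.0.1:<host port>:<container port>".
    Mappings without that prefix are returned unchanged.
    """
    prefix = "127.0.0.1:"
    if mapping.startswith(prefix):
        return mapping[len(prefix):]
    return mapping


# The port is bound to localhost only before the change and to all
# interfaces after it.
print(expose_port("127.0.0.1:9200:9200"))
```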
When opening up a self-hosted Elasticsearch make sure to use strengthened security with secure HTTP and username/password access.
A detailed guide can be found in the Elasticsearch security documentation.
In the case of a hosted Elasticsearch instance there is nothing to do, because it is already accessible from outside.
The default reverse proxy configuration (shipped in the released package) exposes the Snow Owl REST API via the URL: http(s)://upstream-snow-owl-url/snowowl
Other than that no additional configuration is needed.
A new API key for Elasticsearch can be created either through its API key API or - in the case of a hosted instance - from within Kibana.
The content syndication operation requires the following permissions:
cluster privilege: monitor
index privilege: read
Here is an example request body for the API key API:
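A minimal sketch of such a body, written as a Python dictionary and serialized to JSON. It follows the request shape of the Elasticsearch create API key endpoint (POST /_security/api_key) and grants the two privileges listed above. The key name, the role name and the "*" index pattern are assumptions chosen for illustration; narrow the pattern to the indices that actually hold the terminology content.

```python
import json

# Hypothetical key name, role name and index pattern; only the two
# privileges (monitor, read) come from the requirements listed above.
api_key_request = {
    "name": "snowowl-syndication",
    "role_descriptors": {
        "syndication-reader": {
            "cluster": ["monitor"],
            "indices": [
                {"names": ["*"], "privileges": ["read"]},
            ],
        }
    },
}

# Print the JSON document that would be sent as the request body.
print(json.dumps(api_key_request, indent=2))
```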
The request returns the following response:
Take note of the encoded API Key, which is the one that will be used later on.
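For orientation, the encoded value in an Elasticsearch API key response is the Base64 encoding of the key id and the secret joined by a colon, and Elasticsearch accepts it in an Authorization header with the ApiKey scheme. The sketch below reproduces that encoding with placeholder values; the id and secret shown are made up.

```python
import base64


def encode_api_key(key_id: str, api_key: str) -> str:
    """Return the Base64 form of "<id>:<api_key>" as Elasticsearch reports it."""
    raw = f"{key_id}:{api_key}".encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def authorization_header(encoded: str) -> dict:
    """Build the HTTP header used to authenticate with an encoded API key."""
    return {"Authorization": f"ApiKey {encoded}"}


# Placeholder credentials, for illustration only.
encoded = encode_api_key("example-id", "example-secret")
print(authorization_header(encoded))
```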
To obtain an API key using Kibana, follow the Kibana API keys guide and apply the same settings as above.
To request an API key from the upstream Snow Owl Terminology Server the following REST API endpoint must be used:
POST
https://upstream-snow-owl-url/snowowl/token
username*
String
The username to authenticate with
password*
String
The password belonging to the username
token
String
Previous token to renew
expiration
String
Expiration interval, e.g. 1d or 2h
permissions
List<String>
List of permissions
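The request can be prepared as in the following sketch. It assumes the parameters are sent as a JSON request body, which the parameter list above does not state explicitly; the function name and the example credentials are hypothetical. The request object is only built here, not sent.

```python
import json
import urllib.request


def build_token_request(base_url, username, password,
                        expiration=None, permissions=None, token=None):
    """Prepare a POST request for the /snowowl/token endpoint.

    Assumption: the endpoint accepts the parameters as a JSON body.
    Optional parameters are only included when a value is given.
    """
    body = {"username": username, "password": password}
    if token:
        body["token"] = token
    if expiration:
        body["expiration"] = expiration
    if permissions:
        body["permissions"] = list(permissions)
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/snowowl/token",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_token_request(
    "https://upstream-snow-owl-url", "syndication-user", "secret", expiration="1d"
)
print(request.get_method(), request.full_url)
```

Passing the prepared object to urllib.request.urlopen would send it; the shape of the response is not covered here.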
All three major terminology resource types can be configured as distributable. Resources have a settings map that can be updated via their specific REST API endpoints:
PUT /codesystems/{codeSystemId}
PUT /valuesets/{valueSetId}
PUT /conceptmaps/{conceptMapId}
A setting called distributable has to be set with a value of either true or false. Here is an example update request to make the 'Example Code System' distributable:
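A minimal sketch of such an update request, assuming the settings map is sent under a settings key in a JSON body and that the code system identifier is example-code-system. Both are illustrative assumptions, and the real endpoint may require further fields in the body.

```python
import json
import urllib.request


def build_distributable_update(api_root, code_system_id, distributable=True):
    """Prepare a PUT request that sets the 'distributable' setting.

    Assumptions: 'api_root' already ends with the REST API root context
    (e.g. .../snowowl) and the settings map lives under a "settings" key.
    """
    body = {"settings": {"distributable": distributable}}
    return urllib.request.Request(
        url=f"{api_root.rstrip('/')}/codesystems/{code_system_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical identifier for the 'Example Code System'.
update = build_distributable_update(
    "https://upstream-snow-owl-url/snowowl", "example-code-system"
)
print(update.get_method(), update.full_url)
print(update.data.decode("utf-8"))
```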
There is one configuration property that must be set before provisioning a new downstream Snow Owl Terminology Server.
Any potential upstream Elasticsearch instance must be listed as an allowed source of information for the downstream Elasticsearch instances via a configuration parameter in the elasticsearch.yml file. The property is called reindex.remote.whitelist:
The whitelisted URL must contain the upstream HTTP port and must not contain the scheme.
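The rule above can be illustrated by deriving the whitelist entry from a full upstream URL. The sketch uses the example Elasticsearch URL that appears later in this guide; the helper name is hypothetical, and the printed line is only one possible way to write the property in elasticsearch.yml.

```python
import json
from urllib.parse import urlsplit


def whitelist_entry(elasticsearch_url: str) -> str:
    """Return "host:port" for a URL, dropping the scheme.

    Raises ValueError when the URL carries no explicit port, because the
    whitelist entry must contain the upstream HTTP port.
    """
    parts = urlsplit(elasticsearch_url)
    if parts.hostname is None or parts.port is None:
        raise ValueError("the upstream URL must include a host and the HTTP port")
    return f"{parts.hostname}:{parts.port}"


entry = whitelist_entry("https://upstream-elasticsearch-url.com:9200")
# Written as a YAML flow sequence with a single entry.
print(f"reindex.remote.whitelist: {json.dumps([entry])}")
```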
Provisioning a new downstream server consists of the following steps:
start with an empty dataset
collect all terminology resource identifiers that need to be syndicated
get all the necessary credentials to communicate with upstream
initiate the resource syndication and verify the result
To populate a downstream server with terminology resources via an upstream source, one must collect the required resource identifiers or resource version identifiers beforehand.
Resource identifiers must be in their simple form, e.g.:
SNOMED-CT
ICD-10
LOINC
Resource version identifiers must be in the following form: <resource_id>/<version_id>, e.g.:
SNOMED-CT/2020-01-31
ICD-10/v2019
LOINC/v2.72
To determine which resources are available for syndication, the following upstream REST API endpoint can be used. It returns an Atom feed of resource versions, from which the required identifiers can be collected.
GET
https://upstream-snow-owl-url/snowowl/syndication/feed.xml
Retrieves the feed of all distributable resources
resource
List<String>
The resource identifier(s) to include in the feed
resourceType
List<String>
The types of resources to include in the feed (e.g. conceptmaps, valuesets, codesystems)
resourceUrl
List<String>
The URLs of the resources to include in the feed
packageTypes
List<String>
The types of packages to include in the feed. Only BINARY is supported at the moment
effectiveTime
String
The effective time value to match (yyyyMMdd) or an effective time range value to match (yyyyMMdd...yyyyMMdd), inclusive range
createdAt
Long
Exact match filter for the resource version created at field
createdAtFrom
Long
Greater than or equal to filter for the resource version created at field
createdAtTo
String
Less than or equal to filter for the resource version created at field
limit*
int
The maximum number of items to return
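A feed URL with filters can be assembled as in the sketch below. It assumes the parameters are passed as query parameters and that list-valued parameters are repeated once per value; neither detail is stated in the parameter list, and the function name is hypothetical.

```python
from urllib.parse import urlencode


def build_feed_url(base_url, limit, resources=(), resource_types=()):
    """Build the URL of the syndication feed with optional filters.

    Assumption: list parameters are sent as repeated query parameters
    (resource=A&resource=B).
    """
    params = [("resource", r) for r in resources]
    params += [("resourceType", t) for t in resource_types]
    params.append(("limit", limit))
    return f"{base_url.rstrip('/')}/snowowl/syndication/feed.xml?{urlencode(params)}"


print(build_feed_url(
    "https://upstream-snow-owl-url",
    limit=50,
    resources=["SNOMED-CT", "LOINC"],
    resource_types=["codesystems"],
))
```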
It is not required to list all resource version identifiers for an already selected resource. E.g.:
If SNOMED-CT is selected as a resource, it is not required to select all its versions among the version resource identifiers.
If a specific version is selected (SNOMED-CT/2020-01-31) and the resource is not listed among the selected resources, then only versions created until 2020-01-31 will be syndicated.
To kick off a syndication process the following parameters are required:
the list of resource identifiers
the list of resource version identifiers
the upstream Snow Owl URL without its REST API root context:
e.g. https://upstream-snow-owl-url.com
the API key to authenticate with the upstream Snow Owl server
the upstream Elasticsearch URL, including the scheme and port:
e.g. https://upstream-elasticsearch-url.com:9200
the API key to authenticate with the upstream Elasticsearch
When there are no existing resources on the downstream server yet, at least one resource identifier or one resource version identifier must be selected.
Snow Owl will resolve all resource dependencies and will handle syndication requests rigorously. If e.g. a Value Set depends on a specific SNOMED CT version and that version is not among the selected resources - or does not exist on the downstream server yet - the syndication run will fail and report the missing dependency. Every dependency of the selected resources must always be listed for a given syndication run.
The above parameters should be fed to the following downstream Snow Owl REST API endpoint:
POST
https://downstream-snow-owl-url/snowowl/syndication/syndicate
Syndicate resources from a remote Snow Owl instance. In case no resource identifiers are provided, all existing resources will be syndicated to their latest version.
resource
List<String>
List of resource identifiers
version
List<String>
List of version resource identifiers
upstreamUrl*
String
The URL of the upstream Snow Owl
upstreamToken*
String
API key for the upstream Snow Owl
upstreamDataUrl*
String
The URL of the upstream Elasticsearch
upstreamDataToken*
String
API key for the upstream Elasticsearch
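The parameters can be collected into a request body as sketched below. The sketch assumes a JSON body whose keys match the parameter names above, and it checks the identifier formats described earlier (simple resource identifiers, versions as <resource_id>/<version_id>). The function name and the placeholder tokens are hypothetical.

```python
import json


def build_syndication_body(upstream_url, upstream_token,
                           upstream_data_url, upstream_data_token,
                           resources=(), versions=()):
    """Collect the syndication parameters into a dictionary.

    Assumption: the endpoint accepts these keys in a JSON request body.
    """
    for resource in resources:
        if "/" in resource:
            raise ValueError(f"resource identifier must be in simple form: {resource}")
    for version in versions:
        resource_id, separator, version_id = version.partition("/")
        if not (resource_id and separator and version_id):
            raise ValueError(f"expected <resource_id>/<version_id>: {version}")
    body = {
        "upstreamUrl": upstream_url,
        "upstreamToken": upstream_token,
        "upstreamDataUrl": upstream_data_url,
        "upstreamDataToken": upstream_data_token,
    }
    # Identifier lists are optional and omitted when empty.
    if resources:
        body["resource"] = list(resources)
    if versions:
        body["version"] = list(versions)
    return body


body = build_syndication_body(
    "https://upstream-snow-owl-url.com", "<snow-owl-api-key>",
    "https://upstream-elasticsearch-url.com:9200", "<elasticsearch-api-key>",
    resources=["LOINC"], versions=["SNOMED-CT/2020-01-31"],
)
print(json.dumps(body, indent=2))
```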
The syndication process starts in the background as an asynchronous job. It can be tracked by calling the following endpoint with the job identifier returned in the Location response header:
GET
https://downstream-snow-owl-url/snowowl/syndication/{id}
Returns the specified syndication run's configuration and status.
id*
String
The identifier of a syndication run
The returned result object will contain all information related to the given syndication run:
status of the run (RUNNING, FINISHED, FAILED)
list of successfully syndicated resource versions
additional details about created or updated Elasticsearch indices
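Tracking a run usually means polling until the status leaves RUNNING. The sketch below shows such a loop under the assumption that the result object exposes the state in a field named status; the field name, the function name and the polling limits are illustrative. The HTTP call is passed in as a function so that the loop does not depend on a particular client.

```python
import time
from typing import Callable


def wait_for_syndication(fetch_run: Callable[[], dict],
                         interval_seconds: float = 5.0,
                         max_polls: int = 120) -> dict:
    """Poll a syndication run until it is no longer RUNNING.

    'fetch_run' is expected to call GET /syndication/{id} and return the
    parsed result object. Assumption: the run state is in its "status" field.
    """
    for _ in range(max_polls):
        run = fetch_run()
        if run.get("status") != "RUNNING":
            return run
        time.sleep(interval_seconds)
    raise TimeoutError("syndication run did not finish within the polling limit")


# Usage with a stand-in for the HTTP call: two RUNNING answers, then FINISHED.
answers = iter([{"status": "RUNNING"}, {"status": "RUNNING"}, {"status": "FINISHED"}])
result = wait_for_syndication(lambda: next(answers), interval_seconds=0)
print(result["status"])
```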
There is a need to syndicate the SNOMED-CT US extension. It depends on the SNOMED CT International version 2021-01-31. Provide the following resource identifier and resource version identifier configuration:
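A minimal sketch of such a configuration, written as a Python dictionary. The resource and version keys are assumed to mirror the parameter names of the syndicate endpoint, and the SNOMED-CT-US and SNOMED-CT identifiers are inferred from the description of this scenario.

```python
# The extension is selected as a resource (all of its versions), while the
# international edition is pinned to the version the extension depends on.
us_extension_selection = {
    "resource": ["SNOMED-CT-US"],
    "version": ["SNOMED-CT/2021-01-31"],
}

print(us_extension_selection)
```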
This will syndicate all versions of SNOMED-CT-US and all international versions until 2021-01-31.
If the configuration is changed to:
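A sketch of the changed selection as a Python dictionary, assuming that the resource key mirrors the parameter name of the syndicate endpoint and that no version identifier is needed once both resources are selected as a whole:

```python
# Both the extension and the international edition are selected as
# resources, so no individual version identifier is listed.
full_selection = {
    "resource": ["SNOMED-CT-US", "SNOMED-CT"],
    "version": [],
}

print(full_selection)
```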
This will syndicate all versions of SNOMED-CT-US and SNOMED-CT international, including all international versions even after 2021-01-31.
There is a Value Set with an identifier of VS and members from SNOMED-CT/2020-07-31:
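Following the rule that every dependency of a selected resource has to be listed, a selection for this case could look like the sketch below. The resource and version keys are assumed to mirror the parameter names of the syndicate endpoint.

```python
# The Value Set is selected as a resource and the SNOMED CT version it
# depends on is listed as a resource version identifier.
value_set_selection = {
    "resource": ["VS"],
    "version": ["SNOMED-CT/2020-07-31"],
}

print(value_set_selection)
```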
There is a Concept Map with an identifier of CM mapping concepts between LOINC/v2.72 and ICD-10/v2019:
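Because the Concept Map depends on both code system versions, a selection for this case could look like the sketch below. The resource and version keys are assumed to mirror the parameter names of the syndicate endpoint.

```python
# The Concept Map is selected as a resource; the source and target code
# system versions it maps between are listed as version identifiers.
concept_map_selection = {
    "resource": ["CM"],
    "version": ["LOINC/v2.72", "ICD-10/v2019"],
}

print(concept_map_selection)
```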
If a given downstream server already contains the desired resources and the goal is to keep the content up-to-date, it is not required to fill in the resource and resource version identifiers for the syndication request.
One can call the POST /syndication/syndicate endpoint with all the credentials and URLs but without specifying any resource or version identifier. The server will automatically determine - based on the set of existing downstream resources - if there are any new resource versions available for syndication.
To check whether there are any updates available, there is an endpoint that can be called:
GET
https://downstream-snow-owl-url/snowowl/syndication/list
Returns the full list of resource versions to be syndicated based on the search criteria. If no filters are provided updates are calculated for all existing resources.
resource
List<String>
The resource identifier(s) to syndicate, e.g. SNOMEDCT (== latest version)
version
List<String>
The version identifier(s) to syndicate, e.g. SNOMEDCT/2022-07-31
upstreamUrl*
String
The URL of the upstream Snow Owl server
upstreamToken*
String
The token to authenticate with the upstream Snow Owl server
limit*
int
The number of resource versions to return if there are any
If updates are available, this endpoint returns a list of versions; otherwise it returns an empty result.
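The update check can be prepared as in the following sketch, which assumes that the parameters of the list endpoint are passed as query parameters and that list-valued parameters are repeated once per value. The function name and the placeholder token are hypothetical, and the structure of the returned result is not covered here.

```python
from urllib.parse import urlencode


def build_update_check_url(downstream_url, upstream_url, upstream_token,
                           limit, resources=(), versions=()):
    """Build the URL for GET /snowowl/syndication/list.

    Assumption: all parameters travel as query parameters. Leaving
    'resources' and 'versions' empty checks every existing resource.
    """
    params = [("resource", r) for r in resources]
    params += [("version", v) for v in versions]
    params += [
        ("upstreamUrl", upstream_url),
        ("upstreamToken", upstream_token),
        ("limit", limit),
    ]
    return f"{downstream_url.rstrip('/')}/snowowl/syndication/list?{urlencode(params)}"


print(build_update_check_url(
    "https://downstream-snow-owl-url",
    "https://upstream-snow-owl-url.com",
    "<snow-owl-api-key>",
    limit=50,
))
```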