Please refer to the official Curator install guide for installation instructions on various operating systems.
In order to create backups for Snow Owl, you need a repository in your Elasticsearch cluster.
To create a repository (assuming a shared file system repository, fs), execute the following command:
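A minimal sketch of that command, assuming Elasticsearch listens on localhost:9200 without authentication and that the repository is called snowowl-snapshots (the URL and the repository name are illustrative assumptions, not values prescribed by this guide):

```shell
# Assumed values: adjust the URL and repository name to your cluster.
ES_URL="http://localhost:9200"
REPO_NAME="snowowl-snapshots"

# Settings for a shared file system (fs) repository at the mounted location.
REPO_BODY='{"type": "fs", "settings": {"location": "/path/to/shared/mount"}}'

# Register the repository through the Elasticsearch snapshot API.
curl -X PUT "$ES_URL/_snapshot/$REPO_NAME" \
  -H 'Content-Type: application/json' \
  -d "$REPO_BODY"
```

The same repository name has to be used later in the Curator action files.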
Elasticsearch requires that the specified /path/to/shared/mount is whitelisted in the path.repo configuration setting in the elasticsearch.yml configuration file. See the section Shared file system repository of the Elasticsearch reference for details.
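As an illustration, the whitelist entry could look like the fragment below. The heredoc writes it to a scratch file (path-repo-fragment.yml, a hypothetical name) so that the real configuration is not touched; the line itself would need to be merged into elasticsearch.yml on the relevant nodes, followed by a restart:

```shell
# Write the path.repo fragment to a scratch file for review.
# The scratch file name is hypothetical; the setting belongs in elasticsearch.yml.
cat > path-repo-fragment.yml <<'EOF'
path.repo: ["/path/to/shared/mount"]
EOF
```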
Curator requires a single configuration file to be specified when it runs. If you are using an Elasticsearch cluster with the default configuration, the default file recommended by Curator should be sufficient. Any configuration changes you have made to your Elasticsearch cluster need to be reflected in this file as well, so that Curator can access your cluster without issues.
Example curator.yml:
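A minimal sketch in the Curator 5.x configuration format, assuming an unsecured cluster reachable at 127.0.0.1:9200 (host, port, and the disabled security settings are assumptions to adapt). The heredoc simply writes the file to the current directory:

```shell
# Write a minimal Curator client configuration to ./curator.yml.
cat > curator.yml <<'EOF'
client:
  hosts:
    - 127.0.0.1      # assumed host
  port: 9200         # assumed port
  use_ssl: False
  http_auth:         # set to user:password if security is enabled
  timeout: 30
  master_only: False

logging:
  loglevel: INFO
  logfile:           # empty: log to standard output
  logformat: default
  blacklist: ['elasticsearch', 'urllib3']
EOF
```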
Curator uses action files written in YAML to perform a set of actions sequentially. See the available actions here: https://www.elastic.co/guide/en/elasticsearch/client/curator/5.8/actions.html
A Snapshot Action that can be used to back up the content of a Snow Owl Terminology Server.
Example snowowl_snapshot.yml file:
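A minimal sketch of such an action file, assuming a repository named snowowl-snapshots and a snapshot name pattern of snowowl-%Y%m%d%H%M%S (both are illustrative choices). It snapshots every index in the cluster; the heredoc writes the file to the current directory:

```shell
# Write a Curator action file that snapshots all indices.
cat > snowowl_snapshot.yml <<'EOF'
actions:
  1:
    action: snapshot
    description: >-
      Snapshot all indices into the assumed snowowl-snapshots repository
      and wait for the snapshot to complete.
    options:
      repository: snowowl-snapshots      # assumed repository name
      name: snowowl-%Y%m%d%H%M%S         # assumed naming pattern
      ignore_unavailable: False
      include_global_state: True
      partial: False
      wait_for_completion: True
      skip_repo_fs_check: False
    filters:
    - filtertype: none                   # select every index
EOF
```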
To execute a Snapshot action manually, you can use the following command:
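For instance, assuming curator is on the PATH and both files live under the placeholder directory /path/to, the invocation could look like this sketch. The first call uses Curator's --dry-run flag to log what would happen without changing anything:

```shell
# Placeholder paths: point these at your own files.
CURATOR_CONFIG="/path/to/curator.yml"
SNAPSHOT_ACTION="/path/to/snowowl_snapshot.yml"

# Preview the action without creating a snapshot.
curator --config "$CURATOR_CONFIG" --dry-run "$SNAPSHOT_ACTION"

# Create the snapshot.
curator --config "$CURATOR_CONFIG" "$SNAPSHOT_ACTION"
```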
A Restore Action that can be used to restore the latest snapshot (aka backup) to the Snow Owl Terminology Server.
Example snowowl_restore.yml file:
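A minimal sketch of such an action file, under two assumptions: the repository is named snowowl-snapshots, and the existing indices are closed first, because Elasticsearch cannot restore into an index that is open. Leaving name empty makes Curator pick the most recent snapshot that passes the filters. The heredoc writes the file to the current directory:

```shell
# Write a Curator action file that closes indices, then restores the latest snapshot.
cat > snowowl_restore.yml <<'EOF'
actions:
  1:
    action: close
    description: Close all indices so they can be overwritten (assumed step).
    options:
      ignore_empty_list: True
    filters:
    - filtertype: pattern
      kind: regex
      value: '.*'
  2:
    action: restore
    description: >-
      Restore all indices from the most recent successful snapshot
      in the assumed snowowl-snapshots repository.
    options:
      repository: snowowl-snapshots      # assumed repository name
      name:                              # empty: most recent snapshot
      indices:                           # empty: all indices in the snapshot
      wait_for_completion: True
    filters:
    - filtertype: state
      state: SUCCESS                     # consider only completed snapshots
EOF
```

Note that the regex in step 1 closes every index in the cluster; on a cluster shared with other applications, a narrower pattern would be safer.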
To execute a Restore action manually, you can use the following command:
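A sketch of that invocation, again assuming curator is on the PATH and using the placeholder directory /path/to. Because a restore overwrites index data, the dry run is shown first:

```shell
# Placeholder paths: point these at your own files.
CURATOR_CONFIG="/path/to/curator.yml"
RESTORE_ACTION="/path/to/snowowl_restore.yml"

# Preview which snapshot and indices would be affected.
curator --config "$CURATOR_CONFIG" --dry-run "$RESTORE_ACTION"

# Perform the restore.
curator --config "$CURATOR_CONFIG" "$RESTORE_ACTION"
```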
To schedule automated backups, you can use Cron on Unix-style operating systems. The backup interval depends on your use case and how you access the data. For a write-heavy scenario, we recommend an hourly backup interval; otherwise, an interval between hourly and daily is preferable.
An example crontab entry that initiates a daily backup at 03:00, and captures Curator's output to /var/log/backup.log (both standard output and standard error) would look like this:
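A sketch of such an entry in user-crontab format (no user column), assuming curator is on the PATH seen by cron and using the placeholder directory /path/to. The heredoc writes it to a scratch file, snowowl-backup.cron (a hypothetical name), from which the line can be pasted into crontab -e:

```shell
# Write the crontab entry to a scratch file for review.
# Fields: minute hour day-of-month month day-of-week command
cat > snowowl-backup.cron <<'EOF'
0 3 * * * curator --config /path/to/curator.yml /path/to/snowowl_snapshot.yml > /var/log/backup.log 2>&1
EOF
```

Here `>` truncates the log on every run; `>>` would append instead if a history of runs is wanted.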
Snow Owl uses a single data source, an Elasticsearch cluster (either embedded or external). To back up and restore the data, we highly recommend the official Snapshot and Restore feature of Elasticsearch. On top of that API, we also recommend using a tool such as Curator to ease the lifecycle management of your Elasticsearch cluster and its indices. See the Curator documentation for details.
Reminder: for production environments, we highly recommend using an external Elasticsearch cluster as opposed to the embedded one. External Elasticsearch clusters are more customizable and can be configured to use other snapshot repository types, such as Amazon S3 or HDFS.
Below you can find a very simple guide on how to configure the backup and restore process for your Snow Owl Terminology Server using Curator.