Backup

This method is only applicable to deployments where the Elasticsearch cluster is co-located with the Snow Owl Authoring Platform.

A managed Elasticsearch service will automatically configure a snapshot policy upon creation. See details here.

The Authoring Platform release package contains a built-in solution for rolling and permanent data backups. The Docker stack includes a dedicated container (called snow-owl-backup) that creates scheduled backups of:

  • the Elasticsearch indices

  • the OpenLDAP database (if present)

  • Bugzilla's configuration files and SQL database

For the Elasticsearch indices, the backup container uses the Snapshot API. Snapshots are labeled in a predefined format that ends with a timestamp, e.g. snowowl-daily-20220324030001.

The OpenLDAP database is backed up by compressing the contents of the folder under ./snow-owl/ldap. Filenames are generated using the name of the corresponding Elasticsearch snapshot. E.g. snowowl-daily-20220324030001.tar.gz.

Bugzilla is backed up by compressing the configuration files and the MySQL database dump. Filenames are generated using the name of the corresponding Elasticsearch snapshot. E.g. snowowl-daily-20220324030001.tar.gz.
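
The naming scheme described above can be sketched with two small shell helpers. This is only an illustration: the function names snapshot_label and archive_name are hypothetical and are not part of the backup container's scripts, and the timestamp layout (year, month, day, hour, minute, second) is inferred from the example labels.

```shell
# Hypothetical helpers illustrating the naming scheme; not part of the backup container.

# snapshot_label <label> [timestamp]
# Prints snowowl-<label>-<timestamp>; the timestamp defaults to the current time.
snapshot_label() {
  label="$1"
  ts="${2:-$(date +%Y%m%d%H%M%S)}"
  printf 'snowowl-%s-%s\n' "$label" "$ts"
}

# archive_name <snapshot-label>
# Prints the matching archive file name used for the OpenLDAP or Bugzilla backup.
archive_name() {
  printf '%s.tar.gz\n' "$1"
}

snapshot_label daily 20220324030001
archive_name "$(snapshot_label daily 20220324030001)"
```

With the fixed timestamp above, the two calls print snowowl-daily-20220324030001 and snowowl-daily-20220324030001.tar.gz.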

Backup Window: while a backup operation is running, the Terminology Server blocks all write operations on the Elasticsearch indices. This prevents data loss and keeps the backups consistent.

Backup Duration: the very first backup of an Elasticsearch cluster takes the longest, typically 20 to 40 minutes depending on the size of the data and the I/O performance. Subsequent backups should take significantly less time, around 1 to 5 minutes.

Daily backups

Daily backups are rolling backups that are scheduled and cleaned up according to the settings in the ./snow-owl/docker/.env file. The most important settings are summarized below.

BACKUP_FOLDER

To store backups redundantly it is advised to mount a remote file share to a local path on the host. By default, this folder is configured to be at ./snow-owl/backup. It contains:

  • the snapshot files of the Elasticsearch cluster

  • the backup files of the OpenLDAP database

  • the backup files of Bugzilla

  • extra configuration files

Make sure the remote file share has enough free space to store roughly twice the size of the ./snow-owl/resources/indexes folder.
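
The free space rule of thumb above can be checked before enabling backups. The following is a minimal sketch, not part of the release package: the function has_enough_space and the variables INDEX_DIR and BACKUP_DIR are hypothetical names, and the paths default to the locations mentioned in this section.

```shell
# has_enough_space <index_kb> <free_kb>
# Succeeds when the free space is at least twice the size of the index folder.
has_enough_space() {
  [ "$2" -ge $(( $1 * 2 )) ]
}

# Assumed locations; adjust them to the actual deployment.
INDEX_DIR="${INDEX_DIR:-./snow-owl/resources/indexes}"
BACKUP_DIR="${BACKUP_DIR:-./snow-owl/backup}"

if [ -d "$INDEX_DIR" ] && [ -d "$BACKUP_DIR" ]; then
  # du -sk reports the folder size in KiB; df -Pk reports available KiB in column 4.
  index_kb=$(du -sk "$INDEX_DIR" | awk '{print $1}')
  free_kb=$(df -Pk "$BACKUP_DIR" | awk 'NR==2 {print $4}')
  if has_enough_space "$index_kb" "$free_kb"; then
    echo "OK: ${free_kb} KiB free for ${index_kb} KiB of index data"
  else
    echo "WARNING: less than twice the index size is free in $BACKUP_DIR"
  fi
fi
```

When BACKUP_FOLDER points to a mounted file share, set BACKUP_DIR to the mount point so that df reports the free space of the share rather than of the local disk.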

CRON_DAYS, CRON_HOURS, CRON_MINUTES

Backup jobs are scheduled by crond, so cron expressions can be used in these settings to specify when a daily backup should run.

NUMBER_OF_DAILY_BACKUPS_TO_KEEP

Specifies how many daily backups the backup container keeps. Older daily backups are cleaned up.
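
The effect of this setting can be illustrated with a short sketch that selects the backups falling outside the retention window. It is not the backup container's actual cleanup logic; the function name backups_to_remove is hypothetical, and the labels are sample values in the format shown earlier.

```shell
# backups_to_remove <keep>
# Reads daily backup labels from standard input (one per line) and prints those
# that fall outside the newest <keep>. Because the timestamp suffix has a fixed
# width, a plain reverse sort orders the labels from newest to oldest.
backups_to_remove() {
  keep="$1"
  sort -r | tail -n +"$(( keep + 1 ))"
}

# With three backups and keep=2, only the oldest label is printed.
printf '%s\n' \
  snowowl-daily-20220322030001 \
  snowowl-daily-20220324030001 \
  snowowl-daily-20220323030001 | backups_to_remove 2
```

The example prints snowowl-daily-20220322030001, the only label older than the two most recent backups.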

Example daily backup config

Let's say we have an external file share mounted to /mnt/external_folder. Daily backups should be created after each working day, during the night at 2:00 AM. Because the backup runs in the early hours following a working day, the schedule covers Tuesday to Saturday. Only the last two weeks' worth of data should be kept (assuming 5 working days each week), which means 10 backups.

BACKUP_FOLDER=/mnt/external_folder
NUMBER_OF_DAILY_BACKUPS_TO_KEEP=10

CRON_DAYS=Tue-Sat
CRON_HOURS=2
CRON_MINUTES=0
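
To see how these values relate to a crontab schedule, the sketch below combines them into a standard five-field expression. It assumes that the three settings map to the minute, hour, and day-of-week fields; how the backup container assembles its crontab entry internally is not documented here, and the function name cron_schedule is hypothetical.

```shell
# cron_schedule <minutes> <hours> <days>
# Prints a five-field crontab schedule: minute hour day-of-month month day-of-week.
cron_schedule() {
  printf '%s %s * * %s\n' "$1" "$2" "$3"
}

CRON_DAYS=Tue-Sat
CRON_HOURS=2
CRON_MINUTES=0
cron_schedule "$CRON_MINUTES" "$CRON_HOURS" "$CRON_DAYS"
```

For the example configuration this prints 0 2 * * Tue-Sat, i.e. 2:00 AM from Tuesday to Saturday.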

One-off backups

It is also possible to create backups on demand, e.g. before versioning an important SNOMED CT release or before a Terminology Server version upgrade. These backups are kept until they are removed manually.

To create such a backup, run the following commands. The first opens a shell in the backup container, the second starts the backup script with a custom label:

root@host:/# docker exec -it backup bash
root@ad36cfb0448c:/# /backup/backup.sh -l my-backup-label

The script creates a snapshot of the Elasticsearch data labeled snowowl-my-backup-label-20220405030002, together with an archive of the OpenLDAP database and an archive of Bugzilla's configuration and database, each named snowowl-my-backup-label-20220405030002.tar.gz.
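
For scripted use, the two steps can in principle be combined into a single non-interactive call. The wrapper below is a sketch under the assumption that /backup/backup.sh does not require an interactive terminal; the function name one_off_backup and the DOCKER override are hypothetical and serve only to allow a dry run.

```shell
# one_off_backup <label>
# Runs the backup script inside the container named "backup" without opening an
# interactive shell. Set DOCKER=echo to print the command instead of running it.
one_off_backup() {
  if [ -z "$1" ]; then
    echo "usage: one_off_backup <label>" >&2
    return 1
  fi
  ${DOCKER:-docker} exec backup /backup/backup.sh -l "$1"
}

# Dry run: prints the arguments that would be passed to docker.
DOCKER=echo one_off_backup before-upgrade
```

The dry run prints exec backup /backup/backup.sh -l before-upgrade. Removing the DOCKER=echo prefix runs the command against the container.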
