When you decide to start using Docker Universal Control Plane in a production setting, you should configure it for high availability.
The next step is creating a backup policy and disaster recovery plan.
Docker UCP nodes persist data using named volumes.
As part of your backup policy you should regularly create backups of the controller nodes. Since the nodes used for running user containers don’t persist data, you can decide not to create any backups for them.
To perform a backup of a UCP controller node, use the docker/ucp backup
command. This creates a tar archive with the contents of the volumes used by
UCP on that node, and streams it to stdout.
To create a consistent backup, the backup command temporarily stops the UCP containers running on the node where the backup is being performed. User containers and services are not affected by this.
To have minimal impact on your business, you should:
The example below shows how to create a backup of a UCP controller node:
# Create a backup, encrypt it, and store it in /tmp/backup.tar
$ docker run --rm -i --name ucp \
-v /var/run/docker.sock:/var/run/docker.sock \
docker/ucp backup --interactive \
--passphrase "secret" > /tmp/backup.tar
# Decrypt the backup and list its contents
$ gpg --decrypt /tmp/backup.tar | tar --list
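For regular backups, the command above can be wrapped in a small script. The following is a minimal sketch, not an official tool: the function names and the `BACKUP_DIR` and `UCP_PASSPHRASE` variables are assumptions made for this example. It writes each backup to a timestamped file and fails if the resulting archive is empty.

```shell
#!/bin/sh
# Sketch only: BACKUP_DIR and UCP_PASSPHRASE are hypothetical variable
# names chosen for this example, not settings defined by UCP.

# Print a timestamped archive path inside the given directory.
backup_path() {
    dir="$1"
    stamp="$(date -u +%Y%m%d-%H%M%S)"
    printf '%s/ucp-backup-%s.tar\n' "$dir" "$stamp"
}

# Run the backup command from the example above and check the result.
ucp_backup() {
    out="$(backup_path "${BACKUP_DIR:-/tmp}")"
    docker run --rm -i --name ucp \
        -v /var/run/docker.sock:/var/run/docker.sock \
        docker/ucp backup --interactive \
        --passphrase "$UCP_PASSPHRASE" > "$out" || { rm -f "$out"; return 1; }
    # -s is true only when the file exists and is non-empty.
    [ -s "$out" ] || { echo "backup is empty: $out" >&2; return 1; }
    # Report where the archive was written.
    echo "$out"
}
```

Calling `ucp_backup` on a controller node prints the path of the new archive on success and returns a non-zero status otherwise, which makes it straightforward to run from a scheduled job.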
The example below shows how to restore a UCP controller node from an existing backup:
$ docker run --rm -i --name ucp \
-v /var/run/docker.sock:/var/run/docker.sock \
docker/ucp restore --passphrase "secret" < /tmp/backup.tar
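Before restoring, it can help to confirm that the backup file is actually present. The sketch below wraps the restore command shown above with that check. The function name `ucp_restore` and the `UCP_PASSPHRASE` variable are assumptions made for this example.

```shell
#!/bin/sh
# Sketch only: ucp_restore and UCP_PASSPHRASE are hypothetical names.

# Restore a UCP controller from the backup file given as $1.
ucp_restore() {
    backup_file="$1"
    # Refuse to start a restore from a missing or empty file.
    if [ ! -s "$backup_file" ]; then
        echo "backup file missing or empty: $backup_file" >&2
        return 1
    fi
    docker run --rm -i --name ucp \
        -v /var/run/docker.sock:/var/run/docker.sock \
        docker/ucp restore --passphrase "$UCP_PASSPHRASE" < "$backup_file"
}
```

For example, `ucp_restore /tmp/backup.tar` runs the restore only if the file exists and is non-empty.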
The restore command can be used to create a new UCP cluster from a backup file. After the restore operation is complete, the following data will be copied from the backup file:
The restore operation may be performed against any Docker Engine, regardless of swarm membership, as long as the target Engine is not already managed by a UCP installation. If the Docker Engine is already part of a swarm, that swarm and all deployed containers and services will be managed by UCP after the restore operation completes.
As an example, if you have a cluster with three controller nodes, A, B, and C, and your most recent backup was of node A:
uninstall-ucp operation.
You should now have your UCP cluster up and running.
Additionally, if half or more of the controller nodes are lost and cannot be recovered to a healthy state, the system can only be restored through the following disaster recovery procedure. Note that this procedure is not guaranteed to succeed without loss of swarm services or UCP configuration data:
docker swarm init --force-new-cluster. This will instantiate a new single-manager swarm by recovering as much state as possible from the existing manager. This is a disruptive operation and any existing tasks will be either terminated or suspended.
docker swarm leave --force and then a docker swarm join operation with the cluster’s new join-token.
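The swarm commands named in this procedure can be sketched as two helper functions. This is an illustration under assumptions, not the complete recovery procedure: the function names are hypothetical, the manager address is a placeholder you must supply, and the sketch assumes the nodes rejoin as workers. It covers only the swarm-level steps, not the UCP backup and restore steps.

```shell
#!/bin/sh
# Sketch only: recover_swarm and rejoin_swarm are hypothetical helper names.

# Run on one surviving manager. Recreates a single-manager swarm from the
# existing manager state, then prints the new worker join token.
recover_swarm() {
    docker swarm init --force-new-cluster || return 1
    docker swarm join-token -q worker
}

# Run on each node that must rejoin the new swarm.
# $1 = join token, $2 = manager address as host:port (placeholder input).
rejoin_swarm() {
    # Leave the old swarm. A failure here is ignored, since the node may
    # already be outside any swarm.
    docker swarm leave --force
    docker swarm join --token "$1" "$2"
}
```

Because `docker swarm init --force-new-cluster` is disruptive, `recover_swarm` should only be run once, on a single manager, after confirming that the lost controllers cannot be recovered.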