Concourse CI on Kubernetes (GKE), Part 6: Concourse & Vault: Backup & Restore

Recreating the Cluster

We want to recreate our cluster while preserving our Vault and Concourse data. Specifically, we want to replace our GKE regional cluster with a zonal cluster to take advantage of the GKE free tier, which saves us $74.40 per month.
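The $74.40 figure is consistent with a cluster management fee of $0.10 per hour over a 31-day month (744 hours). Both the hourly rate and the month length are our assumptions here, not numbers taken from a bill, and the function name is ours. A quick check of the arithmetic:

```shell
# Assumed inputs: $0.10/hour management fee and a 31-day month.
monthly_fee() {
  # $1 = hourly fee in dollars, $2 = days in the month
  awk -v fee="$1" -v days="$2" 'BEGIN { printf "%.2f\n", fee * 24 * days }'
}

monthly_fee 0.10 31
```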

Note: when we say, “recreate the cluster”, we really mean, “recreate the cluster”. We destroy the old cluster, including our worker nodes and persistent volumes.

Backup Vault

In the following example, our storage path is /vault/data, but there’s a chance that yours is different. If it is, replace occurrences of /vault/data with your storage path:

# Find the storage path
kubectl exec -it vault-0 -n vault -- cat /tmp/storageconfig.hcl # look for storage.path, e.g. "/vault/data"
# We need to do this in two steps because Vault's tar is BusyBox's, not GNU's
kubectl exec -it -n vault vault-0 -- tar czf /tmp/vault_bkup.tgz /vault/data
# We encode it in base64 to avoid "tar: Damaged tar archive"
# (no -it here: a TTY would add carriage returns to the redirected output)
kubectl exec -n vault vault-0 -- base64 /tmp/vault_bkup.tgz > ~/Downloads/vault_bkup.tgz.base64

Note: this backup is very specific to our configuration; if your configuration is different (e.g. Consul storage backend, Disaster Recovery Replication enabled), then refer to the Vault documentation.

Check that the backup is valid (that the .tgz archive isn’t corrupted). The command should list the files under your storage path without errors:

base64 -d < ~/Downloads/vault_bkup.tgz.base64 | tar tzvf -
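To rehearse the backup format without touching the cluster, the same tar, gzip and base64 round trip can be run locally. This is a minimal sketch: the temp directory and sample file stand in for the real storage path, and verify_backup is a name we made up for illustration.

```shell
# Rehearse the backup format locally: tar+gzip -> base64 -> decode -> list.
# All paths are throwaway temp files; nothing here touches the cluster.
verify_backup() {
  # $1 = path to a base64-encoded .tgz; succeeds only if it decodes and lists cleanly
  base64 -d < "$1" | tar tzf - > /dev/null 2>&1
}

workdir=$(mktemp -d)
mkdir -p "$workdir/vault/data"
echo "sample" > "$workdir/vault/data/secret.txt"

tar czf "$workdir/bkup.tgz" -C "$workdir" vault/data
base64 "$workdir/bkup.tgz" > "$workdir/bkup.tgz.base64"

if verify_backup "$workdir/bkup.tgz.base64"; then
  echo "backup OK"
else
  echo "backup is corrupt"
fi
```

The function's exit status is tar's, so it could also gate a script that goes on to destroy the cluster.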

Backup Concourse CI’s Database

Note: the name of our Helm Concourse CI release is “ci-nono-io” (its URL is https://ci.nono.io). Remember: when you see it, substitute the name of your Helm Concourse CI release. You’ll see the release name in pod names (“ci-nono-io-postgresql-0”), secrets (“ci-nono-io-postgresql”), etc. Similarly, our Helm Vault release’s name is “vault”. Yours is probably the same.

Now we move on to backing up Concourse. In the command below, the name of our Concourse CI PostgreSQL pod is ci-nono-io-postgresql-0. Substitute your pod’s name.

# the postgres user "concourse"'s password is "concourse"
# (-i without -t: stdin is a pipe, so no TTY is needed)
echo concourse | \
  kubectl exec -i ci-nono-io-postgresql-0 -- \
  pg_dump -Fc -U concourse concourse \
  > ~/Downloads/concourse.dump

Check that the backup is valid. The following command should complete without errors:

pg_restore -l ~/Downloads/concourse.dump
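If pg_restore isn’t installed on your workstation, a cheaper pre-check is possible: custom-format dumps (pg_dump -Fc) begin with the bytes "PGDMP". The sketch below only inspects that header, so it catches the case where an error message or an empty file was captured instead of a dump. It does not prove the dump is restorable. The function name is ours.

```shell
# Cheap pre-check: custom-format dumps (pg_dump -Fc) begin with the bytes "PGDMP".
# Passing this check does not prove the dump is restorable; pg_restore -l remains the real test.
looks_like_custom_dump() {
  # $1 = path to the dump file
  [ "$(head -c 5 "$1" 2>/dev/null)" = "PGDMP" ]
}

if looks_like_custom_dump ~/Downloads/concourse.dump; then
  echo "header looks like a custom-format dump"
else
  echo "not a custom-format dump (missing, empty, or an error message was captured instead)"
fi
```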

Recreate the Cluster

We burn our cluster to the ground & recreate it from scratch:

terraform destroy
...

Restore Vault

We’ve deployed Vault (helm install vault hashicorp/vault ....); now let’s restore our old Vault’s data:

# Confirm the storage path
kubectl exec -it vault-0 -n vault -- cat /tmp/storageconfig.hcl # look for storage.path, e.g. "/vault/data"
kubectl exec -it vault-0 -n vault -- sh -c "rm -r /vault/data/*"
# -i without -t: stdin is a file, not a terminal
# -C /: the archive's member names are relative (vault/data/...), so extract from the root
kubectl exec -i -n vault vault-0 -- \
  sh -c "base64 -d | tar xzvf - -C /" < ~/Downloads/vault_bkup.tgz.base64
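A note on where the files land: the backup was taken with the absolute path /vault/data, and tar implementations typically drop the leading / when creating an archive. Members are therefore stored as vault/data/... and are extracted relative to whichever directory tar runs in (or is pointed at with -C). The local sketch below shows the stored member names, using a temp directory as a stand-in for the storage path. The function name is ours.

```shell
# Show how tar stores an absolute path: the leading "/" is dropped from member names.
# $src stands in for /vault/data; it is a temp directory, not the real storage path.
src=$(mktemp -d)
echo "sample" > "$src/key"

archive=$(mktemp)
tar czf "$archive" "$src" 2>/dev/null   # stderr hidden: tar may warn about stripping the leading "/"

first_member() {
  # $1 = path to a .tgz; prints the first member name
  tar tzf "$1" | head -n 1
}

first_member "$archive"   # a relative name, i.e. no leading "/"
```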

At this point we can browse to our vault (ours is https://vault.nono.io) and unseal it. Our original unsealing keys should work. We log in using our root token and browse to make sure our secrets are there.

Restore Concourse

We’ve deployed Concourse (helm install ci-nono-io concourse/concourse ...), but haven’t logged in or configured any pipelines. The install is pristine. Let’s restore the database. First, let’s get the postgres database’s postgres user’s password. Our secret is ci-nono-io-postgresql; substitute yours appropriately in the following command:

kubectl get secret ci-nono-io-postgresql -o json \
  | jq -r '.data."postgresql-postgres-password"' \
  | base64 -d

In our case, the postgres user’s password is “uywhz4bUkJ”.

Our PostgreSQL pod’s name is ci-nono-io-postgresql-0; substitute yours in the command below:

kubectl cp ~/Downloads/concourse.dump ci-nono-io-postgresql-0:/tmp/concourse.dump
kubectl exec -it ci-nono-io-postgresql-0 -- bash
psql --username=postgres # enter the password from above

Now let’s drop the old database. Note that we must first disallow additional database connections and terminate existing database connections before dropping the database:

UPDATE pg_database SET datallowconn = false WHERE datname = 'concourse';
SELECT pg_terminate_backend (pid)
    FROM pg_stat_activity
    WHERE pg_stat_activity.datname = 'concourse';
DROP DATABASE concourse;
CREATE DATABASE concourse;
EXIT;
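If you restore more than once, retyping these statements gets tedious. The sketch below prints the same statements for a given database name so they can be reviewed or piped into psql. The function name is ours, and the database name is assumed to be a plain identifier that needs no quoting.

```shell
# Print the drop-and-recreate statements for a given database name.
# The SQL mirrors the statements above; nothing is executed here.
recreate_db_sql() {
  # $1 = database name (assumed to be a plain identifier with no quotes)
  cat <<EOF
UPDATE pg_database SET datallowconn = false WHERE datname = '$1';
SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE datname = '$1';
DROP DATABASE $1;
CREATE DATABASE $1;
EOF
}

recreate_db_sql concourse
```

We typed the statements interactively, so piping this output into psql --username=postgres inside the pod is an untested variant.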

Use the postgres password again when prompted below:

pg_restore --dbname=concourse --username=postgres /tmp/concourse.dump

Now browse to your Concourse CI. Try logging in. Try kicking off a build. If your resources are having trouble, yielding messages such as “run check: find or create container on worker … disappeared from worker”, wait a few minutes. It should clear up on its own.

Troubleshooting

If you uninstall Concourse CI (helm uninstall ...), remember to delete any lingering Persistent Volume Claims (kubectl delete pvc ...) before reinstalling, lest your postgresql-postgres-password secret become out of sync with the actual password.

If a Concourse job that depends on a Vault secret aborts with the error failed to interpolate task config: undefined vars:, then you probably forgot to pass --set secrets.vaultAuthParam=... when you ran helm install ci-nono-io concourse/concourse .... Fix it by running helm upgrade ... --set secrets.vaultAuthParam=...

References

How to Backup PostgreSQL Database for Concourse CI. We have mixed feelings about this post: on one hand, it gives a nice description of backing up the Concourse CI database; on the other hand, the authors never restore the database to make sure their procedure is correct (spoiler: it isn’t).
