How Yipit Deploys From Github With Multiple Private Repos
Here at Yipit we love using Github. It is a great way to manage our public and private repos and hand off the grunt work of git management. Even better is that we get to use it with Chef to deploy code to our servers on Amazon EC2.
It is a pretty straightforward process for us to start a server and get repository access:
1. Start a new server with the Knife command.
2. Generate a new SSH key
3. Reach out to the Github API and register that SSH key as a Deploy Key with our repository.
TIP: When registering your key, give it a name that can easily be read by both humans and machines, like yipit_prod_web1 (i-1234abcd). This will make the keys easier to manage in code or from the web interface.
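To make the naming tip concrete, here is a small sketch of helpers that build such a name and read the instance ID back out of it. The function names are hypothetical, and the pattern assumes EC2 instance IDs of the form i- followed by hex digits.

```bash
#!/usr/bin/env bash
# Build a deploy key title such as "yipit_prod_web1 (i-1234abcd)".
# Arguments: a hostname-style label, then the EC2 instance ID.
deploy_key_title() {
  local label="$1" instance_id="$2"
  printf '%s (%s)\n' "$label" "$instance_id"
}

# Pull the EC2 instance ID back out of a title produced above.
# Prints nothing if the title contains no instance ID.
instance_id_from_title() {
  local title="$1"
  # grep -o prints only the matching part; -E enables extended regex.
  printf '%s\n' "$title" | grep -oE 'i-[0-9a-f]+' | head -n 1
}
```

Keeping the instance ID in a fixed, greppable position is what later lets a cleanup job match keys against running machines.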
Great! Now everything works perfectly. You can easily deploy to your machines with Fabric, Chef, or the command line. A deploy doesn’t even have to be kicked off by a developer, so you can run it from a central deploy server.
Cool, we have a few other apps under active development that we want to pull from Github, so let’s add our key to a few more private repos and we can just …errr…ummm
Github rejects the request: the key is already in use as a Deploy Key on another repository.
Well, that’s not good. How do we get access to multiple private repos then? Let’s ask Github Help.
The Github Way(s)
1. SSH Agent Forwarding
Do deploys from your local machine by forwarding your SSH credentials when logging in to each server. This works for code rollouts via Fabric from a developer’s machine, but not if you want to automate your deployments.
2. Deploy Keys
The method discussed above, but Deploy Keys are globally unique across your repositories, so a key can only be attached to one repo. Github makes sure to note the downsides of Deploy Keys as well: any machine with a Deploy Key has full access to your source control, and the keys will not have a passphrase.
3. Machine Users
Give each machine a user account on Github and have it authenticate as if it were a person. Not a very automatable or scalable solution.
The Better Way
The options Github provides don’t seem to work very well for our requirements:
Automated (no human interaction)
Each machine should be able to access multiple private repositories.
We get close with Deploy Keys but we are limited to a single repository per SSH key. The solution? Give each machine multiple SSH keys.
Building a better Deploy Key
Using multiple SSH keys can get messy, fast. This means we will want to build an abstraction around it so we don’t directly interface with the complexity (enter Chef or your own homegrown solution). But first we should explain what we are going to do inside our magical abstraction.
1. Creating a new SSH Key
Now we need to figure out how to automatically create new SSH keys and use them when interacting with git.
We can easily create new SSH keys and add them to ~/.ssh/ with the proper permissions. We create a new 4096-bit RSA public/private keypair with a custom name so that we don’t overwrite our default keys.
NOTE: We are creating this key without a passphrase since our automation will not have the ability to ask for human input.
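A minimal sketch of the key creation step, assuming OpenSSH's ssh-keygen is installed. The function names and the my_repo_deploy_rsa naming scheme are illustrative assumptions, not necessarily what we run in production.

```bash
#!/usr/bin/env bash
# Path of the private key dedicated to one repository, for example
# ~/.ssh/my_repo_deploy_rsa (the naming scheme is an assumption).
deploy_key_path() {
  local repo="$1" ssh_dir="${2:-$HOME/.ssh}"
  printf '%s/%s_deploy_rsa\n' "$ssh_dir" "$repo"
}

# Generate a passphrase-less 4096-bit RSA keypair for a repository,
# unless one already exists, so repeated runs are harmless.
generate_deploy_key() {
  local repo="$1" ssh_dir="${2:-$HOME/.ssh}" key
  key="$(deploy_key_path "$repo" "$ssh_dir")"
  mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir"
  if [ ! -f "$key" ]; then
    # -N "" sets an empty passphrase; -C sets the key comment; -q is quiet.
    ssh-keygen -q -t rsa -b 4096 -N "" -C "$repo deploy key" -f "$key"
  fi
  chmod 600 "$key" && chmod 644 "$key.pub"
}
```

Calling generate_deploy_key my_repo would leave the keypair next to the default keys without touching id_rsa. Skipping generation when the file exists keeps the step safe to repeat on every configuration run.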
2. API Access
To automate this process we will need to interact with the Github API instead of using the web dashboard. There are some great Github API libraries out there that we could use, but for now we want to keep it simple. Simple as in Bash:
Chef Note: We use a Ruby version of this script inside a custom LWRP as an interface that can be used across recipes. We swallow errors from trying to add the same key to the same repo although the Right Way(TM) is to remember if you have added the key already and just skip the step on subsequent runs.
Now we can easily add new keys to any repo we control. This is a good start, but it doesn’t solve the issues with using multiple SSH keys.
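As a rough sketch of that API call in Bash, assuming curl is available and an API token is stored in a GITHUB_TOKEN environment variable (a hypothetical name). It posts to Github's deploy key endpoint for a repository. The read_only flag is our assumption here, on the grounds that a deploy only needs to pull.

```bash
#!/usr/bin/env bash
# Build the JSON body for the "create a deploy key" endpoint.
# Assumes title and key contain no double quotes or backslashes, which
# holds for OpenSSH public keys and titles like "web1 (i-1234abcd)".
deploy_key_payload() {
  local title="$1" public_key="$2"
  printf '{"title": "%s", "key": "%s", "read_only": true}' "$title" "$public_key"
}

# Register a public key file as a deploy key on owner/repo.
# GITHUB_TOKEN is a hypothetical variable holding an API token.
add_deploy_key() {
  local owner="$1" repo="$2" title="$3" pubkey_file="$4"
  curl --silent --show-error --fail \
    -X POST \
    -H "Authorization: Bearer ${GITHUB_TOKEN}" \
    -H "Accept: application/vnd.github+json" \
    -d "$(deploy_key_payload "$title" "$(cat "$pubkey_file")")" \
    "https://api.github.com/repos/${owner}/${repo}/keys"
}
```

With --fail, curl exits non-zero when Github refuses the key, for instance because it is already registered, so the caller can decide whether to swallow that error or stop.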
3. Making Git Behave
We need Git to use these new keys. If you are using SSH keys with Git, it will default to your id_rsa keypair (or DSA if you prefer). If you know the server name you will be connecting to, you can specify a key in your ssh_config file, but when contacting Github all the servers look the same, regardless of repository. We need a different solution.
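Git allows you to specify a custom script to run when contacting a remote system (git-fetch or git-push) through the GIT_SSH environment variable. According to the git documentation, when GIT_SSH is set, git fetch and git push use that command instead of ssh to connect to the remote system, and to pass options to the program you need to wrap the program and its options in a shell script and point GIT_SSH at that script.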
Unfortunately this isn’t very explicit about what you really need to do, but it is pretty straightforward once you have an example.
Our script will look like this:
We are specifying four custom options:
-q: Quiet. We prefer that our SSH connections aren’t extremely verbose when we run them from Chef.
-2: Force SSHv2. Version 1 has issues and is only included for backwards compatibility. Github is on top of their updates (and they were founded after SSHv2 was already standard) so we disable the ability to degrade to SSHv1.
-o "StrictHostKeyChecking=yes": We want to ensure SSH is forcing the Remote Host Key validation since it will help prevent MITM attacks. This can be tricky since we need to make sure we have a solid ~/.ssh/known_hosts file. See the ssh_config documentation for more on this option. We can retrieve Github’s server fingerprint from the command line, but be aware that this lookup could also be affected by MITM attacks, so it is best to validate the fingerprint out of band (multiple locations, different ISPs etc) before copying it into your known_hosts file. We will use Chef templates to manage placing this on our servers as we migrate to strict key checking. You should have a good way to update this if Github switches keys since it will break your rollouts.
-i /path/to/key: This option allows you to specify a private key file to use when connecting with SSH. This is what allows us to map keys to repositories on Github, sidestepping the uniqueness constraint on keys across repositories.
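For the known_hosts entry that StrictHostKeyChecking depends on, one possible approach is to scan the host key and only store it when its fingerprint matches a value that was checked out of band. This is a rough sketch, assuming OpenSSH's ssh-keyscan and ssh-keygen. The function names and the trusted-fingerprint argument are hypothetical.

```bash
#!/usr/bin/env bash
# Compare a fingerprint we just observed with one verified out of band.
# An empty trusted value never matches.
fingerprint_matches() {
  local observed="$1" trusted="$2"
  [ -n "$trusted" ] && [ "$observed" = "$trusted" ]
}

# Fetch github.com's RSA host key and append it to known_hosts only when
# its fingerprint equals the trusted value passed as the first argument.
pin_github_host_key() {
  local trusted="$1" known_hosts="${2:-$HOME/.ssh/known_hosts}"
  local scanned observed
  scanned="$(mktemp)"
  ssh-keyscan -t rsa github.com > "$scanned" 2>/dev/null
  # ssh-keygen -l prints "<bits> <fingerprint> <comment> (<type>)";
  # the fingerprint is the second field.
  observed="$(ssh-keygen -l -f "$scanned" | awk '{print $2}')"
  if fingerprint_matches "$observed" "$trusted"; then
    cat "$scanned" >> "$known_hosts"
  else
    echo "github.com host key fingerprint mismatch: $observed" >&2
    rm -f "$scanned"
    return 1
  fi
  rm -f "$scanned"
}
```

The trusted fingerprint should come from somewhere other than the scan itself, otherwise the comparison adds nothing.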
We place this script somewhere safe (alongside our ssh keys works for us) and make sure it is executable.
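Putting the four options together, a wrapper along these lines should work. It is a sketch rather than our exact script: write_git_ssh_wrapper and the paths passed to it are illustrative names. On recent OpenSSH releases, which no longer support protocol 1, the -2 flag should be redundant.

```bash
#!/usr/bin/env bash
# Write a GIT_SSH wrapper that pins one private key, then mark it executable.
# Arguments: path of the wrapper to create, path of the private key.
write_git_ssh_wrapper() {
  local wrapper="$1" key="$2"
  cat > "$wrapper" <<EOF
#!/bin/sh
# Git invokes this script in place of ssh and appends the host and command.
exec ssh -q -2 -o "StrictHostKeyChecking=yes" -i "$key" "\$@"
EOF
  chmod 700 "$wrapper"
}
```

For example, write_git_ssh_wrapper ~/.ssh/my_repo_ssh.sh ~/.ssh/my_repo_deploy_rsa would create one wrapper per repository next to its key. The generated script passes "$@" through so that the arguments git supplies reach ssh unchanged.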
Once we have this script set up, whenever we use git to interact with my_repo we need to set GIT_SSH.
A simple script to update a repository from the command line:
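A minimal sketch of such an update, assuming a git recent enough to support the -C option and an existing checkout whose origin remote points at Github. The function name, its arguments and the default branch are hypothetical.

```bash
#!/usr/bin/env bash
# Fetch and fast-forward a checkout using a repository-specific SSH wrapper.
# Arguments: checkout directory, wrapper script path, branch (default: master).
update_repo() {
  local checkout="$1" wrapper="$2" branch="${3:-master}"
  # GIT_SSH is only needed for the command that talks to the remote.
  GIT_SSH="$wrapper" git -C "$checkout" fetch origin "$branch" || return 1
  # Fast-forward the local branch to what was just fetched.
  git -C "$checkout" merge --ff-only FETCH_HEAD
}
```

Setting GIT_SSH per command, rather than exporting it for the whole shell, keeps one repository's key from leaking into git calls for another repository.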
4. Automation
Now we know how to add keys to Github, create new keys, and force git to use these new keys. To automate this we just need to stitch these pieces together. In our case we use Chef and this is done very simply:
1. Create a new SSH key.
2. Use our LWRP to add this key to Github via the API.
3. Template our shell script for use with GIT_SSH.
4. Use the Git deploy provider to download the repo and sync changes during Chef runs. Thankfully the provider has support for using GIT_SSH commands.
An additional LWRP could be built around the first 3 steps to make it very simple to use.
TIP: We should switch to using git clone with the depth parameter specified so that we waste less time on our initial checkout when we have no need for detailed history on the machines.
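Outside of Chef, the shallow checkout from the tip could look something like the sketch below. The function name, argument order and default branch are hypothetical.

```bash
#!/usr/bin/env bash
# Clone only the most recent commit of a branch through the SSH wrapper.
# Arguments: repository URL, destination directory, wrapper path, branch.
shallow_clone() {
  local url="$1" dest="$2" wrapper="$3" branch="${4:-master}"
  GIT_SSH="$wrapper" git clone --depth 1 --branch "$branch" "$url" "$dest"
}
```

A depth of 1 keeps only the tip commit, which is usually enough on a machine that just needs to run the code.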
5. Cleaning up old keys
Well, we are all set now aren’t we? Not quite, we still have the issue of cleaning up old keys from dead machines. We can do this through the web console with some painful window swapping to check which servers still exist, but this sucks. Don’t worry, there is an easy fix. We just need a little more automation:
1. Contact the AWS EC2 API and get a list of all of our staging and production instance-ids.
2. Contact the Github API and get a list of all of the deploy key names for our repository.
3. Get a list of every instance ID that is in a deploy key name but not in the AWS instance-id list (remember when we said to pick a name that can be easily read by both man and machine?).
4. Delete the deploy keys that contain these extraneous instance-ids.
This script is repeatable so you can have it run regularly off a cron job and/or at the tail end of your instance shutdown code so you don’t have to worry about these keys floating about.
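The steps above can be sketched in Bash as follows. This assumes the AWS CLI, curl and jq are installed, that an API token lives in a GITHUB_TOKEN environment variable, and that key titles embed the instance ID as suggested earlier. All function names are hypothetical, and filtering instances down to staging and production is left out.

```bash
#!/usr/bin/env bash
# Print the IDs of deploy keys whose title names an instance that is gone.
# stdin: one "key_id<TAB>title" line per deploy key.
# $1:    file listing the live EC2 instance IDs, one per line.
stale_key_ids() {
  local live_file="$1" key_id title instance_id
  while IFS="$(printf '\t')" read -r key_id title; do
    instance_id="$(printf '%s\n' "$title" | grep -oE 'i-[0-9a-f]+' | head -n 1)"
    # Keys without an instance ID in the title are left alone.
    [ -n "$instance_id" ] || continue
    grep -qxF "$instance_id" "$live_file" || printf '%s\n' "$key_id"
  done
}

# Live instance IDs from the AWS CLI (assumes it is installed and configured).
live_instance_ids() {
  aws ec2 describe-instances \
    --filters "Name=instance-state-name,Values=pending,running,stopping,stopped" \
    --query 'Reservations[].Instances[].InstanceId' --output text | tr '\t' '\n'
}

# Deploy keys of owner/repo as "key_id<TAB>title" lines (assumes curl and jq).
list_deploy_keys() {
  curl --silent --fail -H "Authorization: Bearer ${GITHUB_TOKEN}" \
    "https://api.github.com/repos/$1/$2/keys?per_page=100" |
    jq -r '.[] | "\(.id)\t\(.title)"'
}

# Delete one deploy key by ID.
delete_deploy_key() {
  curl --silent --fail -X DELETE -H "Authorization: Bearer ${GITHUB_TOKEN}" \
    "https://api.github.com/repos/$1/$2/keys/$3"
}
```

A driver would write live_instance_ids to a file, pipe list_deploy_keys through stale_key_ids, and call delete_deploy_key for each ID that comes out. If the AWS call fails and the live list is empty, every key would look stale, so a real script should abort in that case instead of deleting.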
6. The Future
Deploying from Github for our large applications will not always be a good solution as we grow. We are investigating alternative methods for dissociating source control distribution from our release process.
Andrew Gross is a Developer at Yipit