How Yipit Deploys From Github With Multiple Private Repos

Here at Yipit we love using Github. It is a great way to manage our public and private repos and hand off the grunt work of git management. Even better is that we get to use it with Chef to deploy code to our servers on Amazon EC2.

It is a pretty straightforward process for us to start a server and get repository access:

1. Start a new server with the Knife command.

2. Generate a new SSH key.

3. Reach out to the Github API and register that SSH key as a Deploy Key with our repository.

TIP: When registering your key, give it a name that is easy for both humans and machines to read, like yipit_prod_web1 (i-1234abcd). This will make the keys easier to manage in code or from the web interface.
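As a minimal sketch of the naming tip above, a small helper can build the title and refuse anything that does not look like an EC2 instance ID. The function name deploy_key_title and the exact "name (instance-id)" layout are assumptions for illustration, modeled on the example in the tip.

```shell
# Build a deploy key title such as "yipit_prod_web1 (i-1234abcd)".
# Hypothetical helper: the layout mirrors the example in the tip above.
deploy_key_title() {
  local server_name="$1" instance_id="$2" hex
  # Split off the "i-" prefix; anything else leaves hex empty.
  case "$instance_id" in
    i-*) hex=${instance_id#i-} ;;
    *)   hex="" ;;
  esac
  # Reject an empty suffix or one containing non-hex characters.
  case "$hex" in
    ""|*[!0-9a-f]*)
      echo "not an EC2 instance ID: ${instance_id}" >&2
      return 1
      ;;
  esac
  printf '%s (%s)\n' "$server_name" "$instance_id"
}
```

Keeping the instance ID in a fixed position makes it straightforward to pull back out of the title later.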

Great! Now everything works perfectly. You can easily deploy to your machines with fabric, Chef, or on the command line. It doesn’t even have to be kicked off by a developer so you can do it from a central deploy server.

Cool, we have a few other apps under active development that we want to pull from Github, so let’s add our key to a few more private repos and we can just …errr…ummm

{
  "errors": [
    {
      "resource": "PublicKey",
      "message": "key is already in use",
      "code": "custom",
      "field": "key"
    }
  ],
  "message": "Validation Failed"
}

Well, that’s not good. How do we get access to multiple private repos then? Let’s ask Github Help.

The Github Way(s)

1. SSH Agent Forwarding

Do deploys from your local machine by forwarding your SSH credentials when logging in to each server. This works for code rollouts via fabric from a developer’s machine, but not if you want to automate your deployments.

2. Deploy Keys

This is the method discussed above, but as we just saw, a Deploy Key is globally unique: the same key cannot be registered with more than one repository. Github makes sure to note the downsides of deploy keys as well. Any machine with a deploy key has full access to that repository, and the keys will not have a passphrase.

3. Machine Users

Give each machine a user account on Github and authenticate as if they are a person. Not a very automatable or scalable solution.

The Better Way

The options Github provides don’t seem to work very well for our requirements:

  • Automated (no human interaction)

  • Each machine should be able to access multiple private repositories.

We get close with Deploy Keys but we are limited to a single repository per SSH key. The solution? Give each machine multiple SSH keys.

Building a better Deploy Key

Using multiple SSH keys can get messy, fast. This means we will want to build an abstraction around it so we don’t directly interface with the complexity (enter Chef or your own homegrown solution). But first we should explain what we are going to do inside our magical abstraction.

1. Creating a new SSH Key

Now we need to figure out how to automatically create new SSH keys and use them when interacting with git.

We can easily create new SSH keys and add them to ~/.ssh/ with the proper permissions.

ssh-keygen -b 4096 -t rsa -f /home/${USER}/.ssh/${REPO}_rsa -P ""

This will create a new 4096-bit RSA public/private keypair with a custom name so that we don’t overwrite our default keys.

NOTE: We are creating this key without a passphrase since our automation will not have the ability to ask for human input.
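Since automation tends to run the same step many times, it helps if key creation is safe to repeat. The sketch below wraps the ssh-keygen command above so that an existing key is never overwritten and the permissions are reset on every run. The function names repo_key_path and ensure_repo_key are hypothetical, and the 700/600/644 modes are the usual OpenSSH conventions rather than anything specific to our setup.

```shell
# Print where the key for a repository lives (same naming as the command above).
repo_key_path() {
  local user="$1" repo="$2"
  printf '/home/%s/.ssh/%s_rsa\n' "$user" "$repo"
}

# Create the keypair only if it does not exist yet, so repeated runs are safe.
ensure_repo_key() {
  local key_path="$1" ssh_dir
  ssh_dir=$(dirname "$key_path")
  mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir" || return 1
  if [ ! -f "$key_path" ]; then
    # Empty passphrase: nobody is around to type one during automation.
    ssh-keygen -q -b 4096 -t rsa -f "$key_path" -P "" || return 1
  fi
  chmod 600 "$key_path" && chmod 644 "${key_path}.pub"
}
```

A call would look like ensure_repo_key "$(repo_key_path "$USER" my_repo)".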

2. API Access

To automate this process we will need to interact with the Github API instead of using the web dashboard. There are some great Github API libraries out there that we could use, but for now we want to keep it simple. Simple as in Bash:

# The public key lives in ~/.ssh/, where we created it in the previous step
SSH_KEY_FILE="/home/${USER}/.ssh/${KEY_NAME}.pub"

# Strip the trailing newline so the key fits on a single line of JSON
SSH_KEY=$(tr -d '\n' < "${SSH_KEY_FILE}")

curl -d "{\"title\": \"${KEY_NAME}\", \"key\": \"${SSH_KEY}\"}" -H "Authorization: token ${GITHUB_API_KEY}" "https://api.github.com/repos/${ORG_NAME}/${REPO}/keys"

Chef Note: We use a Ruby version of this script inside a custom LWRP as an interface that can be used across recipes. We swallow errors from trying to add the same key to the same repo although the Right Way(TM) is to remember if you have added the key already and just skip the step on subsequent runs.
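One way to approximate the Right Way from the note above, without keeping local state, is to list the repository's deploy keys first and only register the key when it is missing. This is a rough sketch, not our LWRP: the function names are hypothetical, and it assumes the key listing returns each key as "type base64" text that can be matched as a plain string.

```shell
# Print the "type base64" part of a public key file, dropping the comment.
key_material() {
  awk '{ print $1 " " $2; exit }' "$1"
}

# Succeed if the deploy key listing (JSON on stdin) mentions the key.
key_in_listing() {
  grep -qF -- "$1"
}

# Register the key unless the repository already has it.
# $3 is the keys URL, e.g. https://api.github.com/repos/ORG/REPO/keys
register_key_once() {
  local pub_file="$1" title="$2" api_url="$3" material
  material=$(key_material "$pub_file") || return 1
  if curl -sf -H "Authorization: token ${GITHUB_API_KEY}" "$api_url" \
      | key_in_listing "$material"; then
    return 0   # already registered, nothing to do
  fi
  curl -sf -H "Authorization: token ${GITHUB_API_KEY}" \
    -d "{\"title\": \"${title}\", \"key\": \"${material}\"}" \
    "$api_url" > /dev/null
}
```

The listing is paginated, so a repository with many deploy keys would need the check extended to walk every page.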

Now we can easily add new keys to any repo we control. This is a good start, but it doesn’t solve the issues with using multiple SSH keys.

3. Making Git Behave

We need Git to use these new keys. If you are using SSH keys with Git, it will default to your id_rsa keypair (or DSA if you prefer). If you know the name of the server you will be connecting to, you can specify a key for it in your ssh_config file, but when contacting Github all the servers look the same, regardless of repository. We need a different solution.

Git allows you to specify a custom script to run when contacting a remote system (git-fetch or git-push). From the git documentation:

GIT_SSH

If this environment variable is set then git fetch and git push will use this command instead of ssh when they need to connect to a remote system. The $GIT_SSH command will be given exactly two arguments: the username@host (or just host) from the URL and the shell command to execute on that remote system.

To pass options to the program that you want to list in GIT_SSH you will need to wrap the program and options into a shell script, then set GIT_SSH to refer to the shell script.

Usually it is easier to configure any desired options through your personal .ssh/config file. Please consult your ssh documentation for further details.

Unfortunately this isn’t very explicit about what you really need to do, but it is pretty straightforward once you have an example.

Our script will look like this:

#!/bin/bash
# Git passes two arguments: the host and the command to run on it
/usr/bin/env ssh -q -2 -o "StrictHostKeyChecking=yes" -i "/home/my_user/.ssh/my_repo_rsa" "$1" "$2"

We are specifying four custom options:

  • -q: Quiet. We prefer that our SSH connections aren’t extremely verbose when we run them from Chef.

  • -2: Force SSHv2. Version 1 has issues and is only included for backwards compatibility. Github is on top of their updates (and they were founded after SSHv2 was already standard) so we disable the ability to degrade to SSHv1.

  • -o "StrictHostKeyChecking=yes": We want to ensure SSH enforces remote host key validation, since it helps prevent MITM attacks. This can be tricky since we need to make sure we have a solid ~/.ssh/known_hosts file. Read more on this option in the ssh_config man page. To retrieve Github’s server fingerprint we can run:

# Get the key and output it with the server address hashed
ssh-keyscan -H github.com

# Get the key and output it with the server address in plaintext
ssh-keyscan github.com

Be aware that this command could also be affected by MITM attacks so it is best to validate this out of band (multiple locations, different ISPs etc) before copying it into your known_hosts file. We will use Chef templates to manage placing this on our servers as we migrate to strict key checking. You should have a good way to update this if Github switches keys since it will break your rollouts.

  • -i /path/to/key: This option allows you to specify a private key file to use when connecting with SSH. This is what allows us to map keys to repositories on Github, sidestepping the uniqueness constraint on RSA keys across repositories.
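For the out-of-band validation mentioned under StrictHostKeyChecking, one simple approach is to run ssh-keyscan from two different networks and check that both scans report the same keys before trusting either. A minimal sketch, assuming each scan was saved to a file; the function names are hypothetical.

```shell
# Print "type base64" for every entry in a known_hosts-style file, sorted.
# The host field is dropped so hashed (-H) and plaintext scans compare equal.
host_keys() {
  grep -v '^#' "$1" | awk 'NF >= 3 { print $2 " " $3 }' | sort -u
}

# Succeed only if both scans are non-empty and contain the same host keys.
scans_agree() {
  local first second
  first=$(host_keys "$1")
  second=$(host_keys "$2")
  [ -n "$first" ] && [ "$first" = "$second" ]
}
```

With two scans saved as, say, scan_office.txt and scan_home.txt (placeholder names), scans_agree scan_office.txt scan_home.txt succeeding is a reasonable signal that the key is safe to template into known_hosts. It is not proof, since both networks could still share a compromised path.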

We place this script somewhere safe (alongside our ssh keys works for us) and make sure it is executable.

$ ll ~/.ssh/
-rw-r--r-- user user authorized_keys
-rw-rw-r-- user user known_hosts
-rw------- user root id_rsa
-rw-r--r-- user root id_rsa.pub
-rw------- user root my_repo_rsa
-rw-r--r-- user root my_repo_rsa.pub
-rwx------ user user my_repo_ssh_wrapper.sh
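Writing one wrapper per repository by hand gets old quickly, so here is a sketch of generating it. The function name write_ssh_wrapper is hypothetical; the file naming (my_repo_rsa, my_repo_ssh_wrapper.sh) follows the listing above, and the generated script is the same one-liner shown earlier.

```shell
# Write a GIT_SSH wrapper for one repository's key and make it owner-only.
# Prints the path of the wrapper it created.
write_ssh_wrapper() {
  local ssh_dir="$1" repo="$2"
  local wrapper="${ssh_dir}/${repo}_ssh_wrapper.sh"
  # Unquoted heredoc: paths are filled in now, \$1 and \$2 stay literal.
  cat > "$wrapper" <<EOF
#!/bin/bash
exec /usr/bin/env ssh -q -2 -o "StrictHostKeyChecking=yes" -i "${ssh_dir}/${repo}_rsa" "\$1" "\$2"
EOF
  chmod 700 "$wrapper" || return 1
  printf '%s\n' "$wrapper"
}
```

For example, write_ssh_wrapper "/home/my_user/.ssh" my_repo would produce the my_repo_ssh_wrapper.sh file from the listing.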

Once we have this script set up, whenever we are using git to interact with my_repo we need to set GIT_SSH.

A simple script to update a repository from the command line:

#!/bin/bash

cd my_repo || exit 1

# Set GIT_SSH for this terminal session
export GIT_SSH=~/.ssh/my_repo_ssh_wrapper.sh

# Run our git commands
git pull
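When one machine works with several repositories, exporting GIT_SSH for the whole session makes it easy to use the wrong key by accident. A possible variation, sketched below with a hypothetical helper name, sets GIT_SSH for a single git invocation only.

```shell
# Run one git command inside a repository using that repository's wrapper.
# GIT_SSH is set only for this invocation, so other repositories are unaffected.
git_with_repo_key() {
  local repo_dir="$1" wrapper="$2"
  shift 2
  # The subshell keeps the caller's working directory unchanged.
  ( cd "$repo_dir" && GIT_SSH="$wrapper" git "$@" )
}
```

Usage would be along the lines of git_with_repo_key my_repo ~/.ssh/my_repo_ssh_wrapper.sh pull.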

4. Automation

Now we know how to add keys to Github, create new keys, and force git to use these new keys. To automate this we just need to stitch these pieces together. In our case we use Chef and this is done very simply:

  • Create a new SSH key

  • Use our LWRP to add this key to Github via the API

  • Template our shell script for use with GIT_SSH

  • Use the Git deploy provider to download the repo and sync changes during Chef runs. Thankfully the provider has support for using GIT_SSH commands. A sample git block:

git "update #{my_repo}" do
  user username
  group groupname
  repository "git@github.com:Yipit/#{my_repo}.git"
  reference branch
  destination "/var/www/#{my_repo}"
  ssh_wrapper "/home/#{username}/.ssh/#{my_repo}_ssh_wrapper.sh"
  action :sync
end

An additional LWRP could be built around the first 3 steps to make it very simple to use.

TIP: We should switch to using git clone with the depth parameter specified so that we waste less time on our initial checkout when we have no need for detailed history on the machines.

5. Cleaning up old keys

Well, we are all set now aren’t we? Not quite, we still have the issue of cleaning up old keys from dead machines. We can do this through the web console with some painful window swapping to check which servers still exist, but this sucks. Don’t worry, there is an easy fix. We just need a little more automation:

  • Contact the AWS EC2 API and get a list of all of our staging and production instance-ids

  • Contact the Github API and get a list of all of the deploy key names for our repository

  • Find every instance ID that appears in a deploy key name but not in the AWS instance-id list (Remember when we said to make a server name that can be easily read by both man and machine?).

  • Delete the deploy keys that contain these extraneous instance-ids

This script is repeatable so you can have it run regularly off a cron job and/or at the tail end of your instance shutdown code so you don’t have to worry about these keys floating about.
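The comparison at the heart of that cleanup can be sketched in a few lines of shell. This is an illustration rather than our actual script: the function names are hypothetical, and it assumes key titles follow the "name (instance-id)" convention from the earlier tip, with the live instance IDs already written one per line to a file.

```shell
# Print the EC2 instance ID embedded in a key title, or nothing if absent.
instance_id_from_title() {
  printf '%s\n' "$1" | sed -n 's/.*(\(i-[0-9a-f][0-9a-f]*\)).*/\1/p'
}

# Read key titles on stdin (one per line) and print those whose instance ID
# is missing from the file of live instance IDs given as $1.
stale_key_titles() {
  local live_ids_file="$1" title id
  while IFS= read -r title; do
    id=$(instance_id_from_title "$title")
    # Titles without an instance ID were not created by this scheme: skip them.
    [ -n "$id" ] || continue
    grep -qxF -- "$id" "$live_ids_file" || printf '%s\n' "$title"
  done
}
```

Keys whose titles carry no instance ID are left alone on purpose, so a hand-added key is never deleted by accident. Deleting through the API needs each key's numeric id rather than its title, so a real script would carry the id alongside the title.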

6. The Future

Deploying from Github for our large applications will not always be a good solution as we grow. We are investigating alternative methods for dissociating source control distribution from our release process.

Andrew Gross is a Developer at Yipit
