Why You Need a Git Pre-Commit Hook and Why Most Are Wrong

A pre-commit hook is a piece of code that runs before every commit and determines whether or not the commit should be accepted. Think of it as the gatekeeper to your codebase.

Want to ensure you didn’t accidentally leave any pdb breakpoints in your code? Pre-commit hook. Want to make sure your JavaScript is JSHint-approved? Pre-commit hook. Want to guarantee clean, readable, PEP8-compliant code? Pre-commit hook. Want to pipe all of the comments in your codebase through Strunk & White? Please don’t.

The pre-commit hook is just an executable file that runs before every commit. If it exits with zero status, the commit is accepted. If it exits with a non-zero status, the commit is rejected. (Note: a pre-commit hook can be bypassed by passing the --no-verify argument to git commit.)

Along with the pre-commit hook, there are numerous other git hooks available: post-commit, post-merge, pre-receive, and more, all documented in the githooks man page.

Why Most Pre-Commit Hooks are Wrong

Be wary: the majority of pre-commit hooks you’ll see on the web are wrong. Most test against whatever files are currently on disk, not against what is in the staging area (the files actually being committed).

We avoid this in our hook by stashing all changes that are not part of the staging area before running our checks and then popping the changes afterwards. This is very important because a file could be fine on disk while the changes that are being committed are wrong.
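The stash-and-restore pattern can be sketched in shell (an illustrative sketch of the idea, not the literal Yipit hook):

```shell
# Run a check against exactly what is staged: set aside unstaged
# changes, run the check, then restore them.
run_checks_against_index() {
    git stash -q --keep-index       # stash changes not being committed
    "$@"                            # run the supplied check command
    status=$?
    git stash pop -q                # restore the unstaged changes
    return $status
}
```

With this in place, `run_checks_against_index pep8 somefile.py` tests the staged version of the file rather than whatever happens to be on disk.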

The code below is the pre-commit hook we use at Yipit. Our hook is simply a set of checks to be run against any files that have been modified in this commit. Each check can be configured to include or exclude particular types of files. It is designed for a Django environment but should be adaptable to other environments with minor changes. Note that you need git 1.7.7 or later.
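A sketch of the shape such a check-runner can take (the checks, patterns, and commands below are illustrative assumptions, not the exact Yipit configuration):

```python
#!/usr/bin/env python
# Sketch of a configurable check-runner pre-commit hook. The checks,
# regex patterns, and commands are illustrative placeholders.
import re
import subprocess

CHECKS = [
    {'output': 'Running pyflakes...',
     'command': 'pyflakes %s',
     'match_files': [r'.*\.py$'],
     'ignore_files': [r'.*migrations/.*']},
    {'output': 'Running pep8...',
     'command': 'pep8 %s',
     'match_files': [r'.*\.py$'],
     'ignore_files': []},
]

def matches_any(file_name, patterns):
    return any(re.match(pattern, file_name) for pattern in patterns)

def applies_to(check, file_name):
    """Decide whether a check should run against a given file."""
    if check['match_files'] and not matches_any(file_name, check['match_files']):
        return False
    if matches_any(file_name, check['ignore_files']):
        return False
    return True

def run_checks(file_names):
    """Return the hook's exit status: 0 accepts the commit, 1 rejects it."""
    failed = False
    for check in CHECKS:
        for file_name in file_names:
            if applies_to(check, file_name):
                print('%s %s' % (check['output'], file_name))
                if subprocess.call(check['command'] % file_name, shell=True):
                    failed = True
    return 1 if failed else 0

# A complete hook would collect the staged files, e.g. via
#   git diff-index --cached --name-only HEAD
# stash unstaged changes, call sys.exit(run_checks(files)),
# and pop the stash afterwards.
```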

To use this hook or a hook that you create yourself, simply copy the file to .git/hooks/pre-commit inside of your project and make sure that it is executable, or add it to your git repo and set up a symlink.

Steve Pulec is a Developer at Yipit.


How Yipit Deploys Django

If you’re managing your own servers, and you don’t use a tool like Chef, you’re crazy. It’s just that simple.

We’ve been using Chef here at Yipit for about 6 months, and when I think about provisioning a new server with our old runbook, I cringe.

There were some pretty high upfront costs to learning Chef, especially since no one here had any real Ruby experience (Chef is written in Ruby), but the time we invested into getting set up with Chef has been 100% worth it.

I’m hoping this post will help people get up and running with Django and Chef as quickly as possible. Opscode has their own django-quick-start repository, but I think it’s too complex if you’re not familiar with Chef. This tutorial will cover:

  • Getting set up with Opscode, a hosted chef-server

  • Installing Ruby, Chef, and some knife plugins

  • Setting up your chef-repo and installing a Django/github quickstart cookbook

  • Using a Python script to deploy to ec2

If you don’t understand what any of those things mean, Opscode maintains a pretty good wiki.

Before We Start

This tutorial makes the following assumptions about your systems:

  • Locally, you run a Mac.

  • Remotely, you want to use Ubuntu on AWS.

  • Your AWS account has security groups named “web” and “linux” that allow access on ports 22 and 80.

  • The private key for the AWS ssh key you define in add_server.py (we’ll get to that later) is located in ~/.ssh/ on your local machine.

  • Your Django app contains a pip requirements file at conf/external_apps.txt (and that file contains gunicorn).

  • Your Django app contains a gunicorn configuration file at conf/gunicorn/gunicorn.conf.

  • You’re interested in learning about Chef and managing your own servers. Seriously. If you’re just looking for a simple deployment solution, use Heroku. If you want to manage and automate your own infrastructure, use Chef.
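The gunicorn configuration file assumed above can be minimal. For example (the values are placeholders, not Yipit's actual settings):

```python
# conf/gunicorn/gunicorn.conf -- gunicorn config files are plain Python.
# Placeholder values; tune workers and the bind address for your setup.
bind = "127.0.0.1:8000"
workers = 3
```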

If you’re not familiar with AWS security groups and key pairs, this tutorial should help. I’ve also provided a simple “Hello World” Django application that I use as an example.

Get the Gems

I like using homebrew for everything, so even though your Mac comes with Ruby…

brew install ruby
gem install -n /usr/local/bin/ chef
gem install knife-ec2
gem install knife-github-cookbooks

Set up a chef-repo

We’re going to be adding some python scripts to help us manage our servers, so I like to set up my chef-repo in a virtualenv. From wherever you keep your virtualenvs…

virtualenv chef-env
cd chef-env
git clone https://github.com/opscode/chef-repo.git
cd chef-repo

Now you have the barebones Chef repository distributed by Opscode.

Get started with Opscode and your knife.rb file

Next you’ll want to get set up with Opscode to manage your chef server. You can create an account at opscode.com.

After you’ve signed up, download your private key. You can download it by clicking on your username on the top right of the console, and then following the link to “get private key”.

Next you’ll need to set up an organization and download the organization validator key. You’ll also want to download the knife.rb file that Opscode will generate for you.

Lastly, throw it all in a .chef directory in your chef-repo.

You should also append a snippet to your knife.rb that increases the amount of time knife will wait for ec2 to spawn a new server for you. I’ve found that sometimes the default wait time isn’t long enough for ebs-backed instances.

Setting up your project

Next we need to set up chef to actually do something. Let’s start by installing a Django quick start cookbook I put together. This cookbook will install a Django application behind nginx and gunicorn, managed by supervisor. You should read through the commented code for an explanation of how it works.

knife cookbook github install Yipit/djangoquickstart-cookbook

This cookbook has a dependency on the python cookbook, so we’ll want to install that too.

knife cookbook site install python

Note that the “site” version of the install command will handle dependencies automatically, while the github command will not. When you use “site”, however, you’re limited to cookbooks distributed by Opscode.

The next thing we’ll want to do is define a role. In your chef-repo, put something like this in roles/web.rb:
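A minimal role definition using the standard Chef role DSL might look like this (the recipe names here are assumptions based on the cookbooks installed above, not the exact Yipit role):

```ruby
# roles/web.rb -- sketch of a web role; recipe names are assumptions.
name "web"
description "Django application server behind nginx and gunicorn"
run_list(
  "recipe[python]",
  "recipe[djangoquickstart]"
)
```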

Get Some Python In there

Knife has a pretty good plugin system, but it requires you to write Ruby. Writing Ruby is worth it in the cookbooks because it’s the only option, but for helper scripts, I just use python. Aside from being more comfortable with the language, familiarity with the libraries is a huge plus. Here’s a script similar to the one we use to add servers at Yipit:
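The script itself isn't reproduced verbatim here; below is a hypothetical sketch along those lines, using boto to launch the instance and shelling out to knife bootstrap. Every name, AMI id, region, and path is a placeholder, and pychef (also required) could additionally be used to query or clean up nodes on the Chef server:

```python
#!/usr/bin/env python
"""Hypothetical sketch of add_server.py -- all ids and paths below are
placeholders, not Yipit's actual values."""
import os.path
import subprocess
import time

AWS_ACCESS_KEY_ID = 'FILL-ME-IN'
AWS_SECRET_ACCESS_KEY = 'FILL-ME-IN'
REGION = 'us-east-1'
AMI = 'ami-0000000'            # placeholder Ubuntu AMI id
KEY_NAME = 'my-aws-key'        # matching private key at ~/.ssh/my-aws-key.pem
SECURITY_GROUPS = ['web', 'linux']

def bootstrap_command(hostname, role, key_name=KEY_NAME):
    """Build the knife bootstrap invocation for a freshly launched host."""
    return ['knife', 'bootstrap', hostname,
            '-x', 'ubuntu',
            '-i', os.path.expanduser('~/.ssh/%s.pem' % key_name),
            '-r', 'role[%s]' % role,
            '--sudo']

def launch_instance():
    """Launch an Ubuntu instance and wait for it to come up (uses boto)."""
    import boto.ec2  # imported lazily so the helpers above need no AWS deps
    conn = boto.ec2.connect_to_region(
        REGION,
        aws_access_key_id=AWS_ACCESS_KEY_ID,
        aws_secret_access_key=AWS_SECRET_ACCESS_KEY)
    reservation = conn.run_instances(
        AMI, key_name=KEY_NAME, instance_type='m1.small',
        security_groups=SECURITY_GROUPS)
    instance = reservation.instances[0]
    while instance.state != 'running':
        time.sleep(10)
        instance.update()
    return instance

def add_server(role='web'):
    """Launch a server, then hand it to Chef via knife bootstrap."""
    instance = launch_instance()
    subprocess.check_call(bootstrap_command(instance.public_dns_name, role))
```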

Fill in your AWS keys and put this code into a file called add_server.py. This script requires boto and pychef, so from within your virtualenv:

pip install boto pychef

I keep this file right in the chef-repo.

The last thing we need to do before we get started is upload our roles and cookbooks to the chef server.

knife cookbook upload --all
knife role from file roles/web.rb

Now, if you run the add_server.py script, you should be able to bootstrap your own web server running your Django application through gunicorn behind nginx. The add_server.py script does reboot the machine, so you may need to wait a minute or two.

Hopefully you can visit your new web server and see your application. If you have any issues, be sure to leave a comment and I’ll try my best to help you out.

Zach Smith is Technical Product Manager at Yipit.


Extending Django Settings for the Real World

A basic Django installation keeps its global settings in a file called settings.py. This is perfect for simple deployments because it lets the developer override Django variables like INSTALLED_APPS or SESSION_ENGINE very easily. You simply set the variable like so:

SESSION_ENGINE = 'django.contrib.sessions.backends.cache'

From within the shell, you can see the result:

./manage.py shell
>>> from django.conf import settings
>>> settings.SESSION_ENGINE

Many people have two environments in which they work, and therefore a typical settings.py file will have something like this at the end:

try:
    from local_settings import *
except ImportError:
    pass

This imports variables from a file called local_settings.py, overriding any values already set in settings.py. Try it. Add the import code above into your settings.py file, then create a new file called local_settings.py in the same directory as settings.py and add this to it:

SESSION_ENGINE = 'django.contrib.sessions.backends.cached_db'

Now, if you enter the shell like you did above and request settings.SESSION_ENGINE, you’ll get ‘django.contrib.sessions.backends.cached_db’. This is very handy because, in a typical situation, you can have a settings.py file that works for all your environments and then a local_settings.py file for each environment that overrides the variable values.

Problems with the Standard Settings files

Unfortunately, in this scenario, variables from the settings.py file are not in scope inside local_settings.py, and therefore you couldn’t do something like this:

INSTALLED_APPS += ('debug_toolbar',)

In this situation, you’ll get a NameError saying INSTALLED_APPS is not defined, rather than the expected ('django.contrib.auth', 'debug_toolbar',).

A Modest Proposal

What we do at Yipit is to put all of our variables in a settings directory:

__init__.py (where the variables shared by all environments live)
active.py (optional - defines the environment we're in - not under version control)
development.py (shared by all the development environments)
production.py (live site)

This allows us to create an __init__.py file for all the variables that are the same across all environments. The __init__.py file requires no imports (except whatever you may need from Python itself or other libraries). Then, each environment file imports from __init__.py in the way you might imagine:


# production.py
from settings import *
# Alter or add production-specific variables


# development.py
from settings import *
# Alter or add development-specific variables


# active.py
from settings.development import *
# This file denotes which environment we're in.
# This active.py file creates a development environment.

Note: If you’re not that familiar with Python, ‘from settings’ imports settings/__init__.py.

In more complex scenarios, you may also want to inherit settings from files other than settings/__init__.py, and this system fully supports that option. For example, you may have a settings/staging.py file that pulls from settings/__init__.py, and then settings/development.py could pull from staging. It’s really up to you.
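As a concrete (hypothetical) example of such a file, a staging environment that builds on the shared settings could look like:

```python
# settings/staging.py -- hypothetical example; the overrides shown are
# illustrative, not Yipit's actual staging configuration.
from settings import *

# staging-specific overrides
DEBUG = False
INSTALLED_APPS += ('debug_toolbar',)
```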

This approach has some shortcomings, notably that you can’t dynamically change variables - but that’s really not the point of settings. Now, you can change variables on a per-environment basis like this (in, say, development.py):

INSTALLED_APPS += ('debug_toolbar',)

This sets INSTALLED_APPS to ('django.contrib.auth', 'debug_toolbar',). Here is our manage.py file:

#!/usr/bin/env python
import sys
import traceback
from os.path import abspath, dirname, join

from django.core.management import execute_manager

SETTINGS_ACTIVE_CONTENTS = "\033[1;32mfrom settings.local import *\033[1;33m"
try:
    from settings import active as settings
except ImportError, e:
    print '\033[1;33m'
    print "Apparently you don't have the file settings/active.py yet."
    print "Create it containing '%s'\033[0m" % SETTINGS_ACTIVE_CONTENTS
    print "=" * 20
    print "original traceback:"
    print "=" * 20
    traceback.print_exc()
    sys.exit(1)

sys.path.insert(0, abspath(join(dirname(__file__), "../")))
sys.path.insert(0, join(settings.PROJECT_ROOT, "apps"))

if __name__ == "__main__":
    execute_manager(settings)

Final Thoughts

If we make further changes to our settings configuration, we’ll do a follow-up post. Some modifications we are considering:

  • Using Chef to hold many of the systemwide configuration parameters (usernames, machine addresses, etc…) in order to move that information away from the application layer and onto the environment layer.

  • Creating an additional settings file that imports active.py for calculated settings. For example, if a read replica database has not been declared, but the application expects one, have the default database act as the read replica.
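The second idea could be sketched like this (a hypothetical settings/calculated.py; the DATABASES keys are illustrative, not our actual configuration):

```python
# settings/calculated.py -- hypothetical "calculated settings" layer
# that fills in values the application expects but no environment declared.
from settings.active import *

# If no read replica was declared, let the default database act as one.
if 'read_replica' not in DATABASES:
    DATABASES['read_replica'] = dict(DATABASES['default'])
```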

If you have a different way of handling settings, we would love to hear from you in the comments below.

Adam Nelson is CTO at Yipit.


Announcing the Yipit Django Blog: Sharing Everything We’re Learning

Yipit has been using Django since its inception nearly two years ago. If it wasn’t for a modern web framework like Django, we probably wouldn’t exist.

Two years and hundreds of thousands of users later, we’ve gotten deep into the core of Django and many of the contributed packages.

Our goal now is to truly master Django.

While one can learn a lot by doing, we tend to believe that you can learn even more by teaching what you’ve learned.

Accordingly, we’re going to be spending significant time sharing what we’ve learned and are learning about using and scaling Django.

Topics We Will Cover

Some of the topics we will be regularly discussing on our new Yipit Django blog include:

  • System configurations: Using Chef and Fabric to deploy our Github-hosted code to Ubuntu servers on Amazon ec2. Running Gunicorn as the application server.

  • Database Migrations: South handling model changes and generating appropriate migrations.

  • Data Storage: A combination of MySQL, Memcache, and MongoDB for storing our data.

  • Queueing Tasks: Celery for queueing asynchronous processes. Amazon’s SQS to manage the underlying queue.

  • Templating: Inheritance and custom inclusion tags.

  • Testing: Automated testing tools including Nose, Splinter, and Lettuce.

We hope that by sharing what we’re learning, not only will we get feedback to help us improve our current techniques, but we’ll learn about ways to do things better from members of the community.

If all goes well, we’ll be able to meaningfully give back to the Django community that has given us so much.

And, if all goes really well, hopefully we’ll encourage more startups and big companies to share lessons they’ve learned scaling and expanding their own infrastructure.
