Yipit Django Blog

Django Tips and Best Practices

Pitfalls When Upgrading to Django 1.4

The Sting

The Players

The Yipit Django team just finished upgrading to the six-week-old Django 1.4, and we figured we would share the process we went through and some gotchas we found.

The Set-Up

Our goal in this process was not to start utilizing new Django 1.4 features, although there are quite a few. Our goal was to get our system running and stable on Django 1.4. Over the next few weeks we’ll start utilizing some of the new features (QuerySet.prefetch_related, ModelAdmin.save_related(), and QuerySet.bulk_create). This post focuses on what is needed to get running on 1.4, even though we’re dying to use some of the new features.

The Hook

Upgrading a critical piece of your stack is something that should never be done with haste.

About a month ago, half of our developers upgraded their local setups to run 1.4 and we read through the release notes to find which areas of our codebase needed changing. (As an aside, the Django community does an excellent job with the release notes, and we highly recommend looking through them.)

A few weeks later, we upgraded our continuous integration and staging servers. Naturally, we wanted to run 1.4 on hardware that mimics our production environment and found a few more issues.

The Tale

Here were the issues we ran into that may apply to others:

  • Django admin static media moved locations from django/contrib/admin/media/ to django/contrib/admin/static/admin/. If you rely on this location at all, you’ll need to change it.
  • Serialization changes to datetime and time. Django serializers have changed the output they return for datetimes and times to potentially include timezones, milliseconds, or a different format. If you have any sort of API that relies on the built-in JSON or XML serializers (note that Django REST framework does), you will need to take action to prevent your API users from experiencing potential issues.
  • The MySQL GIS backend used to have a bug that returned 1/0 for booleans instead of True/False (our @zmsmith reported it!). The bug got fixed so it could mean another potential change to any APIs you expose.
  • django.contrib.sitemaps bug fix with potential performance implications. Requests to sitemaps used to return cached responses by default. They no longer do. All of our sitemaps already utilized caching, but it’s always good to double-check.
  • The content transfer encoding for emails changed from quoted-printable to 8bit with this commit. This caused some issues in our emails that we thankfully caught during testing.
  • Various modules/methods moved (django.conf.urls.defaults, django.contrib.auth.models.check_password, etc). The large majority of code that was moved around can still be imported from the old location, but it’s in your interest to upgrade the import paths now so that you don’t need to worry about it when the old import path becomes deprecated in version 1.5.
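
To make the serialization point concrete, here is a minimal sketch of one way to keep an API's date output stable across an upgrade: pin the format yourself instead of relying on whatever the framework's serializer emits. It only uses the standard library. The class name `LegacyDateTimeEncoder` and the `YYYY-MM-DD HH:MM:SS` format are assumptions for illustration; check what your own clients actually expect against the release notes.

```python
import datetime
import json


class LegacyDateTimeEncoder(json.JSONEncoder):
    """Hypothetical encoder that pins one explicit date/time format."""

    # Assumed formats; replace with whatever your API clients rely on.
    DATE_FORMAT = "%Y-%m-%d"
    TIME_FORMAT = "%H:%M:%S"

    def default(self, o):
        # datetime must be checked before date: datetime subclasses date.
        if isinstance(o, datetime.datetime):
            return o.strftime("%s %s" % (self.DATE_FORMAT, self.TIME_FORMAT))
        if isinstance(o, datetime.date):
            return o.strftime(self.DATE_FORMAT)
        if isinstance(o, datetime.time):
            return o.strftime(self.TIME_FORMAT)
        return super(LegacyDateTimeEncoder, self).default(o)


# Microseconds are deliberately dropped by the pinned format.
payload = {"created": datetime.datetime(2012, 5, 7, 14, 30, 15, 123456)}
encoded = json.dumps(payload, cls=LegacyDateTimeEncoder)
print(encoded)
```

A test that asserts on the exact encoded string is a cheap way to notice the next time an upgrade changes the output underneath you.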

The Wire and The Shut-Out

After doing a full walk-through on our staging systems, we made the jump on production. Naturally, a few small bugs revealed themselves once our users stressed some edge cases. With our alert and monitoring systems, we were able to fix the few bugs we found within a few minutes of our rollout song stopping.

The Sting

This was our experience upgrading to 1.4. I hope you can now avoid running into some of these issues. If you have run into your own issues, please share them in the comments below.

Steve Pulec is a Developer at Yipit.

15 Key Resources to Learn Django

Learning Python and Django is easy if you have the right resources at your fingertips.

Having pretty much only studied programming in school, I started working as a developer at Yipit with almost no web programming experience. In a little under ten weeks, I’ve become comfortable navigating and making large changes to the whole Python/Django application code as well as contributing new features (interested in learning Django at Yipit? Join us!).

While the learning process wasn’t gut-wrenching, I certainly ran into some hefty roadblocks that would have been almost insurmountable had coworkers not pointed me to the resources below:

Comparing Python to Other Languages

If you know another programming language already, you can easily leverage that knowledge by finding out the differences between Python and that other language, and then by focusing in on learning those differences.

1. Python & Java - A Side by Side Comparison I knew enough Java already from college, and doing some quick reading online saved me a lot of time. I would highly recommend reading this article if you know Java already. It helped me form a mental road-map of what areas to focus on.

2. Also, the Python wiki has a good collection of comparisons of Python to other programming languages.

Python Style

Learning Python syntax is one thing; learning “Pythonic” style is quite another. Regardless of whether you end up coding like this, it’s at least a good idea to understand what is idiomatic – it can save you time when reading other people’s code.

3. Code Like A Pythonista by David Goodger – This article gives a good rundown of how to write readable and stylistic Python code that takes advantage of the dynamic features of the language.

4. Google Python Style Guide – Some good standard practices.

Advanced Python Features

Certain programming concepts in Python just don’t exist in Java, for instance closures and meta-programming, and they were a bit tough for me to pick up. The following articles gave me a good deal of insight into these concepts.

5. What Is A Metaclass in Python - A detailed explanation of a pretty tough topic in Python. Django uses metaclasses quite a bit, and if you get a solid grasp on them, you’ll save yourself a lot of time going through models and forms.

6. Hidden Features of Python - A great compilation of answers on Stack Overflow.

7. A Few of My Favorite Python Things - An opinionated walk through some of the cool features of Python.

Python/Django Tutorials

For a more complete introduction, the following tutorials will bring you up to speed and are a good review if you know some programming already. The important thing is to give the exercises a try.

8. Google’s Python Class - A colleague of mine who also recently started programming in Python recommends this resource as a quick way to jump into Python. It has a great mix of video lectures, notes, and practical examples.

9. Writing your first Django app on djangoproject.com - The canonical Django tutorial, which lets you get your hands dirty and set up a Django project.

Best of Django Documentation

While reading documentation isn’t the most glamorous endeavor in the world, a few sections of Django’s documentation are worth reading for themselves.

10. Managers / Querysets - These sections give a good deal of insight into how Django interacts with relational databases, which seems quite magical. Just always be mindful of the actual queries you’re sending out or else they’ll come back to bite you hard.

11. Request-Response / User - If you’ve never worked with an HTTP request, these sections are very helpful as an introduction. Also, the notion of a User, along with all the authentication that goes with it, comes free with Django. Lucky you.

12. Views / Templates - A good introduction on how the application-level code connects to the client-side HTML/JS/CSS code.

Reading Code from Successful Django Sites

All said and done, there’s no great substitute for reading code. I read through most of Yipit’s codebase to get a sense for how Django is used in a production environment. You might not have access to such a codebase, but you can find a plethora of Django-powered sites that have open-sourced theirs.

13. Django Sites - A great resource for access to dozens of sites that have their source open for you to browse through.

Python Tools

Having the right tools at your disposal can make your life a lot easier by saving you hours of going down the wrong path. However, especially when debugging, I’ve found that the best thing to have is a healthy skepticism for your own code.

14. IPython - Although not specific to Django, IPython is a huge improvement over the standard Python shell. With its tab complete feature, it saves me a few extra seconds of distraction whenever I can’t remember a function or module name.

15. PDB / iPDB - The Python debugger tool has been incredibly helpful to me when going through a new piece of code. It gives you the power to stop code at arbitrary points and inspect the variables at those points. iPDB gives you the power of PDB along with the features of IPython.

These fifteen resources really got me going with Django. And while I’m no Django pro just yet, I’m still amazed at how much knowledge I picked up in such a short span of time.

Mingwei Gu is a Developer at Yipit. @mwgu

How to Prevent Memory Bloat in Mongo

Going big with MongoDB

Feed Mongo!!

Several months ago at Yipit, we decided to cross the NoSQL Rubicon and port a large portion of our data storage from MySQL over to MongoDB.

One of the main drivers behind our move to Mongo was the composition of our data (namely, our recommendation engine system), which consists of loosely structured, denormalized objects best represented as JSON-style documents. Here’s an example of a typical recommendation object.

How Key Expansion Causes Memory Bloat

Because any given recommendation can have a number of arbitrary nested attributes, Mongo’s “schemaless” style is much preferred to the fixed schema approach imposed by a relational database. 

The downside here, though, is that this structure produces extreme data duplication.  Whereas a MySQL column is stored only once for a given table, an equivalent JSON attribute is repeated for each document in a collection.

Why Memory Management in Mongo is Crucial

When your data set is sufficiently small, this redundancy is usually acceptable; however, once you begin to scale up, it becomes less palatable. At Yipit, an average key size of 100 Bytes per document, spread over roughly 65 million documents, adds somewhere between 7GB-10GB of data (factoring in indexes) without providing much value at all.
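
As a back-of-the-envelope check on those numbers, the raw key overhead (before indexes) can be worked out directly from the two figures quoted above. The variable names are just for illustration.

```python
# Figures quoted in the paragraph above.
avg_key_bytes_per_doc = 100
document_count = 65 * 10**6

# Every document repeats its own keys, so the cost scales linearly.
key_overhead_bytes = avg_key_bytes_per_doc * document_count
key_overhead_gb = key_overhead_bytes / 1e9  # decimal gigabytes

print("%.1f GB of key names before indexes" % key_overhead_gb)
```

That is 6.5 GB for key names alone, which is consistent with the 7GB-10GB range once indexes are factored in.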

Mongo is so awesome, on good days, because it maps data files to memory.  Memory based reads and writes are super fast.  On the other hand, Mongo is absolutely not awesome once your working data set surpasses the memory capacity of the underlying box. At that point, painful page faults and locking issues ensue. Worse yet, if your indexes begin to grow too large to remain in memory, you are in trouble (seriously, don’t let that happen).

Quick Tips on Memory Management

You can get around this memory problem in a number of ways.  Here’s a non-exhaustive list of options:

  • Add higher memory machines or more shards if cash is not a major constraint (I would recommend the latter to minimize the scourge of write locks).

  • Actively utilize the “_id” key, instead of always storing the default ObjectID.

  • Use namespacing tricks for your collections. In other words, create separate collections for recommendations in different cities, rather than storing a city key within each collection document.

  • Embed documents rather than linking them implicitly in your code.

  • Store the right data types for your values (e.g., integers are more space efficient than strings).

  • Get creative about non-duplicative indexing on compound keys.
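
To put a rough number on the data type tip, here is a small sketch that compares the size of a value stored as a BSON string with the same value stored as a 32-bit integer. It relies on the BSON layout (a string value is a 4-byte length, the UTF-8 bytes, and a trailing NUL; an int32 value is 4 bytes) rather than on any driver, and the helper names are made up for this example. The per-element type byte and key name are the same in both cases, so they are left out.

```python
import struct


def bson_string_value_size(text):
    """Bytes for a BSON string value: int32 length + UTF-8 bytes + NUL."""
    return struct.calcsize("<i") + len(text.encode("utf-8")) + 1


def bson_int32_value_size():
    """Bytes for a BSON int32 value."""
    return struct.calcsize("<i")


# The same seven-digit identifier stored two ways.
as_string = bson_string_value_size("1234567")
as_int = bson_int32_value_size()
print("string: %d bytes, int32: %d bytes" % (as_string, as_int))
```

Multiply the difference by tens of millions of documents and it adds up quickly.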

The Key Compression Approach

After you’ve checked off those options, you may still wish to cut down on stored key length. The easiest path here probably involves creating a translation table in your filesystem that compresses keys on the way to Mongo from your code and then decompresses during the return trip. 
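
A minimal sketch of that translation table idea, assuming a hardcoded mapping kept in the module itself. The key names and the `compress`/`decompress` helpers are hypothetical, not taken from our codebase.

```python
# Hypothetical long-name -> short-name table; update it on schema changes.
KEY_MAP = {"recommendation_score": "rs", "city": "c", "deal_title": "dt"}
# Reverse table for the return trip. Short names must not collide with
# any unmapped key, or decompression becomes ambiguous.
REVERSE_KEY_MAP = dict((short, full) for full, short in KEY_MAP.items())


def translate_keys(value, mapping):
    """Return a copy of `value` with dict keys renamed via `mapping`,
    recursing into nested dicts and lists. Unmapped keys pass through."""
    if isinstance(value, dict):
        return dict(
            (mapping.get(key, key), translate_keys(inner, mapping))
            for key, inner in value.items()
        )
    if isinstance(value, list):
        return [translate_keys(item, mapping) for item in value]
    return value


def compress(document):
    """Shorten keys on the way to the database."""
    return translate_keys(document, KEY_MAP)


def decompress(document):
    """Restore readable keys on the way back."""
    return translate_keys(document, REVERSE_KEY_MAP)


original = {"city": "nyc", "recommendation_score": 0.9,
            "deals": [{"deal_title": "Half-off pizza"}]}
stored = compress(original)
restored = decompress(stored)
```

Here `stored` carries the short keys and `restored` equals `original` again, so application code never has to see the compressed names.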

For simplicity’s sake, a developer could hardcode the translations, updating the table on schema changes.  While that works, it would be nice if there were a Mongo ORM for Python that just handled it for us automatically.  It just so happens that MongoEngine is a useful, Django-style ORM on top of the PyMongo driver.  Sadly, it does not handle key compression.

Automatic Compression Tool

As a weekend project, I thought that it would be cool to add this functionality. Here’s an initial crack at it (warning: it may not be production ready).

The docstrings and inline comments are fairly extensive, but I should repeat a couple of main points:

  • This logic adds some overhead to the process of defining a class.  This happens only once, when the class is loaded, and quick benchmarking suggests that it’s not prohibitive.  That being said, I mention a couple of ways of improving the efficiency of this code: you could move it directly into the TopLevelDocumentMetaclass, or you could process the attrs before instantiating the class. Both would avoid the double work incurred here.

  • Embedded fields are not handled completely in this code.  The first time you set an embedded document, the underlying fields will be compressed.  However, if you change the nested fields subsequently but do not change the parent field, the nested fields will not be reset.  This means that you’ll have an uncompressed key for each nested field that you change.  You can get around this by dropping the mapped collection and recreating it (simple operation).  I plan to handle this logic in the code shortly.

  • Indexing in the meta attribute of the class should work as expected, though I would generally suggest that you set indexes administratively as a best practice.
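
For readers who haven't seen the class-definition-time approach before, here is a stripped-down sketch of the general idea in plain Python 3. It is not the MongoEngine code discussed in this post: `Field` and `CompressedKeysMeta` are hypothetical stand-ins, the mapping lives only on the class (a real version would persist it, as the mapping collection does, so keys stay stable when fields change), and apart from judge the field names are invented.

```python
class Field(object):
    """Hypothetical stand-in for an ORM field; db_field is the stored key."""

    def __init__(self):
        self.db_field = None


class CompressedKeysMeta(type):
    """Assign a short stored key to every Field declared on the class.

    This runs once, when the class statement is executed, by processing
    the attrs before the class object is created.
    """

    def __new__(mcs, name, bases, attrs):
        # Sort for a deterministic assignment within this sketch only;
        # index-based keys are NOT stable across schema changes.
        field_names = sorted(
            key for key, value in attrs.items() if isinstance(value, Field)
        )
        mapping = {}
        for index, field_name in enumerate(field_names):
            short_key = "f%d" % index
            attrs[field_name].db_field = short_key
            mapping[field_name] = short_key
        attrs["_key_mapping"] = mapping
        return super(CompressedKeysMeta, mcs).__new__(mcs, name, bases, attrs)


class TrialDocument(metaclass=CompressedKeysMeta):
    defendant = Field()
    judge = Field()
    verdict = Field()


print(TrialDocument._key_mapping)
```

The point of the sketch is only that the per-class cost is paid at import time, not on every read or write.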

The Final Mapped Output

Here is a working example of the code (you’ll need to add an abstract class to make this work).

When you define the TrialDocument class, this document will be created in a collection titled “trial_document_mapping”.

If you were to then remove the judge field from the TrialDocument and add a reporter field, you’d get the following:

If you were to then go into the shell, you could interact with MongoEngine like this:

Success! We’ve got compressed keys. Just one thing before we go. Beyond key space optimization, this is also a quick primer for smart value storage. Never use long string field values like this if you can help it (we can definitely help it here by using integers).

Next Steps Ahead

Hope that’s interesting and (even better) useful. I’ll try to update this post once I’ve worked out all the kinks with embedded objects and sped up the class instantiation process.

Ben Plesser is a Developer at Yipit.

@Bjpless

One of the Biggest Mistakes Django Developers Make When Using Lettuce

This post is the first in a series of posts about best practices when using Lettuce, a testing framework for Django.

When I first released Lettuce, a framework for writing automated tests in Django with user stories, I had no idea that it would become so widely used. It’s been truly amazing to see it expand from Brazil to the United States to China and many other countries. It’s even been translated into 15 languages.

However, over the last 6 months, I’ve observed a common usage that, for the reasons below, developers should avoid.

Steps from Step Definition

Like Cucumber, Lettuce supports calling other steps from a step definition. This can be a very handy functionality, but can easily become a source of code that is hard to maintain.

So why is this functionality available? Although Lettuce is a testing tool, step.behave_as was a patch that was incorporated into the codebase without complete test coverage. step.behave_as causes a step to call many others by parsing the text and calling them synchronously and sequentially.

Some people like to use this functionality in order to make their scenario look leaner, which is fine. The actual problem is that this workflow is sub-optimal, so I would advise using this functionality with caution.

An example of step.behave_as usage (please avoid doing the same)

As an example, let’s consider the following feature and its respective step definitions:

defined as:

So… it looks kinda nice, why is it bad?

  1. step.behave_as implementation has issues.

  2. if you have to pass parameters to the target steps, you will need to concatenate or interpolate strings, which will easily become a mess.

  3. if the string you pass as a parameter has typos, it’s a pain to debug.

  4. internally in Lettuce’s codebase, every single step is built from an object which is bound to the parent scenario, and metadata such as where it is defined. The current step.behave_as implementation doesn’t remount those aspects properly, leading to craziness when debugging.

  5. once you hardcode strings in your step definitions, your test’s codebase will get hard to scale to more developers, and thus, hard to maintain.
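
To illustrate points 2 and 3 without depending on Lettuce itself, here is a toy, string-dispatched step runner. It is a simplified stand-in and not Lettuce's internals: `toy_step`, `STEP_REGISTRY` and `run_sentence` are made-up names. What it shares with the real thing is that a step is found by matching a sentence against regexes at run time.

```python
import re

# (compiled regex, function) pairs, filled in by the decorator below.
STEP_REGISTRY = []


def toy_step(pattern):
    """Register a function under a regex, loosely like a step decorator."""
    def decorator(func):
        STEP_REGISTRY.append((re.compile(pattern), func))
        return func
    return decorator


def run_sentence(sentence):
    """Find the first step whose regex matches `sentence` and call it."""
    for regex, func in STEP_REGISTRY:
        match = regex.match(sentence)
        if match:
            return func(*match.groups())
    raise LookupError("no step definition matches: %r" % sentence)


@toy_step(r'I fill in "(.*)" with "(.*)"')
def fill_in(field, value):
    return (field, value)


# Passing a parameter means interpolating it into a sentence.
ok = run_sentence('I fill in "username" with "%s"' % "gabriel")

# A typo ("fil") is still valid Python, so nothing complains until
# this line actually runs, far away from the step definition.
typo_error = None
try:
    run_sentence('I fil in "username" with "gabriel"')
except LookupError as error:
    typo_error = str(error)
```

Calling `fill_in("username", "gabriel")` directly would have been shorter and harder to get wrong, which is the whole argument.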

This is how Lettuce works if you are not using step.behave_as:

Please note the two additional steps when you use it:

The solution: refactor generic step definitions into @world.absorb methods

Lettuce provides @world.absorb, a handy decorator, for storing useful and generic functions in a global test scope. The @world.absorb decorator literally absorbs the decorated function into the world helper and can be used right away in any other python files.

This decorator was created precisely to ease the refactoring of step definitions and terrain helpers by not requiring the developer to make too many imports from different paths, as well as to avoid making relative imports. Let’s see what the first example would look like when using @world.absorb.

The step definition def i_log_in_as now calls helpers that are available in the world helper.
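
If the absorb mechanism itself is unclear, the following self-contained sketch mimics it with a tiny stand-in class so that it runs without Lettuce installed. `World` here is an imitation written for this example, and the helpers `fill_field` and `submit` and the dictionary-based "form" are hypothetical; only the name i_log_in_as comes from the example above.

```python
class World(object):
    """Minimal imitation of a global test helper (not Lettuce's own)."""

    def absorb(self, func, name=None):
        # Attach the function to this object under its own name and
        # return it unchanged, so absorb also works as a decorator.
        setattr(self, name or func.__name__, func)
        return func


world = World()


@world.absorb
def fill_field(form, name, value):
    form[name] = value
    return form


@world.absorb
def submit(form):
    return dict(form, submitted=True)


def i_log_in_as(step, username, password):
    """Step body made of plain function calls, with no sentence strings."""
    form = {}
    world.fill_field(form, "username", username)
    world.fill_field(form, "password", password)
    world.result = world.submit(form)


i_log_in_as(None, "gabriel", "secret")
print(world.result)
```

Because the helpers are ordinary functions, a misspelled name fails immediately with an AttributeError that points at the offending line.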

Conclusion

You can easily notice that in the example above, @world.absorb allows for better maintainability and cleaner step definitions.

  1. Hardcoded strings would require manual updates whenever any related step definition has its regex changed.

  2. Step definitions that were multiple lines long now just pass the parameters through to single-line function calls.

  3. When the hardcoded string has typos, no syntax error will occur yet the test will fail with a misleading error message.

Gabriel Falcao is a developer at Yipit and the creator of Lettuce. You can follow him on twitter and github.