How I Taught Myself to Code in 8 Weeks

To a lot of non-developers, learning to code seems like an impossibly daunting task. However, thanks to a number of great resources that have recently been put online for free, teaching yourself to code has never been easier.

I started learning to code earlier this year and can say from experience that learning enough to build your own prototype is not as hard as it seems. In fact, if you want to have a functioning prototype within two months without taking a day off work, it’s completely doable.

Below, I’ve outlined a simple path from knowing nothing about software development to having a working prototype in eight weekends that roughly mirrors the steps I took.

Introduce yourself to the web stack (10 minutes):

The presence of unfamiliar terminology makes any subject seem more confusing than it actually is. Yipit founder/CEO Vin Vacanti has a great overview of some of the key terms you’ll want to be familiar with in language you’ll understand.

Get an introductory grasp of Python and general programming techniques (1 weekend):

  • Learn Python the Hard Way. Despite the title, the straightforward format makes learning basic concepts really easy and most lessons take less than 10 minutes. However, I found that the format didn’t work as well for some of the more advanced topics so I’d recommend stopping after lesson 42 and moving on.

  • Google’s Python class. Read the notes and/or watch the videos and do all of the associated exercises until you get them right, without looking at the answers. Struggling through the exercises I kept getting wrong was the best learning experience; I would have learned far less if I had just looked at the answers and tried to convince myself that I understood the concepts.

These two resources are somewhat substitutable and somewhat complementary. I recommend doing the first few lessons from both to see which you like better. Once you’ve finished one, skim through the other looking for concepts you aren’t fully comfortable with as a way to get some extra practice.

Get an introductory understanding of Django (1 weekend):

  • Work through the Django tutorial.

  • Delete all your code.

  • Work through the tutorial again, from scratch.

The first time I went through the tutorial, I inevitably ended up just following the instructions step by step without really understanding what each step did, since everything felt so new.

The second time through, I wasn’t as distracted by the newness of the concepts and was better able to focus on understanding how all the parts work together.
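If it helps to see the moving pieces at a glance, here is a compressed, hypothetical sketch in the spirit of the tutorial’s polls app. The tutorial spreads these across models.py, views.py, and urls.py; the names below are illustrative, not the tutorial’s exact code:

# models.py - the data layer
from django.db import models

class Poll(models.Model):
    question = models.CharField(max_length=200)

# views.py - the application logic
from django.http import HttpResponse

def index(request):
    latest = Poll.objects.order_by('-id')[:5]
    return HttpResponse(", ".join(p.question for p in latest))

# urls.py - maps a URL to the view (Django 1.4-era syntax)
from django.conf.urls import patterns, url

urlpatterns = patterns('',
    url(r'^$', 'polls.views.index'),
)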

Get a deeper understanding of Python / general programming concepts (2-4 weekends):

Again, I would sample each and see which you like the best. I ended up doing both but that was probably overkill.

Practice building simple web applications (1 weekend):

  • Work through a few of the exercises in Django by Example. These exercises don’t hold your hand quite as much as the Django tutorial, but they still provide a fair bit of guidance, so I found them to be a nice way to start taking the training wheels off.

Build your prototype (1 weekend):

That’s it. Eight weekends (or less) and you’ve gone from zero to a functioning prototype. Not so daunting after all, is it?

Next Steps:

It goes without saying that there is a huge difference between the relatively cursory amount of knowledge needed to build a simple prototype (the focus of this post) and the depth of knowledge and experience needed to be a truly qualified software engineer.

If you want to learn all that it takes to build modern web applications at scale, getting professional web development experience at a fast-growing startup like Yipit is a great next step.

If you’re smart, hard-working, and passionate about creating amazing consumer web experiences, drop us a line at jobs@yipit.com - we’re always looking for great people to join our team.

David Sinsky is a noob developer at Yipit

P.S. Below are a few other potentially useful resources - please leave a note in the comments if you have suggestions for others:

Parsible: Straightforward Log Parsing

The Graduate

Scratching the Itch

It’s something that everyone needs to do, a coming of age experience for a tech company. Yes, that’s right, I am talking about log parsing. And, much like other coming of age stories, everyone does it a little bit differently. There are so many decisions to make:

Is this just a one time thing or a long term commitment?

What happens if we change our mind about what we want?

Who’s going to be dealing with this mess?

But enough with the euphemisms.

What we wanted in our Log Parser … and Why:

  • Written in Python: We are a Python shop, and it makes things easy when our technical knowledge is readily applicable to our tools. For us, Python made sense because there are more than ten serious Pythonistas in the office. If you work at a Ruby shop, favor Ruby where it makes sense.

  • Batch Mode and Continuous: We want to be able to use our parser to continually feed our graphs as new data comes in - we don’t like waiting. This doesn’t mean we don’t sit down with our old logs from time to time and peek under the covers. We want to be able to use the same tool for static analysis as we do for real time tasks. We reduce overhead if we can use one tool for both.

  • Simple to Use: One of our core beliefs about metrics and graphs is that the people who care about them need to be the ones watching them. Sysadmins staring at Clicks/Minute can be helpful, but it’s much better when Analytics has them on their screen.

By extension, we want Analytics defining what they want charted and recorded. If you simplify the interactions with the data, you lower the bar for them to set it up themselves. This is solved by a simple plugin-based architecture that abstracts away the complexity from the end user so they can focus on getting value from the data.

  • Low Impact: If we want to parse the data as it comes in, it means we will be on production machines. This means no hogging resources from the user facing components.

  • Free: Sorry Splunk, you’re awesome but a bit pricey for Startups.

  • Can be described by lots of Buzzwords: Self-Explanatory

Yet Another Log Parser?

Alice: I wrote a log parser! Isn’t it awesome?!

Bob: But there are already tools that do this!

Alice: Really?

Bob: Well, mostly…

There are more tools to do these sorts of things than you can shake a stick at, and some of them even fit the requirements! The better ones (for us) that we have seen:


  • Ganglia Logtailer: A framework focused on ripping your data out and pushing it to Ganglia. It supports both batch and daemon modes, and it is written in Python! However, looking at the source doesn’t inspire confidence that non-developers will have an easy go of it.

  • Logster: A nice fork of Ganglia Logtailer that is tailored toward cron-style execution with some additional outputs. More focused on simplicity, it echoes many of Etsy’s other high-quality releases. Unfortunately, it is cron-only, which rules out the continuous parsing we need.

The Best Parser

  • Splunk: An awesome program for doing anything with logs of any sort. It can monitor, alert, search, graph and correlate. Super easy to use too! However, the free version is mostly a toy and the enterprise version is not what most would call cheap:

Splunk Enterprise pricing in North America starts at U.S. $6,000 for a 500 megabyte-per-day perpetual license, including first year support, or U.S. $2,000 per year for a term license including support.

A pretty sobering price, since most companies will be generating and searching many gigabytes of logs per day. The cost alone will make people hesitate to point it at new data, limiting its effectiveness. Too rich for our blood.


We came to the decision that we should roll our own. After formulating the requirements and spending far too much time with grep, we built Parsible. The code is open source (obviously), and a quick fork will have you parsing your own logs in no time.

How does it work?

When you want to dig into a log with Parsible, you need three things:

1. A Parser: This will be fed single lines from the log file and will return data in a format of your choosing. We like to build dictionaries by using regex.

Parsing functions need to live in the parsers directory, and their names should start with parse_; Parsible will find them from there. (A sketch of how this kind of discovery can work appears after this list.)

2. A Processor: This is what will act on your parsed data. This is where most of the useful filtering is going to be done and where the data will be sent on its way. It can be as simple as checking the status code of an HTTP request or getting the time to serve a request to the user.

Much like the parsers, processors live in the processors directory and methods need to start with process_. Parsible will automatically discover, import, and run these for each line of your file, feeding them with the data returned from your Parser.

3. An Output: Now that you have your data you need to get it somewhere useful. Your outputs should be reusable so that they can be called from many different processors.

There are no strict naming or placement conventions, but it is simplest if you start them with output_ and put them in the outputs directory.
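As promised above, here is a sketch of how that prefix-based discovery can work. This is an illustration of the general approach under assumed paths, not Parsible’s actual implementation:

# A sketch of prefix-based plugin discovery (illustrative only,
# not Parsible's actual code).
import os
import importlib

def discover_plugins(subpackage, prefix):
    plugins = []
    directory = os.path.join('plugins', subpackage)
    for filename in os.listdir(directory):
        if not filename.endswith('.py') or filename.startswith('_'):
            continue
        module = importlib.import_module(
            'plugins.%s.%s' % (subpackage, filename[:-3]))
        # Collect every function whose name starts with the prefix
        for attr in dir(module):
            if attr.startswith(prefix):
                plugins.append(getattr(module, attr))
    return plugins

# parsers = discover_plugins('parsers', 'parse_')
# processors = discover_plugins('processors', 'process_')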

Why is this better?

Setting up the framework in this way makes it easy for one person to compile a useful set of parsers, outputs and helper functions that can be used by anyone who wants to write a custom processor. This allows a layer of indirection between writing the parser (a developer centric task), the processors (developers, analytics or business units), and the outputs (more developer work).

Our workflow gets much simpler with this setup, and extracting new data becomes a simple process:

User:

  1. Clone our internal repo and pick the branch for your desired logs
  2. Write a new processor
  3. Hook up to a ready-made output
  4. Push to the repo

Behind the Scenes:

  1. Chef updates the Git repo on the machine
  2. Parsible is restarted (it runs under supervisord; a sample config sketch follows this list)
  3. StatsD automatically adapts to the new data
  4. Graphite + Graphiti start producing awesome graphs for immediate consumption.
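For reference, the supervisord stanza for a setup like this can be tiny. The command, paths, and flags below are illustrative assumptions - check the Parsible README for the real invocation:

[program:parsible]
; command, paths, and flags here are illustrative assumptions
command=python /opt/parsible/parsible.py --log-file /var/log/nginx/access.log
autorestart=true
stdout_logfile=/var/log/supervisor/parsible.log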

Ready for some examples?

Parsers

A parser for Nginx timed_combined logs, modified slightly from what we use in production (we have some custom fields).

import re

# Compile the pattern once at import time instead of on every call
NGINX_REGEX = re.compile(r"(?P<ip_address>\S*)\s-\s(?P<requesting_user>\S*)\s\[(?P<timestamp>.*?)\]\s\s\"(?P<method>\S*)\s*(?P<request>\S*)\s*(HTTP\/)*(?P<http_version>.*?)\"\s(?P<response_code>\d{3})\s(?P<size>\S*)\s\"(?P<referrer>[^\"]*)\"\s\"(?P<client>[^\"]*)\"\s(?P<service_time>\S*)\s(?P<application_time>\S*)\s(?P<pipe>\S*)")

def parse_nginx(line):
    r = NGINX_REGEX.search(line)
    result_set = {}
    if r:
        for k, v in r.groupdict().iteritems():
            # Normalize our data: drop empty and placeholder fields
            if v is None or v == "-":
                continue
            # Make the fields a bit more useful by splitting the
            # request into its path and query string
            if k == "request":
                path, _, query = v.partition("?")
                result_set["path"] = path
                if query:
                    result_set["query"] = query
                continue
            result_set[k] = v
    # This becomes what is passed to other functions as 'line'
    return result_set
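A quick sanity check with a made-up log line (note the two spaces after the timestamp bracket - the pattern expects them):

line = ('1.2.3.4 - - [21/May/2012:12:34:56 -0400]  '
        '"GET /api/deals?city=nyc HTTP/1.1" 200 1234 "-" '
        '"Mozilla/5.0" 0.052 0.048 .')
result = parse_nginx(line)
# result["path"] == "/api/deals", result["query"] == "city=nyc",
# result["response_code"] == "200"; the "-" fields are dropped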

Processors

Here we use the output of the parser to generate a count of how many API hits we get, in this admittedly contrived example. In reality we run 10-20 processors against each line, with a bevy of helper functions to do things like pick out user agents and bots or filter down referrers.

from plugins.outputs.statsd import output_statsd_count

def process_api(line):
    # 'line' is the dictionary that the parser returned
    if line.get('path', '').startswith('/api/'):
        output_statsd_count('api')
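For a slightly less contrived taste, here is what one of those User-Agent helpers might look like. This is hypothetical code in the same spirit, not what we actually run:

from plugins.outputs.statsd import output_statsd_count

# Hypothetical list of crawler signatures to look for
BOT_SIGNATURES = ("Googlebot", "bingbot", "Baiduspider")

def is_bot(line):
    # The parser stores the User-Agent string under 'client'
    return any(bot in line.get('client', '') for bot in BOT_SIGNATURES)

def process_bots(line):
    if is_bot(line):
        output_statsd_count('bots')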

Outputs

Check out the simple StatsD output that is called from the processor above. Any output can easily be imported into a processor in the standard fashion. When running with --batch-mode True for some ad-hoc analysis, your output can be as simple as printing a line.

# A sample output function that pushes data to StatsD
from socket import socket, AF_INET, SOCK_DGRAM

hostname = "127.0.0.1"
port = 8125
udp = socket(AF_INET, SOCK_DGRAM)

# UDP is fire-and-forget, so sending stays non-blocking
# and requests don't pile up
def _send_statsd(data):
    udp.sendto(data, (hostname, port))

def output_statsd_count(stat, count=1):
    # StatsD counter format: "<bucket>:<value>|c"
    data = "{0}:{1}|c".format(stat, count)
    _send_statsd(data)

Not too impressive, as StatsD already makes it really easy to send data over. It gets more interesting when used to illustrate how simple it is to write new processors. Right now we send a lot of data over to StatsD, but we also have hookups for Redis and our internal alerting software.
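And for completeness, the batch-mode case mentioned earlier really can be a one-liner:

# About as simple as an output gets - handy for ad-hoc batch runs
def output_print(line):
    print(line)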

Obviously these are trivial examples, but head on over to the Parsible page on GitHub for more practical examples, including the Nginx parser, StatsD outputs, and a few custom processors that filter out requests by User Agent and OS.

Happy Parsing!

Andrew Gross is a Developer at Yipit

TL;DR We made a pluggable parsing engine called Parsible

Pitfalls When Upgrading to Django 1.4

The Sting

The Players

The Yipit Django team just finished upgrading to the six-week-old Django 1.4 and figured we would share the process we went through and some gotchas we found.

The Set-Up

Our goal in this process was not to start utilizing new Django 1.4 features, although there are quite a few. Our goal was to get our system running and stable on Django 1.4. Over the next few weeks we’ll start utilizing some of the new features (QuerySet.prefetch_related, ModelAdmin.save_related(), and queryset bulk_create). This post focuses on what is needed to get running on 1.4 - even though we’re dying to use some of the new features.
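For the curious, here is the flavor of two of those features, sketched against hypothetical Deal and Tag models (assume Deal has a many-to-many field named tags):

# Two queries total (deals, then all related tags),
# instead of one tag query per deal
deals = Deal.objects.all().prefetch_related('tags')

# One INSERT for the whole batch, instead of one per object
Tag.objects.bulk_create([Tag(name='2-for-1'), Tag(name='brunch')])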

The Hook

Upgrading a critical piece of your stack is something that should never be done with haste.

About a month ago, half of our developers upgraded their local setups to run 1.4, and we read through the release notes to find which areas of our codebase needed changing. (As an aside, the Django community does an excellent job with the release notes, and we highly recommend looking through them.)

A few weeks later, we upgraded our continuous integration and staging servers. Naturally, we wanted to run 1.4 on hardware that mimics our production environment and found a few more issues.

The Tale

Here are the issues we ran into that may apply to others:

  • Django admin static media moved locations from django/contrib/admin/media/ to django/contrib/admin/static/admin/. If you rely on this location at all, you’ll need to change it.
  • Serialization changes to datetime and time. Django serializers have changed the output they return for datetimes and times to potentially include timezones, milliseconds, or a different format. If you have any sort of API that relies on the built-in JSON or XML serializers (note that Django REST framework does), you will need to take action to prevent your API users from experiencing potential issues.
  • The MySQL GIS backend used to have a bug that returned 1/0 for booleans instead of True/False (our @zmsmith reported it!). The bug got fixed, so it could mean another potential change to any APIs you expose.
  • django.contrib.sitemaps bug fix with potential performance implications. Requests to sitemaps used to return cached responses by default; they no longer do. All of our sitemaps already utilized caching, but it’s always good to double-check.
  • The content transfer encoding for emails changed from quoted-printable to 8bit with this commit. This caused some issues in our emails that we thankfully caught during testing.
  • Various modules/methods moved (django.conf.urls.defaults, django.contrib.auth.models.check_password, etc.). The large majority of code that was moved around can still be imported from the old location, but it’s in your interest to upgrade the import paths now so that you don’t need to worry about it when the old import paths become deprecated in version 1.5 - see the snippet below.
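For that last point, the fix is usually a one-line import change. The two examples named above, old location first:

# URLconf helpers moved out of 'defaults'
from django.conf.urls.defaults import patterns, url  # pre-1.4 location
from django.conf.urls import patterns, url           # 1.4 location

# check_password moved into the new hashers module
from django.contrib.auth.models import check_password   # pre-1.4 location
from django.contrib.auth.hashers import check_password  # 1.4 location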

The Wire and The Shut-Out

After doing a full walk-through on our staging systems, we made the jump to production. Naturally, a few small bugs revealed themselves once our users stressed some edge cases. With our alerting and monitoring systems, we were able to fix the few bugs we found within a few minutes of our rollout song stopping.

The Sting

This was our experience upgrading to 1.4. I hope you can now avoid running into some of these issues. If you have run into your own issues, please share them in the comments below.

Steve Pulec is a Developer at Yipit.

15 Key Resources to Learn Django

Learning Python and Django is easy if you have the right resources at your fingertips.

Coming from pretty much only having studied programming in school, I started working as a developer at Yipit with almost no web programming experience. In a little less than ten weeks, I’ve become comfortable navigating and making large changes to our whole Python/Django application codebase, as well as contributing new features (interested in learning Django at Yipit? Join us!).

While the learning process wasn’t gut-wrenching, I certainly ran into some hefty roadblocks that would have been almost insurmountable had coworkers not pointed me to the resources below:

Comparing Python to Other Languages

If you already know another programming language, you can leverage that knowledge by finding out how Python differs from it and then focusing on learning those differences.

1. Python & Java - A Side by Side Comparison I knew enough Java already from college, and doing some quick reading online saved me a lot of time. I would highly recommend reading this article if you know Java already. It helped me form a mental road-map of what areas to focus on.

2. Also, the Python wiki has a good collection of comparisons of Python to other programming languages.

Python Style

Learning Python syntax is one thing; learning “Pythonic” style is quite another. Regardless of how you end up writing your own code, it’s at least a good idea to understand what is idiomatic - it can save you time when reading other people’s code. (A small example follows these links.)

3. Code Like a Pythonista by David Goodger – This article gives a good rundown of how to write readable, stylistic Python code that takes advantage of the dynamic features of the language.

4. Google Python Style Guide – Some good standard practices.
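To give a quick taste of the advice in these guides, here is the classic contrast between transliterated-Java loops and the Python idiom:

items = ['spam', 'eggs', 'ham']

# Transliterated from Java: works, but unidiomatic
for i in range(len(items)):
    print(items[i])

# Pythonic: iterate directly
for item in items:
    print(item)

# Need the index too? Use enumerate
for i, item in enumerate(items):
    print(i, item)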

Advanced Python Features

Certain programming concepts in Python, such as closures and metaprogramming, just don’t exist in Java, and they were a bit tough for me to pick up. The following articles gave me a good deal of insight into these concepts.

5. What Is a Metaclass in Python - A detailed explanation of a pretty tough topic in Python. Django uses metaclasses quite a bit, and if you get a solid grasp on them, you’ll save yourself a lot of time going through models and forms. (A toy sketch follows these links.)

6. Hidden Features of Python - A great compilation of answers on Stack Overflow.

7. A Few of My Favorite Python Things - An opinionated walk-through of some of the cool features of Python.
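If you want a tiny preview of item 5 before diving in, here is a toy metaclass that registers its subclasses, echoing what Django’s ModelBase does for models. This is an illustration of the mechanism only, not Django’s actual code:

class RegisteringMeta(type):
    registry = []

    def __new__(mcs, name, bases, attrs):
        cls = super(RegisteringMeta, mcs).__new__(mcs, name, bases, attrs)
        if bases != (object,):  # don't register the base class itself
            RegisteringMeta.registry.append(cls)
        return cls

class Base(object):
    __metaclass__ = RegisteringMeta  # Python 2 syntax, matching the era

class Deal(Base):
    pass

print(RegisteringMeta.registry)  # [<class '__main__.Deal'>]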

Python/Django Tutorials

For a more complete introduction, the following tutorials will bring you up to speed and are a good review if you know some programming already. The important thing is to give the exercises a try.

8. Google’s Python Class - A colleague of mine who also recently started programming in Python recommends this resource, with its great mix of video lectures, notes, and practical examples, as a quick way to jump into Python programming.

9. Writing your first Django app on djangoproject.com - This canonical Django tutorial lets you get your hands dirty setting up a Django project.

Best of Django Documentation

While reading documentation isn’t the most glamorous endeavor in the world, a few sections of Django’s documentation are worth reading in their own right.

10. Managers / Querysets - These sections give a good deal of insight into how Django interacts with relational databases, which can seem quite magical. Just always be mindful of the actual queries you’re sending out, or they’ll come back to bite you hard. (A concrete example follows these sections.)

11. Request-Response / User - If you’ve never worked with an HTTP request, these sections are a very helpful introduction. Also, the notion of a User, along with all the authentication that goes with it, comes free with Django. Lucky you.

12. Views / Templates - A good introduction to how the application-level code connects to the client-side HTML/JS/CSS code.
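To make the warning in item 10 concrete: QuerySets are lazy, and naive access to related objects can quietly multiply your queries. A sketch, assuming a hypothetical Deal model with a foreign key to Business:

deals = Deal.objects.filter(active=True)  # no SQL has run yet
for deal in deals:                        # one query, on first iteration
    print(deal.business.name)             # plus one query per deal (N+1!)

# Fetch the related rows up front instead
for deal in Deal.objects.filter(active=True).select_related('business'):
    print(deal.business.name)             # no extra queries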

Reading Code from Successful Django Sites

When everything is said and done, there’s no great substitute for reading code. I read through most of Yipit’s codebase to get a sense of how Django is used in a production environment. You might not have access to such a codebase; however, there is a plethora of Django-powered sites that have open-sourced their code.

13. Django Sites - A great resource for access to dozens of sites that have their source open for you to browse through.

Python Tools

Having the right tools at your disposal can make your life a lot easier by saving you hours of going down the wrong path. However, especially when debugging, I’ve found that the best thing to have is a healthy skepticism for your own code.

14. IPython - Although not specific to Django, IPython is a huge improvement over the standard Python shell. With its tab completion, it saves me a few extra seconds of distraction whenever I can’t remember a function or module name.

15. PDB / iPDB - The Python debugger has been incredibly helpful to me when going through a new piece of code. It gives you the power to stop execution at arbitrary points and inspect the variables there. iPDB gives you the power of PDB along with the features of IPython.
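Getting started is one line, dropped wherever you want execution to pause:

# Pause here and open an interactive debugger
import pdb; pdb.set_trace()

# Or, with ipdb installed, the same idea with IPython niceties
import ipdb; ipdb.set_trace()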

These fifteen resources really got me going with Django. And while I’m no Django pro just yet, I’m still amazed at how much I picked up in such a short span of time.

Mingwei Gu is a Developer at Yipit. @mwgu