Top Chef: Cloud Edition

Lately at Pure Charity I’ve been using Chef to build our infrastructure on Amazon’s EC2. After using Chef for the last few weeks I have come to fully embrace it, and I will never again build a Linux box by hand. I know that sounds pretty dramatic, but if you build servers (in the cloud or on bare metal) you NEED to do yourself a favor and look at Chef. In the coming weeks I’m going to write a series of posts about Chef. This week I’m going to explain what Chef is and why it matters.

Chef is the Warden

Chef assists with configuration of your servers (I’m going to call these nodes) in your collection of servers (I’m going to call the collection a stack). It keeps them in line, kind of like a prison warden. I know what you might be thinking at this point: “I’ve already got scripts that I’ve written, I don’t need a tool to do this. Heck, I’ve even got my scripts checked into version control and shared with the rest of my team.”

Kudos if you are already taking the step to automate your own processes in scripts, and bravo if you are using version control to manage and share these scripts. Do your scripts query servers based on roles to get the list of servers they will be acting on? Maybe LDAP or something similar solves that problem for you, but will your scripts adapt to the software or even hardware specifics of the box? For example, will your scripts check the number of processors on your box and change the Nginx worker_processes setting to accommodate it? What if your database box runs Ubuntu, but your application servers run Red Hat? Will your scripts know the difference?

OHAI!

Chef gives your scripts access to environmental specifics through the Ohai system profiler. Ohai, besides being an awesome name, will give you NICs and transfer stats, users and groups, what version of Java/Ruby/etc. is installed, and memory information, among other details. This means that your scripts can make decisions based on facts about the node instead of hard-coded assumptions.
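To make that concrete, here is a minimal sketch in plain Ruby, not an actual Chef recipe. The hash stands in for the kind of data a profiler like Ohai collects; its key names, the helper methods, and the choice of the Apache package as the platform-dependent example are all assumptions made for illustration.

```ruby
# Hypothetical stand-in for profiled node data. The key names are
# illustrative assumptions, not Ohai's exact schema.
node = {
  "platform" => "ubuntu",
  "cpu"      => { "total" => 4 }
}

# Match the Nginx worker_processes setting to the processor count.
def worker_processes_for(node)
  node.fetch("cpu").fetch("total")
end

# Pick the Apache package name by platform: it is "apache2" on
# Debian-family systems and "httpd" on Red Hat-family systems.
def apache_package_for(node)
  case node.fetch("platform")
  when "ubuntu", "debian" then "apache2"
  when "redhat", "centos" then "httpd"
  else raise ArgumentError, "unsupported platform: #{node['platform']}"
  end
end

puts "worker_processes #{worker_processes_for(node)};"
puts "package: #{apache_package_for(node)}"
```

The point is that the script itself holds no facts about any particular box; everything it needs to decide comes from the node data passed in.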

Okay, so you can get information about the box that you’re deploying to and your scripts can account for that, but why does that require a server? The server is for your configuration management. What does that mean? On the Chef server you can set variables that pertain to boxes in a particular role, or maybe to an entire environment.

Think about that for a second. No more magic variables, random IP addresses, or having one script to run in prod and another to run in staging. Chef takes the configuration out of your scripts and centralizes it in a server. The biggest benefit to this is that each node can use the Chef server as the authority for its configuration.
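A rough sketch of that idea in plain Ruby, assuming a much simpler model than Chef's real attribute handling: one set of defaults, plus per-environment overrides merged on top. The attribute names, hostnames, and helper methods below are hypothetical.

```ruby
# Hypothetical attribute sets. With Chef these would live on the
# server rather than inside the script that consumes them.
DEFAULTS = {
  "app" => { "port" => 80, "db_host" => "localhost" }
}

ENVIRONMENTS = {
  "staging"    => { "app" => { "db_host" => "db.staging.example.com" } },
  "production" => { "app" => { "db_host" => "db.prod.example.com" } }
}

# Recursively merge overrides on top of base; on a conflict the
# override wins unless both values are hashes, which are merged.
def deep_merge(base, overrides)
  base.merge(overrides) do |_key, old_value, new_value|
    if old_value.is_a?(Hash) && new_value.is_a?(Hash)
      deep_merge(old_value, new_value)
    else
      new_value
    end
  end
end

# The same script asks for its settings by environment name
# instead of hard-coding them.
def attributes_for(environment)
  deep_merge(DEFAULTS, ENVIRONMENTS.fetch(environment))
end

puts attributes_for("staging")["app"]["db_host"]
puts attributes_for("production")["app"]["port"]
```

One script, two environments: the only thing that differs between staging and production is the data it is handed.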

So what is the benefit to using Chef over some other technology? The greatest difference is that you get everything under one roof: configuration management, automated building of servers from a base image, and a searchable database of all of your nodes. Plus, recipes can be easily shared because they (should) have no information about your configuration in them. Opscode has its community cookbooks, as does 37signals.
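The searchable node database is easiest to picture with a toy example. The sketch below is plain Ruby over an in-memory list, not Chef's search API; the node records, field names, and the nodes_with helper are hypothetical.

```ruby
# Hypothetical node records, standing in for what a central
# server would know about each box in the stack.
NODES = [
  { "name" => "app1", "roles" => ["app"],      "environment" => "production" },
  { "name" => "app2", "roles" => ["app"],      "environment" => "staging" },
  { "name" => "db1",  "roles" => ["database"], "environment" => "production" }
]

# Return the nodes that carry a given role in a given environment.
def nodes_with(role:, environment:)
  NODES.select do |node|
    node["roles"].include?(role) && node["environment"] == environment
  end
end

names = nodes_with(role: "app", environment: "production").map { |n| n["name"] }
puts names.inspect
```

A script that asks "which boxes are production app servers?" this way never needs a hand-maintained list of IP addresses.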

In my next post I will examine a cookbook for an application using the deploy resource.

Comments

You guys using chef solo, server or hosted? Moving off the Windows platform for a side project and am anxious to try out chef for building our environments. How big is each stack you're building? e.g. how many nodes and how many components on each node? How long does it take to build each stack in each environment? e.g. if you had to spin up a new environment, assuming you already had IPs, how long from IP to application? Are you also using chef to deploy your actual application bits? Looking forward to future posts on chef! Thanks, Zach
Zach — November 19, 2011 11:33:42 AM
@Zach We're using server because we've got 10 EC2 nodes to manage now (2 environments with 5 nodes each), with more to come as we scale out. Server is great for EC2 deployments because you can set up environments with a separate set of attributes specific to that environment. For small 1-2 node deployments chef solo will be sufficient. If you don't want to mess around with building your own chef server then the hosted platform is great. Since on most nodes we are compiling Ruby from source it takes longer than it would if we had an image with the latest Ruby or if we used the Ubuntu default 1.8.7. With that it takes 15-20 minutes for a full install on each box that requires it. Just bootstrapping a box with chef takes about 3 minutes or less. We do use Chef to deploy as well. I've always used Capistrano in the past, but with chef managing which nodes have what on them it seemed ideal to integrate with Chef's deploy resource. We use the deploy_revision provider. Instead of deploying to a folder named by datetime, it deploys to a folder named after the Git commit SHA1. With knife you can use the knife ssh sub-command to ssh into all the nodes that have your code on them and run chef-client, which updates the code and any other dependencies that you have on the node. Really neat stuff. Great comment! Thanks for posting.
Jesse Dearing — November 19, 2011 12:20:16 PM
I want to know more about this!
Chuck — January 18, 2012 11:52:16 PM