Disclaimer: The opinions expressed by the guest blogger and those providing comments are theirs alone and do not reflect the opinions of the Sunlight Foundation or any employee thereof. Sunlight Foundation is not responsible for the accuracy of any of the information within the guest blog.
Earlier, our labs team introduced Python-Sunlight, an open source project that will unify all Sunlight APIs and make them easier to use. Today, Eric Davis, a lead developer at the Nevada Policy Research Institute, a think tank located in Las Vegas, has taken on using Python for government transparency and is here to share the steps with us. Eric specializes in using technology to make government more open and has developed and currently maintains TransparentNevada, Nevada Journal and TweetNevada.
Note: The code in this post was written for Mac/Linux. It'll run fine on Windows, but you'll need to make some adjustments around path names when using virtualenv and when placing your API key in your home directory.
The past few years have seen an explosion in the amount of publicly available government data. From White House visitor logs to House expenditures to electronic campaign finance data, there is an unprecedented amount of government data available. This information shines a light on how our government operates, while also requiring bigger and more powerful programs to help make sense of it all.
Compared to the average computer user, it’s likely that most transparency activists already possess an above-average level of computer literacy. We work with huge spreadsheets and massive databases on a daily basis. Yet as important and useful as these programs are, in each case we’re forced to work within the constraints of the program itself.
So what happens if you need to do something that can’t be accomplished with the programs you already use?
You create your own.
For the rest of this article, I’m going to focus on the Python programming language and why it is a good “tool for transparency.” Two features, in particular, make Python an excellent programming language: solid documentation and extensive libraries. When you’re first starting out, having access to well-written documentation can be the difference between “getting it” and “getting lost.” In addition, an extensive number of libraries — pre-written code to handle common tasks — are easily available, which helps you focus on the task at hand rather than re-inventing the wheel.
To help introduce Python, we’re going to write a mini-program that collects the names and Twitter accounts for all the members of Congress.
If you’re a complete beginner to programming, I recommend you read at least the first few chapters of Zed Shaw’s “Learn Python the Hard Way” before continuing. It’s a free, online book that will teach you the basics of running simple programs. At a bare minimum, you should be able to complete exercises 0 and 1. You should be comfortable with editing text in a text editor and running programs from the command line before moving on.
For this program, we’ll be using “Sunlight Labs Services,” a service provided by the Sunlight Foundation that enables programmers to access government data easily and efficiently. The first thing you’ll need to do is register for a key. Click that link and enter your name and email along with the place you work. For “intended usage” put something like “Grab Twitter accounts for members of Congress.” Once you receive your key via e-mail, open your text editor and copy it into a file called ‘.sunlight.key’ (note the leading period) in your home directory. With this key, you’ll be able to access all of the Sunlight Labs Services.
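As a quick check that the key file is where it should be, the sketch below reads a key from a file in the home directory. The helper name read_api_key is hypothetical, and this is not how the sunlight library itself loads the key; it only illustrates locating the file in a way that also works on Windows.

```python
import os


def read_api_key(filename=".sunlight.key", home=None):
    """Return the key stored in `filename` under the home directory.

    `read_api_key` is a hypothetical helper used for illustration only.
    """
    # expanduser("~") resolves the home directory on Mac, Linux, and Windows.
    if home is None:
        home = os.path.expanduser("~")
    path = os.path.join(home, filename)
    with open(path) as key_file:
        # Strip the trailing newline that most text editors add.
        return key_file.read().strip()


if __name__ == "__main__":
    try:
        print("Found a key of length", len(read_api_key()))
    except FileNotFoundError:
        print("No .sunlight.key file found in your home directory")
```

If the second message appears, check that the file name starts with a period and that the file was saved in your home directory rather than in a project folder.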
Next, download virtualenv.py into your home directory. Now open your terminal and type:
python virtualenv.py learn-to-program
After that, change into the ‘learn-to-program’ directory. Now, we’re going to install two libraries. Type:
./bin/pip install sunlight tablib
That’s it for the setup; now comes the fun stuff.
Open up your text editor and type in the following:
Save this file as twitter_accounts.py inside the ‘learn-to-program’ directory that was created earlier.
There are five “parts” of this program, each separated by an empty line:
Remember those libraries we installed with pip? The first part imports them so they can be used.
Now that we’ve imported the sunlight library, here’s how we’re going to use it. This creates a variable, lawmakers, that holds information on each lawmaker currently in Congress.
Just like we made use of the sunlight library above, now we’re going to use tablib (short for “tabular library”) here. This creates another variable, names_and_twitter, that will hold lawmaker names in one column and Twitter accounts in another. We also tell it that the data will have the headers “name” and “twitter.”
Line 7 goes through each lawmaker in lawmakers. Lines 8-10 are run for each lawmaker in the lawmakers variable. First, it sets the name variable by combining the lawmaker’s first and last names with a space. Next, it stores the lawmaker’s twitter_id. Finally, it adds both values to the names_and_twitter dataset for use in the next step.
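The loop described above can be tried out without the API. In this sketch the sample records are made up, the firstname and lastname keys are assumptions about how a lawmaker record is laid out (only twitter_id appears in the text), and a plain list of rows stands in for the tablib dataset.

```python
# Made-up sample records standing in for the `lawmakers` variable.
# The 'firstname' and 'lastname' keys are assumed, not confirmed by the API.
lawmakers = [
    {"firstname": "Jane", "lastname": "Doe", "twitter_id": "janedoe"},
    {"firstname": "John", "lastname": "Roe", "twitter_id": ""},
]

# A plain list of rows stands in for the tablib dataset here.
headers = ["name", "twitter"]
names_and_twitter = []

for lawmaker in lawmakers:
    # Combine the first and last names with a space.
    name = lawmaker["firstname"] + " " + lawmaker["lastname"]
    twitter = lawmaker["twitter_id"]
    # Add one row per lawmaker, in the same order as the headers.
    names_and_twitter.append([name, twitter])

print(headers)
for row in names_and_twitter:
    print(row)
```

Each row keeps the same column order as the headers, which is what lets a spreadsheet line the values up under “name” and “twitter.”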
This creates a file, ‘twitter.xls’, and tells Python we’re going to be writing binary data to it. The next line writes the data from the names_and_twitter variable as an Excel spreadsheet to the file. Finally, we close the file to tell Python we’re done with it.
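For readers curious about the open, write, close pattern, here is a minimal sketch of it next to the equivalent with block, which closes the file automatically. The bytes and the file name example.bin are placeholders chosen for illustration, and the file goes into a temporary directory so nothing in your project is overwritten.

```python
import os
import tempfile

# Placeholder bytes; in the article's program this would be spreadsheet data.
data = b"name,twitter\r\nJane Doe,janedoe\r\n"

# Write into a temporary directory so the sketch does not touch real files.
path = os.path.join(tempfile.mkdtemp(), "example.bin")

# Explicit style: open in binary write mode ("wb"), write, then close.
f = open(path, "wb")
f.write(data)
f.close()

# Equivalent style: the with block closes the file when the block ends.
with open(path, "wb") as f:
    f.write(data)

print(os.path.getsize(path), "bytes written")
```

Both styles produce the same file; the with form is simply harder to get wrong, because the file is closed even if the write fails partway through.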
Now, back in your terminal and from inside the ‘learn-to-program’ directory, run the command ./bin/python twitter_accounts.py, which tells Python to run the code you just entered.
Assuming everything was typed correctly, you’ll now have a ‘twitter.xls’ file next to your ‘twitter_accounts.py’ file. Open ‘twitter.xls’ with Excel or OpenOffice and you’ll see the full name in column A and that lawmaker’s Twitter account in column B.
Congratulations: You just created your first program!
If you don’t have Excel or OpenOffice or want to generate a CSV (comma-separated values) file instead of an Excel spreadsheet, replace names_and_twitter.xls with names_and_twitter.csv in line 13 and change ‘twitter.xls’ to ‘twitter.csv’ in line 12. Whereas Excel files have to be opened with special programs, one nice thing about CSV files is that they are plain text and can be used to copy data to various other systems, like databases, quite easily.
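To see what “plain text” means in practice, the sketch below builds CSV text from a couple of made-up rows. It uses Python’s built-in csv module rather than tablib, and the helper name to_csv_text is hypothetical, so treat it as an illustration of the format rather than a replacement for the program above.

```python
import csv
import io

# Made-up rows in the same two-column layout as the spreadsheet.
headers = ["name", "twitter"]
rows = [["Jane Doe", "janedoe"], ["John Roe", ""]]


def to_csv_text(headers, rows):
    """Return the headers and rows as one CSV-formatted string."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(headers)   # first line: the column names
    writer.writerows(rows)     # one line per lawmaker
    return buffer.getvalue()


csv_text = to_csv_text(headers, rows)
print(csv_text)
```

Every line is one row and every comma separates two columns, which is why databases and other tools can import the file without knowing anything about Excel.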
So where do you go from here? Try adding features to what you just wrote. Include the lawmaker’s party next to his or her name. Add another column with the lawmaker’s phone number (don’t forget to update the dataset headers). Explore other parts of the Congress API to, for example, find all the lawmakers for a given zip code.
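As a starting point for the first two suggestions, here is one possible sketch that adds party and phone columns. The records are made up, the party and phone keys are assumptions about the API’s field names, and build_rows is a hypothetical helper; check the Congress API documentation for the real field names before relying on them.

```python
def build_rows(lawmakers, fields):
    """Return one row per lawmaker: the full name, then each requested field.

    `build_rows` is a hypothetical helper; the field names are assumptions.
    """
    rows = []
    for lawmaker in lawmakers:
        name = lawmaker["firstname"] + " " + lawmaker["lastname"]
        # .get() leaves a blank cell when a record lacks a field.
        rows.append([name] + [lawmaker.get(field, "") for field in fields])
    return rows


# Made-up sample records; the second one has no phone number.
lawmakers = [
    {"firstname": "Jane", "lastname": "Doe", "twitter_id": "janedoe",
     "party": "D", "phone": "202-555-0100"},
    {"firstname": "John", "lastname": "Roe", "twitter_id": "", "party": "R"},
]

fields = ["party", "twitter_id", "phone"]
headers = ["name"] + fields  # keep the headers in step with the columns
rows = build_rows(lawmakers, fields)

print(headers)
for row in rows:
    print(row)
```

Building the headers from the same fields list as the rows means the two cannot drift apart when you add or remove a column.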