I am an extremely lazy person. I started on a new project recently that required me to delve into state and census tract data. The thought of the effort involved in locating and copy-and-pasting a dict mapping US state abbreviations to FIPS codes was so overwhelming that I just wanted to go take a nap instead. And once I got the FIPS code dict, I'd have to use it to generate URLs for state shapefile downloads. Ugh!
So instead of (yet again) copying a dict from some other source, I decided to do something more permanent. us, the result of my laziness, is a Python package that contains all sorts of state metadata in an easy-to-use API.
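To make the idea concrete, here is a minimal sketch of the kind of lookup such a package provides: one record per state, findable by abbreviation, FIPS code, or name, with a helper that fills a shapefile URL template. The `State` class, the `lookup()` function, and the `SHAPEFILE_URL` template are hypothetical illustrations, not the `us` package's actual API. Only a handful of states are included.

```python
# Hypothetical sketch of a state-metadata lookup; not the `us` package's real API.
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class State:
    abbr: str
    name: str
    fips: str  # two-digit FIPS code, kept as a string to preserve the leading zero

    def shapefile_url(self, template: str) -> str:
        # Fill a (hypothetical) download URL template with this state's FIPS code.
        return template.format(fips=self.fips)


# A small subset of states with their standard two-digit FIPS codes.
STATES = [
    State("AL", "Alabama", "01"),
    State("AK", "Alaska", "02"),
    State("CA", "California", "06"),
    State("DC", "District of Columbia", "11"),
    State("MD", "Maryland", "24"),
]

# Index every state by lowercased abbreviation, FIPS code, and lowercased name.
_INDEX = {}
for _s in STATES:
    for _key in (_s.abbr.lower(), _s.fips, _s.name.lower()):
        _INDEX[_key] = _s


def lookup(value: str) -> Optional[State]:
    """Find a state by abbreviation, FIPS code, or name (case-insensitive)."""
    return _INDEX.get(value.strip().lower())


# Placeholder template; substitute the real Census download URL you need.
SHAPEFILE_URL = "https://example.com/shapefiles/tl_{fips}_tract.zip"
```

With this in place, `lookup("md").fips` gives `"24"`, and `lookup("Maryland").shapefile_url(SHAPEFILE_URL)` builds the download URL without any copy-and-pasting.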
We'd like to welcome python-sunlight into the most excellent family of open-source projects maintained by Labs. This particular project aims to unify and normalize the Sunlight APIs into a single Python library that's easy to understand and use, and fun to play with.
This library currently supports our Congress API, Open States API, and Capitol Words API. As such, we're deprecating the old python-sunlightapi and python-openstates libraries. They'll still work but will no longer receive updates, so switching is highly recommended.
This library has some neat features that should make migration painless, as well as some new features, such as a standardized location for your Sunlight API key, which makes testing (as well as distributing) your app even easier.
We've just released version 1.0.1 on PyPI, which makes installation a snap on any system with pip. The documentation is fairly complete, but feedback is super welcome: we're eager to learn where folks get stuck.
Most of the bugs seem to have been worked out after the Boston Python Project Night, where a few folks tested out the library. A special thanks to all our beta testers!
Alright, so how do I get started?
Hacking on python-sunlight is super easy. Here's how to get set up.
You'll need an API key. If you've not done so, get an API key (it's alright, we'll wait, go ahead).
Back already? Great.
Now, you'll have gotten an email with a long-ish string of letters and numbers. Let's save this to ~/.sunlight.key (where python-sunlight will look for a key). If you already had a key, it'd be worth it to go and dig it up.
If you're on a UNIX-type machine (macOS, GNU/Linux, *BSD, AIX, Solaris, or any of the other POSIX-ey systems), you should be able to run a command like the following:
echo "your-api-key-here" > ~/.sunlight.key
It's worth mentioning that your-api-key-here should be replaced with the API key that was emailed to you above.
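If you want to double-check that the key landed where it should, a few lines of Python will read it back. This is a minimal sketch; `load_api_key()` is a hypothetical helper, not part of python-sunlight's API. It strips the trailing newline that `echo` adds.

```python
# Hypothetical helper for checking the key file; not part of python-sunlight.
import os
from typing import Optional


def load_api_key(path: str = "~/.sunlight.key") -> Optional[str]:
    """Return the API key stored at `path`, or None if the file is missing or empty."""
    full_path = os.path.expanduser(path)
    if not os.path.exists(full_path):
        return None
    with open(full_path) as f:
        key = f.read().strip()  # echo appends a trailing newline; drop it
    return key or None


if __name__ == "__main__":
    key = load_api_key()
    print("Found a key." if key else "No key found in ~/.sunlight.key")
```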
Next, install python-sunlight via pip. If pip is not installed on your system, please download and install pip first.
pip install sunlight
And you're good to go!
Without further ado, an example!
#!/usr/bin/env python
# Copyright (c) 2012, BSD-3 clause, Sunlight Labs
from sunlight import capitolwords
from sunlight import congress

phrase = "death metal"

# Today, we'll be printing out the Twitter IDs of all legislators that use
# this phrase most in the congressional record.
for cw_record in capitolwords.phrases_by_entity(
        "legislator",   # We're getting all legislators
        sort="count",   # sorted by how often they say
        phrase=phrase,  # this phrase
)[:5]:                  # We'll just try the top 5 legislators
    legislator = congress.legislators(
        bioguide_id=cw_record['legislator'],  # Look up this bioguide (unique ID)
                                              # for every fed. legislator
        all_legislators="true"                # search retired legislators too
    )
    if len(legislator) >= 1:            # If we were able to find the legislator
        legislator = legislator[0]      # (this is a search, so it's a list)
        if legislator.get('twitter_id'):  # and they have a Twitter ID
            print("%s. %s (@%s) said %s %s times" % (
                legislator['title'],
                legislator['lastname'],
                legislator['twitter_id'],
                phrase,
                int(cw_record['count'])
            ))  # Print it to output :)
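The heart of the example is a join: take the top records by count, look each one up by bioguide ID, and keep only legislators with a Twitter ID. Here is a minimal offline sketch of just that step, so you can play with it without an API key. `format_top_speakers()` and the sample data (IDs like `X000001`, names like `Doe`) are made up for illustration; the real script gets both from the Capitol Words and Congress APIs.

```python
# Offline sketch of the join in the example above; all names and data are hypothetical.
def format_top_speakers(records, legislators_by_id, phrase, limit=5):
    """Format the top `limit` speakers who have a Twitter ID."""
    lines = []
    # Sort locally by count so the sketch doesn't rely on the API's sort order.
    top = sorted(records, key=lambda r: r["count"], reverse=True)[:limit]
    for record in top:
        legislator = legislators_by_id.get(record["legislator"])
        if legislator and legislator.get("twitter_id"):
            lines.append("%s. %s (@%s) said %s %d times" % (
                legislator["title"],
                legislator["lastname"],
                legislator["twitter_id"],
                phrase,
                int(record["count"]),
            ))
    return lines


# Made-up sample data shaped like the API responses used above.
SAMPLE_RECORDS = [
    {"legislator": "X000001", "count": 12.0},
    {"legislator": "X000002", "count": 30.0},
    {"legislator": "X000003", "count": 7.0},
]
SAMPLE_LEGISLATORS = {
    "X000001": {"title": "Sen", "lastname": "Doe", "twitter_id": "SenDoe"},
    "X000002": {"title": "Rep", "lastname": "Roe", "twitter_id": ""},
    "X000003": {"title": "Rep", "lastname": "Poe", "twitter_id": "RepPoe"},
}

if __name__ == "__main__":
    for line in format_top_speakers(SAMPLE_RECORDS, SAMPLE_LEGISLATORS, "death metal"):
        print(line)
```

Rep. Roe has the highest count but is skipped because the Twitter ID is empty, just as in the live version.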
The output looks like this:
Sen. Feingold (@russfeingold) said death metal 979 times
Rep. Jackson Lee (@JacksonLeeTX18) said death metal 923 times
Sen. Leahy (@SenatorLeahy) said death metal 800 times
Sen. Kyl (@senjonkyl) said death metal 755 times
Sen. Durbin (@SenatorDurbin) said death metal 593 times
And once more (this time, searching for "san francisco"):
Rep. Filner (@CongBobFilner) said san francisco 1346 times
Sen. Feinstein (@senfeinstein) said san francisco 1288 times
Sen. Boxer (@senatorboxer) said san francisco 1181 times
Rep. Pelosi (@NancyPelosi) said san francisco 1135 times
Rep. Eshoo (@RepAnnaEshoo) said san francisco 677 times
Questions, concerns, bugs, patches, examples and virtual hugs are all welcome on our GitHub page, so please do check it out!
If there is one thing I learned from PyCodeConf, it's that all conferences should be in Miami in October. And they should all feature parties at rooftop infinity pools. Fun aside, PyCodeConf had a great selection of speakers who showed the breadth of the Python community, from wedding websites to scientific computing. Read on for an overview of some of the talks that pulled at my heartstrings.
The slides and audio from all talks can be downloaded from the PyCodeConf site.
What makes Python AWESOME?
This talk by Python core developer Raymond Hettinger was one of my favorites. When you work with a language every day, it is easy to take its features for granted. Iterators, generators, and comprehensions seem simple at first, but they let you do very complex operations in very little code. The relatively new with statement provides an elegant interface for resource management and for separating common setup and teardown code.
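Here is a small illustration of those features working together. It is not taken from the talk: `timed()` and `words()` are hypothetical examples. `timed()` uses `contextlib.contextmanager` to keep setup and teardown in one place. `words()` is a generator that a dict comprehension consumes lazily.

```python
# Hypothetical examples of the features above; not code from the talk.
import time
from contextlib import contextmanager


@contextmanager
def timed(label, log):
    # Setup runs before the body of the `with` block...
    start = time.perf_counter()
    try:
        yield
    finally:
        # ...and teardown runs afterwards, even if the body raises.
        log.append((label, time.perf_counter() - start))


def words(lines):
    # A generator: yields one word at a time instead of building a list.
    for line in lines:
        for word in line.split():
            yield word


log = []
with timed("count", log):
    # A dict comprehension consuming the generator lazily.
    lengths = {w: len(w) for w in words(["with statements", "are nice"])}
```

After this runs, `lengths` maps each word to its length, and `log` holds one `("count", elapsed_seconds)` entry.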
Physics is renowned for the beauty and elegance of its theories and equations, and it's these same principles that made me love Python. While the language may be slower to gain new features, you can be confident that each one will be implemented cleanly and consistently with the principles of the language.
Embracing the GIL
I was fortunate to see David Beazley give a talk on the GIL at PyCon, and this talk was just as good. The GIL is a controversial part of Python, surrounded by both FUD and real issues. David has done a lot of research into how the GIL works and demonstrated how it behaves under various conditions. The summary: Python 2.7 is okay, Python 3 needs work, and a basic implementation of thread priorities in Python 3 puts it on par with 2.7.
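If you want to try this yourself, here is a rough take on the classic CPU-bound experiment. It is a sketch, not David's benchmark code. It runs the same pure-Python countdown twice in sequence, then in two threads at once. On a standard CPython build, the GIL lets only one thread run Python bytecode at a time, so the threaded version is typically no faster. Exact numbers depend on the interpreter version and the machine.

```python
# A rough sketch of a GIL experiment; timings vary by interpreter and hardware.
import threading
import time


def countdown(n):
    # Pure-Python busy loop: CPU-bound, so it needs the GIL the whole time.
    while n > 0:
        n -= 1


def run_sequential(n):
    start = time.perf_counter()
    countdown(n)
    countdown(n)
    return time.perf_counter() - start


def run_threaded(n):
    start = time.perf_counter()
    threads = [threading.Thread(target=countdown, args=(n,)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    N = 10_000_000
    print("sequential: %.2fs" % run_sequential(N))
    print("threaded:   %.2fs" % run_threaded(N))
```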
API Design and Pragmatic Python
Kenneth Reitz is best known for his wonderful packages such as requests, envoy, tablib, and clint. If you've used any of Kenneth's projects, you'll have noticed that he values creating sensible APIs that insulate users from the messier parts of Python. He takes a very conservative approach to his cause: there's no need to actually replace messy packages, just create wrappers that make them easier to use.
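As an example of the "wrap, don't replace" idea, here is a thin, friendlier layer over the standard `subprocess` module, loosely in the spirit of envoy. The `run()` function and `Response` class below are hypothetical and are not envoy's actual implementation. The standard library does all the real work; the wrapper only reshapes the interface.

```python
# Hypothetical wrapper in the spirit of envoy; not envoy's actual code.
import shlex
import subprocess
from dataclasses import dataclass


@dataclass
class Response:
    command: list
    status_code: int
    std_out: str
    std_err: str


def run(command, timeout=None):
    """Run a command given as a string or a list and return a simple Response."""
    args = shlex.split(command) if isinstance(command, str) else list(command)
    # subprocess.run does the real work; we just capture text output.
    proc = subprocess.run(args, capture_output=True, text=True, timeout=timeout)
    return Response(args, proc.returncode, proc.stdout, proc.stderr)
```

Usage looks like `r = run("ls -l")` followed by checking `r.status_code` and `r.std_out`, with no pipes or `Popen` arguments to remember.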
Kenneth also announced the release of The Hitchhiker’s Guide to Python. His goal is to create a central repository for Python best practices covering everything from installation and editors to coding style and app layout.
The one common theme of nearly everything at the conference was PyPy, the famed alternative interpreter. The team has come a long way, and everyone was eager to show the areas in which it excels over CPython and to point out the parts that still need work.
The general consensus seems to be that over the next few years PyPy will become the interpreter of choice for running Python. The team is currently accepting donations on their site for general development, Python 3 support, and a port of NumPy. I've donated; you should too!
Who's coming with me next year?
I highly recommend checking out each of the talks. Even though I only highlighted a few here, they were all excellent. Thanks to GitHub for putting on such a great conference, and to all of the sponsors who made it happen (free mojitos).
Followers of this blog are probably already aware of two of the main sites developed by our Data Commons team: TransparencyData.com and InfluenceExplorer.com. Both sites present a variety of influence-related data sets, such as campaign finance, federal lobbying, earmarks, and federal spending. Influence Explorer provides easy-to-use overview information about politicians, companies, industries, and prominent individuals, while Transparency Data allows users to search and download detailed records from various influence data sets.
In this blog post I want to show how easy it can be to use the public APIs for both sites to integrate influence data into your own projects. I'll walk through a couple of examples and show how to use both the RESTful API and the new Python wrapper.
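The general shape of a call to a RESTful JSON API like these is a URL with your API key and query parameters, followed by JSON decoding. This is a minimal standard-library sketch. `BASE_URL`, the `contributions.json` path, and the parameter names are placeholders; check the Transparency Data and Influence Explorer API documentation for the real endpoints and fields.

```python
# Sketch of calling a RESTful JSON API; BASE_URL and parameters are placeholders.
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://api.example.com/api/1.0/"  # hypothetical


def build_url(endpoint, apikey, **params):
    # Sort parameters so the same query always produces the same URL.
    query = urlencode(sorted(dict(params, apikey=apikey).items()))
    return "%s%s?%s" % (BASE_URL, endpoint, query)


def fetch_json(endpoint, apikey, **params):
    # Makes a network request and returns the decoded JSON body.
    with urlopen(build_url(endpoint, apikey, **params)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Keeping URL construction separate from the network call makes the query easy to inspect and test before anything is sent.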
My main project for the last month or so has been something we're calling the Real Time Congress API. It's not quite ready for production use, and the data in it is subject to change, but I wanted to give you all a preview of what's coming, and to ask for your help and ideas.
The goal of the Real Time Congress (RTC) API is to provide a current, RESTful API over all the artifacts of Congress, updated in as close to real time as possible. For the first version, we plan to include data about bills, votes, legislative and policy documents, committee schedules, updates from the House and Senate floor, and accompanying floor video.
ScraperWiki is a project that's been on my radar for a while. Last week Aine McGuire and Richard Pope, two of the people behind the project, happened to be in town, and were nice enough to drop by Sunlight's offices to talk about what they've been up to.
Let's start with the basics: remedial screen scraping 101. "Screen scraping" refers to any technique for getting data off the web and into a well-structured format. There's lots of information on web pages that isn't available as a non-HTML download. Making this information useful typically involves writing a script to process one or more HTML files, then spit out a database of some kind.
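Here is what a minimal version of that script might look like using only the standard library. It pulls the text out of every HTML table row and stores the rows in SQLite. The sample HTML and the two-column `rows` schema are hypothetical. The parser tolerates a common kind of sloppy markup: table cells that are never closed.

```python
# Minimal stdlib scraping sketch; the "rows" schema is a hypothetical example.
import sqlite3
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being parsed
        self._cell = None  # text fragments of the cell being parsed

    def _flush_cell(self):
        if self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
        self._cell = None

    def _flush_row(self):
        self._flush_cell()  # close a cell left open by sloppy HTML
        if self._row:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._flush_row()
            self._row = []
        elif tag in ("td", "th"):
            self._flush_cell()  # an unclosed previous cell ends here
            if self._row is not None:
                self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._flush_cell()
        elif tag == "tr":
            self._flush_row()


def scrape_to_sqlite(html, conn):
    """Parse table rows from `html` and insert two-column rows into SQLite."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    conn.execute("CREATE TABLE IF NOT EXISTS rows (name TEXT, value TEXT)")
    # Skip the header row and any row that doesn't have exactly two cells.
    data = [tuple(r) for r in parser.rows[1:] if len(r) == 2]
    conn.executemany("INSERT INTO rows VALUES (?, ?)", data)
    return len(data)
```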
It's not particularly glamorous work. People who know how to make nice web pages typically know how to properly release their data; those who don't tend to leave behind a mess of bad HTML. As a result, screen scrapers often contain less-than-lovely code. Pulling data often involves doing unsexy things like treating presentation information as though it had semantic value, or hard-coding kludges ("# ignore the second span... just because"). Scraper code is often ugly by necessity, and almost always of deliberately limited use. It consequently doesn't get shared very often: having the open-sourced code languish sadly in someone's GitHub account is normally the best you can hope for.
The ScraperWiki folks realized that the situation could be improved. A collaborative approach can help avoid repetition of work. And since scrapers often malfunction when changes are made to the web pages they examine, making a scraper editable by others might lead to scrapers that spend less time broken.
Sunlight Labs recently held an open house to bring members of the technology and transparency communities together over videogames and beer. Our systems administrator, Tim Ball, volunteered to create a photo booth for the event. A few days before the event, Tim destroyed his arm in a terrible, unfortunate accident, nearly dashing our hopes for a photo booth. We had to honor Tim's memory (he's still alive), so rather than using an off-the-shelf photo booth software package, I hacked one together from scratch using Python, CSS3, WebSockets, and an iMac.
The Lobbying Disclosure Act of 1995 mandates that lobbyists who meet specific requirements register with the Clerk of the House of Representatives and the Secretary of the Senate. Being the great body that it is, the House provides a searchable database and a bulk download of the registration forms. Sure, a searchable database is nice, but we can have the most fun with access to the entire data set. The disclosure forms are provided in XML format, divided by year and reporting period (quarterly, semi-annually, annually), and archived.
In order to download the disclosure archives, an HTML form must be submitted for each file. This can be a huge pain: the files are large, and downloading them takes non-trivial human effort whenever files are released or updated. We've written a Python script that simulates the form submissions and automatically downloads all of the archives. In addition to the script, we've uploaded a recent download of the archives to Amazon S3 for easier distribution.
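Simulating a form submission mostly means sending the same URL-encoded POST body a browser would send. Here is a minimal standard-library sketch of that approach, not the script described above. `FORM_URL`, the `year`/`period` field names, and the `PERIODS` values are all hypothetical; the real form's action URL and input names would come from its HTML.

```python
# Sketch of simulating an HTML form POST; URL, field names, and values are hypothetical.
import shutil
from urllib.parse import urlencode
from urllib.request import Request, urlopen

FORM_URL = "https://example.com/lobbying/download"  # hypothetical
PERIODS = ["Q1", "Q2", "Q3", "Q4"]  # hypothetical field values


def build_form_request(year, period):
    # Encode the form fields as the request body; supplying data makes this a POST.
    body = urlencode({"year": year, "period": period}).encode("ascii")
    return Request(FORM_URL, data=body,
                   headers={"Content-Type": "application/x-www-form-urlencoded"})


def download_archive(year, period, dest_path):
    # Stream the (possibly large) archive straight to disk instead of into memory.
    with urlopen(build_form_request(year, period)) as resp, open(dest_path, "wb") as out:
        shutil.copyfileobj(resp, out)


def download_all(years):
    # One simulated form submission per (year, period) archive.
    for year in years:
        for period in PERIODS:
            download_archive(year, period, "%s_%s.zip" % (year, period))
```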
Today was an amazing day for Sunlight Labs. Check this out:
That's our code sprint at PyCon today. According to the wiki, developers have checked out over 20 states to work on, and it looks like amazing, serious progress is being made there.
So to the PyCon and greater Python community: thank you for all that you do. You are amazing.