Preview: Real Time Congress API


My main project for the last month or so has been something we’re calling the Real Time Congress API. It’s not quite ready for production use, and the data in it is subject to change, but I wanted to give you all a preview of what’s coming, and to ask for your help and ideas.

The goal of the Real Time Congress (RTC) API is to provide a current, RESTful API over all the artifacts of Congress, updated in as close to real time as possible. For the first version, we plan to include data about bills, votes, legislative and policy documents, committee schedules, updates from the House and Senate floor, and accompanying floor video.

The Real Time Congress API will work essentially like the Drumbone API (documentation) – a flexible REST API that supports partial responses and lots of denormalized, nested data. The Real Time Congress API is designed with thin clients in mind. We will be deprecating the Drumbone API upon RTC’s release.
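To make "partial responses" concrete, here is a minimal sketch of the idea, not RTC's actual implementation: the client names the fields it wants, and only those parts of a nested document come back. The function names, the dotted-path convention, and the sample bill are all assumptions made up for illustration.

```python
_MISSING = object()  # sentinel for "this path is not in the document"


def lookup(document, path):
    """Follow a dotted path such as "sponsor.name" through nested dicts."""
    current = document
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return _MISSING
        current = current[key]
    return current


def select_fields(document, fields):
    """Return a new dict holding only the requested dotted field paths."""
    result = {}
    for path in fields:
        value = lookup(document, path)
        if value is _MISSING:
            continue  # skip fields the document does not have
        keys = path.split(".")
        target = result
        for key in keys[:-1]:
            target = target.setdefault(key, {})
        target[keys[-1]] = value
    return result


# Made-up sample document; real RTC field names may differ.
bill = {
    "bill_id": "hr1234-111",
    "title": "An example bill",
    "sponsor": {"name": "Jane Doe", "state": "OH", "party": "D"},
    "last_vote": {"result": "Passed"},
}

partial = select_fields(bill, ["bill_id", "sponsor.name"])
print(partial)
```

A thin client that only needs a bill's ID and its sponsor's name gets a much smaller payload this way, without a second request for the nested sponsor data.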

The Data

For floor video, we’re scraping the official House Live pages that annotate the video feed for the House floor, breaking each day’s video down into clips, and re-syndicating them as JSON. So for example, the official page for September 29, 2010 becomes a JSON document describing that day’s clips.
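As a rough sketch of the clip step, assuming a scraper has already pulled (time, description) pairs off a House Live page, the conversion to JSON might look like the following. The field names, the annotations, and the output shape are invented for illustration and are not the RTC format.

```python
import json


def to_seconds(clock):
    """Convert an "HH:MM:SS" string to seconds since midnight."""
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def build_clips(legislative_day, annotations):
    """Turn ordered (time, description) pairs into a clip list.

    Each clip runs until the next annotation starts; the final clip has
    no known end, so its duration is None.
    """
    day_start = to_seconds(annotations[0][0]) if annotations else 0
    clips = []
    for index, (clock, description) in enumerate(annotations):
        start = to_seconds(clock)
        if index + 1 < len(annotations):
            duration = to_seconds(annotations[index + 1][0]) - start
        else:
            duration = None
        clips.append({
            "time": clock,
            "offset": start - day_start,  # seconds into the day's video
            "duration": duration,
            "description": description,
        })
    return {"legislative_day": legislative_day, "clips": clips}


# Made-up annotations standing in for scraped page content.
annotations = [
    ("10:00:00", "The House convened."),
    ("10:02:30", "Morning-hour debate."),
    ("10:45:00", "The House recessed."),
]

video = build_clips("2010-09-29", annotations)
print(json.dumps(video, indent=2))
```

Storing an offset and duration per clip lets a client jump straight to a moment in the day's video instead of scrubbing through it.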

We’re taking the bill and vote data from the Drumbone API and merging it into RTC. The data will be very similar, though we’ll be adding better support for votes taken by voice or unanimous consent. We’ll also be drastically improving how current our roll call vote information is, using the House and Senate roll call XML feeds, which are kept up to date within about 10 minutes.
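For a sense of what consuming a roll call feed involves, here is a small sketch that parses a vote document and tallies positions. The XML layout below is a simplified, hypothetical stand-in; the real House and Senate feeds use their own, differing schemas, so the element and attribute names here are assumptions.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, simplified roll call document (not the official schema).
SAMPLE_XML = """
<rollcall>
  <number>1</number>
  <question>On Passage</question>
  <vote legislator="A000001" position="Yea"/>
  <vote legislator="B000002" position="Nay"/>
  <vote legislator="C000003" position="Yea"/>
</rollcall>
""".strip()


def parse_roll_call(xml_text):
    """Parse one roll call into a dict with per-member positions and totals."""
    root = ET.fromstring(xml_text)
    positions = {
        vote.get("legislator"): vote.get("position")
        for vote in root.findall("vote")
    }
    return {
        "number": int(root.findtext("number")),
        "question": root.findtext("question"),
        "positions": positions,
        "totals": dict(Counter(positions.values())),
    }


roll_call = parse_roll_call(SAMPLE_XML)
print(roll_call["question"], roll_call["totals"])
```

Keeping both the per-member positions and the precomputed totals in one document fits the denormalized style described earlier: a client can show the result without counting votes itself.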

We’ve also been collecting all sorts of legislative and policy documents, whip notices, committee schedules, and floor updates for a while, as part of some non-public APIs we made for our iPhone app (also called “Real Time Congress”). We’ll move this information into the Real Time Congress API.

Your Help

This project is moving fast, and we have code (or the beginnings of it) for all of the things I’ve mentioned so far.

However, there’s a whole range of things we would like to expand the Real Time Congress API to include, such as floor schedules, committee hearing transcripts, committee votes, treaties, outstanding and unfilled nominations, and surely many other things we haven’t thought of.

We need developers to help us write code to collect this data, and we need everyone’s help to figure out what else is out there that we should be collecting.

And of course, we also want people to use the Real Time Congress API, not just help build it. We’ll be making our Android, iPhone and Roku apps our first clients, but there’s certainly much more that can be done. Please, validate us!

As usual, this project is open source on GitHub. The API is written in Ruby, but the data gathering backend is designed to work well with scrapers written in any language that can communicate with MongoDB, and we already have a mix of Ruby and Python code powering what’s there so far.
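If you are wondering what "communicate with MongoDB" might mean for a scraper, one plausible shape is an upsert keyed on a stable ID, so that re-running a scraper updates documents rather than duplicating them. This is a sketch under assumptions, not the project's actual convention: the key name, the sample documents, and the database and collection names in the comment are hypothetical. The `update_one(filter, update, upsert=True)` call matches pymongo's collection API; the in-memory class is only a stand-in so the sketch runs without a database.

```python
def save_documents(collection, documents, key="document_id"):
    """Upsert each scraped document, matching on a stable key field."""
    saved = 0
    for doc in documents:
        collection.update_one({key: doc[key]}, {"$set": doc}, upsert=True)
        saved += 1
    return saved


class InMemoryCollection:
    """Tiny stand-in that mimics the single update_one call used above."""

    def __init__(self):
        self.rows = []

    def update_one(self, filter, update, upsert=False):
        for row in self.rows:
            if all(row.get(k) == v for k, v in filter.items()):
                row.update(update["$set"])  # merge new fields into the match
                return
        if upsert:
            self.rows.append({**filter, **update["$set"]})


# With a real database (names are hypothetical):
#   from pymongo import MongoClient
#   collection = MongoClient()["rtc"]["documents"]
collection = InMemoryCollection()

save_documents(collection, [{"document_id": "whip-1", "title": "Draft notice"}])
save_documents(collection, [{"document_id": "whip-1", "title": "Final notice"}])
print(collection.rows)
```

Because the write is an upsert, a scraper in any language can run on a schedule and simply re-save whatever it finds.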

We’ll be running a hackathon at RubyConf this week, and if you’re interested in learning more about the project or helping out, please come on by.