Two years ago we held an Open Government Sprint at PyCon 2009. We had never hosted an event like that before, and had no idea what to expect. To our amazement we ended up with one of the largest groups of any of the sprint projects, completely filling our room for the first few days. Approximately 30 people attended and kicked off what has now become the Open State Project.
Next week, we'll be heading to PyCon and hosting an Open Government Hackathon for the third year in a row. The primary focus will again be the Open State Project, but our space is open to everyone interested in government data. If you have a project you'd like to hack on, let us know and I'll be sure to mention it when I plug the sprint. If you aren't attending PyCon but happen to be near Atlanta, you're welcome to join too: the Hackathon is free and open to the public (March 14th-16th @ the Hyatt Regency in downtown Atlanta).
Additionally, I'm going to be presenting a poster on the technical aspects of the Open State Project on Sunday. I'll be around to talk about the project itself but also web scraping and opening government data in general, so if you're at PyCon stop by during the poster session Sunday morning and say hi.
Congress 3.0 for Android
If you have an Android phone (or tablet) and haven't checked out the Congress app for Android in a while, now is a good time to give it another look.
Today we're releasing version 3.0, which, in addition to a redesigned theme and layout, adds:
- Live updates from the House floor.
- Upcoming committee hearings in the House and Senate.
- Keyword search for bills (e.g. "health care", "deficit", "immigration").
- Details on any amendment that receives a vote.
Visualizing the Budget vs. Visualizing Spending
If you haven't yet checked out The Data Viz Challenge, you should. A data visualization contest? Sponsored by Google and Eyebeam? Focusing on the federal budget?! There is basically nothing in the world more up our alley. Yes. A thousand times yes.
And yet I do feel obliged to offer one small criticism: I think the contest would be a little more exciting if entries weren't limited to using data from What We Pay For. Mind you, this is not because of anything wrong with WWPF. That site has done a very nice job of parsing budget data and, through the Challenge's website, exposing it via an API.
The problem is that the budget is only part of the story. As Kaitlin has already explained, tax expenditures--more commonly known as tax breaks--are vital to understanding our nation's finances. When the government declines to collect tax revenue from some particular individual or industry, it's not very different from simply sending them a check. The beneficiary has more money and, all else being equal, the rest of us have to pay more taxes (or take on more debt) to make up for it.
Unfortunately, this is the point at which politics enters the equation. The two major parties tend to pursue their spending priorities in different ways, and this has created political incentives for pretending that tax expenditures don't affect the budget. But this is silly--it's like pretending that if you worked two jobs and neglected to deposit your paycheck from one of them it would have no effect on your finances.
You can find people from both sides of the aisle saying unsupportable things about tax expenditures, but the truth is that every serious scholar who works on this issue regards tax expenditures as a type of spending. That's why the literature uses the word "expenditure."
I don't know if WWPF declined to wade into this space because it's politically charged or because the data's historically been so tough to access, but I wish they had decided differently. If you don't include tax expenditures, you wind up ignoring huge government subsidies to business (accelerated depreciation), housing (mortgage interest tax deduction) and every kind of nonprofit, from museums to soup kitchens to the NCAA (tax exemption). This isn't to say that we shouldn't subsidize those entities and uses. Maybe we should! But we should at least talk about it. We need to make sure these expenditures are considered if we're going to get a clear understanding of our nation's finances. Make no mistake, we're talking about a lot of money -- have a look at the chart from Kaitlin's post and you'll see what I mean.
At any rate, I'm sure that we'll see some stunning visualizations come out of this contest, and I don't hesitate to encourage anyone reading this to participate. But now that Sunlight and the Pew Charitable Trusts have worked together to expose data on tax expenditures--and we'll be adding more such data soon--I hope the visualization community will be inspired to tell that part of the tale as well. Without it, any story about our government's spending is incomplete.
Crossposted from the Sunlight Foundation blog
Come Work Here!
One of the few downsides to working with incredibly talented people is that other folks are constantly trying to hire them away. Worse, sometimes they even succeed! This has just happened, in fact. But while we're very sorry to have lost Josh and Kevin to (some admittedly amazing) new opportunities, there is a bright side: the chance to bring some new brilliant technologists into the Labs fold.
So! Please direct your attention to our jobs page. There you'll find two listings for developers in our Washington, D.C. offices. We're looking for someone to lend a hand on the Subsidyscope project, and another dev to serve as a jack/jill-of-all-trades working on a variety of technical projects. But while one of the two positions will have a specific project responsibility, prospective candidates should understand that all Labs members have the opportunity and obligation to work on a variety of different things.
Who should apply? More than anything, we're looking for people who are passionate about improving our government, excited about technology's capacity for doing so, and who are interested in digging into some genuinely tough problems. You can find some of the specific technologies we use in the job descriptions, but if you're a smart, creative technologist, don't let, say, a background with Couch instead of Mongo dissuade you from applying.
It really is a pretty great place to work. The compensation's competitive, the work environment's relaxed, and the opportunities for doing exciting, important work are tremendous.
Open States: Present and Future
I'm pleased to say that Caitlin and James have just finished giving our Open States project a lovely new design. Not only is the site now much more pleasing to look at, it's much easier to see the great progress that's being made by James, Mike and our volunteer contributors. In addition to the five states that are live (and supported by OpenGovernment), there are already another twelve states with "experimental" status. Don't let the scare-quotes scare you, though: while we wouldn't yet recommend building your air traffic control system or pacemaker firmware in such a way that it's dependent on our API coverage of Alaska, the scrapers from the experimental states are well on their way to being declared complete. Developers should feel confident about building around this data -- rest assured that it'll be declared "ready" soon enough.
Of course, we hope that developers in our community will also consider becoming involved in the project directly -- there's plenty of work to be done.
And it's genuinely important work. State legislatures are where vital decisions are made about civil rights, transportation, education, taxes, land use, gun regulation, and a host of other issues. Far too often, these issues don't get the attention they deserve. It's a simple question of scale: there are a lot more resources available at the federal level for both lawmakers and journalists. That means state governance both requires more transparency and tends to get less of it. We think technology can help make the situation better -- that's what Open States is all about.
There are some interesting opportunities for cross-state work, too. Polisci geeks will probably appreciate the comparative politics opportunities that a common data model and API will allow (Gabriel Florit's already been creating some cool visualization experiments that build on our data). But there are also less academic applications for this information. Consider these two stories that NPR published last fall. They got a bit lost in the pre-election shuffle, but they made a big impression on me.
The gist of it is this: Arizona's controversial immigration law didn't happen by magic. One of the special interests fighting for it was the private prison lobby -- as you might imagine, having more prisoners means more business for them, and they saw increased enforcement of immigration laws as a growth opportunity. So, via an intermediary organization that specializes in this sort of thing, they conducted a legislator "education" campaign, wining and dining lawmakers and sending them home with prewritten model legislation.
All of this is perfectly legal. And, depending on your opinion about immigration, you might even approve of the policy outcome it produced. But it's hard to imagine anyone being okay with the shadowy role that commercial interests appear to have played in this legislative process. If we'd been able to spot the provenance of the legislation earlier, would journalists and organizers have been able to give the people of Arizona a more complete understanding of what was going on? I think so -- I hope so. That's the kind of use that Open States should make possible, and the one I'm most excited about.
We’re Muses!
Our sysadmin extraordinaire Timball was presented with this lovely piece of art at Shmoocon a couple of weeks ago:
(In real life the hard drive platters are shinier than they look here)
The piece was made by a gentleman named Phylum Coredata who said Sunlight had inspired it. We think it's pretty awesome (a proper frame is forthcoming). Thanks, P!
The Real Time Congress API
Today we're making available the Real Time Congress API, a service we've been working on for several months and will continue to expand.
The Real Time Congress API (RTC) is a RESTful API over the artifacts of Congress, kept up to date in as close to real time as possible. It consists of several live feeds of data, available in JSON or XML. These feeds are filterable and sortable and sliceable in all sorts of different ways, and you can read the docs to see how.
RTC replaces and deprecates the Drumbone API, which is no longer recommended for use.
What’s Going on in the Labs
Project Updates
MediaSync
MediaSync got a major upgrade and a Dirty Dancing name thanks to Jeremy and a few others in the Django community. It's incredibly flexible now and definitely worth a look if you're in need of a media deployment solution for your Django projects.
Congress API
James spent a little time last month updating the data in the Congress API to get it ready for the 112th Congress. More information on the update can be found here.
New Hampshire Opens Its Legislative Data
As recently covered on TechPresident, the New Hampshire General Court (their state legislature) has made an extremely welcome addition to their website in the form of a downloads section.
New Hampshire isn't the first state to offer such a thing: New Jersey has a similar section on their website, and quite a few states like New York and Kansas are introducing APIs to their new legislature websites. What is interesting, however, is that the justification presented by freshman representatives George Lambert and Seth Cohn for offering the data centers on reducing the cost and strain that web scrapers place on the legislature's website.
The load placed on sites by scraping them is something that we know a little bit about. Our Open State Project is currently crawling 18 state legislatures once a day, hitting over 100,000 pages nightly. Bulk downloads like New Hampshire's make it possible for us to take in all changes by simply downloading a few files every night instead of hitting thousands of pages--most of which haven't changed. Even though we take precautions like rate limiting our scrapers and having them back off if the site seems to be failing, we still see the occasional failure during our scraping run, which unfortunately just forces us to run the scraper again.
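For the curious, here is a minimal sketch of what "rate limiting" and "backing off" can look like in Python. This is not our actual scraper code: the function name `polite_crawl`, the `FetchError` exception, and every parameter are made up for illustration, and the page-fetching function is passed in by the caller so the sketch doesn't assume any particular HTTP library.

```python
import time


class FetchError(Exception):
    """Hypothetical error raised by the fetch callable when a page fails."""


def polite_crawl(urls, fetch, min_interval=1.0, max_retries=3,
                 backoff_base=2.0, sleep=time.sleep):
    """Fetch each URL in turn, pausing between requests and backing off on errors.

    fetch is any callable that takes a URL and returns the page body,
    raising FetchError on failure. Returns (pages, failed), where pages
    maps URL -> body and failed lists the URLs that never succeeded.
    """
    pages, failed = {}, []
    for url in urls:
        for attempt in range(max_retries + 1):
            try:
                pages[url] = fetch(url)
                break
            except FetchError:
                if attempt == max_retries:
                    # Give up on this page; a later run can pick it up.
                    failed.append(url)
                    break
                # The site may be struggling: wait longer after each failure.
                sleep(min_interval * backoff_base ** (attempt + 1))
        # Rate limit: always pause before moving on to the next page.
        sleep(min_interval)
    return pages, failed
```

Even with this kind of care, a crawl over thousands of pages still means thousands of requests, which is exactly the load a bulk download avoids.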
New Hampshire and its citizens will see other benefits of the bulk data beyond a less-burdened website. Consumers of the data will now be able to take the data in much faster than they previously could. There's also a much smaller potential for errors when you are importing data from a machine-readable source like a CSV or database file. This means that tools built on top of scraped data (like the recently launched OpenGovernment beta) will have more accurate and up-to-date data.
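To give a sense of how little code a machine-readable source requires, here is a small sketch using Python's standard `csv` module. The sample rows and the column names (`bill_id`, `title`, `status`) are invented for the example and are not New Hampshire's actual schema; a real import would read the downloaded file instead of an in-memory string.

```python
import csv
import io

# Made-up sample standing in for a bulk download file. The column names
# are hypothetical, not the actual New Hampshire schema.
SAMPLE = """bill_id,title,status
EXAMPLE-1,An example act,introduced
EXAMPLE-2,Another example act,passed
"""


def load_bills(fileobj):
    """Read bill rows from a CSV file object into a dict keyed by bill_id."""
    return {row["bill_id"]: row for row in csv.DictReader(fileobj)}


bills = load_bills(io.StringIO(SAMPLE))
print(len(bills), bills["EXAMPLE-2"]["status"])
```

There is no HTML to parse and no page layout to guess at, so a redesign of the legislature's website can't silently break the import.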
Those responsible for making this happen in New Hampshire should be proud of the change they've enacted. A preliminary glance at the actual New Hampshire data makes it look promising. Because the data is quite new, it unfortunately doesn't yet include roll call votes or links to the full text of bills, but we'll reach out to them to see if these oversights can be fixed in the near future. Hopefully New Hampshire is just one of many states that will start seeing the benefits of providing bulk access. To help show what is possible we'll be adding New Hampshire support to the Open State Project as soon as possible.
Selling Free Data
It wasn't too long ago that I talked about how hard it is to create a business on open data. So it's probably worthwhile to talk about an open data business that popped up shortly after that post: CQ's First Street. As this writeup mentions, it's just one of a bunch of new services that are launching around business intelligence in the government space -- Bloomberg and Politico are also creating subscription offerings designed to help lobbyists and contractors achieve more success.
But what are these services selling? An awful lot of it is already free. Contact info for legislators or their staff. Lobbyist registrations. Legislative info. Campaign finance information. Data about grant and contract spending.
I've linked to Sunlight projects, but of course there are many other great services that offer this kind of data gratis. So if this stuff is free, why are people paying for it?
Well, obviously these services are offering some added value. First and foremost there's the aggregation function: collecting the data into a usefully centralized interface is the core of these products. In some cases they add value by offering data that can't be gotten anywhere else: original reporting, or cleaned or otherwise improved versions of the data (for instance, Bloomberg bought Eagle Eye, which scrubs USASpending data; and Sunlight's staff directory is created from expenditure reports, not the canonical, non-digital staff directories available on the Hill). Finally, and not insignificantly, these services have brands and sales staff that help them find paying customers.
I think it's safe to say that helping lobbyists more effectively manipulate Congress is not the use of open data that we at Sunlight are most excited about. But we really are glad to see these businesses evolve and succeed: they help create demand for better data offerings (and their staff members often turn out to be the kinds of folks we get along with at conferences).
Still, this is an area where the underlying data is basically available to anyone. Any developer can try their hand at making a better, cheaper service. I don't know if this particular market will be large enough (or free enough from the principal-agent problem) to turn into the hyper-competitive race to the bottom that it could be. But I do know that the data you can get for free is going to keep improving -- we're doing our best to make sure of it.