Today we're launching Clearspending -- a site devoted to our analysis of the data behind USASpending.gov. Ellen's already written about this project over on the main foundation blog, and you should certainly check out her post. But I wanted to talk about it a little bit here, too, because this project is near & dear to my heart, having grown out of work that Kaitlin, Kevin and I did together before I stepped into the role of Labs Director.
The three of us had been working with the USASpending database for a while, and in the course of that work we began to realize some discouraging things. The data clearly had some problems. We did some research and wrote some tests to quantify those problems -- that effort turned into Clearspending. The results were unequivocal: the data was bad -- really bad. Unusably bad, in fact. As things currently stand, USASpending.gov really can't be relied upon.
You can read all about it over at the Clearspending site, and I hope you will -- in addition to an analysis that looked at millions of rows of data and found over a trillion dollars' worth of messed-up spending reports, we spent a lot of time talking to officials at all levels of the reporting chain. I don't think you're likely to find a better discussion of these systems and their problems.
And make no mistake, these systems are important.
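The post mentions that we "wrote some tests to quantify those problems." To give a feel for what such a test can look like, here is a minimal sketch of one kind of check: a field-completeness test that flags records missing required fields and totals the dollars attached to them. The field names, the sample rows and the choice to sum absolute amounts are all assumptions for illustration. This is not the real USASpending schema and not Clearspending's actual methodology.

```python
from dataclasses import dataclass
from typing import Iterable, Mapping

# Hypothetical required fields for illustration; NOT the real USASpending schema.
REQUIRED_FIELDS = ("recipient_name", "agency_code", "program_number", "obligation_date")


@dataclass
class CompletenessReport:
    total_rows: int
    failing_rows: int
    dollars_in_failing_rows: float

    @property
    def failure_rate(self) -> float:
        # Share of rows missing at least one required field.
        return self.failing_rows / self.total_rows if self.total_rows else 0.0


def is_missing(value) -> bool:
    """Treat None and blank or whitespace-only strings as missing."""
    return value is None or (isinstance(value, str) and not value.strip())


def check_completeness(
    rows: Iterable[Mapping[str, object]],
    required: tuple = REQUIRED_FIELDS,
    amount_field: str = "amount",
) -> CompletenessReport:
    """Count rows missing any required field and total their dollar amounts."""
    total = failing = 0
    dollars = 0.0
    for row in rows:
        total += 1
        if any(is_missing(row.get(field)) for field in required):
            failing += 1
            # Sum magnitudes so negative adjustments don't cancel out flagged dollars
            # (an assumption of this sketch, not necessarily how Clearspending counted).
            dollars += abs(float(row.get(amount_field) or 0))
    return CompletenessReport(total, failing, dollars)


# Made-up sample rows.
sample = [
    {"recipient_name": "Acme", "agency_code": "1234", "program_number": "10.001",
     "obligation_date": "2009-10-01", "amount": "5000"},
    {"recipient_name": "", "agency_code": "1234", "program_number": "10.001",
     "obligation_date": "2009-10-01", "amount": "-2000"},
    {"recipient_name": "Beta", "agency_code": None, "program_number": "10.002",
     "obligation_date": "2009-11-15", "amount": "300"},
]
report = check_completeness(sample)
print(report.failing_rows, report.dollars_in_failing_rows)  # prints: 2 2300.0
```

A real audit would layer several checks like this one (consistency against other sources, timeliness and so on), but the shape is the same: a rule, a count of violations and the money riding on them.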
What’s Going on in the Labs
If you're like us, you're busy creating and don't take the necessary time to document and communicate what you're up to. Here in the labs, we've been great about announcing when we're finished with a product but we haven't really kept the community informed on what we're working on before we're done with it. We're going to improve this. Starting with this post, we're going to give you a monthly rundown of what we're working on here in our D.C. offices. With this and other proposed initiatives like improved documentation of our projects and making labs staff available for IRC "office hours," we hope to do a better job of keeping you in the loop and making ourselves available for questions or comments. Please let us know what you think about these proposals. With that, here's what we're up to at Sunlight Labs:
Introducing the Open State Project API
Over a year ago we announced our intention to build scrapers that would collect and sanitize legislative information from all fifty states, an initiative that is now known as the Open State Project.
As of today we're proud to announce a new milestone for the project, version 1 of the Open State Project API. You can start using our API today to get access to information on more than 37,000 bills and 1,600 legislators from the most recent sessions of 10 state legislatures.
Better Living Through Transparency: The Importance of Models
At Sunlight we spend a lot of time exploring ways to open up data sets and make them more accessible. The idea is that data enables us to act collectively, making better-informed decisions and building a more effective public sector. When we talk about transparency the focus is often on the possibilities that data offers. But this discussion sometimes ignores the fact that translating data into action is hard.
There's a reason for this: data alone doesn't provide answers.
Coming up with solutions to real life problems -- like designing an effective and fair tax code or improving health care -- requires an understanding of how real life works. Unfortunately, more often than not real life is messy and complicated. In order to make sense of this complexity we need models -- approximations of the world that define fundamental mechanics of a given process and reduce it to understandable and meaningful terms.
As Joshua Epstein writes in a clever essay on scientific inquiry, every time we use data to draw a conclusion we also use a model. Sometimes explicitly: when a meteorologist makes a prediction about the weather they use a rigorously designed framework for translating observational data into a forecast. Sometimes not: when I look at the sky and make a prediction I'm using an implicit model based on a mix of past experience and a rather poor understanding of atmospheric processes. Both of us are using models to interpret data and both are based on assumptions about how weather works. I'm just not sure I could explain how mine functions, nor do I have any sense of how well it works.
Having access to good observational data is incredibly important to arriving at useful answers. But well-designed and transparent models are equally important. In fact, having a good model is often a prerequisite to determining what to observe and how. If I want to predict the weather should I measure the temperature? Pressure? Wind direction? Where and how frequently? Without a solid theoretical framework it's often impossible to know where to begin, and it's even harder to know when I've made a wrong turn.
When we use a model we embed its assumptions into the results. If key assumptions are incorrect, good data turns into supporting evidence for a potentially misguided answer. Or a bad model might drive the collection of useless data.
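A toy illustration of how a model's assumptions end up in its results: the same four made-up daily temperature readings fed to two deliberately simple forecasting rules. Neither rule is how a meteorologist actually works; they are stand-ins chosen only because their assumptions are easy to state, and the function names are hypothetical.

```python
from typing import Sequence

# Made-up daily temperature readings (degrees C); illustrative only.
observations = [12.0, 13.0, 14.0, 15.0]


def persistence_forecast(history: Sequence[float]) -> float:
    """Assumption: tomorrow will look like today."""
    return history[-1]


def trend_forecast(history: Sequence[float]) -> float:
    """Assumption: the average day-to-day change so far will continue."""
    if len(history) < 2:
        return history[-1]
    avg_step = (history[-1] - history[0]) / (len(history) - 1)
    return history[-1] + avg_step


print(persistence_forecast(observations))  # 15.0
print(trend_forecast(observations))        # 16.0
```

Same data, two different answers. The difference comes entirely from the assumption written into each function, which is exactly why it matters that a model's assumptions be explicit and open to inspection.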
Google Summer of Code: Open State Project
This post was contributed by one of Sunlight Labs' Google Summer of Code Students, Gabriel Joel Pérez. Gabriel's work is currently being integrated into the core project and the states he has been working on should be available via the Open State Project API later this year. His code is available on GitHub as we work on integration.
Hello! I’m Gabriel, a fourth-year Computer Engineering student at the University of Puerto Rico in Mayagüez. This summer I worked as a GSoC student on developing new scrapers for the Open State Project. The states I worked on were Colorado, Hawaii, Washington and Oregon, plus the territory of Puerto Rico. I really enjoyed the whole experience. The work was very fulfilling, and coding in Python is always delightful and fun.
Preparing for the Worst
I should say up front that Google's been a great friend to Sunlight: they've helped support our contests, they've sent us phones and Summer of Code students to help our Android development efforts, and when I visited their DC offices a couple of weeks ago they let me eat as much candy as I wanted.
Still, I'd be lying if I said the incredible scope of their success didn't make me a little uneasy. We use Google Apps for our work email, for instance, and YouTube is essential to our video production efforts. We're as dependent as anyone else on Google for search, both as a tool and a source of traffic. I know we're not the only ones to be a bit unnerved at being so reliant on the goodwill of a private enterprise -- and of course over the past few weeks, other voices expressing those concerns have become significantly louder.
So, while we're looking forward to continuing to work with Google, it would be irresponsible for us not to prepare for the unthinkable. I'm happy to say that we've taken the necessary precautions, and today the future seems a bit less uncertain:
Of course, what happens after we run through our 1000 free hours is anyone's guess.
(Many thanks to Pierre Huggins of Rox Chox & Blox Woodworking for lending his awesome fabrication capabilities to this ridiculous project, and to our own sysadmin extraordinaire, Tim, for finding Pierre via HacDC.)
Google Summer Of Code Adds New Goodies To Congress (Android App)
Over the past few months we've had the pleasure of working with several developers through Google's Summer of Code program. One of them is Evelina Vrabie, who has contributed her talents to our Android app (and has done so from across an ocean -- Evelina's based in Romania). She was nice enough to write about the experience, and to tease a few of the features she's been working on for the app.
My name is Evelina Vrabie, and for the last four months I've had a great experience collaborating with Eric Mill on the Congress project for Android, as part of the Google Summer of Code 2010 program. Working for the Sunlight Foundation has been an excellent opportunity for me to learn and grow as a capable Android developer...
The National Data Catalog Is Hungry
So you've found some government data on the web. Naturally, you are eager to share your findings with the world. Perfect! Sunlight Labs can help. Our National Data Catalog (NatDatCat) is hungry for government data, and we have to feed it regularly. Otherwise, it gets grumpy.
The first step is to assess what you've found. If it's just a few scattered files, fill out a quick form and tell us about it. On the other hand, if it is a collection of data sets, you might consider writing an importer...
I’m Kind of a Sucker for Transit Data
This may admittedly be of limited interest to those outside the DC area, but it's extremely interesting to me, so I'm afraid you'll just have to humor me for a paragraph or two. WMATA, our regional transit agency, has just launched a developer portal and API, and they've done a really nice job of it. People seem to love transit data -- after crime data it seems to be the municipal information people get most excited about (and I'd argue that it's much, much more useful than crime data) -- and I'm no exception. Playing with this stuff is a bit of a hobby of mine, and I've been following WMATA's gradual move toward openness for years. This is a big step forward for both the agency and its customers.
Bus data is still forthcoming, and I suspect that's where the real possibilities lie: the rail system is pretty easy to use; tech can pay bigger dividends when applied to the relative mysteries of the bus. Still, it's already clear that WMATA has made some smart decisions about implementation, defined reasonable terms of service, and generally seems to be moving in the right direction. When the API is considered alongside the already-released GTFS dataset, Metro's offerings match up fairly well (though not perfectly) with the ten open data principles that Sunlight has just published.
Now to see if I can't get a Graphserver instance running...
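For anyone who wants to start playing with the GTFS dataset mentioned above, here is a minimal sketch that reads a GTFS `stops.txt` file (a CSV with `stop_id`, `stop_name`, `stop_lat` and `stop_lon` columns, per the GTFS spec) and finds the stop nearest a given point. It uses only the Python standard library; it is not Graphserver and not part of WMATA's API. The sample rows, stop names and coordinates are made up, and the helper names are hypothetical.

```python
import csv
import io
import math
from typing import Iterable, List, NamedTuple, TextIO


class Stop(NamedTuple):
    stop_id: str
    stop_name: str
    lat: float
    lon: float


def load_stops(stops_txt: TextIO) -> List[Stop]:
    """Parse a GTFS stops.txt file object, skipping rows without coordinates."""
    stops = []
    for row in csv.DictReader(stops_txt):
        lat, lon = row.get("stop_lat"), row.get("stop_lon")
        if not lat or not lon:
            continue  # e.g. generic nodes or entrances published without a location
        stops.append(Stop(row["stop_id"], row.get("stop_name") or "", float(lat), float(lon)))
    return stops


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two lat/lon points."""
    r = 6_371_000  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def nearest_stop(stops: Iterable[Stop], lat: float, lon: float) -> Stop:
    """Return the closest stop; raises ValueError if stops is empty."""
    return min(stops, key=lambda s: haversine_m(lat, lon, s.lat, s.lon))


# Made-up sample feed; a real one would be opened from the agency's GTFS zip,
# e.g. open("stops.txt", encoding="utf-8-sig", newline="") to tolerate a BOM.
sample = io.StringIO(
    "stop_id,stop_name,stop_lat,stop_lon\n"
    "1,Example Stop A,38.8977,-77.0365\n"
    "2,Example Stop B,38.9072,-77.0369\n"
    "3,Entrance with no coordinates,,\n"
)
stops = load_stops(sample)
print(nearest_stop(stops, 38.8980, -77.0360).stop_name)  # Example Stop A
```

A linear scan is fine for a single agency's few thousand stops; anything bigger, or routing between stops, is where a tool like Graphserver comes in.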
Sunlight Labs Community Survey
We've put together a short survey, and we'd greatly appreciate your responses. It shouldn't take more than ten minutes. By answering, you'll be part of our re-evaluation of where we focus our efforts, so that we can help ensure that this community stays focused and energized. Tell us what you like about the community and where we're slacking, but most importantly tell us what you need.