Good news if you were one of the users waiting on our Congress API to support the newly drawn congressional districts! As of today it is possible to pass the districts=2012 flag to the Congress API's districts.getDistrictFromLatLong method to instruct the API to return the district in effect for the 2012 elections.
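A lookup might look something like this (a minimal sketch in Python using the requests library; the endpoint path and parameter names follow the Congress API's documented conventions, but substitute your own API key and coordinates):

```python
import requests  # pip install requests

API_KEY = 'your-sunlight-api-key'

# Ask which district a point falls in under the maps used for the 2012 elections.
response = requests.get(
    'http://services.sunlightlabs.com/api/districts.getDistrictFromLatLong.json',
    params={
        'latitude': 35.78,     # example coordinates (Raleigh, NC)
        'longitude': -78.64,
        'districts': '2012',   # omit to get the district currently in effect
        'apikey': API_KEY,
    },
)
print(response.json())
```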
As you may recall, the data wasn't previously available in a uniform format, but thanks to a recent data release from Census.gov we were able to get this data loaded, with days to spare before the election.
The default will remain to return the districts in effect for purposes of representation until the swearing-in of the 113th Congress in January 2013, at which point the temporary districts=2012 flag will be retired (though it will be safe to continue to pass the parameter indefinitely).
This change does not yet affect other Sunlight API methods. The Open States district methods and the ZIP code-related methods will be updated as that data becomes available, as described in our last update.
Keeping Authentication Simple
The point of publishing bulk data is so it can be reused as widely as possible. This is particularly true for government data, which belongs to the public.
Government agencies are sometimes also concerned with ensuring the authenticity of their legal information, especially when the data might be treated as an official source. Authenticity breaks down into two major concerns: integrity (ensuring the text is accurate) and origin (proving it's official). Many people are used to the "wax seal" model of authenticity: the experience of opening a PDF and seeing that the document is signed and official. That model quickly breaks down for distributing bulk data.
The goals of ease of reuse and authentication are frequently presented as being in tension, but that tension is just as frequently overstated. There are straightforward approaches to guaranteeing authenticity of bulk data that do not encumber reuse.
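One such approach, sketched below with hypothetical file names: publish a cryptographic digest alongside each bulk file to guarantee integrity, and have the agency sign that digest with a key it controls to establish origin. Nothing about it gets in the way of reuse.

```python
import hashlib

def sha256_of(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 digest of a (possibly huge) bulk data file."""
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()

# Integrity: compare against the digest the agency publishes with the data.
published_digest = 'hex-digest-published-by-the-agency'
print(sha256_of('bulk-data.zip') == published_digest)

# Origin: the agency signs the digest file with its own key, and anyone
# can check the signature, e.g.:  gpg --verify bulk-data.sha256.asc
```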
Art Hack Day Boston
A couple of weekends ago, I attended Art Hack Day at Harvard. The event was put on by the Big Bad Lab, and we were proud to provide the data for the event through our APIs. It was a pleasure getting to show off python-transparencydata and (OK, I'm biased here) python-sunlight.
The creative juices were really flowing throughout the three-day hackathon, and folks created some amazing projects: a vending machine bill acceptor that sucked in $100 bills at the same rate money has been spent this election cycle (it was really fast!), political speech karaoke, and a voting booth that just can't accept "no". Other creative projects included an app that processed tweets from Senators and Representatives (with Twitter IDs found via the Sunlight Congress API) and a bulletin board covered with flyers featuring (real!) numbers for lobbyists discovered through Influence Explorer.
A Cite for Sore Eyes
Earlier this week, the annual Law Via the Internet conference was hosted by the Legal Information Institute at Cornell University. The conference schedule featured talks on a range of policy and technical subjects, including extracting legal citations from text and understanding them programmatically, a problem that arises whenever people need to determine the relevance of legal documents based on the authorities they cite. Recognizing citations in text is also a vexing but fun programming challenge, so I was excited to see this issue figure prominently in at least four separate talks.
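To get a feel for why it's vexing, here's a toy sketch that recognizes one common citation family ("volume reporter page"). Real extractors like the ones discussed at the conference handle hundreds of reporter abbreviations, plus statutes, pin cites, parallel citations, and short forms like "id." -- this pattern covers only a tiny sample:

```python
import re

# Matches "<volume> <reporter> <page>", e.g. "410 U.S. 113" or "347 F.2d 394".
CITATION = re.compile(
    r'\b(\d{1,4})\s+'                                   # volume
    r'(U\.S\.C\.|U\.S\.|S\.\s?Ct\.|F\.(?:2d|3d)?)\s+'   # reporter (tiny sample)
    r'(\d{1,5})\b'                                      # page or section
)

text = "See Roe v. Wade, 410 U.S. 113 (1973); cf. 347 F.2d 394."
for volume, reporter, page in CITATION.findall(text):
    print(volume, reporter, page)
```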
Politwoops – Now With More Open Source Flavor
Thursday we released a revamped design of Politwoops! You can read about some of the changes to the features and content in Nicko's post on the main blog, but the main news of the day is that we've now open-sourced the code. That means you can create your own Politwoops instance to track the deleted tweets of any subset of people you fancy.
Was the Romney Tax Return Bitcoin Ransom Paid?
On September 4th, an anonymous poster claimed to have obtained copies of presidential candidate Mitt Romney's tax returns. They offered to either immediately release the returns or never release them, at the discretion of whoever first paid them one million dollars in the form of bitcoins (approximately 80,500 bitcoins).
It's safe to say that most people think these blackmailers' claims were a hoax: a clumsy and not-very-believable extortion effort that briefly made headlines and then disappeared. Certainly this is the prevailing opinion around the Sunlight office. But we think the bitcoin phenomenon is fascinating for both technical and social reasons (yes, there are labs staffers who have mined bitcoins). And since today is the ransom deadline, it seems like a good time to consider what, if anything, a release or the lack of one could mean.
The Romney campaign has chosen not to release the returns, so it's safe to assume that they don't want the returns released by anyone. The release was scheduled to happen in the absence of a payment, so the poster seems at least slightly biased against the Romney campaign. This seems to imply that the lack of a release means the ransom was paid (not necessarily by the Romney campaign). Yet even a poster who received no ransom, and who may never have possessed the returns at all, can use the lack of a release to cast the shadow of an assumed pay-off over the campaign. Thus a lack of a release doesn't tell us much. Ideally, we could detect whether a payoff happened -- but how?
Bitcoins are bought and sold on exchanges similar to the New York Stock Exchange. Traders advertise offers to buy and sell bitcoins at different prices, the exchange matches compatible orders, and the price changes as orders at a given price are exhausted. So can't we watch the exchanges for erratic volume and price fluctuations? Unfortunately it's not that simple. A large exchange like MtGox carries enough volume to handle a purchase of this size in about three days, and spread over the 24 days since the ransom announcement, the purchase could be made without a detectable change in price or volume. Furthermore, bitcoin purchases don't require an exchange: just as you can buy and sell stocks privately, bitcoin purchases can be conducted privately. Finally, we have to consider the possibility that the ransom-payer already had enough bitcoins to satisfy the ransom. We'll have to look elsewhere for the evidence.
Unlike the banking system, the bitcoin protocol has no central authority keeping track of how many bitcoins each participant has. A participant effectively has only the bitcoins that the rest of the network can prove he has. This conservative approach is required to prevent double-spending of bitcoins. To achieve it, the bitcoin network relies on something called the block chain. When a transaction is broadcast, other participants verify it and record it in a cryptographically tamper-evident manner by solving a proof-of-work puzzle to append a new block, and recipients typically wait for multiple confirmations before trusting a transaction (the precise number being a matter of preference). The block chain is a huge database available to all bitcoin participants. If the ransom was paid, it would be forever recorded in the block chain.
There are websites dedicated to letting you watch block chain activity, and the largest transactions receive quite a bit of attention; transferring 80,000 bitcoins in one transaction would be noticed. Thus, to avoid detection in the block chain, the parties would need to split the payment into many transactions spread across many sender and recipient addresses. And if the recipient wanted to assemble their new-found funds into fewer addresses, they would have to do so through transactions between those addresses -- transactions that would also be recorded.
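As a toy illustration of where that watching might start, here's a sketch that sums a transaction's outputs and flags conspicuously large ones. It assumes transaction JSON shaped like the records that block chain explorer sites expose, with output values in satoshis; the threshold is an arbitrary choice for the example:

```python
SATOSHIS_PER_BTC = 100000000  # 1 BTC = 100,000,000 satoshis

def total_output_btc(tx):
    # Sum the transaction's outputs, converting from satoshis to BTC.
    return sum(out['value'] for out in tx.get('out', [])) / float(SATOSHIS_PER_BTC)

def is_conspicuous(tx, threshold_btc=1000):
    # A single transaction moving a big chunk of the 80,500 BTC ransom would
    # stand out; a careful payer would split the payment into many smaller
    # transactions across many addresses to stay below thresholds like this.
    return total_output_btc(tx) >= threshold_btc

sample_tx = {'hash': 'abc123', 'out': [{'value': 500000000000}]}  # 5,000 BTC
print(total_output_btc(sample_tx), is_conspicuous(sample_tx))
```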
Therefore, if this ransom has been paid or is eventually paid, the transactions would be recorded -- they're in plain sight for all the world to see. It would require a lot of high-tech detective work to find them, but if a payoff happened, it would have to be there.
Do we think it's worth investigating this? No. The odds of the Romney campaign paying a bagman a million bucks in bitcoins seem only slightly better than the Secretary of State secretly being a reptilian alien. But it's a fun exercise to think about.
Join Sunlight, NPR and the Washington Post for a Hackathon!
It feels like a little while since this community got together for a hackathon, doesn't it? We had a great time at the VIP event around Transparency Camp. But with the election looming and a full summer's worth of new technologies, APIs and data releases, the moment seems ripe for politically-minded devs to get together and create some cool stuff.
So we're delighted to be a part of the upcoming Election Hackathon, cosponsored by Sunlight, NPR and the Washington Post. It's happening October 6th and 7th at the Post's downtown DC headquarters, and it's going to be a good one: we've lined up $5000 in prizes, a bunch of newly-released APIs, and a judging panel that includes Ezra Klein, Brian Boyer, Rob "CmdrTaco" Malda and our own Ellen Miller. Most importantly, it promises to be a great opportunity to meet other like-minded geeks.
You can find all the details here. Start brainstorming now -- we'll see you in October!
From Sea to Shining Sea: us for Python
I am an extremely lazy person. I started on a new project recently that required me to delve into state and census tract data. The thought of the effort involved in locating and copy-and-pasting a dict mapping US state abbreviations to FIPS codes was so overwhelming that I just wanted to go take a nap instead. And once I got the FIPS code dict, I'd have to use it to generate URLs for state shapefile downloads. Ugh!
So instead of (yet again) copying a dict from some other source, I decided to do something more permanent. us, the result of my laziness, is a Python package that contains all sorts of state metadata behind an easy-to-use API.
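Here's roughly what that looks like in practice (a quick sketch; check the package's README for the full interface):

```python
import us  # pip install us

# Look up a state by name or postal abbreviation.
md = us.states.lookup('MD')
print(md.name, md.fips)  # Maryland 24

# The dict I used to copy and paste by hand: abbreviation -> FIPS code.
abbr_to_fips = us.states.mapping('abbr', 'fips')
print(abbr_to_fips['CA'])  # 06
```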
A New Face For Open States
Wonder what the Open States team has been working on since we finished our initial goal of providing information for all 50 states back in March? As promised, we've been focusing on a new OpenStates.org and expanding our API to support full-text search, and we're finally ready to show you the results.
If you head over to OpenStates.org now, you'll see that we've released a beta version of our site, currently available for 20 states. The remaining states are on their way later this year, but we wanted to make sure we took our time and did things right. As you explore the site you'll see all of the information we've been making available via our API. You'll also notice some enhancements made in the last few months, like full-text search and enhanced support for legislator photos and contact addresses.
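If you'd rather try the full-text search from code, a call looks something like this (a minimal sketch using the requests library; the v1 endpoint layout and parameter names follow the API docs of the time, so double-check them against the current documentation):

```python
import requests  # pip install requests

API_KEY = 'your-sunlight-api-key'

# Full-text search for Texas bills mentioning "transparency".
response = requests.get(
    'http://openstates.org/api/v1/bills/',
    params={'q': 'transparency', 'state': 'tx', 'apikey': API_KEY},
)
for bill in response.json():
    print(bill['bill_id'], bill['title'])
```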
Sunlight and Open Source
David Eaves has a thoughtful post over at TechPresident talking about open source and the transparency community's commitment to it -- a commitment that David sees as half-hearted. Sunlight's mentioned in the post, and the MySociety initiative that prompted the post is something that our team has been thinking about a lot. I think there's something to David's criticisms. But he's missing a few important things.
But let's get the baseline stuff out of the way first. Sunlight loves open source. Our whole stack is built on it, from the Varnish cache your browser connects to, to the Django/Rails/Flask/Sinatra/whatever app behind it, to the Postgres/Mongo/Redis/Solr/elasticsearch datastores that power it, to the OpenOffice suite that edits the grant application that paid for it all. All of our code is up on GitHub, and we welcome and celebrate contributions from the community.
But, Kindle contest aside, the above examples are mostly about us benefiting from open source. What have we done for the movement lately? This is the crux of David's critique:
So far, it appears that the spirit of re-use among the big players, like MySociety and the Sunlight Foundation, only goes so deep. Indeed often it seems they are limited to believing others should re-use their code. There are few examples where the bigger players dedicate resources to support other people's components. Again, it is fine if this is all about creating competing platforms and competing to get players in smaller jurisdictions who cannot finance creating whole websites on their own to adopt it. But if this is about reducing duplication then I'll expect to see some of the big players throw resources behind components they see built elsewhere. So far it isn't clear to me that we are truly moving to a world of "small pieces loosely joined" instead of a world of "our pieces, loosely joined."
I think David's missing a few important examples. For one thing, Sunlight's been adopting and investing in other organizations' code for a while now. PPF's OpenCongress has long been a Sunlight grantee, of course, and their code is entirely open source, including specific components like Formageddon that we commissioned. It's been more than a year since we began providing support for the Media Standards Trust to open-source and continue to develop SuperFastMatch; that's a partnership we think has tremendous potential to benefit both us and others, and you can expect to see some additional collaborations announced soon. Politwoops is a recent example of Sunlight adopting, extending and then launching a project started by another NGO -- the Open State Foundation, in this case (we're in the process of working with them to open-source the code).
But this is at the level of fairly specific partnerships with other transparency NGOs. The fact is that the more specific a project's use case, the harder it is to generalize its adoption. The more fundamental and abstract a tool is, the easier it is to adopt it and contribute back to it. It's no coincidence that we have people on our team who have patches in the Linux kernel but none who have patches in FixMyStreet. We see plenty of people use our Django apps and middlewares, but (so far) no successful redeployments of Influence Explorer. We've contributed a number of patches to the Boundary Service project that David mentions, but none to Ushahidi. Heck, back in my fixed-width font days, even I managed to get a minor patch into PySolr.
It simply gets harder to collaborate when you move to a less-abstract level of software. Requirements become more specific, and there cease to be good, general approaches to tackling problems. I saw this first-hand when I threw together the Elena's Inbox project. That effort generated a lot of excitement from other folks who had access to email archives, and we were glad to speak to all of them. I was eager to offer advice, answer questions and generally do some hand-holding, but I found myself wishing I had better news for the people who got in touch with me. Because unfortunately the reusable part of the site isn't all that valuable -- it's just some ugly templates and a basic Django app that provides endpoints for search and starring of emails (though we do have some much less ugly templates waiting for the next time we do a similar project).
The real work and value-creation comes in the weekend following the government's Friday afternoon email document dump, when you need a programmer to lose sleep writing endless regular expressions that parse the idiosyncratic formatting of what's likely to be a badly-OCRed pile of text, then apply algorithmic approaches -- usually specific to the particular document set -- to stitch individual emails back together into threads. Come Monday morning, you'll be facing a huge, all-hands-on-deck manual review process as your staff tries to collapse duplicate entities down to single individuals (a process that can be aided by some string-similarity techniques, but which inevitably involves a lot of judgment calls and contextual knowledge).
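As a toy illustration of that kind of string-similarity aid (the names here are invented for the example), even Python's standard-library difflib can surface candidate duplicates for the human reviewers to judge:

```python
from difflib import SequenceMatcher
from itertools import combinations

names = ['Elena Kagan', 'Kagan, Elena', 'E. Kagan', 'Eric Holder']

def similarity(a, b):
    def norm(name):
        # Normalize "Last, First" ordering before comparing.
        if ',' in name:
            last, first = [part.strip() for part in name.split(',', 1)]
            name = '%s %s' % (first, last)
        return name.lower()
    return SequenceMatcher(None, norm(a), norm(b)).ratio()

# Surface likely duplicates; a human still makes the final call.
for a, b in combinations(names, 2):
    score = similarity(a, b)
    if score > 0.7:
        print('%.2f  %s  <->  %s' % (score, a, b))
```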
Setting up an EI-style site is unfortunately never going to be a clean, easily repeatable process; not until government starts releasing MDBs or exposing IMAP endpoints (something we have yet to see, as far as I know). And this is fairly typical of work in our space: a lot of it needs to be purpose-built because of the quirks of government and the datasets it produces.
The good news is that although our movement is still quite young, we've already learned some lessons. I think MySociety's components strategy reflects this: they're moving down a layer of abstraction -- cautiously and after much consideration -- and tackling a slightly-more-specific task than a typical NOSQL or GIS project; a task that's still abstract enough to be reusable, but which is targeted enough to be particularly relevant to transparency organizations. It's something that we think is worth pursuing, and that we're eager to help make a success. It probably won't make sense to spend time replacing Sunlight's too-specific-to-be-reusable but perfectly-useful-for-us entity store with PopIt in the near term. But those organizations that come to this space after us should be able to benefit from the lessons learned by MySociety, Sunlight and others. It's the same reason why Open States has been refactored twice: it takes time and experience to figure out what parts of a problem can be abstracted and made reusable.
There's no question that we can do better. We're looking at which projects have the most potential for reuse, and -- where appropriate -- we're planning to clean up their docs, add easy Heroku deployment support, roll some AMIs, and support some up-and-coming general source data formats. We'll also be taking a hard look at how our APIs are organized: we can make our data more easily reusable, too.
But specificity is often the enemy of reusability, and we think some of the most interesting opportunities tend to involve very specific problems. It's a real tension, but one that we're committed to continuing to work to address.
UPDATE: MySociety's Tom Steinberg has also posted a response to David, in which he explains the rationale behind MySociety's components strategy in considerably more detail.