Most states keep their legal code in the open, but a few hold-outs have asserted a copyright over that material...
The STOCK Act and Security through Obscurity
Congress has been delaying implementation of the STOCK Act, largely out of fear over what could happen if disclosures go online. A new report from the National Academy of Public Administration says those fears are well-founded. But its reasoning is flawed, and its recommendations -- which amount to security through obscurity -- are badly wrong-headed. If there are problems with the disclosures mandated by STOCK, let's fix them. Ignoring them and hoping that obscurity will prevent bad things from happening is not only short-sighted, it's dangerous.
DOJ’s FOIA Metadata Strategy Makes Sense
The Department of Justice deserves some applause for its plan to improve public access to FOIA materials. This has been in...
This Is Why Government Should Use Open Formats
James Fee brings news of a dismaying decision by an Ohio court. A real estate appraiser named Robert Gambill tried...
Congress Should Fix the CFAA
Like so many others, we at Sunlight are terribly saddened by Aaron Swartz’s death. Our longtime friend and adviser Micah...
Join Our Holiday Hangout!
The year is winding down; metrics and reviews are mostly behind us, and as staff members disappear to year-end vacation...
Election Night Snapshots
Late last week, we had an idea: election night was sure to be a confusing rush, and the closeness of the race in many states made result-reporting snafus seem possible. And the basic shape of election night data is deeply lousy (outside of media outlets that subscribe to the AP). Why not try to keep a record of what election authorities disclosed, so we could have a closer look in the morning?
Drew and Kaitlin adapted some of Politwoops' code, and we quickly researched as many URLs for election results as we could. The results are necessarily incomplete: some states don't aggregate their results in a central place, and some only went online during election night. But we managed a pretty good start.
We also decided to throw in a few media outlets, just for fun (and then a few more once the results began to come in and it became clear which sites might have to back off their predictions in a maximally screenshottable way). The results include screenshots and HTML snapshots. Everything's timestamped. How often we took snapshots depended on two things: when pages changed (we only recorded a new snapshot when something had been updated) and the round-robin cycle of the system (which was somewhat variable, depending on the speed of the screenshot process).
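The change-only snapshotting described above can be sketched in a few lines of Python. This is a minimal sketch, not the Politwoops-derived code we actually ran: it assumes a caller-supplied `fetch` function (hypothetical) that returns a page's HTML, compares a SHA-256 hash of each page with the hash of the last stored copy, and keeps a timestamped copy only when the content changed. Screenshots are left out.

```python
import hashlib
from datetime import datetime, timezone
from typing import Callable, Dict, Iterable, List, Tuple

# A stored snapshot: (url, UTC timestamp, raw HTML).
Snapshot = Tuple[str, datetime, str]


def content_hash(html: str) -> str:
    """Fingerprint a page so unchanged content can be skipped."""
    return hashlib.sha256(html.encode("utf-8")).hexdigest()


def snapshot_round(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    last_seen: Dict[str, str],
) -> List[Snapshot]:
    """Visit each URL once, in order, and keep only pages that changed.

    `fetch` is a hypothetical caller-supplied function returning a page's
    HTML. `last_seen` maps URL -> hash of the last stored snapshot and is
    updated in place, so it carries state from one round to the next.
    """
    new_snapshots: List[Snapshot] = []
    for url in urls:
        html = fetch(url)
        digest = content_hash(html)
        if last_seen.get(url) == digest:
            continue  # nothing updated since the last stored snapshot
        last_seen[url] = digest
        new_snapshots.append((url, datetime.now(timezone.utc), html))
    return new_snapshots
```

Calling `snapshot_round` repeatedly in a loop gives the round-robin behavior: each pass visits every URL once, so the gap between two snapshots of the same page depends on how long a full pass takes. One caveat of hashing raw HTML is that a page embedding a clock or rotating ad will look "changed" on every pass; in practice you might hash only the region that holds the results.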
We haven't had time to go through all of this data, but we'd love your help (or you may just want to satisfy your curiosity). If so, head over to electionshots.sunlightlabs.com. The content is organized by state; the media outlets are filed under ZZ. We're working to put together bulk download options now.
A Report from the Election Hackathon
A bunch of us from the labs spent the weekend a few blocks away from Sunlight HQ, hacking away with...
Join Sunlight, NPR and the Washington Post for a Hackathon!
It feels like a little while since this community got together for a hackathon, doesn't it? We had a great time at the VIP event around Transparency Camp. But with the election looming and a full summer's worth of new technologies, APIs and data releases, the moment seems ripe for politically-minded devs to get together and create some cool stuff.
So we're delighted to be a part of the upcoming Election Hackathon, cosponsored by Sunlight, NPR and the Washington Post. It's happening October 6th and 7th at the Post's downtown DC headquarters, and it's going to be a good one: we've lined up $5000 in prizes, a bunch of newly-released APIs, and a judging panel that includes Ezra Klein, Brian Boyer, Rob "CmdrTaco" Malda and our own Ellen Miller. Most importantly, it promises to be a great opportunity to meet other like-minded geeks.
You can find all the details here. Start brainstorming now -- we'll see you in October!
Sunlight and Open Source
David Eaves has a thoughtful post over at TechPresident talking about open source and the transparency community's commitment to it -- a commitment that David sees as half-hearted. Sunlight's mentioned in the post, and the MySociety initiative that prompted the post is something that our team has been thinking about a lot. I think there's something to David's criticisms. But he's missing a few important things.
But let's get the baseline stuff out of the way first. Sunlight loves open source. Our whole stack is built on it, from the Varnish cache your browser connects to, to the Django/Rails/Flask/Sinatra/whatever app behind it, to the Postgres/Mongo/Redis/Solr/elasticsearch datastores that power it, to the OpenOffice suite that edits the grant application that paid for it all. All of our code is up on GitHub, and we welcome and celebrate contributions from the community.
But, Kindle contest aside, the above examples are mostly about us benefiting from open source. What have we done for the movement lately? This is the crux of David's critique:
So far, it appears that the spirit of re-use among the big players, like MySociety and the Sunlight Foundation, only goes so deep. Indeed often it seems they are limited to believing others should re-use their code. There are few examples where the bigger players dedicate resources to support other people's components. Again, it is fine if this is all about creating competing platforms and competing to get players in smaller jurisdictions who cannot finance creating whole websites on their own to adopt it. But if this is about reducing duplication then I'll expect to see some of the big players throw resources behind components they see built elsewhere. So far it isn't clear to me that we are truly moving to a world of "small pieces loosely joined" instead of a world of "our pieces, loosely joined."
I think David's missing a few important examples. For one thing, Sunlight's been adopting and investing in other organizations' code for a while now. PPF's OpenCongress has long been a Sunlight grantee, of course, and their code is entirely open source, including specific components like Formageddon that we commissioned. It's been more than a year since we began providing support for the Media Standards Trust to open-source and continue to develop SuperFastMatch; that's a partnership we think has tremendous potential to benefit both us and others, and you can expect to see some additional collaborations announced soon. Politwoops is a recent example of Sunlight adopting, extending and then launching a project started by another NGO -- the Open State Foundation, in this case (we're in the process of working with them to open-source the code).
But this is at the level of fairly specific partnerships with other transparency NGOs. The fact is that the more specific a project's use case, the harder it is to generalize its adoption. The more fundamental and abstract a tool is, the easier it is to adopt it and contribute back to it. It's no coincidence that we have people on our team who have patches in the Linux kernel but none who have patches in FixMyStreet. We see plenty of people use our Django apps and middlewares, but (so far) no successful redeployments of Influence Explorer. We've contributed a number of patches to the Boundary Service project that David mentions, but none to Ushahidi. Heck, back in my fixed-width font days, even I managed to get a minor patch into PySolr.
It simply gets harder to collaborate when you move to a less-abstract level of software. Requirements become more specific, and there cease to be good, general approaches to tackling problems. I saw this first-hand when I threw together the Elena's Inbox project. That effort generated a lot of excitement from other folks who had access to email archives, and we were glad to speak to all of them. I was eager to offer advice, answer questions and generally do some hand-holding, but I found myself wishing I had better news for the people who got in touch with me. Unfortunately, the reusable part of the site isn't all that valuable -- it's just some ugly templates and a basic Django app that provides endpoints for searching and starring emails (though we do have some much less ugly templates waiting for the next time we do a similar project). The real work and value creation come in the weekend following the government's Friday afternoon email document dump, when you need a programmer to lose sleep writing endless regular expressions that parse the idiosyncratic formatting of what's likely to be a badly-OCRed pile of text, then apply algorithmic approaches -- usually specific to the particular document set -- to stitch individual emails back together into threads. Come Monday morning, you'll be facing a huge, all-hands-on-deck manual review process as your staff tries to collapse duplicate entities down to single individuals (a process that can be aided by some string-similarity techniques, but which inevitably involves a lot of judgment calls and contextual knowledge).
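The string-similarity step mentioned above can be sketched with Python's standard `difflib`. This is a minimal sketch under assumed parameters: the 0.85 threshold, the normalization rules and the sample names are illustrative choices, not what Elena's Inbox used. It only proposes candidate groups of near-duplicate names to speed up the manual review; the judgment calls still belong to people.

```python
import re
from difflib import SequenceMatcher
from typing import List


def normalize(name: str) -> str:
    """Lowercase, turn punctuation into spaces, and collapse whitespace."""
    name = re.sub(r"[^\w\s]", " ", name.lower())
    return " ".join(name.split())


def similarity(a: str, b: str) -> float:
    """Ratio in [0, 1]; 1.0 means the normalized strings are identical."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def candidate_groups(names: List[str], threshold: float = 0.85) -> List[List[str]]:
    """Greedily group names whose similarity to a group's first member
    meets `threshold` (an assumed cutoff).

    The result is a proposal for human reviewers, not a final merge:
    nicknames, initials and shared surnames still need contextual knowledge.
    """
    groups: List[List[str]] = []
    for name in names:
        for group in groups:
            if similarity(name, group[0]) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])  # no close match: start a new group
    return groups
```

For example, `candidate_groups(["Smith, Jane", "SMITH,  JANE", "Smith, Jame", "Jones, Bob"])` puts the three "Smith" spellings (including an OCR-style typo) in one group and "Jones, Bob" in its own. The greedy pass is order-dependent and compares each name only against a group's first member, which keeps it simple but means reviewers should still scan neighboring groups.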
Setting up an Elena's Inbox-style site is unfortunately never going to be a clean, easily repeatable process, at least not until government starts releasing MDBs or exposing IMAP endpoints (something we have yet to see, as far as I know). And this is fairly typical of work in our space: a lot of it needs to be purpose-built because of the quirks of government and the datasets it produces.
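When mail is available in a native format (served over IMAP or exported as mailbox files) rather than as OCRed text, much of the threading work reduces to reading standard headers. A minimal sketch with Python's standard `email` package, assuming every message carries a `Message-ID` and each reply carries a single-ID `In-Reply-To` header; the helper names and sample messages are hypothetical.

```python
from email import message_from_string
from email.message import Message
from typing import Dict, List, Optional


def parse_messages(raw_messages: List[str]) -> List[Message]:
    """Parse raw RFC 5322 text with the standard library; no regexes needed."""
    return [message_from_string(raw) for raw in raw_messages]


def thread_root(msg_id: str, parent_of: Dict[str, Optional[str]]) -> str:
    """Follow In-Reply-To links upward until reaching a message with no
    known parent. `seen` guards against malformed reply cycles."""
    seen = set()
    while parent_of.get(msg_id) and msg_id not in seen:
        seen.add(msg_id)
        msg_id = parent_of[msg_id]
    return msg_id


def group_threads(messages: List[Message]) -> Dict[str, List[str]]:
    """Map each thread's root Message-ID to the subjects of its messages."""
    parent_of: Dict[str, Optional[str]] = {}
    for msg in messages:
        parent_of[msg["Message-ID"]] = msg["In-Reply-To"]  # None if absent
    threads: Dict[str, List[str]] = {}
    for msg in messages:
        root = thread_root(msg["Message-ID"], parent_of)
        threads.setdefault(root, []).append(msg["Subject"])
    return threads
```

Real archives are messier (multiple IDs in `In-Reply-To`, a `References` header, missing parents), so this is a starting point rather than a complete threader. Still, it shows why structured releases would make sites like this far more repeatable than the regex-and-OCR weekend described above.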
The good news is that although our movement is still quite young, we've already learned some lessons. I think MySociety's components strategy reflects this: they're moving down a layer of abstraction -- cautiously and after much consideration -- and tackling a slightly more specific task than a typical NoSQL or GIS project, one that's still abstract enough to be reusable but targeted enough to be particularly relevant to transparency organizations. It's something that we think is worth pursuing, and that we're eager to help make a success. It probably won't make sense to spend time replacing Sunlight's too-specific-to-be-reusable but perfectly-useful-for-us entity store with PopIt in the near term. But organizations that come to this space after us should be able to benefit from the lessons learned by MySociety, Sunlight and others. It's the same reason Open States has been refactored twice: it takes time and experience to figure out what parts of a problem can be abstracted and made reusable.
There's no question that we can do better. We're looking at which projects have the most potential for reuse, and -- where appropriate -- we're planning to clean up their docs, add easy Heroku deployment support, roll some AMIs, and support some up-and-coming general source data formats. We'll also be taking a hard look at how our APIs are organized: we can make our data more easily reusable, too.
But specificity is often the enemy of reusability, and we think some of the most interesting opportunities tend to involve very specific problems. It's a real tension, but one that we're committed to continuing to work to address.
UPDATE: MySociety's Tom Steinberg has also posted a response to David, in which he explains the rationale behind MySociety's components strategy in considerably more detail.