Bulk Data at the House Legislative Data Conference

Many of us from Sunlight have been at the House’s legislative data conference today, as Daniel has noted on the blog. The conference organizers have done a fantastic job: the day has been like an all-day committee hearing, where the House’s tech officials are the witnesses and the public gets to ask the questions. This is exactly the sort of good-faith attempt to take responsibility for data policy that we wrote about in 2007 with the Open House Project report. It’s extraordinary for the leading providers of third-party legislative information systems to sit as peers among the administrators, staff, and politicians responsible for how the House shares its work with the public. If that praise seems effusive, it should be; the House is setting an example for how to work with NGOs on data availability.

That’s not to say everything we’re hearing is good news.

The morning’s last panel featured the leaders of the offices responsible for most legislative data processes, such as the Office of Law Revision Counsel, the Law Library of Congress, and the Government Printing Office. We saw valuable new projects: mobile sites, web redesigns, and incremental improvements in data publication. All are worthy efforts that show the legislative support bureaucracy adapting to new expectations for online information.

In cultivating these projects, though, these offices are also choosing to ignore another responsibility: their role in providing the data about Congress that enables third-party web publishers (like Sunlight) to do their jobs. The officials were asked (by a number of us from Sunlight) why they still haven’t begun publishing bulk legislative data, and their answers were telling: it’s not a priority, and they’re more concerned about accuracy.

These answers were a bit of a surprise for me, given that Sunlight has been asking persistently for bulk legislative data since 2007. These agencies have seen letters from Members and leadership, appropriations language requiring a report on feasibility, a bill proposing to force the issue, public criticism, and steadfast activism from our colleagues like Josh Tauberer (of Popvox and GovTrack) and David Moore (of OpenCongress). Even with all that attention, we’ve been met with a shrug.

The people responsible for publishing this information should get a little more familiar with the third-party publishers who are reusing and re-presenting congressional information. Right now, people are researching legislation and the records of their representatives using both official sites (like THOMAS) and third-party sites like OpenCongress.org, GovTrack.us, Popvox.com, WashingtonWatch, or Congress. Third-party sites aren’t going away; they’re essential to activists and analysts who rely on access to information that official congressional sites will never provide. Official and third-party sites should be capable of coexisting amicably, reinforcing each other’s role and mission.

By declining to provide bulk access to legislative data, support agencies are actually ensuring that third-party sites will continue to rely on a brittle, complex system of scraping and parsing, where legislative data lags behind the official version and errors from official sources keep spreading even after they’re corrected. Whatever concerns the LOC has about reliable data, the publishing system they’re relying on now is probably worse. By withholding bulk data, they’re creating the liabilities they warn against: the public relying on slightly less reliable data.

Part of Congress’s job should be to empower third-party developers, who are a permanent part of the infrastructure that brings legislative data to a huge slice of the public. By ignoring the public’s and Congress’s calls for bulk legislative data, administrators are ignoring part of what it means to be a responsible steward of public data. That definition has changed, and this morning showed that we’ve got a lot of work left to do to demonstrate that bulk data does in fact fall squarely within those responsibilities.

Update: See our wiki page for more resources regarding how to improve THOMAS.