Cabinet agencies (and others) released their Open Government Plans last week with much fanfare, mixed reviews, and many promises for the future. I want to focus on one initiative -- the Department of Labor's "Online Enforcement Database" -- to highlight the strengths and weaknesses of what we've seen, and to suggest some guidelines for going forward.
Online Database Strengths and Weaknesses
With the explosion at a mine in West Virginia last week, many questions are being asked about federal safety inspections. My colleague Anu Narayanswamy wrote on Monday, before the Online Enforcement Database was released, that the way the federal government releases data on mine safety makes it impossible to see how safety violations at one mine stack up against others. You cannot tell if the 500 safety violations in 2009 at this particular mine, for example, are typical for this industry.
On Wednesday, the Labor Department released the Online Enforcement Database, which contains five major data sets, including one on mine safety. Anu's follow-up article on Friday explained that "with mine safety data, released for the first time in bulk [on Wednesday], users can search for mine inspection data by state or even zip code." But she also reported that the data sets are only partially downloadable, and do not include "the kinds of violation and penalties levied on mines across the country." In other words, it's still difficult to figure out what's going on.
It is the search results, and not the underlying database, that are downloadable in bulk. ("Bulk" access means that you can download all of the information at once, and not piecemeal.) The only way to get at the Enforcement Database's information is to use its search tool, which has very limited capabilities. Users may search only by state, agency, zip code, and industry code. (DOL deserves credit for including the industry codes in a link from the search page.) So a user cannot narrow the search to a county, a congressional district, or the owner of a facility. Compare this to the search tool at transparencydata.com, a new initiative from Sunlight built on a database of campaign contributions, which lets users search, sort, and download the data in a multiplicity of ways.
As mentioned before, the Online Enforcement Database itself is not available for download in bulk. There's no way to look at all of the information the Labor Department has painstakingly gathered. And despite the wealth of information, a clunky search tool adds to the frustration. Without access to the supporting data, researchers cannot answer many questions. In fairness, the Labor Department says that bulk access and improved search tools are "coming soon," but it would be very helpful to have a date to accompany this promise. Doing so would make the promise concrete and testable.
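To make concrete what bulk access would allow, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the column names (`mine_id`, `owner`, `county`, `violations_2009`) and every value in the sample are invented, and nothing here reflects the actual layout of the Labor Department's data. It shows two questions the current search tool cannot answer: how one mine's violation count compares with the rest, and which mines sit in a given county.

```python
import csv
import io

# HYPOTHETICAL bulk export. Column names and all values are invented
# for illustration; this is not the Labor Department's real format.
SAMPLE_BULK_CSV = """mine_id,owner,county,violations_2009
A,Owner One,County X,500
B,Owner One,County Y,120
C,Owner Two,County X,80
D,Owner Three,County Z,310
E,Owner Two,County Y,45
"""


def load_records(text):
    """Parse the CSV text into a list of dicts with integer violation counts."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        row["violations_2009"] = int(row["violations_2009"])
    return rows


def share_with_fewer_violations(records, mine_id):
    """Return the fraction of *other* mines with fewer violations than mine_id.

    Returns None when there are no other mines to compare against.
    """
    target = next(r for r in records if r["mine_id"] == mine_id)
    others = [r for r in records if r["mine_id"] != mine_id]
    if not others:
        return None
    fewer = sum(
        1 for r in others if r["violations_2009"] < target["violations_2009"]
    )
    return fewer / len(others)


def filter_by(records, field, value):
    """Narrow the records by any column, e.g. county or owner."""
    return [r for r in records if r[field] == value]


records = load_records(SAMPLE_BULK_CSV)
print(share_with_fewer_violations(records, "A"))
print([r["mine_id"] for r in filter_by(records, "county", "County Y")])
```

The point is not the arithmetic, which is trivial, but that none of it is possible until the full data set can be downloaded at once.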
I do not mean to pick on the Department of Labor, which made an effort in its Open Government Plan [PDF] to identify datasets for online publication and to set deadlines. Indeed, the department stated that it plans to take all the data it collects and make it publicly available online and in downloadable formats, with appropriate caveats. Many agencies fell far short of DOL's achievements. But DOL should go further.
Open Data Principles
Elsewhere I've pulled together resources (from Princeton and Sunlight Labs) on building good data sets, including draft guidelines for government data catalogs. It's important to focus, however, on the fundamental question of what we mean when we talk about how government should publish data online, a.k.a. "open data principles." As an attorney, I'm hardly qualified to talk about this, so I am fortunate that much of the heavy lifting was done at a conference in 2007. Afterward, my colleagues Clay and John and I worked on revising the open data principles, now nine in number, and fleshed out a rough evaluation of when they are satisfied.
When agencies think about how to make information available, they should look to these (draft) principles. They state, in short, that data should be: complete, primary, timely, accessible, machine processable, non-discriminatory, non-proprietary, license-free, and permanent. Recourse to these principles by the agencies -- and a better effort to comply with the directive's requirement to identify all high-value data sets and set deadlines for online publication -- would have turned the thus-far mixed results of the Open Government Directive into an unqualified success. There is still time to make that promise a reality.
Here are the 9 open data principles in a framework to evaluate the extent to which they are satisfied:
(If you're wondering what ^M means, it's old school geek for delete - the 8 principles of open data have now become 9.)
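For readers who think in code, the nine principles can also be treated as a simple checklist. The following is a minimal sketch in Python, not the evaluation framework itself: the function name `evaluate`, the True/False scoring, and the sample assessment are all assumptions made for illustration, and the sample describes no real data set.

```python
# The nine principle names come from the list in this post.
PRINCIPLES = (
    "complete",
    "primary",
    "timely",
    "accessible",
    "machine processable",
    "non-discriminatory",
    "non-proprietary",
    "license-free",
    "permanent",
)


def evaluate(assessment):
    """Sort the nine principles into satisfied, unmet, and unassessed.

    `assessment` maps a principle name to True (satisfied) or False (unmet).
    Principles left out of the mapping are reported as unassessed.
    This pass/fail scoring is a simplification chosen for the sketch.
    """
    unknown = set(assessment) - set(PRINCIPLES)
    if unknown:
        raise ValueError(f"not one of the nine principles: {sorted(unknown)}")
    return {
        "satisfied": [p for p in PRINCIPLES if assessment.get(p) is True],
        "unmet": [p for p in PRINCIPLES if assessment.get(p) is False],
        "unassessed": [p for p in PRINCIPLES if p not in assessment],
    }


# HYPOTHETICAL assessment of an imaginary data set, for illustration only.
sample = {
    "complete": False,
    "primary": True,
    "timely": True,
    "machine processable": False,
    "license-free": True,
}

result = evaluate(sample)
print(result["satisfied"])
print(result["unmet"])
print(result["unassessed"])
```

Keeping "unassessed" separate from "unmet" matters: an agency that has not yet looked at a principle is in a different position from one that has looked and fallen short.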