The White House's new Open Data Policy has received many accolades, but its ability to be sustained long term will depend on support from the legislative branch. Fortunately, Congress has been working on these issues for the last several years.
Today, the White House is issuing a new Executive Order on Open Data -- one that is significantly different from the open data policies that have come before it -- reflecting Sunlight's persistent call for stronger public listings of agency data, and demonstrating a new path forward for governments committing to open data.
This Executive Order and the new policies that accompany it cover a lot of ground, building public reporting systems, adding new goals, creating new avenues for public participation, and laying out new principles for openness, much of which can be found in Sunlight's extensive Open Data Policy Guidelines, and the work of our friends and allies.
Most importantly, though, the new policies take on one of the most important, trickiest questions that these policies face -- how can we reset the default to openness when there is so much data? How can we take on managing and releasing all the government's data, or as much as possible, without negotiating over every dataset the government has?
Open data policies can come in different shapes, sizes, and strengths. The most common and idealized form aims to mandate or direct energy toward open data specifically (reflected in the recent wave of municipal referendums). Another takes the focus off of open data, and instead tucks related provisions into policies for other issue areas (a neat example is this (now tabled) Virginia education bill, introduced in January).
The open data legislation passed yesterday by Utah reflects a third form: the mandated plan.
We’ve seen this model before, most recently in Montgomery County, MD. In essence, this sort of legislation directs a particular agency (or, in Utah’s case, overhauls a snoozing Transparency Advisory Board) to study and make recommendations for online, best practice data disclosure.
Although it’s easy to think of these policies as a punt, this sort of reallocation of attention, time, and expertise can actually be a move to stabilize and ensure thoughtful implementation and real enforcement of an open data agenda -- so long as it’s executed well, actually moves from planning to action, and operates start to finish within the public’s eye.
Utah’s Board will be one to watch, with a unique combination of state agency actors, legislators, archivists, technologists, county and municipal reps, and two members of the public. It’s a team that hints at greater ambitions for Utah’s approach to future online publication of data, one that seems to be looking, at least tentatively, outside the State House and towards Utah’s local governments. But we won’t know for sure until the board turns around its first series of recommendations, due by November 30, 2013.
In the course of writing scrapers for all 50 state legislatures, our Open States team and volunteers spent a lot of time looking at state legislative websites and struggling with the often inadequate information made available. Impossibly difficult-to-navigate sites, missing information, and gnarly PDFs of tabular data have become daily occurrences for those of us working on Open States. People are always curious to know how their state stacks up compared to others -- in fact, one of the most frequent questions we have been asked is “so which state was the worst?” That question got us thinking: How could we derive a measure of how “open” a state’s legislative data was?
After some consideration, we came up with six criteria on which each state could be evaluated, based on six of the Ten Principles for Opening Up Government Information: completeness, timeliness, ease of access, machine readability, use of commonly owned standards and permanence. We omitted four of the original ten criteria (primacy, non-discrimination, licensing and usage costs) that tended not to present serious differences between states.
Evaluating each state on each criterion was a large task, and with community support we ensured that each state was evaluated by multiple people. After the evaluation was complete, we converted the qualitative data on how a state performed into numeric scores (specific scoring details are available on the report card itself). After summing these scores, each state was also assigned a letter grade according to where it fell among its peers. A state with a net score below negative one was given an F, and a negative one or zero became a D. With the average total score among states being 1.5, we gave states with a net score of one or two a C; a three became a B, and four and above became an A.
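The grading thresholds described above can be sketched as a simple mapping. This is our illustration only (the function name is invented, and the report card itself holds the authoritative scoring details):

```python
def letter_grade(net_score: int) -> str:
    """Map a state's net numeric score to a letter grade,
    using the thresholds described in the post."""
    if net_score < -1:
        return "F"
    if net_score <= 0:   # -1 or 0
        return "D"
    if net_score <= 2:   # 1 or 2
        return "C"
    if net_score == 3:
        return "B"
    return "A"           # 4 and above

print([letter_grade(s) for s in (-2, -1, 0, 1, 2, 3, 4)])
# → ['F', 'D', 'D', 'C', 'C', 'B', 'A']
```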
The final breakdown was 8 As, 11 Bs, 20 Cs, 6 Ds, and 6 Fs. If you’re interested in how your state did compared to others you can check out all the details on the Open Legislative Data Report Card.
We know first-hand from our ongoing dialogue with state legislatures and open government technologists that identifying these commonplace problems can go a long way toward addressing them. In that spirit, and in the spirit of Sunshine Week, we offer this report card and recommendations today.
This afternoon, the Montgomery County Council voted unanimously in favor of Bill 23-12, the Montgomery County Open Data Act of 2012. Through this law, the county plans to release new data sets, develop a single web portal linked to the county’s homepage, and, more interestingly, charge the Chief Administrative Officer with creating an implementation plan for defining agency participation and compliance.
The councilmembers behind this bill clearly did their homework, and we’re glad to see that so many best practices outlined in Sunlight's Open Data Policy Guidelines made their way into Montgomery’s approach, particularly when it comes to data licensing and re-use (2-154(f)) and how the open data bill will intersect with relevant existing laws, like Maryland’s Public Information Act (2-159). These measures are thoughtful and equally important additions to the bill's more classic provisions, such as the creation of a county open data portal (2-154), the choice of open technical standards (2-157), and the requirement for new datasets to be released (2-154).
Originally, the bill included more best practices, such as specific guidance for how agencies should prioritize data (leaning towards disclosure in the public interest -- good choice) and required that the County not just release “some” data (as it stands now), but that each County agency release one dataset. These and other provisions were ultimately removed and amended by the County Executive to fall under an Open Data Implementation Plan that would be directed by the Chief Administrative Officer (though ultimately put to the Council for approval).
Effective open data policies are all about balance: Too much aspirational vision with too little practical guidance for funding and implementation and the wheels fall off. The same can be said for policies that swing in the other direction, opting to be overly specific about open formats and technology systems, without consideration for the leaps in technical improvements and analytic needs that will come when the technology of today is outdated (and the companies contracted have long closed their doors and shut down their servers).
These tensions are reflected in the Montgomery Implementation Plan, which is strung between the County’s ambition to be more open and provide greater public access to information, and the County’s need to deliver on its promises. On one hand, the Plan offers an opportunity for stability by empowering the Executive to ensure sufficient funding, staffing, and compliance to follow through on the bill. On the other, if not drafted in public or infused with measures for accountability, the Implementation Plan runs the risk of dampening the ambitious goals of the Open Data Bill to serve the needs of bureaucracy. (One of a few "slippery slope" concerns that also show up in review of the bill’s overly exclusive definition of data (2-153) and overbroad timeline (2-158).)
That’s all to say, while such a tack may not be appropriate in all open data policy contexts, its development in Montgomery County will be something to watch. A binding regulation like this, if properly open to public feedback, could be a long-run antidote to some of the problems developed in “open” legislation and executive orders modeled too closely on the Open Government Directive, a federal plan that inspired a wave of agency plans and very little open data (or accountability, for that matter). Then again, it could just repeat them.
Very few counties have entered into the fray of open data policy-making. We're encouraged to see our neighbors ("MoCo" is located just north of DC) take on this task and look forward to watching its development.
From Moses to James Madison to David Letterman, important ideas come in lists of ten, as do these principles for opening up government information. The list isn’t new: my colleague John Wonderlich wrote about “themes for legislative information publication” in February 2007, and eight open government data principles emerged from a conference organized by internet oracle Carl Malamud and technology publisher Tim O’Reilly in December 2007. However, we have refreshed the principles, expanded upon them, and added details.
The government is increasingly making data available online, partly in response to congressional and presidential leadership and partly from public pressure. The newly released or updated data varies markedly in quality and usefulness; agencies are searching for guidance on how to do better.
These principles are intended to provide a starting point. They are: completeness, primacy, timeliness, ease of physical and electronic access, machine readability, non-discrimination, use of commonly owned standards, licensing, permanence and usage costs. Each one exists along a continuum of openness, and the list writ large is intended as a guidebook, not a rulebook.
We welcome additional ideas and corrections. The document is available here.
 Technically speaking, what we call the "Bill of Rights" was intended to have 12 constitutional amendments, although only 10 were enacted in the 1790s; noted commentator Melvin Kaminsky reports the number of commandments varied over time; and few items from Letterman’s list are actually funny.
 Sunlight provided a grant to the conference.
 More background materials are available here.
Cabinet agencies (and others) released their Open Government Plans last week with much fanfare, mixed reviews, and many promises for the future. I want to focus on one initiative -- the Department of Labor's "Online Enforcement Database" -- to highlight the strengths and weaknesses of what we've seen, and suggest some guidelines for going forward.
Online Database Strengths and Weaknesses
With the explosion at a mine in West Virginia last week, many questions are being asked about federal safety inspections. My colleague Anu Narayanswamy wrote on Monday, before the Online Enforcement Database was released, that the way the federal government releases data on mine safety makes it impossible to see how safety violations at one mine stack up against others. You cannot tell if the 500 safety violations in 2009 at this particular mine, for example, are typical for this industry.
On Wednesday, the Labor Department released the Online Enforcement Database, which contains five major data sets, including one on mine safety. Anu's follow-up article on Friday explained that "with mine safety data, released for the first time in bulk [on Wednesday], users can search for mine inspection data by state or even zip code." But she also reported the data sets are only in a partially downloadable format, and do not include "the kinds of violation and penalties levied on mines across the country." In other words, it's difficult to figure out what's going on.
It is the search results, and not the underlying database, that are downloadable in bulk. ("Bulk" access means that you can download all of the information at once, and not piecemeal.) The only way to get at the Enforcement Database's information is to use its search tool, which has very limited capabilities. Users may search by state, agency, zip code, and industry code. (DOL deserves credit for including the industry codes in a link from the search page.) So a user cannot narrow the search range to a county, a congressional district, or the owner of a facility. Compare this to the search tool at transparencydata.com, a new initiative from Sunlight whose campaign contribution database allows searching, sorting, and downloading in a multiplicity of ways.
As mentioned before, the Online Enforcement Database itself is not available for download in bulk. There's no way to look at all of the information the Labor Department has painstakingly gathered. And despite the wealth of information, a clunky search tool adds to the frustration. Without access to the supporting data, researchers cannot answer many questions. In fairness, the Labor Department says that bulk access and improved search tools are "coming soon," but it would be very helpful to have a date to accompany this promise. Doing so would make the promise concrete and testable.
I do not mean to pick on the Department of Labor, which made an effort in its Open Government Plan [PDF] to identify datasets for online publication and to set deadlines. Indeed, they stated they plan to take all data they collect and make it publicly available online and in downloadable formats, with appropriate caveats. Many agencies fell far short of DOL's achievements. But DOL should go further.
Open Data Principles
Elsewhere I've pulled together resources (from Princeton and Sunlight Labs) on building good data sets, including drafting guidelines for government data catalogs. It's important to focus, however, at the fundamental level on what it means when we talk about how government should publish data online, a.k.a. "open data principles." As an attorney, I'm hardly qualified to talk about this, so I am fortunate that much of the heavy lifting was done at a conference in 2007. Afterward, my colleagues Clay and John and I worked on revising the open data principles, nine in number, and fleshed out a rough evaluation of when they are satisfied.
When agencies think about how to make information available, they should look to these (draft) principles. They state, in short, that data should be: complete, primary, timely, accessible, machine-processable, nondiscriminatory, nonproprietary, license-free, and permanent. Recourse to these principles by the agencies -- and a better effort to comply with the directive's requirement to identify all high-value data sets and set deadlines for online publication -- would have turned the thus-far mixed results of the Open Government Directive into an unqualified success. There is still time to make that promise a reality.
Here are the 9 open data principles in a framework to evaluate the extent to which they are satisfied:
(If you're wondering what ^M means, it's old school geek for delete - the 8 principles of open data have now become 9.)
David Moore at Open Congress has an excellent post up explaining how the current life of a bill in Congress is riddled with disclosure holes. I can't do more than say: go read David's post. Here are some choice paragraphs:
The reason is that the “Baucus Bill” is only a “mark”, not yet an official Senate bill, which means (to summarize reductively) that the digital text that constitutes the .pdf does not make its way off internal government web servers to the official website of the Library of Congress, THOMAS — and in turn, does not make its way to government transparency web resources such as GovTrack and OpenCongress. Before that happens, this mark of the health care bill needs to be reconciled with other Senate committee versions of the same, which will then be put forward for consideration to the U.S. Senate as a whole. Health care reform is leading news coverage & blog analysis of American politics right now, this is a major document in the mix, and there’s not a widely-recognized, user-friendly resource for online examination by the public at large. You should have better access to this info! You should have — at your fingertips — immediate, unrestricted digital access to the full text of any piece of legislation the very moment it’s released publicly by Congress. ... The current Congressional process for publishing data is, to borrow a phrase from the Free Software Foundation, Defective By Design. As we see in many proprietary, top-down systems affecting the public interest, it’s insistently closed-off. Congress’ processes for distributing legislative info is fundamentally broken — it could and should relatively easily be fixed, starting now. Whether or not you support the Baucus markup or the House version of the health care reform bill, we hope you agree that the public has a right to read this important iteration & political volley in the process.
The Obama administration has promised that it will track the progress of projects approved in the stimulus bill (H.R. 1) through a web site, Recovery.gov. Matt Cooper at TPMDC notes the obvious about the current make-up of the site:
In his remarks earlier this morning about his stimulus plan, Obama touted Recovery.gov as a website where Americans "will be able to see how and where we spend taxpayer dollars." Actually the site is empty pending the passage of the bill. Basically, it's a placeholder for after the bill is passed. Shouldn't there be something in there about the competing proposals? The options? Etc. It seems kind of lame for such a techno-savvy White House. Besides after the bill is passed how quickly are they really going to be able to update how Topeka spends it's sewer money?
On that note, I think it would be best for anyone who might have control over the stimulus tracking web site to take note of the awesome suggestions laid out by our own John Wonderlich in a CNET article:
We'd like the site to serve not just the amateur information consumer, but also the programmers that can skillfully remix the information. The citizen observer's role seems well-addressed by the legislation that mandated the site (with requirements for "printable reports," feedback, and to be "easy to understand"), while the needs of the programmer are largely unaddressed. The data should be available in formats that facilitate more advanced use by programmers and analysts alike. Certainly, the data should be made available following the 8 Principles of Open Data: (1) complete, (2) primary (as it is collected at the source), (3) timely, (4) accessible, (5) machine-processable, (6) nondiscriminatory, (7) nonproprietary, and (8) license-free. XML and CSV are a minimum. Search is great, if you are looking to find information about any one thing. But original analysis and visualization require access to data in bulk. If the goal of putting the data online is to increase accountability and transparency, then it is necessary (to) provide bulk data access.
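To make the bulk-versus-search distinction concrete, here is a minimal sketch. The CSV columns and figures are invented for illustration; they are not the actual Recovery.gov schema:

```python
import csv
import io
from collections import Counter

# Hypothetical bulk extract -- the column names and amounts
# below are our invention, not any agency's real data.
bulk_csv = """state,project,amount
KS,Topeka sewer upgrade,1200000
KS,Wichita bridge repair,800000
MD,Rockville transit,500000
"""

# With bulk data, an aggregate question ("how much per state?")
# is one pass over the file. With a search-only interface, each
# total would require a separate query -- and some questions
# could not be asked at all.
totals = Counter()
for row in csv.DictReader(io.StringIO(bulk_csv)):
    totals[row["state"]] += int(row["amount"])

print(dict(totals))  # → {'KS': 2000000, 'MD': 500000}
```

The same pass could just as easily group by county, congressional district, or facility owner -- exactly the cuts the quoted passage notes a search box cannot offer.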
Similarly, Ellen Miller blogged about David Robinson's (not the 7-foot former Spurs center) even more ambitious suggestions for the release of large data sets of government information.
We know the administration, especially the tech team, is having a tough time getting used to the antiquated equipment in the White House, the Executive Office Building, and the Old Executive Office Building. I remember what it looked like in the '90s and I'm sure it has changed very little.
At the same time, there are a lot of impatient people out here wondering when the administration will start running the kind of wired White House they have always intended. In the case of Recovery.gov, there is no shortage of ideas for them to quickly tap.