Today, the White House is issuing a new Executive Order on Open Data -- one that is significantly different from the open data policies that have come before it -- reflecting Sunlight's persistent call for stronger public listings of agency data, and demonstrating a new path forward for governments committing to open data.
This Executive Order and the new policies that accompany it cover a lot of ground, building public reporting systems, adding new goals, creating new avenues for public participation, and laying out new principles for openness, much of which can be found in Sunlight's extensive Open Data Policy Guidelines, and the work of our friends and allies.
Most significantly, though, the new policies take on one of the trickiest questions that open data policies face -- how can we reset the default to openness when there is so much data? How can we take on managing and releasing all the government's data, or as much as possible, without negotiating over every dataset the government has?
How can the public (or policymakers) request what they don't know exists? How can CIOs manage what they haven't surveyed?
Most open data policies have avoided this complexity, building requirements around datasets that are already public, or setting vague goals about new disclosures. The Open Government Directive, Obama's signature open data initiative from his first term, took a similar approach, requiring agencies to release three high value datasets, and then requiring a data disclosure planning process that was hard to oversee and even harder to enforce.
We've often pointed out this deficit, and over the last few years we've grown more pointed in our vision for reliable open data policies that move beyond aspirational statements and cherry-picked data. The Open Government Partnership has spawned a huge number of similar commitments from governments around the world -- important, welcome enthusiasm for sure, but enthusiasm that needs an example of where to head next.
These new policies provide that direction, much as we've envisioned it. To move beyond vague aspirations, the policies require agencies to index all of their data (internally), to make a public list of all their public data, and (this is the key point) to list all their data that can be made public.
Here's how we envisioned such a requirement about a year ago:
So we're not giving up on forcing agencies to make information policy decisions in public. One of the most important things that governments can do to be more transparent is to list, or index all of their information holdings online. CIOs should be more than just technology purchasers; the word information is in their title. Every agency should have a public list of its major information holdings, along with a description of whether it's public or not, and why. Without creating such a list, how do Chief Information Officers even do their jobs?
Now, the question "where is all of our information" can be a tricky one to answer, but agencies can rely on threshold definitions. For example, any database with a maintenance cost over a certain number should be listed. Any information specifically described in a statute governing the agency should be described. Any form, report, or data described in the regulations governing the agency should be described. Whether the information is usually (or never) accessible via FOI request should be noted, and whether bulk data is available through a central portal should be spelled out as well. (By far, the best example of such a review that we've seen is the DOT regulatory compliance plan, and the closest we've found for Congress is this.)
The new policies take a similar tack:
b. Create and maintain a public data listing- Any datasets in the agency's enterprise data inventory that can be made publicly available must be listed at www.[agency].gov/data in a human- and machine-readable format that enables automatic aggregation by Data.gov and other services (known as "harvestable files"), to the extent practicable. This should include datasets that can be made publicly available but have not yet been released. This public data listing should also include, to the extent permitted by law and existing terms and conditions, datasets that were produced through agency-funded grants, contracts, and cooperative agreements (excluding any data submitted primarily for the purpose of contract monitoring and administration), and, where feasible, be accompanied by standard citation information, preferably in the form of a persistent identifier. The public data listing will be built out over time, with the ultimate goal of including all agency datasets that can be made publicly available. See Project Open Data for best practices, tools, and schema to implement the public data listing and harvestable files.
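To make the requirement concrete, here is a minimal sketch of how an agency might derive a public data listing from its internal inventory. The field names, the `Dataset` class, and the `build_public_listing` function are our own illustrative assumptions, not the Project Open Data schema; the only point it demonstrates is that datasets which can be made public are listed even when they have not yet been released.

```python
import json
from dataclasses import dataclass, asdict
from typing import List, Optional


@dataclass
class Dataset:
    """One entry in a hypothetical internal enterprise data inventory."""
    title: str
    description: str
    can_be_public: bool                 # assumed flag set during agency review
    released: bool                      # True once the data is actually published
    download_url: Optional[str] = None  # empty until the dataset is released
    identifier: Optional[str] = None    # persistent identifier, where one exists


def build_public_listing(inventory: List[Dataset]) -> str:
    """Return a JSON listing of every dataset that can be made public.

    Unreleased datasets are kept on purpose: the listing describes what
    could be released, not only what already has been.
    """
    public = [asdict(d) for d in inventory if d.can_be_public]
    return json.dumps(public, indent=2, sort_keys=True)


if __name__ == "__main__":
    # Entirely made-up example entries.
    inventory = [
        Dataset("Grant awards", "Awards by fiscal year", True, True,
                download_url="https://example.gov/data/grants.csv"),
        Dataset("Inspection reports", "Facility inspection results", True, False),
        Dataset("Personnel records", "Employee files", False, False),
    ]
    print(build_public_listing(inventory))
```

Run as a script, this prints a two-entry listing: the released grant data and the not-yet-released inspection reports, with the personnel records left out. A harvester such as Data.gov could then aggregate files like this across agencies, and the public could see which candidates for release are still waiting.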
By requiring agencies to publicly list all their data that could be made public, the President is not just reaffirming that decisions about disclosure should be based on the public interest; he's also giving the public (and Congress) tools to enforce that standard. When open data procedures are built into agency processes from the start, we'll see more systems designed for bulk access, and we'll be better able to recoup the missed opportunities in legacy datasets that are still closed. We'll be able to evaluate agencies' transparency against what they've defined as their candidates for release, and clearly identify areas where agencies avoid disclosure altogether.
To be sure, getting agencies to publicly list all their data that can be open will be a significant challenge, even with a high-profile Executive Order. Concerns like cost, privacy, and security will be used to justify non-disclosure (as they often are), and will be used to try to justify keeping even a description of many datasets private. That's a good struggle to have, though, and one we're looking forward to. Without this Executive Order, too many agencies are managing data holdings that they haven't comprehensively reviewed, without public oversight, while advocates, journalists, and policymakers have an unclear view of what agencies know, and what they could be releasing.
Today's Executive Order demonstrates a new approach to open data, moving beyond rhetoric and aspiration, requiring agencies to publicly report on what data can be made public, building a new backbone for federal open data policy, and setting an example for other governments to follow.
We're thrilled that the President (and some very dedicated staff) have been listening, and are pursuing a strong vision for what open data should mean. Changing the default to open takes more than political commitments and enthusiastic rhetoric, and today's new policies mark an aggressive new move toward that goal.