The White House’s new Executive Order may differ significantly from the open data policies that came before it at the federal level, but where does it stand in a global -- and local -- context?
Many folks have already jumped at the chance to compare the new U.S. executive order and its accompanying policies to a similar public letter issued by UK Prime Minister David Cameron in 2010, but little attention has been paid to one of the new policy’s most substantial provisions: the creation of a public listing of agency data based on internal audits of information holdings. As administrative as this provision might sound, the creation of this listing (and the accompanying scoping of information that isn’t yet public but could be released) is part of the next evolution of open data policies, and something Sunlight has long called for as a best practice.
So does this policy put the U.S. on the leading edge?
Yes. Sort of. It’s hard to survey the entire landscape of policies dealing with data indexes. The new U.S. policy certainly appears to contain the strongest index and audit requirement we’ve found, though policies from the city of San Francisco and from government bodies in Wales and the United Kingdom (all explored in more detail below) are close contenders. The Welsh and UK examples are particularly strong because they demonstrate not only good policy but good implementation: both bodies have published their inventories in full. Over the next few months we will start to see how well the approaches of San Francisco and the U.S. federal government perform at bringing similar practices to a much broader scale.
Even at the U.S. federal level, indexing policies are not entirely unprecedented: in 2010, the U.S. Department of Transportation released (and to this day continues to maintain) a public inventory of its high-value data as part of its Open Government Plan. As we review later in this post, cities around the country have actively pursued these inventories, too, though most of these initiatives are too new for us to see the listings that will result.
Looking beyond Wales and the UK to the international level, the Open Government Partnership has spawned a huge number of commitments from governments around the world to more reliable open data policies. However, language barriers (and a lack of common terminology) make it hard to tell when a requirement as specific as data listings (or synonymous inventories, audits, registers, or holdings) is written into policy -- let alone how strong or robust these commitments are.
The examples we could find are few, and they are not created equal.
ON THE GLOBAL STAGE
The indexing efforts we've found that most resemble what the U.S. is about to undertake come from Wales and the United Kingdom. The Welsh Land Registry publishes a complete list of all its electronic data, including both internal and external raw information that “has been obtained or recorded for the purpose of enabling [the Registry] to carry out [its] statutory functions” -- a close parallel to the work done by the U.S. Department of Transportation. Similarly, the UK's Department for Communities and Local Government publishes a list of information assets held by the department as part of its commitment to “increasing transparency.” Whenever an entry on its index cannot be made available to the public in its entirety and in raw form, the department states its rationale for not publishing it -- a best practice that we are eager to see the U.S. emulate.
There is a big difference, however, between agency-level and national-level implementation. Canada is one of the few national-level examples we could find, but it’s hard to tell the scope and completeness of its indexing policy and related activity. The policy appears in the Canadian Data Inventory Project, a government-wide stock-taking of federal data holdings that started around April 2012. The project, carried out by a research group reporting to deputy ministers, targets 18 policy departments and central agencies; the research seems to have ended a few weeks ago.
Unfortunately, there’s no sign of the data from this project, and a few troubling answers on the project’s “Q & A” page suggest that crucial datasets on personnel or financial activities are not included in the inventory. The stated goal of the Canadian information listing is to determine “the broad range of data holdings that could address the medium to longer-term priorities,” but only on “key policy issues” -- a somewhat meaningless phrase that seems difficult to enforce and will likely undermine confidence in the completeness of these registers.
Information listings also appear among the best-practice principles in the Open Data White Paper released by the UK Minister for the Cabinet Office and Paymaster General in June 2012, which notes that “public bodies should maintain and publish inventories of their data holdings” (Section 2.46, Principle 13). While this is an admirable and sound recommendation (one among many in the document), a white paper exploring best practices is often just that: an advisory document. Although we’re glad for the ambitious vision the paper expresses, it doesn’t carry the same power as a mandate from an executive office (like the U.S. Executive Order or the UK public letter), and we couldn’t find any official inventories published online following its release.
What about the European Union? The Public Sector Information (PSI) Directive, currently in revision, provides a common but relatively loose legislative framework for member states on the reuse of public sector information and recognizes the importance of such information listings. The directive states that:
Member States should (...) ensure that practical arrangements are in place that help reusers in their search for documents available for reuse. Assets lists, accessible preferably online, of main documents (documents that are extensively re-used or that have the potential to be extensively re-used), and portal sites that are linked to decentralised assets lists are examples of such practical arrangements.
Even though all European Union member states reported that they had fully implemented the directive, we found no evidence of a comprehensive policy or national portal (not even among the mushrooming central data portals and data catalogs) that scopes all of a government's information holdings, listing both the information that has been made public (and linking to it, when possible) and the information that has not. This point is important to underscore because it begins to define what sets the U.S. (and the UK Department for Communities and Local Government) apart from the policies that have come before in the international theatre.
WITHIN THE U.S.
A little closer to home, data indexing seems slightly more common -- but not by much. The subject of inventories and data listings has come up in several municipalities and states in the last few years, but most are still in the early stages of implementation and are not yet positioned to share results.
In an April 2013 update to the open data provisions of its administrative code, San Francisco added a new indexing requirement for city departments. Under the supervision of the Chief Data Officer, designated “Data Coordinators” from each city department will create an “Open Data plan” that includes “a summary description of all data sets under the control of each Department (including data contained in already-operating information technology systems)” as well as “a timeline for the publication of the department’s open data and a summary of open data efforts planned and/or underway” by the department.
If this provision is interpreted ambitiously, San Francisco could end up with the policy closest to the new federal requirements that we’ve found: an index that displays summary information about both public and non-public data, with an eye toward information that could be released. The inventories of two other cities active in this space, New York City and Philadelphia, reflect different perspectives on which government datasets can be cataloged and shared, though both lack San Francisco's more comprehensive view.
As part of the plan for compliance with New York City’s open data law (Local Law 11 of 2012), New York City’s Department of Information Technology and Telecommunications (DoITT) established a series of city policies which include the creation of an Open Data Dashboard to track which “information has been published through the NYC OpenData portal or by direct public access.” The Dashboard (which is not yet live) will also list scheduled data releases, an inventory of data sets released, and data sets behind schedule by agency. The emphasis here is on making information that is already public more accessible to the public -- so datasets that are not yet public are not necessarily on the table for release.
By contrast, with no policy or mandate (or, rather, with a touch of liberal policy interpretation), Philadelphia has taken a future-facing view. Philly’s Innovation Management Team at the Office of Information Technology has started a public-facing Trello account for task management to scope out datasets for future release. Not every dataset listed will become available (as Chief Data Officer Mark Headd has explained in more detail), but it offers a view of internal prioritization and creative auditing that certainly stands out.
Other examples of sub-federal data inventories that we’ve found come from the Midwest. The City of Chicago announced in February 2012 that it would create an inventory of its own data and that of its sister agencies as part of a citywide data collection project, noting that it is "impossible to take advantage" of Chicago's vast amounts of municipal data "without understanding exactly what data the City owns." Farther northwest, at least two Minnesota state agencies have experimented with smaller-scale inventories: the Department of Employment and Economic Development, which maintains a PDF inventory of private and non-public data categories and classifications, and the Bureau of Criminal Apprehension, which keeps a list (also a PDF) of "all data collected, stored or maintained" by the division, noting whether each item is classified as public and linking to the state statutes that set those classifications.
By and large, it remains difficult to get a complete picture of the popularity and strength of data audits, not only because of a lack of common terminology but also because data management is often done with no public involvement. It is very likely that data indexing is being explored, or is even common practice, in a number of other governments and government agencies, but that this practice is not discoverable from the public’s perspective (or from the perspective of those outside the country, city, or state).
From what we can tell, the work done at the U.S. federal level represents one of the most comprehensive approaches to data indexing, if not an entirely new one. Where many governments and agencies have sought to tackle only information that is already public (even if it is not yet online), the federal government’s approach takes an important extra step: it publicly identifies categories of information that are private or non-public and further asks federal agencies to identify what could be released. Too often when we talk about open data policies we focus on changing the collection point for information that is already publicly available, ignoring that open data policies can -- and should -- support the public release of new information. It’s unclear whether the Welsh Land Registry supports this level of comprehensive disclosure, and San Francisco’s policy is too new to really be tested; from our research, only the new United States policy and the UK Department for Communities and Local Government appear to be pursuing this vision clearly and with ambition.
Then again, we don’t know what we don’t know, so we’re asking for your help. Given the challenges of finding these policies and listings online, we’d like to crowdsource a master list of lists, if you will. If you know of a data listing, registry, audit, index, or other project that tallies the data holdings of a government or agency and publishes them publicly, please add it to our spreadsheet. We’re hoping to grow this list over time so that we can get a better picture of the information available to us -- and where to find it.
(Photo by Flickr user yuan 2003)