Agency Plans and Data

As I wrote earlier today, Sunlight is going to be particularly focused on data transparency in the new Open Government Plans, which are expected to be released today.

We’re focused on this element of the plans — how agencies inventory and plan to release data — because it’s important, and because the government has repeatedly failed to do it effectively in the past.  Broad data access, and the agency plans intended to create it, lie at the heart of the empowerment and accountability the Directive is intended to create.

Asking “What is knowable about this institution and its work?” is extremely powerful.  The government has never provided a sufficient answer to this question, despite past (and current) laws and initiatives that require it.  That’s why we plan to hold the agencies and the administration to the highest standard: to create broad, systemic awareness of and access to government information that has up to now been elusive.

The clarity and detail agencies use to inventory and plan for future data release will signify, in part, whether specific agencies view the directive as an administrative exercise, or a transformative initiative. First, here’s what the directive requires:

A strategic action plan for transparency that (1) inventories agency high-value information currently available for download; (2) fosters the public’s use of this information to increase public knowledge and promote public scrutiny of agency services; and (3) identifies high value information not yet available and establishes a reasonable timeline for publication online in open formats with specific target dates. High-value information is information that can be used to increase agency accountability and responsiveness; improve public knowledge of the agency and its operations; further the core mission of the agency; create economic opportunity; or respond to need and demand as identified through public consultation.

First and foremost, we’re interpreting this provision to mean that agencies’ plans (released today) must contain an inventory of data currently available for download, and must contain specific plans for data to be released in the near future, with specific timeframes.  A plan to release a plan for a procedure for an inventory won’t cut it.

The directive doesn’t require a comprehensive list of all public information holdings.  The first requirement, “A strategic action plan for transparency that (1) inventories agency high-value information currently available for download;” has two significant qualifiers.  First, that the inventoried information be “high-value”, and second that it be “currently available for download.”  I expect this to be interpreted to mean “those datasets that are available now on Data.gov,” although a strict reading would imply other data available in bulk as well.

This first requirement echoes past initiatives, like the Federal Information Locator Service, which was replaced by the Government Information Locator Service (44 USC 3511).  This requirement has been almost completely ignored, although the GILS database still exists.  It also echoes similar requirements from the E-Government Act of 2002, though they are more vague and ineffectual.

We proposed a definitive solution in the POIA bill, which would require agencies to publish machine-readable lists of all of their public information holdings, and would be enforceable through a private right of action.  These and similar requirements, created to generate a comprehensive public accounting of public information holdings, will only succeed if a high standard is set and agencies are held to it.  True success in this enterprise would be wildly transformative, and would empower governance, oversight, and reuse of government information.

The second requirement deals with future action.  Agencies must create “A strategic action plan for transparency that … (3) identifies high value information not yet available and establishes a reasonable timeline for publication online in open formats with specific target dates.”

This requirement means that agencies must identify information for future publication.  Again, plans to plan are insufficient.

This requirement also has two major qualifiers.  First, the list of data to be released in the future need not be comprehensive.  It applies specifically to high-value information.  The definition of high-value information (quoted at the beginning of this post) is a useful list of ways that data can be valuable, but does little to strengthen this requirement.  In fact, it weakens the requirement, by adding a significant qualifier to the set of datasets agencies must plan to release.

Similarly, “not yet available” could be interpreted to mean “not available at all to the public,” or could mean “not yet available in a structured or bulk form.”  I expect agencies to interpret it to cover both, although they should distinguish between data that is getting a format upgrade and data that will be released for the first time.

Despite those two qualifiers, this provision gives agencies their chance to shine.  Their role as important stewards of some aspect of the national interest involves our vital information, and the Open Government Directive is intended to create access to that data.  We’re hoping to be impressed by what agencies identify for publication.  The scope of what is knowable should grow.

In fulfilling this second provision, agencies should be as comprehensive as possible, be clear about what they’re releasing (is it new or not?), and make sure their plans reflect their priorities.  Some datasets are easy to release, and some are hard.
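
To make that concrete, here is a rough sketch of what one machine-readable inventory might look like.  Nothing in the directive prescribes this format; the field names, the three release categories, and the check for missing target dates are all our own illustrative assumptions, drawn from the distinctions discussed above (already downloadable, getting a format upgrade, or released for the first time).

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import List, Optional
import json


class ReleaseKind(Enum):
    # Hypothetical categories, not terms defined by the directive.
    ALREADY_AVAILABLE = "already_available"  # downloadable today
    FORMAT_UPGRADE = "format_upgrade"        # public, but not yet in bulk or structured form
    FIRST_RELEASE = "first_release"          # not yet public in any form


@dataclass
class InventoryEntry:
    title: str
    kind: ReleaseKind
    target_date: Optional[date] = None  # expected for anything not yet available


def missing_target_dates(entries: List[InventoryEntry]) -> List[str]:
    """Return titles of planned releases that lack a specific target date."""
    return [
        e.title
        for e in entries
        if e.kind is not ReleaseKind.ALREADY_AVAILABLE and e.target_date is None
    ]


def to_json(entries: List[InventoryEntry]) -> str:
    """Serialize the inventory as a machine-readable JSON list."""
    return json.dumps(
        [
            {
                "title": e.title,
                "kind": e.kind.value,
                "target_date": e.target_date.isoformat() if e.target_date else None,
            }
            for e in entries
        ],
        indent=2,
    )
```

An inventory like this would let anyone see at a glance which entries are new, which are format upgrades, and which planned releases amount to a plan to plan because they carry no date.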

In creating the “high-value data” standard for what agencies should release, the administration attempted to set some priorities for what should be made public.  Ultimately, that “high-value” definition doesn’t serve that purpose well.  It’s a great list of ways that public data can be valuable, but doesn’t really help agencies to decide where to put their effort first.

Sunlight is particularly interested in what we’re thinking of as accountability data.  Accountability data is information about an agency and its functions, or about those entities that report to the agency, that allows those entities or the agency to be held accountable.  While that’s a somewhat tautological definition, the distinction is important.  Some information has enormous transformative potential, and empowers oversight and accountability.  This information also can be the most embarrassing, or the hardest to relinquish control over.  That’s where we want to see agencies’ attention — on fixing tough problems.

Up to now there has been a bit of a priorities vacuum, where publicizing information on the Internet has happened in an ad hoc manner, in response to laws (albeit unreliably) or often because it serves an agency’s or department’s self-interest.  As we move beyond a request-based FOIA world of data access, and achieve more affirmative publication, we’re going to need new distinctions about what can be public, and new priorities for where to focus our energy.  Accountability data is our first priority, and should be agencies’ first priority as well.

That’s where we’re headed.  For now, we’re anxiously awaiting agencies’ plans.  We’ll be looking through their data plans in the coming hours, and evaluating them in the terms I’ve described above.  Whatever agencies deliver today, we’ll be working to create broad systemic access to their information, and to create the tools and practices necessary so that we can all benefit from it.