Dark Data: The importance of open inventories

Over the past few weeks, we have had the chance to meet with a number of agencies (16 so far) to discuss their upcoming Open Government Plans. The meetings have been very productive overall. We were particularly impressed and excited by the willingness of representatives from the Department of Transportation, General Services Administration, and Health and Human Services to exchange ideas and engage in honest conversations about their plans.

Among other asks, we’ve used these opportunities to encourage agencies to release their full Enterprise Data Inventories, which they must create to comply with President Obama’s open data executive order.

The guidance released as part of Project Open Data requires agencies to publish only a list of the data that they already make public or could easily make public. It allows them to keep their full enterprise data listings to themselves, effectively masking the existence of data sets that they do not want to make public, as well as their rationale for withholding them. We see no compelling reason why agencies should keep these more comprehensive inventories private. In fact, releasing them will be good for the public interest, good for government, and good for democracy.

Without access to full lists of agency data sets, including even the names of those that contain otherwise private information, the public will not be able to understand the inner workings of government or hold it accountable. We won’t know what is being withheld. We won’t know why. And we won’t know where to look.

The Enterprise Data Inventories and public data listings are notable in that they share metadata about agency data assets, but do not themselves reveal the content of data sets that aren’t already public. Sharing an agency’s Enterprise Data Inventory thus won’t expose any information that should reasonably be withheld from public view. It would, however, provide an easy way for interested parties to understand government disclosure decisions, while also showing the public what data these agencies actually have.

As part of Project Open Data, agencies are required to justify their decisions not to publish data sets. These and similar decisions are already made public through a variety of channels, including System of Records Notices, Privacy Impact Assessments, and OMB information collection reviews. Enterprise Data Inventories merely aggregate this required and existing information into a central, data-focused, agency-specific location.
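To make the distinction between an inventory and a public listing concrete, here is a minimal sketch in Python. The record layout, the field names (`title`, `access_level`, `rationale`), the three access-level values, and the sample entries are all illustrative assumptions, modeled only loosely on the kind of metadata an inventory carries; they are not taken from any agency's actual inventory. The point is that every entry is metadata only, so listing a withheld data set reveals its name and the reason it is withheld, never its contents.

```python
from dataclasses import dataclass


@dataclass
class InventoryEntry:
    """One metadata record in a hypothetical enterprise data inventory."""
    title: str
    access_level: str    # assumed values: "public", "restricted public", "non-public"
    rationale: str = ""  # why the data set is withheld; empty for public data


def public_listing(inventory):
    """Return only the entries an agency would list publicly today."""
    return [e for e in inventory if e.access_level == "public"]


def withheld_summary(inventory):
    """Return (title, access level, rationale) for every entry that is not public."""
    return [
        (e.title, e.access_level, e.rationale)
        for e in inventory
        if e.access_level != "public"
    ]


# Hypothetical sample data, invented purely for illustration.
INVENTORY = [
    InventoryEntry("Highway Bridge Conditions", "public"),
    InventoryEntry("Grant Applicant Records", "non-public",
                   "Contains personally identifiable information"),
    InventoryEntry("Facility Security Audits", "restricted public",
                   "Released only under a data use agreement"),
]

if __name__ == "__main__":
    print("Public listing:", [e.title for e in public_listing(INVENTORY)])
    for title, level, why in withheld_summary(INVENTORY):
        print(f"Withheld: {title} ({level}): {why}")
```

Under these assumptions, publishing only the output of `public_listing` hides two of the three data sets entirely, while publishing the full inventory shows that they exist and why they are withheld.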

Making it easy for the public to understand the full scope of agency data holdings would positively affect agency operations. The Freedom of Information Act is a powerful tool for citizens trying to explore government operations. Its utility also presents a major challenge for federal agencies and employees tasked with responding to requests. Releasing Enterprise Data Inventories will let the public learn not only about unreleased data sets, but also about the government's justifications for keeping these sets private. This expansion of knowledge would reduce speculative FOIA requests by helping requesters target their requests and better understand agency reasoning, ultimately cutting back on expensive and time-consuming adjudicative processes.

There can be no truly informed debate about what government data should be made public without a full picture of what data the government holds. President Obama’s open data executive order aims to enable and encourage that debate, but unless the Enterprise Data Inventories are made public, its goals will go unfulfilled.