What can local data inventories tell us about the U.S. data index release?

by Emily Shaw

policy

Feb 11, 2015 11:19 am

While the U.S. has finally committed to releasing a full inventory of agency-held data sets, we have also tracked the progress of local governments in achieving similar goals — and several of them have been leading the way. In the past two years, we’ve followed the work of three different local governments that have successfully published a full list of their data holdings: New York City, Philadelphia and Montgomery County, Md.

In the process of watching these developments at the local level, we’ve learned that the first release of a government’s data inventory, even one that aims to be comprehensive, is rarely the end of the story. Instead, data inventories are iterative processes. People conducting data inventories must be committed to improvement in order to achieve their aim of comprehensiveness, and the tasks involved are complex enough that governments must be willing to take the work seriously and devote resources to the program over the course of several years. We’ve also discovered that public oversight of the inventory process plays several important roles. Public review can improve the work by identifying gaps and omissions, and it also provides a point of external accountability to ensure that the process won’t lose momentum.

While constructing inventories isn’t simple, we have also seen governments at the local level use their inventories to solve one of the routine challenges of open data: figuring out what data is most important to release under conditions of limited resources. For the local governments that have released their data inventories, public involvement with the full list of data sets forms an important part of their data release prioritization process. A public prioritization process, founded on public access to a full list of potentially available data, helps ensure that the cost of preparing and releasing government data will achieve the maximum possible benefit for the investment.

We hope that the federal government’s experience with publicly releasing its full data inventories will help federal agencies achieve a similar degree of success in engaging the public with their open data initiatives. Here’s what we’ve seen over the last couple of years of watching inventories in local government.

New York City

In 2013, New York City was the first full government we found in the U.S. — at the state, local or national level — to successfully produce a data inventory. While we identified a concern with an official definition of data that limited the breadth of its inventory process, we nonetheless celebrated the significance of this step. In this original version of its inventory, the city committed to releasing all of the data sets on the list by 2018.

New York’s release of an initial data inventory paved the way for an inventory revision that was released in July 2014. In this second version of the document, New York revealed additional data sets that had been identified (without an explanation of how they had been found or why they had been initially overlooked), and it described the open data managers’ decision to remove 27 data sets from the list. While we appreciated the city’s willingness to openly point to the assets it was removing from the list, we deplored its decision to remove them in the first place. It is better to know that data exist, even if they cannot be released openly to the public, than to have them hidden from public accounting. For that reason, we advocate that data inventories serve as true inventories and contain a list of all data, whether public or private.

Montgomery County, Md.

In 2014, additional data inventory releases showed the broadening application of the principle. Just across the border from D.C., Maryland’s Montgomery County demonstrated national leadership at the county level when it published its inventory of openable data. We recently described its progress toward opening some of its more challenging data sets according to a clear data set release prioritization schedule, an achievement made possible by its comprehensive inventory. Montgomery County has been working to create a complete data inventory since mid-2013, committing enough time, resources and political will to the task to ensure that it was not just done, but done well. While its initial process to inventory data sets from its 28 departments, offices and agencies yielded over 300 data sets, the team overseeing the inventory process conducted an audit and identified an additional 200 data sets that should have been submitted. The inventory currently contains listings for over 550 data sets, including many that have not yet been released and for which data collection is not yet complete.

Philadelphia

Philadelphia has also shown itself to be a national leader in the field of government data inventories. We tracked its progress in developing a data release prioritization method that was based on a comprehensive ongoing census of city data. In its first exploration of city-held data, the Philly open data team was able to identify over 150 data sets that had already been published by the city. The team uncovered a further list of over 60 data sets that had not yet been published (an ongoing inventory process for 12 additional city departments will inevitably reveal more unpublished data). By exploring the value, difficulty and cost of publication of each data set from the full list, the open data team was able to demonstrate that the great majority of low-cost, low-complexity data had already been published, while the majority of the data sets left to be published contained data that was assessed as being highly complex and costly to release. These insights can help open data advocates in other places understand what might be driving the order of data release in their governments.
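
To make this kind of triage concrete, here is a minimal sketch in Python of how an inventory might be ranked by value, difficulty and cost. It is not Philadelphia's actual method: the 1-to-5 scoring scale, the `priority` formula and every data set name below are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class DataSet:
    name: str
    value: int        # hypothetical scale: 1 (low) to 5 (high public value)
    difficulty: int   # hypothetical scale: 1 (simple) to 5 (highly complex)
    cost: int         # hypothetical scale: 1 (cheap) to 5 (costly)
    published: bool = False


def priority(ds):
    """Assumed formula: value minus the average of difficulty and cost."""
    return ds.value - (ds.difficulty + ds.cost) / 2


def release_queue(inventory):
    """Return unpublished data sets, highest priority first."""
    unpublished = [ds for ds in inventory if not ds.published]
    return sorted(unpublished, key=priority, reverse=True)


# Entirely made-up inventory entries.
inventory = [
    DataSet("Example permits", value=5, difficulty=2, cost=2),
    DataSet("Example payroll", value=4, difficulty=5, cost=4),
    DataSet("Example park locations", value=3, difficulty=1, cost=1, published=True),
    DataSet("Example inspections", value=4, difficulty=2, cost=3),
]

for ds in release_queue(inventory):
    print(f"{ds.name}: {priority(ds):+.1f}")
```

A ranking like this only works if the inventory lists unpublished data sets alongside published ones, which is why the full list matters for prioritization.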

Local inventories on the way

Several local governments have made very significant strides towards data inventory release and deserve to be studied for what they can tell us about good inventory practice. Chicago, for example, has taken a slightly different approach to its data inventory process: Rather than just creating a list, Chicago’s municipal data management team has focused primarily on making existing data as intelligible as possible to potential internal users. It has led the city’s cataloging work through the creation of an innovative city data dictionary, the Chicago Data Dictionary, which aims to provide detailed metadata on a group of data sets that will eventually encompass all of Chicago’s data.

Chicago’s approach is important for individuals who are interested in finding good models for government-wide management of metadata. The dictionary it created is an unusually detailed catalog of data set attributes, allowing users to search for common elements or field names across all included data sets. On the other hand, the Data Dictionary still does not provide a full, public list of all of the data sets it covers. Because an individual must know what data they want to find before they can use the interface, the dictionary has less value as an actual inventory. Nonetheless, it provides an exciting model of how aspects of inventoried data sets might be most usefully surfaced.
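
The idea of searching for a field name across every cataloged data set can be sketched in a few lines of Python. This is an illustration of the general technique (an index from field name to data sets), not a description of how the Chicago Data Dictionary is built. The catalog contents and the `build_field_index` helper are hypothetical.

```python
from collections import defaultdict


def build_field_index(catalog):
    """Map each lowercased field name to the set of data sets that contain it."""
    index = defaultdict(set)
    for dataset, fields in catalog.items():
        for field in fields:
            index[field.lower()].add(dataset)
    return index


# Made-up catalog: data set name -> list of field names.
catalog = {
    "example_business_licenses": ["license_id", "address", "ward"],
    "example_building_permits": ["permit_id", "Address", "ward"],
    "example_street_closures": ["closure_id", "start_date"],
}

index = build_field_index(catalog)
print(sorted(index["address"]))
```

Note that an index like this answers "which data sets contain this field?" but still requires a separate, browsable list of data sets to function as an inventory.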

San Francisco provides another interesting example of an inventory process. In 2013, San Francisco added a requirement to its existing open data policy that the city provide “a summary description of all data sets under the control of each Department.” While the city aims to produce this comprehensive inventory later this year, it is instructive to note how transparent the city has been in its description of the timeline and stages of assembling this inventory. Each city department has a designated data coordinator, and the city has provided a public resource to instruct the coordinators in how to inventory data within their department, including a model data set questionnaire to ensure that data coordinators are collecting a standardized set of metadata for each cataloged data set. The city tracks its progress toward completion of the inventory task, as well as its other milestones, in a public timeline of activities. While the inventory is not yet complete, the city’s documented process provides a number of excellent ideas for governments that have yet to begin their own.
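
As a rough illustration of what a standardized questionnaire makes possible, the Python sketch below checks inventory entries against a fixed list of required metadata fields and reports how many are complete. The field names in `REQUIRED_FIELDS` and the sample entries are assumptions made for this example. They are not taken from San Francisco's questionnaire.

```python
# Hypothetical required metadata fields for each inventoried data set.
REQUIRED_FIELDS = (
    "department",
    "name",
    "description",
    "update_frequency",
    "contains_private_data",
)


def missing_fields(entry):
    """List the required fields that are absent or left blank in one entry."""
    return [f for f in REQUIRED_FIELDS if entry.get(f) in (None, "")]


def completion_rate(entries):
    """Share of entries with every required field filled in."""
    if not entries:
        return 0.0
    complete = sum(1 for e in entries if not missing_fields(e))
    return complete / len(entries)


# Made-up entries submitted by two hypothetical department coordinators.
entries = [
    {
        "department": "Example Dept A",
        "name": "Example service requests",
        "description": "One row per request",
        "update_frequency": "daily",
        "contains_private_data": False,
    },
    {
        "department": "Example Dept B",
        "name": "Example fleet records",
        "description": "",
        "update_frequency": "monthly",
        "contains_private_data": True,
    },
]

print(missing_fields(entries[1]))
print(completion_rate(entries))
```

Collecting the same fields from every department is what makes a simple progress measure like this possible.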

In a number of ways, local governmental data inventories show us how we can expect the federal data index to produce further improvements in governmental data management and release practices. The recent history of local governmental data inventories suggests that this release is not a final outcome, but rather an important milestone in an ongoing process. We are really looking forward to seeing what kind of further changes it might provoke!

Sunlight Foundation
