Adding transparency to the processes behind open data


More local governments across the country are setting the default to open with open data policies, proactively sharing public information online in easily accessible and reusable formats.


Though the goal is to proactively publish all public information online, data release realistically can’t happen all at once. Outdated technology systems can make it difficult to extract information and share it online. Data cleaning might be needed for datasets with sensitive details that cannot be released publicly. Data quality might need improvement to make datasets more useful to the public, or better metadata might need to be added. Some datasets are of higher value than others to the public, while others might be of higher value to government.

Some datasets will inevitably be published before others. The question is how to make the data release process a thoughtful one that doesn’t just implicitly prioritize what’s easy to share.

Several governments have committed through their open data policies to crafting a prioritization process for releasing data, but there has been little transparency about how that prioritization works in practice.

Until now.

Montgomery County, Md., recently released an Open Data Operations Manual that shows, in detail, how the government approached prioritization. Montgomery County’s open data policy, approved in December 2012, mandated a prioritization plan as part of the ongoing data-release efforts. The county moved forward with releasing datasets on its open data portal in the meantime, but the new prioritization process will create a more meaningful plan for data release going forward and can serve as an example for other open data efforts.

How it works

Montgomery County took a variety of factors into account in choosing which datasets merit quick release (and which can probably wait). The county’s prioritization methodology assesses both internal and external demand for data release.

Internal demand factors assessed for determining priority of data release include furthering a department’s mission, providing a performance measure, saving staff time, and facilitating collaboration. The county also considers whether release offers an opportunity to move away from old technology and whether the data is already readily accessible.

External demand factors assessed for determining priority of data release include impacting quality of life, boosting efficiency and effectiveness for businesses or residents, providing data for which Montgomery County is the only source, contributing to civic engagement, creating economic opportunities, and making government more accountable or responsive by increasing public knowledge of operations.

To put the prioritization process into practice and start evaluating datasets based on all of these factors, Montgomery County’s open data team completed an inventory of its more than 550 datasets. The inventory is listed in the Operations Manual and available online as a dataset.

Every unpublished dataset in the inventory was scored from zero to ten points for each of the internal and external demand factors. Datasets with higher total scores received a higher priority designation. Datasets were grouped into five categories of priority for release based on their scores. (For more details, read the methodology in Section 3.3 and Appendix B of the Operations Manual.)
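To make the scoring arithmetic concrete, here is a minimal sketch in Python of how a scheme like this could work. It is an illustration, not the county's implementation. The factor names paraphrase the lists above, and the equal-width score bands used to form five tiers are an assumption made for this example; the actual cutoffs are described in the Operations Manual.

```python
# Hypothetical sketch of a zero-to-ten, per-factor scoring scheme.
# Factor names paraphrase the article; the manual's wording may differ.
INTERNAL_FACTORS = [
    "furthers_department_mission",
    "provides_performance_measure",
    "saves_staff_time",
    "facilitates_collaboration",
    "retires_old_technology",
    "already_readily_accessible",
]
EXTERNAL_FACTORS = [
    "impacts_quality_of_life",
    "boosts_efficiency_for_public",
    "county_is_only_source",
    "contributes_to_civic_engagement",
    "creates_economic_opportunity",
    "increases_accountability",
]
ALL_FACTORS = INTERNAL_FACTORS + EXTERNAL_FACTORS
MAX_PER_FACTOR = 10  # each factor is scored from zero to ten
NUM_TIERS = 5        # the article describes five priority categories


def total_score(scores: dict) -> int:
    """Sum the factor scores; a factor missing from `scores` counts as 0."""
    total = 0
    for name in ALL_FACTORS:
        value = scores.get(name, 0)
        if not 0 <= value <= MAX_PER_FACTOR:
            raise ValueError(f"{name} must be between 0 and {MAX_PER_FACTOR}")
        total += value
    return total


def priority_tier(total: int) -> int:
    """Map a total score to a tier from 1 (highest) to 5 (lowest).

    ASSUMPTION: tiers are equal-width bands of the maximum possible
    total. The county's real cutoffs may be drawn differently.
    """
    max_total = MAX_PER_FACTOR * len(ALL_FACTORS)
    band_width = max_total / NUM_TIERS
    band_from_bottom = min(int(total // band_width), NUM_TIERS - 1)
    return NUM_TIERS - band_from_bottom


# Invented example dataset, for illustration only.
example_scores = {name: 5 for name in ALL_FACTORS}
example_scores["county_is_only_source"] = 10
example_total = total_score(example_scores)
example_tier = priority_tier(example_total)
```

With twelve factors the maximum total in this sketch is 120, so each assumed band spans 24 points. A different number of factors or unevenly drawn cutoffs would change the tiers without changing the basic approach.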

Why all of this matters

Completing an inventory and crafting a detailed prioritization process are not small feats, and Montgomery County went beyond just finishing these tasks by sharing the processes and results openly. Montgomery County isn’t the first to conduct a dataset inventory and work on questions of prioritization for data release, but it does appear to be the most open about just how it made those processes work.

Several other places are working on or have completed data inventories. New York City, like Montgomery County, shares its data inventory as a dataset, but that inventory could still be more complete and transparent. Montgomery County detailed its inventory process in the Operations Manual, explaining how each department was consulted for compiling a list of datasets, how analysis helped fill in the lists of datasets, and how the public was engaged in the process.

Montgomery County provides the most detailed example of determining prioritization, too. New York City released the five factors it uses when determining how to prioritize data for release, but it hasn’t shared how those factors figure into a prioritization process. It’s not clear how each factor is scored or weighted, or how that methodology determines release.

It’s encouraging that Montgomery County’s prioritization goes into much more detail. The prioritization methodology could help provide a framework for other places considering quantifying the prioritization process, and there are many variations that could be produced to fit different contexts. Questions for determining prioritization don’t necessarily need to all be weighted equally, for instance, and there don’t have to be discrete sets of questions for assessing public and government interests. Factors such as increasing transparency and accountability could certainly fall into the category of benefitting both government and the public.
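As one illustration of such a variation, the sketch below computes a weighted average of factor scores from a single shared list of factors, with no split between public and government interests. The function name, factor names, and weights are all hypothetical, chosen only to show the arithmetic.

```python
def weighted_score(scores: dict, weights: dict) -> float:
    """Return a weighted average of 0-10 factor scores.

    Hypothetical variation: `weights` maps each factor name to its
    relative importance, and a factor missing from `scores` counts as 0.
    The result stays on the same 0-10 scale as the inputs.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = sum(
        weight * scores.get(name, 0) for name, weight in weights.items()
    )
    return weighted_sum / total_weight


# Invented weights: accountability counts twice as much as staff time.
example_weights = {"increases_accountability": 2.0, "saves_staff_time": 1.0}
example_scores = {"increases_accountability": 9, "saves_staff_time": 3}

weighted = weighted_score(example_scores, example_weights)
unweighted = weighted_score(example_scores, {name: 1.0 for name in example_weights})
```

In this invented case, doubling the weight on accountability lifts the dataset's score from 6.0 to 7.0. Choosing the weights is a policy decision for each place to make, not something the arithmetic can settle.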

A new wave of making open data processes more open

Montgomery County appears to be part of a new wave of local governments being more open about their approach to open data. The county’s open inventory and detailed descriptions of its inventory and prioritization processes set a high bar that other governments should work to meet. Philadelphia’s recent revamp of its own open data pipeline shows that those places with maturing open data efforts are working on strengthening the foundation for this work and increasing the transparency of all the processes involved.

We hope to see more governments taking advantage of these kinds of opportunities. Open data efforts cannot exist in a vacuum. Places working to share open data should be learning from and building on each other’s work, and being more open about the processes behind open data can help with that.