The chicken or the egg? Deciding which data to release first

by Emily Shaw

Mar 24, 2014 10:52 am

Our hope for governments adopting open data policies is that they will open all government information holdings proactively and by default. However, we have now been discussing open data policy adoption long enough to know that this presents a real challenge for even the best-intentioned governments. As we all continue the transition from 20th-century to 21st-century government, most governments seeking to guarantee their citizens online access to public data find they must ask themselves: What should we release first? Because the timeline for releasing data can grow long, it is worth addressing this question at the very start of a data release project.

While this might seem like a minor concern, it is not. What data is made available, along with its format, quality and “findability,” ultimately determines the effectiveness of an open data policy. An open data policy that does not quickly release data supporting the uses envisioned at the start of the process is bound to disappoint. That disappointment could not only cost the open data effort its momentum but also work against future appropriations in governments that use performance-based budgeting.

The first question a government should answer in determining which data to prioritize is “what data do we have?” Since it’s impossible to weigh the full range of options without knowing what data is available for release, a comprehensive inventory of existing data sets lays the groundwork for a thoughtful policy development process. We strongly suggest making this inventory public even if some of the data sets themselves could not be released in their current state because of privacy or security concerns. Revealing a full list of the data that currently exists lets more people see their own interest in making that data more available, building support for open data both inside and outside government.

Because open data is valuable only when it is actually used, it is also critical for communities working on an open data policy to consider their goals for open data early in the process. Explicitly identifying shared goals and values is not only a valuable way to build stakeholders’ support for an open data policy, but also helps clarify what the next steps should be. Is your open data policy intended to improve citizen access to services? To decision-makers? Is it intended to spur economic development? If you are a specialized agency, your goals should be informed, at least in part, by your mission. Whatever your goals may be, consider how prioritizing the release of specific data sets can help accomplish them.

Here are some of the goals an open data policy might aim to accomplish, and the kinds of data that might support them.

Transparency: For a government to achieve transparency, citizens should be able to see how government actors make decisions, what influences those decisions and how those decisions are implemented. Transparency in governmental decision-making is particularly well served by releasing the data used in creating public laws or rules, such as data cited in legislative hearings, program evaluations and reports, or public administrative memos.
Accountability: Traditionally, political scandals have provided an important impetus for public disclosure of government-held information. For that reason, data that supports oversight of the areas where government ethics concerns arise most often (for example, data on assets, campaign finance, lobbying, procurement and audited financial information) serves the specific goal of accountable government.
Accessibility: Open data policies can make government more open by improving people’s access to the information they already know they want. To achieve better accessibility, data release prioritization should include a review of the existing volume of requests for government data, including requests from reporters, freedom of information (FOI) requests and constituent communications. To learn more about what the public wants, a good open data policy formulation process should also solicit new public input through mechanisms like public hearings, town meetings or draft policies posted online for public comment. (Though direct public participation is important, public hearings should not be the sole method of prioritizing data sets, since this mode of participation can inadvertently amplify the preferences of people who are already comfortable engaging with government.)
Policy effectiveness: An open data program can help internal and external stakeholders evaluate the quality of programmatic interventions, but only when the data relevant to evaluating those programs is released at the proper scale, in the proper format and for the relevant time period.
Political “mandates”: When a government official is elected after promising to achieve a particular policy initiative, it makes sense to prioritize information connected with tracking and evaluating that initiative. Data related to specific legislative or executive policy initiatives, or data generated incidentally by a new policy or regulation, gives the public feedback on how well an electoral mandate is being carried out.
Program efficiency: Open data can improve program-related interactions within government by creating a platform for sharing data that different agencies or departments might otherwise keep separate. To identify which data might help government improve service delivery, examine how often specific data sets are currently requested between agencies or departments. To improve program evaluation, consider what data external actors have collected that could shed light on existing concerns, and whether that data could be integrated into the government’s open data holdings.
Cost: Finally, as a practical matter, the cost of releasing individual data sets is likely to factor into decisions about release priority. Some data sets, including large ones and those containing potentially sensitive information, will require more pre-release work than others and will therefore be more costly to release. Nonetheless, while cost may be a factor in determining the priority of data release, it should not be the main consideration. A careful evaluation of an open data policy’s goals will show that the appeal of reducing short-term costs must be balanced against other considerations in order to produce a truly useful collection of public data.

For a look at some of our other updates to the Open Data Policy Guidelines, read our additional blog posts about Guidelines Version 3.0.

Sunlight Foundation
