Guidelines for Open Data Policies
The Sunlight Foundation created this “living document” to present a broad vision of the kinds of challenges that open data policies can address. For more information about these Guidelines, please see this blog post.
Also available in PDF and ODT.
Open Data Policies can…
- Mandate open formats for government data. The utility, quality, and permanence of information depend on the format in which it’s published. “Open” formats are considered best practice by technology and transparency communities because of their versatility: To quote Josh Tauberer, open formats “tend to promote a wide range of uses, backward and forward compatibility, and an independence from short-term commercial interests”. In practice, these formats are machine-readable (structured), serve searchable, sortable data, and tend to be non-proprietary and/or implemented in open source software. When combined with appropriate methods of distribution, these traits maximize the degree of access, use, and quality of published information. This degree of access and interaction allows citizens and government alike to get the most out of the data.
Specific open data formats include JSON, CSV, and XML (for databases), and HTML and plain text (which are only semi-structured, but can provide more flexibility for documents). The Open States Project has explored how these formats relate to legislative data in more detail here. More details about file formats and open data best practices can be found in the Open Knowledge Foundation’s Open Data Handbook, Josh Tauberer’s Open Data is Civic Capital, the 8 Open Government Data Principles, the 10 Open Government Data Principles, and The Power of Information report.
Open format provisions can be broad or specific in scope. More broadly defined provisions are generally hard to enforce, but can still be helpful as statements of general policy. Provisions that use more specific wording (e.g. those that define both specific datasets and the formats that they’ll be released in) are more likely to cause meaningful change but take more effort to craft.
It should be noted that in this context, “data” refers broadly to any information published in electronic formats. This definition covers a variety of resources, including databases, analytics, documents, transcripts, and audio and video recordings. Although each of these examples represents a different kind of data, all can be published in an open format.
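As a rough illustration of what “machine-readable” means in practice, the sketch below publishes the same two records as CSV and as JSON using only Python’s standard library, then reads each back. The records, field names, and values are invented for this example; they are not drawn from any real government dataset.

```python
import csv
import io
import json

# Hypothetical lobbying-registration records; names and values are illustrative only.
records = [
    {"registrant_id": "R-0001", "name": "Example Advocacy LLC", "year": 2012, "amount_usd": 15000},
    {"registrant_id": "R-0002", "name": "Sample Partners Inc.", "year": 2012, "amount_usd": 8200},
]

def to_csv(rows):
    """Serialize rows to CSV text (a tabular open format)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

def to_json(rows):
    """Serialize rows to JSON text (a structured open format)."""
    return json.dumps(rows, indent=2)

csv_text = to_csv(records)
json_text = to_json(records)

# Both outputs can be parsed back with freely available tools.
# CSV carries no type information, so numbers come back as strings.
parsed_csv = list(csv.DictReader(io.StringIO(csv_text)))
parsed_json = json.loads(json_text)

print(csv_text)
print(parsed_csv[0]["name"], parsed_json[1]["amount_usd"])
```

One design point worth noticing: CSV preserves the table but not data types, so the amounts come back as text, while JSON keeps them as numbers. Either way, no proprietary software is needed to read the data.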
- Mandate the release of specific new government information. Open data laws provide an opportunity not just to update and improve access to information that is already open and/or public, but to specify new datasets and records to be published. Open data policies can create specific mandates about a variety of kinds of data: information ranging from transportation data to lobbying registration databases to the video and audio of public meetings is all fair game (see Provision 1). Careful consideration should be given to the language used to describe what data is affected. Phrases such as “high-value” or “high priority”, when used without direction or indication of how to assign value or priority, can open up loopholes that prevent or slow the release of information desired by the public. It is important, therefore, that the scope of this provision be clearly defined: as with other provisions listed here, the scope can be broad or narrow, but in the final bill or order the scope should be explicitly defined, the limitations noted, and the key agencies, committees, or other relevant agents identified. Similarly, policies should be specific about what “new” data means: some policies require new data to be created, collected, and released for the first time, whereas others identify existing datasets and (newly) mandate their release.
Other provisions noted on this page address how to bring the public into the process of determining how datasets can be prioritized for release.
- Mandate electronic filing. Many existing disclosure requirements were created as inefficient, paper-based requirements and should be updated to require electronic filing, as long as the filers can be reasonably expected to have access to the necessary technology. Electronic filing requirements save money, make real-time disclosure possible, and allow structured data to be created, while paper filings make reuse and analysis more difficult.
Electronic filing is currently required in various places across the United States federal government, including the Federal Election Commission, where House candidates are required to file disclosures electronically but Senate candidates are notably exempt. Similar requirements can be found throughout the states: for example, in 2012, Delaware passed a bill (SB 185) mandating that all lobbyist registration and disclosure be filed electronically by default.
- Require any public information to be posted on the Internet. The government makes tremendous amounts of information available to the public, but only a small subset is available on the Internet even as more people look online first to find these records. To close this gap, public information should be published online in a timely fashion subject only to common-sense exceptions (such as redacting personally identifiable information in certain contexts). Implementation could be by legislative action or executive directive. The “Public Online Information Act” has been introduced on the federal level to require public information to be available online.
- Mandate continuous publication and updates to data. It is not enough to mandate the one-time release of information: data is often created on an ongoing basis and should be released the same way. A one-time release of data is in some sense incomplete as soon as additional information is generated. Therefore, to ensure that the information published is as accurate and useful as possible, specific requirements should be put in place to make sure that government data is released as quickly as it is gathered and collected (in “real time”). This kind of rapid publishing becomes less of a burden when combined with other measures for online publishing, such as electronic filing (Provision 3), data portals (Provision 13), and APIs (Provision 8).
- Create permanent, lasting access to government data. Information released by the government should be sticky: Once released, it must remain “findable” at a stable location or through archives in perpetuity. Although portals and websites can be vehicles for accessing this data over the long term (see Provision 13), it is critical that the data’s permanent release and accessibility be defined so as to apply to the data itself, not just the means of access.
Provisions relating to permanence can also be expanded to relate to updates, changes, or other alterations to the data. For best use by the public, these changes should be documented to include appropriate version-tracking and archiving over time. These provisions should build on the strengths of existing records management laws and procedures.
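To make version-tracking concrete, here is a minimal sketch, assuming a publisher that appends each changed release to a history rather than overwriting the previous file. The `ReleaseArchive` class and its record fields are hypothetical names invented for this illustration; the SHA-256 fingerprint simply lets users confirm which version they downloaded.

```python
import hashlib
from datetime import date

class ReleaseArchive:
    """Hypothetical append-only archive: every changed release is kept, none overwritten."""

    def __init__(self):
        self.versions = []  # full release history, oldest first

    def publish(self, content: bytes, released_on: date) -> bool:
        """Record a new version if the content changed; return True if recorded."""
        digest = hashlib.sha256(content).hexdigest()
        if self.versions and self.versions[-1]["sha256"] == digest:
            return False  # identical to the latest release, so nothing new to archive
        self.versions.append({
            "version": len(self.versions) + 1,
            "released_on": released_on.isoformat(),
            "sha256": digest,  # lets users verify the copy they downloaded
            "content": content,
        })
        return True

    def get(self, version: int) -> bytes:
        """Earlier versions stay retrievable after later updates."""
        return self.versions[version - 1]["content"]

archive = ReleaseArchive()
archive.publish(b"id,amount\n1,100\n", date(2012, 1, 1))
archive.publish(b"id,amount\n1,100\n", date(2012, 2, 1))  # unchanged: not recorded
archive.publish(b"id,amount\n1,120\n", date(2012, 3, 1))  # correction: new version

for v in archive.versions:
    print(v["version"], v["released_on"], v["sha256"][:12])
```

The point of the design is that a correction adds a documented version while the original release remains available, which is what the permanence and archiving provisions ask for.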
- Publish bulk data. Bulk access is a simple, but effective means of publishing datasets in full, giving the public the ability to download all the information stored in a database at once. Bulk downloads are often the simplest, most direct way of maximizing reuse and analysis of a dataset. Although they aren’t absolutely necessary for the release of bulk data, data portals can be helpful indexes of specific sources for bulk data downloads.
Daniel Schuman and Eric Mill explore bulk data further as it relates to legislative data and the federal THOMAS system in this blog post.
- Create public APIs (Application Programming Interfaces) for accessing information. Government bodies can develop APIs, or Application Programming Interfaces, that allow third parties to automatically search, retrieve, or submit information directly from databases online (See Josh Tauberer’s Open Data is Civic Capital). Navigating requirements for bulk data and APIs should be done in consultation with people with technical expertise and also likely users of the information. For a lengthier discussion of the benefits of APIs, see the recently developed Federal Web Policy, and for a slightly more critical take on APIs (and their relationship to bulk data), see this post by Eric Mill of the Sunlight Foundation.
- Remove restrictions for accessing government information. Open data that is out of reach of the public is hardly open. To provide truly open access, you must both provide the right to reuse government information (explored in Provision 6) and remove arbitrary technical restrictions, such as registration requirements, access fees, and usage limitations, among others. Whether these technical restrictions have been specifically put in place (e.g. access fees) or are the accidental result of the choice of data format or software (e.g. usage limits), it is appropriate for an open data policy to address and remove these barriers to access. The aim should be to provide broad, non-discriminatory access so that any person can access the data at any time without having to identify him/herself or provide any justification for doing so. More detailed exploration of these limitations can be found in Josh Tauberer’s book Open Government Data.
- Remove restrictions on reuse of information. Most government restrictions on the reuse of government information serve no purpose but to restrict the public value of important information. If information is to be truly public, there should be no license-related barrier to the public’s interaction with or use of that information.
Outside of data legally exempted from public use or access because of privacy or security restrictions (see Provisions 11 and 12), to be fully “open,” public government information should be released into the public domain and clearly labeled as such. At a minimum, licenses that grant the right to use, download, and reproduce government data can be applied. The fewer restrictions the better. Opening data into the public domain (or at least into free public use) removes arbitrary barriers to information access (explored further in Provision 9), helps disseminate knowledge, aids in data preservation, promotes civic engagement and entrepreneurial activity, and extends the longevity of the technological investments used to open information in the first place.
- Appropriately safeguard sensitive information. Open data policy should be complementary to pre-existing legislation and directives about access to public information (see Provision 30), which means taking into consideration pre-existing protections for sensitive information for privacy, security, or other reasons. While these protections should be upheld, careful thought should be given to the language used to describe what (if any) additional information will be exempt from the policy, as overbroad terms can create loopholes that undermine the soundness of provisions requiring openness.
- Require exemptions to open data policy to be balance-tested against the public interest. Exemptions to disclosure are a necessary component of many transparency requirements. Unfortunately, these exemptions are often crafted as blanket categories for entire types of information, without consideration for competing interests. Valid privacy and security concerns should be addressed through provisions that recognize the public interest in determining whether information will be disclosed or not. For example, rather than saying “information relating to X topic is exempt from disclosure”, provisions should require that “information relating to X topic is exempt from disclosure if the potential for harm outweighs the public interest in its disclosure.” Public interest here does not mean public attention, but instead refers to interests like democratic accountability, justice, and effective oversight.
- Create a portal or websites devoted to specific issues related to data publication or specific policy arenas. Data portals and similar websites can facilitate the distribution of open data by providing an easy-to-access, searchable hub for multiple datasets. At their best, these portals or hubs promote interaction with and reuse of open data (see the note about bulk data in Provision 7) and provide documentation for the use of information (see Provision 8). Portals can be generalized (such as Data.gov, a general “open data portal”) or specific (e.g. a spending or ethics portal), and can vary in their sophistication.
Portals and other related websites also provide governments with the opportunity to go into detail about issues and policies related to their commitment to openness and transparency. To facilitate their “findability”, these websites should allow indexing and searching by third parties (such as search engines).
Some examples of websites created through different kinds of open data policies include:
- San Francisco’s Data Portal: https://data.sfgov.org/
- Austin’s Open Data and Open Government Hub: http://austintexas.gov/austingo2.0
- Missouri’s Accountability Portal: http://mapyourtaxes.mo.gov/MAP/Portal/Default.aspx
- NASA’s Open Government Initiative Plan: http://www.nasa.gov/open/
- USASpending.gov: http://usaspending.gov/
- Recovery.gov: http://recovery.gov
- Create or explore potential public/private partnerships. Partnerships can be useful in the effort to increase awareness of the availability of open data and in connecting government information to that held by non-profits, think tanks, academic institutions, and others. Ed Mayo and Tom Steinberg have noted that such partnerships can aid civic participation and help identify gaps in service delivery, among other benefits. However, poorly planned public/private partnerships run the risk of subsidizing private sector actors at the expense of the public. (For example, see the Government Accountability Office’s digitization project described here.)
- Create contests or other events focused on the use of government data. Like partnerships (Provision 14), contests and events (both held in real space and online) are an effective mechanism for generating use of, interest in, and attention to the government’s open data resources and repositories. Further, hosting events and facilitating both creative and practical uses of government data can help spur civic innovation and build communities around information. Events can range from barcamps, hackathons, and apps contests to town halls, webinars, and public hearings, with both technical and non-technical communities in mind. Outside of structured, government-run events, participation in developer communities through listservs and meetups could also be explored. (For example, see the Health Data Initiative Forum, the Apps for Democracy contest, or the Illinois Reform Commission’s listening sessions.)
- Require digitization and distribution of archival materials. Open data policies can address not only information currently or soon to be available in an electronic format, but also undigitized archival material. See for example Vancouver’s Open Data motion, which critically notes not only the importance of thoughtful digitization of archival information but the imperative to release this data to the public, ideally eventually in the same formats and in the same locations as modern data.
- Create processes to ensure data quality. Data quality will not be ensured through data release alone: efforts need to be made to keep the data up to date, clean, accurate, and accessible. In the executive memorandum establishing that Washington, DC would share internal data on DC.gov, the city specified not only the need to maintain data quality but also, broadly, the processes required to do so and the responsibilities of the agencies involved. Other approaches to ensuring data quality include assigning specific staff responsible for maintenance (see the Open Government Directive on financial data (Section 2.a.)) and creating other audit processes.
In any case, data quality concerns should not be accepted as an excuse for exempting or restricting the release of information; they should instead be treated as a challenge that becomes clearer and easier to address once data is released. Data with serious accuracy and quality concerns should be adequately documented to avoid creating confusion or misinformation.
Similarly, public data reporting streams that are separate from what is used within government should be avoided whenever possible, as redundant or parallel data streams can create opportunities for data quality to suffer.
- Create a public, comprehensive list of all information holdings. Government bodies often do not know what information they have. Open data policies should require a full public listing of government information. This comprehensive listing empowers policymakers and administrators to determine whether information is being appropriately managed and empowers public oversight of those determinations. Publicly accounting for agency information helps ensure that information is managed to benefit the public interest, can create efficiencies among government departments, and empowers journalists and policymakers. To provide up-to-date information, agencies can also be required to regularly audit their information holdings.
In an Open Data White Paper released in June 2012, the UK Ministers of State for the Cabinet Office and Paymaster General noted among a list of open data strategies and principles that “Public bodies should maintain and publish inventories of their data holdings” (Section 2.46, Principle 13). The Obama Executive Memo on Regulatory Compliance Data and, in particular, the Department of Transportation’s index of major datasets (“Regulatory Enforcement and Compliance Data”) are other examples of this provision in action. For more details, see also this blog post from John Wonderlich explaining the need for indexes.
- Mandate the use of unique identifiers. Unique identifiers within datasets empower analysis and reuse by allowing disparate datasets to be combined and to be more carefully mapped to real-world entities. Without unique identifiers, some analysis can become difficult or impossible, since similar names may or may not refer to the same entities. Importantly, identifiers should be non-proprietary and public. A typical example of where unique identifiers are often required is found in lobbying disclosure. For more information, see also this list of extensive resources about the need for unique identifiers for corporate entities.
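The difference an identifier makes shows up as soon as two datasets are combined. The sketch below joins two hypothetical datasets (lobbying spending and contract awards) that spell the same organizations differently; the organization names, amounts, and `entity_id` values are all invented for illustration.

```python
# Two hypothetical datasets describing the same organizations.
# Names are spelled inconsistently, as often happens across agencies;
# the entity_id values are invented, illustrative identifiers.
lobbying = [
    {"entity_id": "E100", "name": "Acme Corp.", "lobbying_usd": 50000},
    {"entity_id": "E200", "name": "Globex Inc", "lobbying_usd": 12000},
]
contracts = [
    {"entity_id": "E100", "name": "ACME Corporation", "contract_usd": 2000000},
    {"entity_id": "E200", "name": "Globex, Inc.", "contract_usd": 750000},
]

def join_on(left, right, key):
    """Inner-join two lists of records on a shared key; returns matched pairs.

    Assumes each key value appears at most once in `right`.
    """
    index = {row[key]: row for row in right}
    return [(row, index[row[key]]) for row in left if row[key] in index]

by_name = join_on(lobbying, contracts, "name")      # exact name match fails on every row
by_id = join_on(lobbying, contracts, "entity_id")   # identifier match links every row

print(len(by_name), len(by_id))
for lob, con in by_id:
    print(lob["entity_id"], lob["lobbying_usd"], con["contract_usd"])
```

Matching on names would require guesswork (fuzzy matching, manual review) that can silently link the wrong entities; a shared public identifier makes the join exact.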
- Require the publishing of metadata or other documentation. Metadata and other documentation about the data provided by the government can be useful to the public and government alike. Notations such as these add helpful context about the data’s creation that will aid in the public’s use of that information and support current and future archival and data quality efforts. The Open Data White Paper released by the UK’s Ministers of State for the Cabinet Office and Paymaster General in June 2012 notes that the UK data portal (www.data.gov.uk) already includes “basic metadata about all its datasets, including timing and geographical scope” as well as “a link to a departmentally supplied description of the data and details of a contact point within the department who data users can ask for further details” (Section 2.46, Principle 14).
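A metadata requirement is easiest to enforce when it names specific fields. The sketch below assumes a hypothetical minimum field set modeled loosely on the items the White Paper lists (timing, geographical scope, a description link, and a contact point) and flags any dataset whose record leaves one empty. The field names, dataset title, and URL are illustrative, not taken from data.gov.uk.

```python
import json

# Hypothetical minimum metadata fields, loosely following the White Paper's list:
# timing, geographical scope, a description link, and a contact point.
REQUIRED_FIELDS = (
    "title",
    "temporal_coverage",
    "geographic_coverage",
    "description_url",
    "contact_email",
)

def missing_metadata(record):
    """Return the required fields that are absent or empty in a metadata record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]

dataset_metadata = {
    "title": "Lobbyist Registrations",
    "temporal_coverage": "2010-01-01/2012-12-31",
    "geographic_coverage": "Statewide",
    "description_url": "https://data.example.org/lobbyists/about",
    "contact_email": "",  # left blank here to show the check catching it
}

print(missing_metadata(dataset_metadata))
print(json.dumps(dataset_metadata, indent=2))
```

A check like this could run before each release, so incomplete documentation is caught at publication time rather than discovered later by users.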
- Require the publishing of code. Not only the data but also the code used to create government websites, portals, tools, and other online resources can provide internal and external benefits, and is often valuable open data itself. Governments should employ open source solutions whenever possible to enable sharing and make the most of the opportunities provided. The Consumer Financial Protection Bureau (CFPB) began publishing open code on the social code site GitHub in 2012, stating that doing so helped it fulfill its agency mission and facilitated its technical work. (More information is available in the announcement blog post on the CFPB’s website.)
- Set appropriately ambitious timelines for implementation. Setting a clear deadline can demonstrate the strength of a commitment and can help translate that commitment into results. Deadlines also help to identify failures clearly, opening the door to public oversight. Relevant actors should be given enough time to prepare for the changes brought on by the new open data policy, but not so much time that the policy becomes inoperable. The timeline should be firm, provide motivation for action, and have actionable goals that can be used as a metric for compliance. These goals or checkpoints can include qualitative and quantitative measurements.
- Ensure sufficient funding for implementation. Like any other initiative, implementing an open data policy should be done with an eye on long-term sustainability. One way to do this is to consider funding sources for the implementation of the policy as well as its future maintenance. Sufficient funding can mean the difference between successful and unsuccessful policies.
For example, in 2011, the Electronic Government Fund, which supports Data.gov, the IT Spending Dashboard, and USASpending.gov, among other programs, was cut from over $34 million to $12.4 million. Without the work of the advocacy community, funds would have dropped as low as $8 million. This dramatic change in funding has continued implications for federal data, some of which can be explored in this tag on the Sunlight Foundation blog. By contrast, in 2012, California approved a bill (SB 1001) to pay for maintenance, repair, and improvements to its Cal-Access public disclosure database by increasing the registration fees for those engaged in lobbying and for political action committees.
- Empower the creation of binding regulations to implement the new policy. While some questions may defy easy treatment in the process of creating an open data policy, specific officials can be appropriately empowered to create regulations or guidance to ensure a strong, reliable policy. For example, in the proposed Public Online Information Act, a central regulator is empowered to create and set data standards. Similarly, the Dodd-Frank financial reform law empowers regulators to require public reporting of royalty payments made by the extractive industries (see Section 1504). A similar approach is taken in the DATA Act.
- Tie contract awards to transparency requirements for new systems. Existing procurement, contracting, or planning processes can be used to create new defaults and requirements for IT systems and databases, baking open data requirements into new systems as they are planned. See, for example, the White House’s Digital Government Strategy, which proposes the creation of similar new requirements.
- Stipulate that provisions apply to contractors or quasi-governmental agencies handling public data. The government often uses third party entities or contractors to handle, research, or generate government information, and the use of outside services should not necessitate sacrificing important public protections.
Similarly, these public protections should generally apply to quasi-governmental agencies and other similar actors, such as multi-state agencies, government-sponsored entities, publicly-funded universities, and self-regulatory organizations (like FINRA).
- Create new legal rights or other legal mechanisms to empower the public. An open data policy can create mechanisms that allow individual members of the public to play a dynamic role in policy oversight and compliance. For example, the right to sue serves as the ultimate enforcement mechanism of the Freedom of Information Act (Section 4.K, Page 33); some countries (like Canada) have FOI ombudsmen with special legal enforcement powers; and some countries also have special anti-corruption agencies.
- Appeal to values and goals, such as accountability, efficiency, employment and commerce, innovation, civic engagement, and public services provision. Publishing open data has many practical and normative implications which can be noted and explored in the text of the open data policy. These values and goals can be noted for the record as part of the policy.
- Reference and build on existing public accountability policies, like Freedom of Information Laws, Open Meetings Acts, Open Records Acts, Ethics Protections, Campaign Finance, and Lobbying Disclosure Laws. Open data policies should be informed by provisions already on the books, building on precedent for opening information and taking advantage of pre-existing laws, executive orders, and other policies that establish and defend public access and define standards for information quality, disclosure, and publishing.
- Incorporate public perspectives into policy implementation. Implementing the details of an open data policy will benefit from public participation, especially since open data policies can have effects government-wide and also have consequences for a variety of different stakeholder groups. Formal mechanisms for collaboration can include public hearings, draft proposals, and online resources, like wikis and email lists. For example, in 2012, New York City created a wiki to encourage collaborative input on the open data policies, standards, and guidelines that would be enacted as part of its then-newly passed open data law.
- Require analytics about the use of open data to be published publicly. Statistics about the use of and interaction with government data can be mandated as part of an open data policy and can strengthen the goals of the policy. For example, the New York State Senate publishes its monthly website analytics as part of its open data portal (see, for example, “NY Senate Web Presence May 2012”).
- Mandate future review for potential changes to this policy or law. Just as publishing open data is an ongoing process that requires attention to its quality and upkeep (Provision 27), so too does the policy that establishes it. In order to keep up with the times, current best practices, and feedback from existing policy oversight, open data policies should be written in a way that makes them open to future revision. Open data policies should acknowledge that the context in which they operate changes rapidly, and that they will likely need sustained attention to remain relevant.