Your Guidelines to Open Data Guidelines Pt. 2: Stages of Development


In revisiting Sunlight’s Open Data Policy Guidelines for our Version 2.0 release, we took a closer look at other sources for open data guidance that have been released over the years. To see a comprehensive roundup of open data guidance (complete with a timeline!), see Part 1 of Your Guideline to Open Data Guidelines: The History.

Although it has been only eight years since the first resource of this kind was created with the Open Knowledge Foundation’s Open Knowledge Definition, exploring open data guidance in its totality shows not only how much these recommendations build on each other, but also how the movement has matured. Moreover, many of these resources occupy separate (but overlapping) arenas of expertise, though an outside perspective may not immediately catch their nuances. Below, we’ll explore in more detail the three major themes of open data guidance: How to Define Open Data, How to Implement Open Data, and How to Open an Open Data Discussion.

The sequence, prevalence, and layering of these themes showcase the developmental stages of the open data movement thus far. Over the years we have seen open data advocacy emerge from its nascent, expert-driven defining period to become (quite self-referentially) a public discussion. We’ve seen the different missions of the major players in the open data movement inform nuanced definitions and implementation recommendations, and we have seen an increase in best practice assessments, academic critique, and diverging schools of thought.

To understand this larger story, let us look at each piece.

1. How to Define Open Data

Although the precedent for open data policies is firmly premised on the Freedom of Information Act and other public records laws (at least for the US), when we set out to create a timeline of open data guidance, we chose to narrow in on the genesis of the concept of “open data” specifically, which takes us to about 2005. At this time, many of the actors working on policies and technology challenges related to open data were focused on providing materials that cemented first principles and foundational characteristics for what open data should truly mean. The Open Knowledge Foundation’s Open Definition (2005) and the Open House Project (2007) both identified the foundational principles of accessibility, reusability, and the absence of technological restrictions as inherent to open information, and these core principles have been present in every definition and discussion of open data since. The OKF’s Open Definition accounted for information that was unencumbered by licensing and whose release was non-discriminatory toward any group or person, while the Open House Project called for information that was accurate and timely. All of these values laid the groundwork for the often-cited “founding father” document, The 8 Principles of Open Data (2007). These 8 principles included the principles identified above, added that the data must come from a primary source, and replaced reusability with “open formats,” further described as machine-readable and non-proprietary. These highly referenced 8 principles would serve as the basis for the open data guidance to come.

Subsequent efforts to define open data have continued to focus on the reusability of data, both technically and legally. The technical reusability (or open format) definitions have expanded as technology, and open data advocates’ understanding of it, has expanded. In addition to meaning “reusable, machine-readable, and non-proprietary,” open format (green, below) would come to also mean “not including executable content, with preserved machine-readability, utilizing Application Programming Interfaces (APIs), and signed for integrity” (Association for Computing Machinery’s Recommendations on Open Government, 2009); “using open standards” (Sunlight’s 10 principles to Open Data, 2010); “being structured, including URIs, and being linked to related data” (5 Star Data, 2010); and “including Global IDs,” according to the vertical axis of Josh Tauberer’s Open Government Maturity Model (2013).

The spreadsheet below explores how these principles overlap:

While the definition of technical reusability has expanded alongside technology, the next frontier very likely lies in pushing the legal reusability limits of public information. As policies (and the guidance they create) continue to push for open-by-default settings and data inventories (two areas for which Sunlight has advocated for years), closer attention will be paid to legal limitations and ramifications, including but not limited to redefining moments for open licensing, terms of use, proper duties of care in protecting against the mosaic effect and the dissemination of private information, as well as consequences for not opening data under open data policy laws (some day!).

2. How to Implement Open Data

As guides have continued to be written and published, the focus of both government and civil society actors has pivoted to include more information on how to implement open data policies, a move in parallel with the rise of budding analysis on policy and implementation efforts by academic and research groups. Implementation guidance has taken the form of guidebooks, example language roundups, analytic research and progress reports.

Guidebooks and toolkits to assist in open data implementation have come from open government institutions, including the Power of Info Task Force (2009), the Open Knowledge Foundation’s Open Data Handbook (2012), Socrata’s Open Data Field Guide (2013), and the World Bank Open Data Readiness Assessment Tool (2013). Many governments also produce guidebooks in conjunction with a policy release, including, at the federal level, the 2009 Open Gov Directive and the 2013 Federal Open Data Policy Memo with its associated Project Open Data; at the state level, the Open New York Provisional Open Data Handbook (2013); and at the local level, the NYC Open Data Wiki (2012) and Philadelphia’s Open Data Guidebook (2013).

Sample policy language roundups have been gathered by the Civic Commons Wiki (2010), remixed and condensed in the Open Government Initiative Model Government Directive (2011), and included in both iterations of the Sunlight Foundation’s Open Data Guidelines (2012 and 2013, respectively). Most recently, there has been an increase in evaluative papers and progress assessments, including NYC’s Digital Roadmap (2012), the City of Philadelphia Open Data Executive Order: report card 1 yr later (2013), and San Diego Regional Data Library’s Municipal Open Data Policies Report (2013).

3. How to Open an Open Data Discussion

Who writes these guidelines? The answer is varied, including governments (as noted above), civil society organizations like Sunlight and the Open Knowledge Foundation, and even private companies, like Socrata. Many wind up in the guidance drafting business because of their mission, their clients, or the need to execute policy, all in some way filling an open data void they perceived as not quite being addressed.

As a result, some of these guidebooks have been a rallying point for open data advocates to follow their own recommendations of public participatory engagement and draft policy recommendations together. Guidelines that appear earlier on in the timeline, such as the Open House Project, the 8 Principles of Open Data, and the Open Government Initiative Model Government Directive, were created through collaboration among a small group of open government advocates. Later projects have been more inclusive in their public outreach efforts. Prominent examples include the Open New York Provisional Open Data Handbook (New York state’s provisional open data guidebook) and Project Open Data (the federal government’s guidance for implementing the federal policy), both hosted on GitHub and editable by all, as well as the Open Data Stack Exchange, a question-and-answer forum dedicated to questions about data and ways to open it up.

Looking at the (still new) open data movement from 1000 feet in the air shows us a story of growth: from defining the space, to implementing the idea, to (with more public participation) practicing the defined values throughout all the steps of the process. Future definitions, implementation guidance, and processes will build on these guidelines as the quest for the most effective and open open data continues to be hashed out.