Boilerplate Open Data Policy and Why It’s a Problem

In preparation for the revamping of our Open Data Policy Guidelines, we reviewed all twenty-three of the current local (city, county and state) open data policies on the books since their debut in 2006. These “open data policies” ranged in form from government administrative memos ordering the release of “high-value” datasets, to legislation calling for open data policy planning, to the newest member of the open data policy family, South Bend, Indiana’s executive order. Our main takeaway: There has been a lot of copying and pasting among policies, confusion over common open data terminology, and missed opportunities for information disclosure, but best practices are emerging.

Copying and pasting boilerplate legislative language is as old as law itself. In fact, legal precedent is built on throwbacks, edits, and remixes. The modern-day copy-and-paste feature has served as a technological blessing in legal matters that require a high level of repetition, such as producing demand letters for common legal claims, or, for one of Sunlight’s favorite exercises of individual rights, completing a public records or freedom of information request. However, when copying and pasting enters more nuanced areas of law, such as contract or legislation drafting, significant complications can arise. Without the proper edits or the engaged, collaborative thinking that policy drafting requires, the ever-tempting copy/paste model falls short. Below we explore just how much open data legislative language has been borrowed thus far, with examples of where the borrowing has been least helpful.

One of These Things is Exactly Like The Others

If you read the twenty-three US open data policies on the books in a row, you quickly get the impression that the provisions you are reading are ones you have read before. Going through the early policies, it’s understandable that the “whereas” legislative intent language shared common themes of economic development and transparency, but should the language be exactly the same? Quite often, this was the case. Portland’s 2009 language appeared in five subsequent policies, word for word. This in and of itself is not necessarily a bad thing: the Portland policy’s introductory language references a lot of great reasoning for creating an open data policy (civic engagement, transparency, new opportunities for information analysis, etc.). But the word-for-word copying does speak to a lack of confidence, time, and envelope-pushing going into open data policy drafting.

Technology makes identifying the original source of language easier. For example, see Portland’s whereas language through the eyes of Superfastmatch, Sunlight’s language comparison software that powers Churnalism, below:

For additional examples of technology identifying problematic copying and pasting in law, check out Chase Davis’s bill analysis using Open States data, and our piece on Stand Your Ground legislation from last year.
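
The kind of overlap detection described above can be approximated in a few lines. The sketch below is not Superfastmatch's actual algorithm; it is a minimal illustration, using Python's standard difflib module, of finding the longest word-for-word run shared by two texts. The function name and the two sample sentences are invented for this example and are not quotations from any policy.

```python
import re
from difflib import SequenceMatcher


def longest_shared_passage(text_a: str, text_b: str) -> str:
    """Return the longest run of consecutive words found in both texts."""
    # Lowercase and drop punctuation so "transparency," matches "transparency".
    words_a = re.findall(r"[a-z0-9']+", text_a.lower())
    words_b = re.findall(r"[a-z0-9']+", text_b.lower())
    matcher = SequenceMatcher(None, words_a, words_b, autojunk=False)
    match = matcher.find_longest_match(0, len(words_a), 0, len(words_b))
    return " ".join(words_a[match.a:match.a + match.size])


# Hypothetical sample sentences, written for this example only.
policy_one = (
    "Whereas open data promotes civic engagement and transparency, "
    "the city shall publish its datasets."
)
policy_two = (
    "The county finds that open data promotes civic engagement and "
    "transparency in local government."
)

print(longest_shared_passage(policy_one, policy_two))
# prints: open data promotes civic engagement and transparency
```

Run against full policy texts, a long shared passage is a strong hint of borrowed language, though it cannot by itself say which policy was the original source; for that you would also need the adoption dates.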

Definitions: to Copy or Not to Copy

New York City’s “data” definition appears in whole or in part across five policies (Madison, Montgomery County, Chicago, NY state, and Hawaii), and its “legal policy” appears word for word in three subsequent policies (Chicago, Madison, and Montgomery County). While New York City’s definition of data is thoughtful (it is lengthy and narrowly tailored to discourage .PDF files and copyright issues), it excludes much contextual information by including a non-narrative qualifier, meaning that data such as meeting minutes, reports, and procedural text are not included. Cities that copy and paste this definition should think long and hard about whether this is the route they (and their constituents) want to take in crafting their open data law, and about its impact on both the present and the future of their data releases.

We also saw moments where definitions were not copied and pasted but where consensus on open data terminology might be helpful. Take, for example, the definition of an “open format.” We saw several iterations of what this meant to cities. Sometimes it was conflated with “machine-readability,” sometimes not. Often, the non-proprietary nature of open formats was not included in the definition. Below are all the instances where “open format” appeared in current open data policies.

“Open Format” Around the US

Austin, TX

in an open format that is platform independent, machine readable, and made available to the public without restrictions that impede the re-use of that information.

Chicago, IL

freely available online in a machine-readable, open format that can be easily retrieved, downloaded, indexed, sorted, searched, analyzed and reused utilizing readily available Web search applications and software;

Cook County, IL

available online in a machine-readable, open format, that can be retrieved, downloaded, indexed, sorted, searched, and reused by commonly used Web search applications and commonly used open format software that facilitate access to, and the reuse of, such information.

Madison, WI

shall be in a format that permits automated processing

New Hampshire (state)

“Open data format” means the organization of digital data within a computer file in a manner that makes it accessible for all to implement and use in perpetuity, with no royalty or fee. The published specification for the open data format is usually maintained by a standards organization.

Philadelphia, PA

The open format will provide data in a form that can be retrieved, downloaded, indexed, searched and reused by commonly used web search applications and software.

South Bend, IN

Open Format is any widely accepted, nonproprietary, searchable, platform-independent, machine-readable method for formatting data.
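
One rough way to compare these definitions side by side is a keyword check. The sketch below is only an illustration: the three criteria and their search patterns are our own assumptions about what to look for, and the definitions are excerpts of the text quoted above.

```python
import re

# Excerpts of the definitions quoted in the list above.
DEFINITIONS = {
    "Austin, TX": "in an open format that is platform independent, machine "
                  "readable, and made available to the public without restrictions",
    "Chicago, IL": "freely available online in a machine-readable, open format "
                   "that can be easily retrieved, downloaded, indexed, sorted",
    "Cook County, IL": "available online in a machine-readable, open format, "
                       "that can be retrieved, downloaded, indexed, sorted",
    "Madison, WI": "shall be in a format that permits automated processing",
    "New Hampshire (state)": "accessible for all to implement and use in "
                             "perpetuity, with no royalty or fee",
    "Philadelphia, PA": "a form that can be retrieved, downloaded, indexed, "
                        "searched and reused by commonly used web search "
                        "applications and software",
    "South Bend, IN": "any widely accepted, nonproprietary, searchable, "
                      "platform-independent, machine-readable method for "
                      "formatting data",
}

# Assumed criteria: each label maps to a pattern covering spelling variants.
CRITERIA = {
    "machine-readable": r"machine[- ]readable",
    "nonproprietary": r"non-?proprietary",
    "platform-independent": r"platform[- ]independent",
}


def criteria_named(definition: str) -> set:
    """Return the labels of the criteria whose keyword appears in the text."""
    return {
        label
        for label, pattern in CRITERIA.items()
        if re.search(pattern, definition, re.IGNORECASE)
    }


for place, definition in DEFINITIONS.items():
    print(place, sorted(criteria_named(definition)))
```

Keyword matching is crude. New Hampshire’s “no royalty or fee” expresses a non-proprietary idea without using the word, and Madison’s “automated processing” comes close to machine readability, so a check like this is a starting point for reading the definitions, not a substitute for it.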

Copying and Pasting Results in Missed Opportunities

The most significant side effect copying and pasting has on policies is that blind spots created within one policy are copied over and over again into others. We have continued to see policies that fail to include best practice provisions, such as setting the default to open, applying provisions to contractors or quasi-governmental agencies, mandating unique identifiers, calling for complete data (bulk and APIs for accessing information), mandating electronic filing, creating processes to ensure data quality, creating a public, comprehensive list of all information holdings, mandating future review for potential changes to the policy, and, surprisingly, mandating that public or open data be posted online.

Searching a sea of already-passed legislation provides false confidence: the fact that a policy is on the books doesn’t automatically mean that it has been properly vetted or that it fits your local government’s needs best.

Which differences in how cities handle open data policy interest you the most? Tell us in the comments. And stay tuned for more open data comparison posts that distill the policies’ differences, praise their best practices, and further explain what open data marks are being missed.

Photo by evanspicturios via Flickr