Shouldn’t text be open data too? The search for an inclusive data definition

by Stephen Larrick

policy

Aug 10, 2016 4:55 pm

data spelled out with scrabble letters — Open data policies should be remixed and reformulated to best suit individual cities’ needs. (Photo credit: Justin Grimes/Flickr)

As our friends at the OpenGov Foundation know all too well, open access to the law is not a given in American cities: Many, if not most, of the city ordinances, executive orders, resolutions and other policy documents that govern our municipalities are not being proactively released online as open, structured, machine-readable data for anyone to access, download and reuse. We believe that open data policy can help address this problem, but we’re not seeing many real-world examples of this yet. We previously elaborated on the issues addressed by our new Open Data Policies Decoded resource, namely that open data policy in American cities is not setting a good example of open access to the law: open data policies themselves are not open and, worse, many of them may be creating barriers to municipalities sharing any policy as open data.

That’s because many of these policies, in their legal definitions of data or open data, either implicitly or explicitly exclude nonquantitative or nontabular data (such as textual data) from open data programs. In reviewing the definitions of “data,” “public information/data” and “open data” laid out in open data policies, we found that nearly half of state, county, or municipal open data policies are restrictive or somewhat restrictive of “narrative,” “non-quantitative” or “textual” data.

For instance, the 2014 ordinance of Jackson, Mich., defines “open data” as follows:

The word ‘Open Data’ shall refer to structured data (i.e. tabular or relational, such as spreadsheets and databases, and as opposed to solely textual documents) that is collected, created, or stored by the City that is a matter of public record or otherwise accessible by a FOIA request. This ordinance does not require or restrict posting public records outside of the open data portal. (emphasis added)

Did you catch that? “As opposed to solely textual documents.” Some of the most important public information, including public policy, is contained in textual documents — and this information can and should be open too.

How a flawed idea can spread

And Jackson is not alone. Consider New York City’s 2012 Local Law 11, a highly visible open data policy laying out the country’s largest municipal open data program. Because of this visibility, many of the policy ideas contained in Local Law 11 have been replicated elsewhere. Reuse of good policy ideas from New York City in other jurisdictions has largely been a good thing, helping best practices in open data to take hold throughout the country. However, one oft-replicated aspect of New York Local Law 11 is the law’s definition of “data,” which specifies that open data is “non-narrative.” By our count, this definition, which may be interpreted as discouraging the inclusion of important public information like policy text in open data programs, has spread to at least 14 other open data policies throughout the country.

Definitions of open data and related terms in cities across the U.S.

When remix culture is at its best, reuse involves tweaking the original to create thoughtful variants and improvements. One such variant to NYC Local Law 11’s definition of data was an important addition made in Montgomery County, Md. (MoCo). MoCo’s 2012 open data policy adds that beyond “non-narrative” digital information, “data” can also include “digital information … in an unstructured factual or content form … or other narrative form” (emphasis added). But MoCo does not have the visibility of New York City, and even after this meaningful addition to an evolving definition of data — an addition that potentially helps pave the way for open access to policy — no jurisdiction has replicated it since.

It’s unclear if definitions of open data are the primary reason that policy text is, by and large, not accessible, or if instead these definitions are a symptom of the same root causes. Those causes may be that “data” is often thought of as numbers, not words; that dealing with textual information is perceived to be harder than dealing with tabular or numerical information; or that digitizing the law is often contracted out to third-party vendors like LexisNexis that have an interest in keeping policy information proprietary, not in treating it as open data. It should also be noted that there are jurisdictions that, despite likely not having inclusive definitions of open data, are bucking the trend and incorporating textual policy information as part of open data initiatives. OpenGov Foundation’s America Decoded project documents some great examples of state and local governments doing just that, but these governments are the exception, not the rule. (And even when municipal codes are open, executive orders and other important administrative policy documents remain closed.) Whatever the reasons for that, official policy language that either explicitly or more subtly excludes textual information from open data initiatives certainly can’t help.

From copy-and-paste policy to policy remix

We’ve previously written about “copy-and-paste” open data policies and some of the pitfalls to that approach; even way back in 2013, we noted our concerns about the “non-narrative” qualifier in New York City’s definition of data, as well as the risk of that definition spreading to other jurisdictions. The copy-and-paste approach to open data policy can be described as un-thoughtfully grabbing what’s available. And when what’s available is limited — again, because open data policies aren’t open — this can mean replicating a flawed definition or some other aspect of an original that is less than perfect for the new jurisdiction.

Our goal for resources like our new Open Data Policies Decoded website is definitely not to foster copy-and-paste open data policy, but instead to empower open data policy remix: to help surface good policy ideas from jurisdictions beyond the major players like New York City (places like Montgomery County) and to provide the broader context of what has been done for thoughtful reuse. If we can reduce the barriers to the spread of public policy language that supports open data generally and open legislative data specifically, we’ll enable a kind of virtuous self-replication, laying the groundwork for increased usability of all kinds of policy, not just open data policy.

Sunlight Foundation
