NYC’s Plan to Release All-ish of Their Data

by nycdata

On Monday, September 23rd, New York City released a plan to, as Government Technology put it, open “all” its data. Pursuant to section two of Local Law 11 of 2012 (§23-506 of the Administrative Code of the City of New York), also known as NYC’s Open Data Policy, the long-awaited agency compliance plan does complete the monumental task of listing all NYC agency public datasets, with scheduled release dates no later than the end of 2018. But there are ways it could have been more inclusive and comprehensive.

Here’s the plan:

(The dataset list is also available in CSV, JSON, PDF, RDF, RSS, XLS, XLSX, and XML formats on NYC’s newly revamped open data portal.)

Why the Plan Is Great

Creating a public inventory of government-owned information is fundamental to data management, empowers public and media oversight, and supports thoughtful, organized prioritization schedules. It’s also something we haven’t seen done before at the local U.S. level. The Sunlight Foundation has consistently called for open data laws and policies, and their implementation, to require a complete inventory of information holdings. Even with the passage of the U.S. federal Open Data executive order on May 9th and its accompanying guidance, which includes a provision requiring U.S. agencies and interagency groups to create and maintain an enterprise data inventory, these calls have thus far gone unanswered. NYC’s release of its Open Data Plan and agency public data holdings is, to date, the most comprehensive list of government-held information compelled by an open data policy. Moreover, New York City is the first U.S. government to complete a comprehensive dataset inventory, beating counties, states, and the federal government to this open data milestone. To get a feel for the landscape of public information audits and indexes, check out our round-up of international and federal agency examples (and plans for other local inventories).

NYC’s Open Data Plan, which identifies and schedules for release over 400 datasets across 44 agencies, also includes new datasets that bolster government accountability and transparency, such as Department of Correction and Department of Probation data, ACRIS (Automated City Register Information System) data, Conflict of Interest data, OMB datasets, and daily procurement bid and award data. This level of data release from a local government (in 2014!) is groundbreaking and should be celebrated.

Why This Plan Could Be Better

NYC’s open data plan is an important milestone, but it bears noting that while the plan is massive (434 datasets, to be exact) and includes much of NYC’s data, it does not include all of the data relevant to NYC. The data inventory covers structured data controlled by New York City agencies, but a lot of information relevant to municipal processes falls outside these bounds. Ninety-three of the 434 datasets listed (about 21 percent) are currently scheduled for release on 12/31/2018, the latest date Local Law 11 allows. This makes us wonder: Is this date a failure of ambition or one made in vain? Without transparency into how the scheduling criteria were applied, it is impossible for citizens, journalists, and other third-party watchdogs to judge how ambitious the scheduled release dates are or are not.

A closer look at the open data plan, read in conjunction with Local Law 11 of 2012, shows that the plan only applies to certain kinds of data, doesn’t identify what data exists that is not fit for release, misses the opportunity to involve the public in the prioritization process, and obscures just how lengthy the process of releasing NYC datasets will be.

Let’s break these concepts down further and explore why the scope of NYC’s current open data plan could be more comprehensive and why, in its current form, it is a far cry from “all of NYC’s data.”

The Plan Is Only as Good as the Law Behind It

NYC’s open data policy (§23-501 b.) strictly and carefully defines data as what a layperson might describe as numbers in columns. More specifically, the legal definition of NYC’s data excludes narrative forms and image files (such as designs, drawings, maps, photos, or scanned copies of original documents) from its purview. This narrow definition of data gives NYC built-in protection against certain copyright issues and discourages the use of narrative records saved as PDF files (often the enemy of open data advocates), but it also severely limits what government information can be released to the public. This definition explains why agency reports and narrative procedures, such as crime reports, the zoning process, city council meeting minutes, etc., are not included in this dataset release schedule and are not, as of this writing, hosted on the NYC open data portal. Sunlight advocates for complete datasets, with context, to make government as legible as possible and to maximize accountability, and much of the time this information is held by the government in forms that are not considered for release, or inventory, under NYC’s current open data policy.

Further, NYC’s open data policy only applies to data controlled by city agencies, where “agency” (§23-501 a.) is defined as “an office, administration, department, division, bureau, board, commission, advisory committee or other governmental entity performing a governmental function of the city of New York.” In practice, this means that NYC Board of Elections data, New York City Housing Authority data, and Metropolitan Transportation Authority data, to name a few, are not covered by the policy or included in the dataset release schedule. While limiting the scope of open data policies to city-controlled agencies is legally and practically easier to orchestrate, it also limits what city-relevant information is released to the public, leaving out information key to making decision-making legible, creating blind spots in accountability, and (for those not familiar with the intricacies of their municipal administration) setting misleading expectations about what releasing “all” the data means.


NYC’s agency compliance plan provision empowers agencies to prioritize their dataset release scheduling by considering the following 5 factors about each dataset:

Can the data…

  • (1) be used to increase agency accountability and responsiveness;

  • (2) improve public knowledge of the agency and its operations;

  • (3) further the mission of the agency;

  • (4) create economic opportunity; or

  • (5) respond to a need or demand identified by public consultation.

At a high level, allowing agencies to prioritize dataset release without public input (even if public knowledge is a consideration) misses an opportunity to let public demand for datasets shape the release schedule. Philadelphia, Montgomery County, and even NYC in other contexts, such as dataset nomination, offer examples where public input has played a role in timing dataset releases. Since the plan does not explain how these prioritization factors were applied, it is unclear (especially in cases where the release date is the same across all of a particular agency’s datasets) whether a date was chosen arbitrarily or how the 5 factors played into the decision-making.

Lastly, this initial plan fails to articulate what won’t be opened (and why), what can’t practically be opened by 2018 (and why), and how the scheduled timelines are (or are not) ambitious. By failing to outline the boundaries of the data that qualifies for release and the limits of NYC agencies’ capacity to release it, the plan silently implies that all data that could be public is included and can be released by the 2018 deadline. Such implications pose challenges for public oversight (and for agency implementation).

***

New York City has accomplished with this dataset inventory what no other U.S. government entity has yet completed, but it is important to understand the limitations of this inventory and how transparency could have been injected into the process to improve it. As we look ahead to the forthcoming U.S. federal and United Kingdom inventory efforts (both scheduled for November of this year), as well as those in Montgomery County (scheduled for 2014), San Francisco, and Chicago, we hope to report that these entities have learned from New York’s experience and raised the bar even higher.

As for New York City, we hope that the yearly open data progress reports, which the law requires after this initial plan starting in July, will clarify what is missing and why, and why release dates have been prioritized and scheduled as they have.

Is there data missing from NYC’s inventory list that you would have liked to see? Let us know in the comments.

Photo by nycdoitt via Flickr