Defining “High Value Data” Is Hard. So Let’s Not Do It.

by Tom Lee

technology

Mar 19, 2011 1:37 pm

Yesterday I had the pleasure of sitting on a Sunshine Week panel moderated by Patrice McDermott, along with CRP’s Sheila Krumholz, Pro Publica’s Jennifer LaFleur and Todd Park of HHS. We touched on a lot of different topics, including one that by now is probably familiar to anyone who’s followed the progress of the Open Government Directive: frustration with the vagueness of the term “high value datasets.” Various organizations, Sunlight included, have criticized the administration for releasing “high value” datasets that actually seem to be of questionable usefulness.

Jennifer coined a formulation of what she considers to be a high value dataset, and it attracted some support on the panel:

“Information on anything that’s inspected, spent, enforced, or licensed. That’s what I want, and that’s what the public wants.”

I don’t think this is a bad formulation. But while I’m not eager to tie myself into knots of relativism, we should keep in mind the degree to which “high value” is in the eye of the beholder. It’s clear how Jennifer’s criteria map to the needs of journalists like those at Pro Publica. But if you consider the needs of someone working with weather data, or someone constructing a GIS application (two uses of government data that have spawned thriving industries and generated a lot of wealth), it’s obvious that the definition isn’t complete. To use a more melodramatic example, if World War III broke out tomorrow, a KML inventory of fallout shelters could quickly go from being an anachronism to a vital asset.

The point isn’t that Jennifer’s definition is bad, but rather that any definition is going to be incomplete. The problem isn’t that agencies did a bad job of interpreting “high value” (though to be clear, some did do a bad job); rather, it’s that formulating their task in this way was bound to produce unsatisfactory results.

We’re going about this backward. Ideally, we’d be able to start by talking about what the available datasets are, not by trying to figure out what we hope they’ll turn out to be. Government should audit its data holdings, publish the list, then ask the public to identify what we want and need. This won’t be easy, but it’s far from impossible. And any other approach will inevitably leave the public wondering what we’re not being told.

Sunlight Foundation
