Get your act together, Data.gov

by

On May 21st, we launched Apps for America 2: the Data.gov Challenge, the very same day that Federal CIO Vivek Kundra & Company launched Data.gov. On May 26th, Kundra announced that hundreds of thousands of data sources were just around the corner.

It is now November 13th, 2009, and the Raw Data Catalog on Data.gov stands at an even 600 feeds. What’s worse, the data is chopped into small pieces, which makes 600 a less impressive number than it sounds. For instance, nearly half the datasets (293 of 600) in the Raw Data Catalog are Toxics Release Inventory datasets, split by individual state and outlying territory and then split again by year, from 2005 through 2008. This isn’t living up to expectations, or even keeping in line with public statements. This needs to be fixed.

They’ve broken Geodata out into its own section, and it contains 110,076 datasets. The same problem exists in the Geodata catalog, though. For instance, here are 387 datasets for the shapefile of Adams County, broken up by year, then by address ranges, blocks and county subdivisions.

The amount of public data that government holds is unbelievable. Like cash in a stimulus package, public data can create not just jobs but entire industries. For example, just a week after Data.gov was launched, I wrote a post about what I’d change about it, including a list of data that could be added to the catalog. I’m saddened that much of it has not been added. What’s weird is that Data.gov is a catalog, not a repository, so adding the data isn’t a huge technical burden but an editorial one. Why isn’t this happening faster? Why isn’t Data.gov living up to its promise?

I can think of a few reasons:

Politics

I’d imagine that the Data.gov team cannot just link to the data on its own, and that it is instead working with the different agencies in the executive branch so that they add data themselves, rather than running an internal editorial team. It may be that some agencies are just not embracing the program, and without significant pressure or incentive, they just won’t.

Budget

The Data.gov team may not have enough budget to maintain consistent operations or give the site sustained attention. For all we know, the only person in government paying consistent attention to Data.gov could be Kundra himself, though it seems that George Thomas, the Cloud Computing Technical Architect for HHS and former Enterprise Chief Architect in the office of the CIO, has been paying attention to it lately.

A Pending Overhaul

Perhaps, like Recovery.gov, the first stab at Data.gov was a proof of concept, something put out in order to justify a bigger play. One can hope. Knowing that there’s a second iteration of a “Concept of Operations” (what Mr. Thomas was referring to in his tweet) is encouraging.

Whatever the case may be, work on Data.gov needs to continue, because so far it hasn’t gone far enough. While I applaud the administration for at least launching a data catalog, Data.gov needs to step up its game if it is to be considered a real success. It may be that the UK shows us how it’s done.