New Data on Data.gov

Good news and Bad news from Data.gov

Looks like Data.gov has added a whole bunch of new feeds: they’re up from 47 to 87 in two weeks. Not a bad start. Most of the new feeds come from the IRS, and they look to be interesting data: 990 forms from 501(c)(3-9) organizations.

That’s the good news.

The bad news? It’s pretty bad, so hold on to your britches. All the data from this source is labeled as CSV files. But when you look closely, they’re not. They’re .exe files. See the Tax Year 2005 SOI Exempt Organization Study, for instance. Pesky .exe files! This isn’t any good: data that comes from Data.gov ought to at least be compressed in an open standard format. .zip, .gz or even .bz2 files are fine. The problem is, those of us without Windows (we here in the Labs operate on Macs and Ubuntu boxes) really can’t get at this data easily.

So, in short: Yay, new data! Booo .exe format!

Even worse, the data, once extracted, doesn’t even seem to be CSV. It comes as .flat files in a custom format. At least they come with some documentation on how to parse them.
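For the curious, here is a rough sketch of what parsing such a file might look like in Python. It assumes the .flat files are fixed-width text, which we have not verified for every file. The field names and column positions in LAYOUT are made up for illustration; the real ones have to come from the documentation that ships with the data.

```python
# Hypothetical layout: (field name, start column, width).
# These positions are invented for illustration; take the real ones
# from the documentation bundled with the data.
LAYOUT = [
    ("ein", 0, 9),
    ("name", 9, 30),
    ("total_revenue", 39, 12),
]


def parse_flat_line(line, layout=LAYOUT):
    """Slice one fixed-width line into a dict of stripped strings."""
    return {
        name: line[start:start + width].strip()
        for name, start, width in layout
    }


def parse_flat_file(lines, layout=LAYOUT):
    """Parse an iterable of lines, skipping blank ones."""
    return [
        parse_flat_line(line.rstrip("\r\n"), layout)
        for line in lines
        if line.strip()
    ]
```

With a real file you would pass an open file object to parse_flat_file and get back one dict per record. Every value stays a string, so number conversion is left to the caller.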

In addition, it looks like the Data.gov team took at least one of our suggestions and added the Data.gov catalog itself as a new source. Though the description says it is an XML file, that’s not true: it comes in just one format, CSV. Interestingly enough, this means a contest entry could build a better Data.gov from the data catalog itself, and maybe expand it to include data sources from states or other branches of government.
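As a starting point for that kind of entry, here is a small sketch in Python that reads a catalog CSV and groups its rows by one column. We are guessing at the shape of the file: the column names used below ("agency", "title") are hypothetical, so check the header row of the real catalog before relying on them.

```python
import csv
import io
from collections import defaultdict


def group_by_column(csv_text, column):
    """Group the rows of a CSV (given as text) by the value in `column`.

    Returns a dict mapping each distinct value to a list of row dicts.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    groups = defaultdict(list)
    for row in reader:
        groups[row[column]].append(row)
    return dict(groups)


# Made-up sample rows; "agency" and "title" are hypothetical column names.
sample = "agency,title\nIRS,SOI Exempt Organization Study\nEPA,Some Dataset\n"
by_agency = group_by_column(sample, "agency")
```

From there, a "better Data.gov" could render each group as its own page, or merge in rows from state catalogs that follow the same layout.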

P.S. For you Mac users, StuffIt Expander can open WinZip-compressed .exe files.
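If you would rather script it, here is a minimal sketch in Python. It assumes the .exe is a zip-based self-extractor, meaning an ordinary zip archive with a Windows program stuck on the front. We have not confirmed that for every file on the site. Python's zipfile module can generally read archives with extra data in front of them. The function name extract_sfx is ours, made up for this example.

```python
import zipfile


def extract_sfx(source, dest_dir):
    """Extract a zip-based self-extracting .exe into dest_dir.

    `source` can be a file path or a binary file object.
    Returns the list of member names that were extracted.
    Assumes the .exe wraps a plain zip archive; raises ValueError if not.
    """
    if not zipfile.is_zipfile(source):
        raise ValueError("no zip archive found inside this file")
    with zipfile.ZipFile(source) as archive:
        names = archive.namelist()
        archive.extractall(dest_dir)
    return names
```

Calling extract_sfx with the path of a downloaded .exe and an output directory should leave the extracted files in that directory. If the function raises ValueError, the file is probably some other kind of installer and this trick won't help.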