Late yesterday afternoon, Data.gov went from 81 feeds to 261, and the EPA overtook the USGS as the agency providing the most data. The EPA added 180 new data files: the Toxics Release Inventory data for each state and territory, as well as for federal agencies, for 2005, 2006 and 2007.
This is interesting stuff: dozens of CSV files (still in .exe compressed archives, ick) that speak to where corporations and government are managing toxic chemicals. But it isn’t a clear win. The data comes as poorly documented byte delimited text files. We do have some headers to get us started, but no real description of the actual files.
If you do end up working with this data for your [Apps for America 2: The Data.gov Challenge] entry, make some notes on how you parsed the data and let’s create our own documentation for this data source.
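One way to start those notes is to profile each file: list every column from the header row, count how often it is filled in, and grab a few sample values. Here is a minimal sketch in Python. It assumes the file is already extracted from its archive, has a header row, and uses a single-character delimiter. The function name and the column names in the sample are made up for illustration; they are not the real TRI headers.

```python
import csv
import io


def profile_columns(lines, delimiter=",", sample_size=3):
    """Summarize each column of a delimited file that starts with a header row.

    Returns {column name: {"filled": count of non-empty cells,
                           "samples": up to sample_size distinct values}}.
    Duplicate header names would be merged into one entry.
    """
    reader = csv.reader(lines, delimiter=delimiter)
    header = next(reader)
    profile = {name: {"filled": 0, "samples": []} for name in header}
    for row in reader:
        # zip() ignores extra cells in a row and skips columns a short row lacks
        for name, value in zip(header, row):
            value = value.strip()
            if not value:
                continue
            column = profile[name]
            column["filled"] += 1
            if value not in column["samples"] and len(column["samples"]) < sample_size:
                column["samples"].append(value)
    return profile


if __name__ == "__main__":
    # Hypothetical rows standing in for one extracted file
    sample = io.StringIO("FACILITY,STATE,CHEMICAL\nAcme,AK,Lead\nBolt,AK,\n")
    for name, info in profile_columns(sample).items():
        print(name, info["filled"], info["samples"])
```

For a real file, pass an open file object instead of the `io.StringIO` sample, and set `delimiter` to whatever the file actually uses. The printed summary is a reasonable first draft of a column-by-column description.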
Here’s a breakdown of the data in Data.gov as of today: