OGD: Commerce repackages old data and offers broken links

To comply with the Open Government Directive, the Commerce Department released four high value datasets that demand considerable technical sophistication, and patience, from users. Some of the files are so large and cumbersome that they’re very difficult to open and use; others require a great deal of explanation, and you can currently find those explanations only by digging through the agency’s site. Still other entries feature broken links or contain only a fraction of the information described on Data.gov. The Commerce Department says it is working on all of these problems, so hopefully we’ll see an improvement in the coming days.

Consider the broadband applications database. The Recovery Act provided $7.2 billion in grants and loans to extend broadband Internet access to underserved communities. The Commerce Department has released a spreadsheet of the applications they have received for those funds, including the names of the organizations, contact information and amounts requested. This is potentially useful information; one can easily see, for example, that a few states submitted a large portion of the applications–the top six submitted as many applications as the bottom forty. 
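For readers who want to check that kind of state-by-state comparison themselves, here is a minimal sketch in Python. It assumes the spreadsheet has been saved as a CSV file and that the state is stored in a column named "State"; that column name and both function names are our own placeholders, not anything documented by Commerce.

```python
import csv
from collections import Counter


def applications_by_state(csv_file, state_column="State"):
    """Count rows per state in a CSV export of the applications spreadsheet.

    `state_column` is an assumed header name; change it to match the real file.
    """
    counts = Counter()
    for row in csv.DictReader(csv_file):
        state = (row.get(state_column) or "").strip()
        if state:  # skip rows with no state recorded
            counts[state] += 1
    return counts


def top_vs_bottom(counts, top_n=6, bottom_n=40):
    """Return (applications from the top_n states, applications from the bottom_n states)."""
    ranked = [n for _state, n in counts.most_common()]  # largest first
    top = sum(ranked[:top_n])
    bottom = sum(ranked[-bottom_n:]) if bottom_n else 0
    return top, bottom
```

Opening the exported file with `open("applications.csv", newline="")` (a hypothetical filename) and passing it to `applications_by_state` gives the per-state counts; `top_vs_bottom` then puts the two totals side by side.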

However, the spreadsheet was already out of date when it was posted on Data.gov. The Commerce Department awarded four large grants before Jan. 21, 2010, the deadline for releasing high value datasets, but those awards were not included in the spreadsheet. Furthermore, the data isn’t new: it was released on September 9, although its current form is much more user-friendly for reporters who want to analyze the bulk data (previously, it was posted as a PDF).

What’s more, the data has been searchable and up-to-date on the National Telecommunications and Information Administration Web site since last fall. While it’s helpful to researchers to have it in Excel form, it’s not exactly new.

Another potentially useful database put out by Commerce is the National Technical Information Service database. The dataset lists the titles, categories, and sponsoring agencies of government-funded R&D studies, and includes links that allow you to buy a copy of each report. Unfortunately, the description on data.gov is currently inaccurate: it says that the data is available electronically from 1964, but the attached XML file only goes back to 2005, and some of the earlier records are available only through third-party paid services.

It would also be nice to have a clearer description of what some of the codes in the document mean. What do category codes such as 97P, 97R, 57K and 57B stand for? And the file is so large that it took a programmer here at Sunlight to convert it into a form that a commonly used program like Excel could handle, after it had crashed browsers on several computers all morning.
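For the curious, one common way to tame an oversized XML file is to read it one record at a time and write each record out as a row of a CSV file, which Excel can open. The sketch below shows that general approach in Python; it is not necessarily what our programmer did. The element names ("record", "title", "category", "agency") are hypothetical stand-ins, since the real file's structure would need to be checked first.

```python
import csv
import xml.etree.ElementTree as ET


def xml_to_csv(xml_source, csv_out, record_tag="record",
               fields=("title", "category", "agency")):
    """Stream records out of a large XML file into CSV rows.

    `record_tag` and `fields` are assumed element names, not the real schema.
    Returns the number of records written.
    """
    writer = csv.writer(csv_out)
    writer.writerow(fields)  # header row
    rows = 0
    # iterparse yields each element as its closing tag is read,
    # so the whole file never has to be loaded at once.
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag == record_tag:
            writer.writerow([(elem.findtext(name) or "").strip() for name in fields])
            rows += 1
            # Drop the record's children to keep memory use low. The emptied
            # elements still hang off the root, so truly huge files may need
            # more aggressive cleanup.
            elem.clear()
    return rows
```

The function takes a filename or an open file for the XML and an open text file for the CSV output, for example one created with `open("reports.csv", "w", newline="")` (again, a made-up filename).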

The third dataset available is a list of fees paid by holders of U.S. patents. Previously, you had to get a data expert to “scrape” the U.S. Patent Office site if you wanted to get your hands on this data–and it was popular enough that the Patent Office ended up putting out requests for users to stop doing this, because it was using up too much of their bandwidth. It’s promising that this is now available to download — but most of the links to the data on data.gov were broken for the better part of a week, so users had to dig around on the Patent Office site to find out what the columns of numbers and letters mean. And that was no easy task. (We made sure to let them know the links weren’t working.)

Finally, the Commerce Department is offering a map of precipitation data gathered by volunteers around the country, in a format ready to import into Google Maps and Google Earth. Maps of this data were available online previously, and a member of the data collection team tells me that they’ve offered a data export function all along — and that he believes some agencies had already imported them into Google Earth.

So it looks like all Commerce has done is eliminate a couple of steps and put it up on data.gov. Handy, but not exactly a brand new revelation.