We’re still surveying the high-value data sets released as part of the open government directive. There are hundreds of files to sift through, which is obviously a good thing. But while we don’t have a final analysis done, a few trends are becoming apparent.
The high-value data sets consist, overwhelmingly, of information that’s already been released elsewhere. In many cases, at least in the raw data catalog, the information is provided as, well, raw data. For example, the Dept. of Transportation released its Uniform Tire Quality Grading System, which provides ratings on the durability, traction and other qualities of tires. That information had long been available to consumers through structured Web pages; thanks to the open government directive, it’s now available in a new format: XML. With the raw data, and a good database program that can import XML, a sophisticated user could load the data and start fooling around with it. (Of course, if one is a consumer looking to buy tires, the original DoT site is probably a better place to go.)
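To give a sense of what "loading the data and fooling around with it" might look like, here is a minimal sketch in Python that reads tire ratings from XML into an in-memory SQLite table and runs one query against it. The element names (`tire`, `brand`, `line`, `treadwear`, `traction`, `temperature`) and the sample records are our own invention for illustration; the actual DoT file may be laid out quite differently.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample data: these element names and values are assumptions,
# not the layout of the real DoT file.
SAMPLE_XML = """
<tires>
  <tire><brand>Acme</brand><line>Roadster</line>
        <treadwear>400</treadwear><traction>A</traction><temperature>B</temperature></tire>
  <tire><brand>Acme</brand><line>Hauler</line>
        <treadwear>520</treadwear><traction>A</traction><temperature>A</temperature></tire>
  <tire><brand>Bolt</brand><line>Commuter</line>
        <treadwear>300</treadwear><traction>B</traction><temperature>B</temperature></tire>
</tires>
"""


def load_tires(xml_text, conn):
    """Parse the XML and insert one row per <tire>; return the row count."""
    root = ET.fromstring(xml_text.strip())
    conn.execute(
        "CREATE TABLE tires (brand TEXT, line TEXT, treadwear INTEGER, "
        "traction TEXT, temperature TEXT)"
    )
    rows = [
        (
            tire.findtext("brand"),
            tire.findtext("line"),
            int(tire.findtext("treadwear")),
            tire.findtext("traction"),
            tire.findtext("temperature"),
        )
        for tire in root.iter("tire")
    ]
    conn.executemany("INSERT INTO tires VALUES (?, ?, ?, ?, ?)", rows)
    return len(rows)


def best_treadwear(conn, traction):
    """Return (brand, line, treadwear) for a traction grade, best treadwear first."""
    cursor = conn.execute(
        "SELECT brand, line, treadwear FROM tires "
        "WHERE traction = ? ORDER BY treadwear DESC",
        (traction,),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    load_tires(SAMPLE_XML, connection)
    print(best_treadwear(connection, "A"))
```

A question like "which tires with a given traction grade have the highest treadwear rating" is the sort of thing a fixed Web interface may not answer, but becomes a single query once the data sits in a table.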
It’s not so much that we’re seeing a lot of new data as that we’re seeing existing data in new (raw) formats. That in itself can be tremendously useful: Web interfaces can never be designed to answer all the potential queries a user might have. Access to raw data solves this problem; it’s one of the reasons that reporters frequently FOIA raw data sets from government. Give us everything, without mediation; we’ll make sense of the data ourselves.
Alas, not everything in the release gives one this option. In the tool catalog, there is an entry for the Excluded Parties List System database maintained by the General Services Administration (Anu Narayanswamy recently critiqued it here). The entry in Data.gov does not take one to the raw data, which consists of the names of individuals and entities banned from doing business with the federal government for various past bad acts, but rather directly to the clunky old Web site.
As a first step toward making agency data available in more accessible formats for sophisticated users, the open government directive is so far somewhat successful: plenty of data sets that had been available only as PDFs, or had to be pulled down by scraping Web sites, are now there for the taking (we’ll have better counts of this later in the week). But new data sets are not predominant: the major agencies covered by the directive released 58 data sets, of which, by our count, only 16 (roughly 28 percent) were previously unavailable online in any format.
We’re still looking at the data, and will have more to report as the week goes on.