Sunlight Foundation launches own data catalogue
Publication: Federal News Radio
July 15, 2009
Sunlight Labs' Director Clay Johnson discusses the launch of a new project, the National Data Catalogue.
Johnson talked about the project, and about his recent blog post on federal IT dashboards, on Wednesday's Daily Debrief.
"What we're doing is taking the concept behind data.gov, which Vivek Kundra's team has done so wonderfully inside the government, and we're stealing it. We're going to do our own data catalogue that can extend, I think, beyond the reach of what the executive branch can do."
The project will gather all of the data involved with the federal government that is available to the public.
In addition to gathering data about governments, such as campaign contributions, Sunlight Labs is seeking out state and local information.
The impetus for the new site, Johnson says, is that data.gov can't go as far as some would like because of existing laws, such as the Paperwork Reduction Act.
"[That] regulates how much information [the] government can ask users for. [There are also] political boundaries. For instance, right now data.gov only has information around the executive branch of government. It doesn't have any information around the judicial or the legislative branch of government and we don't have any indication as to whether or not it can."
Johnson says another issue has to do with the data itself.
Not all government data is documented, which means some of it won't land on data.gov.
The hope of the Sunlight Foundation, though, is that its site will foster community participation.
"It will be a little bit more structured than [a wiki] and we'll have some sort of editorial process inside of Sunlight to figure out what goes in and what doesn't and how it's classified and stuff like that, but it will generally be lightweight. We're not ruthless overlords."
Another hope of Sunlight Labs is to highlight what's not publicly available.
"We had a deep debate inside of Sunlight about whether or not this thing should catalogue documents as well as data, and then we got down to the philosophical discussion of 'What is data?' Is a bill a piece of data or a document? What we ended up with is -- if it's not machine-readable, we're going to call that a document and we think the Sunlight Foundation's job is to encourage the government to turn documents into data, or turn documents into data ourselves."
Johnson says that because data.gov publishes its own data catalogue, Sunlight can simply download that information and upload it to its own site.
The organization also plans to work with agencies to get additional information.
"I think this is the perfect example of how government can be a platform for the outside to really build on top of it and make really interesting things. This is an opportunity that has been created because data.gov has been released."
Johnson says Sunlight is currently working hard to get the word out about this new project.
In addition to working on a new site, the Sunlight Foundation has also been examining the federal government's use of the IT dashboard.
Johnson recently wrote a post about his ideas for helping the federal government utilize the technology.
"There's a few problems, and this goes back to the data catalogue to a certain extent. There's no way for people to publicly share information about what's wrong with the data that's coming out of the government."
He says this is why community-based data sharing is so important. He also points to the problem that the data in the IT dashboard can be somewhat mysterious.
"They don't do a really good job of actually explaining where the data comes from and when it comes from. So, there are government contractors in there that are gigantic . . . and it says 'This government contractor has received nine contracts for the year,' which may be true for 2009 but it doesn't actually specify that they just started collecting the data in 2009 [or] if they started collecting the data in 2004 or 2006 or whenever they started."
Overall, Johnson says there is an editorial component to data publication that the government is lacking.
He does add, however, that the problem can be solved by employing someone who can explain what the data actually is and how it's being collected.





