Kickoff: The National Data Catalog

by Clay Johnson

technology

Jul 15, 2009 11:24 am

Sometimes you can get inspired by government. In our field it happens more than you’d think. Obviously all our new tools, things like TransparencyCorps and Congrelate along with CapitolWords, have been inspired by government to a degree, but there aren’t many ideas that we’ve actually stolen from government.

Today I’m happy to announce we’re stealing an idea from our government. Data.gov is an incredible concept, and the implementation of it has been remarkable. We’re going to steal that idea and make it better. Because of politics and scale, there’s only so much the government is going to be able to do. There are legal hurdles and boundaries the government can’t cross that we can. For instance, there’s no legislative or judicial branch data inside Data.gov, and while Data.gov links off to state data catalogs, those entries aren’t in the same place or format as the rest of the catalog. Community documentation and collaboration are virtually impossible because of the regulations that govern the way government interacts with people on the web.

We think we can add value on top of things like Data.gov and the municipal data catalogs by automatically bringing them into one system, manually curating and adding other data sources, and providing features that, well, government just can’t. There’ll be community participation so that people can submit their own data sources, and we’ll also catalog non-commercial data derived from government data, like OpenSecrets. We’ll make it so that people can write their own documentation for much of the undocumented data that government puts out, and link to external projects that work with that data.

We’re starting this project today, now, and will be building it out in public. Two developers here will be working on it: Luigi and David. We’ve set up PivotalTracker for the project and of course you can find the source. This project has three major components, and there are three separate repositories for them: the API, the web catalog, and a Ruby library for the API. These things will work in concert: we’re building our API first, and our data catalog website will run on top of it, using the ruby-datacatalog client library.
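To make that layering a bit more concrete, here is a rough sketch of the idea, written in Python for brevity (the project's own client is a Ruby library). Everything in it is an assumption for illustration: the `/sources` route, the `Source` fields, the class names, and the sample records are made up and are not the actual API. The point is only that the website layer talks to the client library, and the client library is the only piece that knows about the API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """One catalog entry. The fields are illustrative assumptions."""
    title: str
    url: str
    jurisdiction: str


class FakeCatalogAPI:
    """In-memory stand-in for the HTTP API; returns plain dicts, as a JSON API might."""

    def __init__(self, records):
        self._records = list(records)

    def get(self, path):
        # Only one hypothetical route is modeled here.
        if path == "/sources":
            return list(self._records)
        raise KeyError(f"unknown route: {path}")


class CatalogClient:
    """Client library: the only layer that knows about routes and payloads."""

    def __init__(self, api):
        self._api = api

    def sources(self, jurisdiction=None):
        # Convert raw records into typed objects, then filter if asked.
        found = [Source(**record) for record in self._api.get("/sources")]
        if jurisdiction is not None:
            found = [s for s in found if s.jurisdiction == jurisdiction]
        return found


def render_listing(client, jurisdiction=None):
    """Web-catalog layer: talks to the client, never to the API directly."""
    return [f"{s.title} <{s.url}>" for s in client.sources(jurisdiction)]


# Made-up sample records, not real catalog entries.
api = FakeCatalogAPI([
    {"title": "Example federal dataset", "url": "http://example.org/a", "jurisdiction": "federal"},
    {"title": "Example state dataset", "url": "http://example.org/b", "jurisdiction": "state"},
])
client = CatalogClient(api)
listing = render_listing(client, jurisdiction="state")
```

Building the API first means a website, a command line tool, or somebody else's project can all sit on the same client without caring how the API stores or serves its data.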

In terms of timeline, we’re ruthlessly ambitious, hoping to have something up after the contest ends. That’s not set in stone, but we’ll do our best to get there. The catalog is going to have three components to start with: an API, a web interface, and a command-line interface. If you’re interested in helping out with this project, please join our Google Group. If you just want to help the Sunlight Foundation fund this project, please consider a contribution.

Both David and Luigi will be blogging updates periodically throughout the process, and we’d appreciate any feedback or help you can give. You can also submit data sources you’d like to see added to help us get started.

For updates on what’s going on in the labs, you should follow me on Twitter here.