The National Data Catalog

Sometimes you can get inspired by government. In our field it happens more than you’d think. Obviously all of our new tools, such as TransparencyCorps, Congrelate, and CapitolWords, have been inspired by government to a degree, but there aren’t many ideas that we’ve actually stolen from government.

Today I’m happy to announce Sunlight Labs is stealing an idea from our government. Data.gov is an incredible concept, and the implementation of it has been remarkable. We’re going to steal that idea and make it better. Because of politics and scale there’s only so much the government is going to be able to do. There are legal hurdles and boundaries the government can’t cross that we can. For instance: there’s no legislative or judicial branch data inside Data.gov and while Data.gov links off to state data catalogs, entries aren’t in the same place or format as the rest of the catalog. Community documentation and collaboration are virtual impossibilities because of the regulations that impact the way Government interacts with people on the web.

We think we can add value on top of things like Data.gov and the municipal data catalogs by automatically bringing them into one system, manually curating and adding other data sources, and providing features that, well, government just can’t do. There’ll be community participation so that people can submit their own data sources, and we’ll also catalog non-commercial data that is derived from government data, like OpenSecrets. We’ll make it possible for people to create their own documentation for much of the undocumented data that government puts out, and to link to external projects that work with the data being provided.

We’re starting this project today, now, and will be building it out in public. Two developers here will be working on it: Luigi and David. We’ve set up PivotalTracker for the project and of course you can find the source. This project has three major components, and there are three separate repositories for them: the API, the web catalog, and a Ruby library for the API. These things will work in symphony: we’re building our API first, and our data catalog website will run on top of it, using the ruby-datacatalog client library.
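For readers who want a picture of what "API first" means here, the layering can be sketched roughly as below. This is only an illustration: `FakeCatalogApi`, `CatalogClient`, `CatalogPage`, the `/sources` path, and the sample entries are all hypothetical names made up for this sketch, not the real API or the actual ruby-datacatalog interface.

```ruby
require "json"

# Hypothetical stand-in for the catalog API. It answers requests with JSON
# strings, the way a web service would. The entries are sample data only.
class FakeCatalogApi
  SOURCES = [
    { "title" => "Data.gov", "kind" => "government catalog" },
    { "title" => "OpenSecrets", "kind" => "derived from government data" }
  ].freeze

  def get(path)
    raise ArgumentError, "unknown path: #{path}" unless path == "/sources"

    JSON.generate(SOURCES)
  end
end

# Hypothetical client library: the only layer that talks to the API
# and the only layer that knows responses arrive as JSON.
class CatalogClient
  def initialize(api)
    @api = api
  end

  def sources
    JSON.parse(@api.get("/sources"))
  end
end

# Hypothetical web catalog page: it depends on the client library,
# never on the API directly.
class CatalogPage
  def initialize(client)
    @client = client
  end

  def render
    @client.sources.map { |s| "#{s['title']} (#{s['kind']})" }.join("\n")
  end
end

page = CatalogPage.new(CatalogClient.new(FakeCatalogApi.new))
puts page.render
```

The point of the arrangement is that the website is just one consumer of the API; anything the site can do, an outside developer using the same client library could do too.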

In terms of timeline, we’re ruthlessly ambitious, hoping to have something up after the contest ends. That’s not set in stone, but we’ll do our best to get there. The catalog is going to have three components to start with: an API, a web interface, and a command-line interface. If you’re interested in helping out with this project, please join our Google Group. If you just want to help the Sunlight Foundation fund this project, please consider a contribution.

Both David and Luigi will be blogging updates periodically throughout the process, and we’d appreciate any feedback or help you can give. You can also submit data sources you’d like to see added to help us get started.

  • Those of us involved in the integration of govt data via linked-data approaches (cf. http://data-gov.tw.rpi.edu) would be interested in working with you on this – is there a method for outside orgs to work w/Sunlight?

  • This might be of interest:

    http://blog.okfn.org/2009/07/23/what-features-should-be-included-in-a-catalogue-of-open-government-data/

    Also, I don’t know if you’ve seen CKAN, a community driven registry for open data, which we’ve been developing for several years:

    http://ckan.net/

    All content and code is open.

  • l m beale

    What is “API” for us non-programmer types?

  • Scott

    I think this is a great idea because, yes, there are things that will probably not be done either because of policy in the Government or technical challenges. That being said, you would be doing the citizens of this great country…and the Government…a service if you would then take your results/data and proactively work with the White House and push that data back up to data.gov. I’m sure that there are some policies that could be used in order to feed that data back up while still providing credit to Sunlight. Doing so would be the “proof in the pudding” for how the public and Government can work together to create useful products.

    If you keep that data to yourself and don’t share it back then you’ll be doing the same thing the Government has been known to do for years…stovepipe information.