
Tag Archive: Clay Johnson

We Don’t Need a GitHub for Data

by

[Image: Lt. Commander Data standing in front of a screen with the GitHub logo]

There was an interesting exchange this past weekend between Derek Willis of the New York Times and Sunlight's own Labs Director emeritus, Clay Johnson. Clay wrote a post arguing that we need a "GitHub for data":

It's too hard to put data on the web. It’s too hard to get data off the web. We need a GitHub for data.

With a good version control system like Git or Mercurial, I can track changes, I can do rollbacks, branch and merge and most importantly, collaborate. With a web counterpart like GitHub I can see who is branching my source, what’s been done to it, they can easily contribute back and people can create issues and a wiki about the source I’ve written. To publish source to the web, I need only configure my GitHub account, and in my editor I can add a file, commit the change, and publish it to the web in a couple quick keystrokes.

[...]

Getting and integrating data into a project needs to be as easy as integrating code into a project. If I want to interface with Google Analytics with Ruby, I can type gem install vigetlabs-garb and I’ve got what I need to talk to the Google Analytics API. Why can I not type into a console gitdata install census-2010 or gitdata install census-2010 --format=mongodb and have everything I need to interface with the coming census data?

On his own blog, Derek pushed back a bit:

[...] The biggest issue, for data-driven apps contests and pretty much any other use of government data, is not that data isn’t easy to store on the Web. It’s that data is hard to understand, no matter where you get it.

[...]

What I’m saying is that the very act of what Clay describes as a hassle:

A developer has to download some strange dataset off of a website like data.gov or the National Data Catalog, prune it, massage it, usually fix it, and then convert it to their database system of choice, and then they can start building their app.

Is in fact what helps a user learn more about the dataset he or she is using. Even a well-documented dataset can have its quirks that show up only in the data itself, and the act of importing often reveals more about the data than the documentation does. We need to import, prune, massage, convert. It’s how we learn.
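The import, prune, massage, convert cycle Derek describes can be sketched in a few lines of standard-library Python. This is a minimal illustration, not anyone's actual workflow: the CSV contents, ZIP codes, place names, column names, and the "N/A" suppression marker are all invented. It shows two kinds of quirk that tend to surface only once you try to load a file: ZIP codes that lose their leading zero if parsed as integers, and counts stored as text with thousands separators.

```python
import csv
import io
import sqlite3

# Hypothetical raw extract standing in for a downloaded dataset.
# Every value and column name here is invented for illustration.
RAW_CSV = '''ZIP,Place Name,Population,Notes
01234,Example Town A,"1,234",
12345,Example Town B,"56,789",
54321,Example Town C,N/A,suppressed
'''


def parse_population(value):
    """Massage: strip thousands separators; treat the (assumed) N/A marker as missing."""
    value = value.strip()
    if value in ("", "N/A"):
        return None
    return int(value.replace(",", ""))


def import_rows(text):
    """Import and prune: read the CSV and keep only the columns we need."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        rows.append((
            record["ZIP"].strip(),  # keep as text so "01234" keeps its leading zero
            record["Place Name"].strip(),
            parse_population(record["Population"]),
        ))  # the Notes column is deliberately dropped
    return rows


def load_into_sqlite(rows):
    """Convert: load the cleaned rows into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE places (zip TEXT PRIMARY KEY, name TEXT, population INTEGER)"
    )
    conn.executemany("INSERT INTO places VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = load_into_sqlite(import_rows(RAW_CSV))
    for row in conn.execute("SELECT zip, name, population FROM places ORDER BY zip"):
        print(row)
```

Parsing the ZIP column with int() instead would turn 01234 into 1234 and quietly break any later join against records geocoded by zip code. Noticing that sort of problem while writing the import is the kind of learning Derek has in mind.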

I think there's a lot to what Derek is saying. Understanding what an MSA (Metropolitan Statistical Area) is, or how to match Census data up against information that's been geocoded by zip code -- these are bigger challenges than figuring out how to get the Census data itself. The documentation for this stuff is difficult to find and even harder to understand. Most users are driven toward the American FactFinder tool, but if that can't tell you what you want, you're going to have to spend some time hunting down the appropriate FTP site and an explanation of its organization -- Clay's right that this is a pain. But it's nothing compared to the challenge of figuring out how to use the data properly. It can be daunting.

But I think there are problems with the "GitHub for data" framing that go beyond the simple fact that the problems GitHub solves aren't the biggest problems facing analysts.


Hello, Labs

by

As Clay said, I'm the new guy. Well, not entirely new -- I've been at Sunlight since late 2008. But I'm the one who's going to be trying to fill the enormous gap he's leaving. I thought I'd start to explain how I want to do that by talking about how I arrived at Sunlight.

I first became aware of the Sunlight Foundation while working as a programmer at a consultancy here in DC, building sites for large nonprofits and experimenting with and writing about various technologies on the side. When I heard about Sunlight Labs, I thought it was pretty much the coolest thing in the world: technologists using their skills to directly improve society. For people like me (and probably you) -- people who have acquired a technical skillset that's powerful, in a sense, but not always obviously useful -- it's an incredibly compelling prospect.


Goodbye Sunlight

by

It's been just over two years since I first started here at Sunlight, and today's my last day.

Over the past two years, we've done some incredible things together. Through the Apps for America and Design for America contests, our community developed nearly 200 open source applications and visualizations on top of government data, for a total expenditure of about $100,000. We built an army of nearly 2000 developers and designers working to change their government. We launched the first wiki bid for Recovery.gov, and changed the FEC through collaborative testimony.

