Why everyone should know what makes a good data set; it’s not as hard as you think

by Daniel Schuman Mar 29, 2010 9:50 pm

In many offices, when technology questions arise, the reflex is to defer to the technologists. These are often the folks who link to Venn diagrams of the fine distinctions between nerds, geeks, and dweebs; who prefer the comic xkcd to The Far Side; and who trust Slashdot over NPR. So when the question is how the government should make information available online, and in particular how it should publish data, most people's first inclination is to nod to the technologist and slowly back away. That disengagement is a mistake.

How information is made available online fundamentally controls what can be done with it. Fortunately, an intelligent layperson can understand how structure makes data usable. That matters, because the intelligent layperson is likely the one writing the rules for how government data will be made available, whether as a congressional staffer, a federal agency employee, or a citizen making a request. Understanding data structure leads to smarter specifications and lets you get more out of your information.

The very smart technologists at Princeton's Center for Information Technology Policy have written a series of short blog posts that explain key concepts for the intelligent layperson. Sunlight Labs is working in a similar direction. The following articles are worth reading.

“Government Datasets That Facilitate Innovation,” Freedom to Tinker Blog (3/1/2010)
“Basic Data Format Lessons,” Freedom to Tinker Blog (3/2/2010)
“Labeling Dataset Contents,” Freedom to Tinker Blog (3/3/2010)
“Correcting Errors and Making Changes,” Freedom to Tinker Blog (3/8/2010)
“Best Practices for Government Datasets: Wrap-up,” Freedom to Tinker Blog (3/12/2010)
“Drafting Guidelines for Government Data Catalogs,” Sunlight Labs Blog (3/29/2010)

If you know of other articles along these lines, please add them in the comments.

Sunlight Foundation

