Why everyone should know what makes a good data set; it’s not as hard as you think


In many offices, when technology questions arise, the reflexive answer is to trust the technologists. These are often the folks who link to Venn diagrams of the fine distinctions between nerds, geeks, and dweebs; who prefer xkcd to The Far Side; and who trust Slashdot over NPR. So when it comes to how the government should make information available online, and in particular how it should make data available online, most people’s first inclination is to nod to the technologist and slowly back away. That disengagement is a mistake.

How information is made available online fundamentally controls what can be done with it. Fortunately, an intelligent layperson can understand how structure makes data usable. That matters, because the intelligent layperson is likely the one writing the rules on how government data will be made available, whether as a congressional staffer, a federal agency employee, or a citizen making a request. Understanding data structure leads to smarter specifications and makes it possible to get more out of your information.

The very smart technologists at Princeton’s Center for Information Technology Policy have written short blog posts that explain key concepts and are geared toward the intelligent layperson. Sunlight Labs is working in a similar direction. The following articles are worth considering.

If you know of other articles along these lines, please add them in the comments.