XML is Not Enough


David Robinson, associate director of Princeton’s Center for Information Technology Policy, has an interesting post at the center’s Freedom to Tinker blog about how government should best present its data. David proposes that government release its information in a form nobody wants to read: XML files that are “machine-readable” but largely indecipherable to the human eye. It would be up to journalists, activist organizations and individuals to decipher the data and present it in ways citizens can understand. This, he argues, would spawn the kind of creativity that lets “a thousand mashups bloom.”

Government releasing data in XML format would, in many cases, be a step in the right direction. No question about that. One of the great maxims of Web 2.0 is that when it comes to data and information, content is king. Simply making data available opens up all sorts of possibilities for sharing, remixing and the like. But why should government stop there? Why shouldn’t government agencies also make an effort to help the average citizen understand their data? David is setting up a false choice, I believe. Who is advocating that government should be the “only source for interaction” with its data? Why either/or? Why not both?

There have been some very clever displays of government data. Take, for example, ProPublica’s interactive graph of where all the money is going in the proposed stimulus bill, published earlier this week. Another is Sunlight’s own Capitol Words, which, for every day Congress is in session, visualizes the most frequently used words in the Congressional Record, giving you an at-a-glance view of which issues lawmakers address on a daily, weekly, monthly and yearly basis.

Andrew Rasiej, Personal Democracy Forum founder and Sunlight senior technology adviser, has said government should put the Sunlight Foundation out of business by fully embracing Web 2.0. I won’t hold my breath, but I wouldn’t be unhappy with that outcome. Government should be in the business of devising ways both to serve up its data and to communicate it, so that the citizens it serves can use it as they see fit.

  • joe

    Hmmm, maybe I don’t understand. What part of ProPublica’s graph, Sunlight’s Capitol Words, etc. was done by the government? It seems to me that the government released the data and these groups remixed, presented and displayed it, no? That’s exactly what David R. and colleagues are arguing should happen. I agree that we should let govt. off the hook, but we should work to concentrate their energy and effort where it will make the best contribution… right now, I don’t see any remixes or visualizations **done directly by a government entity** that come anywhere close to what these orgs do.

    Anyway, I’m a big fan of Sunlight and what you’ve done; I’m just trying to understand your thinking here.

    PS: BTW, does Sunlight have a “Jobs” page? Over the next year or so I’ll be slowly looking for a position, and I’m trying to cast a net wider than academia. (This goes for transparency-minded outfits everywhere… get in touch!)

  • Joe, thanks for your comment. My point in listing the remixing of data by ProPublica and Sunlight was to show what government could do. And it doesn’t have to rely on outside groups, journalists and activists to be creative in making the data more user-friendly and understandable. We shouldn’t let government off the hook so easily. Thanks again!

  • I agree with Joe’s comments, and would emphasize that nothing in David’s post suggests that the government shouldn’t also do presentation… but as a secondary matter. The big issue is when presentation takes precedence over getting the data out there.

    One might argue that this reversed set of priorities contributes to things like the paywall of the federal district courts’ “PACER” system, where the Administrative Office spends millions of dollars developing new versions of its lousy software instead of opening up access and letting others make it presentable.

  • joe

    There’s a misunderstanding here.

    None of the examples you provide involve a government entity doing the presentation / visualization / mashing up. That is, I believe you’re making David’s argument for him: that government shouldn’t have, as a primary goal, a monopoly on presenting the data it generates and collects, because publishing that data in a rich structure will allow others, with interests different from the govt.’s, to use it.

    I think David et al. are arguing that the government’s primary focus should be on capturing, describing and publishing government data, not that it should be forbidden from presenting that data (nor should anyone else be). In fact, an exemplary presentation/visualization can often reveal what the data originator thinks the data is good for and provide examples of how to use it.

    Anyway, as someone who recently published a PhD thesis on transparency and e-voting, I’m a big fan of Sunlight Foundation / Labs, etc. BTW, for full disclosure, I’m also a postdoc here at CITP.