Open Govt Data Geeks Unite, and the Rise of 3-D Journalism

Micah Sifry (Sunlight senior strategic consultant) writes:

I’ve just spent two days at a mini-retreat on open government data organized by Carl Malamud of Public.Resource.Org, hosted by Tim O’Reilly of O’Reilly Media and funded by the Sunlight Foundation, Google and Yahoo!. The purpose of the meeting was to bring together a bunch of folks from both the public and private sectors, working on everything from pro-democracy websites to hyper-local news startups, to see if we could draft some common principles for data and open government, and also to deepen connections and collaboration among a powerfully creative group of individuals and projects. (Full disclosure: I was there in my consulting role as a senior technology adviser to Sunlight, but this was another of those fortuitous events where I get to wear all my hats at once: PdF editor, open government activist, and Sunlight consultant.)

In attendance were Adrian Holovaty and Daniel O’Neil of the soon-to-be-unveiled EveryBlock; Michal Migurski and Eric Rodenbeck of Stamen Design, which does amazing work with data visualization; Josh Tauberer of GovTrack.us, which makes Thomas useful and amazes the rest of us with his efficiency; Lawrence Lessig of Stanford, who’s focusing his prodigious energies on the problem of corruption; Dan Newman of MAPLight.org, which is doing path-breaking work connecting money, legislators, votes and power; John Geraci of outside.in, which is localizing the blogosphere down to the neighborhood level; Ed Bender of the National Institute on Money in State Politics, which has state-of-the-art APIs for mashing up state-level campaign finance data; Tom Steinberg of mySociety.org, probably the world’s leader in pro-democracy web services (see TheyWorkForYou.com); David Moore and Donny Shaw of OpenCongress, which harnesses social wisdom to reveal what’s really going on inside Congress; JL Needham of Google, you’ve probably heard of them; Ethan Zuckerman of the Berkman Center, who has more accomplishments in the geek-to-social-good sector than anyone I know (and he’s only 34!!); Greg Palmer, who will soon step down as Congressman Henry Waxman’s tech director to venture into some exciting projects in the private sector; Jamie Taylor of Metaweb, which is building a powerful platform called Freebase for public information sharing; Bradley Horowitz of Yahoo!, you’ve probably heard of them too; Zack Exley of the New Organizing Institute, who’s one of my favorite progressive agitators; Michael Dale of Metavid, which is bringing transparency and interactivity to Congressional video; Joseph Lorenzo Hall of UC Berkeley, one of the world’s experts on e-voting; Marcia Hofmann, a staff attorney for the Electronic Frontier Foundation, of which I am a proud member; David Orban of Metasocial Web, who is exploring the frontier of networked politics; Will Fitzpatrick of Omidyar Network, which is moving toward embracing transparency as a top priority; Aaron Swartz of Open Library, which is working on creating a wiki page for every book in the world; and Greg Elin of Sunlight Labs and myself.

The common denominator of this group of non-profit and for-profit social entrepreneurs is the conviction that freedom of information is a cornerstone of democracy, and that the internet is the most powerful system ever invented for expanding public information and participation in the decisions that affect our lives. Thus, just about everyone in attendance is actively involved in projects that take publicly available data and, using all kinds of new software, make it dramatically more meaningful and engaging.

Thanks to this movement, it is now possible, or soon will be, to do all of these things:
-discover your member of Congress’s full voting record (the official record at Thomas only chunks out votes by bill, not by member), and explore oddities like "late-night" votes, at the Washington Post’s votes database, courtesy of Adrian Holovaty and crew.
-explore precisely how money from an industry or interest group correlates to specific votes, and then produce fine-grained charts with your own analysis of these relationships, thanks to MAPLight.org.
-search for video of speeches by Members of Congress from the floor of the House and Senate, and soon annotate those snippets, edit them and package your own video for export to other sites, thanks to Metavid.
-track a bill, a vote or a Member, and soon add your own comments and point of view on any of them and see what others are saying, thanks to OpenCongress.
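
The parenthetical about Thomas shows why machine-readable data matters: once roll-call votes are published as structured records rather than as pages organized by bill, regrouping them by member takes only a few lines of code. Here is a minimal sketch, assuming a hypothetical CSV feed with `bill`, `member` and `vote` columns; the feed, the helper name `votes_by_member` and every value in it are made up for illustration and do not describe any real government data source.

```python
import csv
import io
from collections import defaultdict

# Hypothetical per-bill roll-call data (illustrative values, not real votes).
ROLL_CALLS = """bill,member,vote
H.R. 1,Member A,Yea
H.R. 1,Member B,Nay
H.R. 2,Member A,Nay
H.R. 2,Member B,Yea
"""


def votes_by_member(csv_text):
    """Regroup rows organized by bill into one voting record per member."""
    record = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Key the outer dict by member and the inner dict by bill.
        record[row["member"]][row["bill"]] = row["vote"]
    return dict(record)


if __name__ == "__main__":
    for member, votes in sorted(votes_by_member(ROLL_CALLS).items()):
        print(member, votes)
```

The same pivot is impossible, or at least painful, when the votes are locked in scanned pages or summary tables, which is the practical point behind the "machine processable" principle.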

If you want to get a sense of the presentations, read Ethan Zuckerman’s post from Saturday, which describes some of the cutting-edge work that participants (several of whom are grantees of Sunlight) are currently doing. David Orban also shot some video and created a discussion group around the event (details here).

3-D Journalism
I had a great conversation over dinner with Adrian Holovaty, who helped pioneer this growing field of public data mashups with his ChicagoCrime.org site. He agreed with me that there’s a new kind of storytelling being done with many of these projects, a kind of dynamic, data-driven journalism that is simply impossible in print, but we struggled for a simple phrase to describe it. I like the word "charticle," but most of the people around us, including Adrian, thought that sounded vaguely unsanitary. Then it hit me: how about "3-D journalism"? That is, sites or charts that are built around data, that may be produced on the fly by other web services (or by combining several services), and that readers can interact with to get a kind of 3-D view of an issue or story. So, whaddya say? Have I coined a term?

The group spent some time brainstorming about tools and products that we wish we had, and I am hopeful that this conversation will continue and expand. Some of the things that we discussed included: better tools for groups to annotate and discuss specific paragraphs or items in a text (such as a bill); a safe hub where public servants, from schoolteachers to bureaucrats, can blog about what really needs to be fixed inside their institutions; better tools for incumbent legislators to handle and respond to the huge flow of email they currently receive (and ignore); better tools for converting audio to text; and a hub for tracking the promises politicians make (and whether they keep them).

How to Open Government Data
Underlying all of our discussions was the sense that we stand on the verge of much greater developments in the field of networked democracy, but that we need all branches and levels of government, from the national to the local, to update how they handle data to enable the greatest possible flowering of beneficial public uses. Too often, data that taxpayers have paid to develop is published on government websites in essentially unusable forms: locked in PDFs, abbreviated in summary tables, released months or even years after it was timely, unreadable by computers and thus unmashable, and encumbered by all sorts of unnecessary restrictions on its use. So the group spent a lot of time working out a basic set of principles for defining when government data can truly be considered open, and at the end of the afternoon on Saturday we posted these "Open Government Data Principles" to the web.

Government data shall be considered open if it is made public in a way that complies with the principles below:

1. Complete
All public data is made available. Public data is data that is not subject to valid privacy, security or privilege limitations.

2. Primary
Data is as collected at the source, with the highest possible level of granularity, not in aggregate or modified forms.

3. Timely
Data is made available as quickly as necessary to preserve the value of the data.

4. Accessible
Data is available to the widest range of users for the widest range of purposes.

5. Machine processable
Data is reasonably structured to allow automated processing.

6. Non-discriminatory
Data is available to anyone, with no requirement of registration.

7. Non-proprietary
Data is available in a format over which no entity has exclusive control.

8. License-free
Data is not subject to any copyright, patent, trademark or trade secret regulation. Reasonable privacy, security and privilege restrictions may be allowed.

Compliance must be reviewable.

Each of these definitions is hyperlinked to additional clarifying pages, and the group is inviting discussion on its wiki pages to help flesh them out further. Perhaps, in the same way that the Open House Project has helped kickstart a productive dialogue with the House of Representatives on improving how it uses the web to share information with and engage the public, these Open Government Data Principles can spark a bigger conversation with government insiders at all levels, as well as policy thinkers and technologists worldwide, to seize the moment and bring government data practices further into the age of the open, networked public sphere. I, for one, am very optimistic.

Cross-posted at TechPresident.