Datafest project opens Uncle Sam’s daily ledger

Michael Keller shows off Treasury.io site at Sunlight offices.

Good morning, America. Your $11 trillion checkbook is ready for viewing.

A project that got its start early this year at a bicoastal datafest cosponsored by the Sunlight Foundation now can give taxpayers a day-by-day, line item-by-line item view of what they’re paying the government and how the government is spending their money.

Treasury.io, set to be shown off Thursday at the New York Times open source science fair, is the brainchild of csv soundsystem, an eclectic group of journalists, data geeks, developers and even a recovering particle physicist. It provides a new perspective on the budget debate by enabling citizens to analyze the government’s intake and outlay of money in real time. The idea: Take detailed data that the U.S. Treasury publishes every day and put it into a format that can easily be analyzed by computer.

“If you want to see how much we spent on Medicare last Tuesday, there it is,” says Cezary Podkul, a Reuters reporter and team member. “It has the power to be very granular.”

It started as a coffee shop conversation about how to turn the balance sheet that the Treasury Department issues every afternoon into structured data that can be queried and analyzed. It became a full-fledged project at the Bicoastal Datafest earlier this year. The team wowed the judges, as well as the crowd, which voted to give the Treasury data project top honors and a $2,000 prize. The winning entry even included a data-driven music video. Yes, the national debt can sing!

“It was surprisingly catchy,” said team member Tom Levine, who heard reports that the video got heavy play at Treasury.

But for all its whimsical bells and whistles, Treasury.io is a project with a serious point: “They should be putting this on the news every night,” datafest judge Arlene Morgan said of one of the team’s visualizations, which shows, day by day, how expenditures on Medicare compare to the government’s revenues. (Hint: Grandma’s doctor bills are a lot higher.)

Treasury.io contains all of the government’s daily ledger sheets dating back to June 2005, when Treasury began publishing them as text rather than PDF. By making it possible to do calculations across all of that data, Treasury.io produces interesting new insights. Podkul, for instance, was surprised to discover that the Treasury daily ledgers record $11 trillion in outlays a year, since the value of the annual federal budget is generally calculated at about $3 trillion. The difference, Podkul learned: Service on the national debt.
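For readers curious what a calculation across the daily ledgers could look like, here is a minimal sketch in Python. It is not the csv soundsystem team's code: the column names (`date`, `item`, `amount_millions`) are assumptions, and the rows are made-up placeholders, not real Treasury figures.

```python
import csv
import io
from collections import defaultdict

# Made-up placeholder rows, NOT real Treasury figures.
# The column names are assumptions chosen for this illustration.
SAMPLE_CSV = """date,item,amount_millions
2012-01-03,Medicare,1500
2012-01-03,Debt service,90000
2012-01-04,Medicare,1200
2013-01-02,Medicare,1600
"""


def yearly_totals(csv_text):
    """Sum the amount column for each calendar year in the CSV text."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        year = int(row["date"][:4])  # dates are assumed to be YYYY-MM-DD
        totals[year] += int(row["amount_millions"])
    return dict(totals)


totals = yearly_totals(SAMPLE_CSV)
for year in sorted(totals):
    print(year, totals[year])
```

Adding up every daily line for a year, rather than reading a single budget figure, is the kind of arithmetic that would surface a gap like the one Podkul noticed.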

The datafest where csv soundsystem launched its project was underwritten by the MacArthur Foundation and was a joint project of Teresa Bouza, deputy Washington bureau chief of EFE and former Knight Fellow, and the Sunlight Foundation. It was hosted by the Brown Institute at Columbia and Stanford Universities and by the Columbia Journalism School and Stanford’s journalism program.

Following the datafest, csv soundsystem continued refining the concept with the help of a $10,000 Knight-Mozilla OpenNews grant. Team member Michael Keller, who stopped by Sunlight to show off the resulting website to developers and journalists here, said most of the money went for web hosting. In addition to the website, the team also created an automated Twitter feed that queries the data and produces several intriguing observations a day.

Keller said the team is working on making the Treasury.io site more accessible to non-expert users. Right now it requires at least a basic knowledge of SQL, though structured data can also be downloaded directly. Podkul began showing off the site at last month’s Investigative Reporters and Editors annual conference. “They’re very excited about it,” he said.
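To give a sense of the SQL involved, the sketch below asks a Podkul-style question ("how much did we spend on Medicare on a given day?") against a small in-memory SQLite database. Everything here is hypothetical: the `daily_ledger` table, its columns and the sample rows are invented for illustration and do not reflect Treasury.io's actual schema or real Treasury figures.

```python
import sqlite3

# Hypothetical schema and made-up rows; NOT Treasury.io's real tables or data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE daily_ledger ("
    "date TEXT, item TEXT, is_deposit INTEGER, amount_millions INTEGER)"
)
rows = [
    ("2013-07-09", "Medicare", 0, 1400),
    ("2013-07-09", "Individual income taxes", 1, 900),
    ("2013-07-10", "Medicare", 0, 1100),
]
conn.executemany("INSERT INTO daily_ledger VALUES (?, ?, ?, ?)", rows)


def spending_on(conn, item, date):
    """Return total withdrawals (in millions) for one item on one day."""
    cur = conn.execute(
        "SELECT COALESCE(SUM(amount_millions), 0) FROM daily_ledger "
        "WHERE item = ? AND date = ? AND is_deposit = 0",
        (item, date),
    )
    return cur.fetchone()[0]


print(spending_on(conn, "Medicare", "2013-07-09"))
```

The query uses placeholders (`?`) rather than string formatting, which is the safer habit whenever the item or date comes from user input.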

Like all of the work that Sunlight supports, the Treasury.io site is completely open source. The csv soundsystem team has included extensive documentation on the Treasury tables it used and has written libraries to enable the data to be used in Python, Ruby, JavaScript, R, Node.js and Google Docs.

The members of csv soundsystem hope they are setting an example that will be contagious. “There have been a few tweets and blog posts from Italians, saying ‘Why don’t we have something like this in our country,’” said Brian Abelson. They hope the same attitude will trickle down to U.S. municipalities. “As people get used to this level of granularity, they’ll demand it on a local level,” Keller predicted.