How the New York City Council and an annual conference dedicated to the democracy-saving power of the internet are poised to beat the federal government into the 21st century.
Nonprofit E-File Data Should Be Open
The IRS is refusing to release digital e-file data for public documents filed by nonprofits; instead, it releases them as PDFs. This puts wasteful barriers in front of anyone who wants to use the data. Carl Malamud has been fighting to fix this problem, and we at Sunlight join him in calling on the IRS to release Form 990 e-file data.
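To show why the digital version matters, here is a minimal Python sketch that pulls a few fields out of a structured filing using only the standard library. It assumes the e-file data arrives as XML, and the element names (`Return`, `Filer`, `EIN`, `OrganizationName`, `TotalRevenue`) are simplified stand-ins invented for illustration, not the actual IRS schema. The point is only that a structured file takes a few lines to read, while a PDF of the same return does not.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified filing. Real IRS e-file XML uses a different,
# much larger schema; these element names are illustrative only.
SAMPLE_FILING = """
<Return>
  <Filer>
    <EIN>123456789</EIN>
    <OrganizationName>Example Charity</OrganizationName>
  </Filer>
  <TotalRevenue>250000</TotalRevenue>
</Return>
"""

def summarize_filing(xml_text: str) -> dict:
    """Extract a few fields from a (hypothetical) structured 990 filing."""
    root = ET.fromstring(xml_text.strip())
    return {
        "ein": root.findtext("Filer/EIN"),
        "name": root.findtext("Filer/OrganizationName"),
        # Missing revenue element falls back to zero.
        "total_revenue": int(root.findtext("TotalRevenue", default="0")),
    }

print(summarize_filing(SAMPLE_FILING))
```

Doing the same against a PDF would mean text extraction (or OCR for scans) followed by guesswork about which number sits in which box.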
Coming to PDF? Get Warmed Up With a Hackathon
A bunch of the Labs team (and the rest of Sunlight) will be in New York next week for PDF 2012, the Personal Democracy Forum. It's one of the can't-miss events of our calendar year, and not just because Sunlight counts Micah Sifry and Andrew Rasiej as close friends. PDF is a consistently great opportunity for like-minded folks to get together and share their visions for how technology can change society for the better. We've found more than a few team members at past PDFs, and I don't think that's a coincidence.
This year the folks behind the event are trying something new: a two-day hackathon in the lead-up to the conference. They're calling it PDF: Applied, and if you have a talent for coding and a chance to make it to New York a little early, you should really consider attending. It's always exciting to see this kind of attempt to translate big thoughts into concrete action.
Senate Expenses to Be PDF’d
We’ve just gotten the following document, which gives us the latest on the Senate’s plan to post official Senate expenses...
Redaction and Technical Incompetence
Felix Salmon, finance blogger extraordinaire, was inspired by some reporting by Bloomberg to have a look at Treasury's website. Apparently Tim Geithner visited Jon Stewart back in April, and Felix was understandably interested in seeing the evidence for himself. He went to the Treasury website, and then... well, things took a turn for the worse:
First, you go to the Treasury homepage. Then you ignore all of the links and navigation, and go straight down to the footer at the very bottom of the page, where there’s a link saying FOIA. Click on that, and then on the link saying Electronic Reading Room. Once you’re there, you want Other Records. Where, finally, you can see Secretary Geithner’s Calendar April – August 2010.
Be careful clicking on that last link, because it’s a 31.5 MB file, comprising Geithner’s scanned diary. Search for “Stewart” and you won’t find anything, because what we’re looking at is just a picture of his name as it’s printed out on a piece of paper.
In other words, these diaries, posted for transparency, are about as opaque as it can get. Finding the file is very hard, and then once you’ve found it, it’s even harder to, say, count up the number of phone calls between Geithner and Rahm Emanuel. You can’t just search for Rahm’s name; you have to go through each of the 52 pages yourself, counting every appearance manually.
Is this really how Obama’s web-savvy administration wants to behave? The Treasury website is still functionally identical to the dreadful one we had under Bush, and we’ve passed the midterm elections already. I realize that Treasury’s had a lot on its plate these past two years, but [a] much more transparent and usable website is long overdue.
This all sounds sadly familiar to me. I still remember when Treasury started posting TARP disbursement reports as CSVs instead of PDFs. I was working on Subsidyscope at the time, and had to load those reports on a weekly basis. It's more than a little sad how much better my life got when they made that change.
But I think it's important to note that Felix's frustration isn't just the product of bad technology.
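Felix's counting exercise, like the weekly TARP loads, becomes trivial once the data is published as text rather than as scanned images. Here is a minimal Python sketch assuming a hypothetical CSV calendar export with `date`, `time` and `entry` columns; the rows and the name "Example Official" are invented placeholders, not taken from Geithner's actual calendar.

```python
import csv
import io

# Invented sample rows in a hypothetical CSV calendar export. The column
# names (date, time, entry) are assumptions, not Treasury's actual format.
SAMPLE_CALENDAR = """date,time,entry
2010-04-05,09:00,Call with Example Official
2010-04-05,11:30,Staff meeting
2010-04-06,14:00,Phone call: Example Official
"""

def count_mentions(csv_text: str, name: str) -> int:
    """Count calendar rows whose entry mentions `name` (case-insensitive)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    needle = name.lower()
    return sum(1 for row in reader if needle in row["entry"].lower())

print(count_mentions(SAMPLE_CALENDAR, "Example Official"))  # → 2
```

With a 52-page scan, the same answer requires OCR or a person reading every page.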
Elena’s Inbox: How Not to Release Data
On Friday @BobBrigham tweeted a suggestion: put the just-released Elena Kagan email dump into a Gmail-style interface. I thought this was a pretty cool idea, so I started hacking away at it over the weekend. You can see the finished results at elenasinbox.com.
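Much of a Gmail-style view comes down to grouping messages into conversations. One common, simple approach keys threads on the subject line after stripping "Re:" and "Fwd:" prefixes. The sketch below shows that general technique under that assumption; it is not the code behind elenasinbox.com, and the sample subjects are invented.

```python
import re
from collections import defaultdict

# Matches any run of leading "Re:", "Fw:" or "Fwd:" prefixes, in any case.
PREFIX = re.compile(r"^\s*((re|fwd?)\s*:\s*)+", re.IGNORECASE)

def normalize_subject(subject: str) -> str:
    """Strip reply/forward prefixes and collapse whitespace to form a thread key."""
    stripped = PREFIX.sub("", subject)
    return " ".join(stripped.split()).lower()

def group_threads(messages):
    """Group (subject, body) pairs into threads keyed by normalized subject."""
    threads = defaultdict(list)
    for subject, body in messages:
        threads[normalize_subject(subject)].append((subject, body))
    return dict(threads)

sample = [
    ("Schedule for Monday", "Draft attached."),
    ("RE: Schedule for Monday", "Looks fine."),
    ("Fwd: Re: Schedule for Monday", "FYI"),
    ("Budget memo", "See below."),
]
for key, msgs in group_threads(sample).items():
    print(key, len(msgs))
```

Subject-based threading is crude: unrelated messages that share a subject get merged. Real mail clients can also use `In-Reply-To` and `References` headers, which a PDF dump of printed emails typically doesn't preserve.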
I'm really pleased that people have found the site useful and interesting, but the truth is that a lot of the emails in the system are garbage: they're badly formatted, duplicative or missing information. For instance, one of the most-visited pages on the site is the thread with the subject "Two G-rated Jewish jokes", which is understandable, given that it's the most potentially scandalous-sounding subject line on the first page of results. Unfortunately, if you click through you'll see that there's no content in the messages.
The site was admittedly a bit rushed, but in this case it isn't the code that's to blame. If you go through the source PDF, you'll see that the content is missing there, too. It looks like it might have been redacted, but the format of the document is confusing enough that it's difficult to be sure.
But the source documents' problems go beyond ambiguous formatting. A lot of the junky content on the site comes from the junk it was built from, and there's not much we can do about that. To give you some idea of the problem, consider these strings:
Following the Money: New House Expenditure Reports Available Online
For the second time ever, the House of Representatives released an online update to its “House Expenditure Reports,” a...
Crunching Numbers on the President’s Economic Report
James Jacobs (of Free Government Info) writes that the Economic Report of the President, which provides an overview of the...
A Lesson in Humility
On Monday the House of Representatives delivered, as promised, an electronic dump of House Expense Reports. We at Sunlight Labs had a plan. We knew it was going to be a huge PDF, but we had all the infrastructure in place. We had plenty of bandwidth, we knew when the data was coming out and roughly how it was going to look, and we knew it was likely we wouldn't be able to parse it all with computers. "We'll use TransparencyCorps," we thought, to get that last mile out of the data, so that eventually we'd end up with a parseable database.
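The plan of parsing what a machine can and handing the rest to people can be sketched as a simple triage step. This minimal Python sketch assumes text has already been extracted from the PDF and that expense rows look roughly like `MM/DD/YY  PAYEE  DESCRIPTION  AMOUNT` with fields separated by two or more spaces. That layout, the regular expression and the sample rows are invented assumptions, not the actual format of the House reports, and the `needs_review` list merely stands in for a crowdsourcing queue like TransparencyCorps.

```python
import re
from dataclasses import dataclass

# Hypothetical row layout: "MM/DD/YY  PAYEE  DESCRIPTION  1,234.56",
# with fields separated by two or more spaces.
ROW = re.compile(
    r"^(?P<date>\d{2}/\d{2}/\d{2})\s{2,}(?P<payee>.+?)\s{2,}"
    r"(?P<description>.+?)\s{2,}(?P<amount>-?[\d,]+\.\d{2})$"
)

@dataclass
class Expense:
    date: str
    payee: str
    description: str
    amount: float

def triage(lines):
    """Split extracted text lines into parsed rows and lines needing human review."""
    parsed, needs_review = [], []
    for line in lines:
        line = line.rstrip()
        if not line:
            continue  # skip blank lines left over from extraction
        m = ROW.match(line)
        if m:
            parsed.append(Expense(
                date=m["date"],
                payee=m["payee"],
                description=m["description"],
                amount=float(m["amount"].replace(",", "")),
            ))
        else:
            needs_review.append(line)  # hand these to human volunteers
    return parsed, needs_review

sample_lines = [
    "01/04/09  ACME OFFICE SUPPLY  TONER  1,234.56",
    "01/05/09  CITY TAXI  LOCAL TRAVEL  18.00",
    "ACME OFFI CE SUP PLY   TON ER 12 34.5 6",  # garbled by extraction
]
rows, review = triage(sample_lines)
print(len(rows), len(review))  # → 2 1
```

Keeping the rejected lines intact, rather than guessing at them, means volunteers see exactly what the parser saw.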
No PDFs!
This week, Speaker Pelosi asked House administrators to post House members’ expenses on the Web, for the first time. We...