Follow Us

Tag Archive: kagan

Elena’s Inbox: How Not to Release Data

by

screenshot of elenasinbox.com

On Friday @BobBrigham tweeted a suggestion: put the just-released Elena Kagan email dump into a GMail-style interface. I thought this was a pretty cool idea, so I started hacking away at it over the weekend. You can see the finished results at elenasinbox.com.

I'm really pleased that people have found the site useful and interesting, but the truth is that a lot of the emails in the system are garbage: they're badly-formatted, duplicative or missing information. For instance, one of the most-visited pages on the site is the thread with the subject "Two G-rated Jewish jokes" -- understandably, given that it's the most potentially-scandalous-sounding subject line on the first page of results. Unfortunately, if you click through you'll see that there's no content in the messages.

The site was admittedly a bit rushed, but in this case it isn't the code that's to blame. If you go through the source PDF, you'll see that the content is missing there, too. It looks like it might have been redacted, but the format of the document is confusing enough that it's difficult to be sure.

But the source documents' problems go beyond ambiguous formatting. A lot of the junky content on the site comes from the junk it was built from -- there's not much we can do about it. To give you some idea of the problem, consider these strings:

Continue reading
Share This:

CFC (Combined Federal Campaign) Today 59063

Charity Navigator