As stated in the note from the Sunlight Foundation's Board Chair, as of September 2020 the Sunlight Foundation is no longer active. This site is maintained as a static archive only.


Sarah’s Inbox: The Agony and the .tgz


Many of you have probably already seen that earlier today we stood up a copy of the Elena's Inbox code for the Sarah Palin email collection. You can find the site here. I think that by most reasonable standards, Sarah Palin is currently a less newsworthy figure than Justice Kagan was at the time of her confirmation. But there's no question that many people find her fascinating, and folks seem to really enjoy having this sort of interface available -- the response has been overwhelmingly positive, despite its horrifying Gmail 1.0 look (for what it's worth, Sunlight's design team deserves absolutely none of the blame for this one!).

It's worth taking a moment to reflect on what it took to get this site online. The state of Alaska released Governor Palin's email records on paper. News organizations had to have people on the ground to collect, scan and OCR these documents. Our thanks go out to Crivella West, msnbc.com, Mother Jones and ProPublica, whose incredibly quick and high-quality work provided us with the baseline data that powers the site.

But it wasn't yet structured data. It was easy enough to convert the PDFs into text, though this introduced some errors -- dates from the year "20Q7", for instance. Then we had to parse the text into documents, each with recipients, a subject line, and a sender. This is trickier than it might seem. Consider the following recipient list:

To: Smith, John; Jane Doe; Anderson; Andy (GOV); Paul Paulson

It's parseable... sort of. It turns out that, in this case, "Anderson; Andy (GOV)" should be treated as a single entity, Andy Anderson. In this dataset, semicolons delimit not only separate names but also the parts of a single name. It's a bit of a mess. Sunlight staff spent the better part of Monday manually merging the detected entities, collapsing more than 6,000 automatically captured people to fewer than half that number. I won't pretend that the dataset is now spotless, but it's considerably more structured than it used to be.
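The two cleanup steps described above (repairing OCR'd dates and splitting recipient lists) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Sunlight's actual parser: the OCR-confusion table (`O`, `Q`, `o` misread for `0`; `l`, `I` misread for `1`) and the rule that a lone bare word is a surname whose given name spilled into the next semicolon-delimited segment are hypothetical heuristics, and the function names are made up for this example.

```python
import re

# Hypothetical OCR-confusion table: letters a scanner may have produced instead of digits.
OCR_DIGIT_FIXES = str.maketrans({"O": "0", "Q": "0", "o": "0", "l": "1", "I": "1"})
# A four-character token starting with "20" that looks like a damaged year.
YEAR_RE = re.compile(r"\b20[0-9OQolI]{2}\b")
# Trailing agency tag such as "(GOV)".
TAG_RE = re.compile(r"\s*\([A-Z]+\)\s*$")


def fix_ocr_years(text: str) -> str:
    """Repair year tokens such as '20Q7' -> '2007'; genuine years pass through unchanged."""
    return YEAR_RE.sub(lambda m: m.group(0).translate(OCR_DIGIT_FIXES), text)


def normalize(name: str) -> str:
    """Drop a trailing '(GOV)'-style tag and turn 'Smith, John' into 'John Smith'."""
    name = TAG_RE.sub("", name).strip()
    if "," in name:
        last, first = (part.strip() for part in name.split(",", 1))
        name = f"{first} {last}"
    return name


def parse_recipients(line: str) -> list[str]:
    """Split a 'To:' header into names, rejoining 'Surname; Given' pairs."""
    if line.lower().startswith("to:"):
        line = line[3:]
    segments = [s.strip() for s in line.split(";") if s.strip()]
    names = []
    i = 0
    while i < len(segments):
        seg = segments[i]
        # Hypothetical heuristic: a lone bare word (no space, no comma) is a surname
        # whose given name was pushed into the next semicolon-delimited segment.
        if " " not in seg and "," not in seg and i + 1 < len(segments):
            names.append(normalize(f"{seg}, {segments[i + 1]}"))
            i += 2
        else:
            names.append(normalize(seg))
            i += 1
    return names


print(fix_ocr_years("Sent: 6/1/20Q7"))  # prints: Sent: 6/1/2007
print(parse_recipients("To: Smith, John; Jane Doe; Anderson; Andy (GOV); Paul Paulson"))
# prints: ['John Smith', 'Jane Doe', 'Andy Anderson', 'Paul Paulson']
```

Rules like these will still misfire on some headers (a recipient listed under a single name, for instance), so a manual review of the resulting entities remains necessary.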

And that structure makes possible not only novel interfaces like Sarah's Inbox, but also novel analyses. Consider this graph of how often the word "McCain" appears in the emails:

Figure: total emails mentioning 'mccain' by week
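Once the emails are structured, a chart like this one is mostly a matter of bucketing messages by week. The sketch below assumes each email is available as a hypothetical `(sent_date, text)` pair and that weeks start on Monday; the function names and sample messages are made up, and the whole-word, case-insensitive match is one reasonable choice rather than the method behind the graph above.

```python
import re
from collections import Counter
from datetime import date, timedelta


def week_start(d: date) -> date:
    """Monday of the week containing d (assumption: weeks run Monday to Sunday)."""
    return d - timedelta(days=d.weekday())


def mentions_by_week(emails, term: str) -> Counter:
    """Count emails whose text contains `term` as a whole word (any case), per week."""
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    counts = Counter()
    for sent, text in emails:
        if pattern.search(text):
            counts[week_start(sent)] += 1
    return counts


# Made-up sample messages, for illustration only.
sample = [
    (date(2008, 8, 29), "Senator McCain's office called"),
    (date(2008, 8, 30), "RE: mccain event logistics"),
    (date(2008, 9, 2), "Budget memo"),
]
for week, n in sorted(mentions_by_week(sample, "McCain").items()):
    print(week, n)  # prints: 2008-08-25 2
```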

Interesting, right? More substantively, consider the efforts of Andree McCloud, who's raising questions about an apparent gap in the Palin emails near the beginning of the governor's term. With the data captured, it's easy to visualize this -- here's a graph of the total email volume in the system by week, beginning with the first week of December 2006, when Palin took office:

Figure: total released email volume by week

(To be clear, I don't think you can necessarily conclude from this graph that there's anything nefarious about that period's low email volume -- there are plenty of potential explanations. Still, it's useful to be able to understand the outlier period in the larger context of the document corpus.)
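A weekly volume series needs one step beyond simple counting: weeks with no email at all must appear as zeros, or the very gap in question vanishes from the data. Below is a minimal sketch, assuming the input is a list of hypothetical per-message send dates and that weeks start on Monday; the function names are made up, and the rule flagging weeks below a quarter of the median week is an arbitrary illustrative cutoff. As the caveat above says, a flagged week is an outlier worth examining, not evidence of anything missing.

```python
from collections import Counter
from datetime import date, timedelta
from statistics import median


def week_start(d: date) -> date:
    """Monday of the week containing d (assumption: weeks run Monday to Sunday)."""
    return d - timedelta(days=d.weekday())


def weekly_volume(sent_dates, start: date, end: date):
    """(week, count) pairs from start's week through end's week, zero-filled."""
    counts = Counter(week_start(d) for d in sent_dates)
    series = []
    week = week_start(start)
    while week <= end:
        series.append((week, counts.get(week, 0)))
        week += timedelta(weeks=1)
    return series


def low_weeks(series, fraction: float = 0.25):
    """Weeks whose volume falls below `fraction` of the median week (arbitrary cutoff)."""
    cutoff = fraction * median([n for _, n in series])
    return [week for week, n in series if n < cutoff]


first = week_start(date(2006, 12, 4))  # the week Palin took office
# Made-up send dates, for illustration only.
made_up = [first] * 10 + [first + timedelta(weeks=2)] * 12 + [first + timedelta(weeks=3)]
series = weekly_volume(made_up, first, first + timedelta(weeks=3))
print([n for _, n in series])  # prints: [10, 0, 12, 1]
print(low_weeks(series))
```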

Of course, these analyses and interfaces could be even better if Alaska had just released the files digitally. In fact, if they had, we might be able to draw some more solid conclusions: as our sysadmin Tim pointed out, message headers' often-sequential IDs could conceivably show whether there actually are missing emails from those first few weeks.
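With digital headers, checking Tim's idea would amount to sorting the IDs and reporting the holes. This sketch assumes, hypothetically, that each message carries an integer ID drawn from one increasing sequence; real mail-server IDs need not behave that way (hence "often-sequential"), so a hole is a lead to investigate rather than proof of a missing email. The function name is made up for the example.

```python
def id_gaps(ids: list[int]) -> list[tuple[int, int]]:
    """Return (first_missing, last_missing) ranges between consecutive observed IDs."""
    seen = sorted(set(ids))  # duplicates and input order don't matter
    gaps = []
    for prev, nxt in zip(seen, seen[1:]):
        if nxt - prev > 1:
            gaps.append((prev + 1, nxt - 1))
    return gaps


print(id_gaps([101, 102, 105, 103, 108]))  # prints: [(104, 104), (106, 107)]
```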

It's a shame that that didn't happen -- and not just because it meant my weekend was spent parsing PDFs. Releasing properly structured data ultimately allows everyone to do better work in less time. It's unfortunate that the authorities in Alaska introduced such a substantial and unnecessary roadblock.

But we at Sunlight can at least share what we've done to improve the situation. If you're interested in running your own analysis, you can find our code here, and the data to power it here (12 MB). At the moment it's in the form of a Django project -- if you need it in a different format, don't hesitate to ask on our mailing list. If you do something neat with it, please tell us!


Palin used six email accounts as governor


On Friday, reporters in Juneau, Alaska, began to sift through and scan more than 24,000 pages of emails to and from former Governor Sarah Palin, just released in response to requests made when she was governor. But they had full coverage of just two of her email accounts--and perhaps not the most interesting ones--because Palin had at least six accounts: one for public contact, one for internal state business, one for anything confidential and others for a mix of state and personal business.

Palin’s use of private accounts has been previously reported, but the just-released emails--which Sunlight is ...


Sunlight Live to cover Senate hearing on Clean Air Act Wednesday


Lawmakers on the Senate's Environment and Public Works committee will meet Wednesday, June 15, to hear from experts on public health as it relates to the longstanding Clean Air Act.

According to the Environmental Protection Agency, the Clean Air Act, passed in 1970, will have saved $2 trillion by 2020, along with 230,000 lives each year. But this spring, lawmakers in the House of Representatives attacked the law by passing a bill to keep the EPA from regulating greenhouse gases. Similar legislation did not pass in the Senate.

The Sunlight Foundation will live blog during Wednesday's ...


Plaintiff in Citizens United case forms a Super PAC


Citizens United, whose court challenge to rules barring political spending by corporations has led to far-reaching changes in the campaign finance landscape, has formed its own Super PAC, allowing it to raise and spend unlimited amounts of money to influence elections.

Paperwork for the new committee -- Citizens United Super PAC LLC -- was received by the Federal Election Commission on Friday and posted on its website this morning.

In Citizens United v. FEC, the Supreme Court ruled 5-4 in January 2010 that prohibitions on independent expenditures by corporations and unions are unconstitutional. The case grew out of Citizens United's ...

