Luigi passed along a couple of links to a great/infuriating On the Media segment about the new rules the FCC is considering related to the online disclosure of political ad purchases.
To run through the issue quickly: every broadcast station is required to keep a "public file" of paper records related to campaign ad purchases. These records show basic information about how an ad was purchased, who bought it and when it aired. As the name implies, the file is available for public inspection, but only if you show up at the station and ask for it.
The FCC has proposed a rule that would require the public file to be posted online. We feel that this is an obvious and overdue step, and have submitted comments to the rulemaking saying as much. After all, it's 2012--it's absurd to claim that information is "public" if it isn't also online. And this information is particularly important: with Citizens United enabling a new flood of money into our political system--with less accountability!--keeping track of the ways in which wealth is deployed to move political opinion is more important than ever. The public file is a vital source of this kind of information.
The first OTM segment, which features Steven Waldman, does a good job of explaining all of this. The second one mostly just makes your blood boil. In it, Jack Goodman, a lobbyist for the National Association of Broadcasters, makes the case that posting the public file online would represent an onerous burden on broadcast stations.
Clearly, this is nonsense. As Waldman notes, Goodman is claiming that his would be "the first industry to use the internet to become less efficient." I've seen what the public file looks like. Yeah, there's a bunch of stuff in there, but obviously not too much to fax to the FCC once a day (or, preferably, enter into a modern electronic record-keeping system--perhaps one supplied by the FCC--instead of continuing to record everything on paper like it's 1970).
But forget for a moment how ridiculous Goodman's argument is. Consider how outrageous it is that he's even making it. This is one of the underappreciated pathologies that lobbying produces. If you're an organization like the NAB and you have a staff lobbyist, whenever an issue comes along--however minor--your lobbyist can be counted on to make a fuss about it. That's what they're paid to do, right? Here we have a disclosure burden that is basically the bureaucratic equivalent of your office manager announcing that expense reports have to be filed using a webform. Yet for some reason we're now having a national conversation about it.
It's absolutely dumbfounding to have an effort to make money in politics more transparent weighed against someone not wanting to use the fax machine. And yet here we are. That's the magic of the lobbying industry.
The FEC’s New Mobile Site Could Use Some Work
Last Friday the Federal Election Commission announced the launch of a new mobile interface. You should try it for yourself at http://fec.gov/mobile/. The site declares itself to be a beta, which I suspect you'll agree is something of an understatement.
Let's call a spade a spade: there's no use pretending this is good. To begin with, there are obvious superficial problems: graphs lack units, graphics have been resized in a lossy way, and the damn thing doesn't work on most Android devices.
Worse, there are substantive errors. Look at Herman Cain's cash on hand. Why are debts listed as a share of positive assets? Look at the Bachmann campaign's receipts. Why is "total contributions"--which should reflect the entire pie--just a slice? (It's not 50% because other slices seem to have incorrectly counted overlap, too.) Why don't any of the line items below the graphs reflect the fact that some are components of others?
We asked the FEC for comment, but so far they've declined. Once the powers that be over there have a closer look, I'm confident they'll agree that the mobile site is a mess.
It's hard to know what to say about all of this. Part of Sunlight's mission is to encourage government agencies to embrace technology more fully. We don't want to send mixed messages by jumping down their throats when they actually try to do so. Sure, we gave FAPIIS a hard time, but that was because the site's creators were obviously and deliberately undermining the idea of public oversight. By contrast, I don't think anyone who worked on the FEC Mobile site intended to do a bad job.
And of course there's a fundamental question. Obviously the bits that are relaying incorrect information are a problem. But assuming those get fixed, is a half-hearted attempt like this better than nothing? I suppose there might be some poor, twisted soul who will enjoy listening to FEC meeting audio while they're at the gym (though frankly, if such a person existed I suspect they'd already be working here). But as a general matter it's difficult to imagine anyone needing a mobile interface to a set of campaign finance data that's as narrowly conceived as this one.
To their credit, it doesn't seem as if this mobile interface was created at the expense of the organization's much more important responsibility to publish data--a mission that, by and large, the FEC fulfills ably and with steadily increasing sophistication. There's always room for improvement, but the truly pressing needs, like reliable identifiers for contributors and meaningful enforcement of campaign finance law, are beyond the reach of the organization's technical staff.
Still, it's a bit amazing to see obviously wrong numbers attached to a product that Chairperson Bauerly has been quoted as endorsing appreciatively. Among those of us concerned about America's campaign finance system and the effect it has on our democracy, there is a sense that the FEC's leadership does not take its mission particularly seriously. The release of shoddy work like this mobile site does little to dispel that impression.
Remembering Richard Cordray: Nominee and Jeopardy Champion
This morning the Senate filibustered the nomination of Richard Cordray to be the Consumer Financial Protection Bureau’s first director. Most...
Congress Should Step Away from the Internet
About that black bar… If you’re reading this post on our website, you might have noticed the black bar covering...
Don’t Forget: Our Open House is Tomorrow!
A gentle reminder: our open house is tomorrow starting at 6, and we'd love to see as many of you here as can make it. Beer has been ordered, candy is being acquired, and plans are afoot for a Kinect-powered haunted painting. In short: it's going to be great. RSVP, why doncha?
Save the Date: Labs Open House October 25
Jeremy mentioned it in this week's labs update, but it's worth broadcasting it more loudly: we're having another Sunlight Labs open house! It's been about a year since the last time we did this. We had a great time with you all back then, and are looking forward to doing it again.
So! Please mark your calendars: we'll be opening our doors on Tuesday, October 25 at 6pm. Expect drinks, games, technology chit-chat and more than a little Halloween-themed nonsense.
If you think you can make it, do us a favor and RSVP here. We're looking forward to seeing you there!
Announcing Superfastmatch
Today I'm pleased to announce that the Superfastmatch project is open-source and ready for use. I’m excited to be posting this—I’ve been waiting to do so for a while! I think SFM is really, really cool—and I think you’ll agree once I tell you why. But first, a little bit of backstory.
We first became aware of the technology behind SFM when Churnalism launched. Created by the Media Standards Trust, Churnalism is an ingenious effort to detect when UK journalists copy-and-paste press releases into their published stories. It’s a great project, but we were even more excited by the technology behind it. Finding overlap between documents in huge corpora is not as simple a problem as you might think--it's tempting to assume that diff will manage the job, but in truth that tool is unsuitable for most types of documents.
The basic algorithmic challenge is the same one faced by those working on systems to detect academic plagiarism--a rich and evolving field in its own right. But surprisingly little of that technology is freely available.
Sunlight reached out to MST and was ultimately able to provide a grant that allowed them to open-source their code. Even better: they've been improving it. A mostly-Python implementation that needed hefty hardware is now a compiled solution that runs blazingly fast on commodity hardware (we’ve also successfully run it on vanilla EC2 instances--see the README for details).
Each instance of the system is an HTTP server. Users load documents by POSTing their text to a RESTful interface. As each document is processed, it’s normalized and split into substrings, which are hashed into unique tokens. After you’ve loaded your documents, you run an association task, which compares each document's collection of tokens against one another. Where there's overlap, contiguous chunks of text are assembled, and you can begin to inspect the parts that might be borrowed from one another. (The actual mechanics of the system are considerably more complex than this explanation, but the preceding should give you a rough idea of how things work.)
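To make the idea concrete, here is a toy sketch of the same general approach: normalize the text, hash every fixed-length window, and stitch matching windows back into contiguous chunks. This is not Superfastmatch's code. The function names, the 15-character window and the choice of hash are all assumptions picked for illustration, and the real system is far more sophisticated about memory and speed.

```python
import hashlib
import re
from collections import defaultdict


def normalize(text):
    """Lowercase and collapse every run of non-alphanumerics into one space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def window_hashes(text, window=15):
    """Map the hash of each `window`-character substring to its start offsets."""
    positions = defaultdict(list)
    for start in range(len(text) - window + 1):
        chunk = text[start:start + window]
        # 64-bit digest: an illustrative choice, not what SFM actually uses.
        digest = hashlib.blake2b(chunk.encode("utf-8"), digest_size=8).digest()
        positions[digest].append(start)
    return positions


def shared_chunks(doc_a, doc_b, window=15):
    """Return contiguous stretches of doc_a whose windows also occur in doc_b."""
    a = normalize(doc_a)
    b = normalize(doc_b)
    hashes_b = window_hashes(b, window)

    # Start offsets in doc_a of every window that also appears in doc_b.
    matching_starts = sorted(
        start
        for digest, starts in window_hashes(a, window).items()
        if digest in hashes_b
        for start in starts
    )

    # Merge overlapping or adjacent windows into [begin, end) spans.
    spans = []
    for start in matching_starts:
        if spans and start <= spans[-1][1]:
            spans[-1][1] = max(spans[-1][1], start + window)
        else:
            spans.append([start, start + window])
    return [a[begin:end].strip() for begin, end in spans]


if __name__ == "__main__":
    bill = "The committee shall report annually to the governor. Fees are waived."
    model = "Under this act, the committee shall report annually to the governor, and no later."
    print(shared_chunks(bill, model))
```

A smaller window catches shorter borrowings but produces more coincidental matches; a larger one does the opposite. Comparing every pair of documents this way would be slow on a big corpus, which is exactly the problem the real project is built to solve.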
There's a demo at scripts/gutenberg.sh that loads the Bible, the Koran and ten classic novels from Project Gutenberg into the system, then finds every bit of overlap between them (it takes about 45 seconds from start to finish on my three-year-old laptop).
Sunlight's particular interest is in pairing this technology with data from our Open States Project in order to detect when legislation is migrating between statehouses or from interest groups and into law. But we hope and expect that SFM's uses will extend well beyond our mission--the applications of this technology seem sure to surprise us.
The project remains under very active development. We expect a bugfix related to very large datasets to be merged into the main branch in a week or two, for instance. But Sunlight and MST are both anxious to see developers begin to acquaint themselves with Superfastmatch. And of course we're also hopeful that others might be inspired to contribute back to it. Providing the system's output as JSON, for example, is a long-planned feature that would be easy to implement and of considerable value.
For now, though, please have a look at the project repo and start thinking about what SFM might make possible for you. You don't need to look for a needle in a haystack anymore--you just need a few good haystacks.
Data Visualization Fellowship
We've got a new job listing up, and I hope you'll have a look. If you do, you'll see that we're doing something new. This position came about because we decided that we wanted to create more and better data visualizations -- they're interesting, people like them, and they're a great opportunity to experiment with new technologies.
But as we started thinking through how to staff this position, we realized we didn't really want someone who was an expert in d3, or processing.js, or any other presentation technology. Don't get me wrong: finding someone with those skills for this position would be great. But we already have a bunch of talented front-end developers and designers. I think we can present answers in beautiful and compelling ways; what I could really use are better questions.
So, like I said, we're looking for something a little different. The listing says "quantitative social scientist," but you could easily substitute the "data scientist" buzzword that the tech industry seems to be embracing. Whatever you call it, what we're looking for boils down to this: we need someone with the ability to understand the questions that can reasonably be asked of our data; someone who knows the questions that people have asked of the data in the past; and who is able to find some decent answers of her own. At Sunlight, those questions are likely to be about the U.S. government and the entities that try to influence it. Once you've got an interesting answer, we'll throw all the Javascript and CSS at it that you could ever want.
So please have a look, and if you know folks who you think would be a good fit, pass the link along to them. And if you yourself are thinking about applying, please don't be scared off by the specific requirements -- they describe what we think an ideal candidate would be, but we know that we're likely to find some surprises. This fellowship is a bit of an experiment for us, but I'm excited about the possibilities it represents.
It’s Not Okay for Congressional Websites to Crash
Clearly, Washington hasn't been covering itself in glory lately. The debt ceiling standoff in particular seems to have catalyzed an outpouring of frustration over what many think has been an especially feckless Congress.
Naturally, opinions differ about where blame should lie. But I hope we can all agree about this much: the fact that many congressional websites went offline last night is deeply shameful.
There was a reason for it, of course. The President addressed the nation and urged citizens to contact their representatives. Something like that is going to produce a lot of web traffic.
But the vendors who manage those systems should have been prepared for it. Congressional websites are not particularly complex. Caching technology, aggressively and properly applied, should have been able to avoid most of this problem. To the extent that it couldn't, there still isn't much of an excuse. We're now several years into the cloud computing revolution. Competent vendors should be ready for spikes in demand, and able to spin up additional resources as necessary.
The congressional phone system also shouldn't escape blame. I was at a hackathon in SF recently where one of the teams demoed a Twilio-based app that dialed their local representative's office -- in this case it was Nancy Pelosi. It was the weekend, and they were so confident that her voicemail inbox would be full and unable to accept new messages that they'd even written a little gag about it into their pitch. It was a funny joke, but it's not particularly amusing that this inability to communicate can be counted on to happen.
This stuff is important. Too often, people in Washington look at the huge volume of emails, letters and phone calls that arrive on the Hill and shrug. There are a ton of messages, so handling them necessarily becomes a bit like a factory job. And the many correspondents can be counted on to have differing opinions, so no single call or missive can ever be given very much weight. As a result, it's tempting to view dealing with constituent communications as a pointless chore -- a pressure valve by which citizens can blow off steam, but not much else.
That view is tempting, but deeply wrong. These channels are the cheapest, fastest and most egalitarian way for citizens to exercise their constitutional right to petition their government. Making sure these channels stay up and running is a serious responsibility -- one that the Capitol Hill vendor community ought to take more seriously.
Live from OKCon
I suspect/hope that most of this blog's readership is still asleep right now, but for those who rightly begin their day with a review of Sunlight blogs over their morning coffee, let me encourage you to tune in to the proceedings here at OKCon. So far we've already heard great talks from Rufus Pollock and Glyn Moody, and Richard Stallman is beginning a talk as I post this. I'll be speaking around 8:30am EDT, and plan to say a bit about the e-Gov cuts, #savethedata and the lessons that other open data organizations can take from the episode.
If that's too early for you, I suspect that the video will be archived. And while you're at it, have a look at the OKCon schedule -- there's lots of good stuff coming up!