How Sunlight, EFF and 150 civic hackers reverse-engineered Congress’ email system in two days

Constituent communication is a big part of what we’re into at Sunlight — we believe that a legislature in close touch with an active and engaged electorate is the most effective tool we have for maintaining a working democracy.

Understandably, then, it’s long dismayed us that Congress doesn’t provide an easy or straightforward way to have that kind of discourse. Each member’s email is obscured by a contact form, and while perhaps it’s fair that our legislators not be subject to constant bombardment and spam, these forms often aren’t intuitive or accessible, and sometimes don’t even seem meant to be used. Many are guarded by CAPTCHAs or gated by a zip+4 requirement, ensuring that nobody living outside a lawmaker’s home district (and perhaps nobody who can’t find out what their zip+4 is) is able to contact them for any reason. Sure, there’s the congressional phone tree, and phone calls are still the best way for Jane Q. Public to get her voice heard on the Hill outside of in-person visits, but operating hours are limited and, let’s face it, phones are inconvenient.

Screenshot of OpenCongress' contact-your-reps tool
Contact-Congress data is currently in use on OpenCongress’ contact tool, originally built by the Participatory Politics Foundation.

So we’re left with forms that won’t work unless you specify whether you’re a Mr., Mrs. or Miss; that require you to know which side of the street you live on and how that maps to some arcane number maintained by the post office; and that demand you categorize your message into some predefined bucket, regardless of whether there’s one that’s appropriate or not. We live in a nation that invented the telephone and email. Surely we can do better than this.

Building on a well-laid foundation

David Moore and the folks at the Participatory Politics Foundation thought so, too, and a few years ago, with funding from Sunlight, they built Formageddon, a tool for programmatically filling out reverse-engineered contact forms for members of Congress. Formageddon was, as far as I know, the first of its kind, and something of a marvel: it dealt with CAPTCHAs, errors and retries, and did a darn good job of letting users of their site OpenCongress contact all three of their lawmakers in one go. Pretty great!

Sunlight has been the maintainer of OpenCongress for almost a year now, and some of the original Formageddon code still powers our contact-your-rep feature on the site. It did a great job, but one thing that always irked us was that while the tool itself is free and open-source, the data that powered it — the instructions for how to fill out these forms — was locked away in our database. We had no great way to get more eyeballs on it, and we were solely responsible for maintaining it — knowing when it broke, figuring out a fix and resending any messages that had failed. With limited development resources, this seemed to us like a pretty bleak approach that we wouldn’t be able to maintain over time.

Working out in the open

So while I was gearing up for the daunting task of fixing a huge number of representatives whose forms had fallen out of date, my colleagues Eric Mill and Amy Ngai were talking to the Electronic Frontier Foundation about a new mechanism for constituent communication that could be maintained in the open. We’ve had great luck collaborating with other folks in the open gov space in the @unitedstates organization on GitHub, a civic commons of sorts where folks who have to write data-ingesting or -cleaning code anyway can donate it to the greater good. @unitedstates repos power tools not only for us, but also for folks working with GovTrack.us, The New York Times, Yahoo! News, Time and probably many others by now. We’re all incentivized to keep the code and data up-to-date and everybody wins as a result.

So we decided that any new effort should be done in this same spirit, and with that, Contact-Congress was born. As we began working, it turned out we weren’t even remotely alone in looking for a solution to this problem. There are few vendors that provide constituent communication as a service, and even they are mostly locked in to one or two upstream providers who route their messages. We found that it was a common point of frustration among vendors that they were powerless to fix bugs as they came up, and it could take hours or days before upstream fixes were made, which sounds like an advertisement for open source in and of itself. I’m certain that we never would have found all of the folks in this same space with us, and that our work would have been far more difficult and far less successful, had we decided to go it alone. I think that’s a huge open source lesson totally validated by our experience: if it’s worth doing, it’s probably worth sharing.

How it works

The project’s core data (like other @unitedstates efforts) is serialized in YAML, a compact and human-readable data format. Each member’s form is codified into a couple lines of metadata followed by a set of steps to be taken by some piece of backend software to fill out and submit the form in question. When deciding on a language set to use for distilling this information, I looked at existing tools which seek to automate interaction with web pages.

Jonas Nicklas’ Capybara stuck out to me as a great example of a pluggable system for scripting a web user, so I decided on a subset of its commands — a member’s contact steps may include keys like “visit,” “fill_in,” “select,” “click_on” and others to help determine what should be done and in what order. Repeated actions on different fields — such as filling in a dozen text inputs — can be collapsed into a single “fill_in” array for brevity, so long as the order of operations remains intact.
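To make the idea of collapsed steps concrete, here is a minimal sketch in Python. It assumes a member’s steps have already been parsed from YAML into plain lists and dicts; the URL, the field keys ("selector", "value") and the $PLACEHOLDER names are simplified stand-ins for illustration, not the project’s exact schema.

```python
# Hypothetical, simplified steps for one member, as they might look once
# the YAML has been parsed. Keys and placeholder names are assumptions.
STEPS = [
    {"visit": "https://example.gov/contact"},
    {"fill_in": [
        {"selector": "#first-name", "value": "$NAME_FIRST"},
        {"selector": "#last-name", "value": "$NAME_LAST"},
        {"selector": "#message", "value": "$MESSAGE"},
    ]},
    {"select": [{"selector": "#topic", "value": "$TOPIC"}]},
    {"click_on": [{"selector": "input[type=submit]"}]},
]


def flatten_steps(steps):
    """Expand collapsed arrays into one (action, detail) pair per operation.

    The original order is preserved, so three text inputs grouped under a
    single "fill_in" key become three consecutive fill_in operations.
    """
    operations = []
    for step in steps:
        for action, detail in step.items():
            if isinstance(detail, list):
                operations.extend((action, item) for item in detail)
            else:
                operations.append((action, detail))
    return operations


if __name__ == "__main__":
    for action, detail in flatten_steps(STEPS):
        print(action, detail)
```

Grouping is purely a convenience for the person writing the file: whatever consumes the data can treat it as a flat, ordered list of operations.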

Following these instructions can be as simple as parsing and transforming the YAML out to a Capybara script, or they can be integrated into a much more robust system in whatever way the integrator sees fit. OpenCongress has been retrofitted to use this data, and the EFF’s Congress-Forms project has been built around it from the get-go. You can see examples of the schema here, which should all look pretty readable if you’re at all familiar with HTML forms and CSS selectors.
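As a rough illustration of the simple end of that spectrum, the Python sketch below turns a handful of parsed steps into lines of a Capybara script. It is not how OpenCongress or Congress-Forms actually do it. The step layout, the "selector" and "value" keys and the $PLACEHOLDER names are assumptions, and because the sketch treats every selector as CSS it emits Capybara’s find(...).set, find(...).click and select_option calls rather than the label-based fill_in helper.

```python
FIELD_ACTIONS = {"fill_in", "select", "click_on"}


def ruby_quote(text):
    """Quote text as a single-quoted Ruby string literal."""
    return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"


def render_capybara(steps, values):
    """Return Ruby source lines that replay the steps with Capybara.

    `values` maps placeholders such as "$NAME_FIRST" (an assumed naming
    convention) to what the user typed; other values pass through as-is.
    """
    lines = []
    for step in steps:
        for action, detail in step.items():
            if action == "visit":
                lines.append(f"visit {ruby_quote(detail)}")
            elif action in FIELD_ACTIONS:
                for field in detail:
                    css = ruby_quote(field["selector"])
                    raw = field.get("value", "")
                    value = ruby_quote(values.get(raw, raw))
                    if action == "fill_in":
                        lines.append(f"find({css}).set({value})")
                    elif action == "select":
                        lines.append(
                            f"find({css}).find(:option, {value}).select_option"
                        )
                    else:  # click_on
                        lines.append(f"find({css}).click")
            else:
                raise ValueError(f"unsupported action: {action}")
    return lines


# Hypothetical example data, not taken from any real member's form.
example_steps = [
    {"visit": "https://example.gov/contact"},
    {"fill_in": [{"selector": "#first-name", "value": "$NAME_FIRST"}]},
    {"select": [{"selector": "#topic", "value": "$TOPIC"}]},
    {"click_on": [{"selector": "input[type=submit]"}]},
]
example_values = {"$NAME_FIRST": "Jane", "$TOPIC": "Other"}

if __name__ == "__main__":
    print("\n".join(render_capybara(example_steps, example_values)))
```

A real integration would also need to handle CAPTCHAs, success detection and retries, none of which this sketch attempts.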

GitHub is seriously emerging as the backbone of modern civic hacking

If there’s one takeaway from the whirlwind of the last week or so, this is it: Every aspect of our effort, from our first run at the Senate nine months ago to the awe-inspiring conquering of the entire House in two days by EFF’s volunteer corps, worked as well as it did because we used git, and more specifically, GitHub.

Screenshot of the current status of legislators' contact forms
A continuously updated readout of each member’s status is available on the contact-congress project page, thanks to the hard work of Bill Budington and the EFF

We share all of our data as YAML files hosted in version control, and can easily pinpoint changes made to any lawmaker’s form right at the instant that it’s committed by a maintainer. We can use the log to find out whose forms have changed and selectively update those in our system without having to rebuild everything from scratch on each import, saving CPU cycles, energy and money.
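The selective update described above might be sketched like this in Python. The sketch assumes each member’s form lives in its own file under a "members" directory, named by an ID (for example members/A000001.yaml), and that the importer remembers the last commit it processed; both are assumptions for illustration, and the function names are hypothetical.

```python
import subprocess
from pathlib import PurePosixPath


def changed_members(name_only_output, directory="members"):
    """Pick member IDs out of `git diff --name-only` style output.

    Only .yaml files sitting directly inside `directory` count; anything
    else (docs, scripts, nested paths) is ignored.
    """
    ids = set()
    for line in name_only_output.splitlines():
        path = PurePosixPath(line.strip())
        if path.suffix == ".yaml" and path.parent == PurePosixPath(directory):
            ids.add(path.stem)
    return sorted(ids)


def changed_since(last_imported_commit, repo_dir="."):
    """Ask git which member files changed between a commit and HEAD."""
    result = subprocess.run(
        ["git", "diff", "--name-only", last_imported_commit, "HEAD"],
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return changed_members(result.stdout)


if __name__ == "__main__":
    sample = "members/A000001.yaml\nREADME.md\nmembers/B000002.yaml\n"
    print(changed_members(sample))
```

Only the members returned here need to be re-imported; everything else can be left untouched.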

The bookmarklet code I wrote to help contributors generate the YAML itself pulls live data from other GitHub repositories via their API in order to stay up-to-date as legislators are elected in and out. We built continuous-integration style testing which can be run automatically via GitHub’s service hooks to make sure that new changes in fact fix the problem. We used GitHub issues to track every member’s form and coordinate more than 150 individual contributors all pushing code at the same time.

GitHub brought sanity to total chaos throughout this process, and I found it fascinating how the workflow we’re used to implementing with a pretty small team of developers at Sunlight scaled almost effortlessly to a ludicrous number of contributors. Hopefully I can say this without slighting the heroic folks who also did tons of coordination on IRC while this was all going down; thanks to Sina Khanifar, Bill Budington, Thomas Davis, Jason Rosenbaum and all the contributors who helped to shepherd folks once they got the hang of things. Even the project’s site and documentation are hosted for free on GitHub Pages. GitHub is seriously a national treasure.

How you can get involved

While an overwhelming amount of generosity and help went into getting this data gathered for the current Congress, this remains an ongoing need. With each election cycle comes a new crop of members who’ll need to have their forms done. Likewise, websites change and are redesigned all the time, and just like any scraper, our forms will break. So, if this project sounds interesting to you, you can watch or star it on GitHub and pitch in when you see a member’s form fall out of the green. Contributors are working on getting an automated system in place to re-open tickets when forms break, so it should be pretty easy to remain proactive about fixes. You can read all about how folks are contributing to the project, and about its current status at http://theunitedstates.io/contact-congress/.

Screenshot of the documentation for contributing to contact-congress
It’s easy to get started contributing, thanks to a series of videos created by Sina Khanifar of taskforce.is

Finally, a (long) word of thanks

So much collective time and effort went into getting over this hurdle in such an impressive fashion that I feel compelled to name names. I myself was out of the country when the volunteer call went out and so the hard in-the-trenches work was done by other folks who took to it more naturally than anyone could have hoped. So from Eric, Amy and myself at Sunlight to Rainey Reitman, Bill, Sina and Thomas working with EFF, Jason at Action Network, Paul Nickerson and all of the project’s contributors so far:

@moizsyed, @darrikmazey, @fazam, @unthunk, @d-reinhold, @dsissitka, @kuyan, @sqweak, @agrif, @buchelew, @estiens, @Aaron1011, @livesurge, @mejackreed, @sinned, @lauradhamilton, @scrozier, @liviucmg, @timdavila, @stevenmg, @zanetaylor, @spulec, @akosednar, @makecakenotwar97, @ptariche, @ahdinosaur, @elcapo, @rhunbre, @NickMk, @amit, @anmonteiro, @rwinikates, @j4yd0rs3y, @cllunsford, @andylolz, @chill117, @shwei, @gcosta, @Braunson, @fly, @netinept, @spiggy, @pjfamig, @imthinhvu, @ralfharing, @gronkeff, @gruiz17, @ryan-ludwig, @schmich, @carter-sande, @johnjminer, @mandarg, @an, @fiendly, @radioation, @greggawatt, @kfinity, @jadient, @mariehuynh, @norova, @knoxzin1, @josegonzalez, @winterchord, @dmelliott0311, @wigginus, @trusche, @EhevuTov, @rayashman, @ricanking787, @6a68, @sodaplayer, @wxguychris, @noahc, @clindsay107, @rabdill, @ruishi, @benvinegar, @evonfriedland, @garrettpauls, @ericburns, @malikabdul, @sjs, @Edderic, @dustinbrownman, @abrudtkuhl, @billyvg, @ahadb, @deepakhj, @jonocodes, @morganestes, @danasf, @omsai, @cbumgard, @vincentbarr,

…plus the 50-some more whose names I can’t get to on the project’s contributors page, THANK YOU!