Congress in photos, a civic hacking success story

For the last 5 years or more, we’ve maintained a dataset of photos of members of Congress, mostly manually, and published them as zip files. It worked, but wasn’t a great process for us or for people using the photos. We also weren’t 100% certain about the copyright status of the photos, or whether a contractor’s copyright might be involved.

But thanks to an enthusiastic contributor, and an ensuing burst of [discussion](https://github.com/sunlightlabs/congress/issues/432) and [work](https://github.com/unitedstates/congress-legislators/pull/167), all of that’s been fixed up!

We now help manage a new [public domain dataset of photos of members of Congress](https://github.com/unitedstates/images#images-of-congress). Contributing is much easier, and collection is automated: a crawler walks the [Government Printing Office’s Member Guide](http://memberguide.gpo.gov/) and downloads its photos.

The photos are hosted on GitHub, so you can easily download the entire dataset through git. Or, since we’re using GitHub Pages to host the files on the web, you can hotlink to photos via predictable URLs. URLs are constructed using the available sizes (`original`, `450x550`, or `225x275`) and a member of Congress’ Bioguide ID, like so:

> [http://theunitedstates.io/images/congress/original/L000551.jpg](http://theunitedstates.io/images/congress/original/L000551.jpg)
> [http://theunitedstates.io/images/congress/450x550/L000551.jpg](http://theunitedstates.io/images/congress/450x550/L000551.jpg)
> [http://theunitedstates.io/images/congress/225x275/L000551.jpg](http://theunitedstates.io/images/congress/225x275/L000551.jpg)

Bioguide IDs are a standard unique identifier for members of Congress, taken from the [Congressional Bioguide](http://bioguide.congress.gov/). They can be found in other official and community datasets about Congress, like the unitedstates project’s [bulk data on legislators](https://github.com/unitedstates/congress-legislators), or [Sunlight’s Congress API](http://sunlightlabs.github.io/congress/legislators.html).
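
If you want to build these URLs in code, here is a minimal Python sketch of the pattern described above. The `photo_url` helper and the `BASE_URL` and `SIZES` names are ours for illustration and are not part of the dataset or any library. It also assumes the size directories are spelled with a plain ASCII `x`.

```python
# Base path and size names as described in this post. The ASCII "x" in the
# size directory names is an assumption about how the paths are spelled.
BASE_URL = "http://theunitedstates.io/images/congress"
SIZES = ("original", "450x550", "225x275")


def photo_url(bioguide_id: str, size: str = "original") -> str:
    """Build the hotlink URL for one member's photo (hypothetical helper)."""
    if size not in SIZES:
        raise ValueError(f"unknown size {size!r}; expected one of {SIZES}")
    return f"{BASE_URL}/{size}/{bioguide_id}.jpg"


if __name__ == "__main__":
    # L000551 is the Bioguide ID used in the example URLs above.
    for size in SIZES:
        print(photo_url("L000551", size))
```

The helper only assembles a string. It does not check that a photo actually exists for the ID you pass in, so a member missing from the dataset would still yield a well-formed URL that returns a 404.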

This project, simple as it is, is a textbook example of civic volunteerism in the open source world. [Hugo van Kemenade](https://github.com/hugovk) was intrigued by [Matthew Skomarovsky](https://twitter.com/skomputer) and [Rebecca Lieberman](http://rebeccalieberman.com/)’s horrifying [composite of members of Congress](http://www.huffingtonpost.com/2014/01/29/congress-average-picture_n_4688163.html), and noticed that the photos Skomarovsky used were collected manually. After [having his own fun](http://www.flickr.com/photos/hugovk/12329158515/) compositing photos, he wrote a crawler for photos of members of Congress from Wikipedia and [submitted it for inclusion](https://github.com/sunlightlabs/congress/issues/432) in our Congress API.

After a bunch of discussion about using official sources and copyright, Public Knowledge’s [Michael Weinberg](https://twitter.com/mweinbergPK) finally picked up an actual phone and [called the GPO](https://github.com/sunlightlabs/congress/issues/432#issuecomment-34481338) to get an assurance that the images in GPO’s Member Guide were public domain. Hugo rewrote the crawler to use the Guide instead and [re-submitted it](https://github.com/unitedstates/congress-legislators/pull/167), which we accepted and then moved to [its own repository](https://github.com/unitedstates/images). After an [absurdly detailed discussion](https://github.com/unitedstates/images/issues/1) over appropriate photo sizing strategy, we finalized the dataset.

The more people who use the photos, the easier it will be to keep them complete and timely. Both Sunlight and [GovTrack](https://www.govtrack.us/) already make use of the images in our projects, and they’re [seeing use](https://github.com/unitedstates/unitedstates.github.io/issues/2) in others’ too. Collecting and using photos of members of Congress is a really basic thing – it should be a Solved Problem. Thanks to Hugo van Kemenade, Michael Weinberg, and the other participants for making it one.