Is that photo vintage, senator? Scattered data paint outdated picture of some in Congress

Newly elected Rep. Bruce Westerman, R-Ark., in black and white! (Photo credit: New Member Pictorial Directory)

Since the new Congress started in [early January](https://github.com/unitedstates/images/commit/7eb7278deac9251bb7d787a96c5dabf84eabb60f), visitors to [OpenCongress](https://www.opencongress.org) have seen something kind of strange: [black and white photos](https://www.opencongress.org/people/show/412608_Gary_Palmer) of newly elected members of Congress positioned alongside glorious, full-color photos of their longer-tenured colleagues. As Sunlight’s mercifully tolerant designers will tell you, this was not a conscious style choice; it’s just another example of a trade-off made because of scattered government data.

Here’s the situation: OpenCongress gets its legislator photos from the open source [unitedstates-images](https://github.com/unitedstates/images) repository, a project that [grabs images](https://github.com/unitedstates/images/blob/gh-pages/scripts/gpo_member_photos.py) from the Government Publishing Office’s (GPO) [Member Guide](http://memberguide.gpo.gov/) (among other places, as we will see). If you visit that page and check out its [underlying structured data](http://www.memberguide.gpoaccess.gov/Congressional.svc/GetMembers/113), you’ll see that it still defaults to the 113th Congress, which packed its bags at the end of last year. If you try to get structured data for the 114th Congress, you get [nothing](http://www.memberguide.gpoaccess.gov/Congressional.svc/GetMembers/114): `[]`.
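For the curious, here’s a minimal Python sketch of that check. It assumes the endpoint quoted above still answers and returns a JSON array (the `[]` suggests as much, but the service may no longer be online); the function names are our own, not GPO’s.

```python
import json
import urllib.request

# URL pattern copied from the links above; assumed to return a JSON array.
MEMBER_GUIDE_URL = (
    "http://www.memberguide.gpoaccess.gov/Congressional.svc/GetMembers/{congress}"
)


def parse_members(body: str) -> list:
    """Decode a GetMembers response body into a list of member records."""
    data = json.loads(body)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of members")
    return data


def has_member_data(body: str) -> bool:
    """An empty array ("[]") means the Congress hasn't been loaded yet."""
    return len(parse_members(body)) > 0


def fetch_members(congress: int, timeout: float = 10.0) -> list:
    """Fetch and decode the member list for one Congress (network call)."""
    url = MEMBER_GUIDE_URL.format(congress=congress)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_members(resp.read().decode("utf-8"))
```

Calling `has_member_data` on the 114th Congress response shown above would return `False`, which is exactly the problem.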

I called GPO to ask when the site would be updated; they estimated “June.” So we looked elsewhere.

Specifically, project collaborators found a [New Member Pictorial Directory](http://www.gpo.gov/fdsys/browse/collection.action?collectionCode=GPO&browsePath=Congressional+Pictorial+Directory%2F114th+Congress&searchPath=Congressional+Pictorial+Directory%2F114th+Congress&leafLevelBrowse=false&isCollapsed=false&isOpen=true&ancestors=root&packageid=GPO-PICTDIR-NEW-114&ycord=42), which is in every software developer’s favorite data format: .pdf. Thanks to some help from civic hacker [Josh Tauberer](https://twitter.com/joshdata) and `pdfimages -j`, the photos were ripped from that document. From there, it was just a matter of matching the photos up with legislators’ names and [bioguide IDs](http://bioguide.congress.gov/biosearch/biosearch.asp), which hadn’t yet been issued when the document was published. This process culminated in a decent-sized [contribution to the unitedstates/images project](https://github.com/unitedstates/images/pull/21).
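The matching step can be sketched in Python roughly like this. It’s a hypothetical reconstruction, not the script behind the pull request: it assumes the PDF holds exactly one JPEG portrait per member, extracted in the same order as a hand-made list of names, and that a name-to-bioguide lookup exists once the IDs are issued. `pdfimages` (from poppler-utils) names its output `prefix-000.jpg`, `prefix-001.jpg`, and so on; with `-j`, only DCT-encoded images come out as JPEGs (others are written as PPM/PBM, which this sketch ignores).

```python
import re
import subprocess


def extract_jpegs(pdf_path: str, out_prefix: str) -> None:
    """Run poppler's pdfimages; -j saves DCT-encoded images as .jpg files."""
    subprocess.run(["pdfimages", "-j", pdf_path, out_prefix], check=True)


def image_index(filename: str) -> int:
    """Pull the numeric counter out of names like 'photo-007.jpg'."""
    match = re.search(r"-(\d+)\.\w+$", filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename}")
    return int(match.group(1))


def plan_renames(image_files, names_in_order, bioguide_by_name):
    """Pair images with names by position, then map each to <bioguide>.jpg.

    names_in_order and bioguide_by_name are hypothetical inputs you would
    assemble by hand from the directory and from bioguide lookups.
    """
    images = sorted(image_files, key=image_index)
    if len(images) != len(names_in_order):
        raise ValueError(f"{len(images)} images but {len(names_in_order)} names")
    plan = {}
    for image, name in zip(images, names_in_order):
        bioguide = bioguide_by_name.get(name)
        if bioguide is None:
            raise KeyError(f"no bioguide ID yet for {name!r}")
        plan[image] = f"{bioguide}.jpg"
    return plan
```

In practice you’d still want to eyeball the result: a stray logo or an out-of-order page would silently shift every name after it, which is why the sketch at least insists that the counts line up.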

And so now we (and everyone who wants to grab images from [the repository](https://github.com/unitedstates/images/tree/gh-pages/congress/450x550)) have black and white images for some new legislators. This will continue to be the case until there is a current, centralized data source for high-quality, public-domain photos of every member of Congress. (If you know of one, please tell us!)
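To spot which legislators are still stuck in black and white, a rough check like the following Python sketch would do. The raw-file URL is assembled from the repository path linked above using GitHub’s usual `raw.githubusercontent.com` pattern, and the tolerance of 8 (to absorb JPEG color noise) is an arbitrary assumption; neither comes from the project itself.

```python
from typing import Iterable, Tuple

# Assumed URL: GitHub's raw-file pattern applied to the gh-pages branch
# and congress/450x550 folder linked in the post. Not an official API.
RAW_IMAGE_URL = (
    "https://raw.githubusercontent.com/unitedstates/images/"
    "gh-pages/congress/450x550/{bioguide}.jpg"
)


def image_url(bioguide_id: str) -> str:
    """Build the (assumed) raw URL for one legislator's 450x550 photo."""
    return RAW_IMAGE_URL.format(bioguide=bioguide_id)


def looks_grayscale(pixels: Iterable[Tuple[int, int, int]], tolerance: int = 8) -> bool:
    """True if every RGB pixel has nearly equal channels.

    The tolerance is a guess meant to forgive small JPEG color artifacts
    in an otherwise black and white photo.
    """
    for r, g, b in pixels:
        if max(r, g, b) - min(r, g, b) > tolerance:
            return False
    return True
```

With Pillow installed, something like `looks_grayscale(Image.open(path).convert("RGB").getdata())` would run a downloaded photo through the check.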

My gripes about our patchwork of image sources might seem trivial, but they point to a larger truth: If a government data source isn’t kept up to date, its value is principally historical.