In a few short hours I and much of the rest of the internet will be descending on Austin, TX for SXSW Interactive. If you're among the folks who'll be attending, I hope you'll consider coming by one or more of the panels and events we'll be doing:
- Drew will be talking about corporate (and other) identifiers
- I'll be on a panel with Sarah Cohen and Vivek Kundra, where we'll discuss the successes and shortfalls of Gov 2.0
- And Ellen will be helping to judge SXSW Accelerator
But even if you can't make it to the panels, we hope you'll say hello -- just drop either Drew or me an email (tlee/dvogel (at) sunlightfoundation.com) or tweet at the @sunlightlabs account.
For those of you headed to California instead of Texas, note that an even bigger contingent of labs staffers is currently winging its way toward PyCon. They'll be leading our now-traditional open government code sprint, looking for folks who want to contribute to Open States and/or a new, super-secret (well, not really) community project.
Merry conference-going to all -- we're looking forward to seeing some old friends, and to making some new ones.
The Congressional Roku App – Senate Edition!
Back in August, we released three Roku apps, one for each branch of government. The apps included streaming video for the White House and the House of Representatives, as well as streaming audio for the Supreme Court (no cameras allowed in there). This week, we've released an update to the Congressional Roku app that adds two major new features: video from the Senate floor (courtesy of Floor.Senate.gov) and text search.
Regulations.gov Gets an API & More
Sunlight has been interested in the federal rule-making process for quite a while: we sponsored the app contest that led to the current incarnation of federalregister.gov, which lists federal regulations as they are published, and kick-started an effort to map regulations to the laws that authorize them during a hackathon late last year. We also have extensive experience in the analysis of corporate influence on the political process, having launched several prominent influence-related projects under the Influence Explorer banner. During the last year, we’ve begun to examine the confluence of these two interest areas: corporate influence on the regulatory process, and, in particular, the comments individuals and corporations can file with federal agencies about proposed federal regulations. The first glimpses of the results of this effort went live on Influence Explorer last fall, with the addition of regulatory comment summaries to corporations’ profile pages.
Given this history, we’ve been excited to explore this week’s relaunch of regulations.gov, the federal government’s primary repository of regulatory comments, and the source of the data that powers our aforementioned Influence Explorer regulatory content. This new release brings with it a much-needed visual spruce-up, as well as improved navigation and documentation to help new users find and follow regulatory content, and a suite of social media offerings that have the potential to expose rule-making to new audiences. There have also been some improvements to document metadata, such as the addition of category information visitors can use to filter searches by industry, or browse rule-makings topically from the homepage.
Of more interest to us as web developers is the addition, for the first time, of official APIs to allow programmatic access to regulatory data. It’s clear that the regulations.gov team has taken note of current best practices with respect to open data APIs, and has produced clean, RESTful endpoints that allow straightforward access to what is, especially for a first release, a reasonably comprehensive subset of the data made available through the general end-user web interface. While we have been successful in performing significant regulatory analysis absent these tools, our work required substantial effort in screen-scraping and reverse engineering, and we expect that other organizations hoping to engage in regulatory comment analysis will now be able to do so without the level of technical investment we’ve had to make.
Of course, there is still work to be done. Much of the work we’ve done so far on regulations, and that we hope still to do, revolves around analysis of the actual text of the comments posted to regulations.gov (which can take the form of PDFs and other not-easily-machine-readable documents), and depends on being able to aggregate results over the entirety of the data, or at least significant subsets of it. As a result, even with these new APIs, we’ll still need to make large numbers of requests to identify new documents, enumerate all of the downloadable attachments for each one, download these attachments one at a time, and maintain all of the machinery necessary to do our own extraction of text from them. While we’re fortunate to have the resources to do this ourselves, and have made headway in making the fruits of our labors available for the public, it would certainly behoove the regulations.gov team to move forward with bulk data offerings of their own. Sunlight has a long history of advocating the release of bulk data in addition to (and perhaps even before) APIs, and the regulatory field illustrates many of our typical arguments for that position; the kinds of questions that can be answered with all of the data are fundamentally different than those that can be answered with any individual piece. We recognize that offering all of the PDFs, Word documents, etc., to the public might be cost-prohibitive from a bandwidth point of view, but regulations.gov is doing text extraction of their own (it powers the full-text search capabilities that the site provides), and offering bulk access to the extracted text as we have done could provide a happy medium that would facilitate many applications and analyses without breaking the bandwidth bank.
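To make that concrete, here's a minimal sketch of the enumerate-and-download loop we're describing. The endpoint paths, parameters, and field names below are illustrative assumptions, not the documented regulations.gov API -- the point is the sheer number of round trips involved.

# A sketch of the crawl loop described above -- illustrative only. The
# endpoint paths, parameters, and field names are assumptions, not the
# documented regulations.gov API.
import requests

API_KEY = "YOUR_API_KEY"
BASE = "http://regulations.gov/api"  # placeholder base URL

def fetch_attachments(docket_id):
    """Page through a docket's documents, then fetch every attachment."""
    page = 0
    while True:
        # One request per page just to identify documents...
        resp = requests.get("%s/documents" % BASE, params={
            "api_key": API_KEY, "docket_id": docket_id, "page": page})
        docs = resp.json().get("documents", [])
        if not docs:
            break
        for doc in docs:
            # ...another request per document to enumerate its attachments...
            detail = requests.get("%s/document/%s" % (BASE, doc["id"]),
                                  params={"api_key": API_KEY}).json()
            for att in detail.get("attachments", []):
                # ...and one download per attachment (PDF, Word, etc.), each
                # of which still needs our own text extraction afterward.
                yield requests.get(att["url"]).content
        page += 1

Multiply that inner loop across every docket, and bulk downloads of already-extracted text start to look very attractive for everyone's bandwidth.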
In general, we see plenty of reasons to applaud this release and the team at EPA that's behind it. While many of its changes are cosmetic and additional improvements will be necessary for regulations.gov to reach its full potential, this update promises further progress that will benefit developers and members of the public alike. We share the enthusiasm of the regulations.gov team for increasing access to and awareness of these crucial artifacts of the democratic process, and look forward to engaging with them and the broader open government community as they continue to improve this public resource.
Help Open States Rate State Websites
As Open States closes in on our initial goal of supporting all 50 state legislatures (just 3 more to go!) we're also planning to put out a report card evaluating state legislative data across every state.
As the 40+ individuals who have sat down and helped us scrape state sites can affirm, most states simply don't do a decent job of making legislative information available, so we're hoping that this report can serve as a wake-up call to states that make this vital data far too difficult to access. For those few states that are doing a good job, we're hoping to praise their commitment to open data and point out areas where they can do even better.
We've come up with a set of criteria based on Sunlight's "Ten Principles for Opening Government Data" (which expand upon the 8 Principles of Open Government Data) that we feel we can fairly apply to the states, and created a survey to evaluate states against these criteria.
In order to guarantee a high-quality report we'd like to get several responses per state, and that's where you can help us out. Click the link below to head to a form that will ask you to evaluate the information that your state legislature makes available via its official website. By doing this you'll help us ensure that our eventual report is as accurate and as complete as possible.
(If you have any questions feel free to contact jturk@sunlightfoundation.com. If there are any questions you aren't sure how to answer we'd prefer you leave them unanswered instead of guessing.)
Introducing python-sunlight
Hello, World!
We'd like to welcome python-sunlight into the most excellent family of open-source projects maintained by Labs. This particular project aims to unify and normalize the Sunlight APIs into a single Python library that's easy to understand, use, and fun to play with.
This library currently supports our Congress API, Open States API, and Capitol Words API. As such, we're deprecating the old python-sunlightapi and python-openstates libraries. They'll still work but will no longer receive updates, so switching is highly recommended.
This library has some neat features that should make migration painless - as well as some new features, such as a standardized location to place your Sunlight API Key, which makes testing (as well as distributing) your app even easier.
We've just released version 1.0.1 over on PyPI, which makes installation a snap on any system with pip. The documentation is fairly complete, but feedback is super welcome -- we're eager to learn where folks get stuck.
Most of the bugs seemed to be worked out after the Boston Python Project Night, where we had a few folks test out the library. A special thanks to all our beta-testers!
Alright, so how do I get started?
Hacking on python-sunlight is super easy. Here's how to get set up.
You'll need an API key. If you've not done so, get an API key (it's alright, we'll wait, go ahead).
Back already? Great.
Now, you'll have gotten the email that has a long-ish string of letters and numbers - let's save this to ~/.sunlight.key (where python-sunlight will look for a key). If you already had a key, it'd be worth it to go and dig it up.
If you're on a UNIX-like machine (Mac OS X, GNU/Linux, *BSD, AIX, Solaris, or any other POSIX-ish system), you should be able to run a command that looks like the following:
echo "your-api-key-here" > ~/.sunlight.key
It's worth mentioning that your-api-key-here should actually be your API key that was emailed to you up above.
Next, you should install python-sunlight via pip. If pip is not installed on your system, please download and install it first:
pip install sunlight
And you're good to go!
Without further ado, an example!
#!/usr/bin/env python
# Copyright (c) 2012, BSD-3 clause, Sunlight Labs
from sunlight import capitolwords
from sunlight import congress

phrase = "death metal"

# Today, we'll be printing out the Twitter IDs of the legislators that use
# this phrase most in the congressional record.
for cw_record in capitolwords.phrases_by_entity(
    "legislator",   # We're getting all legislators
    sort="count",   # sorted by how often they say
    phrase=phrase,  # this phrase
)[:6]:              # We'll just try the top 6 legislators
    legislator = congress.legislators(
        bioguide_id=cw_record['legislator'],  # Look up this bioguide ID (the
                                              # unique ID for every federal
                                              # legislator)
        all_legislators="true"  # include retired legislators in the search
    )
    if len(legislator) >= 1:        # If we were able to find the legislator
        legislator = legislator[0]  # (this is a search, so it's a list)
        if legislator['twitter_id'] != "":  # and they have a Twitter ID
            print "%s. %s (@%s) said %s %s times" % (
                legislator['title'],
                legislator['lastname'],
                legislator['twitter_id'],
                phrase,
                int(cw_record['count'])
            )  # Print it to output :)
The output looks like this:
Sen. Feingold (@russfeingold) said death metal 979 times
Rep. Jackson Lee (@JacksonLeeTX18) said death metal 923 times
Sen. Leahy (@SenatorLeahy) said death metal 800 times
Sen. Kyl (@senjonkyl) said death metal 755 times
Sen. Durbin (@SenatorDurbin) said death metal 593 times
And once more (this time, searching for "san francisco"):
Rep. Filner (@CongBobFilner) said san francisco 1346 times
Sen. Feinstein (@senfeinstein) said san francisco 1288 times
Sen. Boxer (@senatorboxer) said san francisco 1181 times
Rep. Pelosi (@NancyPelosi) said san francisco 1135 times
Rep. Eshoo (@RepAnnaEshoo) said san francisco 677 times
Rock on!
Questions, concerns, bugs, patches, examples and virtual hugs are all welcome on our GitHub page, so please do check it out!
Labs Update: February 2012
Previously in Sunlight Labs: Influence Explorer redesigned, James moved to Boston, and Capitol Words was released. So then why is Luigi cleaning out his desk? Where did Transparency Data go? Why is Ethan calling in to the morning check-in meeting? Find out on this episode of Labs Update!
Goodbyes…
Let's start off with some terrible news: Luigi Montanez has up and left the Labs. He'll be working for a new startup in the world of politics, so I'm sure you'll see much more of his amazing work in the near future. The local frozen yogurt and frozen custard shops will feel the loss of his business.
We also recently said goodbye to designer Chris Rogers. Did you like our Indecent Disclosure poster or the Transparency Camp 2011 branding? Those are just two examples of her fine work.
...lead to open positions!
Sunlight Foundation is hiring!
- Graphic Designer - 2 positions available!
- Software Developer / Civic Hacker
If you haven't noticed, this is a really great place to work. Talented people, a fun environment, and lots of nearby delicious food. We also do really important work.
We're even offering a referral bonus: if we hire your suggestion, I'll give you a sincere hug.
Sunlight Seattle and Influence Explorer
Goodbye, Transparency Data! As part of an effort to streamline our branding, TransparencyData.com has been transformed into data.influenceexplorer.com. If you rely on the Transparency Data API, do not fret; it is the exact same API at a new URL and with much nicer documentation. All calls to Transparency Data URLs are being forwarded to the new domain so don't freak out thinking that all of your projects have broken.
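For API users, the switch amounts to changing the hostname. A minimal sketch -- the endpoint path and parameter names are from memory and should be checked against the new documentation:

# Migrating from TransparencyData.com to data.influenceexplorer.com.
# The path and parameter names here are illustrative -- verify them
# against the new documentation.
import requests

OLD_BASE = "http://transparencydata.com/api/1.0"
NEW_BASE = "http://data.influenceexplorer.com/api/1.0"

# Same API, same parameters -- only the hostname changes. (The old URLs
# are forwarded anyway, so existing code keeps working in the meantime.)
resp = requests.get("%s/contributions.json" % NEW_BASE,
                    params={"apikey": "YOUR_API_KEY", "cycle": "2012"})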
In data-related news, Alison has recently refreshed state campaign finance data with the latest dump from FollowTheMoney.org. Ethan has wrapped up development on a project I hinted at late last year, where we will be pulling in timelier, but messier, data directly from the FEC. Andrew has been cleaning up much of IE's JavaScript infrastructure and working on a new site highlighting influence on the regulatory process.
Influence Explorer team lead Ethan has moved to Seattle, but will still be working remotely for Sunlight. Unfortunately, my knowledge of Seattle is quite limited so I've no witty jabs to include here. If you have any, please contact me.
Sunlight Boston and the Open States Project
It's like a spin-off of your favorite sitcom, but just as good as the original. Sunlight Boston is fully staffed with new hires Paul Tagliamonte and Thom Neale. Rounding out the team is a new data quality intern, Nina Psoncak. They're based out of a hip co-working space so if you find yourself in the area, stop by, compliment their code, and tell them how much cooler they are than Sunlight DC.
And they are getting work done too! Scrapers have been fixed and/or updated for California, Delaware, Colorado, Hawaii, and Rhode Island. billy, the underlying scraping system, now allows for the merging of legislators. The Open States API has been updated with several feature requests to better support mobile clients.
Sunlight Live
We took the State of the Union address as an opportunity to try out our new Sunlight Live platform, Datajam. Dan and Luigi (before his defection) did a really amazing job on the project. The event administration tools and chat module are super slick.
Upwardly Mobile
I've talked about it for months, but seriously, we are wrapping up Upwardly Mobile! The finishing touches are being added (animated cow, need I say more?), communications and organizing are planning the launch, and I'm wrestling with final tweaks to static maps generated with matplotlib.
Work begins on the third Knight app soon! We'll have an exciting announcement about a new partner that will be working on it with us.
python-sunlight
As the number of APIs we offer increases, so does the number of client libraries needed to work with each service. The madness must stop! Paul has started work on python-sunlight, a grand unified Python wrapper for (eventually) all of Sunlight's APIs. We are launching with support for the Capitol Words, Congress and Open States APIs. An experimental version of the Influence Explorer API is included and work on the Real Time Congress API will begin soon. Just pip install sunlight to get started. Python and Sunlight are BFF.
Subsidyscope
As the Subsidyscope project winds down, Kaitlin and Drew have updated the data with the latest release from USASpending.gov and prepared six more sectors for impending launch. They've also been working with Superfastmatch for some upcoming projects.
Team Journalism
The Sunlight Reporting group has historically been responsible for all journalistic output, but in recent months Labs has been taking an increasing role in our reporting. We've got access to these vast data sets, so why not do something worthwhile with them, right?
Joining the team is new hire Jacob Fenton who will be working as our embed in the Reporting Group. Since starting at Sunlight he's been knee deep in the swampy morass commonly known as raw FEC campaign finance reports. Ryan and Lee have been covering super PACs and elite donors for the 2012 presidential campaign.
Team Sysadmin
With our ever expanding troop of remote workers, Tim was tasked with finding a solution to replace our existing (and terrible) conference call system. Using ambient mics mounted in the ceiling, a mixer, a web cam, and Google Hangouts, Tim was able to rig up a solution that works surprisingly well!
When the robot's nose glows green, you know you are being broadcast.
Team Tom
Tom has been (with much help from James and Daniel) laying the groundwork for a new scraper project (GASP!, literally) that we'll be asking you all to lend a hand with (stay tuned for more on that). Otherwise it's been the usual glamorous mix of contracts, grant reports, negotiating metrics and dealing with turnover. But he did buy a BeagleBone, which is kind of exciting (less so for Tim, who's been receiving a lot of tedious questions about rc.d and wpa_supplicant as a result).
Tidbits
- Ever wondered if there are any crazy late night Tweets from members of Congress that get deleted the next morning? Eric is working on a project that will help uncover long lost updates from Congress.
- Shouldn't come as a surprise, but there's been another awesome update to Congress for Android consisting of design updates and search features.
- Daniel released Lapidus, a metrics tracking dashboard we will be using internally to track our goals.
- February should see the release of the 180° project which will turn the cameras on the audience at Congressional committee hearings.
- Ali has been slowly rolling out our new logo as the rebranding effort continues. The next few months should see updated identity on all of our properties including a completely new SunlightFoundation.com!
- An awesome new release of the Congressional Roku app is coming soon.
- February's afternoon snack of the month is the PBJ smoothie at Yola. Tell them Sunlight sent you and they'll look at you weird because they don't know who we are!
- The sandwich gods are smiling upon Sunlight.
Introducing Lapidus, an Analytics Dashboard
Lapidus is an Analytics Dashboard we developed in response to our desire to track metrics for all of our projects, whether they are web sites, APIs, mobile apps, etc. Sunlight has multiple projects that target different audiences and have different uses, but it is important for us to understand how all of these projects are used. Beyond that, we wanted to improve how we compared metrics across our projects -- while keeping in mind that not every possible comparison makes sense. With Lapidus we can view metrics across all of our projects in a single view, and when viewing aggregates across date ranges, Lapidus automatically color-codes certain metrics based on whether they increased or decreased from the previous period. Lapidus does not replace Google Analytics -- in fact it relies on GA for web metrics data -- but it does extend our ability to record and view additional metrics of our choosing.
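The period-over-period color-coding is conceptually simple. Here's a sketch of the idea -- illustrative only, not Lapidus's actual code:

# A sketch of the comparison behind Lapidus's color-coding -- illustrative
# only, not the project's actual implementation.
def color_for_metric(current_total, previous_total):
    """Green if a metric rose versus the previous period, red if it fell."""
    if previous_total is None:  # no earlier period to compare against
        return "neutral"
    if current_total > previous_total:
        return "green"
    if current_total < previous_total:
        return "red"
    return "neutral"

print color_for_metric(12500, 11800)  # page views up week-over-week: "green"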
This project was started by Jeremy Carbaugh (who named the project after a character from 'Lost'), who laid out the initial models for the metrics app with an eye toward flexibility. Ali Felski provided the design which also inspired some of the better features of the site (color-coding, sorting, etc.).
Don’t Use Zip Codes Unless You Have To
Many of us in the labs found it thrilling to watch the internet community unite around opposition to the SOPA and PIPA bills yesterday. Even more gratifying was seeing how many participating websites used our APIs to help visitors find their elected representatives. This kind of use is exactly why we built those tools, and why we'll always make them freely available to anyone who wants to make government more accessible to its citizens.
Still, I'd be lying if I said we don't occasionally wince when we see someone using our services in a less-than-ideal way. It's completely understandable, mind you: the problem of figuring out who represents a given citizen is tougher than you might think. But we hate to think that anyone is getting bad information about which office to call -- talking to the people who represent you should be simple and easy! Since this comes up with some frequency, it's probably worth talking about the nature of these problems and how to avoid them.
TL;DR: Looking up congressional districts by zip code is inherently problematic. Our latitude/longitude-based API methods are much more accurate, and should be used whenever possible.
The first complication is probably obvious: zip codes and congressional districts aren't the same thing. A zip code can span more than one district (or even more than one state!), so if you want to support zip lookups for your users, you'll have to support cases where more than one matching district is returned. Our API accounts for this, but it's important that your code do so, too. We err on the side of returning inclusive results when a zip might belong to multiple congressional districts.
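Concretely, that means treating the result of a zip lookup as a list, never a single value. Here's a sketch using the python-sunlight wrapper introduced elsewhere on this blog -- the method and field names are assumptions to verify against its documentation:

# Handle the possibility that a zip code maps to several districts.
# The method and field names below are assumptions -- check them against
# the python-sunlight docs before relying on them.
from sunlight import congress

districts = congress.districts_for_zip("15201")  # assumed method name
if not districts:
    print "No match -- fall back to asking the user for a full address."
elif len(districts) == 1:
    print "Unambiguous: %s-%s" % (districts[0]['state'], districts[0]['number'])
else:
    # The zip spans several districts: ask the user for more information
    # rather than guessing who represents them.
    for d in districts:
        print "Possible district: %s-%s" % (d['state'], d['number'])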
Unfortunately, things are actually more complicated than that. Most people don't realize it, but zip codes describe postal delivery routes -- the actual routes that mail carriers travel -- not geographically bounded areas. Zip codes are lines, in other words, while congressional districts are polygons. This means that mapping zips to congressional districts is an inherently imperfect process. The government uses something called a zip code tabulation area (ZCTA) to approximate the geographic footprint of a given zip as a polygon, and this is what we use to map zip codes to congressional districts. But it really is just an approximation -- it's far from perfect.
It's much better to skip the zip code step entirely: with a precise geographic coordinate pair instead of a hazy, vague zip code, you can look up a location directly against the congressional district shapefiles published by the Census Bureau. Thanks to the Chicago Tribune News App Team's excellent Boundary Service project, we offer exactly this capability. If you can, we strongly encourage you to get a precise latitude/longitude pair from your users (either by geolocating them or geocoding their full address), then use it to determine their representatives.
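In code, the coordinate-based lookup is just as simple, and it returns a single, unambiguous answer (again, the method name is an assumption to check against the python-sunlight docs):

# Look up a district from a precise coordinate pair, e.g. one obtained by
# geolocating the user or geocoding their full address first.
# The method name is an assumption -- verify it in the python-sunlight docs.
from sunlight import congress

lat, lng = 40.7484, -73.9857  # a point in Manhattan
districts = congress.districts_for_lat_lon(lat, lng)
# A point falls inside exactly one district, so no disambiguation is needed.
print "%s-%s" % (districts[0]['state'], districts[0]['number'])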
"But what about house.gov's ZIP+4 congressional lookup tool?" I hear you asking. It's true, many House offices use this tool to determine who your representative is (and whether you're allowed to email them). Unfortunately, just because this tool is on an official site doesn't mean it's perfect. Here in the Labs, Kaitlin (who lives in Maryland) can't write her representative because the ZIP+4 tool gives incorrect results. Besides, not that many people know their full nine-digit ZIP+4 code.
So if you can, use latitude/longitude pairs. If you can't, and have to depend on zips, we'll supply results that are very, very good -- but not as good as real coordinates would allow.
Broadcasters’ Public Files Should Be Published Online (and it’s absurd that we’re even having this conversation)
Luigi passed along a couple of links to a great/infuriating On the Media segment about the new rules the FCC is considering related to the online disclosure of political ad purchases.
To run through the issue quickly: every broadcast station is required to keep a "public file" of paper records related to campaign ad purchases. These records show basic information about how an ad was purchased, who bought it and when it aired. As the name implies, the file is available for public inspection, but only if you show up at the station and ask for it.
The FCC has proposed a rule that would require the public file to be posted online. We feel that this is an obvious and overdue step, and have submitted comments to the rulemaking saying as much. After all, it's 2012--it's absurd to claim that information is "public" if it isn't also online. And this information is particularly important: with Citizens United enabling a new flood of money into our political system--with less accountability!--keeping track of the ways in which wealth is deployed to move political opinion is more important than ever. The public file is a vital source of this kind of information.
The first OTM segment, which features Steven Waldman, does a good job of explaining all of this. The second one mostly just makes your blood boil. In it, Jack Goodman, a lobbyist for the National Association of Broadcasters, makes the case that posting the public file online would represent an onerous burden on broadcast stations.
Clearly, this is nonsense. As Waldman notes, Goodman is claiming that his would be "the first industry to use the internet to become less efficient." I've seen what the public file looks like. Yeah, there's a bunch of stuff in there, but obviously not too much to fax to the FCC once a day (or, preferably, enter into a modern electronic record-keeping system--perhaps one supplied by the FCC--instead of continuing to record everything on paper like it's 1970).
But forget for a moment how ridiculous Goodman's argument is. Consider how outrageous it is that he's even making it. This is one of the underappreciated pathologies that lobbying produces. If you're an organization like the NAB and you have a staff lobbyist, whenever an issue comes along--however minor--your lobbyist can be counted on to make a fuss about it. That's what they're paid to do, right? Here we have a disclosure burden that is basically the bureaucratic equivalent of your office manager announcing that expense reports have to be filed using a webform. Yet for some reason we're now having a national conversation about it.
It's absolutely dumbfounding to have an effort to make money in politics more transparent weighed against someone not wanting to use the fax machine. And yet here we are. That's the magic of the lobbying industry.
The FEC’s New Mobile Site Could Use Some Work
Last Friday the Federal Election Commission announced the launch of a new mobile interface. You should try it for yourself at http://fec.gov/mobile/. The site declares itself to be a beta, which I suspect you'll agree is something of an understatement.
Let's call a spade a spade: there's no use pretending this is good. To begin with, there are obvious superficial problems: graphs lack units, graphics have been resized in a lossy way, and the damn thing doesn't work on most Android devices.
Worse, there are substantive errors. Look at Herman Cain's cash on hand. Why are debts listed as a share of positive assets? Look at the Bachmann campaign's receipts. Why is "total contributions"--which should reflect the entire pie--just a slice? (It's not 50% because other slices seem to have incorrectly counted overlap, too.) Why don't any of the line items below the graphs reflect the fact that some are components of others?
We asked the FEC for comment, but so far they've declined. Once the powers that be over there have a closer look, I'm confident they'll agree that the mobile site is a mess.
It's hard to know what to say about all of this. Part of Sunlight's mission is to encourage government agencies to embrace technology more fully. We don't want to send mixed messages by jumping down their throats when they actually try to do so. Sure, we gave FAPIIS a hard time, but that was because the site's creators were obviously and deliberately undermining the idea of public oversight. By contrast, I don't think anyone who worked on the FEC Mobile site intended to do a bad job.
And of course there's a fundamental question. Obviously the bits that are relaying incorrect information are a problem. But assuming those get fixed, is a half-hearted attempt like this better than nothing? I suppose there might be some poor, twisted soul who will enjoy listening to FEC meeting audio while they're at the gym (though frankly, if such a person existed I suspect they'd already be working here). But as a general matter it's difficult to imagine anyone needing a mobile interface to a set of campaign finance data that's as narrowly conceived as this one.
To their credit, it doesn't seem as if this mobile interface was created at the expense of the organization's much more important responsibility to publish data--a mission that, by and large, the FEC fulfills ably and with steadily increasing sophistication. There's always room for improvement, but the truly pressing needs, like reliable identifiers for contributors and meaningful enforcement of campaign finance law, are beyond the reach of the organization's technical staff.
Still, it's a bit amazing to see obviously wrong numbers attached to a product that Chairperson Bauerly has been quoted as endorsing appreciatively. Among those of us concerned about America's campaign finance system and the effect it has on our democracy, there is a sense that the FEC's leadership does not take its mission particularly seriously. The release of shoddy work like this mobile site does little to dispel that impression.