Inbox Influence: Political Influence Data in Gmail

Inbox Influence screenshot

Today we’re officially launching Inbox Influence, the latest addition to our suite of political influence tools. Inbox Influence is a browser extension that adds political influence data to your Gmail messages. With Inbox Influence installed, you’ll see information on the sender of each email, the company from which it’s sent, and any politician, company, union or political action committee mentioned in the body of the email. The information is added unobtrusively and nearly instantaneously, and includes campaign contributions, fundraisers and lobbying activity. You can use it to add context to news alerts, political mailers and corporate emails, or just to see who your friends donated to in the last election. We hope that the tool will be of interest to journalists, activists and anyone curious about the political activity of the people and organizations they communicate with.

Check out the video preview:

Inbox Influence is the latest addition to our set of contextual political influence tools built off of Influence Explorer. If you haven’t already, be sure to check out Poligraft, a bookmarklet that adds influence data to any article you read, and Checking Influence, a bookmarklet that integrates with your online banking site to show the political activity of the companies you do business with.

Under The Hood

The basic idea of adding political influence data to email has been kicking around internally in Sunlight Labs for over a year. Back before we even launched Influence Explorer, we developed a Rapportive plugin that matched the sender of an email to contributors in Transparency Data. Our next idea was to use the Poligraft API to add influence data to the actual text of the email. But Poligraft (or more specifically Calais, the text processing service used by Poligraft) can take several seconds to analyze a text. This is fine for adding information to a single news article, but too slow for a user who may click through ten emails in a minute.

To be able to highlight entities in real time, we built a custom string matching service. Unlike Calais, there’s no sophisticated natural language processing to determine what is and isn’t likely to be an entity. And unlike a standard regular expression engine, it can handle the three hundred thousand unique strings that we recognize. For those interested in the algorithm itself, we use a trie structure, where each edge is a token and child sets are stored in a hashtable. This gives a tree that is very wide (up to the number of distinct words) and shallow (as tall as the number of words in the longest string), making for very fast traversal.
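To make the data structure concrete, here is a minimal sketch of a token-level trie in Python. It is an illustration of the idea described above, not the code that runs in the service: the class and function names, the lowercase word tokenizer, and the longest-match rule are all assumptions made for the example.

```python
import re

# Assumed normalization: lowercase, then split into alphanumeric word tokens.
TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


class Node:
    """One trie node. Children are kept in a dict (hash table) keyed by token."""

    __slots__ = ("children", "entity_id")

    def __init__(self):
        self.children = {}
        self.entity_id = None  # set only where a known name ends


class TokenTrie:
    def __init__(self):
        self.root = Node()

    def add(self, name, entity_id):
        """Insert a name, one token per edge."""
        node = self.root
        for token in tokenize(name):
            node = node.children.setdefault(token, Node())
        node.entity_id = entity_id

    def find_all(self, text):
        """Return entity ids in order of appearance, preferring the longest match."""
        tokens = tokenize(text)
        matches = []
        i = 0
        while i < len(tokens):
            node = self.root
            best = None  # (entity_id, index just past the match)
            j = i
            while j < len(tokens) and tokens[j] in node.children:
                node = node.children[tokens[j]]
                j += 1
                if node.entity_id is not None:
                    best = (node.entity_id, j)
            if best is None:
                i += 1  # no name starts at this token
            else:
                matches.append(best[0])
                i = best[1]  # resume after the matched name
        return matches
```

For example, after `trie.add("Bank of America", "bofa")`, calling `trie.find_all("A note from Bank of America")` walks three edges from the root and returns `["bofa"]`. Each lookup costs one hash probe per token, which is why the depth of the tree matters more than its width.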

The other trick to making the service work in real time is aggressive caching. The Influence Explorer APIs that the service is built on are fast (around 50ms response time), but still too slow if we need to make 20 different requests to highlight the entities in one email. Instead, we precompute the exact JSON response that needs to be returned for all 130,000 entities in the system. Once the text matching algorithm has found the entities, the caching layer can simply concatenate the JSON responses for each entity.
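The caching step can be sketched in a few lines of Python. This is only an illustration under assumed names: `build_cache`, `render_response`, the entity ids and the record fields are hypothetical, and an in-memory dict stands in for whatever cache store the service actually uses.

```python
import json


def build_cache(entities):
    """Serialize every entity record to a JSON string once, ahead of time."""
    return {
        entity_id: json.dumps(record, sort_keys=True)
        for entity_id, record in entities.items()
    }


def render_response(cache, entity_ids):
    """Build a JSON array by joining cached fragments, with no re-encoding."""
    seen = set()
    parts = []
    for entity_id in entity_ids:
        # Skip ids we do not know and ids already included in this response.
        if entity_id in cache and entity_id not in seen:
            seen.add(entity_id)
            parts.append(cache[entity_id])
    return "[" + ",".join(parts) + "]"
```

Because each fragment is already valid JSON, joining them with commas inside brackets yields a valid JSON array, so the per-request work is a handful of dictionary lookups and one string join rather than a round trip to the API for each entity.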

The end result is a service that adds contextual information to emails with almost no user-perceptible delay. In our tests, the round trip time from the client for processing an email is under 100ms for short messages and under 200ms for a 4KB email. Matching the sender and sender organization is done as a separate AJAX request, since the universe of possible individuals is too large to cache. Sender information comes from the Transparency Data API, and the sender’s organization is derived from the email domain using data from DBpedia.

Security and privacy are obviously big concerns in an application like this. All data is transmitted over SSL and used only to serve that single request. No data is stored or logged in any way.

As with all of our projects, the code is open source and available on GitHub. Two parts of the project seem very suited for re-use: the fast string matching and the JavaScript for integrating into Gmail. You can also access the underlying influence data through our free APIs, described earlier on our blog. Take a look, and be sure to let us know what you think.