I’m happy to announce the newest project from Sunlight Labs, Poligraft. A utility built on top of Transparency Data, Poligraft takes in a block of text, parses it for entities like politicians and corporations, and returns a result set representing the political influence contained in that text. I won’t dwell on the features — read Ellen Miller’s announcement blog post and the about page for more information. What I want to talk about instead is the development process.
Third Time’s A Charm
The idea behind Poligraft is not new. Back in late 2007, well before I joined Sunlight, the nascent Labs team attempted an initial version of the concept that didn’t pan out. Then in early 2009, I still wasn’t at Sunlight, but I did develop an entry for the first Apps for America contest that was called Defogger. Defogger was embarrassingly slow, didn’t use any AJAX updating, and stopped short of making the connections between entities that Poligraft does today. Much more worthy apps placed at the top of Apps for America.
With the content plucked, Poligraft extracts the entities (people, organizations, companies) from the text with the Calais API. A service by Thomson Reuters, Calais semantically processes any given text, and returns a rich representation of that text. It’s very detailed, much more so than what Poligraft needs. Try out the Calais Viewer to see what I mean.
Using the people, companies, and organizations that Calais detected, Poligraft then uses the Transparency Data API in three steps. First, the Transparency Data entity search is called on each Calais entity. This will usually weed out the majority of entities detected by Calais, because we’re only focusing on entities that have something to do with campaign contributions. These are the “Points of Influence” you see in the sidebar, and you can sometimes see the “weed out” step if you watch closely. Second, on that subset of entities, Poligraft uses the Transparency Data aggregate endpoints to draw the graphs you see on the sidebar. Third, the “Aggregated Contributions” section in the sidebar is filled out using a pairwise aggregation endpoint that is not yet described in the official Transparency Data API documentation. It’ll be ready for public use very soon.
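The three steps above can be sketched roughly as follows. This is an illustration of the flow only, not Poligraft's actual code: the function build_influence_report and the three callables it receives (search_entity, fetch_aggregates, fetch_pairwise) are hypothetical stand-ins for the Transparency Data calls, and querying the pairwise endpoint once per pair of surviving entities is an assumption.

```python
from itertools import combinations
from typing import Callable, Iterable, Optional


def build_influence_report(
    calais_entities: Iterable[str],
    search_entity: Callable[[str], Optional[str]],
    fetch_aggregates: Callable[[str], dict],
    fetch_pairwise: Callable[[str, str], dict],
) -> dict:
    """Run the three lookup steps over the names Calais detected.

    The three callables are hypothetical placeholders for Transparency
    Data API requests; they are not real client functions.
    """
    # Step 1: entity search. Names with no match are weeded out here.
    matched = {}
    for name in calais_entities:
        entity_id = search_entity(name)
        if entity_id is not None:
            matched[name] = entity_id

    # Step 2: aggregate data for each surviving entity (the sidebar graphs).
    aggregates = {name: fetch_aggregates(eid) for name, eid in matched.items()}

    # Step 3: pairwise aggregation (assumed: one lookup per pair of entities).
    pairwise = {}
    for (name_a, id_a), (name_b, id_b) in combinations(matched.items(), 2):
        result = fetch_pairwise(id_a, id_b)
        if result:
            pairwise[(name_a, name_b)] = result

    return {
        "points_of_influence": sorted(matched),
        "aggregates": aggregates,
        "pairwise": pairwise,
    }
```

Passing the lookups in as arguments keeps the sketch independent of any particular HTTP client, and makes the weed-out behavior in step one easy to try with fake data.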
Providing an API
Poligraft also has its own built-in API, which Poligraft itself uses to populate the results page dynamically via AJAX. Specify a URL or text to be processed, and get back the results in JSON format. In fact, every result page in Poligraft has a corresponding JSON representation: just append .json to the unique slug.
To process an article, send a request to the http://poligraft.com/poligraft endpoint. Be sure to pass in json=1 or else HTML will be returned, and use url= to pass in a URL or text= to pass in a selection of text. HTTP clients must have redirection enabled, as the response will be a redirect to a slug endpoint.
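As a minimal sketch of how such a request might be assembled in Python, the helper below builds the request URL from the parameters described above. The function name build_request_url is made up for this example, and two things are assumptions rather than documented behavior: that the parameters travel in the query string, and that exactly one of url= or text= should be supplied.

```python
from typing import Optional
from urllib.parse import urlencode

# Endpoint named in the post.
ENDPOINT = "http://poligraft.com/poligraft"


def build_request_url(url: Optional[str] = None, text: Optional[str] = None) -> str:
    """Build a processing request URL that asks for JSON instead of HTML."""
    # Assumption: exactly one of url= or text= is passed per request.
    if (url is None) == (text is None):
        raise ValueError("pass exactly one of url= or text=")

    params = {"json": "1"}  # without json=1, HTML is returned
    if url is not None:
        params["url"] = url
    else:
        params["text"] = text

    # Assumption: parameters are sent in the query string.
    return ENDPOINT + "?" + urlencode(params)
```

For instance, build_request_url(text="hello world") yields http://poligraft.com/poligraft?json=1&text=hello+world. Whatever client fetches that URL needs to follow redirects to reach the slug endpoint.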
Because Poligraft does processing asynchronously, this endpoint will return a 202 ACCEPTED code until processing is finished, when it returns a 200 OK. In addition to the HTTP response code, there’s a top-level field in the JSON called processed, which is set to false while processing is active. Poll the endpoint every few seconds until the return code is 200 or the processed value in the JSON is true. Both techniques will work.
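The polling loop described above might look something like this in Python, using only the standard library. Treat it as a sketch under stated assumptions: the names fetch_status_and_body and wait_for_result, the three-second interval, and the cap of 20 attempts are all choices made for this example, and the url argument is expected to be the slug endpoint that the initial request redirected to.

```python
import json
import time
import urllib.request
from typing import Callable, Tuple


def fetch_status_and_body(url: str) -> Tuple[int, dict]:
    """GET a URL (urllib follows redirects) and return (status, parsed JSON)."""
    with urllib.request.urlopen(url) as response:
        return response.status, json.loads(response.read().decode("utf-8"))


def wait_for_result(
    url: str,
    fetch: Callable[[str], Tuple[int, dict]] = fetch_status_and_body,
    interval: float = 3.0,   # "every few seconds"; the exact value is a guess
    max_attempts: int = 20,  # arbitrary cap so the loop cannot run forever
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the status is 200 or the JSON reports processed == true."""
    for attempt in range(max_attempts):
        status, body = fetch(url)
        # Either signal means processing has finished.
        if status == 200 or body.get("processed") is True:
            return body
        # Still 202 and processed is false: wait before asking again.
        if attempt < max_attempts - 1:
            sleep(interval)
    raise TimeoutError(f"no finished result after {max_attempts} attempts")
```

The fetch and sleep parameters exist so the loop can be exercised without touching the network. In normal use, wait_for_result(slug_json_url) with the defaults is enough.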
Open Source + Open Data
As usual, the code behind Poligraft is open source on GitHub. The APIs it uses are free to use. In particular, the Transparency Data API is incredibly valuable for building tools and apps that examine and visualize political influence. While building Poligraft, I was pleasantly surprised on many occasions by what Transparency Data provides. In the months and years to come, I hope we see many more apps built on top of it, not just from within the Labs, but from the wider community.