One of the hardest parts of creating an open-data policy is figuring out where to start. Here at Sunlight, we have several resources to help with this, including our Open Data Policy Wizard.
In the coming months, the Sunlight Foundation will consolidate our resources, making it easier for you to access and use the data we open up about government and political influence.
Some thoughts on the strategy of retiring projects and how we look back at our work on the new tools page.
Roger L. Simon, writing at Pajamas Media, announces a new transparency project, soliciting suggestions from readers on what the blogosphere-bloomed news organization should dig into. I wouldn't presume to play assigning editor for the effort, but I hope I can help by pointing to some resources (full disclosure: many, but not all, are built or supported by the Sunlight Foundation) that might help Pajamas Media readers do some digging on their own and get the ball rolling.
Simon notes that government spending is a big issue, and starts by asking about spending on government employees. He writes, "Some of ...
This morning the Data Commons team released their newest tool: Checking Influence, a bookmarklet that lets online banking users gain insight into how the merchants they do business with are influencing our political system. We think it's a great example of the future of influence disclosure, and we hope you'll agree.
But I won't prattle on about it any more here. The announcement blog post goes into more detail. I hope you'll give that a read, and give the tool a try.
ScraperWiki is a project that's been on my radar for a while. Last week Aine McGuire and Richard Pope, two of the people behind the project, happened to be in town, and were nice enough to drop by Sunlight's offices to talk about what they've been up to.
Let's start with the basics: remedial screen scraping 101. "Screen scraping" refers to any technique for getting data off the web and into a well-structured format. There's lots of information on web pages that isn't available as a non-HTML download. Making this information useful typically involves writing a script to process one or more HTML files, then spit out a database of some kind.
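To make that concrete, here is a minimal sketch of the process using only Python's standard library (`html.parser` and `sqlite3`). It is not ScraperWiki's API or any Sunlight tool; the page, the `grants` table and the `scrape_to_db` helper are all hypothetical, invented for illustration. It reads an HTML table, strips presentation formatting like "$1,200", and writes the rows into a SQLite database.

```python
import sqlite3
from html.parser import HTMLParser

# Hypothetical page: data published only as an HTML table.
SAMPLE_HTML = """
<table id="grants">
  <tr><th>Recipient</th><th>Amount</th></tr>
  <tr><td>City of Springfield</td><td>$1,200</td></tr>
  <tr><td>Shelbyville Library</td><td>$350</td></tr>
</table>
"""


class TableRowParser(HTMLParser):
    """Collect the text of each <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        # Text may arrive in several chunks, so accumulate it.
        if self._in_cell:
            self._row[-1] += data

    def handle_endtag(self, tag):
        if tag == "td" and self._in_cell:
            self._row[-1] = self._row[-1].strip()
            self._in_cell = False
        elif tag == "tr":
            # Header rows use <th>, so they collect no cells and are skipped.
            if self._row:
                self.rows.append(self._row)
            self._row = None


def scrape_to_db(html, conn):
    """Parse the (hypothetical) grants table and store it; return the row count."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    conn.execute(
        "CREATE TABLE IF NOT EXISTS grants (recipient TEXT, amount_usd INTEGER)"
    )
    for recipient, amount in parser.rows:
        # Strip presentation formatting ("$1,200") to get a usable number.
        value = int(amount.replace("$", "").replace(",", ""))
        conn.execute("INSERT INTO grants VALUES (?, ?)", (recipient, value))
    conn.commit()
    return len(parser.rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    scrape_to_db(SAMPLE_HTML, conn)
    for row in conn.execute(
        "SELECT recipient, amount_usd FROM grants ORDER BY amount_usd DESC"
    ):
        print(row)
```

Real pages are rarely this tidy. A scraper like this breaks as soon as the site changes its markup, which is exactly the fragility described below.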
It's not particularly glamorous work. People who know how to make nice web pages typically know how to properly release their data. Those who don't tend to leave behind a mess of bad HTML. As a result, screen scrapers often contain less-than-lovely code. Pulling data often involves doing unsexy things like treating presentation information as though it had semantic value, or hard-coding kludges ("# ignore the second span... just because"). Scraper code is often ugly by necessity, and almost always of deliberately limited use. It consequently doesn't get shared very often; having the open-sourced code languish sadly in someone's GitHub account is normally the best you can hope for.
The ScraperWiki folks realized that the situation could be improved. A collaborative approach can help avoid repetition of work. And since scrapers often malfunction when changes are made to the web pages they examine, making a scraper editable by others might lead to scrapers that spend less time broken.
Today Sunlight is launching Poligraft, what I think is one of the coolest, most revealing and most interesting tools of...
Long ago, putting together a map of data points would be the sole domain of a skilled GIS practitioner employing...
Just a few odds and ends, bits of reporting that didn't make it into this post, which relies on data from the Foreign Lobbying Influence Tracker that we collaborated on with our friends from ProPublica:
Numbers: I really hesitated to use the administration's claims of $210 billion in tax revenue raised (there's a fairly good breakdown of which proposals raise what part of the $210 billion in the New York Times here). Tax havens offer secrecy, so even if the government knows how much money they hold (I doubt that it does) it can't determine who ...
On Sunlight's "twitter lobbying" efforts and building a more effective means for communicating with Congress. Do we need a GetSatisfaction.com for Congress?