This month, a few members of Sunlight Labs continued our tradition of attending the always-exciting annual conference of North American Python developers: PyCon 2015. Here are the biggest highlights and takeaways from our trip.
OpenGov Voices: Opening up government reports through teamwork and open data
Recently, a project started to gather the work of every inspector general (IG) in the U.S. government by using web scrapers. This effort has now hit a major milestone, gathering the reports of every U.S. federal IG that publishes them: 65 inspectors general with over 18,000 reports.
How partisan are your state’s legislators?
We've used statistical models to visualize how partisan (almost) every state legislator in America is. Find out where your reps stand!
Sunlight at PyCon 2014
Labs members go to Canada, eat poutine and Paultag dazzles with an A+ talk. It's Sunlight's recap of PyCon 2014!
Opening data: Have you checked your pipes?
Almost every technical project (and every idea for one) has an initial cost known as ETL. So why aren't we talking about it?
How Sunlight updated Churnalism — and created a new tool in the process
Yesterday we told you we updated Churnalism with fresher Wikipedia content. Today, we'll tell you about the technical challenges involved — and about a new tool spawned from that effort.
Come to Ladies Who Code DC!
Ladies Who Code (as the name suggests) is a gathering of ladies who code. The group, which already has chapters in Manchester, New York and London, recently opened its DC chapter, and the first Meetup will be hosted at Sunlight’s DC offices.
What: Ladies Who Code DC Meetup
Where: Sunlight Foundation, 1818 N St. NW Suite 300 Washington, DC, 20036
When: September 19, 6:30pm
Sign up: http://www.meetup.com/Ladies-Who-Code-Washington-DC/events/139227392/
OpenGov Voices: PySEC, bringing corporate financial data to the masses
Disclaimer: The opinions expressed by the guest blogger and those providing comments are theirs alone and do not reflect the opinions of the Sunlight Foundation or any employee thereof. Sunlight Foundation is not responsible for the accuracy of any of the information within the guest blog.
Luke Rosiak is a former Sunlight Foundation reporter and database analyst who now writes for the Washington Examiner. This post addresses the tooling around Extensible Business Reporting Language and provides recommendations on what needs to be done. You can reach Luke on Twitter at @lukerosiak.
In the early 1990s, long before most federal agencies had embraced the digital era, the Securities and Exchange Commission (SEC) undertook a truly “big data” initiative that showcased some of the best that open data had to offer: Its quarterly reports were uploaded in real time, in text, rather than PDF, format, to a public FTP server called EDGAR. (File Transfer Protocol or FTP is a standard network protocol used to copy a file from one host to another over a network.)
As with the Federal Election Commission, the companies submitted their own reports, but the reports immediately entered the public record, and it was the government that required the submissions, dictated the forms and made them available.
EDGAR, which was implemented in part by Sunlight Foundation supporter Carl Malamud, revolutionized a massive industry of financial watchers who used the reports to decide what companies to invest in--and which to dump. Firms like Bloomberg and Reuters processed the text files into structured data, and analysts pored over them.
And before long, the SEC was pushing the ball even further, with talk of XBRL, or Extensible Business Reporting Language. After all, financial information was almost entirely numbers-based, lending itself to computer analysis, and was fairly structured, with accountants all using a core of the same carefully defined terms--though Wall Street accounting is too complex to fit in simple columns and rows, necessitating nested structures and the ability to dynamically define new terms.
The X in XBRL was a double-edged sword there. XBRL’s power, advocates said, was derived from its flexibility. It was a unified language that could express financial ideas whether the company was trading goats in Ethiopia or derivatives in Manhattan. To provide that kind of flexibility, it allowed accountants to define their own terms in financial documents, extending from a base of agreed-upon terms, in America called the US-GAAP.
But the US-GAAP itself had thousands of terms, and accountants who were accustomed to filing paper reports never bothered to learn its structure. Lazy filers created their own custom terms even when, buried somewhere in the GAAP, there was already a universal term that meant the same thing. That defeated the purpose of structured reporting, because it made comparing across companies impossible.
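To make the comparison problem concrete, here is a minimal Python sketch. It is not taken from PySEC or from any real filing: the company names, the `compb:` prefix, the custom tag and all of the figures are hypothetical, and real XBRL documents are nested XML rather than flat dictionaries. It only illustrates why a custom term hides a number from anyone comparing filers on the shared vocabulary.

```python
# Hypothetical filings: each maps a "namespace:term" tag to a reported value.
# Tag names and figures are invented for illustration only.
SHARED_PREFIX = "us-gaap:"

filings = {
    "CompanyA": {"us-gaap:Revenues": 1200, "us-gaap:NetIncomeLoss": 150},
    # CompanyB reports revenue under a custom term instead of the shared one.
    "CompanyB": {"compb:TotalSalesAndOtherIncome": 900, "us-gaap:NetIncomeLoss": 80},
}


def comparable_terms(filings):
    """Return the shared-namespace tags that every filer reported."""
    tag_sets = [
        {tag for tag in facts if tag.startswith(SHARED_PREFIX)}
        for facts in filings.values()
    ]
    return sorted(set.intersection(*tag_sets))


def custom_terms(filings):
    """Return, per filer, the tags defined outside the shared namespace."""
    return {
        name: sorted(tag for tag in facts if not tag.startswith(SHARED_PREFIX))
        for name, facts in filings.items()
    }


print(comparable_terms(filings))
print(custom_terms(filings))
```

In this toy data only net income can be compared across both companies. CompanyB's revenue is present in its filing, but under a tag that no other filer uses, so an automated comparison never sees it.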
Sunlight at PyCon 2013
A few Sunlighters took off for the west coast last week to attend PyCon, the largest annual gathering for the...
Sunlight from the Command Line
Are you as big a fan of Paul Tagliamonte as I am? If so, then you are well aware of python-sunlight, his awesome, comprehensive Python API client for Sunlight's APIs. The latest release includes a command line interface, or CLI, so you can interact with the Sunlight APIs directly from the shell. Cool, right?