OpenGov Voices: Announcing CitizenAudit, a free tool for fully-OCRd nonprofit financials

Disclaimer: The opinions expressed by the guest blogger and those providing comments are theirs alone and do not reflect the opinions of the Sunlight Foundation or any employee thereof. Sunlight Foundation is not responsible for the accuracy of any of the information within the guest blog.

Luke Rosiak is a former Sunlight Foundation reporter and database analyst who now writes for the Washington Examiner. Luke is also a winner of Sunlight Foundation’s OpenGov Grants for his project, CitizenAudit. You can reach Luke on Twitter at @lukerosiak.

In return for not paying taxes, nonprofits in the U.S. file detailed financial disclosures to the IRS, listing how much of their money goes to certain categories, how much they pay their top people and what groups they give money to.

But even though large nonprofits submit structured electronic data, the IRS takes pains to convert it into paper copies and doesn’t make them available publicly at all, instead directing interested parties to request a copy from the organization itself.

Recently, tech pioneer Carl Malamud’s Public.Resource.Org began successfully filing Freedom of Information Act requests for all disclosures (990s, as they are called) and paying the IRS on a monthly basis for reams of DVDs with TIFF images. Some are scanned paper filings; for others, the IRS went out of its way to turn structured data into a mere image. None has an embedded text layer.

The information is invaluable for philanthropists, journalists and competitors--and the universe of nonprofits is enormous, including the major sports leagues, political groups, hospitals, universities and quasi-public institutions.

So I began an enormous OCRing spree, using open-source tools and home-built software, and put the results in Elasticsearch and PostgreSQL on a free site. The effort, half the funding for which came from a Sunlight Foundation OpenGov grant of $5,000, is called CitizenAudit.org.

OpenGov Voices: PySEC, bringing corporate financial data to the masses

Disclaimer: The opinions expressed by the guest blogger and those providing comments are theirs alone and do not reflect the opinions of the Sunlight Foundation or any employee thereof. Sunlight Foundation is not responsible for the accuracy of any of the information within the guest blog.

Luke Rosiak is a former Sunlight Foundation reporter and database analyst who now writes for the Washington Examiner. This post addresses the tooling around Extensible Business Reporting Language and provides recommendations on what needs to be done. You can reach Luke on Twitter at @lukerosiak.

In the early 1990s, long before most federal agencies had embraced the digital era, the Securities and Exchange Commission (SEC) undertook a truly “big data” initiative that showcased some of the best that open data had to offer: Quarterly reports filed with the agency were uploaded in real time, in text rather than PDF format, to a public FTP server called EDGAR. (File Transfer Protocol or FTP is a standard network protocol used to copy a file from one host to another over a network.)

As with the Federal Election Commission, the companies submitted their own reports, but they immediately entered the public record, and it was the government that required the submissions, dictated the forms and made them available.

EDGAR, which was implemented in part by Sunlight Foundation supporter Carl Malamud, revolutionized a massive industry of financial watchers who used the reports to decide what companies to invest in--and which to dump. Firms like Bloomberg and Reuters processed the text files into structured data, and analysts pored over them.

And before long, the SEC was pushing the ball even further, with talk of XBRL or Extensible Business Reporting Language. After all, financial information was almost entirely numbers-based, lending itself to computer analysis, and was fairly structured, with accountants all using a core of the same carefully-defined terms--though Wall Street accounting is too complex to fit in simple columns and rows, necessitating nested structures and the ability to dynamically define new terms.

The X in XBRL was a double-edged sword there. XBRL’s power, advocates said, was derived from its flexibility. It was a unified language that could express financial ideas whether the company was trading goats in Ethiopia or derivatives in Manhattan. To provide that kind of flexibility, it allowed accountants to define their own terms in financial documents, extending from a base of agreed-upon terms, known in America as US-GAAP.

But the US-GAAP itself had thousands of terms, and accountants who were accustomed to filing paper reports never bothered to learn its structure. Lazy filers created their own custom terms when, buried somewhere in the GAAP, there was already a universal term that meant the same thing. That defeated the purpose of structured reporting, because it made comparing across companies impossible.
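
To make the custom-term problem concrete, here is a minimal Python sketch (an illustration only, not code from PySEC) that separates facts tagged with standard terms from facts tagged with filer-defined ones. The sample filing, the `acme` namespace and the `TotalSalesRevenue` term are all hypothetical, and the document is heavily simplified: real XBRL instances also carry contexts, units and schema references. The check assumes that standard terms can be recognized by a `fasb.org/us-gaap` fragment in their namespace URI.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, heavily simplified filing. The "acme" namespace and its
# TotalSalesRevenue term are invented for illustration.
SAMPLE = """<xbrl xmlns:us-gaap="http://fasb.org/us-gaap/2013-01-31"
      xmlns:acme="http://example.com/acme/2013">
  <us-gaap:Revenues>1000</us-gaap:Revenues>
  <us-gaap:NetIncomeLoss>150</us-gaap:NetIncomeLoss>
  <acme:TotalSalesRevenue>1000</acme:TotalSalesRevenue>
</xbrl>"""


def split_tag(tag):
    """Split ElementTree's '{namespace}local' tag into (namespace, local)."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return "", tag


def count_terms(xml_text, standard_marker="fasb.org/us-gaap"):
    """Count facts using standard terms versus filer-defined terms.

    Assumption: a term is 'standard' when its namespace URI contains
    standard_marker; everything else is treated as a custom extension.
    """
    counts = Counter()
    custom_terms = []
    root = ET.fromstring(xml_text)
    for element in root:
        namespace, local = split_tag(element.tag)
        if standard_marker in namespace:
            counts["standard"] += 1
        else:
            counts["custom"] += 1
            custom_terms.append(local)
    return counts, custom_terms


counts, custom = count_terms(SAMPLE)
print(dict(counts), custom)
```

In this toy filing the custom `TotalSalesRevenue` fact duplicates the value already reported under the standard `Revenues` term, which is the kind of redundancy that makes cross-company comparison hard: a program looking only for `Revenues` would miss any filer that reported the figure solely under its own name.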
