Last week, President Obama kicked off the fiscal year 2016 budget cycle by unveiling his $3.99 trillion budget proposal. Congress has the next eight months to write the final version, leaving plenty of time for individual senators and representatives, state and local governments, corporate lobbyists, bureaucrats, citizens groups, think tanks and other political groups to prod and cajole for changes. The final bill will differ from Obama’s draft in major and minor ways, and it won’t always be clear how those changes came about. Congress will reveal many of its budget decisions after voting on the budget, if at all.
We spent this past summer with the Data Science for Social Good program trying to bring transparency to this process. We focused on earmarks – budget allocations to specific people, places or projects – because they are “the best known, most notorious, and most misunderstood aspect of the congressional budgetary process” — yet remain tedious and time-consuming to find. Our goal: to train computers to extract all the earmarks from the hundreds of pages of mind-numbing legalese and numbers found in each budget.
Watchdog groups such as Citizens Against Government Waste and Taxpayers for Common Sense have used armies of human readers to sift through budget documents, looking for earmarks. The White House Office of Management and Budget enlisted help from every federal department and agency, and the process still took three months. In comparison, our software is free and transparent and generates similar results in only 15 minutes. We used the software to construct the first publicly available database of earmarks that covers every year back to 1995.
Despite our success, we barely scratched the surface of the budget. Not only do earmarks comprise a small portion of federal spending but senators and representatives who want to hide the money they budget for friends and allies have several ways to do it:
- They can insert earmarks in the text of an appropriations bill. Finding those earmarks may require you to wade through hundreds of pages of text.
- They can place earmarks in the explanatory report attached to the bill. Few Americans even know these reports exist, let alone that Congress allocates money through them.
- They can call or write the bureaucrats who make budget decisions in an attempt to steer money. Good luck getting a copy: If call logs or letters exist, they are heavily redacted for the public. (The Senate Judiciary Committee unanimously approved a FOIA reform bill that might help. Read this post for more info.)
To simplify the problem, we decided to focus on tables, where 85 percent of observable earmarks reside. We ignored free text in the bills and reports as well as phone calls and letters. We cannot estimate how many earmarks we’re missing because the call logs and letters are not public, but Minnesota Democrat Sen. Al Franken’s office says letter writing is “fairly routine,” and it’s likely becoming more common given Congress’ stated ban on earmarks since 2010. A 15 percent miss rate might be low.
One challenge we faced was the variety of table formats used. The Government Printing Office provides congressional bills and reports as plain text files where tables appear as blocks of formatted text. Indentation, whitespace and occasionally dots and dashes are used to format tables. Our program finds the tables by searching for these text patterns, but Congress could easily hide a table from our script by, say, using semi-colons instead. Even small changes can reduce government transparency.
Another challenge is the lack of information these documents provide about each earmark. (We have provided an example below.) We get a recipient, an intended purpose and a dollar amount, but neither a sponsor nor a justification. Finding those requires more digging.
Advocates will also be disappointed to know that this tool cannot help them rapidly respond to most earmark requests. Bills often go up for a vote before anyone has a chance to thoroughly review them. Congressional leadership recently gave members less than a week to review a 1,600-page appropriations bill and less than 48 hours to review a 959-page farms bill before voting. Members of the public who wanted to contact their legislators about the content of those bills had even less time for review. Although our software can parse these documents in less than 15 minutes, nearly 95% of observable earmarks are found in the congressional reports and memos attached to spending bills — and those reports and memos often aren’t released until after the vote.
The problems we encountered will ring familiar to professionals in the transparency sector. In one of our favorite examples, the IRS converts machine-readable charity tax filings to PDFs to make them more difficult to read. Even when government agencies try to be transparent, they often disappoint. For example, it recently came to light that USASpending.gov did not “include $619 billion worth of grants and awards.”
Governments could solve many of these problems by adopting machine-readable formats. That’s what President Obama’s 2013 executive order and the Library of Congress’ legislative-data challenge (among others) were intended to do. Unfortunately, implementation has been slow and spotty, leaving concerned citizens looking for alternative ways to learn more about their government. We continue to develop computer programs that reduce the barriers to public information, but we recognize their limitations and hope that governments will adopt more enlightened data-sharing policies.
Interested in writing a guest blog for Sunlight? Email us at email@example.com