Opening up Indiana’s hard-to-reach legislative data

Indiana State Capitol at the end of Market Street, Indianapolis. (Photo credit: Daniel Schwen/Wikimedia Commons)

Thanks to SB 101, Indiana’s recently enacted Religious Freedom Restoration Act, bills coming out of the Hoosier State have been newsworthy of late. The Open States team has been thinking a lot about Indiana bills as well.

Open States uses bill text to make the entire corpus of state legislation searchable, and to give the public a single place to download bill texts. Indiana, like many states, provides bills as PDF files. Typically, Open States links to the state’s website so the public can get the bill text from the original source. For example, the [Open States page for Arkansas’ similar RFRA bill, HB 1228](http://openstates.org/ar/bills/2015/HB1228/#billtext), links directly to [the text on Arkansas’ legislative FTP](ftp://www.arkleg.state.ar.us/Bills/2015/Public/HB1228.pdf).

Like Arkansas, Indiana provides PDFs of legislation. Open States gets most of its Indiana information through the state’s legislative API, [MyIGA](http://docs.api.iga.in.gov/). Indiana requires an API key, and that requirement extends to the PDF links served through the API, so passing these gated links along through Open States would be useless to anyone without a key of their own.

screenshot reading "Please authenticate using a valid consumer token. For more information, see: http://docs.api.iga.in.gov"
You need an API key to view Indiana’s PDFs

The PDF versions of bills are also available on Indiana’s legislative website via a download link. So after pulling the bill information, we could navigate to the bill’s page and scrape an ungated link from there. Unfortunately, it wasn’t that easy. The link appears to be pieced together on the back end, and building it requires finding a document-specific hash value.

screenshot of url reading "http://iga.in.gov/static-documents/9/2/b/a/92bab197/SB0101.05.ENRS.pdf"
Document URL including an apparently hash-generated ID
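Judging only from the single example URL above, the four one-character directories look like the first four characters of the document ID. A minimal Python sketch of that layout follows; the function name is hypothetical and the pattern is an assumption drawn from this one URL, not documented behavior of the Indiana site.

```python
def static_document_url(doc_id: str, filename: str) -> str:
    """Build a static-documents URL from a document ID and a PDF filename.

    Assumption: the four directory levels are the first four characters
    of the ID, as in the one example URL shown above.
    """
    shards = "/".join(doc_id[:4])  # "92bab197" -> "9/2/b/a"
    return f"http://iga.in.gov/static-documents/{shards}/{doc_id}/{filename}"


print(static_document_url("92bab197", "SB0101.05.ENRS.pdf"))
```

Even if the layout holds, the URL is only buildable once the ID is known, which is the hard part.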

We were able to write code to find this ID using some creative header passing, and it worked when we ran it on one bill at a time. But we had to hit the site multiple times for every bill, and the site is relatively slow and times out frequently. The result was a scraper that crashed or hung almost every time it ran.

Just for fun, we tried running variations of the document’s title through several common hashing algorithms in an attempt to reverse-engineer the document ID, but none panned out. We also contacted the state legislative service to see if they’d tell us how they were constructing the IDs, but they didn’t return our calls.
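That kind of guessing can be sketched in a few lines of Python. This is an illustration rather than the code we actually ran: the list of title variants, the choice of algorithms, and the idea that the ID would be the first eight hex characters of a digest are all assumptions.

```python
import hashlib
import zlib


def candidate_ids(text: str) -> dict:
    """Return 8-character hex IDs for `text` under a few common algorithms.

    Assumption: a longer digest would be truncated to its first 8 hex characters.
    """
    data = text.encode("utf-8")
    ids = {
        name: hashlib.new(name, data).hexdigest()[:8]
        for name in ("md5", "sha1", "sha256")
    }
    # CRC32 is already 32 bits, so it maps to exactly 8 hex characters.
    ids["crc32"] = format(zlib.crc32(data) & 0xFFFFFFFF, "08x")
    return ids


def find_matches(target: str, variants: list) -> list:
    """List every (variant, algorithm) pair whose candidate ID equals `target`."""
    return [
        (variant, algo)
        for variant in variants
        for algo, digest in candidate_ids(variant).items()
        if digest == target
    ]


# Hypothetical variations on the document title from the example URL.
variants = ["SB0101.05.ENRS", "SB0101.05.ENRS.pdf", "sb0101.05.enrs", "SB 101"]
print(find_matches("92bab197", variants))
```

An empty list rules out only the combinations tried; a salt, a database key, or a different input string would defeat the search just as well.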

Finally, we turned back to the API. The terms of service allow us to use our API key to create an app, so enter http://in.proxy.openstates.org/. Using a URL that’s easy to construct based on available data, the proxy retrieves the desired document for download.

Screenshot of url reading "http://in.proxy.openstates.org/2015/bills/sb0101/versions/sb0101.05.enrs"
Proxy URL is easy to construct
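To show how little is needed, here is a minimal Python sketch that builds a proxy URL matching the example above. The function and parameter names are hypothetical, and the pattern (lowercased bill ID and version name under the session year) is inferred from that one URL rather than taken from the proxy’s source.

```python
def proxy_url(session: str, bill_id: str, version_name: str) -> str:
    """Build an Open States Indiana proxy URL for one version of a bill.

    Assumption: the path is /<session>/bills/<bill>/versions/<version>,
    all lowercase, as in the example URL shown above.
    """
    bill = bill_id.lower()
    version = version_name.lower()
    return f"http://in.proxy.openstates.org/{session}/bills/{bill}/versions/{version}"


print(proxy_url("2015", "SB0101", "SB0101.05.ENRS"))
```

Every input is already present in the bill data pulled from the API, so no extra requests to the state’s site are needed to produce the link.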

Now the bill text is consistently available to Open States’ search function, and to our users! Of course, the code for our proxy app is [available on GitHub](https://github.com/sunlightlabs/indiana-docs).