Data for Better Bill Searching


I’ve put up a dataset on GitHub that maps popular search terms to bills in Congress. It’s a simple, 5-column CSV designed to help people build better search engines that turn user input into bill results. The idea is that this will be useful to, and get contributions from, the community of people out there working with legislation and building tools around it.

It’s humble – I started it out with a mere 7 rows, assigning the keywords “Obamacare”, “SOPA”, “PIPA”, and “PPACA” to the appropriate bills. There are certainly more good candidates than that, so please contribute via pull request, or if you don’t know how to do that, open an issue and talk about it with words.

This is intended to fill gaps that automated systems won’t easily fill. The word “Obamacare” appears in no official metadata about any bills at all, or in the text of any of them. It is a political slogan, perhaps inappropriate for legal documents, but like it or not it is the handle by which many people think of the legislation, and search applications should be responsive to it.

Similarly, for a long time, the acronym “SOPA” did not appear in any official metadata about the Stop Online Piracy Act, even at the height of the Internet protests against it. At some point after that, the Library of Congress formally added “SOPA” as a “popular title” for the bill, but by that time it was a dead bill. Same for PIPA – and since SOPA and PIPA were part of the same Congressional and public campaign, it also makes sense to ensure that “PIPA” comes up when you search for “SOPA”, and vice versa.
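
To make the idea concrete, here is a minimal Python sketch of that kind of lookup. The column names (`term`, `bill_type`, `bill_number`, `congress`) and the `hr3261-112` style of bill ID are assumptions for illustration, not the dataset's actual header, and the rows are sample data typed in by hand rather than a copy of the real file.

```python
import csv
import io
from collections import defaultdict

# Illustrative rows only. The column names are assumed for this sketch
# and may not match the real dataset's header.
SAMPLE_CSV = """term,bill_type,bill_number,congress
Obamacare,hr,3590,111
PPACA,hr,3590,111
SOPA,hr,3261,112
SOPA,s,968,112
PIPA,s,968,112
PIPA,hr,3261,112
"""


def build_term_index(csv_text):
    """Map each lowercased search term to a list of bill IDs."""
    index = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        bill_id = f"{row['bill_type']}{row['bill_number']}-{row['congress']}"
        index[row["term"].strip().lower()].append(bill_id)
    return dict(index)


def bills_for(query, index):
    """Return the bill IDs mapped to a query, ignoring case and padding."""
    return index.get(query.strip().lower(), [])


TERM_INDEX = build_term_index(SAMPLE_CSV)
```

Because both acronyms are listed against both bills, `bills_for("SOPA", TERM_INDEX)` and `bills_for("pipa", TERM_INDEX)` return the same pair of bills, which is the cross-referencing described above. A real search engine would merge these hits with its ordinary full-text results rather than replace them.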

Popular apps built around bills can add, and have added, manual search terms for major legislation; it’s part of running a good service. But that collective wisdom ends up residing around the Internet in a bunch of private database tables. Plus, keeping anything up to date by hand is a major tax, so we should distribute that load as widely as possible, and maximize the benefit of each contribution.

So let’s centralize it somewhere, and automate off of it. As an example, here’s a script I wrote last night that downloads the data from GitHub and loads it into Sunlight’s bill search index automatically.
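
As a rough illustration of the pattern (this is not the script mentioned above), a sync job might look something like the Python sketch below. Everything specific in it is an assumption: `DATASET_URL` is a placeholder rather than the dataset's real address, the column names are invented for the example, and a plain dictionary stands in for a real search index.

```python
import csv
import io
from urllib.request import urlopen

# Placeholder address, NOT the dataset's real URL.
DATASET_URL = "https://example.com/popular-terms.csv"


def fetch_csv(url=DATASET_URL):
    """Download the CSV and return it as text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def keyword_updates(csv_text):
    """Group terms by bill ID. The column names are assumed for this sketch."""
    updates = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        bill_id = f"{row['bill_type']}{row['bill_number']}-{row['congress']}"
        updates.setdefault(bill_id, set()).add(row["term"])
    return updates


def sync(index, csv_text):
    """Merge terms into each bill's keywords in a dict-based stand-in index."""
    for bill_id, terms in keyword_updates(csv_text).items():
        doc = index.setdefault(bill_id, {})
        doc["keywords"] = sorted(set(doc.get("keywords", [])) | terms)
    return index
```

Calling `sync(index, fetch_csv())` on a schedule would keep the index current as contributions land. Merging into the existing keywords, rather than overwriting them, means terms that came from other sources survive each run.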

This dataset has already proven helpful for me; I hope others find it so too.