For the past five weeks, Congress has been on recess, but here at the Sunlight Foundation we have been working hard to improve our Congress API! There are now two new Congress API endpoints that make congressional and accountability documents searchable. We also added more information about hearings and improved legislator information.
New House scrapers feeding the API improvements are open source and a part of the @unitedstates project, “a shared commons of data and tools for the United States. Made by the public, used by the public.” This work wouldn’t be possible without funding from the Knight Foundation.
Read the documentation for full details, but here are the highlights:
Now on the Congress API you can get far more information about House hearings. We built new scrapers to take advantage of the docs.house.gov committee website. We scraped all the data from the 113th Congress that was available on the site.
One of the more interesting pieces of data is witness information. Our API now has the names of witnesses, their organization and links to their testimony. We used this information to reveal that only 23 percent of the witnesses testifying before House panels in the current session of Congress are women.
Aside from witness documents, all other documents published for a hearing are also available. These include drafts of bills, committee reports, committee amendments, transcripts and other documents related to a given meeting. There are links to the original document as well as a digital copy we are keeping to protect these documents from link rot.
Committee meetings of both the House and Senate now have a unique identifier, called “hearing_id.” The House hearings also have a “house_event_id,” which is a unique identifier from the docs.house.gov site that will stay consistent even if there is a schedule change or a change in content or description.
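To illustrate why a stable identifier is useful, here is a minimal sketch that groups hearing records collected at different times by “house_event_id” and reports the events whose scheduled time changed. Only the field names “hearing_id” and “house_event_id” come from the API as described above. The “occurs_at” field, the record layout and the sample values are assumptions made for this example, so check the documentation for the real response shape.

```python
from collections import defaultdict


def find_rescheduled(records):
    """Map each house_event_id to its distinct times, keeping only events that moved.

    `records` is a list of dicts, assumed to be hearing results saved from
    several API calls over time. The "occurs_at" key is hypothetical.
    """
    times_by_event = defaultdict(list)
    for record in records:
        event_id = record.get("house_event_id")
        if event_id is None:
            # Records without a house_event_id (for example, Senate hearings) are skipped.
            continue
        occurs_at = record.get("occurs_at")
        if occurs_at not in times_by_event[event_id]:
            times_by_event[event_id].append(occurs_at)
    # An event with more than one distinct time was rescheduled at least once.
    return {eid: times for eid, times in times_by_event.items() if len(times) > 1}


# Made-up sample data: the same House event seen twice with different times.
sample = [
    {"hearing_id": "a1", "house_event_id": "100001", "occurs_at": "2014-09-09T10:00"},
    {"hearing_id": "a2", "house_event_id": "100001", "occurs_at": "2014-09-10T14:00"},
    {"hearing_id": "b1", "house_event_id": "100002", "occurs_at": "2014-09-11T09:30"},
    {"hearing_id": "c1", "occurs_at": "2014-09-12T10:00"},
]
rescheduled = find_rescheduled(sample)
```

Keying on “house_event_id” rather than on the time or description means a listing can be followed across edits without guessing which records belong together.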
We also improved “bill_id” validation so that the field yields better results and invalid bill numbers are discarded.
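If you want to apply a similar filter on your own side before matching bills to hearings, something like the following could work. It is a sketch and not the validation the API itself performs. It assumes a “bill_id” is a bill type, a number, a hyphen and a Congress number (for example “hr1234-113”), and the list of bill types in the pattern is likewise an assumption.

```python
import re

# Assumed format: <bill type><number>-<congress>, e.g. "hr1234-113".
# The set of bill types listed here is an assumption for illustration.
BILL_ID_PATTERN = re.compile(
    r"(hr|hres|hjres|hconres|s|sres|sjres|sconres)([1-9]\d*)-(\d+)"
)


def valid_bill_ids(candidates):
    """Return the candidates that match the assumed format, lower-cased and trimmed."""
    kept = []
    for candidate in candidates:
        normalised = candidate.strip().lower()
        # fullmatch rejects strings with anything before or after the pattern.
        if BILL_ID_PATTERN.fullmatch(normalised):
            kept.append(normalised)
    return kept


cleaned = valid_bill_ids(["HR1234-113", "s12-113", "hr0-113", "xyz", "hjres5-113 "])
```

Here “hr0-113” is dropped because the number may not start with zero, and “xyz” is dropped because it has no recognizable bill type.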
Many of these improvements have been made possible by the docs.house.gov website and its thorough XML offerings. Similar Senate information is scattered across different committee websites in varying formats. This makes it hard to collect detailed information comprehensively. (Sigh. We are talking about a legislative body that still files campaign disclosures on paper.)
Sunlight’s Congress API has been keeping track of official, government-sponsored Twitter accounts. Now, we are also harnessing our project Politwoops, a site that tracks deleted tweets from politicians, to provide a list of Twitter campaign accounts for members of Congress. This has been one of the most asked-for features from our users.
Have you ever needed to look up a list of lawmakers’ ids before? If your data is reasonably clean but is formatted as a full name, you would have had to parse out the first and last names and hope that you did not have a common nickname. Now with the aliases field, the API gives several permutations of first name, nickname, title, suffix and last name so you can take advantage of the Congress API’s wealth of ids by querying full names.
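The API lets you query on full names directly, but if you already have legislator records on hand, the same idea can be sketched locally: build a lookup table from every alias to an id, then match your full-name strings against it. In this example the “aliases” field is the one described above. The “bioguide_id” key, the record layout and the fictional legislator are assumptions for illustration.

```python
def normalise(name):
    """Lower-case a name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def build_alias_index(legislators):
    """Map each normalised alias to the set of ids that use it."""
    index = {}
    for legislator in legislators:
        for alias in legislator.get("aliases") or []:
            index.setdefault(normalise(alias), set()).add(legislator["bioguide_id"])
    return index


def lookup(index, full_name):
    """Return a sorted list of ids whose aliases match the given full name."""
    return sorted(index.get(normalise(full_name), set()))


# Fictional record; real responses may be shaped differently.
legislators = [
    {
        "bioguide_id": "X000001",
        "aliases": ["Jane Q. Doe", "Jane Doe", "Rep. Jane Doe", "Janie Doe"],
    },
]
index = build_alias_index(legislators)
matches = lookup(index, "  rep. jane   doe ")
```

Returning a list rather than a single id leaves room for two members who share an alias, which your own code can then resolve by state or chamber.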
Congressional documents are now available for full text search. These documents include committee reports, committee amendments, witness statements, witness bios, transcripts and more. Currently, the collection only contains House committee documents. We hope to add Senate documents and other types of congressional documents in the future.
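As a rough idea of what a full-text query might look like, the sketch below only builds a request URL and does not send it. The base URL, the “congressional_documents/search” path and the “query” and “apikey” parameter names are assumptions for this example, so confirm them against the documentation before relying on them.

```python
from urllib.parse import urlencode

# Assumed base URL; check the documentation for the current one.
BASE_URL = "https://congress.api.sunlightfoundation.com"


def search_url(endpoint, query, api_key, **filters):
    """Build a search URL for the given endpoint path.

    `filters` holds any extra query-string parameters; their names are
    passed through unchanged, so they must match what the API expects.
    """
    params = {"query": query, "apikey": api_key}
    params.update(filters)
    # Sorting keeps the URL stable from run to run, which helps with caching and logs.
    query_string = urlencode(sorted(params.items()))
    return f"{BASE_URL}/{endpoint.strip('/')}?{query_string}"


# The endpoint path below is an assumption made for this sketch.
url = search_url("congressional_documents/search", "veterans health", "YOUR_API_KEY")
```

The resulting URL can be passed to whichever HTTP client you prefer.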
This new endpoint offers full-text search on non-congressional documents, including Government Accountability Office reports and Inspectors General reports. These government oversight documents investigate misconduct and waste as well as evaluate projects for agencies or programs.
Volunteer coders, led by former Sunlighter Eric Mill, captured the Inspector General reports as part of the @unitedstates/inspectors-general project. The effort created scrapers for all 65 US federal inspectors general.
We hope that the improved API will power new ideas and innovations from the amazing community that uses Sunlight tools!