The Lobbying Disclosure Act of 1995 mandates that lobbyists who meet specific requirements register with the Clerk of the House of Representatives and the Secretary of the Senate. Being the great body that they are, the House provides a searchable database and a bulk download of the registration forms. Sure, a searchable database is nice, but we can have the most fun with access to the entire data set. The disclosure forms are provided in XML format, divided by year and reporting period (quarterly, semi-annually, annually), and archived.
In order to download the disclosure archives, an HTML form must be submitted for each file. This can be a huge pain: the files are large, and the process takes non-trivial human effort whenever files are released or updated. We’ve written a Python script that simulates the form submissions and automatically downloads all of the archives. In addition to the script, we’ve uploaded a recent download of the archives to Amazon S3 for easier distribution.
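To give a feel for what simulating a form submission involves, here is a minimal sketch using only the Python standard library. It is not our actual script: the URL, the `archive` field name, and the helper names `build_request` and `download` are placeholders made up for illustration, and the real values would have to be read from the HTML of the House download page.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical placeholders: the real form action URL and field name
# must be taken from the HTML source of the download page.
FORM_URL = "https://example.org/lobbying/download"
FILE_FIELD = "archive"


def build_request(archive_name, form_url=FORM_URL):
    """Build the POST request a browser would send when the form is submitted."""
    # Encode the form fields the same way a browser does for a plain HTML form.
    body = urlencode({FILE_FIELD: archive_name}).encode("ascii")
    return Request(form_url, data=body, method="POST")


def download(archive_name, dest_path, chunk_size=1 << 16):
    """Submit the form and stream the response to disk in chunks."""
    request = build_request(archive_name)
    with urlopen(request) as response, open(dest_path, "wb") as out:
        # The archives are large, so avoid reading the whole body into memory.
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
```

Calling `download` once per archive name in a loop replaces the manual form submissions. Separating `build_request` from `download` makes it possible to inspect the request before anything is sent over the network.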