The House Ethics Committee is responsible for investigating and making recommendations on the enforcement of House ethics rules. In a nod toward transparency, its reports and statements are published online, but they are virtually unusable. The Committee publishes documents in an unsearchable PDF format, spreads them across 24 pages, and gives them impenetrable titles like "Statement of the Chairman and Ranking Minority Member." Search engines (like Google) cannot see the documents, and only the most patient will click on each link to see what's inside.
We've taken all 120+ documents, made them searchable, and published them online in a database. Now every document from December 1998 until July 2011 can be searched -- at once. It's easy to find the 20 documents that refer to Rep. Rangel, or the 15 documents that refer to (former) Rep. DeLay, or anything else that you're looking for. The web tool DocumentCloud has made this all possible.
The search isn't perfect, of course. We had to use optical character recognition (OCR) technology to transform the PDFs into a searchable format, so there are a number of transcription errors. It would be better if the committee posted the documents in a searchable format or, better still, in an open format. The committee should also publish an index that links to all relevant documents for each matter, with a description of what each document contains. Until then, our House Ethics Committee search will be an invaluable resource for anyone monitoring the House ethics process.
We'd be remiss if we didn't give the House Ethics Committee kudos for at least publishing these documents online. One look at the Senate Ethics Committee website makes clear that things could be much worse.