Tools for Transparency: Research and Report with DocumentCloud


Journalists often face a problem that we at Sunlight run into as well: huge piles of government documents trapped in unsearchable, non-machine-readable PDFs. So in 2009, with a grant from the Knight News Challenge, DocumentCloud was born. The service helps investigative reporters deal with those troublesome files, and it has become a vital tool in newsrooms all over the country.

While only approved journalists may upload to DocumentCloud, anyone can browse and search its large archive of public documents.

What else does it do?

  • DocumentCloud runs every document through OpenCalais, a metadata service, to add more context to your uploads.
  • It can take the dates within a document and plot them on a timeline.
  • It can help you find documents related to your story.
  • You can annotate and highlight important sections of your document. (Bonus: Every note you add will have its own unique URL.)
  • You can share and embed public documents.
  • And, you can research source documents such as court filings, hearing transcripts, testimony, legislation, reports, memos, meeting minutes and correspondence.
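
To make the timeline idea above concrete, here is a minimal sketch in Python of pulling dates out of a document's text and sorting them chronologically. It is an illustration only, not DocumentCloud's actual implementation: the function name `extract_timeline`, the month/day/year pattern and the 30-character snippet size are all assumptions made for this example.

```python
import re
from datetime import date

# Assumption: dates are written numerically as month/day/year, e.g. 12/24/1998.
DATE_PATTERN = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")


def extract_timeline(text):
    """Return (date, snippet) pairs found in text, sorted oldest first."""
    events = []
    for match in DATE_PATTERN.finditer(text):
        month, day, year = (int(group) for group in match.groups())
        try:
            found = date(year, month, day)
        except ValueError:
            continue  # skip impossible dates such as 13/45/2011
        # Keep a little surrounding text so each point on the timeline has context.
        start = max(match.start() - 30, 0)
        end = min(match.end() + 30, len(text))
        events.append((found, text[start:end].strip()))
    return sorted(events, key=lambda event: event[0])


if __name__ == "__main__":
    sample = "Report filed 7/24/2011 following the hearing held on 12/24/1998."
    for when, snippet in extract_timeline(sample):
        print(when.isoformat(), "|", snippet)
```

A real service would need to handle many more date formats (spelled-out months, two-digit years and so on); this sketch covers only the simplest case.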

In recent months, DocumentCloud gained a bit of traction by publishing both President Obama’s birth certificate and former Governor Palin’s emails, and the Chicago Tribune has used the service to great effect, showcasing legal documents from the Rod Blagojevich trial.

Our own Daniel Schuman has created the New House Ethics Committee Report Search Tool:

The following search tool allows you to explore all of the documents and statements published online by the House Ethics Committee between 12/24/1998 and 7/24/2011. The Committee publishes files in an impossible-to-search PDF format; the files also cannot be sorted. We’ve downloaded all the files, transformed the PDFs into text format, and made it so you can search the contents.
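
For readers curious about what "made it so you can search the contents" can look like once the PDFs are already plain text, here is a minimal sketch of a keyword index in Python. It is not the code behind the search tool described above: the function names (`tokenize`, `build_index`, `search`) and the sample file names are hypothetical, and the PDF-to-text conversion step is assumed to have happened already.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase words and numbers."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the set of document names that contain it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def search(index, query):
    """Return the sorted names of documents containing every query word."""
    words = tokenize(query)
    if not words:
        return []
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches)


if __name__ == "__main__":
    # Hypothetical extracted text, keyed by a made-up file name.
    documents = {
        "report_a.txt": "Statement of the Committee regarding an ethics violation",
        "report_b.txt": "Committee hearing transcript",
    }
    index = build_index(documents)
    print(search(index, "ethics committee"))
```

The index trades a little memory for speed: each query only looks up its own words instead of rescanning every file.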

Need more convincing? The Washington Post, ProPublica, The LA Times, The Boston Globe, PBS Newshour and many others have used DocumentCloud to get their source documents online.

To read more about the service, check out the archives of Idea Lab or head right over to DocumentCloud.