OpenGov Voices: PDF Liberation Hackathon – At Sunlight in DC, SF and Around the World – January 17-19, 2014

by Marc Joffe

Disclaimer: The opinions expressed by the guest blogger and those providing comments are theirs alone and do not reflect the opinions of the Sunlight Foundation or any employee thereof. Sunlight Foundation is not responsible for the accuracy of any of the information within the guest blog.

Marc Joffe is the founder of Public Sector Credit Solutions (PSCS), which applies open data and analytics to rating government bonds. Before starting PSCS, Marc was a Senior Director at Moody’s Analytics. You can contact him at marc@publicsectorcredit.org. Marc is also one of the winners of Sunlight Foundation’s OpenGov Grants.

Extracting useful information from PDFs is a problem as old as … PDFs. Too often, we focus on extracting information from a specific set of documents instead of looking at the bigger picture. If you’ve ever struggled with this problem, join us for Sunlight’s PDF Liberation Hackathon, dedicated to improving open source tools for PDF extraction.

Instead of focusing on one set of documents, coders will come together to add features, extensions and plugins to existing PDF extraction frameworks, making them more flexible, useful and sustainable. Sunlight’s PDF Liberation Hackathon will tackle real-world PDF data extraction problems. In doing so, we will build upon existing open-source PDF extraction solutions such as Tabula and Ashima’s PDF Table Extractor. (A full list of PDF extraction technologies relevant to the hackathon can be found on our resource page here.) In addition, hackers will have the option of using licensed PDF software libraries as long as the implementation cost of these libraries is less than $1,000. If you have an idea for a library you want to use, please mention it in your signup form and we will try to work out the licensing ahead of time so that things run smoothly.

Register now to attend the PDF Liberation Hackathon!

The hackathon will kick off on the evening of Friday, January 17 with a brief social. Coding will run all day Saturday and on Sunday morning, with lunch provided on Saturday. Judging will follow. The main location will be Sunlight’s Washington, D.C. headquarters at 1818 N Street, NW, but we expect to add hacking locations in other cities. Our hacking location in California is sponsored by Rally.org, which is offering its co-working space on the ground floor of 144 2nd Street (between Mission and Howard), San Francisco, CA 94105.

Teams can participate in person or remotely – from anywhere in the world. Solutions will be judged on:

  • Creativity
  • Implementation cost
  • Flexibility
  • User friendliness

Winning entries will be awarded prizes and, if they’re 100% open source, will be featured on Sunlight’s API Community portal page.

Right now, we are looking for developers who would like to participate and cosponsors to contribute problems, prize money and hacking spaces.

Hackers: Please sign up here.

Potential Sponsors: Please contact pdfhackathon@sunlightfoundation.com

Interested in writing a guest blog for Sunlight? Email us at guestblog@sunlightfoundation.com.