Disclaimer: The opinions expressed by the guest blogger and those providing comments are theirs alone and do not reflect the opinions of the Sunlight Foundation or any employee thereof. Sunlight Foundation is not responsible for the accuracy of any of the information within the guest blog.
Marc Joffe is the founder of Public Sector Credit Solutions (PSCS), which applies open data and analytics to rating government bonds. Before starting PSCS, Marc was a Senior Director at Moody’s Analytics. You can contact him at marc@publicsectorcredit.org. Marc is also one of the winners of Sunlight Foundation’s OpenGov Grants.
Extracting useful information from PDFs is a problem as old as … PDFs. Too often, we focus on extracting information from a specific set of documents instead of looking at the bigger picture. If you’ve ever struggled with this problem, join us for Sunlight’s PDF Liberation Hackathon, dedicated to improving open source tools for PDF extraction.
Instead of focusing on one set of documents, coders will come together to add features, extensions and plugins to existing PDF extraction frameworks, making them more flexible, useful and sustainable. Sunlight’s PDF Liberation Hackathon will tackle real-world PDF data extraction problems, building on existing open-source solutions such as Tabula and Ashima’s PDF Table Extractor. (A full list of PDF extraction technologies relevant to the hackathon can be found on our resource page.) Hackers will also have the option of using licensed PDF software libraries, as long as the implementation cost of these libraries is less than $1,000. If there is a library you want to use, please mention it in your signup form and we will try to work out the licensing ahead of time so that things run smoothly.
Register now to attend the PDF Liberation Hackathon!