Earlier this week the annual Law Via the Internet conference was hosted by the Legal Information Institute at Cornell University. The conference schedule featured talks on a range of policy and technical subjects, including the topic of extracting legal citations from text and understanding them programmatically, which arises whenever people need to determine the relevance of legal documents based on the authorities they cite. Recognizing citations in text is also a vexing but fun programming challenge, so I was excited to see this issue figure prominently in at least four separate talks.
A lesson in Humility
On Monday the House of Representatives delivered, as promised, an electronic dump of House Expense Reports. We at Sunlight Labs had a plan. We knew it was going to be a huge PDF, but we had all the infrastructure in place. We had plenty of bandwidth, and we knew when the data was coming out, roughly how it was going to look, and that we likely wouldn't be able to parse all of it with computers. "We'll use TransparencyCorps," we thought, to get that last mile out of the data, so that eventually we'd end up with a parseable database.