To take a break from the routine and our official projects, Sunlight Labs organized an internal "Labs Olympics", in which teams would compete for outrageous prizes by building an extracurricular project. This installment brings you the contribution from "Team Intern".
As Team Intern, we felt we had something to prove. Could four unseasoned new recruits withstand the blazing glory of the veteran Sunlighters? On the team were Charlie DeTar (from MIT, working at Sunlight Labs on Transparency Data), Dan Schneiderman (from RIT, working on the Fifty State Project), Michael Stephens (from RPI, also with the Fifty State Project) and Ryan Wold (consultant, working on the National Data Catalog).
We started off on Monday morning with a couple of vague ideas of what we might work on (some sort of direct message/Twitter bot for RSS feeds? Something to do with mapping?). We kicked things off with a couple of hours of brainstorming: putting ideas on Post-it notes, sorting them into categories, and pruning. We eventually settled on a "Legalese Translator" service: a wiki that lets people annotate legalese documents, such as Terms of Service and Privacy Policies, with more human-readable summaries and eye-catching icons indicating major problem areas (such as the company asserting that it can change the TOS at any time).
We started poking around the MediaWiki codebase to see what it would take to write a few extensions to suit our needs. After spending a couple of hours on this, we started to second-guess ourselves: would we be able to pull off something worthy of a demo? The challenges included coming up with a taxonomy of legal problems (none of us are lawyers), producing enough seed data to make the wiki work, and the realization that the vast majority of the work in a project like this would involve community management, expectation setting, and organization, none of which were particular strengths of ours.
So, at 1pm on Monday, with a quarter of the allotted time already consumed, we shifted gears. Gathered around a whiteboard, we almost instantly converged on another topic: mapping the complex references in bodies of law. Legal code tends to refer to itself, often in noodly, snaky paths that are hard to traverse, and most of the laws were written before such a thing as "hypertext" existed. This kept us in our general topic area of "legalese", but gave us a much more finite and concrete objective: visualizing and navigating references in laws.
We started exploring a few different bodies of law to choose one for the project, and settled on the US Code, a gargantuan body comprising more than 50 titles broken into more than 60,000 sections with a decidedly complex subsection hierarchy. To get started, we made use of Cornell University's XML translation of the code. For the rest of the day, we worked on importing the code into a relational database from which we could generate the reference hierarchies necessary for our navigation and visualization tools.
And a name... we needed a name. Since we were dealing with the law in a shredded and stringy form, we decided to call it "Coleslaw", or if you prefer, "Cole§law".
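The import step described above, loading sections into a relational database and deriving references from it, can be sketched roughly as follows. This is only an illustration under stated assumptions: the post does not say which database or language was used, so Python with SQLite is a stand-in here, and the table names, the `build_db` and `most_cited` helpers, and the sample rows are all invented for the example rather than taken from the project or from the real US Code.

```python
import sqlite3

# Hypothetical schema (not the project's): one row per section,
# and one row per reference from one section to another.
SCHEMA = """
CREATE TABLE section (
    id     INTEGER PRIMARY KEY,
    title  INTEGER NOT NULL,
    number TEXT    NOT NULL,
    UNIQUE (title, number)
);
CREATE TABLE reference (
    source_id INTEGER NOT NULL REFERENCES section(id),
    target_id INTEGER NOT NULL REFERENCES section(id)
);
"""

# Invented toy data: (title, section number) pairs and the
# references between them. These are not real US Code citations.
SAMPLE_SECTIONS = [(1, "101"), (1, "102"), (2, "201")]
SAMPLE_REFS = [
    ((1, "102"), (1, "101")),  # reference within a title
    ((2, "201"), (1, "101")),  # reference between titles
    ((1, "101"), (2, "201")),
]


def build_db(sections, references):
    """Load sections and (source, target) reference pairs into SQLite."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany(
        "INSERT INTO section (title, number) VALUES (?, ?)", sections
    )
    # Map each (title, number) key to its generated row id.
    ids = {
        (title, number): row_id
        for row_id, title, number in db.execute(
            "SELECT id, title, number FROM section"
        )
    }
    db.executemany(
        "INSERT INTO reference (source_id, target_id) VALUES (?, ?)",
        [(ids[src], ids[dst]) for src, dst in references],
    )
    return db


def most_cited(db, limit=5):
    """Return (title, number, incoming reference count), most cited first."""
    rows = db.execute(
        """
        SELECT s.title, s.number, COUNT(*) AS n
        FROM reference r
        JOIN section s ON s.id = r.target_id
        GROUP BY s.id
        ORDER BY n DESC, s.title, s.number
        LIMIT ?
        """,
        (limit,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    db = build_db(SAMPLE_SECTIONS, SAMPLE_REFS)
    for title, number, count in most_cited(db):
        print(f"Title {title}, section {number}: cited {count} time(s)")
```

Keeping references in their own table means that questions such as "which sections are cited most?" or "what does this section point to?" become simple joins, which is the kind of query a navigation or visualization tool would need.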
The US Code is awfully complex. Among the 50 titles of the US Code, there are 168,000 references, including those within and between sections. Now on to the eye candy.