Marc Chung is one of the organizers who helped make the Great American Hackathon a success, and is a friend of Sunlight. He's asked for a little space on the Labs blog to announce his new Phoenix-area open data group, and we're only too happy to oblige. Read on for the details.
I'm Marc Chung, a computer scientist who is passionate about bringing technologists together to improve our world.
Last year, I organized the Phoenix edition of the Great American Hackathon. That weekend a local gathering of developers decided to contribute time towards building a [parser](http://sunlightlabs.com/blog/2009/hotness-arizona/) for the Arizona State Legislature. The work was done as part of the Fifty State Project, which supports organizations like MapLight and OpenCongress.
After the hackathon, I was contacted by several journalists and developers who were very excited by the work we did and just as eager to offer their assistance on future civic hacking initiatives. In the short time since GAH '09, we've been working with to extract useful information from public data in an effort to shed more light on how state governments work.
Combining the interests of these two groups was inevitable, and so today, along with Mark Ng and Brian Shaler, I'd like to announce PhxData, a group to unite technologists in the Phoenix area who are engaged in data mining, parsing, visualization, etc. It also serves as a platform for journalists and government officials to connect with civic hackers who want to take public data and make it useful.
Check out our website: http://phxdata.org
If you're a data scientist, journalist, government official, statistician, developer or designer who would like to work on exploring data in the interest of pursuing greater government transparency for the state of Arizona, you should join this group.
Cole§law: Visualizing the US Legal Code
To take a break from the routine and our official projects, Sunlight Labs organized an internal "Labs Olympics", in which teams would compete for outrageous prizes by building an extracurricular project. This installment brings you the contribution from "Team Intern".
The Team
As Team Intern, we felt we had something to prove. Could four unseasoned new recruits withstand the blazing glory of the veteran Sunlighters? On the team were Charlie DeTar (from MIT, working at Sunlight Labs on Transparency Data), Dan Schneiderman (from RIT, working on the Fifty State Project), Michael Stephens (from RPI, also with the Fifty State Project) and Ryan Wold (consultant, working on the National Data Catalog).
The Process
We started off on Monday morning with a couple of vague ideas of what we might work on (some sort of direct message/Twitter bot for RSS feeds? Something to do with mapping?). We kicked it off with a brainstorming session for a couple of hours, putting ideas on post-it notes, sorting them into categories, and pruning, and we eventually settled on a "Legalese Translator" service: a wiki which lets people annotate legalese documents – such as Terms of Service and Privacy Policies – with more human-readable summaries, and eye-catching icons indicating major problem areas (such as the company asserting they can change the TOS at any time). We started poking around the MediaWiki codebase to see what it would take to write a few extensions to suit our needs. After spending a couple of hours on this, we started to second-guess ourselves: would we be able to pull off something worthy of a demo? Challenges included coming up with a taxonomy of legal problems (none of us are lawyers), coming up with enough seed data to make the wiki work, and a realization that the vast majority of the work in a project like this would involve community management, expectation setting, and organization, none of which was a particular strength of anyone on the team.
So, at 1pm on Monday with a quarter of the allotted time already consumed, we shifted gears. Gathered around a whiteboard, we almost instantly converged on another topic: mapping the complex references in bodies of law. Legal code tends to refer to itself, often in noodly, snaky paths that are hard to traverse, and most of the laws were written before such a thing as "hypertext" existed. This stayed in our general topic area of "legalese", but gave us a much more finite and concrete objective: visualizing and navigating references in laws. We started exploring a few different bodies of law to choose one for the project, and settled on the US Code – a gargantuan body comprising more than 50 titles broken into more than 60,000 sections with a decidedly complex subsection hierarchy. To get started, we made use of Cornell University's XML translation of the code. For the rest of the day, we worked on importing the code into a relational database from which we could generate the reference hierarchies necessary for our navigation and visualization tools. And a name... we needed a name. Since we were dealing with the law in a shredded and stringy form, we decided to call it "Coleslaw", or if you prefer, "Cole§law".
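To give a rough feel for what "importing the code into a relational database" and pulling out references can look like, here is a minimal Python sketch. It is not the code Team Intern wrote: the citation pattern, the `refs` table layout, the function names and the sample text are all assumptions made up for illustration, and real US Code citations come in far more shapes than one regular expression can catch.

```python
import re
import sqlite3

# Hypothetical citation pattern: matches phrases like "section 552 of title 5".
# Real citations take many other forms; this is only an illustration.
CITATION = re.compile(r"section (\d+[a-z]?) of title (\d+)", re.IGNORECASE)


def find_references(text):
    """Return (title, section) pairs cited in a block of legal text."""
    return [(int(title), section) for section, title in CITATION.findall(text)]


def build_reference_table(sections):
    """Load {(title, section): text} into SQLite, one row per cross-reference."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE refs ("
        "src_title INTEGER, src_section TEXT, "
        "dst_title INTEGER, dst_section TEXT)"
    )
    for (src_title, src_section), text in sections.items():
        for dst_title, dst_section in find_references(text):
            db.execute(
                "INSERT INTO refs VALUES (?, ?, ?, ?)",
                (src_title, src_section, dst_title, dst_section),
            )
    db.commit()
    return db


# Made-up sample text, not actual statutory language.
sample = {
    (5, "552a"): "Records exempt under section 552 of title 5 are not disclosed.",
    (44, "3501"): "See section 552 of title 5 and section 1030 of title 18.",
}

db = build_reference_table(sample)

# Rank sections by how many other sections point at them.
inbound = db.execute(
    "SELECT dst_title, dst_section, COUNT(*) FROM refs "
    "GROUP BY dst_title, dst_section "
    "ORDER BY COUNT(*) DESC, dst_title"
).fetchall()
```

With the references in a table like this, questions such as "which sections are cited most often?" become a single query, and the same rows can serve as the edge list for a visualization.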
The US Code is awfully complex. Among the 50 titles of the US Code, there are 168,000 references – including those within and between sections. Now on to the eye candy.
Elena’s Inbox: How Not to Release Data
On Friday @BobBrigham tweeted a suggestion: put the just-released Elena Kagan email dump into a GMail-style interface. I thought this was a pretty cool idea, so I started hacking away at it over the weekend. You can see the finished results at elenasinbox.com.
I'm really pleased that people have found the site useful and interesting, but the truth is that a lot of the emails in the system are garbage: they're badly formatted, duplicative or missing information. For instance, one of the most-visited pages on the site is the thread with the subject "Two G-rated Jewish jokes" -- understandably, given that it's the most potentially-scandalous-sounding subject line on the first page of results. Unfortunately, if you click through you'll see that there's no content in the messages.
The site was admittedly a bit rushed, but in this case it isn't the code that's to blame. If you go through the source PDF, you'll see that the content is missing there, too. It looks like it might have been redacted, but the format of the document is confusing enough that it's difficult to be sure.
But the source documents' problems go beyond ambiguous formatting. A lot of the junky content on the site comes from the junk it was built from -- there's not much we can do about it. To give you some idea of the problem, consider these strings:
Labs Olympics: Sunlight 2D
Recently, the Labs broke into teams and, given free rein, spent two days doing projects entirely of our own devising. Our team consisted of two developers, a designer, and Sunlight's prized sysadmin. So for our project, we wanted to do something for the office that blended software and design with the physical world. Inspired by some recent internal work in inventorying items using QR codes, we thought it'd be fun to make a system that lets Sunlighters print out QR codes for anything they wanted.
What people do with those codes is up to them: document internal events for posterity, lead coworkers on a scavenger hunt, plant jokes, write QR slam poetry, whatever. The design goal here was to make it dirt easy, through their computer's browser or their mobile phone, for a Sunlighter to print out a QR code with some text and/or a picture attached.
“How Our Laws Are Made”: Now in Poster Form
Just a quick note: we've been getting a few requests from folks saying they'd like to buy a printed copy of How Our Laws Are Made, one of the winning entries from Design for America. Well, good news: the folks responsible for this fantastic infographic have made it available from an on-demand print service, letting you get a physical copy in whatever format you think would best suit your classroom, office or other source of blank wall space. Even as I type this a print should be winging its way to Sunlight's offices -- if you'd like one, too, you know where to click.
Hello, Labs
Like Clay said, I'm the new guy. Well, not entirely new -- I've been at Sunlight since late 2008. But I'm the one who's going to be trying to fill the enormous gap he's leaving. I thought I'd start to explain how I want to do that by talking about how I arrived at Sunlight.
I first became aware of the Sunlight Foundation while working as a programmer at a consultancy here in DC, building sites for large nonprofits and dabbling with using and writing about various technologies on the side. When I heard about Sunlight Labs, I thought it was pretty much the coolest thing in the world. Technologists using their skills to directly improve society. For people like me (and probably you) -- people who have acquired a technical skillset that's powerful, in a sense, but not always obviously useful -- it's an incredibly compelling prospect.
Goodbye Sunlight
It's been just over two years since I first started here at Sunlight, and today's my last day.
Over the past two years, we've done some incredible things together. Through Apps and Design for America contests, our community developed nearly 200 open source applications and visualizations on top of government data, for a total expenditure of about $100,000. We built an army of nearly 2000 developers and designers working to change their government. We launched the first wiki bid on Recovery.gov, and changed the FEC using a collaborative testimony.
Forms: We have a winner
After discovering the conflict of interest in the forms contest, we scrambled to find a judge. Ultimately, Adobe came through and brought us Stephen Buckley, host of OpenGovRadio and blogger at http://www.ustransparency.com. Here's what Stephen had to say about the winner, and why he picked who he picked:
Design for America: Mistake 1
Our Design for America contest was great-- we had a lot of great entrants in all the categories. One of the categories I was more excited about was the "Redesign of a Government Form" category. It was a bit esoteric, but if you think about it, the primary way people interface with government is through forms. Perhaps people think government is mundane, soulless and complicated because government forms are that way.
They don't have to be-- and that's what got us excited about seeing what the design community could do with government forms. To top it off, we selected as a judge someone who really helped revolutionize the way web forms get made: Kevin Hale.
Unfortunately, what happened was that the winning entrant used Wufoo, Hale's company, to build their form. It's a clear conflict of interest between judge and contestant. I made the mistake of not checking the form's URL when I took the screenshot, so I didn't catch it (I was in a rush to make the announcement at Gov2Expo), and now we're in a situation where somebody won with a conflict of interest looming.
Sunlight's an organization about transparency and ethics. So first things first: there's the confession. We messed up. To fix it, we initially thought about just allowing the community to vote on which one won. But I feel like that doesn't ensure the best result; it ensures the most popular one. And those can be different. So, today we're going to try and find a new judge and give them the opportunity to judge the forms independently of the original results.
How We Use MongoDB at Sunlight
Last week, David and I attended MongoNYC, a one-day conference focused on MongoDB. We like Mongo here at Sunlight. We like it a lot.
Working with Mongo, it's become clear that it's a more natural way to store data. We primarily use Python and Ruby, and because Mongo allows us to think in JSON, everything tends to just click. JSON documents are close enough to objects in Python and Ruby that mapping between application and database becomes almost effortless. Mongo has really shone in two specific use cases: as a datastore for a resource-oriented web service, and as a datastore for results from scraping a web site.
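To make the "think in JSON" point a little more concrete, here is a small Python sketch. It deliberately uses only the standard library's `json` module and never talks to a MongoDB server, so it illustrates the data shape rather than our actual setup. The record and all of its field names are invented for the example.

```python
import json

# Hypothetical record, shaped the way a scraper might produce it.
# Every field name here is invented for illustration.
bill = {
    "state": "az",
    "bill_id": "HB 2001",
    "sponsors": [{"name": "Example Sponsor", "type": "primary"}],
    "actions": [
        {"date": "2010-01-11", "action": "First read"},
        {"date": "2010-01-12", "action": "Second read"},
    ],
}

# Nested lists and dicts serialize straight to a JSON document,
# with no schema definition and no join tables for sponsors or actions.
document = json.dumps(bill, sort_keys=True)

# Reading the document back yields the same Python structure.
restored = json.loads(document)
```

In a relational schema the sponsors and actions would probably live in their own tables. In a document store, a nested dict like `bill` can be saved more or less as it stands, which is why scraped results tend to fit so comfortably.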