Unlike last year, I wasn't just a spectator at the Labs Olympics. This year I got to participate and take a couple of days off from the usual watchdogging we do here at the Sunlight Foundation. My team's goal was to combine our skills in web development, research and storytelling to create something very different from the web applications and data tools we usually build.
I was lucky enough to be on a team with Daniel Cloud, Ethan Phelps-Goodman and Eric Mill. At first, the four of us struggled to come up with a project that would be topical, technical and entertaining. After an extended brainstorming session where we considered projects surrounding campaign finance, the London riots and natural disasters around the world, we decided to create the ultimate data visualization using (drum roll, please) Jell-O! To be clear, our idea was not inspired by the London artists who sculpt things out of Jell-O. Our use of the jiggly substance was completely coincidental.
As we talked about what we could build and what would be of interest, we kept in mind that this year’s competition, unlike last year’s, was not limited to building applications. Our end result could be, and was encouraged to be, tangible. So when we considered mapping areas recently hit by earthquakes -- D.C., Denver and California (of course) -- it occurred to us that we should not only map those areas but also make the maps dynamic by having them light up, and vibrate, too! We investigated ways to embed LED lights in a three-dimensional Jell-O mold and quickly ran into several obstacles. It looked like the Peggy 2 would be our LED board until we realized we’d have to solder 625 LEDs. Given the time and skill that would require, it didn’t seem realistic. So we gave up on embedding LEDs in Jell-O and went with the more obvious choice: a layer of Jell-O in the shape of the United States on top of a horizontally oriented LCD monitor.
So it was set: we would use sophisticated mapping software new to all of us (especially me, since I’m not a developer) to map earthquake and other government data, and then distract our audience entirely by putting a sticky mass of gelatin on the table and somehow, some way, making it jiggle on cue.
To create the underlying map visualizations we used the TileMill mapping stack from Development Seed. We collected dentist and diabetes data from the Centers for Disease Control (CDC) to map the change in obesity rates over time and the number of dentists per capita. We mapped earthquake data using information from the U.S. Geological Survey (USGS).
TileMill lived up to its promise of providing an easy-to-use, complete solution for people with little mapping experience, which described most of us. Once the maps were designed, we exported them as static images and displayed them as a slideshow.
Once the slideshow was ready, we chose a large monitor to display it on and wrapped the whole thing in Saran Wrap. After sculpting the states out of Jell-O, we placed them on the protected monitor and displayed our maps beneath the translucent dessert. Making it shake was not an easy task. Our early experiments involved installing the vibration motors from a PlayStation 3 controller in a layer of Jell-O. While it was definitely a sight to behold, we didn’t get the range of motion we had hoped for, so when it came time to display the earthquake data, we had to resort to an oversized neck massager to get the Jell-O to jiggle.
The final product looked very much like an early prototype of the more sophisticated device we had imagined during our planning phase. Perhaps intrepid tinkerers will take what we learned and build on it to create something bigger, better and more delicious.
I’ll spare you the suspense: our team, the J-team, didn’t win. Keep checking this blog for a post by the winning team. Anyone who works in an office will appreciate their creativity and problem-solving ability.
House Revamps Floor Feed
Yesterday, the House of Representatives massively improved its feed of live updates from the House floor. The House Clerk has been hosting a live floor feed for a long time, but this update breaks out related bills and votes more cleanly, adds times down to the second for each update, and drastically cleans up the HTML of the page.
But most wonderfully, the cleaner HTML doesn't really matter, because they also turned on a live XML feed.
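The post doesn't describe the feed's schema, so the sketch below invents one. The floor_updates, update, bill and description element names and the time attribute are hypothetical placeholders, not the Clerk's actual format, and the two sample updates are made up. The sketch only shows why a machine-readable feed with per-second timestamps beats scraping HTML: each floor update becomes a structured record in a few lines of Python.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Made-up sample in a HYPOTHETICAL schema; the real House feed will differ.
SAMPLE_FEED = """<floor_updates>
  <update time="2011-09-21T10:02:15">
    <bill>H.R. 1234</bill>
    <description>Motion to proceed agreed to.</description>
  </update>
  <update time="2011-09-21T10:05:42">
    <description>The House recessed.</description>
  </update>
</floor_updates>"""


def parse_floor_updates(xml_text):
    """Turn each <update> element into a dict with a timestamp, bill and description."""
    root = ET.fromstring(xml_text)
    updates = []
    for node in root.iter("update"):
        updates.append({
            # Timestamps carry seconds, so updates can be ordered precisely.
            "time": datetime.fromisoformat(node.get("time")),
            # Not every floor action concerns a bill; findtext returns None then.
            "bill": node.findtext("bill"),
            "description": node.findtext("description", default="").strip(),
        })
    return sorted(updates, key=lambda u: u["time"])


for update in parse_floor_updates(SAMPLE_FEED):
    print(update["time"].time(), update["bill"] or "-", update["description"])
```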
Labs Olympics 2011: How Is Babby?
For this year's Labs Olympics I was on an all-star team made up of Aaron, Alison, Tim, and me, better known as the Labs Olympics Winners (note: we did not win, this was just our team name). Alison has a young baby at home, and Aaron missed our first brainstorming session for the birth of his niece, so it wasn't a big surprise that we wound up with a plan to make a sophisticated baby monitor. (It might come as even less of a surprise that we named it How Is Babby in honor of an infamous web meme.)
At first, all we knew was that we wanted to use some random gadget or assortment of Arduino sensors to give geek parents a way to monitor their geek children. It wasn't until we remembered the spare Microsoft Kinect sitting around the office that we saw exactly how far we could take it.
Kinect
The Kinect is an impressive device, sporting four microphones, RGB and IR cameras, an additional depth sensor, and a motor that tilts it vertically. Getting the Kinect running on Linux is a fairly well-documented process. We leaned heavily on instructions from the OpenKinect community, which worked pretty much as written. After the usual cmake, make, make install dance, everything ran without issue on Ubuntu 11.04.
Also included in the OpenKinect source tarball are bindings for half a dozen languages, including Python. Having a Python wrapper made experimenting incredibly easy, since I had access to the Python OpenCV bindings for displaying image data and to NumPy for manipulating the matrices that the Kinect driver returns.
With these tools in hand, we just had to decide what we actually wanted from the Kinect. We decided to take regular snapshots to present via a web interface, and to give the Kinect process a way to notify the web client when there was motion. Snapshots were extremely easy: with a single line of code we could retrieve the RGB image from the Kinect's main camera and convert it to a suitable format using OpenCV. Once we discovered that we could also pull an IR image, we added a night-vision mode as well. This way, the parent can use the standard camera in normal light or switch to the IR camera at night. (Due to a hardware limitation of the Kinect, the RGB and IR cameras cannot be used at the same time.)
Given the uncertainty in the amount of available light and the fact that the depth sensor provided simpler data to work with (essentially a 2D matrix of depth values refreshed about 30 times per second), we decided to use the depth sensor to detect motion. NumPy's matrix operations made this a breeze. By averaging the depth of the frame and comparing the deviation across a range of frames, we could flag each individual frame as likely containing motion or not. Depending on the desired sensitivity of the alerts, the application would wait for anywhere from ten to thirty frames of consecutive motion before notifying the web application that the baby was on the move.
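Here is a minimal sketch of that motion test. It assumes frames arrive as 2D grids of depth values. Three details are our own assumptions, not the team's code: the window size, the threshold, and the rule of flagging a frame when its mean depth drifts from the recent baseline. The standard library stands in for NumPy so the example runs without a Kinect; with NumPy, frame.mean() would replace mean_depth.

```python
from collections import deque
from statistics import mean


def mean_depth(frame):
    """Average depth over a 2D frame given as a list of rows."""
    total = sum(sum(row) for row in frame)
    count = sum(len(row) for row in frame)
    return total / count


class MotionDetector:
    """Flags per-frame motion and fires once after N consecutive motion frames.

    Assumed rule (not from the original project): a frame "has motion" when
    its mean depth differs from the average of the previous `window` frames
    by more than `threshold`.
    """

    def __init__(self, window=30, threshold=5.0, required_frames=10):
        self.history = deque(maxlen=window)      # recent per-frame mean depths
        self.threshold = threshold               # allowed drift from the baseline
        self.required_frames = required_frames   # e.g. 10 to 30, per sensitivity
        self.consecutive = 0

    def frame_has_motion(self, frame):
        avg = mean_depth(frame)
        if len(self.history) < self.history.maxlen:
            # Still collecting a baseline; treat the frame as still.
            self.history.append(avg)
            return False
        baseline = mean(self.history)
        self.history.append(avg)
        return abs(avg - baseline) > self.threshold

    def update(self, frame):
        """Return True exactly when a run of motion frames reaches required_frames."""
        if self.frame_has_motion(frame):
            self.consecutive += 1
        else:
            self.consecutive = 0
        return self.consecutive == self.required_frames
```

In a monitoring loop, each depth frame from the driver would go through update(). A True result is the cue to notify the web application. Because the check uses equality, it fires once per burst of movement rather than on every subsequent frame.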
The Web Application
Unlike a traditional baby monitor, which has a dedicated viewing apparatus, a web console can be viewed from anywhere, including from a mobile device, and we liked that idea. The main features of the web app would be viewing, motion alerts, and configuration of features such as SMS notifications and night vision. The basic web app was built with Django, but we used a few add-on libraries to help us reach our goals in the two days allotted for the contest.
We decided the easiest way to get images to the user was to have the web page embed a single image that the monitoring software would update at a set interval. We used Socket.IO as a very lightweight way to keep that image current. In the best case, when the user's browser supports it, Socket.IO uses WebSockets to keep the connection open. When it can't, it degrades gracefully and falls back to AJAX or other means to get the job done.
Because our team lacked a designer, we used a CSS framework to take care of cross-browser issues and provide some pre-designed UI elements. Twitter had just released its Bootstrap framework, so we went with it. It styled all of the UI elements on our site, including a navigation bar, alert boxes, buttons, and a form. We never resolved an issue with form elements not lining up properly with their labels, but overall it proved very easy to work with.
The remaining technical component of the website was AJAX alerts for motion events detected (and logged in a database table) by the backend. These alerts had to meet a few criteria. The most important was persistence: a user shouldn't miss an all-important alert that the baby was moving just because they were, say, clicking quickly between pages on the site. That meant we needed something more sophisticated than Django's built-in messaging framework (django.contrib.messages). The answer came in the form of django-persistent-messages. It was built to work right on top of Django's messaging system, so it integrated seamlessly and was a no-brainer to set up. With django-persistent-messages in place, alerts stay put until the user dismisses them, hopefully averting any baby-on-the-move mishaps.
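A toy model makes the difference between the two behaviors easy to see. This is not django-persistent-messages' code or API, and the class and method names are hypothetical. The sketch only contrasts two designs. In the first, messages are consumed once a page displays them, which is roughly how django.contrib.messages behaves. In the second, alerts survive until the user dismisses them.

```python
from itertools import count


class OneShotMessages:
    """Hypothetical store: messages disappear as soon as a page has displayed them."""

    def __init__(self):
        self._pending = []

    def add(self, text):
        self._pending.append(text)

    def read(self):
        # Reading consumes the queue, so the next page load sees nothing.
        shown, self._pending = self._pending, []
        return shown


class PersistentAlerts:
    """Hypothetical store: alerts stay visible on every page until dismissed."""

    def __init__(self):
        self._alerts = {}
        self._ids = count(1)

    def add(self, text):
        alert_id = next(self._ids)
        self._alerts[alert_id] = text
        return alert_id

    def read(self):
        # Reading does not remove anything.
        return list(self._alerts.values())

    def dismiss(self, alert_id):
        self._alerts.pop(alert_id, None)
```

A user clicking quickly between pages triggers read() on each page load. With the one-shot store the alert is gone after the first page, while the persistent store keeps showing it until dismiss() is called.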
In the end, there were a few features we had to leave unfinished to get the project out the door on time, including audio monitoring and SMS messaging, but we were pretty happy with the results. As usual, all of our code is available on GitHub: How Is Babby.
Labs Olympics: Nice Neighbor
As part of the 2nd Annual Labs Olympics, Team Leaf Peepers built NiceNeighbor, a network designed to put helpful neighbors in contact with each other.
Inspiration
It's been an interesting couple of months on the East Coast, and in the DC area in particular. Earthquakes, torrential rains and flooding, terror threats and even a 2-0 start to the football season have left Washingtonians confounded, confused and generally insecure. Amidst these troubling times we've observed a pattern: in the face of uncertainty, people tend to be jerks to each other. We hoard things, jam up the roads and grocery aisles, and get pushy and rude. However, when disaster strikes, we are helpful, compassionate neighbors, each pitching in to face hardship together. Our team's goal was to encourage this second behavior pattern all the time.
The team
Yes, the team, but wait. Leaf peepers? I recently returned from vacation in Vermont, where Luigi assumed I'd be photographing leaves. Never mind.
Our juggernaut of raw, unstoppable productive force consisted of Luigi, Caitlin, Casey and myself, covering nearly every discipline represented in the Labs from design to research to front-end and back-end web development. With this veritable cornucopia of skills, we knew we had to bite off something significant.
Getting to work
Despite high confidence in our ability to execute, we were pretty strapped for ideas until late afternoon on the Friday before go time. We had been toying with a voice and SMS interface to guide people in rural areas without broadband internet toward the local public services they need. However, Casey's preliminary research showed that the infrastructure to make such an app worthwhile really wasn't there. We'd already scrapped some decidedly lesser ideas:
- a kitchen cleanliness tracker (pfff!);
- an RFID/motion sensor combo that played WWE-style entrance music for every Sunlighter as they came into the office each morning;
- 'Auto-Tune the Law,' which would have set Sunlight Live to music, pitch-shifting testimony a la T-Pain, but which (sadly!) didn't appear to fit the timeline or budget.
So, after reaching consensus--and with less consideration of our problem domain than we'd have liked--we got cracking Monday morning. The plan was to use a plain old Rails/ActiveRecord/Postgres stack to deliver the web interface, and Twilio for SMS and voice. Casey took our basic concept of 'have' and 'need' and set forth on IA and taxonomy, while Caitlin began on a color palette and logo. Luigi dug into the Twilio API, and I spun our project up on Heroku and started modeling.
Collaborating
Good teamwork is everything when dealing with compressed timelines, and we did our best to keep in touch throughout the process. We set up an IRC channel on Freenode that we hung out in each day for answering quick questions, and escalated to face-to-face as needed. Heroku provides an IRC bot to notify the room of deployments, which came in handy for status tracking and letting team members know when to update their code. For copy and user stories, we worked with an EtherPad instance that Eric had stood up for everyone to use, and found it to be great for collaborative typing.
Noteworthy tech
With the lofty goal of a backend, three interfaces and loads of location-aware goodies in just a couple of days, we had our work cut out for us. As mentioned above, we decided to let Rails and Twilio handle the interfaces. Even though I tend to prefer Python/Django, it felt good to play with some of the less-familiar-to-me bits of Rails. These included single-table inheritance for 'needs' and 'haves,' and the scoped/nested routing patterns that are new-ish in Rails 3. For IP-to-location, geocoding and radius search I used GeoKit, which was a pleasure to work with, though initially it forced me to trade SQLite in development for Postgres.
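GeoKit handled distance calculations and radius search for us. As a rough illustration of what a radius search computes, here is a Python sketch using the haversine great-circle formula. GeoKit's own formulas and generated SQL may differ, and the neighbor records and coordinates are made up for the example.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lng1, lat2, lng2):
    """Great-circle distance between two (lat, lng) points, in miles."""
    dlat = radians(lat2 - lat1)
    dlng = radians(lng2 - lng1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlng / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))


def within_radius(origin, places, miles):
    """Return (name, distance) pairs within `miles` of origin, nearest first."""
    lat, lng = origin
    hits = []
    for name, (plat, plng) in places.items():
        d = haversine_miles(lat, lng, plat, plng)
        if d <= miles:
            hits.append((name, d))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical 'haves' and their approximate coordinates.
haves = {
    "ladder, Baltimore": (39.2904, -76.6122),
    "snow shovel, Richmond": (37.5407, -77.4360),
}
print(within_radius((38.9072, -77.0369), haves, miles=50))  # searching from D.C.
```

In a real app, the distance filter would normally be pushed into the database query rather than run in application code over every record.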
For the SMS and Voice features, Luigi evaluated Twilio and Tropo. Both are excellent telephony systems, with straightforward RESTful APIs. But Luigi figured out how to get a custom phone number through Twilio first (719-522-NICE), and so that's what he chose. When working with telephony systems, outgoing activity is straightforward: make an API call. But how does one handle incoming activity? Twilio expects developers to implement endpoints in their app using a custom XML-based markup language, TwiML, while Tropo allows developers to host scripts on Tropo's cloud. Tropo also supports an endpoint-based solution, similar to Twilio. On top of all that, Tropo offers a new service called SMSified that makes development even more straightforward if one only needs to support SMS.
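On the Twilio side, handling an incoming text means exposing an endpoint that Twilio calls and answering with TwiML. The Python sketch below assumes Twilio POSTs the sender and message text as From and Body, two of the parameters it sends. It replies with a Response element containing a Message verb, as current TwiML does (older TwiML used an Sms verb). The NEED/HAVE keywords and reply wording are hypothetical, not NiceNeighbor's actual commands. Web framework wiring is left out; the returned string would be served with an XML content type.

```python
import xml.etree.ElementTree as ET


def handle_incoming_sms(params):
    """Build a TwiML reply for an incoming SMS webhook.

    `params` stands in for the form fields Twilio POSTs to the endpoint.
    The NEED/HAVE command syntax is a hypothetical example.
    """
    text = params.get("Body", "").strip()
    command, _, item = text.partition(" ")
    command = command.lower()
    if command == "need" and item:
        reply = f"Got it. We'll look for a neighbor who has: {item}"
    elif command == "have" and item:
        reply = f"Thanks! We'll let neighbors who need it know about: {item}"
    else:
        reply = "Text NEED <item> or HAVE <item> to get started."
    response = ET.Element("Response")
    ET.SubElement(response, "Message").text = reply
    # ElementTree escapes characters such as < and > in the reply text.
    return ET.tostring(response, encoding="unicode")


print(handle_incoming_sms({"From": "+15555550100", "Body": "NEED ladder"}))
```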
Overcoming adversity
By the end of Monday, we had a solid start. Caitlin had a great logo that pulled inspiration from the letters 'NN' back-to-back to form a Mr. Rogers-esque cardigan. We also had a 'hello world' working over SMS, an admin scaffold, an auth system, some models and a sense of how requests and offers would be delegated. But to poorly paraphrase Bret Michaels, every Monday has its Tuesday. While working with Caitlin to help her get started integrating her markup/CSS into the project, Luigi mistakenly deleted all of her work! The next several hours were spent reconstructing it from the browser cache. That was reasonably successful, though very costly in time.
To add illness to insult and injury, Caitlin came down with food poisoning that night, leaving your Leaf Peepers woefully short-handed during Wednesday's pretend-like-you're-working-but-try-to-make-up-for-lost-time sprint to the finish.
For fun, if not profit
By our measure, we didn't quite reach a minimum viable product, but the fruits of our effort stand nonetheless at http://niceneighbor.net, with code on GitHub. We stand by the idea and may develop it further at some point to get it over that elusive hump of 'usefulness.' Results aside, we had a great time working together and learning about bits of tech we don't normally use.
Civic Hacking Quarterly: Fall 2011
There's a lot going on in the world of open government and open data. And it's tough to keep up. Once a quarter, we'll do our best to round up all the events and challenges going on that the Sunlight Labs community may be interested in.
Events
- Hack4Reno, September and October, Reno. The biggest little city in the world is hosting a series of events over the next month to build up a community of civic hackers, capped off by a 24-hour hackathon on October 15.
- Hacks/Hackers at ONA 11, Sep. 22, Boston. A day of hacking as the Online News Association's annual conference kicks off.
- Code 4 Country, Sep. 24-25, Washington, D.C. and Moscow: The first collaborative codeathon between Russia and the U.S. The D.C. event is taking place at American University, and the Moscow event at the offices of Russia's largest search engine, Yandex, located on Leo Tolstoy Street.
- BmoreSmart Meets City Hall, Sep. 27, Baltimore. The Baltimore startup community meets with the City of Baltimore's CIO to discuss city government, technology, and citizen engagement.
- Hack the Map, Oct. 2, Phoenix. Part of WhereCampPHX, this hackathon is focusing on geo apps.
- Apps for SEPTA, Oct. 8-9, Philadelphia. The Philly area's transit system has recently released GTFS data and a real-time bus and trolley API. Time to let a thousand apps bloom.
- Data Without Borders Kickoff, Oct. 14-16, New York City. This nascent organization kicks off with this event pairing NGOs with data hackers. More events are planned for London, Chicago, and Washington, D.C.
- Open Government Data Camp, Oct. 20-21, Warsaw. The Open Knowledge Foundation is hosting two days of talks, code sprints, and workshops in Poland's capital. Sunlight will be there, and we've pitched in $5,000 to support travel bursaries for US attendees.
- OpenDataPhilly's OpenDataRace, through Oct. 28. The open data community in Philly is seeking input from city non-profits on data sets not currently available that would be useful to their work. Then, OpenDataPhilly will work with the City of Philadelphia to make that data available.
- OpenAccessPhilly Forum, Oct. 28. A forum to discover what the City of Philadelphia and its citizens are doing at the intersection of civic innovation, participation, and technology.
- Education Hack Day, Nov. 12-13, Baltimore. Developers and designers will get together at the Digital Harbor High School to build apps based on ideas from local teachers. Current project ideas include a parent-teacher conference scheduler using a web and phone interface, and a homework notifier, via email, voice, or SMS, for parents.
Challenges
- Apps for Metro Chicago. Submissions by Sep. 30.
- Apps for Communities. Submissions by Oct. 3.
- Apps Against Abuse. Submissions by Oct. 17.
- The ASPR Lifeline Facebook Application Challenge. Submissions by Nov. 4.
- Apps 4 Africa Climate Challenge. Runs this year and next year. Entrants must live in Africa. Mentors can be from outside of Africa.
Just Passed
- California Laws Hackathon, Sep. 17, Berkeley and Denver. Co-hosted by our friends at Maplight.
- Cleveland Civic Hacking Meetup, Sep. 13.
- Transit Hack Day, Sep. 10. Hosted by the Mobility Lab of Arlington County, Virginia.
- Reinvent NYC.gov, Jul. 30-31.
If we've missed something going on through the end of the year, let us know in the comments. If you're planning an event for 2012, send Luigi a quick note.
OpenLexington
Each edition of Civic Hacking Quarterly will close by featuring a local civic hacking group. To kick things off, we're highlighting OpenLexington. The Kentucky-based group reminds us that civic hackers are not just found in big cities. In addition to a full website, OpenLexington has a presence on GitHub, Twitter, and Google Groups. Founder Chase Southard was recently added to an Open Data workgroup inside the Lexington-Fayette Urban County Government, which plans to launch a data catalog in the near future. The next OpenLexington meetup is scheduled for October 27 at 7 p.m.
Labs Olympics: Talk of the Town
It's that time of year again...time for the 2011 Labs Olympics! This year, I was on a team with Andrew Pendleton of the Data Commons/Influence Explorer team and labs intern Matthew Gerring. Last year, I teamed up with Jeremy and Luigi to form the fierce (and winning) team, Blood Monkey. This year, we needed an equally intimidating team name and an equally creepy project to boot. So without further ado, team Baby in a Straight Jacket presents: Talk of the Town.
Talk of the Town is a corpus of closed captioning data from transcripts of municipal meetings around the country. You can type in any word and see which cities or counties are talking about it, and how often. The size of the circle over each municipality corresponds to how frequently the word was mentioned there. Additionally, a sparkline underneath the word you searched for shows the week-by-week change in frequency.
Talk of the Town is powered by data from the nice folks at Granicus. Granicus is a vendor that provides a streaming video and document publishing suite to governments that want to increase their transparency by making public meetings more accessible to citizens. They were kind enough to let us use the beta version of their API to pull down data from their clients for the last six months. Luckily, they serve hundreds of municipalities across the country, so while the data isn't exhaustive, it's a nice sampling.
Users should note two limitations: the data does not cover every local government, and we haven't had a chance to scale the frequency of mentions by the frequency of the meetings. However, you can still find some pretty interesting results (bonus: try searching for "earthquake" or "irene"). For instance, if you search for "taxes", you'll notice that mentions of taxes in Montgomery County are off the charts for a county that size. (Montgomery County is the 13th wealthiest county in the country and is also home to a few Sunlighters, including me.)
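A few lines of code show why the scaling caveat matters. The sketch below is a hypothetical illustration, not Talk of the Town's code. It counts whole-word mentions of a term per municipality and per week (the kind of series behind a sparkline). It also divides by the number of meetings, which is the normalization the post says hasn't been done yet. The transcript tuples are made up.

```python
import re
from collections import Counter


def mention_stats(transcripts, term):
    """Count whole-word, case-insensitive mentions of `term`.

    `transcripts` is an iterable of (municipality, week, text) tuples, one per
    meeting. Returns raw counts per place, counts per week, and mentions per
    meeting for each place.
    """
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    per_place, per_week, meeting_count = Counter(), Counter(), Counter()
    for place, week, text in transcripts:
        hits = len(pattern.findall(text))
        per_place[place] += hits
        per_week[week] += hits
        meeting_count[place] += 1
    per_meeting = {place: per_place[place] / meeting_count[place] for place in meeting_count}
    return per_place, per_week, per_meeting


sample = [
    ("Montgomery County", 1, "Taxes and more taxes."),
    ("Montgomery County", 2, "Budget discussion."),
    ("Reno", 1, "Property taxes."),
]
per_place, per_week, per_meeting = mention_stats(sample, "taxes")
sparkline = [per_week[w] for w in sorted(per_week)]  # week-by-week series
print(per_place, sparkline, per_meeting)
```

In this toy data, Montgomery County's raw count is double Reno's, but per meeting the two are even. That is exactly the distortion that unscaled circle sizes can introduce.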
So that was our two day project for the 2011 Labs Olympics. Although it wasn't the winner, we're happy to work on something that takes opengov to the grassroots level, even if only experimentally.
Tuning Up Influence Explorer
For the past year we’ve been busy adding data and features to Influence Explorer, our central source for data on money and influence in politics. From our original three data sources (federal campaign finance, state campaign finance and federal lobbying) we’ve now expanded the site to include earmarks, federal spending, contractor misconduct and EPA enforcement data. We’ve also released tools such as Checking Influence and Inbox Influence that put political influence data in a context that’s relevant to users.
After tackling these larger projects we decided to step back and revisit a few things that we didn’t get perfect the first time around. In a number of places we’ve tweaked our methodology, cleaned up our data and added more context. The result should be more accurate and useful information across the board.
Announcing Superfastmatch
Today I'm pleased to announce that the Superfastmatch project is open-source and ready for use. I’m excited to be posting this—I’ve been waiting to do so for a while! I think SFM is really, really cool—and I think you’ll agree once I tell you why. But first, a little bit of backstory.
We first became aware of the technology behind SFM when Churnalism launched. Created by the Media Standards Trust, Churnalism is an ingenious effort to detect when UK journalists copy-and-paste press releases into their published stories. It’s a great project, but we were even more excited by the technology behind it. Finding overlap between documents in huge corpora is not as simple a problem as you might think--it's tempting to assume that diff will manage the job, but in truth that tool is unsuitable for most types of documents.
The basic algorithmic challenge is the same one faced by those working on systems to detect academic plagiarism--a rich and evolving field in its own right. But surprisingly little of that technology is freely available.
Sunlight reached out to MST and was ultimately able to provide a grant that allowed them to open-source their code. Even better: they've been improving it. A mostly-Python implementation that needed hefty hardware is now a compiled solution that runs blazingly fast on commodity hardware (we’ve also successfully run it on vanilla EC2 instances--see the README for details).
Each instance of the system is an HTTP server. Users load documents by POSTing their text to a RESTful interface. As each document is processed, it’s normalized and split into substrings, which are hashed into unique tokens. After you’ve loaded your documents, you run an association task, which compares each document's collection of tokens against one another. Where there's overlap, contiguous chunks of text are assembled, and you can begin to inspect the parts that might be borrowed from one another. (The actual mechanics of the system are considerably more complex than this explanation, but the preceding should give you a rough idea of how things work.)
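The token idea can be illustrated in miniature. This sketch is not Superfastmatch's algorithm, which is compiled and considerably more complex. It normalizes text, hashes every fixed-length character window with CRC32, and stitches consecutive shared windows back into overlapping passages. The 15-character window is an arbitrary choice for the example.

```python
import re
import zlib


def normalize(text):
    """Lowercase and collapse punctuation and whitespace to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def window_hashes(norm, window):
    """Map the hash of each window-length substring to its start offset."""
    return {zlib.crc32(norm[i:i + window].encode()): i
            for i in range(len(norm) - window + 1)}


def shared_passages(a, b, window=15):
    """Return passages of `a` (normalized) that also appear in `b`."""
    norm_a, norm_b = normalize(a), normalize(b)
    hashes_a = window_hashes(norm_a, window)
    hashes_b = window_hashes(norm_b, window)
    # Offsets in `a` of every window whose hash also occurs in `b`.
    offsets = sorted(hashes_a[h] for h in hashes_a.keys() & hashes_b.keys())

    passages, start, prev = [], None, None
    for offset in offsets:
        if prev is not None and offset == prev + 1:
            prev = offset  # still inside the same contiguous run
            continue
        if start is not None:
            passages.append(norm_a[start:prev + window].strip())
        start = prev = offset
    if start is not None:
        passages.append(norm_a[start:prev + window].strip())
    return passages


print(shared_passages(
    "The quick brown fox jumps over the lazy dog near the river bank.",
    "Reports say the quick brown fox jumps over the lazy dog, officials confirmed.",
))
```

Because each document's hashes are computed once, finding candidate overlap between two documents becomes a set intersection rather than a line-by-line diff.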
There's a demo at scripts/gutenberg.sh that loads the Bible, the Koran and ten classic novels from Project Gutenberg into the system, then finds every bit of overlap between them. It takes about 45 seconds from start to finish on my three-year-old laptop.
Sunlight's particular interest is in pairing this technology with data from our Open States Project in order to detect when legislation is migrating between statehouses or from interest groups and into law. But we hope and expect that SFM's uses will extend well beyond our mission--the applications of this technology seem sure to surprise us.
The project remains under very active development. We expect a bugfix related to very large datasets to be merged into the main branch in a week or two, for instance. But Sunlight and MST are both anxious to see developers begin to acquaint themselves with Superfastmatch. And of course we're also hopeful that others might be inspired to contribute back to it. Providing the system's output as JSON, for example, is a long-planned feature that would be easy to implement and of considerable value.
For now, though, please have a look at the project repo and start thinking about what SFM might make possible for you. You don't need to look for a needle in a haystack anymore--you just need a few good haystacks.
Open States API: 1 Year Later
Last September we announced the first public release of the Open States API. The API enables programmatic access to all of the key artifacts of the state legislative process. The API currently provides a standard interface to bills, votes, legislators, committees, and events across 36 states, Washington DC, and Puerto Rico.
A year after that first public release, it seems like a good time to check in on where we are today and where we're going next.
Beige Alert
Recently I've begun work on a new project here with a very simple idea: tell us the issues or keywords that you're interested in, and we'll let you know right away when something happens in state or federal government that you care about.
It's a straightforward idea, but a very powerful one. If you're a reporter focusing on immigration, you can know as soon as a state introduces a border control bill. If you're an environmental activist, you can learn right away about every attempt in Congress to expand or curtail the EPA's power.
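At its core, the matching step could look something like this hypothetical sketch. Each subscriber's keywords become case-insensitive whole-word patterns, and every new bill or action is checked against them. The subscriber names, keywords and bill titles are invented for illustration, and this is not the project's actual code.

```python
import re


def build_matchers(subscriptions):
    """Compile each subscriber's keywords into whole-word, case-insensitive patterns."""
    return {
        subscriber: [re.compile(rf"\b{re.escape(kw)}\b", re.IGNORECASE) for kw in keywords]
        for subscriber, keywords in subscriptions.items()
    }


def subscribers_to_alert(item_text, matchers):
    """Return the subscribers whose keywords appear in a new bill or action."""
    return sorted(
        subscriber
        for subscriber, patterns in matchers.items()
        if any(p.search(item_text) for p in patterns)
    )


matchers = build_matchers({
    "reporter": ["immigration", "border control"],
    "activist": ["EPA", "Environmental Protection Agency"],
})
print(subscribers_to_alert("A bill to expand border control checkpoints", matchers))
```

Whole-word matching keeps a short keyword like "EPA" from firing on unrelated words that merely contain those letters.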