As Open States closes in on our initial goal of supporting all 50 state legislatures (just 3 more to go!) we're also planning to put out a report card evaluating state legislative data across every state.
As the 40+ individuals who have sat down and helped us scrape state sites can affirm, most states simply don't do a decent job of making legislative information available, so we're hoping that this can serve as a sort of wake-up call to states that make this vital data far too difficult to access. For those few states that are doing a good job, we're hoping to praise their commitment to open data and point out areas where they can do better.
We've come up with a set of criteria based on Sunlight's "Ten Principles for Opening Government Data" (which expand upon the 8 Principles of Open Government Data) that we feel we can fairly apply to the states, and created a survey to evaluate states against these criteria.
In order to guarantee a high quality report we'd like to get several responses per state and that's where you can help us out. Click the link below to head to a form that will ask you to evaluate the information that your state legislature makes available via their official website. By doing this you'll help us ensure that our eventual report is as accurate and as complete as possible.
(If you have any questions feel free to contact jturk@sunlightfoundation.com. If there are any questions you aren't sure how to answer we'd prefer you leave them unanswered instead of guessing.)
Continue reading
Open States Source Visualized
Open States recently reached a milestone in that we now support 40 states (and DC and Puerto Rico), and at our current pace we'll reach our goal of all 50 states sometime early next year. It is only due to the fantastic support of our community and individuals who have shown up at hackathons or just started contributing on their own that this goal is now in sight.
I thought it might be fun to look back on how the project has grown, and luckily gource, a piece of software for visualizing the history of a repository, can help do just that. Watch below to enjoy a visually stimulating look back through the last two and a half years of commits to Open States. You'll see flurries of activity around our hackathons, the drastic increase in activity from 2009 to 2010, how 2011 so far takes up more than half the video, and some of the big refactors that we've made along the way to scale the project to a size well beyond what we initially conceived of.
Continue reading
Labs Olympics 2011: How Is Babby?
For this year's Labs Olympics I was on an all-star team comprised of Aaron, Alison, Tim, and myself, better known as the Labs Olympics Winners (note: we did not win, this was just our team name). Alison has a young baby at home and Aaron was out during our first brainstorming session for the birth of his niece so it wasn't a big surprise that we wound up with a plan to make a sophisticated baby monitor. (It might come as even less of a surprise that we named it How Is Babby in honor of an infamous web meme.)
At first all we knew was that we wanted to use some random gadget or assortment of Arduino sensors to give geek parents a way to monitor their geek children, but it wasn't until we found a spare Microsoft Kinect sitting around the office that we realized exactly how far we could take it.
Kinect
The Kinect is an impressive device, sporting 4 microphones, RGB and IR cameras, an additional depth sensor, and a motor that allows vertical panning. Getting the Kinect running on Linux is a fairly well documented process. We leaned heavily on instructions from the OpenKinect community, and after doing the usual cmake, make, make install dance, things worked without issue on Ubuntu 11.04.
Also included in the OpenKinect source tarball are bindings for a half dozen languages, including Python. Having a Python wrapper made things incredibly easy to experiment with, as I had access to Python OpenCV bindings for displaying image data and NumPy for manipulating the matrices that the Kinect driver returns.
With these tools in hand we just had to decide what we actually wanted to get from the Kinect. We decided to take regular snapshots to present via a web interface, and also have a mechanism for the Kinect process to notify the web client when there was motion. Snapshots were extremely easy: with just a single line of code, we were able to bring back the RGB image from the Kinect's main camera and convert it to a suitable format using OpenCV. Once we made the discovery that there was also the option to bring in an IR image, we added a night-vision mode to our application as well. This way, the parent can adjust the camera to either take a standard image in normal light situations or switch to the IR camera for the night. (Due to a hardware limitation of the Kinect, it is impossible to use the RGB and IR camera at the same time.)
Given the uncertainty in the amount of available light and the fact that the depth sensor provided simpler data to work with (essentially a 2D matrix of depth values refreshed about 30 times per second), we decided to use the depth sensor to detect motion. NumPy's matrix operations made this a breeze. By averaging the depth of the frame and comparing the deviation across a range of frames, we could flag each individual frame as likely containing motion or not. Depending on the desired sensitivity of the alerts, the application would wait for anywhere from ten to thirty frames of consecutive motion before notifying the web application that the baby was on the move.
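The frame-counting logic described above might look roughly like the following sketch. It is not the How Is Babby code itself: it assumes each depth frame has already been reduced to a single mean depth value (for example with NumPy's mean()), and it reads "deviation across a range of frames" as the difference between the current frame and the average of a short window of earlier frames. The class name, method name, and default values are all hypothetical.

```python
from collections import deque


class MotionDetector:
    """Turns a stream of per-frame mean depths into motion alerts.

    Hypothetical sketch: names, window size, and thresholds are
    illustrative assumptions, not values from the original project.
    """

    def __init__(self, window=30, threshold=5.0, required_frames=10):
        self.history = deque(maxlen=window)     # recent mean depths
        self.threshold = threshold              # deviation that counts as motion
        self.required_frames = required_frames  # consecutive frames before alerting
        self.consecutive = 0

    def update(self, mean_depth):
        """Record one frame's mean depth; return True when an alert should fire."""
        if self.history:
            baseline = sum(self.history) / len(self.history)
            moving = abs(mean_depth - baseline) > self.threshold
        else:
            moving = False  # nothing to compare the first frame against
        self.history.append(mean_depth)

        # Only an unbroken run of moving frames counts toward an alert.
        self.consecutive = self.consecutive + 1 if moving else 0
        if self.consecutive >= self.required_frames:
            self.consecutive = 0  # reset so one burst produces one alert
            return True
        return False
```

Raising required_frames makes the alert less sensitive, which corresponds to the ten-to-thirty frame range mentioned above; a single jolt that settles immediately resets the counter and never fires.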
The Web Application
As opposed to a traditional baby monitor, which has a dedicated viewing apparatus, we liked the idea of a web console that could be viewed from anywhere, including via a mobile device. The main features of the web app would be viewing, motion alerts, and configuration of features such as SMS notifications and night vision. The basic web app was built with Django, but we used a few add-on libraries to help accomplish our goals in the two days given for the contest.
We decided that the easiest way to get images to the user was to have the web page embed a single image that the monitoring software would update at a set interval. We used Socket.IO for a very light-weight solution to keep the image updated to the latest version. In the best case scenario, i.e. the user's browser supports it, Socket.IO will use WebSockets to keep the connection open, but will degrade gracefully and fall back to AJAX or other means to get the job done.
Because our team lacked a designer, we used a CSS framework to take care of cross-browser issues and provide some pre-designed UI elements. Twitter just recently released their Bootstrap framework, so we went with it. It styled all of the UI elements on our site, including a navigation bar, alert boxes, buttons, and a form. Although we had some unresolved trouble with the form elements not lining up properly with their labels, it proved very easy to work with, overall.
The remaining technical component of the website was the AJAX alerts on motion events detected (and logged in a DB table) by the backend. There were a few criteria for how it needed to work, the most important being that alerts needed to be somewhat persistent to the user, so that a user couldn't miss an all-important alert saying that the baby was moving, just because they were clicking quickly between pages on the site, for instance. This meant that we needed something more sophisticated than Django's inbuilt messaging framework (django.contrib.messages). The answer came in the form of django-persistent-messages. It was built to work right on top of Django's messaging system, so it worked seamlessly and was a no-brainer to set up. With django-persistent-messages working, alerts now would not disappear unless dismissed by the user, hopefully averting any potential baby-on-the-move mishaps.
In the end, there were a few features we had to leave unfinished to get the project out the door on time, including audio monitoring and SMS messaging, but we were pretty happy with the results. As usual, all of our code is available on GitHub: How Is Babby.
Continue reading
Open States API: 1 Year Later
Last September we announced the first public release of the Open States API. The API enables programmatic access to all of the key artifacts of the state legislative process. The API currently provides a standard interface to bills, votes, legislators, committees, and events across 36 states, Washington DC, and Puerto Rico.
Seeing as it has been a year since this first public release, it seems like a good time to check in on where we are today and where we're going next.
Continue reading
Open States Reaches Halfway Mark
Today marks an important milestone for the Open State Project: the addition of New York to our list of experimental states brings our total number of supported states to 25 (plus Washington DC). This marks the halfway point on our journey to bring clean, consistent, machine readable legislative information to all 50 states.
This means that residents of 25 states (accounting for approximately two-thirds of US citizens) can access their state's legislative data in a variety of machine readable formats (JSON, XML, CSV) and will soon be benefiting from sites like OpenGovernment.org and MyGov365 that use our bulk downloads and free API to keep citizens informed about their state legislature.
Continue reading
Sunlight Labs & Google Summer of Code 2011
We're proud to announce we've been accepted as a mentoring organization for the Google Summer of Code 2011.
If you aren't familiar with Google Summer of Code, it is a great opportunity for college students and open source organizations to work together. Google pays students a $5000 stipend in exchange for their work on an eligible project. For more details about the program in general visit the GSoC 2011 website.
This is our third year participating and we're looking forward to another great summer and a new batch of students and projects.
Continue reading
Sunlight @ PyCon 2011
Two years ago we held an Open Government Sprint at PyCon 2009. We had never hosted an event like that before, and had no idea what to expect. To our amazement we ended up with one of the largest groups of any of the sprint projects, completely filling our room for the first few days. Approximately 30 people attended and kicked off what has now become the Open State Project.
Next week, we'll be heading to PyCon and hosting an Open Government Hackathon for the third year in a row. The primary focus will again be the Open State Project, but our space is open to everyone interested in government data. If you have a project you'd like to hack on, let us know and I'll be sure to mention your project when I plug the sprint. If you aren't attending PyCon but happen to be near Atlanta, you're welcome to join too: the Hackathon is free and open to the public (March 14th-16th @ the Hyatt Regency in downtown Atlanta).
Additionally, I'm going to be presenting a poster on the technical aspects of the Open State Project on Sunday. I'll be around to talk about the project itself but also web scraping and opening government data in general, so if you're at PyCon stop by during the poster session Sunday morning and say hi.
Continue reading
New Hampshire Opens Its Legislative Data
As recently covered on TechPresident, the New Hampshire General Court (their state legislature) has made an extremely welcome addition to their website in the form of a downloads section.
New Hampshire isn't the first state to offer such a thing: New Jersey has a similar section on their website, and quite a few states like New York and Kansas are introducing APIs to their new legislature websites. What is interesting, however, is the fact that the justification for offering the data presented by freshman representatives George Lambert and Seth Cohn is centered around reducing cost and strain on the legislature's website caused by web scrapers.
The load placed on sites by scraping them is something that we know a little bit about. Our Open State Project is currently crawling 18 state legislatures once a day, hitting over 100,000 pages nightly. Bulk downloads like New Hampshire's make it possible for us to take in all changes by simply downloading a few files every night instead of hitting thousands of pages--most of which haven't changed. Even though we take precautions like rate limiting our scrapers and having them back off if the site seems to be failing, we still see the occasional failure during our scraping run, which unfortunately only causes us to have to run the scraper again.
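The precautions mentioned above, rate limiting and backing off when a site seems to be failing, can be sketched in a few lines. This is an illustration of the general technique rather than the Open State Project's actual scraper code: the function name, its parameters, and the choice of doubling the wait after each failure are all assumptions. The fetch and sleep callables are passed in so the sketch runs without touching the network.

```python
import time


def polite_fetch(urls, fetch, delay=1.0, max_retries=3, sleep=time.sleep):
    """Fetch each URL in turn, pausing between requests and backing off on errors.

    `fetch` is any callable that takes a URL and returns the page body,
    raising OSError on failure. Hypothetical sketch: the names and
    default values are illustrative assumptions.
    """
    pages = {}
    for url in urls:
        wait = delay
        for attempt in range(max_retries + 1):
            try:
                pages[url] = fetch(url)
                break
            except OSError:
                if attempt == max_retries:
                    raise  # give up; the whole run has to be retried later
                wait *= 2  # the site seems to be struggling, so wait longer
                sleep(wait)
        sleep(delay)  # rate limit: never hit the site faster than this
    return pages
```

Even with this kind of care, a scraper still has to request every page to find out whether it changed, which is exactly the load that a bulk download avoids.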
New Hampshire and its citizens will see other benefits of the bulk data beyond a less-burdened website. Consumers of the data will now be able to take the data in much faster than they previously could. There's also a much smaller potential for errors when you are importing data from a machine readable source like a CSV or database file. This means that tools built on top of scraped data (like the recently launched OpenGovernment beta) will be able to have more accurate and up to date data.
Those responsible for making this change happen in New Hampshire should be proud of what they've enacted. A preliminary glance at the actual New Hampshire data makes it look promising. As the data is quite new, they are unfortunately not yet including roll call votes or links to the full text of bills, but we'll reach out to them to see if these oversights can be fixed in the near future. Hopefully New Hampshire is just one of many states that will start seeing the benefits of providing bulk access. To help show what is possible, we'll be adding New Hampshire support to the Open State Project as soon as possible.
Continue reading
Sunlight Labs Virtual Office Hours: Jan 28 @ 3pm
All of us here are always interested in ways that we can better interact with all of you. Whether it is contributing to our projects, posting on the Sunlight Labs Google Group, or just contacting us directly, we're always impressed with the dedication and creativity of our community.
In addition to these better known channels we also have an IRC channel, but it is usually pretty quiet, so we'd like to try something new as a way to give more of you direct access to our team.
Friday, January 28th at 3pm Eastern we're going to be hosting office hours in #sunlightlabs on irc.freenode.net. This will be your chance to join us to talk about what we're working on, show off what you're working on, ask us questions, or just hang out and chat.
Continue reading
We Have a Winner: Contributor Raffle Update
Last October we announced that to celebrate our 100th project on GitHub we'd be giving away a prize to one lucky contributor.
Earlier this week we drew a name from all of those who were eligible and had entered, and Brandon Lewis was the winner.
Continue reading