Stream Congress: An HTML5 App in the Google Chrome Web Store

Earlier today, Google announced their Chrome Web Store. For its launch, Sunlight is thrilled to announce a new HTML5 app called Stream Congress. Stream Congress gives you a quick look into what exactly your members of Congress are up to. Resembling a lifestream (but for Congress), the app takes in data points from various sources and combines them into a clean, real-time interface. Consider the app to be in a Developer Preview for now: we’re going to launch it in earnest when the new Congress begins in January. Your feedback is appreciated.

Today, I wanted to share with the Sunlight developer community the process behind building this HTML5 app. It really does feel like we’re entering a new era of the Web, and it’s important for the civic hacking community to lead the way.

Beyond the Buzz

Why not just use bookmarks? That’s a common question from people first presented with the notion of web apps, particularly given how they’re presented in the Chrome Web Store. And it’s true: technically, web apps are just web sites. In fact, that’s long been considered a strength of the Web. Web sites don’t need to be downloaded or installed; they’re just accessed through an address. But with iPhone and Android, the idea behind apps has grown in popularity. As we’ve seen in the debate over native versus web development for mobile devices, native apps are often preferred to websites.

But for the past few years, we’ve seen an incredible amount of innovation around web browsers — efforts to make web apps feel more like native apps. Google, Apple, Mozilla, and Microsoft have been trying to outdo each other: to make the Web faster, easier to use, and more secure. That innovation has been aided by a large number of new standards and technologies that have popped up, most of which are described on Google’s excellent site HTML5Rocks.

The moniker “HTML5” is often disparaged because it’s grown to encompass more than just the actual HTML5 spec as drafted by the WHATWG. But I think using that term to describe this menagerie of exotic, new web technologies actually makes sense. These technologies and standards are meant to be used together and complement each other. They’re of the same generation. So don’t fret. It’s fine to lump these technologies in together with HTML5: CSS3, Web Workers, WebSockets, Geolocation, Local Storage, IndexedDB, AppCache, Notifications, File API, Canvas, and WebGL.

HTML5 in Practice

Reading about the new whiz-bang HTML5 technologies is one thing, but it’s another to actually implement them in an app. There are three important lessons I got out of this experience that I want to share with you. First, WebSockets are incredibly useful and quite refreshing, but difficult to implement on the server side. Second, it’s important to know how to fall back when HTML5 technologies aren’t present. Third, the amount of JavaScript required to build HTML5 apps begs for some order and sanity.

WebSockets

The most interesting part of Stream Congress is how the streaming is done. Entries stream in live, as they’re fetched from various sources around the Internet. Before HTML5, we’d use AJAX polling to do this, or a long-polling technique like Comet. But both techniques can be expensive: AJAX polling on the client side, and Comet on the server side. Enter WebSockets, which give the browser something close to a simple TCP socket (with a small handshake implementation on the server side).

The most distinguishing feature of WebSockets is that, once the opening handshake is complete, they don’t speak HTTP. This can make one feel uneasy at first: HTTP is so deeply intertwined with the concept of the Web. But WebSockets give web browsers an important capability: an always-on, persistent socket to a server. The JavaScript code for WebSockets is simple enough. In fact, most of the JavaScript APIs for the HTML5 suite of technologies are straightforward and easy to work with. Here’s how Stream Congress listens for new entries as they happen (edited for clarity):

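// Open a persistent socket and append each pushed entry to the stream.
var liveStream = function() {
  var liveSocket = new WebSocket("ws://streamcongress.com:8080/live");
  // payload is a message event; its data property holds the message text.
  liveSocket.onmessage = function(payload) {
    addToStream(payload.data);
  };
};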

What’s more difficult to implement is the WebSocket server. Server-side libraries for WebSockets aren’t yet mature, especially when compared to web app frameworks like Rails and Django. Being a Rubyist, I looked at a few options:

  • em-websocket – A low-level library built on top of EventMachine, an event-based networking library.
  • socky – A push server built on top of em-websocket. I looked at this first, but realized that it supported only server-to-client communication (push); client-to-server communication still relied on AJAX. It does, however, integrate well into a Rails app.
  • cramp – The library I eventually settled on, which is a sort of mini-framework for WebSockets.

The entire WebSocket server fits in one Ruby file, not even breaking 80 lines of code. Here’s the WebSocket class for the live stream (abbreviated for clarity):

    # 
    class LiveSocket < Cramp::Websocket

      @@users = Set.new

      on_start :user_connected
      on_finish :user_left
      periodic_timer :check_activities, :every => 5

      def user_connected
        @@users << self
      end

      def user_left
        @@users.delete self
      end

      def check_activities
        @latest_activity = Activity.new if @latest_activity.nil?
        new_activities = Activity.where(:_id.gt => @latest_activity._id).
                                  desc(:_id)
        @latest_activity = new_activities.first unless new_activities.empty?
        render new_activities.to_json
      end
    end

I’ll admit, I’m not 100% comfortable with this code. For example, a LiveSocket object is instantiated for every client that connects. To make things worse for myself, I keep track of the @latest_activity for every single client instance, meaning I hit the database for each client every 5 seconds. That’s not a big deal with a database that caches responses, but it’s not exactly ideal. I settled on this solution because it was the one that consistently worked. Creating a class variable @@latest_activity led to weird contention issues that I was never able to resolve.
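
For comparison, here is a rough sketch of the alternative design I was reaching for: one shared poller that queries once per tick and fans the result out to every connected client. It is written in JavaScript rather than Ruby to match the client code above, and it is not what Stream Congress runs. The createBroadcaster function, the fetchSince callback, and the send method on each client are all hypothetical names chosen for illustration.

```javascript
// One shared poller: query once per tick, then fan the result out to every
// connected client. fetchSince and client.send are hypothetical stand-ins
// for the database query and the socket write.
function createBroadcaster(fetchSince) {
  var clients = [];
  var latestId = null; // shared by all clients, not tracked per connection

  return {
    connect: function (client) {
      clients.push(client);
    },
    disconnect: function (client) {
      clients = clients.filter(function (c) { return c !== client; });
    },
    // Call this from a single timer instead of once per client.
    // Returns the number of new entries that were broadcast.
    tick: function () {
      var fresh = fetchSince(latestId); // assumed to be ordered newest first
      if (fresh.length === 0) return 0;
      latestId = fresh[0].id;
      var message = JSON.stringify(fresh);
      clients.forEach(function (client) { client.send(message); });
      return fresh.length;
    }
  };
}
```

With this shape the database sees one query per interval no matter how many clients are connected, at the cost of keeping shared state, which is exactly where my Ruby attempt ran into trouble.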

Because a WebSocket simply carries whatever text you send over it, your messages can be passed around using any format. I chose to simply pass around JSON objects as strings, as described on the Bamboo Blog.
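
As a rough illustration of passing JSON objects as strings, the sketch below wraps the encoding and decoding in two small helpers. The encodeMessage and decodeMessage names and the type/body envelope are assumptions made for this example, not the message format Stream Congress actually uses.

```javascript
// Hypothetical helper: serialize a message as a JSON string for the socket.
function encodeMessage(type, body) {
  return JSON.stringify({ type: type, body: body });
}

// Hypothetical helper: parse an incoming string. Returns null when the
// payload is not a JSON object, so a single malformed message does not
// throw inside the onmessage handler.
function decodeMessage(raw) {
  var message;
  try {
    message = JSON.parse(raw);
  } catch (err) {
    return null;
  }
  if (message === null || typeof message !== "object") return null;
  return message;
}
```

In an onmessage handler, one would call decodeMessage on the event’s data property and ignore anything that comes back as null.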

Deploying the WebSocket server led to more issues, as cramp’s native thin binding capped persistent connections at 256. I worked around that by deploying thin directly with these options:

$ thin start --max-persistent-conns 10000 -e production -p 8080 -R cramp/socket_server.ru

At the RubyConf hackathon last month, I met someone from Pusher. Pusher provides WebSocket push (server-to-client) as a service. With Stream Congress, client-to-server communication is important, so I stayed with Cramp. But if you’re looking to offload the work of hosting a WebSocket server yourself, have a look at Pusher.

Another hurdle concerned testing the WebSocket server. After quite a bit of searching, I discovered that there’s exactly one client library out there that correctly implements the latest WebSocket spec: node-websocket-client. It is the only option that I know of if you want to do any sort of programmatic testing of your WebSocket server. I used it for load testing.

Falling Back

If you’ve visited Stream Congress in a browser other than Chrome, you know that it doesn’t even try to handle other web browsers. That’s not exactly a best practice, and it’s a shortcoming that needs to be rectified.

The good news is that there are plenty of open-source projects that will help you degrade gracefully if the latest HTML5 technologies aren’t present on the client-side:

  • Socket.io for sockets, falling back using multiple techniques. Even has a Node.js server-side component.
  • web-socket-js for Flash fallback on WebSockets.
  • store.js for localStorage, which gives you a key-value store in the browser.
  • history.js for the HTML5 history API, essential for clean single-page apps.
  • WorkerFacade when Web Workers aren’t available.
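
Before reaching for any of these libraries, the app has to know what the browser supports. Here is a minimal feature-detection sketch. The detectFeatures and chooseTransport functions are hypothetical names, and the global object is passed in as a parameter (you would pass window in a browser) so the check can also be exercised outside one.

```javascript
// Report which HTML5 features a given global object exposes.
// env is passed in rather than read from window directly; the list of
// features checked here is illustrative, not exhaustive.
function detectFeatures(env) {
  return {
    webSocket: "WebSocket" in env,
    localStorage: "localStorage" in env && env.localStorage !== null,
    webWorker: "Worker" in env,
    history: !!(env.history && typeof env.history.pushState === "function")
  };
}

// Pick a transport name based on what is available. "polling" stands in for
// whatever fallback a library such as Socket.io would select.
function chooseTransport(features) {
  return features.webSocket ? "websocket" : "polling";
}
```

A call such as chooseTransport(detectFeatures(window)) would then decide whether to open a WebSocket or hand off to a fallback library.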

Structured JavaScript

Lastly, a word about JavaScript. You’ll be writing a bit more than you’re used to. While building this app, the vast majority of my time was spent in JavaScript. Looking at the app’s application.js, it’s clear that it’s devolving into pyramid code. Callbacks are nested in callbacks a bit too much, creating a potentially disastrous pyramid of callbacks.
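
One common way to flatten such a pyramid is to give each step a name and chain the steps with a small helper. The sketch below is one possible approach, not code from application.js. The waterfall helper and the three steps (fetchRaw, parseEntries, pluckTitles) are hypothetical, and the steps call back synchronously only to keep the example self-contained.

```javascript
// Run callback-style steps one after another, feeding each result into the
// next step. Stops at the first error and reports it to done.
function waterfall(steps, done) {
  var index = 0;
  function next(err, value) {
    if (err) return done(err);
    if (index === steps.length) return done(null, value);
    var step = steps[index++];
    step(value, next);
  }
  next(null, undefined);
}

// Hypothetical steps standing in for fetching, parsing, and rendering.
function fetchRaw(_, callback) {
  callback(null, '[{"title":"Vote on H.R. 1"}]');
}

function parseEntries(raw, callback) {
  var entries;
  try {
    entries = JSON.parse(raw);
  } catch (err) {
    return callback(err);
  }
  callback(null, entries);
}

function pluckTitles(entries, callback) {
  callback(null, entries.map(function (entry) { return entry.title; }));
}

// The flat version: one list of named steps instead of three nested callbacks.
waterfall([fetchRaw, parseEntries, pluckTitles], function (err, titles) {
  if (err) { console.error(err); return; }
  console.log(titles);
});
```

Each step stays readable and testable on its own, and error handling lives in one place rather than at every level of nesting.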

The most notable attempt at solving these issues is Backbone.js, by two-time Apps for America winner Jeremy Ashkenas. If Backbone had been around when I started the project, I would definitely have gone with it.