The Data and Tech Behind Sitegeist

It’s been three weeks since we launched Sitegeist and the response has greatly exceeded (my) expectations! There have been over 27,000 downloads of the iOS and Android apps and a flood of feedback emails. Thanks again to the Knight Foundation and IDEO for their help in creating the app!

There were a bunch of questions about the data and technology used to produce Sitegeist, so let’s dive into how the project works.

Data

Our goal with Sitegeist was to show how government data could be made useful to the average person. We incorporated demographic information, campaign contributions, weather and other data that comes directly from the government or through a secondary source. We also use some privately owned data sets to supplement this information.

US Census Bureau

The majority of the data used in Sitegeist comes from the US Census Bureau and their wonderful API. To help with development, we’ve created census, a Python wrapper for the Census API. For example, here is a call to get the name and the number of children under 5 years of age for every state:

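from census import Census
c = Census("MY_API_KEY")
# Ask the ACS endpoint for the NAME and B01001_004E variables for every state.
c.acs.state(("NAME", "B01001_004E"), Census.ALL)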

I’ve never worked with the Census bulk downloads, but I hear from colleagues that working with them can be a daunting task. The API makes it incredibly easy to slice and dice the numbers as needed, combining data across “tables” for any geography you are working with.

Influence Explorer

Campaign contribution data by ZIP code is provided by our own Influence Explorer project. Contributions within a ZIP code are totaled based on the party affiliation of the recipient. Pretty straightforward, not much to say about this.
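To make the totaling concrete, here is a minimal sketch of that kind of aggregation. It is not the Influence Explorer code; the record layout (ZIP code, recipient party, amount), the sample values and the function name totals_by_party are all assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical records: (zip_code, recipient_party, amount_in_dollars).
SAMPLE_CONTRIBUTIONS = [
    ("20036", "D", 250.0),
    ("20036", "R", 100.0),
    ("20036", "D", 50.0),
    ("94103", "R", 75.0),
]


def totals_by_party(contributions):
    """Sum contribution amounts per ZIP code, grouped by recipient party."""
    totals = defaultdict(lambda: defaultdict(float))
    for zip_code, party, amount in contributions:
        totals[zip_code][party] += amount
    # Convert to plain dicts so the result is easy to inspect or serialize.
    return {zip_code: dict(parties) for zip_code, parties in totals.items()}
```

Looking up a single ZIP code in the result gives the per-party totals that a pane could display.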

EPA Easter Egg

Have you ever looked at the Environment pane and seen the oozy alert? No? Good for you, you haven’t been around any contaminated sites! We wanted Sitegeist to include a number of “easter eggs” that show up or change only in certain contexts. It’s an easy way to make apps like this fun and more interesting. Due to time constraints, we were only able to get a few things like this added, but there are some ideas for more that we might work on.

We loaded a bunch of locations that the EPA considers to be contaminated: Superfund sites and more. If a site exists within five miles of your current location, we show the scary ooze creeping down the Environment pane. Clicking on this ooze will show you a page from the EPA site with more information on the contaminated location.
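The five-mile check can be illustrated with a great-circle distance filter. This is only a sketch under my own assumptions: the site dictionaries with "lat" and "lon" keys and the function names are hypothetical, and the real lookup may well be done in the database instead.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def distance_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points using the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def nearby_sites(lat, lon, sites, radius_miles=5.0):
    """Return the sites (dicts with 'lat' and 'lon') within radius_miles."""
    return [
        site for site in sites
        if distance_miles(lat, lon, site["lat"], site["lon"]) <= radius_miles
    ]
```

If nearby_sites returns anything at all, the pane would show the ooze.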

Third-party Services

All of the previously mentioned data sets have been loaded and cached, and sit in a database on our server. The other third-party services, whether due to terms of service restrictions or practicality, are loaded on demand via their respective APIs. These services include Yelp and Foursquare for local business information and Dark Sky and Weather Underground for weather data.

A Note On Locations

The data sets that power Sitegeist use a number of different geographies: census tracts, ZIP codes, the top-n closest items to a location and things within a certain radius of a location. As far as the app is concerned, you exist at a point in a 2D plane. It would be inaccurate to say you are in San Francisco if the data you are seeing is tied only to a few small blocks within the city. Likewise, the names of some geographies make no sense at all; do you really care that you are in Census Tract 107?

The goal of the design was to present a clean and readable infographic-style display of the information. Geographic metadata was left out because it only contributed to clutter that most people don’t care about. I emphasize most people; the people who do care have definitely let us know!

It’s definitely an interesting problem. If you have any insights, let me know in the comments.

The Web Servers

All of this wonderful data resides on our servers. When you select a location, the latitude and longitude are passed along with the ID of the pane you want to view. For each of the geographies we keep track of (census tracts, ZIP codes, etc.), we find the boundaries of any shape that contains your location. This lookup uses a customized version of the Chicago Tribune’s boundaryservice. We then match those geographic boundaries with any data we have, making calls to third-party APIs as needed. The collected data is rendered into templates and returned to you as the beautiful infographics you see in the app.
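For readers curious what "finding the shapes that contain your location" involves, here is a small pure-Python sketch using the ray casting test. It is a stand-in for illustration only, not how boundaryservice does it; the function names and the dictionary of named polygons are my own assumptions, and it handles only simple polygons without holes.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray casting test; polygon is a list of (lon, lat) vertices."""
    inside = False
    count = len(polygon)
    for i in range(count):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % count]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            # Longitude where this edge crosses the point's latitude.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def containing_boundaries(lon, lat, boundaries):
    """Return the names of all boundaries whose polygon contains the point."""
    return [
        name for name, polygon in boundaries.items()
        if point_in_polygon(lon, lat, polygon)
    ]
```

The names returned by containing_boundaries would then be the keys used to pull matching data for the pane.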

In order to reduce load on our servers, the rendered panes are cached for a short period of time. So if you make a request again for the same pane in the same location, we’ll just return the pane instead of making the API and database calls again. The cache timeout is very short, 10 minutes, so you’ll always get relatively fresh data.
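A time-limited cache like this can be sketched in a few lines. The class below is a hypothetical in-memory illustration, not the caching layer the servers actually use; the name PaneCache, its methods and the injectable clock are assumptions. Only the 10-minute timeout comes from the description above.

```python
import time


class PaneCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds=600, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested
        self._entries = {}  # key -> (expires_at, rendered_html)

    def get(self, key):
        """Return the cached pane, or None if it is missing or expired."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, html = entry
        if self._clock() >= expires_at:
            del self._entries[key]
            return None
        return html

    def set(self, key, html):
        """Store a rendered pane and stamp it with its expiry time."""
        self._entries[key] = (self._clock() + self._ttl, html)
```

A request handler would call get first and only fall back to the database and third-party API calls when it returns None, storing the freshly rendered pane with set.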

Mobile GPS devices are not incredibly accurate and your location can move around by many meters even if you are standing still. I really don’t want to waste the time doing database calls just because your phone corrected itself and moved you 5 meters to the left. The cache takes your location into account as well and will return cached data if you are within a certain “snap radius” of a previously rendered and cached request.
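One simple way to approximate a snap radius is to round each location onto a coarse grid and use the grid cell as part of the cache key. To be clear, this grid snapping is my own stand-in for a true radius check, and the 50-meter cell size, the constant and the function name snapped_key are assumptions; two points close together but on opposite sides of a cell edge will still get different keys.

```python
import math

METERS_PER_DEGREE_LAT = 111_320.0  # rough average, good enough for snapping


def snapped_key(pane_id, lat, lon, snap_meters=50.0):
    """Build a cache key that is shared by locations in the same grid cell."""
    lat_step = snap_meters / METERS_PER_DEGREE_LAT
    lat_cell = math.floor(lat / lat_step)
    # A degree of longitude shrinks toward the poles, so scale the step by
    # the cosine of the cell's latitude (clamped to avoid dividing by zero).
    cell_lat = lat_cell * lat_step
    shrink = max(math.cos(math.radians(cell_lat)), 1e-6)
    lon_step = snap_meters / (METERS_PER_DEGREE_LAT * shrink)
    lon_cell = math.floor(lon / lon_step)
    return (pane_id, lat_cell, lon_cell)
```

With a key like this, a small GPS wobble usually maps to the same cache entry, so the pane is served without touching the database.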

Android and iOS

Finally, we have the mobile apps themselves. Though the infographic panes are HTML, the apps are native to their respective platforms. At the start of the project I looked at the possibility of using a pure HTML or cross-platform framework, but none of them allowed for the responsiveness and system integration we wanted. Going native allows us to take advantage of platform features like social sharing, maps, GPS and others while still using a single platform for generating the pane content. The other advantage of using web views for content is that we can make certain updates to the application, such as adding a new data view, without having to update both apps or wait for Apple’s App Store approval process.

Admin Dashboard

I also created an admin panel that allows us to load up panes for any combination of devices, locations and such. This has been incredibly useful when testing new features or troubleshooting data associated with specific locations.


The source for the Android app is available now and the web and iOS projects will be published within the next week.

If you haven’t yet, check out Sitegeist!