Tools for Transparency: Google Refine


Kicking off TFT for 2011 is guest blogger Rebekah Heacock, co-director of the Technology for Transparency Network and a Project Coordinator at Harvard’s Berkman Center for Internet and Society. Heacock holds a Master’s degree in International Affairs from the Columbia University School of International and Public Affairs.

For the past six months, I’ve served as the co-director of the Technology for Transparency Network, an organization that documents the use of online and mobile technology to promote transparency and accountability around the world. One of the most common challenges facing the project leaders we’ve interviewed is making sense of large amounts of data.

In countries where governments keep detailed digital records of lobbying data and education expenditures, data wrangling is a time-consuming, labor-intensive task. In countries where these records are poorly maintained, this task becomes even harder — everything from inconsistent data entry practices to simple typos can derail data analysis.
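To make the "simple typos" problem concrete, here is a small hypothetical illustration (the records and the `total_by_agency` helper are invented for this example). When the same ministry is entered three slightly different ways, a naive per-agency total treats each spelling as a separate agency, so spending is split and undercounted.

```python
from collections import defaultdict

# Hypothetical expenditure records: one ministry entered three different ways.
records = [
    ("Ministry of Education", 1200),
    ("ministry of education ", 800),   # different case, trailing space
    ("Ministy of Education", 500),     # typo
    ("Ministry of Health", 900),
]

def total_by_agency(rows):
    """Sum amounts per agency name exactly as it was entered."""
    totals = defaultdict(int)
    for agency, amount in rows:
        totals[agency] += amount
    return dict(totals)

totals = total_by_agency(records)
print(len(totals))  # 4 "agencies" where there should be 2
print(totals["Ministry of Education"])  # 1200, not the true 2500
```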

Google Refine (formerly Freebase Gridworks) is a free, open-source tool for cleaning up, combining, and connecting messy data sets. Rather than acting like a traditional spreadsheet program, Google Refine exists “for applying transformations over many existing cells in bulk, for the purpose of cleaning up the data, extending it with more data from other sources, and getting it to some form that other tools can consume.”
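The idea of "applying transformations over many existing cells in bulk" can be sketched in a few lines of Python. This is not Refine's code: `clean_cell` and `transform_column` are hypothetical helpers that roughly mimic what Refine's built-in common transforms (such as collapsing whitespace or converting to title case) do when applied to an entire column at once.

```python
def clean_cell(value: str) -> str:
    """Trim, collapse runs of whitespace, and normalize capitalization."""
    return " ".join(value.split()).title()

def transform_column(rows, column, fn):
    """Return new rows with fn applied to every cell in one column.

    The original rows are left untouched, so a bad transform is easy to undo.
    """
    return [{**row, column: fn(row[column])} for row in rows]

rows = [
    {"agency": "  ministry of EDUCATION", "amount": 1200},
    {"agency": "Ministry of  Education ", "amount": 800},
]
cleaned = transform_column(rows, "agency", clean_cell)
print([r["agency"] for r in cleaned])
# ['Ministry Of Education', 'Ministry Of Education']
```

Keeping the original rows intact is a deliberate choice here; Refine likewise keeps an undo history, so bulk edits can be rolled back.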

At its most basic level, Google Refine lets users quickly summarize, filter, and edit data sets, making it easy to see patterns and to spot and correct errors. More advanced features include reconciling data sets with the Freebase data repository (i.e., matching text in the set to existing Freebase IDs), geocoding, and fetching additional information from the Web based on existing data.
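As a rough sketch of how summarizing a column helps surface errors, the code below does two things on hypothetical data. First, `text_facet` counts each distinct value, similar in spirit to a Refine text facet. Second, `fingerprint` and `cluster` group values that probably mean the same thing, loosely modeled on the key-collision "fingerprint" method offered by Refine's clustering feature. The function names and the exact normalization steps are assumptions for illustration, not Refine's implementation.

```python
import re
import unicodedata
from collections import Counter, defaultdict

def text_facet(values):
    """Count how often each distinct value appears, most common first."""
    return Counter(values).most_common()

def fingerprint(value: str) -> str:
    """Key-collision fingerprint: drop accents and punctuation, lowercase,
    then sort the unique tokens so case, word order, and spacing don't matter."""
    ascii_value = (
        unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    )
    no_punct = re.sub(r"[^\w\s]", "", ascii_value.strip().lower())
    return " ".join(sorted(set(no_punct.split())))

def cluster(values):
    """Group distinct values whose fingerprints collide (likely duplicates)."""
    groups = defaultdict(set)
    for v in values:
        groups[fingerprint(v)].add(v)
    return [sorted(g) for g in groups.values() if len(g) > 1]

agencies = [
    "Ministry of Education",
    "Ministry of Education",
    "ministry of education",
    "Education, Ministry of",
    "Ministère de la Santé",
    "Ministere de la Sante",
    "Ministry of Health",
]
print(text_facet(agencies)[0])  # ('Ministry of Education', 2)
for group in cluster(agencies):
    print(group)
```

A human still decides which spelling to keep for each cluster; the fingerprint only proposes candidates, which is why this kind of tool speeds up cleanup rather than automating it entirely.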

Though it runs in a Web browser, Google Refine operates offline on the user’s own computer, making it attractive to those with limited bandwidth or privacy concerns, a group that includes many of the projects listed on the Technology for Transparency Network.

Google Refine isn’t going to solve the problem of poor data availability, but for those who manage to gain access to existing records, it can be a powerful tool for transparency.

For more information, check out the links and video below: