Sunlight Foundation

Tools for Transparency: A How-to Guide for Social Network Analysis with NodeXL

This post by guest blogger Justin Grimes is the second and final half of a special two-part edition of our Tools for Transparency series. Justin (@justgrimes) is a PhD candidate at the University of Maryland's College of Information Studies, a research assistant at the Information Policy and Access Center (iPAC), and a member of the Human Computer Interaction Lab (HCIL). His research focuses on information policy and information access. In general he geeks out hacking transportation data and loves talking about all things data.

Last week, Justin talked us through a Social Network Analysis (SNA) of people tweeting with the TransparencyCamp 2012 hashtag #tcamp12:

For more about this infographic and general Social Network Analysis, you can check out Justin's last post. If you're ready to try SNA for yourself, here's his guide for how to get started:

As I said earlier, you need two things to do social network analysis: software and a question. NodeXL will be our software. Our question for this example will be: what does the network of Twitter users at TransparencyCamp 2012 look like? To answer this question I’m going to analyze the Twitter activity of TransparencyCamp 2012 by capturing all tweets that contain the hashtag #tcamp12. I’ll give you a step-by-step walkthrough of how I answered this question.

Prerequisites:

  • Windows machine (or Linux w/ Wine)
  • Microsoft Excel 2007 or higher
  • NodeXL
  • Internet connection
I’ll assume that you have all of these installed and ready to go for this example.



1) To get started we need to load NodeXL...


Open the NodeXL Excel Template and click “NodeXL” on the toolbar.

2) Now we are going to get our data...


Click "Import" from the Ribbon.

Notice that there are a variety of different ways to load and import data into NodeXL. We are going to import data directly from Twitter for this example. Since we are gathering data from a search query, we are going to select “From Twitter Search Network.”

Click "From Twitter Search Network..."

Type query under "search for people whose tweets contain:"

In this example we are going to type in our query term "#tcamp12". Feel free to query any word or hashtag, but put some effort into formulating your query and make sure it's specific. Broad terms and homographs won't be useful: searching for "apple", for example, could return results about Apple the company, apple the fruit, and so on. Hashtags help.

Under “Add an edge for each”, make these selections:

  • Check “Follows relationship” (slower)
  • Check “Replies-to” relationship in tweet
  • Check “Mentions” relationship in tweet
  • Check “Tweet that is not a replies-to or mentions”
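To make the three tweet-based edge options concrete, here is a minimal Python sketch of how tweets could be turned into an edge list. It is not NodeXL's code. The `(author, text)` tweet format, the `tweets_to_edges` name, and the rule "a tweet that starts with @name is a reply to that user" are assumptions for illustration, and the "Follows" relationship is left out because it comes from extra API lookups rather than from tweet text.

```python
import re

# Assumed pattern for Twitter-style @usernames (up to 15 letters, digits, or underscores).
MENTION = re.compile(r"@(\w{1,15})")
# Assumption: a tweet is treated as a reply when its text begins with "@name".
REPLY = re.compile(r"\s*@\w")


def tweets_to_edges(tweets):
    """Turn (author, text) pairs into (source, target, relationship) edges.

    The first @name of a reply becomes a "Replies to" edge, every other @name a
    "Mentions" edge, and a tweet with no @name at all becomes a self-loop,
    mirroring the "not a replies-to or mentions" option.
    """
    edges = []
    for author, text in tweets:
        author = author.lower()
        names = [name.lower() for name in MENTION.findall(text)]
        is_reply = REPLY.match(text) is not None
        for i, name in enumerate(names):
            kind = "Replies to" if (i == 0 and is_reply) else "Mentions"
            edges.append((author, name, kind))
        if not names:
            edges.append((author, author, "Tweet"))  # self-loop edge
    return edges


# Made-up tweets, for illustration only.
sample = [
    ("justgrimes", "@sunfoundation great session at #tcamp12 with @tcampdc"),
    ("tcampdc", "Lunch is served! #tcamp12"),
]
for edge in tweets_to_edges(sample):
    print(edge)
# ('justgrimes', 'sunfoundation', 'Replies to')
# ('justgrimes', 'tcampdc', 'Mentions')
# ('tcampdc', 'tcampdc', 'Tweet')
```

Note how a tweet that mentions nobody still produces an edge: that is where self-loops in the finished network come from.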

Other selections:

  • Uncheck “Limit to __ people”
  • Check “Add a Tweet column to the Edges worksheet”
  • Check “Add statistics columns to the Vertices worksheet” (slower)

Under “Your Twitter Account”, select one option:

The best way to collect data is with a Twitter account that has authorized NodeXL to collect data on your behalf. If this is your first time running NodeXL, select “I have a Twitter account, but I have not yet...”. This opens a browser window that asks you to authorize NodeXL by logging into Twitter. Type your username and password and authorize the app. You will be given a PIN, which you then type back into the NodeXL application. You only have to do this once: NodeXL will remember it in the future. If you have run NodeXL before, select “I have a Twitter account and I have authorized...” instead. If you don’t have a Twitter account, select “I don’t have a Twitter...”

IMPORTANT: The selections on this screen affect what data is collected from Twitter, so be careful with them. Depending on the size of the network, collection can take a long time, or you might get rate limited by Twitter*. To avoid this, try limiting the number of people and/or unchecking “Follows relationship” and “Add statistics columns to the Vertices worksheet”, but know that you will get less data for your efforts.

*What is a rate limit, you ask? It's a restriction placed on users of a public API (application programming interface) that limits their requests in some way. In this case, Twitter restricts the number of queries a user can make in the span of an hour. If you reach the rate limit, you must wait a period of time before making any more requests. Think of it as being placed in a penalty box: just like in the penalty box, you'll have to sit there and stew until your time is up.
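The penalty-box idea can be sketched as a fixed-window counter. This is a minimal illustration, assuming a simple "N requests per window" policy; the `FixedWindowRateLimiter` class, the limit of 2 used in the demo, and the injectable clock are made up for the example and are not Twitter's actual rules or NodeXL's code.

```python
import time


class FixedWindowRateLimiter:
    """Allow at most `limit` requests per `window` seconds (a fixed window).

    The limits used below are illustrative assumptions, not Twitter's policy.
    """

    def __init__(self, limit, window=3600.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.window_start = clock()
        self.count = 0

    def try_acquire(self):
        """Return 0.0 if a request may go out now, else the seconds left in the penalty box."""
        now = self.clock()
        if now - self.window_start >= self.window:
            # A new window has begun, so the request counter resets.
            self.window_start = now
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return 0.0
        return self.window - (now - self.window_start)


# Demo with a fake clock so no real waiting is needed.
fake_now = [0.0]
limiter = FixedWindowRateLimiter(limit=2, window=3600, clock=lambda: fake_now[0])
print(limiter.try_acquire())  # 0.0
print(limiter.try_acquire())  # 0.0
print(limiter.try_acquire())  # 3600.0 (rate limited: wait this many seconds)
fake_now[0] = 3600.0
print(limiter.try_acquire())  # 0.0 (a new window has started)
```

Real services may use sliding windows or token buckets instead; the fixed window is simply the easiest version to reason about.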

Once everything has been selected, click “OK”. If the import times out or you hit a rate limit and can’t wait, go back and select the defaults.

3) Wait while all the data is being collected...


Remember if this takes too long, or you get rate limited and don’t want to wait, you can limit your data.

Go back to the import screen and select:

Check “Limit to __ people” and select “100”.

4) Ta-da!


Now that the data has been gathered, we can begin to explore our network. Notice the two panes. One shows several worksheets of data: edges, vertices (nodes), groups, group vertices, and overall metrics. The other pane shows a graphical representation of our network.

Save the file.

Before we start, we should save our work. Pick a filename and a location. I named my files after the type of data, the query, and the date. For example: nodexl_twitter_tcamp12_051012.xlsx.
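If you save many networks, a small helper keeps the names consistent. This is a sketch, assuming the date part of the example name is MMDDYY (so 051012 is May 10, 2012) and that `#` should be stripped from the query; `nodexl_filename` is a hypothetical helper, not part of NodeXL.

```python
from datetime import date


def nodexl_filename(source, query, when):
    """Build names like nodexl_twitter_tcamp12_051012.xlsx (hypothetical helper).

    Assumptions: the date is written as MMDDYY, and '#' and spaces are removed
    from the query so the name stays filesystem-friendly.
    """
    clean_query = query.replace("#", "").replace(" ", "_").lower()
    return f"nodexl_{source}_{clean_query}_{when:%m%d%y}.xlsx"


print(nodexl_filename("twitter", "#tcamp12", date(2012, 5, 10)))
# nodexl_twitter_tcamp12_051012.xlsx
```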

NOTE: Your data (and graph) will probably not resemble the ones I showed earlier. This is OK. The reason is that too much time has passed for NodeXL to easily access this data from Twitter. If anybody wants to play with the original data file I scraped, I've made my data available for download here.

5) Let’s start analyzing our data...


To help simplify things we are going to automate some of the analysis process.

Click “Refresh Graph”

A graph is generated. Sadly, this doesn’t tell us much: the data is still messy and requires a little more work.

Go to the ribbon menu and...

Select Type: Directed (default)

There are basically two graph types: directed and undirected. Undirected graphs have edges with no orientation (i.e., no direction). In directed graphs, direction has meaning. For example, if A is connected to B in a directed graph, A is connected to B in some fashion, but the relationship is not necessarily reciprocated. In an undirected graph, if A is connected to B, then B is also connected to A, because the relationship is mutual and reciprocal. Think of this as "Twitter vs. Facebook". Facebook relationships are symmetrical: if you friend someone, you are both friends with each other. Twitter relationships are asymmetrical: if you follow someone, that doesn’t mean they automatically follow you.
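The Twitter vs. Facebook distinction comes down to how edges are stored. In this minimal sketch (the `build_adjacency` name and the alice/bob example are made up), each vertex maps to the set of vertices it points to; an undirected graph simply stores every edge in both directions.

```python
def build_adjacency(edges, directed=True):
    """Map each vertex to the set of vertices it points to.

    In an undirected graph every edge is stored in both directions,
    so A -> B automatically implies B -> A.
    """
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set())  # list b as a vertex even if it has no out-edges
        if not directed:
            adj[b].add(a)
    return adj


follows = [("alice", "bob")]  # alice follows (or friends) bob
twitter = build_adjacency(follows, directed=True)
facebook = build_adjacency(follows, directed=False)
print("alice" in twitter["bob"])   # False: bob does not automatically follow back
print("alice" in facebook["bob"])  # True: the friendship is mutual
```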

Select Layout: Fruchterman-Reingold (default)

There are lots of different methods for laying out a graph. Two popular methods provided by NodeXL are Fruchterman-Reingold and Harel-Koren Fast Multiscale, which use their respective algorithms to optimize the layout of the graph. If you are curious, you can easily explore the other layout methods.
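To give a feel for what a force-directed layout does, here is a simplified sketch in the spirit of Fruchterman-Reingold: every pair of vertices repels with force k²/d, the two ends of each edge attract with force d²/k, and the maximum step size ("temperature") cools linearly. It is not NodeXL's implementation; the square frame, iteration count, random seed, and cooling schedule are arbitrary assumptions.

```python
import math
import random


def fruchterman_reingold(vertices, edges, size=1.0, iterations=50, seed=0):
    """Return {vertex: (x, y)} inside a size-by-size square (simplified sketch)."""
    vertices = list(vertices)
    rng = random.Random(seed)  # fixed seed so the layout is reproducible
    k = size / math.sqrt(max(len(vertices), 1))  # ideal distance between vertices
    pos = {v: [rng.uniform(0, size), rng.uniform(0, size)] for v in vertices}
    start_temp = size / 10

    for step in range(iterations):
        disp = {v: [0.0, 0.0] for v in vertices}
        # Repulsive force k^2/d between every pair of vertices.
        for i, u in enumerate(vertices):
            for v in vertices[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                d = math.hypot(dx, dy) or 1e-9
                f = k * k / d
                disp[u][0] += dx / d * f
                disp[u][1] += dy / d * f
                disp[v][0] -= dx / d * f
                disp[v][1] -= dy / d * f
        # Attractive force d^2/k along each edge (self-loops are skipped).
        for u, v in edges:
            if u == v:
                continue
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = math.hypot(dx, dy) or 1e-9
            f = d * d / k
            disp[u][0] -= dx / d * f
            disp[u][1] -= dy / d * f
            disp[v][0] += dx / d * f
            disp[v][1] += dy / d * f
        # Move each vertex by at most the current temperature, then clamp it to the frame.
        temp = start_temp * (1 - step / iterations)
        for v in vertices:
            dx, dy = disp[v]
            d = math.hypot(dx, dy)
            if d > 0:
                move = min(d, temp)
                pos[v][0] += dx / d * move
                pos[v][1] += dy / d * move
            pos[v][0] = min(size, max(0.0, pos[v][0]))
            pos[v][1] = min(size, max(0.0, pos[v][1]))

    return {v: (x, y) for v, (x, y) in pos.items()}


layout = fruchterman_reingold(["a", "b", "c", "d"], [("a", "b"), ("b", "c"), ("c", "d")])
for v, (x, y) in layout.items():
    print(f"{v}: ({x:.2f}, {y:.2f})")
```

The pairwise repulsion loop is quadratic in the number of vertices, which is one reason multiscale methods such as Harel-Koren exist for larger graphs.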

Click “Automate”

Select all except for “Save image to file”

This automated process will do several things: merge duplicate edges which are unnecessary noise; automagically attempt to group nodes by a cluster algorithm; generate useful metrics about the network; create subgraphs for each node; and generate a graph of the network.
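Merging duplicate edges amounts to counting repeated pairs. This is a minimal sketch of the idea, assuming each merged edge keeps its count as a weight; `merge_duplicate_edges` is a hypothetical name, not a NodeXL function.

```python
from collections import Counter


def merge_duplicate_edges(edges, directed=True):
    """Collapse repeated edges into one edge carrying a weight (its count).

    In a directed graph (a, b) and (b, a) are different edges; in an
    undirected graph they are the same edge, so each pair is sorted first.
    """
    counts = Counter((a, b) if directed else tuple(sorted((a, b))) for a, b in edges)
    return dict(counts)


raw = [("justgrimes", "tcampdc"), ("justgrimes", "tcampdc"), ("tcampdc", "justgrimes")]
print(merge_duplicate_edges(raw))
# {('justgrimes', 'tcampdc'): 2, ('tcampdc', 'justgrimes'): 1}
print(merge_duplicate_edges(raw, directed=False))
# {('justgrimes', 'tcampdc'): 3}
```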

6) Rawr! Behold your mighty SNA wizardry!



Notice the graph generated in the right pane and notice the “vertices” tab (if the “vertices” tab is not selected go ahead and select it).

Let’s start exploring the results.

In the “vertices” tab you’ll notice several columns. Most of them are self-explanatory, so let’s look at the few you might not be familiar with: degree, in-degree, out-degree, betweenness centrality, closeness centrality, eigenvector centrality, and subgraph. These are all metrics that can be used to analyze a social network.

Degree centrality measures the number of edges attached to a node. If the graph is directed, degree is split into in-degree (edges pointing inward) and out-degree (edges pointing outward). Degree centrality can be considered a measure of popularity: the higher the degree, the more directly connected the person is.

Betweenness centrality is a measure of “a node’s centrality in the network equal to the number of shortest paths from all other vertices to all others that pass through that node”, or more simply, a measure of a node’s ability to bridge different subnetworks. If you remove nodes that have a high betweenness centrality, subnetworks become disconnected. The higher the betweenness centrality score, the better, which makes it a useful metric for identifying important nodes on the network.

Closeness centrality is a measure of the average shortest distance from a vertex to every other vertex, so direct connections and short paths matter. A lower closeness centrality score is better.

Eigenvector centrality takes into account the degrees of the nodes that a node is connected to. It is similar to degree, but it extends the calculation to how “connected” the nodes connected to you are. Think of it as a way of determining how popular a person’s friends are.

Subgraphs are like mini “ego” graphs created for each node on the network. Each subgraph shows all the nodes that node is connected to.
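To see what these columns compute, here is a small pure-Python sketch on a graph stored as {vertex: set of vertices it points to}. It is an illustration, not NodeXL's implementation: betweenness uses Brandes' algorithm (which splits credit fractionally when several shortest paths tie), closeness is computed as the average distance to reachable vertices to match the description above, and eigenvector centrality ignores edge direction. NodeXL's numbers may differ in normalization and in how direction and unreachable vertices are handled.

```python
from collections import deque


def in_out_degree(adj):
    """In-degree and out-degree of every vertex."""
    out_deg = {v: len(targets) for v, targets in adj.items()}
    in_deg = {v: 0 for v in adj}
    for targets in adj.values():
        for w in targets:
            in_deg[w] += 1
    return in_deg, out_deg


def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def closeness_as_average_distance(adj, v):
    """Average distance from v to the vertices it can reach (lower = more central)."""
    others = [d for u, d in bfs_distances(adj, v).items() if u != v]
    return sum(others) / len(others) if others else 0.0


def betweenness(adj):
    """Brandes' algorithm for unweighted directed graphs."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # accumulate dependencies, farthest vertices first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc


def eigenvector_centrality(adj, iterations=100):
    """Power iteration, treating edges as undirected (an assumption of this sketch)."""
    nbrs = {v: set() for v in adj}
    for v, targets in adj.items():
        for w in targets:
            if w != v:
                nbrs[v].add(w)
                nbrs[w].add(v)
    x = dict.fromkeys(adj, 1.0)
    for _ in range(iterations):
        # Own score plus neighbors' scores; keeping the own score aids convergence.
        new = {v: x[v] + sum(x[w] for w in nbrs[v]) for v in adj}
        top = max(new.values())
        x = {v: val / top for v, val in new.items()}
    return x


path = {"a": {"b"}, "b": {"c"}, "c": set()}  # a -> b -> c
print(in_out_degree(path))  # ({'a': 0, 'b': 1, 'c': 1}, {'a': 1, 'b': 1, 'c': 0})
print(betweenness(path))  # {'a': 0.0, 'b': 1.0, 'c': 0.0}
print(closeness_as_average_distance(path, "a"))  # 1.5
```

In the path a → b → c, only b sits on a shortest path between two other vertices, so it is the only vertex with nonzero betweenness.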

In the graph pane, you’ll notice that you can select individual nodes, move nodes, and zoom and scale the graph to see things better. When you select a vertex (node), you will see it selected in the “vertices” tab. Let’s take a moment to make it easier to identify vertices on the graph. Click the “Autofill Columns” button in the NodeXL ribbon. Next, click on the vertices tab. Under vertex label, select “vertex”. Then click the “Autofill” button and, finally, close. Notice that Twitter user names have been generated and associated with each node. Next, click on “graph options”. Here you can make changes to the graph to improve legibility: for edges you can change the color, size, opacity, and curvature, and for vertices you can change the size, opacity, effects, etc.

Feel free to take a moment to explore this data. Sort various columns to see who is at the top of each metric. Explore various nodes to see how they are connected. Look at the groupings. Does anything seem interesting? To help your exploration, use “Dynamic filters” to filter the results. Click the “dynamic filters” button in the graph pane. From here you can use the double-ended sliders to show only the nodes that meet some condition (e.g., time, metric, characteristic). Once you have filtered the results, you can use the “lay out again” feature to lay out only the vertices that match those conditions: click the drop-down arrow on “lay out again” and select “lay out visible vertices again”. Try different methods for laying out the graph.
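Conceptually, a dynamic filter keeps the vertices whose metric falls inside the slider range, plus the edges between them. This is a minimal sketch of that idea; `dynamic_filter` is a hypothetical name, the scores are three values from the #TCamp12 betweenness table, and the edges are made up.

```python
def dynamic_filter(metric, edges, low, high):
    """Keep vertices whose metric lies in [low, high] and the edges between them."""
    visible = {v for v, value in metric.items() if low <= value <= high}
    visible_edges = [(a, b) for a, b in edges if a in visible and b in visible]
    return visible, visible_edges


scores = {"@tcampdc": 23502.981, "@sunfoundation": 16236.783, "@javaun": 8020.142}
links = [("@tcampdc", "@sunfoundation"), ("@javaun", "@tcampdc")]  # made-up edges
visible, visible_links = dynamic_filter(scores, links, low=10000, high=float("inf"))
print(sorted(visible))  # ['@sunfoundation', '@tcampdc']
print(visible_links)  # [('@tcampdc', '@sunfoundation')]
```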

Now click on the “overall metrics” tab. You’ll see useful metrics for the overall graph, including the total number of vertices (nodes), edges, and self-loops. Self-loops are edges that connect a node to itself. In this case, self-loops most likely come from tweets that neither reply to nor mention anyone, which is what the last “Add an edge for each” option records. Three metrics you'll encounter here that you might not have heard of before are geodesic distance, graph density, and modularity. Geodesic distance is the distance between two vertices: the number of edges in a shortest path connecting them. The largest geodesic distance in a graph is called its diameter. Graph density is the number of edges divided by the number of possible edges. Modularity measures how well a graph divides into groups: a high score means many connections within groups and relatively few between them.
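Geodesic distance and density can be sketched in a few lines with breadth-first search on a {vertex: set of targets} graph. This is an illustration under stated assumptions, not NodeXL's code: the average skips each vertex's zero distance to itself and ignores unreachable pairs (tools differ on both), and density excludes self-loops and uses the n(n−1) possible directed edges.

```python
from collections import deque


def geodesic_stats(adj):
    """Diameter and average geodesic distance over all reachable ordered pairs."""
    lengths = []
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        lengths.extend(d for t, d in dist.items() if t != s)
    if not lengths:
        return 0, 0.0
    return max(lengths), sum(lengths) / len(lengths)


def directed_density(adj):
    """Edges between distinct vertices divided by the n(n-1) possible directed edges."""
    n = len(adj)
    edges = sum(1 for v, targets in adj.items() for w in targets if w != v)
    return edges / (n * (n - 1)) if n > 1 else 0.0


path = {"a": {"b"}, "b": {"c"}, "c": set()}  # a -> b -> c
print(geodesic_stats(path))  # (2, 1.3333333333333333)
print(directed_density(path))  # 0.3333333333333333
```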

If you would like to export your graph as an image, right-click on the graph in the graph pane and click “Save Image to File”, then click “Save Image”.

There is plenty more stuff that I didn’t get to cover in this post, but this should be enough to get you started on your road to SNA mastery. Below are some additional readings for social network analysis and NodeXL.

Further readings:

Now go have fun!

Tools for Transparency: NodeXL

This week's Tools for Transparency post is part of a two-part mini-series by guest blogger Justin Grimes. Justin (@justgrimes) is a PhD candidate at the University of Maryland's College of Information Studies, a research assistant at the Information Policy and Access Center (iPAC), and a member of the Human Computer Interaction Lab (HCIL). His research areas focus on information policy and information access. In general he geeks out at hacking transportation data and loves talking about all things data.

Visualizing the TransparencyCamp Community


I attended TransparencyCamp 2012 earlier this month and, like every other year that I have attended, there were lots of people and good conversations. This year I was particularly amazed at the sheer number and diversity of those in attendance. This got me thinking about the people drawn to this event and the relationships between them. I wondered, “wouldn’t it be neat to see what this community looks like?” So I decided to gather some Twitter data and do a little social network analysis on the #tcamp12 community.

Here are the results...


What you are looking at is a graphical visualization of the community that tweeted with the hashtag #tcamp12 during TransparencyCamp 2012.

This graph was made using NodeXL and contains all Twitter users who sent tweets with the TCamp hashtag from April 28th to May 1st, 2012. In this graph you can basically see “who’s talking to whom" -- meaning the “circles” are Twitter users and the “lines” signify a mention from one user to another user. In this graph there are 367 nodes (“Twitter users”) with 1107 unique edges (“mentions”).

The graph is laid out using the Fruchterman-Reingold algorithm. Twitter users are grouped by color automagically by the Clauset-Newman-Moore clustering algorithm. Twitter users are sized by "betweenness centrality", a useful metric for evaluating nodes in a network beyond just popularity (i.e., the number of direct connections you have with other people). In technical terms, betweenness centrality measures a “node’s centrality in the network equal to the number of shortest paths from all other vertices to all others that pass through that node”. In layman’s terms, it helps us identify the people (or "nodes") who bridge different networks or communities within a network or community. In essence, the higher your betweenness, the more important you are to maintaining connections between groups. You are “the broker” between communities and have influence as such. Start removing nodes that have a high betweenness centrality score, and groups become disconnected and isolated.
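The claim that removing a broker disconnects groups is easy to check on a toy graph. In this sketch, two made-up triangles (a, b, c and d, e, f) are joined only through a hypothetical broker "x"; counting connected components (ignoring edge direction, via union-find) before and after removing "x" shows the split.

```python
def count_components(vertices, edges):
    """Number of connected components, ignoring edge direction (union-find)."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving keeps trees shallow
            v = parent[v]
        return v

    for a, b in edges:
        if a in parent and b in parent:
            parent[find(a)] = find(b)
    return len({find(v) for v in vertices})


def remove_vertex(vertices, edges, target):
    """Drop a vertex and every edge that touches it."""
    kept = [v for v in vertices if v != target]
    return kept, [(a, b) for a, b in edges if target not in (a, b)]


vertices = ["a", "b", "c", "x", "d", "e", "f"]
edges = [("a", "b"), ("b", "c"), ("c", "a"),  # group 1
         ("d", "e"), ("e", "f"), ("f", "d"),  # group 2
         ("c", "x"), ("x", "d")]              # x is the only bridge
print(count_components(vertices, edges))  # 1
print(count_components(*remove_vertex(vertices, edges, "x")))  # 2
```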

The average betweenness centrality for the #TCamp12 community is 834.807. Keep this number in mind as you review the table below.

Top 10 #TCamp12 users ranked by betweenness centrality:

User              Betweenness centrality
@tcampdc                     23502.981
@sunfoundation               16236.783
@craigfifer                  15258.757
@tsagov                      14022.989
@citizentools                13420.000
@elle_mccann                 12504.825
@digiphile                   11569.597
@_anna_shaw                  10835.748
@javaun                       8020.142
@joelogon                     7213.984
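Comparing each top score with the community average of 834.807 makes the gap concrete: @tcampdc scores roughly 28 times the average, and even @joelogon, in tenth place, scores about 8.6 times the average. The short sketch below reproduces that arithmetic using only the numbers reported above.

```python
AVERAGE_BETWEENNESS = 834.807  # average reported for the #TCamp12 graph

top_users = {
    "@tcampdc": 23502.981,
    "@sunfoundation": 16236.783,
    "@craigfifer": 15258.757,
    "@tsagov": 14022.989,
    "@citizentools": 13420.000,
    "@elle_mccann": 12504.825,
    "@digiphile": 11569.597,
    "@_anna_shaw": 10835.748,
    "@javaun": 8020.142,
    "@joelogon": 7213.984,
}

# How many times the community average each top user's score is.
ratios = {user: score / AVERAGE_BETWEENNESS for user, score in top_users.items()}
for user, ratio in ratios.items():
    print(f"{user:<16}{ratio:5.1f}x the average")
```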

Overall graph metrics:

Vertices: 367
Unique Edges: 1107
Self-Loops: 164

Maximum Geodesic Distance (Diameter): 8
Average Geodesic Distance: 3.540974
Graph Density: 0.007020443
Modularity: 0.447527
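The reported density can be reproduced from the other overall metrics. Assuming the graph is directed (so each of the n = 367 vertices could link to n − 1 others) and that the 164 self-loops are excluded from the edge count, which is the assumption that matches the published figure:

```latex
\text{Graph density}
  = \frac{E_{\text{unique}} - E_{\text{self-loops}}}{n(n-1)}
  = \frac{1107 - 164}{367 \times 366}
  = \frac{943}{134322}
  \approx 0.0070204
```

Counting the self-loops instead would give 1107 / 134322 ≈ 0.00824, which does not match the reported value.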

Below is another visualization of the same data but this time clustered groups are organized in boxes and the layout is done by using Harel-Koren Fast Multiscale algorithm. This graph is a little better in terms of clarity because it highlights different subnetworks.


DIY NodeXL


So how can you do this type of analysis to help understand your community members or the ways in which they interact? Easy! I’m going to show you how to get started. First I will explain the basics of social network analysis, and then I will walk you through the process of collecting, analyzing, and visualizing social network data using a tool called NodeXL.

So what is social network analysis (SNA)?

Social network analysis (SNA) is the methodological study of social networks. Social networks are social structures made up of entities (e.g., individual people, organizations, etc.) and their dyadic ties (e.g., relationships, connections, etc.). In technical terms we call these entities “nodes” or “vertices”, and we call these ties “edges”, “links”, or “connections”. A social network graph visualizes the network of nodes and edges.

Besides being just generally interesting, social network analysis is one way of helping us make sense of the world around us. Networks are everywhere. Social network analysis is a good way to understand social structures in our society and can be particularly useful towards mapping and measuring the relationship between people.

To perform social network analysis you’ll need software to help you perform the analysis (and a question). There are lots of amazing software tools for social network analysis to choose from: NodeXL, Gephi, Pajek, etc. For beginners, I always recommend NodeXL. NodeXL is an open source plugin for Microsoft Excel. It is free and easy to use, requires no programming experience and little prior SNA knowledge, and has wonderful documentation and a solid community supporting it. One of the nicer features of NodeXL is that it can automagically import data straight from social network sites such as Twitter and Flickr. The only serious drawback or criticism I have of NodeXL is that it is Windows-only and requires Microsoft Office. [Disclaimer: although NodeXL was largely developed at Microsoft, I’m affiliated with the HCIL, which has several members who have contributed to this project; I was not one of them.]

As I said earlier, you need two things to do social network analysis: software and a question. NodeXL will be our software. Our question for this example will be: what does the network of Twitter users at TransparencyCamp 2012 look like? To answer this question I’m going to analyze the Twitter activity of TransparencyCamp 2012 by capturing all tweets that contain the hashtag #tcamp12.

To get the answer to this question, stay tuned until next week when we'll share Justin's step-by-step NodeXL guide. In the meantime, if you have Windows and want to start playing with social network data on your own, click here to download the #TCamp12 data file Justin used to complete the analysis above.

UPDATE: For the second part of this series, click here!

Help Us Plan TransparencyCamp 2013

Even though TransparencyCamp 2013 is roughly 12 months away, we’re already thinking ahead to how we can make next year’s event even better.

Our Labs team just published a round-up of “The Tech Behind TransparencyCamp”. Want to learn more about the TCamp web and mobile sites, the incredible level of detail that went into optimizing registration (not a joke), and our choice to use Etherpad over other wiki/note-taking options? Jeremy Carbaugh has your answers and more, and gives a sneak peek at what we’re thinking for next year.

We also want to hear from you. If you attended TCamp, please let us know what you think by taking a minute or two to complete this short survey: http://snlg.ht/TCamp12Survey. Last year’s survey results had a direct impact on this year’s Camp: So, for those of you who appreciated our abundance of maps and wayfinding materials, noticed an uptick in student participation, and enjoyed the lightning talks, you can thank your fellow TCampers for their input.

Some of our user-directed signage in the wild. We renamed all classrooms with new TCamp names based on which floor of the building they were located on to improve findability. Great success over last year! Original picture and more TCamp snaps on Flickr.

TransparencyCamp 2012: Reflections, Next Steps, and Thanks

Sunlight closed its doors today to take a rest after last weekend, but I still find myself poring over Twitter and Flickr, soaking in TransparencyCamp. TCamp 2012 was by far the best Camp we’ve ever held, if your tweets and notes and contributions and photos and energy and exclamation points and vowed next steps are to be believed -- and I think they are.

Consider: The earliest TCamps brought people together who defined the leading edge of “opengov” in the US at the time, drawing together about 100 to 150 Campers. In 2011, we leapfrogged, gathering 200+ Campers together and opening the door to more local and international conversation. But this year was something else: Over April 28th and 29th, we brought together over 400 people from 27 countries and over 26 states to discuss the present and future of government transparency in the US and all over the world. At this point, the numbers no longer just reflect TransparencyCamp: They show that the movement as a whole is growing. For a good snapshot, check out this most excellent TransparencyCamp 2012 recap video:

Unconferences really are fueled by the participants, and so I don’t say lightly that it is because of each and every person who attended that the TCamp experience was so positive and promising. In our staff debrief this week, Sunlighters were enthusiastic to point out that the level of dialogue and debate at this year’s Camp was like nothing before. Many people shared with me variations of a similar story, one that exemplifies one of my favorite rules of unconferencing: “Everyone who is in the room is supposed to be there.” The story usually goes that in some mindblowing session about legislative data or crazy opengov tactics or the future of journalism and government accountability, one attendee or another begins to tell a story about what they’ve heard about the opengov situation overseas, in a country like Malaysia, only to have someone tap them on the shoulder and say, “I’m from Malaysia.” After this weekend, I think it’s safe to say that’s an authentic TransparencyCamp experience.

This is the new wave of TransparencyCamp: leveraging the power of face-to-face interaction to bust borders between countries and fields of work, overcome technical and procedural hurdles, and get into the kind of creative problem-solving that actually solves problems. We took a lens to these and other themes in our concluding session, where we asked those Campers brave and caffeinated enough to last to the very end to share what they planned to do in the next week, month, and year after TCamp. Here are some gems I picked up from this session and throughout the conference:

  • Based on a conversation driven by a representative from Wikimedia, several Campers are going to look at how to create a global multilingual TransparencyCamp wiki to log resources, conversation, and best practices.
  • Kevin Curry, creator of CityCamp and Program Director of Code for America’s Brigade team, said that he’ll be launching a FOIA Brigade to help cities open data related to their FOI laws.
  • Jeanne Holm, the evangelist for Data.gov, launched a new website at TransparencyCamp: Developer.data.gov and discussed Data.gov’s investment in exploring open sourced technology.
  • mySociety.org's Tom Steinberg announced his intention to develop an open source, collaboratively built platform between now and TransparencyCamp 2013, with the hope of showing it off at next year’s unconference.
  • Matthew McNaughton, a TCamp11 veteran from Jamaica, shared that he's going to explore how to bring the Open311 system to his home country.
  • An army of people -- women, men, old, young, US nationals, and others -- stood up and told the crowd “I’m going to start coding.” And the folks who were already coding, like one of our lightning talk speakers, Juan-Pablo Velez, said, “I’m going to try to build the civic hacking movement at home.”
  • And to underscore a point I'll make below, many folks expressed their interest in bringing TCamp itself home. Here are the various dream Camps that we might see coming into the world in the next 12 months:

TransparencyCamp Malaysia
TransparencyCamp Latin America
TransparencyCamp Georgia
TransparencyCamp Europe
TransparencyCamp Hawaii

I shared a commitment of my own, too: After this Camp, I’m going to publish online, on the TransparencyCamp website, all the documentation we’ve created about how to run a transparency unconference. Inspired by participants like Pedro Markun and Daniela Silva, who were so excited to bring TransparencyCamp home that they made a session out of it, and by the participants in my “Meta-TransparencyCamp: Unconference Organizing” session, I think it's the logical next step.

What will you do after TransparencyCamp? Let us know. From planning to implementation, we’re interested in following these projects and others. Whether or not you joined us in DC for Camp, be sure to share what you're up to by joining and posting to the TransparencyCamp Google Group.

Being exposed to all the great minds at TCamp -- representing local, state, national, international, journalistic, academic, technical, and political interests -- was an incredibly humbling and inspiring experience. Thanks for reminding me why I do the work I do. Hope to see you in 2013.

TransparencyCamp 2012 is this Weekend!

TransparencyCamp is THIS Saturday and Sunday -- April 28th and 29th -- and it is sold out. We are going to have an enormous unconference about opengov on our hands, folks: As of Tuesday, April 24th, we had sold 400 tickets and, based on the way the waitlist has grown since, we’re expecting a good deal more to join us. (Shh: Hear that? Although we were planning for a 400-person conference, same-day registration will be available on site at Camp if you didn’t manage to buy a ticket in time. Camp’s going to be cosy, but not uncomfortable.)

What can you expect from an enormous unconference? The same energy, thoughtfulness, and commitment to community-driven community-building that TransparencyCamp has always relied on. To us, the sudden uptick in numbers (last year’s unconference broke all previous records, gathering 278 people by the time the weekend was over) is evidence of increasing recognition of the relevance of transparency to different fields of advocacy and policy (especially in an election year), and of the ever-broadening network of people inside and outside of government working to advance transparency and public access to public information (open data). This video from last year’s Camp gives a good snapshot:

For those of you who can’t make it this weekend, fear not. It's not the full TCamp experience, but we will be posting some video of recorded sessions online post-Camp. In addition, during TCamp, we expect to have a Google Hangout running and, of course, our Twitter engine in full steam: Catch “official” TransparencyCamp tweets from @TCampDC and follow #TCamp12 for the general flow of conversation.

All of us here at Sunlight look forward to meeting you this weekend, to thinking through the challenges, successes, and next steps for opengov -- and to having fun. Considering how serious an unconference about open government could be, I’m always astounded and energized by the playfulness and interactivity of Camp. I hope you will be, too!

Can’t wait to start meeting people? Join our Google Group -- http://groups.google.com/group/transparencycamp -- and/or catch up on our “Guess Who’s Coming to TCamp” series, where you can meet: Beth Sebian, Matej Kurian, Michael Mulley, Maria Baron, Marko Rakar, Dondon Parafina, Wong Aung and three of our awesome Transparency Camp Scholars.

See you Saturday!

Guess Who's Coming to TCamp12: The Dondon Parafina and Wong Aung Edition

"Guess Who's Coming to TCamp12" is a mini-series we started to introduce some of the faces that you'll see at TCamp, something we hope will be useful to attendees and non-attendees alike. So far we've highlighted Beth Sebian, Matej Kurian, Michael Mulley, Maria Baron, Marko Rakar, and three awesome Transparency Camp Scholars. Today we are proud to introduce Dondon Parafina, of the Philippines, and Wong Aung, a Burmese activist.

Redempto Santander Parafina ("Dondon") is the Network Coordinator of the Affiliated Network for Social Accountability in East Asia and the Pacific (ANSA-EAP), a regional program of the Ateneo School of Government and the World Bank Institute. His work covers Cambodia, Indonesia, Mongolia, and the Philippines. His work on social accountability is advancing ideas and practices in various fields, particularly procurement, ICT, youth involvement, and the education, health, and public works sectors. He is currently spearheading an education initiative called Check My School, a blended online and offline platform for information access and citizen feedback.

Prior to joining ANSA-EAP, Dondon spent five years as the coordinator of Government Watch, or G-Watch, an anti-corruption program at the Ateneo School of Government in the Philippines. While there, he coordinated various citizen participation initiatives, including nationwide programs monitoring textbook procurement and delivery and school building construction.

Dondon has been active with the Coalition Against Corruption, the Transparency and Accountability Network, DPWH's Integrity Development Committee, the Procurement Transparency Group, and several youth groups including the Boy Scouts and Rotary Youth. He answered a few questions about his work.

Where did the idea for CheckMySchool come from?

I conceptualized and designed the Check My School initiative based on my relatively long experience in monitoring the education sector in the Philippines. Many of our initiatives monitor individual items (e.g. textbooks, school buildings) and particular procedural concerns like procurement. I felt the need for Check My School to provide a more comprehensive look at education services and hopefully link them with the higher development outcome of learning. So the initiative covers various info sets, such as enrollment, personnel (teaching and non-teaching), rooms (academic and non-academic), textbooks, seats, computers, toilets, budget, and national test results. The other trigger for introducing Check My School was to take advantage of technology. There are now 27 million Filipino Facebook users, and we also wanted to tap into the civic energies of these netizens.

What kind of impact has your work had?

After one year of implementation, we made some impact on issue resolution through very quick action on practical issues that were submitted through the platform. In one case, a classroom repair worth P4.8 million (US$113k) was continued immediately because of CMS feedback. Textbooks were also replenished, toilets were renovated, and another toilet was donated by an alumni group in direct response to a CMS report.

I think the other impact is that we are now starting to replicate the initiative. We have started the south-south knowledge exchange with Indonesia for their adaptation of Check My School. Other countries also expressed interest, like Kenya, Moldova and Papua New Guinea.

Wong Aung is the International Campaign Adviser at the Shwe Gas Movement in Burma. The movement seeks to raise awareness about the social, economic, and environmental impacts of the Shwe Gas Project in and outside of Burma through first hand research and community organization. The Shwe Gas Project involves the exploitation of underwater natural gas deposits off the coast of western Burma's Arakan State. Burma's military junta and a consortium of Indian and Korean corporations made a deal to explore and develop these deposits. These fields are expected to hold one of the largest gas yields in Southeast Asia and could represent the Burmese government's largest single source of income.
What kind of communities do you work with and what does your day to day work entail?

In his role as International Campaign Adviser, Wong Aung works in exile to bring the voices of project-affected communities to the regional and international levels, as well as back into Burma, through advocacy to political actors and the mainstream Burmese public.

What would you like conference attendees to understand about the Shwe Gas Project?

The Shwe Gas Project is a massive resource-extraction and infrastructure development that has been planned and implemented by the former military junta (and its corporate partners) with absolutely no input from or thought for local people. The project will generate huge revenues (US$29 billion over 30 years) for the Burmese state, but under the current system there is no transparency in how these revenues are spent. The Shwe Gas Movement is demanding that the project be suspended until community rights and the environment are protected, affected peoples share in the benefits, and transparency and accountability mechanisms are in place.

What's the best place to go to find out more about your work and other transparency initiatives in Burma?

Visit www.shwe.org and www.earthright.org to find out more about the work of the Shwe Gas Movement, as well as extractive-sector transparency and justice in Burma.

Join us at TransparencyCamp April 28th and 29th just outside of Washington, DC to meet Maria, Marko and other folks -- inside and out of government -- who are working to make our government more open, accountable, and transparent. Register today at http://transparencycamp.org -- and hurry! Space is limited.

Guess Who's Coming to TCamp12: The Marko Rakar and Maria Baron Edition

"Guess Who's Coming to TCamp12" is a mini-series we started to introduce some of the faces that you'll see at TCamp, something we hope will be useful to attendees and non-attendees alike. Last week we highlighted Beth Sebian, Matej Kurian, Michael Mulley, and three awesome Transparency Camp Scholars. We're kicking this week off with Maria Baron, out of Argentina, and Marko Rakar, from Croatia.

Maria Baron is the Executive Director at Fundacion Directorio Legislativo, a nonpartisan organization in Argentina that promotes the strengthening of legislative branches of government and the consolidation of the democratic system through dialogue, transparency, and access to public information. Maria has a Master's degree in International Relations from Bologna University, Italy and is a Ph.D. candidate in Political Science at the National University of San Martin, Argentina. She is also a journalist and has worked with numerous organizations in Argentina and abroad that work to reduce corruption and enforce ethical behavior. A Fulbright-APSA Congressional Fellow, she has published seven editions of Directorio Legislativo: Who are our legislators and how they represent us, in addition to numerous other publications about legislative transparency. She took the time to answer a few questions about her work and transparency in Argentina.

Can you tell us how you initially got involved in legislative transparency?

In 1997 I interned at a Washington, DC-based organization called Witness for Peace, which worked to change US foreign policy, and the policies of international institutions, toward some countries in Latin America. We went to Congress to talk to members about the situation in the region. They provided me with a little book with information on who's who in Congress: members' bios, staff information, and so on. When I came back to Argentina I decided to replicate that initiative for my own Congress, since at the time Congress's websites contained only a list of the members. And, surprisingly, one member appeared both as a representative and as a senator! I had no organization to back me, so I started fundraising by myself. I found one potential donor who told me, "If you can get all the members to agree to give you all the information, I'll pay for half of the printing." I worked for 8 months nonstop, every day, and for the last two months I slept only every other night. I gathered all the information and used my savings to pay for the other half. When the books were printed I put them in a big backpack and knocked on the door of every office in Congress. I sold all of them. That's how I started.

What kind of effects has the publication of Directorio Legislativo had?

* CULTURE of SECRECY: We have battled for legislators' financial statements to be made public. We created a network of 100 volunteers to call senators, and within four months the president of the Senate issued an internal resolution making the statements public. In the lower house we litigated, and the issue went up to the Supreme Court. We have litigated against Congress four other times over access to public documents and won every case.

What's the best place to go to find out more about your work and other transparency initiatives in Argentina?

The best place is our website, www.directoriolegislativo.org. We also coordinate the Latin American Network for Legislative Transparency, www.transparencialegislativa.org. And there are other organizations in Argentina that work on transparency and have done a lot of work: Asociación por los Derechos Civiles (www.adc.org.ar), Poder Ciudadano (www.poderciudadano.org), and Cippec (www.cippec.org).

 

Marko Rakar is one of Croatia's leading political bloggers and transparency activists. He was recently in the news for publishing a massive, easily searchable database of all public procurement data for government spending in Croatia dating back to July 1, 2009. His NGO, vjetrenjaca.org (Windmill), has been dubbed the Croatian "WikiLeaks". He has a history of exposing fraud and abuse in the Croatian political system. In 2009 he published a searchable database of Croatian voters, shining a light on the fact that there are more registered voters than citizens in the country. He was kind enough to answer a couple of questions about his work.
What's your relationship been like with the Croatian government? 

We had a change of government in late December last year, and while the previous one actively harassed me (at one point even arresting me), this one has actually asked for my input on a number of different subjects. In the last few months I have been hired on some government data projects, and I was also chosen to be one of the participants in the Open Government Partnership steering committee (for the Croatian "chapter" of OGP). It is far too early to tell how this will develop or whether we will have results to show, but with the new government it is a completely different (and so far positive) story.

What work of yours do you think has had the most impact?

As for the impact: we have done a number of different projects. Some were clearly educational (for example, a visualisation of the Croatian state budget, or a "state budget calculator" that allowed anyone to create their own version of the state budget), and they were all very successful, seen and used by hundreds of thousands of people (in a country of 4.5 million). We have also done some actions that might be characterized as investigative journalism, although they too are based on collecting and processing data: a few weeks back we published the (previously secret) itinerary of the government's plane, which we reconstructed from (foreign) public sources. But the largest impact was the voters list project, simply because it affects everyone in the country; now everyone knows how the elections are manipulated, and it is only a question of how to resolve the issue (which is not so simple). Our latest project, on procurement, was the top story of the week in Croatia and got unbelievable press coverage. It is too early to tell what its true effect will be, but we know for a fact that journalists AND the public prosecutor's office use it on a daily basis.

Join us at TransparencyCamp April 28th and 29th just outside of Washington, DC to meet Maria, Marko and other folks -- inside and out of government -- who are working to make our government more open, accountable, and transparent. Register today at http://transparencycamp.org -- and hurry! Space is limited.

Guess Who's Coming to TCamp12: The Michael Mulley Edition

"Guess Who's Coming to TCamp12" is a mini-series we started to introduce some of the faces that you'll see at TCamp, something we hope will be useful to attendees and non-attendees alike. So far this week we've highlighted Ohio advocate Beth Sebian, Transparency International Slovakia's Matej Kurian, and three awesome Transparency Camp Scholars. Today, we are happy to present Michael Mulley, who is working to open up the Canadian Parliament.

Michael founded Open Parliament Canada on the premise that "Parliament's goings-on are important." The goal is to make Parliamentary information "meaningfully public": easily shareable and machine-readable. Mulley recently moved to Montreal from New York "in search of better bagels." In New York he studied computer science and linguistics while working in tech consulting. He currently runs a web development business called Only Connect.

Michael answered some questions on his passion for open government, the challenges he faced while building Open Parliament Canada, and the response his site has received. He also shared some advice for others thinking of setting up a parliamentary monitoring site in their own country.

 

 

Has the Canadian Parliament noticed your work? Do you have any interaction with them?

Parliament as an institution has certainly noticed my work, and I've had some friendly and useful conversations with IT staff. I won't pretend that I haven't encountered lots of bureaucratic delay and frustration, and I can't claim credit for their opening up data, but since I started Open Parliament, our House of Commons -- whose internal data architecture is actually surprisingly good -- has started releasing a fair bit of data in XML.

Lots of Members of Parliament use the site too. They're generally happy with it -- after all, my goal is to get people to listen to what they're saying -- and I've had useful discussions with a few.

You list some other websites as inspirations for OpenParliament.ca. What drew you to them? What made you want to get involved in open government?

Honestly? An engineer's frustration at things that are more complicated than they should be. I saw TheyWorkForYou and thought it was just a self-evidently good idea. It didn't exist in Canada yet -- there was a nice vote-tracking site, but nothing with TheyWorkForYou's focus on user-friendliness and MPs' actual words -- and I thought it should, so I made it.

Were there any particularly interesting challenges you faced in gathering the information you present on the website? Is it entirely automated?

It's entirely automated (though that no-cell-coverage camping trip two weeks after launch was still pretty stressful!). I now have access to a bunch of XML feeds, but when I launched a couple of years ago everything was web scrapers, which are a source of constant boring challenges that make you realize that virtually every initial assumption you made was incorrect. For example, I assumed -- quite reasonably! -- that times were on a 24-hour clock. Turns out that when a session extends past midnight, the clock just keeps ticking past 24: if MPs have to work late, so does time. We had a filibuster recently which took us past 80 o'clock.
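Michael's midnight problem is a classic scraper pitfall. A minimal sketch of one way to handle it, assuming the scraped times arrive as "HH:MM" strings alongside the calendar date on which the sitting began (the function name and input format are hypothetical, not Open Parliament's actual code), is to treat the clock reading as an offset from midnight of the sitting day rather than as a time of day:

```python
from datetime import date, datetime, time, timedelta


def parse_sitting_time(sitting_day: date, clock: str) -> datetime:
    """Convert an 'HH:MM' reading that may run past 24:00 into a real datetime.

    Hypothetical helper: the hour is treated as hours elapsed since midnight
    of the day the sitting began, so '25:30' means 01:30 the next day.
    """
    hours_str, minutes_str = clock.strip().split(":")
    hours, minutes = int(hours_str), int(minutes_str)
    # Minutes still have to be a normal 0-59 value; only hours may overflow.
    if hours < 0 or not 0 <= minutes < 60:
        raise ValueError(f"unexpected clock value: {clock!r}")
    start_of_day = datetime.combine(sitting_day, time())
    return start_of_day + timedelta(hours=hours, minutes=minutes)
```

Anchoring on the sitting's start date keeps a 1:30 a.m. speech attached to the right debate day while still producing a real calendar timestamp; "80:00" on March 1 resolves to 8:00 a.m. on March 4.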

More fun has been trying to find ways to analyze the information -- finding haiku hidden in the debates, using simple Bayesian stats to find out which words and phrases our different parties are fond of.
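Michael doesn't say exactly which Bayesian method he used. One common choice for "which words does this party favour?" is a log-odds ratio smoothed with a symmetric Dirichlet prior and scaled by its approximate standard error. The sketch below assumes each party's speeches are already tokenized into lists of words; the function name and the `alpha` default are illustrative assumptions, not Open Parliament's code:

```python
import math
from collections import Counter


def party_word_scores(party_words, other_words, alpha=0.5):
    """Score how much more often one party uses each word than everyone else.

    Every word gets `alpha` pseudo-counts in both corpora (a symmetric
    Dirichlet prior), so words seen only once or twice don't get extreme
    scores. Positive scores lean toward the party, negative ones away from it.
    """
    party, others = Counter(party_words), Counter(other_words)
    vocab = set(party) | set(others)
    if len(vocab) < 2:
        raise ValueError("need at least two distinct words to compare")
    # Totals include the prior mass spread over the whole vocabulary.
    total_party = sum(party.values()) + alpha * len(vocab)
    total_others = sum(others.values()) + alpha * len(vocab)

    scores = {}
    for word in vocab:
        y_party = party[word] + alpha
        y_others = others[word] + alpha
        # Difference of smoothed log-odds between the two corpora.
        delta = (math.log(y_party / (total_party - y_party))
                 - math.log(y_others / (total_others - y_others)))
        # Divide by the approximate standard error so common and rare
        # words end up on a comparable, z-like scale.
        scores[word] = delta / math.sqrt(1 / y_party + 1 / y_others)
    return scores
```

With real debate transcripts you would compare each party's words against the pooled words of all the other parties and read off the highest-scoring terms; phrases can be handled the same way by feeding in word pairs instead of single words.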

You described the Canadian open data portal as having "relatively little in the way of visible results, a pale shadow of...the US and the UK". What's the best thing the Canadian government could do for its open data program?

Give it resources and a dedicated team with a mandate both to educate within the government and to interact with the outside world.

The open data program was revealed fully-formed, with a site full of PR fluff and a license that barred using data in any way that might make the government look bad. The license was fixed soon enough, and a few promising things have come out of the program. But the pattern of changes coming only via ministerial press releases has continued. I have no idea who's actually running the open data program or what their plans are, and the combination of a not-particularly-useful site and a complete lack of outreach or communication makes me worry that our government will be able to say "Nobody used our open data, so we eliminated the program for cost savings."

Is there any advice you'd give to people thinking of doing a parliamentary monitoring website in their own country?

Look at similar sites elsewhere and read mySociety's brilliant guide on creating such a site.

Parliamentary-monitoring sites as a genre are about eight years old now, and have reached the point where most developed countries -- and several developing ones! -- have a good, widely-used site. I think lots of us are interested in ways of reusing each others' work, and that's one of the things I'm really looking forward to discussing at TCamp.

And, finally: fun and informality are powerful weapons that you can use and your government largely can't. This doesn't mean cheapening politics or introducing bias; it means making things user-friendly and enjoying yourself.

Join us at TransparencyCamp April 28th and 29th just outside of Washington, DC to meet Michael and other folks -- inside and out of government -- who are working to make our government more open, accountable, and transparent. Register today at http://transparencycamp.org -- and hurry! Space is limited.

Guess Who's Coming to TCamp12: The TCamp Scholars Edition

“Guess Who’s Coming to TCamp12” is a mini-series we started to introduce some of the faces you'll see at TCamp, something we hope will be useful to attendees and non-attendees alike. This week, we’ve highlighted Ohio advocate Beth Sebian and Transparency International Slovakia’s Matej Kurian. Today, we bring you a few of the TransparencyCamp Scholars.

The TransparencyCamp Scholarship program was started as part of our 2011 Camp. It’s an application-driven process that provides partial travel stipends for folks from around the country (and the world) to come to Washington, DC to join us for Camp. This year, we accepted 10 Scholars -- a mix of long-time and first-time opengov activists, developers, journalists, and thinkers. Like last year, we’ll do a round-up of the full list of Scholars post-Camp, but first, here’s a sneak peek at these awesome peeps:

Yvette Cabrera

Berkeley, California


Currently, Yvette interns with the Oakland Food Policy Council, blogging on topics like aquaponics, food policy, and interesting events, and supporting the Council’s efforts to build partnerships and identify key regional allies and decision-makers.

Think food policy has nothing to do with transparency? Think again. From the data held by government agencies like EPA, FDA, and USDA to having access to the meetings and records of government boards charged with setting local policy, those invested in food distribution, quality, and regulation have plenty of concerns that overlap with us transparency geeks. When asked why she wants to come to TransparencyCamp, Yvette answers:

I want to learn about building transparency in the government on a national and local level in order to create a food system that is healthy and just for everybody. Transparency to me means efficiency and increased citizen participation in decision-making, and I think that is the only logical way to improve the current food system that we have here in the U.S.
 

Nuno Moniz

Porto, Portugal


Nuno is a civic hacker whose interests in open civic data have led him to work on a variety of different projects. His first was to open up the Portuguese State Budget, making it available in JSON. Using this information and the Open Knowledge Foundation’s “Bubble Tree” (a way to display interactive visualizations of spending data), Nuno went on to create visualizations for both the Portuguese 2012 State Budget and the Azorean 2012 Autonomous Region Budget.

Currently, Nuno is sinking his teeth into the meat of Portuguese legislative data. “For the last 6 months (and for the next 6 months) I've been working on my Master's Thesis: in a nutshell, I'm transforming three years of Portuguese Legislation's .PDFs into open data.” Knowing that the TransparencyCamp community is full of civic hackers from all over the world who work on legislative data, and others who can provide helpful insight on the use and governance of this information, Nuno hopes to lead a session at TCamp about his work:

"Opening the Portuguese Legislation: What useful information lies in the documents?" was the name of the session I proposed [on Google Moderator]. As I said before, I've been working for the last months on an open legislation project. The objective of this session, besides sharing the project, its development status, and the "bumps along the way", would be to think about what more information lies in the legislation texts. Which entities are present in those texts? People? Organizations? What do we gain by processing, discovering and interlinking that information and not just publishing its text? How could mapping that information add more transparency in the legislative process? Questions for the debate, and at the end, I hope, new and better ideas. :)

Dan Schneiderman

Rochester, New York


Dan says that he got into the world of opengov-ery because of his “passion for playing with big data and seeing how it can be used to help people.” Building off his experience at TCamp 2011, he hopes that TCamp 2012 will be an opportunity to explore new possibilities for future projects and how he can become involved with the transparency movement after he graduates.

To kick off this exploration, Dan plans to bring to TCamp the fruits of an independent study of government data he’s been working on using the JavaScript library D3. His study mashes up information from Data.gov, the Open States API, and a large collection (340,000!) of tweets relating to Super Tuesday that he scraped. Want to learn more? Find Dan’s session at TransparencyCamp.

Join us at TransparencyCamp April 28th and 29th just outside of Washington, DC to meet Matej and other folks -- inside and out of government -- who are working to make our government more open, accountable, and transparent. Register today at http://transparencycamp.org -- and hurry! Space is limited.

Guess Who's Coming to TCamp12: The Matej Kurian Edition

Let the countdown to TransparencyCamp 2012 continue with another edition of "Guess Who's Coming to TCamp12". Through this mini-series we will introduce some of the faces you'll see at TCamp, something we hope will be helpful for attendees and provide a neat window into the festivities for those who can't make it. Yesterday, we introduced you to Beth Sebian from Cleveland, Ohio. Today we are excited to highlight one of our international attendees!

Matej Kurian is the program coordinator at Transparency International Slovakia. One of his recent projects is Open Contract Portal, developed by TI Slovakia and Fair Play Alliance, aimed at increasing transparency and accountability in public spending by empowering citizens. Matej has an MA in Political Science from the Central European University. His self-reported specialties include accountability, transparency, corruption, open government, data-driven projects, and non-democratic regimes. Before joining Transparency International Slovakia, he had internships at A.T. Kearney and the Slovak Governance Institute.

TI Slovakia's procurement and contracts websites are among the best in the world. Matej was kind enough to answer a few questions about their features, design, and impact:

What kinds of features do your procurement and contracts sites have that others don't?

Most procurement sites provide little more than a sophisticated list of contracts. We're trying to add an analytical layer to the data, essentially empowering users to run their own tests. To the best of my knowledge, Open Contract Portal is the first of its kind in the world; I am not aware of any other country that mandates publishing public contracts online.

What made these sites possible, from the government and from TI Slovakia?

The government did not play any role in the projects, save for the regulatory framework that mandates that the original data we scrape have to be published. Open Society Foundations funded both projects, and the Siemens Integrity Initiative funded the Procurement Portal.

While TI Slovakia did not have any previous experience with building and managing online portals, our expertise in procurement and data-driven analysis helped in designing the portal.   

Has this had any policy impact, or has it made the impact of procurement policies clearer?

While non-specialist use of the portals is still quite low, specialist groups have made use of them. For example, based on the portal data, TI Slovakia argued for the mandatory use of electronic reverse auctions and was able to compare governments' pre-election spending. Both portals contributed to the debate on the quality of public data.

Join us at TransparencyCamp April 28th and 29th just outside of Washington, DC to meet Matej and other folks -- inside and out of government -- who are working to make our government more open, accountable, and transparent. Register today at http://transparencycamp.org -- and hurry! Space is limited.
