Sunlight Foundation

West Virginia mine kept separate records for regulators

Last year, we wrote about the fatal Upper Big Branch mine explosion that killed 29 people in West Virginia. You can read all about it here, here and here.

Although the story of the Massey Energy-owned mine is controversial for many reasons, we were most concerned with the issues that related to the mine violations data collected and processed by the Mine Safety and Health Administration (MSHA), a government agency. (See what we mean in the video at the end of this post.)

Unfortunately, recent events reveal that there are greater transparency issues afoot: After West Virginian Sen. Jay Rockefeller called for Senate hearings on the Upper Big Branch Mine disaster, a subsequent investigation revealed that Massey Energy kept two separate sets of records, a more accurate one for itself and a cleaned-up version for federal regulators.

In a report by the Washington Post, Kevin Stricklin, an MSHA official, said:

Managers were aware that chronic hazardous conditions were not recorded. What they’re required to do is list all the hazards in the official book. This is the book that not only MSHA looks at ... but it should be the book that miners and other people who are going into the mine should look at so they would be aware of any conditions in the mine before they go in.

Massey sold the Upper Big Branch mine to Alpha Natural Resources in June 2011. Luckily, the new owners are much more supportive of the ongoing investigations.

Though it’s still early in the process, the results of this investigation reaffirm our call to make public information more available and more searchable, so that we can hold our government more accountable. Indeed, the previous findings by the MSHA indicated that the government officials were quick to blame the blast on natural factors that were triggered by methane gas and coal dust. Now, the federal regulators realize that the mine owners sent in fake information that did not represent what was actually going on at the mine.

The Upper Big Branch Mine disaster is a clear example of how transparency is a two-way street: This incident could perhaps have been avoided if both the regulators and the mine owners had done their part in providing the kind of information that would have saved the miners' lives.

What Do You Want to Get Out of TransparencyCamp?

The open government movement (like most of the online world) is obsessed with “unconferences” -- meet-ups, of sorts, where the participants determine the content of and lead sessions around a pre-determined theme. When done right, an unconference can be a powerful tool for building community.

Sunlight held its first unconference, TransparencyCamp, three years ago in an effort to get the diverse groups of people thinking about and working for government transparency together. From the conversations and problem-solving that took place there, we’ve seen the emergence of some incredible initiatives - take, for example, CityCamp.

This year, we want to go further. We want to focus on government transparency not just on Capitol Hill, but where you live. So, we need your help.

Please take a minute to fill out this survey and let us know what you want to get out of TransparencyCamp.

Never been to a TransparencyCamp or even an unconference before? Not a problem. We’re still interested in knowing what open government issues interest you, what you would want to get out of this sort of experience and how we can improve on the experiences you’ve had at similar events in the past.

TransparencyCamp 2011 will be open to people from across the country. We’re relying on your input to make it the best it can be.

http://transparencycamp.org/survey

Thanks for your help.

Tools for Transparency: Open Atrium

Today, our guest post is written by Joshua Gay, a programmer, activist, and community organizer whose interests revolve around technology, government, education, and computer user freedom.

My personal interest in the Open Atrium project came about this past fall when I began volunteering to help with the Public Equals Online Wiki. The so-called "PEO" Wiki has a lot of potential for being a good place to coordinate and collaborate on state and national transparency initiatives and projects. However, the software it is built upon, MediaWiki, needs to be highly customized in order to make it a compelling platform for a community to start using. In my efforts to customize and improve the wiki, I have been using the features and design of Open Atrium as a sort of roadmap, in hopes that I can make the wiki a more useful, powerful, and compelling tool for the transparency community.

The Open Atrium project describes itself as a "part intranet, part do-it-yourself project with a kick of open source hotness," and it certainly is one of the hottest Drupal-based projects out there. Its feature list is impressive, and for many organisations or web-based communities, I could imagine it becoming the primary tool for both project management and development. Here is a quick snapshot of its six biggest features:

Case Tracker - Open Atrium is designed around the principle of users and groups. Every group on the system can create an unlimited number of projects within the Case Tracker, and within each project you can create to-do items. Each item can be organized and prioritized according to categories or milestones, assigned to group members, and discussions and progress notifications on to-do items can be made through a nested commenting system.

Calendar - Although not as feature-rich as Google Calendar, Open Atrium's calendar does present events in a similar, colorful fashion, supports single or multiday events, and syncs with calendars that support iCal.

Blog - This blog contains all of the basic features you would expect, with nested commenting, file attachments, and a granular notification system. But what I think makes this blogging system unique is that it is integrated into the system, and therefore blog posts can be used as a way to discuss projects and share ideas with other members of your group and community as well as with the outside world.

Shoutbox - This Twitter-like update system is a great way to share quick updates with your group members. What I like best about the Shoutbox is that it integrates a social element into the rest of the workflow.

Documents - This is a simple but nice collaborative document editor that supports attachments, a revision system with a nice way to compare different versions, and a nice built-in print function that allows you to export and share the final product.

Dashboard - The Dashboard is where the entire system comes together and gives you a snapshot of all the activity happening across your groups. It is designed around "widgets" (like iGoogle), where users can add, remove, or arrange the widgets on the dashboard however they like. And, of course, it includes a Twitter-feed widget.

One exciting aspect of Open Atrium is that its developers have built it around the principle of features working like "plug-ins." Hopefully, as adoption grows, we will also see a growing list of optional features that you can add to your own custom instance of Open Atrium.

I believe that Open Atrium is a powerful tool for transparency, not only for its potential use by government agencies (which would be amazing -- imagine a legislative feature!), but also as an important tool for the transparency movement.

Improvements Needed For High Value Datasets On Data.gov

This morning a number of organizations -- POGO, OMB Watch, CREW, National Security Archive, the Center for Democracy and Technology and the Open The Government coalition -- and Sunlight sent a letter to Vivek Kundra, Federal CIO, about improvements needed to the release of High Value Datasets on Data.gov. Here are the core recommendations included. Please tell us what you think in the comments below.

As advocates for government openness, we support the Administration’s efforts to provide the public with access to information through Data.gov. We are eager to work with you to ensure the success of Data.gov and, in that spirit, write to raise our concerns with the datasets submitted by agencies to fulfill their requirement under the Open Government Directive to post three high value datasets by January 22, and to offer constructive suggestions for improving their usefulness.

As an overall recommendation, we urge you to add public representatives to the Open Government Initiative interagency working committee and ask the committee to address the problems and recommendations identified below.

Release Format and Usability by the Public

We understand one of the primary purposes of Data.gov is to enable the technology community and transparency advocates to most effectively use the data to make a direct impact on the daily lives of the American people. The format of the data plays a key role in its usability; many within the community of advocates who re-use and repackage government data would prefer data in CSV format, rather than the XML format in which many of the posted databases are provided. Accordingly, we recommend that you strike an appropriate balance between formats (such as XML) that serve the coding community and web-based presentations by agencies that can be used and understood by the general public.
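To illustrate the format question, here is a minimal sketch of how a flat XML dataset might be flattened into CSV. It uses only Python's standard library. The sample data, the `record` element name, and the `xml_records_to_csv` function are hypothetical, invented for illustration; real Data.gov files have their own structure and would likely need more handling.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample; real datasets use their own element names.
SAMPLE_XML = """<records>
  <record><make>Acme</make><model>SafeSeat</model><year>2009</year></record>
  <record><make>Bolt</make><model>RideRight</model></record>
</records>"""


def xml_records_to_csv(xml_text, record_tag="record"):
    """Flatten one level of child elements per record into CSV text."""
    root = ET.fromstring(xml_text)
    rows = [
        {child.tag: (child.text or "").strip() for child in rec}
        for rec in root.iter(record_tag)
    ]
    # Collect column names in the order they are first seen.
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    out = io.StringIO()
    # DictWriter fills fields missing from a row with an empty string.
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


print(xml_records_to_csv(SAMPLE_XML))
```

This only works for simple, table-like XML. Nested or attribute-heavy files would need decisions about how to map the hierarchy onto columns.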

In addition, some of the currently posted files are quite large, ranging upward to several hundred megabytes. Their large size undermines their usefulness for most people or organizations. The large number of currently posted datasets also makes it difficult to find a particular database of interest. We therefore recommend that if a Data.gov dataset is available from an agency through a web-based interface, Data.gov link to that interface on the dataset's Data.gov landing page. For a consumer looking for information on a car seat, for example, it would be far easier to search the Department of Transportation's online database rather than scrolling through screen after screen of raw data in XML format. Additionally, as agencies continue to post datasets to Data.gov, efforts should be made to identify those of greatest public interest that lack such interfaces and develop web interfaces that allow the data to be explored online.

Further, while we agree there is value in aggregating government data in a single site, it is questionable how much the collocation of the currently posted information on Data.gov actually benefits the public. The site is not searchable by topic and does not provide any way to bring together data from different sources on similar topics.

As an enhancement to the organization of the site, we recommend that you use tagging or metadata to enable the public to bring together information on a topic. The thesaurus that USA.gov uses provides a useful example of the needed vocabulary.
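As a rough illustration of what topic tagging could enable, the sketch below builds a simple index from tags to dataset names. The dataset names, tags, and the `build_tag_index` function are all hypothetical; this is not how Data.gov or USA.gov store their metadata.

```python
from collections import defaultdict


def build_tag_index(datasets):
    """Map each lowercased tag to the set of dataset names that carry it."""
    index = defaultdict(set)
    for name, tags in datasets.items():
        for tag in tags:
            index[tag.lower()].add(name)
    return index


# Hypothetical datasets from different agencies sharing a topic tag.
datasets = {
    "DOT child seat ratings": ["Safety", "Transportation"],
    "MSHA mine violations": ["safety", "mining"],
}
index = build_tag_index(datasets)
print(sorted(index["safety"]))
```

With a shared vocabulary, a single lookup on a topic such as "safety" would bring together data from different sources.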

Value of Data

The release of the datasets also has prompted discussions about the value and the quality of the released data, and the additional value provided by access to existing data in a new format. We believe repackaging old information is of marginal value, yet that is what many agencies have done with their recent postings on Data.gov. According to the Sunlight Foundation, of 58 datasets posted by major agencies, only 16 were previously unavailable in some format online. This leaves the impression that agencies posted easily available data, the proverbial low-hanging fruit, rather than seriously considering which of their datasets truly are of high value. While these initial postings can be considered a test run, more attention needs to be directed toward ensuring the overall quality and usefulness of the data.

In addition, sustained attention should be paid to the possibility of making some of the datasets available as feeds that are constantly up to date, rather than as static datasets that are pulled down and then reposted on an occasional basis. We recommend that agencies be required to explain why the data is high value by having them designate which of the “high value criteria” the data meets: information that can be used to increase agency accountability and responsiveness; improve public knowledge of the agency and its operations; further the core mission of the agency; create economic opportunity; or respond to need and demand as identified through public consultation. Similarly, we recommend requiring agencies to indicate whether a high value dataset was previously unavailable, available only with a FOIA request, available only for purchase, or available, but in a less user-friendly format. Going forward, this will make it much easier to track how agencies are complying with the other requirements of the Open Government Directive. While we appreciate the value of data that furthers the mission of an agency, we believe it is equally important to make available to the public data that holds an agency accountable for its policy and spending decisions. We hope to see more datasets of this type available in the near future.

Quality

As is to be expected in efforts of this type, there were a number of glitches--datasets that could not be downloaded or, once downloaded, could not be opened (the Central Contractor Registration FOIA extract from the General Services Administration seems to have caused several users problems). Additionally, some datasets were incomplete (the Hazard Grant Mitigation Program data released by FEMA is missing 23 years of data between 1966 and 1989). Even more troubling, some did not have header rows, and for those that did, their Data.gov pages did not always link to code sheets explaining what those header rows meant. Without this information, the data cannot be used.

We therefore urge the implementation of a responsive feedback mechanism that allows the public to alert an agency that a specific dataset is not working, lacks information, or is missing explanatory material and provides a response to the concerns within a specified time. One way to address this may be to include an agency contact with the ability to resolve any database problems or provide information about the database. The interagency working group could sample the quality of these agency-specific dialogues to ensure that they are having an impact and to develop recommendations on best practices to improve the responsiveness. Additionally, we strongly recommend that all datasets on Data.gov be directly associated with their code sheets.
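One way to picture the code sheet recommendation is a check that every column header in a dataset has an explanation in its code sheet. This is a minimal sketch under stated assumptions: the code sheet is represented as a plain dictionary, and the column names and the `undocumented_columns` function are hypothetical.

```python
import csv
import io


def undocumented_columns(csv_text, code_sheet):
    """Return header names from the first CSV row that the code sheet does not explain."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # empty list if the file has no rows at all
    return [col for col in header if col not in code_sheet]


# Hypothetical dataset extract and code sheet.
sample = "ST,AMT,PRJ_CD\nWV,100,A1\n"
code_sheet = {"ST": "State abbreviation", "AMT": "Grant amount in dollars"}
print(undocumented_columns(sample, code_sheet))
```

The sketch assumes the first row is a header row. It cannot tell a missing header from real data, which is one reason the explanatory material needs to travel with the dataset.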

Finally, we are concerned with the current lack of public notice when data is removed from the site. We respectfully urge you to note all raw tools and data that are removed from Data.gov, and to provide an explanation for their removal.

Many of the concerns outlined above apply across all or many of the agencies’ datasets. Accordingly, we think that standards for handling these types of problems can easily be addressed through the interagency working group and then disseminated amongst the agencies.

Defective by Design?

David Moore at OpenCongress has an excellent post up explaining how the current life of a bill in Congress is riddled with disclosure holes. I can't do more than say, go read David's post. Here are some choice grafs:

The reason is that the “Baucus Bill” is only a “mark”, not yet an official Senate bill, which means (to summarize reductively) that the digital text that constitutes the .pdf does not make its way off internal government web servers to the official website of the Library of Congress, THOMAS — and in turn, does not make its way to government transparency web resources such as GovTrack and OpenCongress. Before that happens, this mark of the health care bill needs to be reconciled with other Senate committee versions of the same, which will then be put forward for consideration to the U.S. Senate as a whole. Health care reform is leading news coverage & blog analysis of American politics right now, this is a major document in the mix, and there’s not a widely-recognized, user-friendly resource for online examination by the public at large. You should have better access to this info! You should have — at your fingertips — immediate, unrestricted digital access to the full text of any piece of legislation the very moment it’s released publicly by Congress.

...

The current Congressional process for publishing data is, to borrow a phrase from the Free Software Foundation, Defective By Design. As we see in many proprietary, top-down systems affecting the public interest, it’s insistently closed-off. Congress’ processes for distributing legislative info is fundamentally broken — it could and should relatively easily be fixed, starting now. Whether or not you support the Baucus markup or the House version of the health care reform bill, we hope you agree that the public has a right to read this important iteration & political volley in the process.

Colbert, Open Secrets, Open Data, and Visualizations

Two nights ago, in his The Word segment, Stephen Colbert actually used his show to do some investigative work into the money-in-politics connections that might have motivated Rep. Luis Gutierrez to turn his position on payday loans from opposing to supporting slight (read: non-existent) restrictions. Watch it:

[Video: The Colbert Report, The Word - Have Your Cake And Eat It, Too (colbertnation.com)]

Of course, I can't help but remind readers that the Center for Responsive Politics recently opened up their data--20 years' worth--to be mashed, mixed, and visualized. Already we are seeing visualizations pop up. Check out these from the University of Michigan.

Also, Sunlight's Chief Evangelist Greg Elin penned a guest column over at ProgrammableWeb about the release of this mountain of data. It's well worth the read.

OpenSecrets Goes OpenData

This is very big news. As of today, the Center for Responsive Politics’ site OpenSecrets.org has gone “open data.” For the first time in its 26-year history, CRP has made its most popular data archives (think campaign financing, lobbying, 527 data, etc.) fully available to the public for download for free. They’ve opened up 200 million (yes, that's the right number!) data records from their archive so that citizens, activists, journalists and anyone else interested in following money in U.S. politics can data dive and rummage around. This means researchers and Web developers can take the standardized and coded money-in-politics data, such as campaign contributions and lobbying expenditures, to create timelines, charts, maps and other illustrations to see more clearly how Washington really works.

Sheila Krumholz, CRP director and long-time colleague, knows that by putting their data into more hands they will put more eyes on Washington. This, in turn, will engage more Americans in their government, and that will fuel change in how our government functions. We agree, which is why Sunlight is providing the financing to make this happen. "All these enhancements to OpenSecrets.org are about one thing: showing more people how money's influence on politics affects their lives--and empowering them to do something about it," Sheila said in a statement.

The data is being released under a Creative Commons license.

All of the staff, for its entire history (full disclosure, I was the ED at CRP from 1984-1997) have worked incredibly hard building the group’s long-earned reputation for accuracy and integrity. And now they are giving the public the keys to take government transparency to the next level. This will have a long-term impact, undoubtedly inspiring many effective and creative uses of the data by civic hackers, journalists and bloggers.

Congratulations to Sheila and her team for this momentous step forward. And congrats to all of us for having the wisdom to use it to further tell the story of the role of money as the fuel that drives our politics.