Bulk Data

 

Regulations.gov Continues to Improve, but Still Has Potential for Growth

Recently, the EPA eRulemaking team released a new version of Regulations.gov, a website that tracks the various stages of the rulemaking processes of hundreds of federal agencies, and collects and publishes comments from the public about this rulemaking. We’ve written about Regulations.gov before, and continue to be impressed with the site’s progress in making the sometimes-daunting intricacies of federal regulations more approachable to members of the general public.

This release brings several new features that further this goal. Styling on many document pages has been significantly improved, making it much easier to read both rule and comment text. The presentation of metadata has also been made cleaner, so researchers can more easily find identifiers that help them connect a particular rule to related documents on other websites, such as FederalRegister.gov or RegInfo.gov. New panes have also been added to help users understand the public participation that has occurred so far in a given rulemaking, and to more easily recognize opportunities for further participation.

Of course, since last year’s release of the Regulations.gov API, Regulations.gov is more than just an informational website; it has also become a data provider that now facilitates a variety of third-party participation and analysis tools, as its Developers page now highlights. One such tool is Sunlight’s recently released Docket Wrench, which uses Regulations.gov data to explore questions of corporate and public influence in the federal regulatory process. Docket Wrench evolved from two years’ worth of effort exploring the possibilities of analysis on federal regulatory comment data, and we believe the time we’ve spent building it has given us a unique perspective on the avenues of research this data makes available, as well as the opportunities for further growth and improvement in regulatory comment data going forward.

The team behind Regulations.gov deserves enormous credit for the progress they’ve made, but there remains much work to be done to give the public a complete, accessible and useful path into the federal regulatory process.


Is the GPO a Digital Printer or a Digital Publisher?

The tension between the Government Printing Office's traditional role as a printing operation and its future as a publisher of digital government information was apparent at a meeting of the House Appropriations Committee's Legislative Branch Subcommittee last week.

In her testimony, acting Public Printer Davita Vance-Cooks stressed the GPO's efforts to transition to the digital age and acknowledged that the agency's role has evolved to that of a publishing operation. Unfortunately, the GPO has often failed to take steps that would allow it to fully embrace that role and ensure its future as an essential source of information.


House Convenes Second Public Meeting on Legislative Bulk Data

On January 30th, the House of Representatives held a public meeting on its efforts to release more legislative information to the public in ways that facilitate its reuse. This was the second meeting hosted by the Bulk Data Task Force where members of the public were included; the task force began meeting privately in September 2012. (Sunlight and others made a presentation at an October meeting on providing bulk access to legislative data.) This public meeting, organized by the Clerk's office, is a welcome manifestation of the consensus of political leaders of both parties in the House that now is the time to push Congress' legislative information sharing technology into the 21st century. In other words, it's time to open up Congress.

The meeting featured three presentations on ongoing initiatives, allowed for robust Q&A, and highlighted improvements expected to be rolled out over the next few months. In addition, the House recorded the presentations and has made the video available to the public. The ongoing initiatives are the release of bill text bulk data by GPO, the addition of committee information to docs.house.gov, and the release of floor summary bulk data. It's expected that these public meetings will continue at least as frequently as once per quarter, or more often when prompted by new releases of information.

As part of the introductory remarks, the House's Deputy Clerk explained that a report had been generated by the Task Force at the end of the 112th Congress on bulk access to legislative data and was submitted to the House Legislative Branch Appropriations Subcommittee. It's likely that the report's recommendations will become public as part of the committee's hearings on the FY 2014 Appropriations Bill, at which time the public should have an opportunity to comment.


Access to Legislation Gets Better, Promise of More to Come

Earlier today, Speaker Boehner, Majority Leader Cantor, and the Government Printing Office announced an improvement in how legislation is made publicly available. Starting in the 113th Congress, GPO will make all bills available for bulk download in XML format. While this doesn't change much from a technological perspective, it does mark a significant change from a policy perspective.


Keeping Authentication Simple

The point of publishing bulk data is so it can be reused as widely as possible. This is particularly true for government data, which belongs to the public.

Government agencies can sometimes also be concerned with ensuring the authenticity of their legal information - especially when the data might be seen as an official source. Authenticity breaks down into two major concerns: integrity (ensuring the text is accurate), and origin (proving it's official). A lot of people are used to the "wax seal" model of authenticity - the experience of opening a PDF and seeing that the document is signed and official. This model quickly breaks down for distributing bulk data.

The goals of ease of reuse and authentication are frequently presented as being in tension, but that tension is just as frequently overstated. There are straightforward approaches to guaranteeing authenticity of bulk data that do not encumber reuse.
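As an illustration of how lightweight this can be, here is a minimal sketch of one common approach: the publisher posts a SHA-256 digest of each bulk file on its official HTTPS site, and anyone who obtains a copy can recompute the digest and compare. This is our assumption about what a simple scheme might look like, not a description of any agency's actual practice; the file name and contents below are made up. The digest comparison addresses integrity, while origin rests on fetching the digest from the official site (or on a digital signature, which this sketch does not cover).

```python
import hashlib
import hmac
import tempfile
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, published_hex):
    """True if the file's digest equals the digest the publisher posted."""
    expected = published_hex.strip().lower()
    return hmac.compare_digest(sha256_of_file(path), expected)


if __name__ == "__main__":
    # Stand-in for a downloaded bulk file; name and contents are hypothetical.
    with tempfile.TemporaryDirectory() as tmp:
        bulk_file = Path(tmp) / "bills.xml"
        bulk_file.write_bytes(b"<bill>example</bill>")

        # Pretend this value was read from the publisher's official HTTPS page.
        published = sha256_of_file(bulk_file)
        print(matches_published_digest(bulk_file, published))

        # Any change to the file, however small, makes the check fail.
        bulk_file.write_bytes(b"<bill>tampered</bill>")
        print(matches_published_digest(bulk_file, published))
```

Nothing here restricts what a re-user may do with the file: copies can be mirrored, transformed, and republished freely, and anyone who wants assurance can check a copy against the published digest.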


Learning how to navigate Congress.gov

The new and much improved location for Congressional information, beta.congress.gov, has plenty of resources to offer users. Now the Library of Congress (LOC) is offering webinars and in-person training to help users navigate the expanding website. We applaud LOC for providing a variety of training opportunities for those seeking a better understanding of the information available.


Looking Forward to the THOMAS Beta Website

In the near future, Congress is expected to release a major upgrade to its aging legislative information website THOMAS. The long-overdue update is part of a much larger effort to "enhance the effectiveness of mission-critical systems," a response to significant public and internal pressure to improve congressional efficiency and transparency. The launch of "THOMAS Beta" is the first step towards developing what the Library of Congress describes as a completely "modern legislative information system" that will replace THOMAS and Congress' more sophisticated internal legislative tracking website "LIS" in FY 2014. Both THOMAS and LIS will stay online alongside the beta website for several years.

While THOMAS Beta has been shown to stakeholders inside Congress, as far as I am aware there has been no formal engagement process with the public to identify specifications, discuss wireframes, or generally make sure the site meets the public's needs. It is expected that such conversations will occur after the launch as the site is built out. My understanding is that the majority of the work on THOMAS Beta thus far has been to modernize the underlying information architecture, with many of the new bells and whistles and apps to be rolled out over time.

Two years ago, the Sunlight Foundation gathered ideas from the community for upgrading THOMAS, and in July 2010 we highlighted three additional ideas, but the primary recommendation continues to be requiring all of the underlying information behind THOMAS to be made available to the public "in bulk." In other words, all of the legislative information behind THOMAS and LIS should be made available in a way that's easy for machines to understand so that developers can more easily and reliably build tools like OpenCongress, GovTrack, the Congress Android App, and Scout that re-use information in clever new ways.

The House leadership has endorsed the idea of bulk access and established a nascent bulk data task force, but not everyone inside Congress is fully on board with the effort. From an external perspective, we have requested that public stakeholders be included on the bulk data task force, which is being coordinated by the House Clerk's office. Along similar lines, for several years we and others have asked the Library of Congress to form an advisory group on THOMAS (as it is responsible for overseeing THOMAS), and we hope the impending launch of THOMAS Beta will make this a reality.

It's important to understand the context in which the THOMAS Beta rolls out. In the last year, the House of Representatives released an innovative legislative information portal, docs.house.gov, which provides bulk access to House data in a way that is more timely than THOMAS, and will soon provide materials from House committees in addition to documents concerning floor proceedings. The House also held three conferences on legislative transparency and created the bulk data task force. In addition, more than 85 organizations will release a declaration on parliamentary openness in Rome this Saturday at the World e-Parliament Conference that endorses providing information in open and structured formats. And the free, open-source parliamentary information system-in-a-box Bungeni is continuing to gain steam around the world.

We are eagerly looking forward to the launch of THOMAS Beta, and will pay particularly close attention to whether the Library of Congress, which has general responsibility for the project, has built a system that uses modern techniques -- such as bulk access and APIs -- to make information available to the public.

Media Spotlight on Congress Stalling Open Access to Legislation

The media's magnifying glass is concentrating attention on actions by the House Appropriations Committee that could stall progress on the public's access to legislative information. The Sunlight Foundation and our allies continue to push Congress to stop dragging its feet and join the 21st century by allowing developers access to open legislative data to build the tools to keep citizens informed about what their government is doing.

Please find and call your Representative at 202-224-3121 or write to reinforce the American public's hunger to read and follow legislation. Here are some excerpts from recent media coverage on this important transparency issue:

Roll Call reports on Republican House leadership's strong support for bulk access and quotes Rep. Crenshaw misunderstanding the issue of authentication:

“The Speaker pledged to make the 112th Congress the most open and transparent Congress in history and to make legislative data available online and in bulk,” said Michael Steel, spokesman for Speaker John Boehner (R-Ohio). “He continues to look for the best way to do that.”

“Facilitating public access to bulk legislative data ... has been and will continue to be a priority for this committee,” echoed Salley Wood, spokeswoman for the House Administration Committee. But lawmakers’ hands would be tied until a task force could be convened and report back on its findings, according to the House report language.

“We wanted to create a system where we could have this available but also make sure we protect the authenticity and integrity of all this information,” said Rep. Ander Crenshaw (R-Fla.), chairman of the Appropriations Subcommittee on the Legislative Branch.

The Washington Examiner addresses the committee's confusion over how citizens use and should access government information:

Folks with computers -- notably, professional and citizen journalists -- would be able to take information about massive numbers of bills and analyze them in myriad ways -- if Congress would allow such information to be downloaded from THOMAS in bulk.

It won't. And, according to a new draft report from the House Appropriations Committee, it won't be allowing bulk data downloads from THOMAS anytime soon.

Instead of taking a step towards greater transparency, the committee got hung up on whether people would know if the data they're seeing on the Internet were accurate and really from Congress -- "authentication," they call it.

FierceGovernment notes the lack of a deadline for decision making:

The report retains language decried by transparency opponents that would indefinitely postpone public bulk downloads of legislative information in XML. Good government groups, including the Sunlight Foundation, have pressed for the Library of Congress to release the bulk data used to track legislative developments in the library's THOMAS website, arguing that they could do a better job of presenting information.

TechPresident reports on the frustration among transparency advocates:

Open government advocates are up in arms over what appears to be another attempt by government bureaucrats to stall the move to enable bulk data downloads of legislative information online.

Slashdot opens the issue for conversation to their community:

The House Appropriations Committee is considering a draft report that would forbid the Library of Congress to allow bulk downloads of bills pending before Congress. The Library of Congress currently has an online database called THOMAS (for Thomas Jefferson) that allows people to look up bills pending before Congress. The problem is that THOMAS is somewhat clunky and it is difficult to extract data from it. This draft report would forbid the Library of Congress from modernizing THOMAS until a task force reports back. I am pretty sure that the majority of people on Slashdot agree that being able to better understand how the various bills being considered by Congress interact would be good for this country.

Legal Informatics also has a nice collection of blog posts on this issue.

Follow the latest developments here.

#FreeTHOMAS

Does information about legislation belong to Congress or to the American people? This basic question is at the heart of a fight over how Congress releases data about what it does. Americans increasingly use the Internet to make sense of the world around them, and open data opens up Congress in a way that's never been possible before.

In the pre-YouTube, pre-iPhone, pre-Amazon days, Congress built a website -- THOMAS -- to let citizens follow legislation from home. THOMAS was revolutionary ... in 1995. But the Internet continued to develop, becoming more sophisticated and interactive, allowing web developers to easily share the data behind their websites with others. It's why we can book flights on Travelocity, check the weather on our phones, and follow legislation on OpenCongress and GovTrack.

Unlike Travelocity and the National Weather Service, Congress doesn't share the data behind THOMAS with anyone. Instead, web developers must reverse-engineer the website to transmute its pages into usable data, like assembling a puzzle from thousands of ragged pieces without a picture on the box as a guide. This slow, difficult, and time-consuming process isn't perfect, but it's responsible for how most Americans follow what's happening in Congress.

The better approach is for Congress to publish the data behind THOMAS. Government regularly does this elsewhere, and "bulk data" is responsible for clever new uses of information developed by citizens, journalists, and even the government itself.
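To make the contrast concrete, here is a rough sketch of the two approaches side by side. The HTML fragment and the JSON record are both invented for illustration; they are not THOMAS's actual markup or any format Congress publishes. The point is only that scraping depends on guesses about page layout, while published data can be read directly.

```python
import json
from html.parser import HTMLParser

# Invented markup: a stand-in for a bill page, not THOMAS's real HTML.
SCRAPED_PAGE = """
<div class="content"><b>H.R. 1</b> <i>Sponsor:</i> Rep. Example
<br>Latest action: Referred to committee</div>
"""

# Invented record: the same facts as they might appear in published bulk data.
BULK_RECORD = (
    '{"bill": "H.R. 1", "sponsor": "Rep. Example", '
    '"latest_action": "Referred to committee"}'
)


class BillPageParser(HTMLParser):
    """Collect the page's text chunks; the caller must guess what each means."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def scrape_bill(html):
    """Rebuild a record from page text by position. Breaks if the layout changes."""
    parser = BillPageParser()
    parser.feed(html)
    parser.close()
    chunks = parser.chunks
    return {
        "bill": chunks[0],
        "sponsor": chunks[2],  # chunks[1] is the "Sponsor:" label
        "latest_action": chunks[3].split(":", 1)[1].strip(),
    }


if __name__ == "__main__":
    scraped = scrape_bill(SCRAPED_PAGE)
    published = json.loads(BULK_RECORD)  # one line, no guessing
    print(scraped == published)
```

If the page's designers reorder two elements or reword a label, the scraper silently returns the wrong fields, while the one-line read of the published record keeps working.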

In upcoming days, the House is likely to pass legislative language that pays lip service to releasing THOMAS data while putting the idea in a deep freeze. This would be a disaster. But it's not too late. Tell your representative that you want Congress to publish legislative data now.

PS. For more information and the latest developments, go here