Committee on House Administration

 

GPO is Closing Gap on Public Access to Law at JCP's Direction, But Much Work Remains

The GPO's recent electronic publication of all legislation enacted by Congress from 1951-2009 is noteworthy for several reasons. It makes available nearly 40 years of lawmaking that wasn't previously available online from any official source, narrowing part of a much larger information gap. It meets one of three long-standing directives from Congress's Joint Committee on Printing regarding public access to important legislative information. And it publishes the information in a way that gives third-party providers a platform for putting it to clever use. While more work is still needed to make important legislative information available to the public, this online release is a useful step in the right direction.

Narrowing the Gap

In mid-January 2013, GPO published approximately 32,000 individual documents, along with descriptive metadata, including all bills enacted into law, joint and concurrent resolutions that passed both chambers of Congress, and presidential proclamations from 1951-2009. The documents have traditionally been published in print in volumes known as the "Statutes at Large," which commonly contain all the materials issued during a calendar year.

The Statutes at Large are an official source for federal laws and concurrent resolutions passed by Congress. The Statutes at Large are compilations of "slip laws," bills enacted by both chambers of Congress and signed by the President. By contrast, while many people look to the US Code to find the law, many sections of the Code are not actually the "official" law. A special office within the House of Representatives reorganizes the contents of the slip laws thematically into the 50 titles that make up the US Code, but unless that reorganized document (the US Code) is itself passed by Congress and signed into law by the President, it remains an incredibly helpful but ultimately unofficial source for US law. (Only half of the titles of the US Code have been enacted by Congress, and thus have become law themselves.) Moreover, if you want to see the intact text of the legislation as originally passed by Congress -- before it's broken up and scattered throughout the US Code -- the place to look is the Statutes at Large.

In 2011, GPO published 58 volumes of the Statutes at Large, covering 1951-2009, but did not break the volumes down into their constituent documents. Up until that point, the public laws were available as individual documents on THOMAS from 1989 to present as HTML (and PDF in some instances), and from 1789 to 1875 as TIFF (unwieldy image) files from the Library of Congress. Even with this recent release, 76 years of federal law are still unavailable online in any format from any official source; and the files released for the years 1789 to 1875 by the Library of Congress are difficult to use.


House Convenes Second Public Meeting on Legislative Bulk Data

On January 30th, the House of Representatives held a public meeting on its efforts to release more legislative information to the public in ways that facilitate its reuse. This was the second meeting hosted by the Bulk Data Task Force where members of the public were included; it began privately meeting in September 2012. (Sunlight and others made a presentation at a meeting, in October, on providing bulk access to legislative data.) This public meeting, organized by the Clerk's office, is a welcome manifestation of the consensus of political leaders of both parties in the House that now is the time to push Congress' legislative information sharing technology into the 21st century. In other words, it's time to open up Congress.

The meeting featured three presentations on ongoing initiatives, allowed for robust Q&A, and highlighted improvements expected to be rolled out over the next few months. In addition, the House recorded the presentations and has made the video available to the public. The ongoing initiatives are the release of bill text bulk data by GPO, the addition of committee information to docs.house.gov, and the release of floor summary bulk data. It's expected that these public meetings will continue at least as frequently as once per quarter, or more often when prompted by new releases of information.

As part of the introductory remarks, the House's Deputy Clerk explained that a report had been generated by the Task Force at the end of the 112th Congress on bulk access to legislative data and was submitted to the House Legislative Branch Appropriations Subcommittee. It's likely that the report's recommendations will become public as part of the committee's hearings on the FY 2014 Appropriations Bill, at which time the public should have an opportunity to comment.


Access to Legislation Gets Better, Promise of More to Come

Earlier today, Speaker Boehner, Majority Leader Cantor, and the Government Printing Office announced an improvement in how legislation is made publicly available. Starting in the 113th Congress, GPO will make all bills available for bulk download in XML format. While this doesn't change much from a technological perspective, it does mark a significant change from a policy perspective.


Looking for the "Constitution Annotated" on Constitution Day

It's been 225 years since the signing of the U.S. Constitution in September 1787, so the three years that have elapsed since we first asked the Library of Congress to publish the invaluable legal treatise Constitution Annotated online in a machine-readable format are little more than 1.3% of the age of our country. And the 670 days (i.e. 1 year and 10 months) that have flown by since Congress directed the Constitution Annotated be published online as it is updated, along with two other "vital legislative and legal documents," are but a brief flicker in geological terms. But in political terms, another congressional session is about to pass without the Library of Congress and GPO making good on their obligation to provide this important document to the American people.

I've run out of clever ways to say this, especially with so many others saying the same thing, but here goes. The Constitution Annotated is an important legal treatise that provides an easily understandable exploration of how Supreme Court decisions interpret the U.S. Constitution. It's already published on Congress' internal website as it is updated, and it should be published online in the same way. At a minimum, the Library and GPO should meet their obligation to do as Congress directed: publish these documents online "as quickly as possible." An informed public is the cornerstone of our democracy, and they should have this information readily available to them.

Looking Forward to the THOMAS Beta Website

In the near future, Congress is expected to release a major upgrade to its aging legislative information website THOMAS. The long-overdue update is part of a much larger effort to "enhance the effectiveness of mission-critical systems," a response to significant public and internal pressure to improve congressional efficiency and transparency. The launch of "THOMAS Beta" is the first step towards developing what the Library of Congress describes as a completely "modern legislative information system" that will replace THOMAS and Congress' more sophisticated internal legislative tracking website "LIS" in FY 2014. Both THOMAS and LIS will stay online alongside the beta website for several years.

While THOMAS Beta has been shown to stakeholders inside Congress, as far as I am aware there has been no formal engagement process with the public to identify specifications, discuss wireframes, or generally make sure the site meets the public's needs. It is expected that such conversations will occur after the launch as the site is built out. My understanding is that the majority of the work on THOMAS Beta thus far has been to modernize the underlying information architecture, with many of the new bells and whistles and apps to be rolled out over time.

Two years ago, the Sunlight Foundation gathered ideas from the community for upgrading THOMAS, and in July 2010 we highlighted three additional ideas, but the primary recommendation continues to be requiring all of the underlying information behind THOMAS to be made available to the public "in bulk." In other words, all of the legislative information behind THOMAS and LIS should be made available in a way that's easy for machines to understand so that developers can more easily and reliably build tools like OpenCongress, GovTrack, the Congress Android App, and Scout that re-use information in clever new ways.

The House leadership has endorsed the idea of bulk access and established a nascent bulk data task force, but not everyone inside Congress is fully on board with the effort. From an external perspective, we have requested that public stakeholders be included on the bulk data task force, which is being coordinated by the House Clerk's office. Along similar lines, for several years we and others have asked the Library of Congress to form an advisory group on THOMAS (as it is responsible for overseeing THOMAS), and we hope the impending launch of THOMAS Beta will make this a reality.

It's important to understand the context in which the THOMAS Beta rolls out. In the last year, the House of Representatives released an innovative legislative information portal, docs.house.gov, which provides bulk access to House data in a way that is more timely than THOMAS, and will soon provide materials from House committees in addition to documents concerning floor proceedings. The House also held three conferences on legislative transparency and created the bulk data task force. In addition, more than 85 organizations will release a declaration on parliamentary openness in Rome this Saturday at the World e-Parliament Conference that endorses providing information in open and structured formats. And the free, open-source parliamentary information system-in-a-box Bungeni is continuing to gain steam around the world.

We are eagerly looking forward to the launch of THOMAS Beta, and will pay particularly close attention to whether the Library of Congress, which has general responsibility for the project, has built a system that uses modern techniques -- such as bulk access and APIs -- to make information available to the public.

Agency Report Transparency Bill Delayed For Technical Fixes

Markup of a bill to make agency reports to Congress transparent did not occur as planned on Thursday after the measure was pulled to allow technical improvements. The Committee on House Administration was set to review the Access to Congressionally Mandated Reports Act, which if enacted would require that reports from agencies to Congress be available all together on a single website, thereby improving transparency and facilitating congressional oversight.

Committee Chairman Dan Lungren explained the delay:

We had originally planned to consider a fifth bill, H.R. 1974. However, some of our colleagues identified some issues with it that they wanted additional time to work out. After consulting with Representative Quigley, the sponsor, they asked that we remove it from today’s schedule and we have done so.

The scheduling of the legislation for markup is a strong indicator of support for the measure by committee Republicans. Committee Democrats released a statement Thursday afternoon:

House Administration Democrats support the bill, but we wanted to make some technical improvements to the bill prior to marking it up. We hope to see continued action on this bill in the coming weeks.

The legislation, introduced by Rep. Mike Quigley, enjoys broad bipartisan support and already has been favorably reported out of the Committee on Oversight and Government Reform. It also enjoys broad support from the transparency community.

We hope that the measure will be ready for consideration by the full House in the near future. A companion measure is pending before a Senate committee.

Agency Report Transparency Bill Set for Markup Tomorrow

Tomorrow the Access to Congressionally Mandated Reports Act will get its turn in the spotlight. The legislation, which would require reports from agencies to Congress be available online on a single website, is set for a mark-up before the Committee on House Administration. The bipartisan bill was already favorably reported by the Committee on Oversight and Government Reform in June 2011, but must pass another hurdle before going to the House floor. It enjoys widespread support from members of the transparency community.

The bill fixes a problem that has bedeviled Congress and watchdogs for years. Federal agencies are required to submit reports to Congress, but they often fail to do so, and even reports that have been submitted often cannot be found on agency websites or congressional webpages. This makes oversight incredibly difficult.

ACMRA solves these problems by requiring that all congressionally mandated reports be sent to GPO, which would then publish them online on a single website. (The House Clerk already compiles a master list of the reports that must be filed.) Centralized publication would make the reports easy to find -- and it would become a trivial task to identify when agencies have failed to file on time. The reports must be submitted in open formats and can be downloaded in bulk, so they are easy to open and analyze. In limited circumstances, some of the contents of the reports can be redacted for national security or other reasons, but only if the redaction is permissible under FOIA. It's also worth noting that GPO says that the costs of implementing the legislation are not significant and would be borne by the agency.

Rep. Mike Quigley introduced the legislation and spearheaded efforts to get it enacted in the House. He is now joined by 17 co-sponsors. Senator Lieberman introduced a companion measure in the Senate, which is cosponsored by Senators Collins and Coburn.

A favorable report by the Committee on House Administration could set the stage for quick passage in the House and a hearing in the Senate. CHA has supported a number of other open government measures, so it is hoped that the legislation will meet quick approval.

Two Steps Forward on Improving Public Access to Legislative Information

As I wrote yesterday, each day seems to bring a small step forward on improving public access to legislative information, with two notable developments today.

First, Rep. Honda gave a tantalizing hint of progress on bulk access to legislative data at this morning's subcommittee markup of the Legislative Branch Appropriations bill (sorry no video). He said that "there is exhaustive discussion on bulk data downloads in the [sub]committee report." It's not clear exactly what this means -- the subcommittee report won't be made available to the public until the full committee markup, which is tentatively scheduled in two weeks -- but it's an indication that public attention has joined with bipartisan support from appropriators, overseers, and leadership to make progress on making legislative information available to the American people.

From what I've heard, the pushback is coming largely from the support agencies, although the nature of those concerns is not clear. With the Law Library of Congress taking the lead on THOMAS in recent years, including making some small but useful changes to the site, there is hope that they will grow into their role as facilitators of online transparency. All along, the public interest community has been asking for bulk access to THOMAS data and the creation of an advisory committee on THOMAS.

Second, the objections raised by legislative support agencies are not particularly weighty, at least according to a 2008 memo from the Library of Congress to the Committee on House Administration regarding the availability of THOMAS data. As far as I'm aware, this is the first time it's been made accessible to the public. What's notable is how the Library of Congress was technologically positioned to deliver on legislative data transparency four years ago, but apparently did not move forward. At a minimum, it should alleviate concerns about the difficulty of technological implementation.

According to the memo, the Library expected to finish developing an XML database containing bill metadata such as bill summaries, status of bills, and information on co-sponsors four years ago (in May 2008). What's revealing about this is that much of the information about legislation has been available in a structured database for nearly half a decade -- and in the kind of format that developers need.

Moreover, the Library reports that "the resources will be available to copy the database daily into an Anonymous File Transfer Protocol [FTP] site so it is accessible to the public" by the time the LIS 2.0 database is completed. This would allow the data to be made available in bulk. (There are better ways to do so, but this is an acceptable solution.)

Also at the time of the memo, March 2008, full text of bills and committee reports were available on GPO Access, but not in XML. From what we can tell, nearly all bills are now available in XML, although it is unclear whether committee reports are prepared in XML. All of this could also be made available in bulk using the technology described in the memo.
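As a rough illustration of why a structured format matters to developers, here is a minimal Python sketch that reads bill metadata of the kind the memo describes (status, summary, co-sponsors) from an XML record. The element names, attribute names, and sample values are assumptions invented for this example; they are not the schema used by the Library of Congress or GPO.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: element and attribute names are assumptions,
# not the actual schema used by the Library of Congress or GPO.
SAMPLE = """
<bills>
  <bill number="H.R. 1974">
    <status>Reported by committee</status>
    <summary>Requires agency reports to Congress to be posted on one website.</summary>
    <cosponsors>
      <cosponsor>Member A</cosponsor>
      <cosponsor>Member B</cosponsor>
    </cosponsors>
  </bill>
</bills>
"""

def load_bills(xml_text):
    """Parse the hypothetical bulk file into a list of plain dictionaries."""
    root = ET.fromstring(xml_text.strip())
    bills = []
    for bill in root.findall("bill"):
        bills.append({
            "number": bill.get("number"),
            "status": bill.findtext("status"),
            "summary": bill.findtext("summary"),
            "cosponsors": [c.text for c in bill.findall("cosponsors/cosponsor")],
        })
    return bills

bills = load_bills(SAMPLE)
for bill in bills:
    print(bill["number"], "|", bill["status"], "|", len(bill["cosponsors"]), "cosponsors")
```

Because every field is labeled in the data itself, a tool built this way does not depend on how a web page happens to be laid out, which is the practical argument for publishing the database in bulk.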

The memo raises one major policy implication concerning who owns the data, contemplating that it belongs to the House, Senate, Congressional Research Service, and Government Printing Office. In the literal sense, that's backwards: the information is owned by the American people and held in trust by Congress and its legislative agencies. These entities do serve as repositories of the information, however, and deserve consideration as to the technological means by which it is made available. That consideration comes with the understanding that these entities should strive to meet the public's need for the information and expansively follow the policies set by Congress in favor of transparency.

We'll continue to keep a close eye on how all this develops.

Library of Congress letter to Committee on House Administration on THOMAS

Appropriators Should Consider Public Access to Leg Info at Friday Mark-up

Public access to legislative information could get a boost this Friday at a House subcommittee hearing. The Legislative Branch Appropriations subcommittee will be marking up Congress' budget for FY 2013, which will present the opportunity to require that the data behind THOMAS be made available to the public in a better format.

Why does this matter? Simply speaking, our democracy is founded upon an informed public acting through its elected officials to make policy. THOMAS makes this possible, but its limitations make it difficult.

Developers and programmers have worked to overcome THOMAS's limitations. They have created websites like OpenCongress and GovTrack.us that together have nearly twice as many visitors as THOMAS, built mobile device apps like Sunlight's "Congress" Android app that's been downloaded 400,000 times, and integrated the data into news coverage (like at the New York Times) and special purpose sites like WashingtonWatch.com.

Unfortunately, weaknesses in how THOMAS makes the data available limit what can be accomplished by even the most talented developer. No one expects THOMAS to do everything, but it suffers from basic problems. Its web page addresses break after 15 minutes, it doesn't provide redlines of bills, you can't get alerts when legislation is moving, and it does a poor job of integrating relevant legislative data. There's a laundry list of improvements here. In addition, there are other tasks that shouldn't be done by THOMAS, but should exist... whether as simple as connecting relevant CRS reports to legislation or as dynamic as adding an interactive social media layer.

These are examples of the benefits of opening up the data that drives THOMAS. Beneath the 1990s web interface is an up-to-date database of bills, bill status information, legislative summaries, and much more. Releasing the data in a developer-friendly format (i.e. structured data made available in bulk) would empower innovators to improve upon the services THOMAS provides, and to go in entirely new directions, all at no cost to the public.

When the THOMAS website went live on January 5, 1995, it was the result of a bipartisan effort to grant "citizens across the country and around the world ... access, via the Internet, to congressional information." THOMAS significantly improved how legislative information was made available online -- it provided additional materials in a centralized location, and did not charge the public for access -- with a pledge that over time "enhancement[s] will be made to THOMAS to upgrade its features."

While citizens around the world gained access to some congressional information, enhancements to THOMAS's capabilities have been limited in scope. Its limitations kindled a desire in users to be able to build their own tools to make use of legislative data. These efforts have been severely hampered because THOMAS doesn't give the public access to its underlying database, instead releasing its information piecemeal through thousands of webpages.

This challenge was partially overcome by technologists like Josh Tauberer, who in 2004 launched GovTrack.us, which he describes in his great new book Open Government Data as "one of the first websites world-wide to offer comprehensive parliamentary tracking for free and with the intention to be used by everyday citizens." But there's a catch. The unstructured way the THOMAS data was released required him to find some way to gather and organize the data.

He turned to screen scraping, which involves "programmatically loading up web pages, looking at their HTML source, and extracting information using simple pattern matching." Jim Harper at Washington Watch, which tracks bills and government spending, also uses screen scraping. They've run into similar problems: screen scrapers don't catch all the data, they're a pain to build, they easily break, and can suffer from a time lag. All of this could easily be fixed by publicly releasing the structured database behind THOMAS.
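To make the fragility concrete, here is a minimal Python sketch of screen scraping by simple pattern matching. The HTML fragments are hypothetical and are not actual THOMAS markup; the point is only that a pattern tied to one page layout silently stops working when the layout changes.

```python
import re

# Hypothetical HTML fragments for illustration; not actual THOMAS markup.
PAGE_V1 = "<b>H.R. 1974</b>: Access to Congressionally Mandated Reports Act<br>"
# The same information after an imagined redesign of the page.
PAGE_V2 = "<strong>H.R. 1974</strong> - Access to Congressionally Mandated Reports Act<br>"

# A simple pattern tied to the exact markup of PAGE_V1.
BILL_PATTERN = re.compile(r"<b>(H\.R\. \d+)</b>: (.+?)<br>")

def scrape_bill(html):
    """Return (bill_number, title), or None if the pattern does not match."""
    match = BILL_PATTERN.search(html)
    if match is None:
        return None
    return match.group(1), match.group(2)

result_v1 = scrape_bill(PAGE_V1)  # matches the layout the scraper was written for
result_v2 = scrape_bill(PAGE_V2)  # the redesigned layout no longer matches
print(result_v1)
print(result_v2)
```

The first call returns the bill number and title, while the second returns nothing even though the page still contains the same facts. A scraper author has to notice the change and rewrite the pattern, which is one source of the breakage and time lag described above.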

In fact, releasing the database -- often referred to as providing "bulk access to data" -- is a longstanding open data principle that has been called for by many people over the years.

In May 2007, a coalition of organizations and experts released the Open House Report, which recommended (among other things) the creation of a "Legislation Database."

"Congress should make available to the public a well-supported database of all bill status and summary information currently accessible through the Library of Congress. This database, as well as its supporting files, should be in a structured, non-proprietary format such as XML. "

This recommendation was embraced by Representative Mike Honda, then Chairman of the House Legislative Branch Appropriations Subcommittee. In November of 2007, a committee staffer asked the Library of Congress "to report back on solutions to provide raw legislative data to the public, as well as the resources required to accomplish this." No such report has been released by the Library to the public.

Around the same time, legislative language was inserted into an explanatory statement accompanying the Omnibus Appropriations Act of 2009 (P.L. 111-8) that declared "There is support for enhancing public access to legislative documents, bill status, summary information, and other legislative data through more direct methods such as bulk data downloads and other means of no-charge digital access to legislative databases."

This direct endorsement of bulk access to legislative data did not yield measurable results from the Library of Congress, which is responsible for the THOMAS database. Nor did the myriad meetings, phone calls, and letters from congressional staff to the Library.

Over time, there has been a shift of responsibility for THOMAS to the Law Library from other parts of the Library of Congress, as announced in their January 5, 2010 holiday newsletter. Although the newsletter raised hopes that the "analysis of the system's functionality and content based on user feedback" would lead to improvements in access to the underlying data, no movement on this issue was forthcoming. Even so, the public and members of Congress have continued to press forward on the issue.

For example, in May 2010, I had the opportunity to testify on behalf of Sunlight before the House Legislative Appropriations subcommittee. We called on Congress to:

Grant the public access to legislative documents, bill status and summary information, and other legislative data no later than 120 days after the start of FY 2012. We also ask for the immediate creation of an advisory committee, composed of relevant legislative agency employees and members of the public, that will meet regularly to address the public's need for access to this information, and the means by which it is provided.

In September 2010, Rep. Foster introduced legislation to improve public access to THOMAS. The bill would have provided bulk access to bill summary and other THOMAS data, created an advisory committee to make recommendations on improving THOMAS, and urged the Library to work towards providing bulk access to the full text of the legislation. The session ended before there was an opportunity for action.

Even though the 112th Congress brought a change in leadership in the House, bipartisan interest in making this information available to the public continued. Indeed, over the years appropriators, overseers, and leadership have pushed the ball forward. In June of 2011, the Committee on House Administration held a hearing on making congressional documents available electronically as a transparency and cost-savings measure. One of the panelists, Cornell's Tom Bruce, advocated that the House focus on providing legislative data in bulk and in a timely fashion.

In December, Reps. Cantor and Hoyer co-hosted a Congressional Hackathon, which brought together nearly 300 developers and policy wonks to discuss how to use technology to make the legislative branch more open. Out of that meeting came three action items, the first of which was "providing legislative data in a bulk format to enable third-party developers to create more dynamic interfaces for legislative information."

By the middle of the month, the Committee on House Administration set forth standards for the electronic posting of House and committee documents and data. In January, the House launched a groundbreaking transparency portal. It provides a one-stop website where the public can access all House bills, amendments, resolutions for floor consideration, and conference reports in XML, as well as information on floor proceedings and more. Information will ultimately be published online in real time and archived in perpetuity. So far, only documents considered by the full House are available online, but it's expected that Committee documents will be available by the beginning of 2013.

The House transparency portal is a tremendous breakthrough, but it does have significant limitations. Because it came online in 2012, it doesn't capture the historical information contained in the THOMAS database. As a House resource, it doesn't have Senate records. And it doesn't contain bill summaries, related bills, and other information prepared by the Library of Congress and GPO that are made available through THOMAS. These limitations can be overcome in time, and the portal clearly points the way to the future, especially if the Library of Congress doesn't act.

On February 2, the House held a full-day Legislative Data and Transparency Conference, which brought together nearly all of the key players in making congressional information available to the public. On behalf of Sunlight, I delivered a talk on benchmarks for measuring success for legislative data transparency, which clearly included a call for THOMAS data to be made available in bulk. Surprisingly, the Library of Congress' representative, when directly asked about THOMAS, indicated the issue wasn't even on the radar. Three days later, the Sunlight Foundation submitted comments to the House Legislative Branch Appropriations Committee on the importance of making legislative data available to the public, as did Josh Tauberer and OpenCongress.

By April, a coalition of 30 organizations wrote a letter to legislators asking Congress to provide bulk access to THOMAS and create an advisory body. Part of the letter reads as follows:

We estimate that for every person that goes directly to the THOMAS website, at least two people visit a third-party website. But even these sites must rely on legislative information generated and maintained by Congress, which is only available through the difficult-to-use THOMAS website. There will always be a need for a congressionally-mandated website, but Congress should ensure that the innovative and transformative uses of legislative information by third parties is grounded upon accurate and timely data. And that means providing bulk access to everyone.

So here we are in May. The three best legislative opportunities to require bulk access to THOMAS this legislative year, in increasing order of difficulty, are in the Leg Branch Approps Subcommittee mark-up on Friday, the full committee mark-up, and in the final vote on the House floor. (The Senate also provides an opportunity, but the House traditionally has led on these issues.)

It's time to fulfill the promise of citizen access to legislative information. Congress should require bulk access to THOMAS legislative data no later than 120 days after passage of the appropriations bill, and create an advisory committee that meets regularly to look at public access to legislative information and is composed of people inside and outside of government. It would make information that's already required to be publicly available much more useful to everyone, and impose (at most) a minimal cost.

THOMAS was created by Congress to make legislative information freely available to the public, but the Library has not kept up with best practices. Congress should break the logjam and keep the promise of making free legislative information available to everyone in a way that encourages the public to make the most of it.