Library of Congress

 

GPO is Closing Gap on Public Access to Law at JCP's Direction, But Much Work Remains

The GPO's recent electronic publication of all legislation enacted by Congress from 1951 to 2009 is noteworthy for several reasons. It makes available nearly 40 years of lawmaking that wasn't previously available online from any official source, narrowing part of a much larger information gap. It meets one of three long-standing directives from Congress's Joint Committee on Printing regarding public access to important legislative information. And it publishes the information in a way that gives third-party developers a platform for reusing it in clever ways. While more work is still needed to make important legislative information available to the public, this online release is a useful step in the right direction.

Narrowing the Gap

In mid-January 2013, GPO published approximately 32,000 individual documents, along with descriptive metadata, including all bills enacted into law, joint and concurrent resolutions that passed both chambers of Congress, and presidential proclamations from 1951 to 2009. These documents have traditionally been published in print in volumes known as the "Statutes at Large," each of which commonly contains all the materials issued during a calendar year.

The Statutes at Large are an official source for federal laws and concurrent resolutions passed by Congress. They are compilations of "slip laws," bills passed by both chambers of Congress and signed by the President. By contrast, while many people look to the US Code to find the law, many sections of the Code are not, in fact, the "official" law. A special office within the House of Representatives, the Office of the Law Revision Counsel, reorganizes the contents of the slip laws thematically into the 50 titles that make up the US Code. Unless a reorganized title is itself passed by Congress and signed into law by the President, it remains an incredibly helpful but ultimately unofficial source for US law. (Only about half of the titles of the US Code have been enacted by Congress, and thus have become law themselves.) Moreover, if you want to see the intact text of legislation as originally passed by Congress -- before it's broken up and scattered throughout the US Code -- the place to look is the Statutes at Large.

In 2011, GPO published 58 volumes of the Statutes at Large, covering 1951 to 2009, but did not break the volumes down into their constituent documents. Before this latest release, individual public laws were available on THOMAS from 1989 to the present as HTML (and in some instances PDF), and from 1789 to 1875 as unwieldy TIFF image files from the Library of Congress. Even with this recent release, 76 years of federal law are still unavailable online in any format from any official source, and the files the Library of Congress released for 1789 to 1875 are difficult to use.


House Convenes Second Public Meeting on Legislative Bulk Data

On January 30th, the House of Representatives held a public meeting on its efforts to release more legislative information to the public in ways that facilitate its reuse. This was the second Bulk Data Task Force meeting to include members of the public; the task force began meeting privately in September 2012. (Sunlight and others presented at an October meeting on providing bulk access to legislative data.) This public meeting, organized by the Clerk's office, is a welcome sign of the consensus among political leaders of both parties in the House that now is the time to push Congress's legislative information-sharing technology into the 21st century. In other words, it's time to open up Congress.

The meeting featured three presentations on ongoing initiatives, allowed for robust Q&A, and highlighted improvements expected to roll out over the next few months. The House also recorded the presentations and has made the video available to the public. The ongoing initiatives are the release of bill text bulk data by GPO, the addition of committee information to docs.house.gov, and the release of floor summary bulk data. These public meetings are expected to take place at least once per quarter, or more often when prompted by new releases of information.

As part of the introductory remarks, the House's Deputy Clerk explained that at the end of the 112th Congress the Task Force produced a report on bulk access to legislative data and submitted it to the House Legislative Branch Appropriations Subcommittee. The report's recommendations will likely become public as part of the subcommittee's hearings on the FY 2014 appropriations bill, at which time the public should have an opportunity to comment.


Access to Legislation Gets Better, Promise of More to Come

Earlier today, Speaker Boehner, Majority Leader Cantor, and the Government Printing Office announced an improvement in how legislation is made publicly available. Starting in the 113th Congress, GPO will make all bills available for bulk download in XML format. While this doesn't change much from a technological perspective, it marks a significant change from a policy perspective.


Congress launches THOMAS successor Congress.gov

Seventeen years after the creation of THOMAS, Congress today launched a sleeker, more intuitive and user-friendly legislative information website, beta.congress.gov.

What's noticeable about this evolving beta website, besides the major improvements in how people can search and understand legislative developments, is what's still missing: public comment on the design process and computer-friendly bulk access to the underlying data.

We hope that Congress will now deeply engage with the public on the design and specifications process and make sure that legislative information is available in ways that most encourage analysis and reuse.

It's also worth remembering what the Library of Congress said in 1996 as it considered what should be included in its legislative information system:

To be most useful to Members of Congress, the legislative information system must provide access to a wide range of current and historical information, including existing statutes, support agency analyses, academic studies, court decisions, budget and financial data, regulations and executive branch policies, public and private sector analyses, lobby group position papers, and newspaper reports from local, national, and international sources.

We will have more to say as we dig deeper into the website. The Library of Congress' news release is below.

LOC News Announcement on Beta.Congress.gov

Looking Forward to the THOMAS Beta Website

In the near future, Congress is expected to release a major upgrade to its aging legislative information website THOMAS. The long-overdue update is part of a much larger effort to "enhance the effectiveness of mission-critical systems," a response to significant public and internal pressure to improve congressional efficiency and transparency. The launch of "THOMAS Beta" is the first step towards developing what the Library of Congress describes as a completely "modern legislative information system" that will replace THOMAS and Congress' more sophisticated internal legislative tracking website "LIS" in FY 2014. Both THOMAS and LIS will stay online alongside the beta website for several years.

While THOMAS Beta has been shown to stakeholders inside Congress, as far as I am aware there has been no formal engagement process with the public to identify specifications, discuss wireframes, or generally make sure the site meets the public's needs. It is expected that such conversations will occur after the launch as the site is built out. My understanding is that the majority of the work on THOMAS Beta thus far has been to modernize the underlying information architecture, with many of the new bells and whistles and apps to be rolled out over time.

Two years ago, the Sunlight Foundation gathered ideas from the community for upgrading THOMAS, and in July 2010 we highlighted three additional ideas, but our primary recommendation remains that all of the underlying information behind THOMAS be made available to the public "in bulk." In other words, all of the legislative information behind THOMAS and LIS should be made available in a way that's easy for machines to understand, so that developers can more easily and reliably build tools like OpenCongress, GovTrack, the Congress Android App, and Scout that reuse information in clever new ways.

The House leadership has endorsed the idea of bulk access and established a nascent bulk data task force, but not everyone inside Congress is fully on board with the effort. For our part, as outside stakeholders, we have asked that members of the public be included on the bulk data task force, which is being coordinated by the House Clerk's office. Along similar lines, for several years we and others have asked the Library of Congress, which is responsible for overseeing THOMAS, to form an advisory group on THOMAS, and we hope the impending launch of THOMAS Beta will make this a reality.

It's important to understand the context in which the THOMAS Beta rolls out. In the last year, the House of Representatives released an innovative legislative information portal, docs.house.gov, which provides bulk access to House data in a way that is more timely than THOMAS, and will soon provide materials from House committees in addition to documents concerning floor proceedings. The House also held three conferences on legislative transparency and created the bulk data task force. In addition, more than 85 organizations will release a declaration on parliamentary openness in Rome this Saturday at the World e-Parliament Conference that endorses providing information in open and structured formats. And the free, open-source parliamentary information system-in-a-box Bungeni is continuing to gain steam around the world.

We are eagerly looking forward to the launch of THOMAS Beta, and will pay particularly close attention to whether the Library of Congress, which has general responsibility for the project, has built a system that uses modern techniques -- such as bulk access and APIs -- to make information available to the public.

After 578 Days, Where's the Constitution Annotated?

Congress directed 578 days ago that the legal treatise Constitution Annotated be published online, but it's still not available. The Constitution Annotated, aka CONAN, is a continuously updated, 100-year-old congressional report that explains the US Constitution as it has been interpreted by the Supreme Court. With so many important rulings coming out of the High Court, it's important to understand the effect of its decisions on the Constitution.

Here's what Congress, via the Joint Committee on Printing, required in a November 17, 2010 letter:

Update the online edition [of the Constitution Annotated] as frequently as possible, and to create new and improved functions on the CONAN site. The Congress and the public should find this site accessible and user-friendly.

The master file for CONAN is updated frequently and is available on a website accessible only to Congress. (The public version is updated only once a decade and is released in a barely usable format, which is why JCP sent the letter in the first place.) Many organizations have asked that CONAN be published online in its original (XML) format. JCP has directed that it be published online in a timely fashion, but in the less useful PDF format. (It would be fine to publish it in both.)

This shouldn't be a particularly hard project, so we can't help but wonder why there's been such a long delay, and how much longer we'll have to wait. As an interim measure, it may be simplest for Congress to release to the public what it already publishes on its internal website. That should require the technological equivalent of flipping a switch.

In the coming year, CONAN will be due for its once-a-decade print edition. With at least 4,870 statutorily mandated copies, at a guesstimated cost of $226 per copy, the House and Senate will pay over $1.1 million to prepare a document that will go out of date almost immediately. (Even assuming that 60% of the costs are for layout, which is necessary for an online edition as well, that's still $440,000 to print a very heavy doorstop.)
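Those estimates come from two multiplications using the post's own figures; the per-copy cost and the 60% layout share are rough guesses, not official numbers:

```latex
\begin{aligned}
4{,}870 \times \$226 &= \$1{,}100{,}620 \approx \$1.1\ \text{million} \\
(1 - 0.60) \times \$1{,}100{,}620 &= \$440{,}248 \approx \$440{,}000
\end{aligned}
```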

Some of these costs may be avoided by asking Congressional offices whether they prefer a paper version or electronic access, as is the practice with other legislative documents. But the bigger question is: what's taking so long? Is this a sign of bigger problems inside the Library of Congress and GPO? When will this finally be finished?

It looks like we'll have to continue to wait and see.

Media Spotlight on Congress Stalling Open Access to Legislation

The media's magnifying glass is concentrating attention on actions by the House Appropriations Committee that could stall progress on the public's access to legislative information. The Sunlight Foundation and our allies continue to push Congress to stop dragging its feet and join the 21st century by giving developers access to open legislative data, so they can build the tools that keep citizens informed about what their government is doing.

Please find and call your Representative at 202-224-3121, or write, to reinforce the American public's hunger to read and follow legislation. Here are some excerpts from recent media coverage on this important transparency issue:

Roll Call reports on Republican House leadership's strong support for bulk access and quotes Rep. Crenshaw misunderstanding the issue of authentication:

“The Speaker pledged to make the 112th Congress the most open and transparent Congress in history and to make legislative data available online and in bulk,” said Michael Steel, spokesman for Speaker John Boehner (R-Ohio). “He continues to look for the best way to do that.”

“Facilitating public access to bulk legislative data ... has been and will continue to be a priority for this committee,” echoed Salley Wood, spokeswoman for the House Administration Committee. But lawmakers’ hands would be tied until a task force could be convened and report back on its findings, according to the House report language.

“We wanted to create a system where we could have this available but also make sure we protect the authenticity and integrity of all this information,” said Rep. Ander Crenshaw (R-Fla.), chairman of the Appropriations Subcommittee on the Legislative Branch.

The Washington Examiner addresses the committee's confusion over how citizens use and should access government information:

Folks with computers -- notably, professional and citizen journalists -- would be able to take information about massive numbers of bills and analyze them in myriad ways -- if Congress would allow such information to be downloaded from THOMAS in bulk.

It won't. And, according to a new draft report from the House Appropriations Committee, it won't be allowing bulk data downloads from THOMAS anytime soon.

Instead of taking a step towards greater transparency, the committee got hung up on whether people would know if the data they're seeing on the Internet were accurate and really from Congress -- "authentication," they call it.

FierceGovernment notes the lack of a deadline for decision-making:

The report retains language decried by transparency opponents that would indefinitely postpone public bulk downloads of legislative information in XML. Good government groups, including the Sunlight Foundation, have pressed for the Library of Congress to release the bulk data used to track legislative developments in the library's THOMAS website, arguing that they could do a better job of presenting information.

TechPresident reports on the frustration among transparency advocates:

Open government advocates are up in arms over what appears to be another attempt by government bureaucrats to stall the move to enable bulk data downloads of legislative information online.

Slashdot opens the issue for conversation to their community:

The House Appropriations Committee is considering a draft report that would forbid the Library of Congress to allow bulk downloads of bills pending before Congress. The Library of Congress currently has an online database called THOMAS (for Thomas Jefferson) that allows people to look up bills pending before Congress. The problem is that THOMAS is somewhat clunky and it is difficult to extract data from it. This draft report would forbid the Library of Congress from modernizing THOMAS until a task force reports back. I am pretty sure that the majority of people on Slashdot agree that being able to better understand how the various bills being considered by Congress interact would be good for this country.

Legal Informatics also has a nice collection of blog posts on this issue.

Follow the latest developments here.

Bulk Access Language Tweaked by Approps

The House Appropriations Committee has apparently tweaked its report language regarding bulk access to legislative information. The report approved by the Committee has been replaced on its website, but I have a copy of the original. Here's how the final paragraph has changed:

 Accordingly, and before any bulk data downloads of legislative information are authorized, The Committee directs the establishment of a task force composed of staff representatives of the Library of Congress, the Congressional Research Service, the Clerk of the House, the Government Printing Office, and such other congressional offices as may be necessary, to examine these and any additional issues it considers relevant and to report back to the Committee on Appropriations of the House and Senate.

What does this mean? It's responsive to one of the concerns we raised about the language, that "the report language is terribly overbroad: it prohibits the establishment of bulk data downloads of legislative information prior to the reporting back of the task force."  At least, as a matter of law, efforts around bulk access will not be frozen. Given that this restrictive language was inserted in the first place, it remains to be seen whether efforts around bulk data will continue as a matter of practice. (We have some reassurance on this count from the Speaker's Office.)

All the other concerns we raised before remain:

  • Why doesn't the task force include non-governmental participants if its focus is releasing information to the public?

  • When must it report back? There's no deadline for action.

  • Will draft reports be made available to the public for comment? Will meetings be open?

  • Why will the final report only be given to appropriators? It should be available to all members of Congress and to the public as well.

  • How are the issues entrusted to this task force any different from the issues already addressed by the Library of Congress in this 2008 memo? Where are the follow-on reports, contemplated in that memo, that engaged in "an examination of permanence and authentication of legislative data, along with any attendant issues, risks and workload?"

While we'd prefer that the task force be open and transparent, to a large extent it is a red herring. The issues it has been tasked with examining have either been addressed previously or are largely irrelevant. It's important that people continue to call and write their members of Congress.

 

Below is the full text of the revised report language.

During the hearings this year, the Committee heard testimony on the dissemination of congressional information products in Extensible Markup Language (XML) format. XML permits data to be reused and repurposed not only for print output but for conversion into ebooks, mobile web applications, and other forms of content delivery including data mashups and other analytical tools. The Committee has heard requests for the increased dissemination of congressional information via bulk data download from non-governmental groups supporting openness and transparency in the legislative process. While sharing these goals, the Committee is also concerned that Congress maintains the ability to ensure that its legislative data files remain intact and a trusted source once they are removed from the Government's domain to private sites.

The GPO currently ensures the authenticity of the congressional information it disseminates to the public through its Federal Digital System and the Library Congress's THOMAS system by the use of digital signature technology applied to the Portable Document Format (PDF) version of the document, which matches the printed document. The use of this technology attests that the digital version of the document has not been altered since it was authenticated and disseminated by GPO. At this time, only PDF files can be digitally signed in native format for authentication purposes. There currently is no comparable technology for the application and verification of digital signatures on XML documents. While the GPO currently provides bulk data access to information products of the Office of the Federal Register, the limitations on the authenticity and integrity of those data files are clearly spelled out in the user guide that accompanies those files on GPO's Federal Digital System.

The GPO and Congress are moving toward the use of XML as the data standard for legislative information. The House and Senate are creating bills in XML format and are moving toward creating other congressional documents in XML for input to the GPO. At this point, however, the challenge of authenticating downloads of bulk data legislative data files in XML remains unresolved, and there continues to be a range of associated questions and issues: Which Legislative Branch agency would be the provider of bulk data downloads of legislative information in XML, and how would this service be authorized. How would `House' information be differentiated from `Senate' information for the purposes of bulk data downloads in XML? What would be the impact of bulk downloads of legislative data in XML on the timeliness and authoritativeness of congressional information? What would be the estimated timeline for the development of a system of authentication for bulk data downloads of legislative information in XML? What are the projected budgetary impacts of system development and implementation, including potential costs for support that may be required by third party users of legislative bulk data sets in XML, as well as any indirect costs, such as potential requirements for Congress to confirm or invalidate third party analyses of legislative data based on bulk downloads in XML? Are there other data models or alternative that can enhance congressional openness and transparency without relying on bulk data downloads in XML?

The Committee directs the establishment of a task force composed of staff representatives of the Library of Congress, the Congressional Research Service, the Clerk of the House, the Government Printing Office, and such other congressional offices as may be necessary, to examine these and any additional issues it considers relevant and to report back to the Committee on Appropriations of the House and Senate.

#FreeTHOMAS

Does information about legislation belong to Congress or to the American people? This basic question is at the heart of a fight over how Congress releases data about what it does. Americans increasingly use the Internet to make sense of the world around them, and open data opens up Congress in a way that's never been possible before.

In the pre-YouTube, pre-iPhone, pre-Amazon days, Congress built a website -- THOMAS -- to let citizens follow legislation from home. THOMAS was revolutionary ... in 1995. But the Internet continued to develop, becoming more sophisticated and interactive, allowing web developers to easily share the data behind their websites with others. It's why we can book flights on Travelocity, check the weather on our phones, and follow legislation on OpenCongress and GovTrack.

Unlike Travelocity and the National Weather Service, Congress doesn't share the data behind THOMAS with anyone. Instead, web developers must reverse-engineer the website to transmute its pages into usable data, like assembling a puzzle from thousands of ragged pieces without a picture on the box as a guide. This slow, difficult, and time-consuming process isn't perfect, but it's responsible for how most Americans follow what's happening in Congress.
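To make the puzzle analogy concrete, here is a minimal Python sketch contrasting the two approaches. Both the page fragment and the XML record are hypothetical stand-ins, not THOMAS's actual markup or any official bill schema, and the helper names are made up for illustration. The scraper has to guess which text is the title from layout (here, "the first italic text"), while the structured record labels each field by name.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from typing import Optional

# Hypothetical page fragment: nothing labels the title as a title, so a
# scraper has to guess from layout (here, "the first italic text").
HTML_PAGE = """
<html><body>
<p><b>H.R.1234</b> -- <i>To improve public access to legislative data.</i></p>
<p>Sponsor: Rep. Example</p>
</body></html>
"""

# Hypothetical structured record for the same bill: every field is named.
XML_RECORD = """
<bill>
  <number>H.R.1234</number>
  <title>To improve public access to legislative data.</title>
  <sponsor>Rep. Example</sponsor>
</bill>
"""


class FirstItalicScraper(HTMLParser):
    """Reverse-engineering guess: treat the first <i> run as the bill title."""

    def __init__(self) -> None:
        super().__init__()
        self.in_italic = False
        self.title: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag == "i":
            self.in_italic = True

    def handle_endtag(self, tag):
        if tag == "i":
            self.in_italic = False

    def handle_data(self, data):
        # Keep only the first italic text; this breaks if the layout changes.
        if self.in_italic and self.title is None:
            self.title = data.strip()


def title_from_html(page: str) -> Optional[str]:
    scraper = FirstItalicScraper()
    scraper.feed(page)
    scraper.close()
    return scraper.title


def title_from_xml(record: str) -> Optional[str]:
    # With structured data, the field is looked up by name, not by position.
    return ET.fromstring(record.strip()).findtext("title")


if __name__ == "__main__":
    print(title_from_html(HTML_PAGE))
    print(title_from_xml(XML_RECORD))
```

If the page were redesigned so that some other text is italicized first, `title_from_html` would silently return the wrong field, while `title_from_xml` would be unaffected. That fragility is one reason scraped data "isn't perfect."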

The better approach is for Congress to publish the data behind THOMAS. Government regularly does this elsewhere, and "bulk data" is responsible for clever new uses of information developed by citizens, journalists, and even the government itself.

In the coming days, the House is likely to pass legislative language that pays lip service to releasing THOMAS data while putting the idea in a deep freeze. This would be a disaster. But it's not too late. Tell your representative that you want Congress to publish legislative data now.

PS. For more information and the latest developments, go here