by Daniel Schuman and Eric Mill
House Appropriators may deal a tremendous blow to prospects for improving public access to legislative information. In a draft report expected to accompany the Legislative Branch Appropriations Bill for 2013, scheduled for a full committee vote tomorrow, appropriators misunderstand how data can be "authenticated," and kick responsibility for improving public access to legislative data to a non-public task force with no set reporting date. Unless corrected, this draft report represents a tremendous step backward for transparency, and fails to seriously grapple with the history of efforts to free legislative information for widespread public use.
Legislative Information Should Be Widely Disseminated
The purpose of THOMAS is to bring legislative information to as many people as possible; preservation and authentication are tasks THOMAS was never intended to perform, and they are best handled through other long-established methods. The lack of authenticity of THOMAS data does not present a problem for most users. Rather, the largest problem with THOMAS is that the data is not provided in a form that can be easily copied, placing a significant burden on citizens who wish to make sophisticated use of the information. The THOMAS website directly provides nearly a million users each month with an "inauthentic" version of information about legislative activities, a practice that will continue unabated under the draft committee report. While THOMAS often links to a GPO document that is "authenticated," its display of bill text, legislative summaries, cosponsor data, and other information is not certified as being correct, and often changes because of the Library's errors in how it publishes the data.
To the extent that THOMAS information should be authentic, the report does not engage with best practices for authenticating data on the Internet. The authenticity of data can be verified securely and reliably using metadata external to the data itself. In fact, this is precisely how GPO's FDSys currently authenticates XML documents of the US government, including its legislation, regulations, and laws. GPO accompanies each document it publishes with a "PREMIS" metadata file that includes the information needed to cryptographically verify the document's authenticity. For example, here's the PREMIS file accompanying HR 6289. Worries about authenticity are a red herring.
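To make the idea of external verification concrete, here is a minimal sketch in Python. It assumes the separately published metadata carries a SHA-256 digest of the document as a hex string; the function names and the sample bill text are hypothetical, and the sketch does not parse a real PREMIS file. Anyone holding a copy of the document, from any source, can recompute the digest and compare it with the published one.

```python
import hashlib
import hmac


def sha256_hex(document: bytes) -> str:
    """Return the SHA-256 digest of the document as lowercase hex."""
    return hashlib.sha256(document).hexdigest()


def matches_published_digest(document: bytes, published_hex: str) -> bool:
    """Check a copy of a document against a digest published separately.

    Assumption: published_hex is a SHA-256 hex string taken from a
    metadata file obtained from the official publisher.
    """
    computed = sha256_hex(document)
    # Normalize whitespace and case before comparing the two hex strings.
    return hmac.compare_digest(computed, published_hex.strip().lower())


# Hypothetical stand-in for a bill downloaded in bulk from a third party.
bill_text = b"<bill><title>An example bill</title></bill>"

# In practice this value would be read from the publisher's metadata file;
# here it is computed locally only so the example is self-contained.
published = sha256_hex(bill_text)

print(matches_published_digest(bill_text, published))         # True
print(matches_published_digest(bill_text + b" ", published))  # False
```

A matching digest only shows that the copy is identical to what the digest describes. The trust comes from obtaining the digest itself from the official source, which is why the data can be copied and redistributed freely without losing the ability to verify it.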
Bulk Access is a Separate Question from Authenticity
Bulk access to THOMAS data is a simpler and less controversial step than this draft report contemplates. The underlying information is already publicly available on the THOMAS website, and third parties are already scraping the data from the site to make it available in bulk. It simply makes sense for the Library to meet the needs of the public directly by providing the data in bulk itself. Doing so merely opens up another avenue of access to information that is already being released. It would also eliminate any errors introduced by the scraping process.
The Draft Report Creates a Secret, Never-Ending Process
The draft report would require the establishment of a task force to examine and report back on a number of issues raised in the report regarding bulk access to legislative data. This approach is seriously flawed in several major ways.
First, bulk access is about granting the public better access to legislative information. It stands to reason that the public should be included in all discussions. However, the proposed task force does not include any non-governmental participants. A number of individuals and organizations are expert in these matters, and should be full participants.
Second, the draft report imposes no deadline for a report from the task force. The last time Appropriators required a task force on a similar matter, four years ago, it never reached any conclusions or reported back. Without a deadline, the same will happen here.
Third, the task force's report should be provided to the public as well as to the committee at the time it is completed. Draft reports should also be made available for public comment.
Fourth, the report language is terribly overbroad: it prohibits the establishment of bulk data downloads of legislative information prior to the reporting back of the task force. Making use of modern technology to provide information in better ways should be something that is encouraged, not prohibited. Information is already being provided to the public in bulk regarding certain legislative activity. Would this report language stop the GPO from providing bulk access to the Congressional Record, as it does now? Would it prohibit the House of Representatives from providing bulk access through its innovative docs.house.gov portal? If so, that would be a disaster for transparency.
Finally, the idea of a task force to assess these questions ignores that these issues were already addressed by the Library of Congress in a 2008 memo. The memo explained that the XML database containing bill metadata was expected to be ready for release in bulk by May 2008. It also stated that "CRS... will continue to identify and analyze ... the following policy matters for the Committee's consideration," including "data accuracy" and "data permanence and authentication." Where are the results of CRS's analysis? What is the strategic plan for THOMAS referenced in the memo? Where is the promised study that would engage in "an examination of permanence and authentication of legislative data, along with any attendant issues, risks and workload"?
Simply put, the draft committee report's establishment of a task force is another recipe for delay. We saw this four years ago, the last time the Library was pressed to make improvements on this issue. The time is long past for action, and the Appropriations Committee will be judged on whether it makes another plan to make a plan, or whether it establishes real deadlines for progress. THOMAS itself was created in a matter of months when the Speaker of the House decided it was a priority. Bulk access to legislative data will also come about when legislators decide that being transparent is more important than establishing a task force to talk about it.