As I wrote yesterday, each day seems to bring a small step forward in improving public access to legislative information, and today brought two notable developments.
First, Rep. Honda gave a tantalizing hint of progress on bulk access to legislative data at this morning's subcommittee markup of the Legislative Branch Appropriations bill (sorry, no video). He said that "there is exhaustive discussion on bulk data downloads in the [sub]committee report." It's not clear exactly what this means, since the subcommittee report won't be made public until the full committee markup, which is tentatively scheduled for two weeks from now. Still, it's an indication that public attention, combined with bipartisan support from appropriators, overseers, and leadership, is helping to move legislative information into the hands of the American people.
From what I've heard, the pushback is coming largely from the support agencies, although the nature of their concerns is not clear. With the Law Library of Congress taking the lead on THOMAS in recent years, including making some small but useful changes to the site, there is hope that it will grow into a role as a facilitator of online transparency. All along, the public interest community has been asking for bulk access to THOMAS data and the creation of an advisory committee on THOMAS.
Second, the objections raised by the legislative support agencies are not particularly weighty, at least judging by a 2008 memo from the Library of Congress to the Committee on House Administration regarding the availability of THOMAS data. As far as I'm aware, this is the first time the memo has been made public. What's notable is that the Library of Congress was technologically positioned to deliver on legislative data transparency four years ago but apparently did not move forward. At a minimum, the memo should alleviate concerns about the difficulty of the technical implementation.
According to the memo, the Library expected to finish developing an XML database of bill metadata, such as bill summaries, bill status, and co-sponsor information, by May 2008. What's revealing is that much of the information about legislation has been available in a structured database for nearly half a decade, and in the kind of format that developers need.
Moreover, the Library reports that "the resources will be available to copy the database daily into an Anonymous File Transfer Protocol [FTP] site so it is accessible to the public" by the time the LIS 2.0 database is completed. That would allow the data to be made available in bulk. (There are better ways to do this, but it is an acceptable solution.)
At the time of the memo, March 2008, the full text of bills and committee reports was available on GPO Access, but not in XML. From what we can tell, nearly all bills are now available in XML, although it is unclear whether committee reports are prepared in XML. All of this could also be made available in bulk using the technology described in the memo.
The memo raises one major policy question: who owns the data. It contemplates that the data belongs to the House, Senate, Congressional Research Service, and Government Printing Office. In a fundamental sense, that's backwards: the information is owned by the American people and held in trust by Congress and its legislative agencies. These entities do serve as repositories of the information, however, and deserve a say in the technological means by which it is made available. That deference comes with the understanding that they should strive to meet the public's need for the information and follow Congress's pro-transparency policies expansively.
We'll continue to keep a close eye on how all this develops.
Library of Congress letter to Committee on House Administration on THOMAS