Sunlight Foundation

Will the House's Leg Spending Bill Match Its Transparency Priorities?

In the last 18 months, the House of Representatives has made significant strides towards greater openness and transparency in congressional deliberations, but significant work remains. The Legislative Branch Appropriations Bill for 2013, which was marked up by a subcommittee last week, presents a major vehicle for the House leadership to make good on its promise to implement common-sense transparency measures this session.

While there are many issues that can be addressed in a number of different ways, Sunlight will be looking at the full committee markup to see if the bill:

-- Provides bulk access to THOMAS data
-- Fully funds the Office of Congressional Ethics
-- Requires publication of CRS reports online
-- Publishes the Constitution Annotated online as it's updated in XML
-- Reinstates the Office of Technology Assessment
-- Makes reports to Congress available online
-- Publishes House spending information in an appropriate format for the data

Improve Public Access to THOMAS Data

THOMAS was created by Congress to make legislative information freely available to the public, but the Library of Congress has not kept up with best practices. One such practice -- "bulk access" -- would ease the development of new tools and technologies by publishing THOMAS data files online, promoting accurate and timely information dissemination. Congress has expressed its support for bulk data as have many organizations, but the Library continues to stall despite a 2008 memo describing how easy it would be to implement.

At the recent legislative subcommittee hearing, Rep. Honda mentioned that text has been inserted into the committee's report that would in some way address the bulk data question. The last time this happened, the language was watered down sufficiently so that the Library of Congress successfully evaded its obligations over the last half a decade. We hope the bill will contain these two provisions:

(1) Congress directs the Library of Congress to implement bulk access to THOMAS within 120 days of passage.

(2) Congress directs the Library of Congress to immediately create an advisory committee on improving public access to legislative information that is composed of people inside and outside of government.

Fully Fund the Office of Congressional Ethics

The Office of Congressional Ethics is the House of Representatives' independent ethics watchdog. It came into existence in March 2008 after a series of corruption scandals prompted congressional leaders to explore creating a transparent, outside enforcement entity. While OCE is not as robust as originally contemplated, it plays a crucial role in ethics oversight. Last year, the office survived a counterproductive effort by nearly 100 members of Congress to significantly reduce its funding. This year's appropriations bill maintains OCE's funding at $1,548,000, which is the same level as last year. We believe that OCE should be strengthened, but at a minimum its funding should be sustained at this level.

Publish CRS Reports Online

Congressional Research Service reports undergird the public's understanding of Congress, but CRS no longer directly releases the reports to the public. As a consequence, while many reports used by citizens, courts, and government employees are on the internet, they are often out-of-date, and a fair number are available only for a fee or not at all. By comparison, sister agencies like CBO and GAO regularly publish reports online. For more than a decade, organizations and members of Congress have urged that CRS reports be publicly available, and CRS concerns have been refuted by a former counsel to the House of Representatives. The reports are already digitized and available on Congress's intranet; it would take a trivial effort to publish them online.

During the markup of the 2012 Appropriations Bill, Rep. Leonard Lance introduced an amendment that would have required the Clerk of the House and the Secretary of the Senate to maintain a website containing CRS Reports and Appropriation products while protecting confidential advice from CRS. Similar legislation has been introduced by Rep. Quigley. We hope that House Appropriators will move to make these reports more readily available to the public.

Release the Constitution Annotated Online

The Constitution Annotated (or CONAN) is a continuously-updated 100-year-old legal treatise that explains the Constitution as it has been interpreted by the Supreme Court. Maintained by CRS and printed by GPO, a hard copy is published (and put online) only once a decade, with printed updates every two years. However, CONAN is updated frequently, with those updates available on Congress' internal website. In November 2010 (18 months ago), the Joint Committee on Printing directed that the continuously-updated version of CONAN be made available online as a searchable PDF, but it still is not. Many organizations have asked that the underlying document be published online in its original (XML) format, which is more user friendly than a PDF, and would take minimal effort to release.

This upcoming year, the Constitution Annotated will be up for its once-a-decade print edition. With at least 4,870 statutorily mandated copies, at an estimated cost of $226 each, the House and Senate will pay over $1.1 million for a document that will go out of date almost immediately. We suggest that some of these costs may be recouped by asking House offices if they wish to receive a print copy, as a continuously updated web version is already made available to all congressional offices. Regardless, we urge that the web version that is already made available to congressional offices also be made available to the American people in its web-friendly format. While publishing the document as a PDF would be a small step forward, the best use of taxpayer dollars to maximize usability would be to publish it in XML, the format in which it is prepared.
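The cost figure can be checked with a line of arithmetic. This is only a rough sketch, and it assumes the $226 estimate is a per-copy cost; the variable names are ours, for illustration.

```python
# Back-of-the-envelope check of the print-run cost cited above.
# Assumption: the $226 estimate is the cost of one printed copy.
MANDATED_COPIES = 4_870    # statutorily mandated minimum, per the text
COST_PER_COPY_USD = 226    # estimated cost, assumed to be per copy

total_cost_usd = MANDATED_COPIES * COST_PER_COPY_USD
print(f"Estimated print run: ${total_cost_usd:,}")
```

Under that assumption the total comes out just above $1.1 million, which is consistent with the figure in the text.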

Other Provisions

Sunlight supports additional measures in the Legislative Branch Appropriations bill. Those provisions include:

The reinstatement of the Office of Technology Assessment, as proposed by Rep. Rush Holt last year. OTA provided Congress with the “means for securing competent, unbiased information concerning the physical, biological, economic, social, and political effects” of technology (http://sunlightfoundation.com/blog/taxonomy/term/Office-of-Technology-Assessment/).

Inclusion of the Access to Congressionally Mandated Reports Act, which would gather together all reports to Congress from federal agencies in one place. It requires that they be published online by GPO in bulk, in open formats, and in a timely fashion, so that people can easily learn about the work of the federal government. The legislation would not require any additional appropriation, and would bring much needed transparency and coordination. It has already passed the Committee on Oversight and Government Reform, was introduced in the Senate, and is awaiting action by the House.

Avoiding cuts to funding for the House of Representatives and certain legislative support agencies below the levels in the subcommittee proposal. Funding for the House has already diminished by at least 10% over the last two years. This raises the concern that congressional staff may become more susceptible to influence from lobbyists, and that support entities (like GPO, the Clerk, and the Library of Congress) that have transparency roles will be less able to fulfill their missions.

Publishing the House Expenditure Reports in a data-friendly format such as CSV. The quarterly reports contain all spending by the House of Representatives, and are currently published online as a PDF. Starting in 2009, then-Speaker Pelosi began publishing House Expenditure Reports online, which was a significant step forward in making them available, as they had previously been published only in giant books. Unfortunately, publishing columns of data in a PDF does not allow for the data to be analyzed. Simply put, we're only halfway to House spending transparency. The Sunlight Foundation goes through significant effort to scrape the data from the PDFs and put them into spreadsheets, but this should really be done by the House. It would increase accuracy and timeliness -- and so long as the House releases the information, it should do so in the most useful way possible.
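To show why a format like CSV matters, here is a minimal sketch of the kind of analysis it makes trivial. The column names and the sample rows are invented for illustration; they are not taken from the actual expenditure reports.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data: the columns and values are invented for this
# sketch and do not reflect the real House Expenditure Reports.
SAMPLE_CSV = """office,category,amount
Office A,Travel,1200.50
Office A,Supplies,300.00
Office B,Travel,950.25
"""

def total_by_office(csv_text):
    """Sum the 'amount' column for each value in the 'office' column."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["office"]] += float(row["amount"])
    return dict(totals)

totals = total_by_office(SAMPLE_CSV)
for office, amount in sorted(totals.items()):
    print(f"{office}: ${amount:,.2f}")
```

With a PDF, the same question first requires scraping the columns back out of the page layout, which is where errors and delays creep in.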

Two Steps Forward on Improving Public Access to Legislative Information

As I wrote yesterday, each day seems to bring a small step forward on improving public access to legislative information, with two notable developments today.

First, Rep. Honda gave a tantalizing hint of progress on bulk access to legislative data at this morning's subcommittee markup of the Legislative Branch Appropriations bill (sorry, no video). He said that "there is exhaustive discussion on bulk data downloads in the [sub]committee report." It's not clear exactly what this means -- the subcommittee report won't be made available to the public until the full committee markup, which is tentatively scheduled for two weeks from now -- but it's an indication that public attention has joined with bipartisan support from appropriators, overseers, and leadership to make progress on making legislative information available to the American people.

From what I've heard, the pushback is coming largely from the support agencies, although the nature of those concerns is not clear. With the Law Library of Congress taking the lead on THOMAS in recent years, including making some small but useful changes to the site, there is hope that they will grow into their role as facilitators of online transparency. All along, the public interest community has been asking for bulk access to THOMAS data and the creation of an advisory committee on THOMAS.

Second, the objections raised by legislative support agencies are not particularly weighty, at least according to a 2008 memo from the Library of Congress to the Committee on House Administration regarding the availability of THOMAS data. As far as I'm aware, this is the first time it's been made accessible to the public. What's notable is how the Library of Congress was technologically positioned to deliver on legislative data transparency four years ago, but apparently did not move forward. At a minimum, it should alleviate concerns about the difficulty of technological implementation.

According to the memo, the Library expected to finish developing an XML database containing bill metadata such as bill summaries, status of bills, and information on co-sponsors four years ago (in May 2008). What's revealing about this is that much of the information about legislation has been available in a structured database for nearly half a decade -- and in the kind of format that developers need.
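To illustrate why structured XML is "the kind of format that developers need," here is a minimal sketch of reading one bill record. The element and attribute names are invented for this example; the memo does not describe the actual schema of the Library's database.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: the element names, attributes, and values are
# invented for illustration, not taken from the Library's real schema.
SAMPLE_XML = """
<bill id="hr-0000" congress="112">
  <status date="2012-01-15">Referred to committee</status>
  <summary>Placeholder summary text.</summary>
  <cosponsors>
    <cosponsor>Member One</cosponsor>
    <cosponsor>Member Two</cosponsor>
  </cosponsors>
</bill>
"""

def parse_bill(xml_text):
    """Pull summary, status, and co-sponsors out of one bill record."""
    root = ET.fromstring(xml_text.strip())
    status = root.find("status")
    return {
        "id": root.get("id"),
        "status": status.text,
        "status_date": status.get("date"),
        "summary": root.findtext("summary"),
        "cosponsors": [c.text for c in root.iter("cosponsor")],
    }

bill = parse_bill(SAMPLE_XML)
print(bill["id"], "-", bill["status"], "-", len(bill["cosponsors"]), "co-sponsors")
```

Because every field is labeled, a program can pick out exactly what it needs without guessing at page layout, which is what scraping a website requires.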

Moreover, the Library reports that "the resources will be available to copy the database daily into an Anonymous File Transfer Protocol [FTP] site so it is accessible to the public" by the time the LIS 2.0 database is completed. This would allow the data to be made available in bulk. (There are better ways to do so, but this is an acceptable solution.)

Also at the time of the memo, March 2008, the full text of bills and committee reports was available on GPO Access, but not in XML. From what we can tell, nearly all bills are now available in XML, although it is unclear whether committee reports are prepared in XML. All of this could also be made available in bulk using the technology described in the memo.

The memo raises one major policy implication concerning who owns the data, contemplating that it belongs to the House, Senate, Congressional Research Service, and Government Printing Office. In the literal sense, that's backwards: the information is owned by the American people and held in trust by Congress and its legislative agencies. These entities do serve as repositories of the information, however, and deserve consideration as to the technological means by which it is made available. That consideration comes with the understanding that these entities should strive to meet the public's need for the information and expansively follow the policies set by Congress in favor of transparency.

We'll continue to keep a close eye on how all this develops.

Library of Congress letter to Committee on House Administration on THOMAS

Improve Public Access to Legislative Information

Today 30 organizations from across the political spectrum joined together to ask Congress to improve public access to legislative information. Our joint letters to congressional appropriators and rulemakers urge Congress to direct that the THOMAS legislative database be published online and to establish an advisory committee on further improvements.

THOMAS, Congress' legislative information website that provides basic information about legislative and congressional actions, has fallen far behind the needs of its users. Many have turned to important websites like GovTrack, OpenCongress, and WashingtonWatch to monitor congressional activities.

These sites and others, which repackage and add important context to legislative activities, extract data from the THOMAS website through a painstaking and often brittle process. To make this process easier and more reliable, the Library of Congress should publish THOMAS information "in bulk," which makes the entire legislative database available for download at once, instead of publishing information in such a way that it can only be gathered by scraping data from hundreds or thousands of webpages.

Bulk access to legislative information is already common practice inside and outside the government. For example,

The transparency community, technology innovators, journalists, good government organizations, and private companies have long sought bulk access to legislative data. In May 2007, a coalition of organizations called on Congress to "embrace structured data by publishing the status of legislation and other information to the web ... in structured data formats". In 2009, Congress articulated support for bulk access to legislative data in an explanatory statement accompanying an appropriations bill. And in November 2011 one of the action items emerging from the House's Congressional Facebook Hackathon was an endorsement of releasing "structured machine-readable legislative data ... in a bulk format."

This past year the Sunlight Foundation, GovTrack, and OpenCongress submitted testimony to House Appropriators calling for bulk access to legislative information. We applaud the major strides made by the House of Representatives in improving public access to the House's legislative information, but what's missing is the kind of information only available through the THOMAS website. This includes bill summaries, bill status information, bill co-sponsors, and other information that provides important context for legislation.

We estimate that for every person who goes directly to the THOMAS website, at least two people visit a third-party website. But even these sites must rely on legislative information generated and maintained by Congress, which is only available through the difficult-to-use THOMAS website. There will always be a need for a congressionally-mandated website, but Congress should ensure that the innovative and transformative uses of legislative information by third parties are grounded upon accurate and timely data. And that means providing bulk access to everyone.

Organizations encourage rulemakers to publish THOMAS legislative information in bulk

Organizations urge appropriators to publish THOMAS data in bulk

Tell Congress to open up

Making sure that people can get information about what our government is doing is the heart of what we do at Sunlight. And right now, there’s a chance to make some big changes.

A committee in Congress is working on an appropriations bill that could make it easier to find out what Congress is doing by changing how the Library of Congress releases information through its THOMAS website. They’re writing the bill as we speak (er, type...), so this is a perfect moment to speak up for greater transparency.

Why we need quality information from the Library of Congress

Currently, the only way we can get to know about legislation and see any action taken on a bill is through a website operated by the Library of Congress -- known as THOMAS. Because THOMAS is not easy to use for ordinary folks, a few tech groups including OpenCongress, GovTrack and PopVox have built tools to make the process of reading government legislation online much easier. However, extracting information from THOMAS is no walk in the park. This is because the information has to be collected from thousands of pages, and the process can be glitchy and delayed.

We need Congress to change that. They can do this by requiring the Library of Congress to put the legislative data in THOMAS online using a “geek” favorite process known as “bulk access.” This process makes accessing online information simpler, faster and easier. And really, all the cool kids in government are doing it these days. Literally hundreds of thousands of data sets are available on Data.gov, the House of Representatives has a spiffy new transparency portal and even the good ol’ Government Printing Office has gotten into the act. Bulk access means that the public gets reliable information right when they need it -- immediately. And legislative information, what Congress is doing and actions on bills, pretty obviously falls into the category of information the public needs to be especially accurate and available immediately.

Topics like the release of quality government information online aren’t something members of Congress are used to hearing about from their constituents, but that’s why it’s so important that we take action. Every call that we make will be that much more impactful, and knowing that constituents are paying attention will go a long way toward making sure that Congress does the right thing by increasing transparency.

Four people in Congress have the ability to make this change right now. They are the chairmen and ranking members of this committee: Rep. Ander Crenshaw (FL-4), Rep. Mike Honda (CA-15), Sen. Ben Nelson (NE) and Sen. John Hoeven (ND). And they need to hear from us.

If any of these four are your representatives, call them! Call 1-888-793-9786 and enter your ZIP code to be connected to their offices.

If they’re not, that’s okay -- this is an issue that affects all of us, and we need to spread the word. So go ahead and contact them, but do it in a way that lets other people in their districts hear about it -- online.

Click on any of the links below to tweet at them or post on their Facebook pages that Americans deserve to know what our government is doing. Make sure to tell them this bill -- the legislative branch appropriations bill -- needs to do two things:

  • Require the Library of Congress to implement bulk access to THOMAS
  • Create an advisory committee of people both inside and outside government to make sure we have the best public access to legislative information possible

Write on Sen. Ben Nelson's Facebook wall: http://www.facebook.com/senatorbennelson

Or tweet at him:

Write on Sen. John Hoeven's Facebook wall: http://www.facebook.com/SenatorJohnHoeven

(Sen. Hoeven doesn't appear to have a Twitter account)

Write on Rep. Ander Crenshaw's Facebook page: http://www.facebook.com/pages/Congressman-Ander-Crenshaw/

Or tweet at him:

Write on Rep. Mike Honda's Facebook page: http://www.facebook.com/RepMikeHonda

Or tweet at him:

Partners in Data Transparency: Parliaments and Non-Profits

This week I participated in an international meeting on "Achieving Greater Transparency in Legislatures through the Use of Open Document Standards." It was co-hosted by the United Nations, the U.S. House of Representatives, and the Inter-Parliamentary Union, and included representatives from 16 parliaments, non-governmental representatives, multilateral organizations, and academia. It is impossible to recapitulate all the conversations that took place, but presentations are (or will be) available online here and video will be available online as well.

I was struck by the candor of the participants, the breadth of the undertakings by the various parliaments, and the apparently sincere desire of many parliaments to learn from each other and from the non-governmental community. For my part, I made a presentation on the state of legislative transparency in the American context, with a focus on principles to evaluate whether electronically-stored government data is being properly made available for public use, followed by an examination of first steps that parliaments can take to increase public access to legislative information. The full text of my remarks is available below.

Access to Parliamentary Information and Open Data Standards

Put THOMAS on the Fast Track

Earlier this week, appropriators held a hearing on funding for the legislative agencies that make government information available to the public. Three members of the open government community, the Sunlight Foundation, the Participatory Politics Foundation, and Josh Tauberer, filed comments on the importance of making legislative information directly available to the public as a downloadable database, instead of item-by-item, which is the current practice.

The Sunlight Foundation testified on this topic (a.k.a. "bulk access") last year, and has sketched out some interesting new tools that it could empower. But of course, one major use would be to strengthen the already fantastic services available at OpenCongress and GovTrack, while supporting additional innovations.

Progress on bulk access has been slow. Several years ago, the Congress required the Library of Congress and others to examine the issue, but these agencies have dragged their heels and -- as far as we know -- have failed to finish that analysis. Sunlight's comments are available below.

Sunlight Foundation Bulk Access to THOMAS Testimony Leg Approps 2012-02-06

Video Blackout of Hearing on Budgets for Legislative Support Agencies

This Tuesday, there will be a hearing on budgets for the Library of Congress, the Government Printing Office, the Government Accountability Office, and the Congressional Budget Office. It's too bad that the public won't have a real opportunity to learn about these important agencies, as the meeting is not expected to be webcast by the committee, and (if I remember correctly) the hearing room is so tiny that few if any members of the public will be able to attend.

That's too bad, especially because this is the first opportunity to hear firsthand how last year's budget cuts have affected agencies' abilities to do their jobs, and learn about agency and congressional priorities for the upcoming year. It's also the first time we'll hear from the new acting Public Printer (the head of GPO); and perhaps the newly appointed head of the Congressional Research Service will be presented and introduced by the Librarian of Congress.

Only the House and Senate Legislative Appropriations Committees regularly hold annual public hearings on the workings of these agencies; the oversight committees (Committee on House Administration and Senate Rules) generally do not, and the Joint Committee on the Library and Joint Committee on Printing no longer hold substantive meetings in public.

The new House rules require that all committees provide "audio and video coverage of each hearing or meeting" that "allows the public to easily listen ... and view the proceedings" "to the maximum extent practicable." All of the House committees have at least one hearing room that is equipped with a camera, and the House Recording Studio will provide a camera upon a committee's request. Unfortunately, this hearing is being held in a room without a camera, and I've been informed that the Committee has not requested one. The Appropriations Committee has not scheduled any other hearings for Tuesday, so the room with the pre-positioned camera should be available.

We ran into this problem last year, when the Committee's justification for holding the meeting in the same tiny, camera-less room (HT-2) was that it was more convenient to hold the hearing in the Capitol than in one of the legislative buildings. Even if convenience were more important than the public access rule, the House Recording Studio could still provide a camera, and there are rooms in the newly constructed $600+ million Capitol Visitor Center (i.e. in the Capitol) that already have cameras installed. We would send a video crew ourselves, but only organizations accredited by the House Radio-Television Correspondents' Gallery can ask permission from the Committee to record the event, and the Sunlight Foundation doesn't qualify for membership.

Another change from last year is that members of the public are not invited to speak at the hearing, although they may submit written comments. Along with several others, I took the opportunity to speak last year, where I called for bulk access to THOMAS data and public access to CRS reports. I will submit comments for the record, but written comments are much less effective than speaking directly to the Members of Congress. It's too bad, especially because one of the major lessons of last Thursday's House Legislative Data and Transparency Conference is that the Library of Congress and GPO have apparently been ignoring their legal obligation to make progress on public access to bulk data. Ironically, it was this very Committee that imposed the obligation upon them in the first place, 3 years ago.

As with everything in Congress, things could still change for Tuesday's hearing -- its time, date, location, and whether it will be webcast or covered by the media. I plan on attending, and if I can make it into the room, I'll post an update.

Benchmarks for Measuring Success for Legislative Data Transparency

The following are my notes for remarks I delivered at the House Legislative Data and Transparency Conference on February 2, 2012. They've been updated to include hyperlinks, but were delivered largely as written. The official page for the conference, with video, is here.

Thank you to Matt Lira and Steve Dwyer for the introduction, and to the House of Representatives for holding such an important and timely conference. This kind of event has been a long time in coming.

I must acknowledge the excellent panels that have been happening all day. And I would be remiss if I didn't commend the Committee on House Administration for adopting "standards for the electronic posting of house and committee documents and data," which are already transforming the House in a very positive way.

Because I'm limited to 10 minutes, let me briefly commend three documents to all of you which lay out a transparency vision in greater breadth and detail than is possible here. They are the Open House Project Report, the Ten Principles for Opening Up Government Data, and the report from the Congressional Facebook Hackathon.

I've been asked to speak about benchmarks for measuring success in making legislative data available online. I feel like a kid in a candy store, but I will try to restrain myself. When I speak about the House, please construe my remarks as applying to the Senate and the legislative support agencies as well.

 

What is Transparency For?

In determining benchmarks, it's incumbent on us to assess, at least briefly: what good is online transparency anyway? Here's how I see transparency adding value to our political process. It provides relevant information to decisionmakers at the time they need it. It levels the playing field between the special interests and everyone else so we all have an equal opportunity to find out what's going on. It lets the American people and their elected representatives have a solid basis for a conversation about priorities. It helps Congress work more efficiently, by eliminating redundancies and identifying bottlenecks. It allows the agencies to better understand what they're supposed to do. It helps businesses make money by improving their ability to predict government actions. And most importantly, transparency is the cornerstone of a democracy.

This is all pretty ethereal, so I'll get to the point. To the maximum extent possible, legislative information must be available online, in real time, and in machine-readable formats. With the exception of internal deliberations protected by the Speech or Debate Clause, or national security and some personnel matters, the Congress's business is the people's business. So let me break down this formulation of online, in real time, and in machine-readable formats into concrete benchmarks.

 

Online Publication

Publishing information online is a major hurdle in and of itself. A lot of information isn't online, but instead is only available if you know the right person, or go to the right room and ask for a hardcopy, and so on. Should you have to know someone on staff to get a copy of the chairman's mark on a bill before it's voted on? Do we really want to make people trudge down to the House's legislative resource center to print out documents at 10 cents a page? It certainly cannot make any sense to have to request a CRS report through your representative or pay 20 bucks online to buy a copy.

Almost as bad as the failure to publish online is secrecy through obscurity. If information is locked inside an image file and not susceptible to a search engine, or is in an entirely random location, or is hidden on page 400 of the Congressional Record, it's not really helpful to anyone.

In addition, old information can be just as important as newly created information. For example, there's a huge gap in the availability of committee reports. Along the same lines, while ignorance of the law is no defense for a crime, the actual enactment of the law, known as the Statutes at Large, is not available online for a nearly 80-year period.

Let me offer some concrete benchmarks by which we can judge improvements on this.

  1. The House of Representatives should conduct an audit of all the different types of information it produces and releases, including whether it's online, and where it can be found.

  2. To the extent the House (or legislative support agencies) has information that is already in electronic format -- from the documents in the Clerk's office to CRS reports to hearing transcripts -- that information should be put online in whatever format it's currently in. It's also worth considering whether legislative data should include sometimes-released items like Dear Colleagues and Whip notices. We can worry later about improving how this information is made available, but just to start, put them online.

 

Real-time publication

Moving on, let's now talk about real-time publication. This is the kind of idea that makes a lot of people uncomfortable, but I'd suggest a common-sense starting point: think about the time frame and context in which a document is used. An amendment that's going to be voted on in 2 hours needs to be online just as soon as it's drafted. A bill that's going to be voted on in 2 legislative days needs to go up pretty quickly as well. You should know about a committee hearing a week in advance. Other items, like the House disbursement reports, can take a little longer.

Don't get me wrong. The goal should be real-time publication for everything. But the evaluation of what that means in the short term can be context dependent. And that context changes if the document is originally created in digital format -- in that circumstance, there shouldn't be any wait.

Here are some benchmarks:

  1. All committee reports, amendments, and bills should be available online as they are introduced. The House should monitor the lag time between introduction and when they appear on THOMAS or the committee websites. I've done this, and it can be a while before some bills show up. Evaluate the extent of the problem, and work to reduce it.

  2. All hearing notices should be available online 7 days prior to the hearing.

  3. Many committees are skirting House rules about publishing video of hearings. House appropriators are particularly guilty of this. The House should review whether meetings are being held in rooms where video capability exists natively or could be added through use of the House's video service, and pester the committees if they're opting out of recording. When only one meeting in a particular committee is going on at a time, it should be streamed online so long as it is open to the public. It's time to review behavior and start slapping some wrists. Perhaps the House should create a mechanism for the public to report on non-webcast hearings.
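
The lag-time monitoring suggested in the first benchmark could start as something like the Python sketch below. The record layout, bill identifiers, and timestamps are all invented for illustration; none of them come from THOMAS or any House system.

```python
from datetime import datetime
from statistics import median

# Hypothetical records: (bill identifier, time introduced, time first seen online).
# All values are invented for illustration.
RECORDS = [
    ("hr-0001", "2012-02-01T10:00", "2012-02-01T16:00"),
    ("hr-0002", "2012-02-01T11:00", "2012-02-03T11:00"),
    ("hr-0003", "2012-02-02T09:00", "2012-02-02T21:00"),
]

def lag_hours(introduced, published):
    """Hours between introduction and first online appearance."""
    fmt = "%Y-%m-%dT%H:%M"
    delta = datetime.strptime(published, fmt) - datetime.strptime(introduced, fmt)
    return delta.total_seconds() / 3600

def summarize(records):
    """Return the median and worst-case publication lag, in hours."""
    lags = [lag_hours(intro, pub) for _, intro, pub in records]
    return {"median_hours": median(lags), "max_hours": max(lags)}

print(summarize(RECORDS))
```

Tracking the median alongside the worst case shows both the typical delay and the outliers that most need attention.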

 

Machine Readability

So let's move on to discuss machine-readable formats. This is what really allows the idea of House of Representatives as a platform for democracy to succeed.

The biggest wish of many staffers is to be able to dynamically see how an amendment would modify a bill and how that bill would change the law (and eventually how an agency would promulgate a regulation, how the courts would interpret that regulation, and back to Congress again). Along the same lines, people looking at a bill want to know if there are other, similar bills, in this Congress or in previous ones, whether there are committee reports, CRS and GAO evaluations, and so on. If you cannot find a way to tie this information together, this dream becomes impossible.

Legislative data needs to be released as highly structured data. In other words, a machine needs to be able to look at the content and "know" what it is looking at. This would require the use of languages like XML, which allows this kind of value-added context. But to make it work, we also need a way to uniquely describe people and bills and amendments and so on -- cleverly enough embodied in commonly-accepted unique identifiers. There are already tons of these identifiers being used, but the House needs to consistently and widely employ them.
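
To make this concrete, here is a minimal Python sketch of what a machine can do once a document carries structure and identifiers. The XML fragment, its element and attribute names, and every identifier value are hypothetical; this is not an actual House schema.

```python
import xml.etree.ElementTree as ET

# A made-up bill fragment. Element names, attribute names, and
# identifier values are hypothetical, not a real House schema.
BILL_XML = """
<bill id="hr-0001-112">
  <sponsor member-id="M000001">Rep. Example</sponsor>
  <section id="s1">
    <amends law-id="law-0042">Section 3 of the Example Act is amended.</amends>
  </section>
</bill>
"""

def extract_links(xml_text):
    """Pull out the identifiers that tie this bill to people and laws."""
    root = ET.fromstring(xml_text)
    return {
        "bill": root.get("id"),
        "sponsor": root.find("sponsor").get("member-id"),
        "amends": [a.get("law-id") for a in root.iter("amends")],
    }

print(extract_links(BILL_XML))
```

Because the sponsor and the amended law are named by identifier rather than by free text, another data set that uses the same identifiers can be joined to this one without guesswork.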

Sometimes, structured language is used when creating a document, or unique identifiers are used to describe data items in a document, but that document is stripped naked before it is released to the public. There are some circumstances where this makes sense, like hiding the different internal drafts of a bill. But most of the time, it serves no real purpose. The data that's removed could be very helpful to those on the outside. Leave it in.

Let me add that PDFs, especially PDFs that are image files, do not promote transparency. They make it difficult to impossible to extract data from documents. If you must use a PDF, make sure that the underlying data is available some other way as well.

That brings me to a point about how the data is made available. A lot of transparency advocates build scrapers to try to transform data that's published online and put it back into a useful structure. Josh Tauberer, for example, scrapes THOMAS to turn it into a database. It's like trying to unscramble an egg.

Legislative data, such as that in THOMAS, should be made available online in bulk. Give folks the database all at once or in very large chunks, and let them figure out how to use it. (See our wiki page for more resources regarding how to improve THOMAS.)

Here are my benchmarks:

  1. All bills, amendments, and votes should be published online in XML, or some other structured format. Make scrapers unnecessary.

  2. End the tyranny of only publishing in PDFs. House expenditure reports are a giant database -- publish them as a spreadsheet file, not a PDF. The Constitution Annotated is prepared in XML; don't publish it as a PDF.

  3. Encourage the use of unique identifiers, whether they come from inside the House or elsewhere. The data needs to be interoperable.
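
As a rough illustration of the second benchmark, the Python sketch below totals spending by office from a few rows of CSV. The column names, offices, and amounts are invented; they only stand in for what an expenditure report might look like if it were published as a spreadsheet file.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal

# Invented rows standing in for an expenditure report published as CSV.
# The column names and amounts are hypothetical.
CSV_TEXT = """office,category,amount
Office A,Travel,1200.50
Office A,Supplies,300.00
Office B,Travel,800.25
"""

def totals_by_office(csv_text):
    """Sum the amount column for each office."""
    totals = defaultdict(Decimal)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["office"]] += Decimal(row["amount"])
    return dict(totals)

print(totals_by_office(CSV_TEXT))
```

The same question asked of an image-based PDF would first require retyping or text recognition, which is where errors creep in.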

 

Concluding Remarks

My time is running short, so I will only make two more comments about process.

First, today's conference, and the standards released by the House in December, are a good thing.

As a benchmark, we need to have another conference like this one within the next year as a way of assessing how well we have done, and we should continue with these conferences on a regular basis.

Second, we need to foster collaboration between those inside and outside government. In particular, technologists who are trying to use legislative data need to be able to get technology questions answered by the responsible internal stakeholder. And policy wonks can help provide direction so that the new services developed by the House meet the needs of the public. I suggest:

  1. The creation of a standing committee, composed of internal and external stakeholders, that meets at least quarterly, if not monthly, to discuss these issues.

  2. A listserv where people who are not in DC can engage in this discussion with people inside and outside of government.

I appreciate your time and the opportunity to speak. Thank you very much.

In #HackWeTrust - The House of Representatives Opens Its Doors to Transparency Through Technology

Yesterday, members of the House of Representatives hosted a ground-breaking public discussion on how to give the public better access to congressional information. Around 300 developers, policy wonks, hill staffers, and others crowded into the Capitol Visitor Center to discuss how to use technology to make the legislative branch more open, transparent, and accessible. The event was sponsored by Majority Leader Eric Cantor and Minority Whip Steny Hoyer.

Matt Lira, the Director of New Media for Rep. Cantor, opened the conference by hailing it as "our television moment," hearkening back to when House proceedings were first televised so they could be watched by the American people. Steve Dwyer, Rep. Hoyer's Director of Online Community and Technology, expressed his hopes that the day's conversation posed "a new model for collaboration between Congressional staff, advocacy groups, and private companies, where we can come together and meet face-to-face over common goals." We could not agree more. Open government is the common ground shared by leaders in both political parties, and we applaud them for their herculean effort to bring people together to work on these issues.

A lot of important information about the ongoing work of the House was publicly revealed at the conference during the first hour. Equally important, attendees spent the remaining three hours in smaller groups tackling persistent problems, resulting in incredibly important conversations between staff, technologists, and advocates that rarely occur, and never before on this scale. Intrepid reporter Alex Howard has already published video and photographs from the presentations, and Rep. Cantor posted a short video.

One of the most edifying presentations was made by Reynold Schweickart, the technology guru for the Committee on House Administration, regarding ongoing House efforts to open itself up. Here are the highlights:

  • Next week the Committee on House Administration will likely hold a hearing to consider and adopt legislative data standards.

  • Along a similar line, the committee is working on improving/implementing legislative drafting in XML, including how to make the data more accessible internally and to outside users. (We can only hope that this includes discussion of bulk access to this information.)

  • There are plans to start publishing floor and committee documents in a machine readable format at permanent URLs. In addition, there will soon be naming conventions for documents that the House rules require to be made publicly available, with the goal of having permanent URLs by 2013.

  • GPO, which has begun publishing historic statutes at large online, will start publishing the historic slip laws as individual files, so that you can easily see (and link to) legislation as it was enacted by Congress. (I have a lot more to say about this here.)

  • A meeting was held with representatives from all the offices that are involved in creating and disseminating legislative data. If a true collaboration arises, what this could mean is the creation and use of data standards to describe legislation (and its constituent parts) from when it is drafted, through the amendment process, at passage, and upon codification. This would be revolutionary.

  • There are ongoing improvements on how video from committee hearings is recorded and made available to the public, with an emphasis on standardizing and making available meta data. (While not a lot of detail was offered, Carl Malamud, who has long advocated for broadcast quality video from the floor and committee hearings, probably has a lot to add on this issue.)

  • There are also ongoing efforts with respect to how constituent communications are received by Members of Congress, and efforts to make it easier to hire capable vendors.

  • Finally, there was a stated willingness to consider to what extent the House Rules need to be amended to allow technological modernization that will make the chamber more transparent.

Later on, Darrell Issa, who chairs the Committee on Oversight and Government Reform, announced the launch of "Madison" -- a tool whereby the public can comment on legislation as it is being drafted. Here's a rather grainy photo. Rep. Issa explained the concept: "When a member introduces a bill, it should be interoperably commented on, and [those comments] should be part of the markup consideration. Under the Madison initiative, [interest] group's input will be noted and appreciated, and exposed to the world in real time." While similar in concept to PublicMarkup and Open Congress, Madison differs in that it would be managed and monitored by the office responsible for reviewing the legislation, giving the opportunity to track ideas (and influence) as they occur. Indeed, after the conference ended, Rep. Issa's staff hosted a hackathon to help improve the tool so it can be unveiled for public use. Stay tuned.

I haven't even begun to speak about the break-out sessions, which I will briefly summarize. Participants broke into four working groups that focused on the following topics: legislative correspondence, legislative workflow and data, public relations and press relations, and casework and constituent services. We reconvened at the end of the conference to discuss our recommendations for improvements. They're too lengthy to go into here. But, on that topic, I would be remiss not to point to an earlier collaborative effort, the Open House Project, which in 2007 raised many of the same issues and outlined a series of recommendations. (And I can't resist plugging this list of ideas for improving THOMAS.)

The outstanding question in my mind is: where do we go from here? Much of the conversation can continue on these open policy and technology listservs, at the hashtag #HackWeTrust, and on pages being set up by Facebook* (who sent many developers to participate in the conference). Even so, it would be great to harness this enthusiasm to hold additional events that bring together experts, staff, technologists, and advocates to address the important but complex questions of how to make the legislative branch open, transparent, and technology-friendly. Similarly, it may make sense to institutionalize this discussion as well, perhaps through working group(s), listservs, or other means.

* Updated to include the Facebook page. Also, check out this colloquy between Reps. Cantor and Hoyer that took place today and discussed yesterday's hackathon.