Sunlight Foundation

House To Be More Open: OKs Online Publication Standard

This morning, the House of Representatives took a tremendous step into the 21st century when the Committee on House Administration unanimously adopted "Standards for the Electronic Posting of House and Committee Documents & Data."

Taking effect on January 1, 2012, the resolution instructs the Clerk of the House to maintain a single website where the public can access all House bills, amendments, and resolutions for floor consideration in XML. In addition, committees will be encouraged to post their documents on that site in XML whenever possible -- and searchable PDFs when not -- with the expectation that mandatory publication requirements in XML will soon be imposed. The House will also store video of hearings and markups, and work to implement standards "that require documents to be electronically published in open data formats that are machine readable," thereby enabling transparency and public review.

In a statement, Committee on House Administration Chairman Dan Lungren said, “With the adoption of these standards, for the first time, all House bills, resolutions and legislative documents will be available in XML in one centralized location. Providing easy access to legislative information increases constituent feedback and ultimately improves the legislative process.”

Three cheers to Chairman Dan Lungren, Ranking Member Bob Brady, members of the committee, and its staff for moving this important issue forward. As was discussed at the recent #hackthehouse conference, as well as in our longstanding Open House Project Report (pdf), there's a lot more to do, but this is a major stride towards implementing Speaker Boehner and Majority Leader Cantor's pledge of "publicly releasing the House’s legislative data in machine-readable formats." The Senate could do well by following this example, as could legislative support agencies like the Library of Congress and GPO.

Standards for the Electronic Posting of House and Committee Documents & Data

O Conan! Where art thou? Legal treatise a no-show

Seven months ago, the order was given for the legal treatise, known as the Constitution Annotated (or CONAN), to be published online, but so far without result. CONAN is a government publication that explains the Constitution as interpreted by the Supreme Court. The Joint Committee on Printing directed the Congressional Research Service and the Government Printing Office to provide "enhanced access" to that document, which means that CONAN should be published online as it is updated, albeit as a searchable PDF and not the structured data format that we (and many others) requested.

A frequently updated version of the Constitution Annotated is available to congressional staff on Congress' internal website -- and in the structured data format that we want. All that's available to the public, however, is a decade-old copy and a handful of scattershot updates. What's strangely funny is that only a few minutes' work would be required to publish the Congress-only version of CONAN online, but transforming CONAN into the much less useful PDF version has taken seven months ... and counting. Perhaps some lessons could be learned from last week's Committee on House Administration hearing on modernizing information delivery in the House.

Tomorrow, the Joint Committee on Printing and the Joint Committee on the Library will hold a very rare public meeting. It's for organizational purposes -- 6 months after Congress convened -- so don't get too excited. Movement is measured slowly, especially since the JCP's website hasn't been updated in several years. But if you're so inclined, the meeting is set for 11:30am in SC6, which is on the Senate side of the U.S. Capitol. We'll see you there.

Moving Congress Online: Modernizing Information Delivery in the House

by Eric Mill, Sunlight Foundation Developer, and Jacob Hutt, Policy Intern

What would Congress look like with bill markups conducted on iPads, real-time versioning of statutes, and without bulky, printed Federal Registers? The Committee on House Administration Subcommittee on Oversight held a hearing on “Modernizing Information Delivery in the House” on Thursday, with Members of Congress and the public vying to answer that question. In 2007, the Sunlight Foundation issued "The Open House Project Report," which addressed issues surrounding how to make the House more open and transparent, and has continued to work on these issues to this day.

The first panel featured Rep. Greg Walden (R-OR) and Rep. Mike Honda (D-CA), who took turns addressing the cost-saving and increased transparency that would arise from a stronger emphasis on digital dissemination of House legislative documents; they also addressed concerns that may arise from an electronic-centric focus.

Rep. Walden, who led the Republican transition effort in 2009, explained how some congressional publications, such as the Federal Register and House Calendar, are more useful and up-to-date in electronic format. Some products, such as staff directories, really make sense only as electronic documents. He added that shifting to an electronic form of distribution would save taxpayers millions of dollars every year in printing costs.

While agreeing with Rep. Walden, Rep. Honda added that it may not make sense for Congress to go entirely paperless, and that it still may be more cost efficient for certain documents to be printed by GPO. He also raised concerns about how documents would be archived, and how they would be made available to those members of the public without access to computers.

During the course of the conversation, Chairman Gingrey and Rep. Nugent raised the issue of ensuring document authenticity. Rep. Lofgren added that certain populations in the US do not have internet access, and may rely on print copies. And Rep. Honda further explained that GPO reports that 70% of the cost of document production comes from its layout, with the remaining 30% arising from actual printing costs.

In the second panel, witnesses detailed technological updates that Congress could employ to cut back on expenses while making congressional processes more efficient.

Thomas Bruce, Research Associate and Director at the Legal Information Institute at Cornell Law School, advocated for converting legislative data into “interoperable, machine-readable formats,” preferably XML. He explained the many benefits of an electronic Congress, citing cost reduction and more easily accessible data for private developers. In his written testimony, Mr. Bruce suggested that print-on-demand facilities, which would provide access to digital information in hard copy format, would resolve Congress's and the public's lingering need for paper copies. He also called for targeted Internet accessibility programs to close the gap for those members of the public who do not have online access.

The House, said Mr. Bruce, should focus on providing legislative data in bulk and in a timely fashion, with extensive metadata, so that services like the LII's U.S. Code could be made even more accurate and up to date. He also argued that providing this level of data access creates an economy of data with a great deal of business value, drawing a comparison to the government's publication of weather and climate data.

Kent Cunningham, Chief Technology Advisor for the US Public Sector at Microsoft Corporation, and Morgan Reed, Executive Director of the Association for Competitive Technology, also spoke.

Mr. Reed presented what a live markup of a bill could look like if conducted on an iPad or laptop instead of printed on hundreds of sheets of paper. He also argued that switching to a digital platform would not just be sufficient but “transformative” for how Congress does business.

When Rep. Lofgren questioned the witnesses, she said that the House's technology priorities should be "open source," "interoperability," and "security." This drew some cautious responses from Mr. Cunningham and Mr. Reed. Mr. Cunningham replied that it is possible to do "open source but closed platform," and Mr. Reed emphasized that the House should be "goals-based, not terms-based" with regard to considering open source in its technology selection. These jargon-laden responses mostly reflected the needs of vendors interested in providing contracting services to the House. The dismissal of "open source" as a buzzword likely reflected concerns about competition.

In a fairly uncommon but welcome addition, Reps. David Dreier (R-CA) and Doc Hastings (R-WA) submitted statements for the record. Rep. Hastings' testimony provided innovative examples of how the Natural Resources Committee has reduced its printing and saved money. And Rep. Dreier provided a fascinating insight into the unique needs of the House Rules Committee, and how it has developed electronic tools to speed efficiency and meet time-sensitive demands.

Crunching Numbers on the President's Economic Report

James Jacobs (of Free Government Info) writes that the Economic Report of the President, which provides an overview of the nation's economy, is available online.

The Economic Report of the President is available from the White House web site in 3 formats: PDF, Kindle, and the open ePub format which Barnes & Noble's Nook, Sony's Reader, and other ebook-reader-software can use. The epub format, being an open, non-proprietary standard is, potentially, much easier to preserve for the long-term than proprietary formats like Kindle and PDF.
On its webpage, the White House says that the Report will be available in HTML format, too.

30 percent of the report -- 137 out of 462 pages -- contains statistical tables (see appendix B [PDF]), but the report's format is not machine readable. In other words, you can't grab the data and play with it in a spreadsheet. Fortunately, those statistics are available elsewhere on the Internet from the Government Printing Office. (The White House does link to the GPO's page.)

Sunlight Labs' Clay Johnson explains in his philippic against Adobe why this is important:

When a government agency publishes its data and documents as PDFs, it makes us Open Government advocates and developers cringe, tear our hair out, and swear a little (just a little)....

Here at Sunlight we want the government to STOP publishing bills and data in PDFs and Flash and start publishing them in open, machine readable formats like XML and XSLT.

Minneapolis on road to transparency

Tired of waiting for your city to become more transparent? Tony Webster, John Schrom and Ryan Johnson decided to take responsibility for their city of Minneapolis and create a software platform in order to "open municipal government, encourage clean and information-based elections, track issues and inspire community engagement and public participation." Their new project is called Open Minneapolis.

Webster, having formerly interned for a city council member in college, had seen first hand how much information never made it to the public or was difficult to access.

When I talked with Webster on the phone, he explained the rationale behind the project. "There is so much that happens behind the scenes that we don't know about. There has been a lot of great projects at the federal level, and in some cases state level, but not usually in Minnesota. I really wanted that transparency to come to Minneapolis."

The project has been active since July 2009 and this week they launched a site listing their goals and showing some fantastic preview images. As a journalist, having worked with numerous city and state websites across the country, I would eat my hat in exchange for this type of data accessibility and clear user interface.

While all the goals of Open Minneapolis are important to me as a journalist and a citizen, one in particular catches my eye: the implementation of a standardized XML format for public meetings.

Municipal meetings are difficult enough to sit through in person. Waiting to mine a PDF release for the data you need is even worse. With the implementation of an XML format for public meetings, analysis of this data will become a breeze.

Imagine visualizations of city council actions going back years or interactive flow charts showing how a particular proposal was fought over. These ideas are possible now, but only with a disproportionate amount of work. An XML standard would make them straightforward and quick.
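To make that concrete, here is a minimal sketch in Python of what analysis against such a format might look like. The post does not show the Open Minneapolis schema, so every element and attribute name below (meeting, item, vote, member, position) is invented for illustration, as is the sample data.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical meeting record. The tag and attribute names are assumptions
# made for this sketch, not the actual Open Minneapolis format.
SAMPLE_MEETING = """
<meeting body="City Council" date="2009-10-02">
  <item id="1" title="Bike lane expansion">
    <vote member="Member A" position="yea"/>
    <vote member="Member B" position="nay"/>
    <vote member="Member C" position="yea"/>
  </item>
  <item id="2" title="Budget amendment">
    <vote member="Member A" position="nay"/>
    <vote member="Member B" position="nay"/>
    <vote member="Member C" position="yea"/>
  </item>
</meeting>
"""


def tally_by_item(xml_text):
    """Return {item title: {position: count}} for one meeting document."""
    root = ET.fromstring(xml_text.strip())
    results = {}
    for item in root.findall("item"):
        # Count how many members took each position on this agenda item.
        counts = Counter(v.get("position") for v in item.findall("vote"))
        results[item.get("title")] = dict(counts)
    return results


if __name__ == "__main__":
    for title, counts in tally_by_item(SAMPLE_MEETING).items():
        print(title, counts)
```

With a PDF, the same tally would mean copying text out by hand or writing a fragile scraper. With structured records, it is a short loop, and running it across years of meetings is mostly a matter of feeding in more files.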

The team behind Open Minneapolis has formed a charitable non-profit, CivicEquity, that would oversee such a standard as it expanded beyond Minneapolis. CivicEquity would also distribute the software Open Minneapolis is based on: the team will release it as open-source for any non-commercial use!

As the Sunlight Foundation expands its activities into states and cities, it is projects like these that will truly make our mission successful.

If you're interested in policy work or web development, or have legal expertise or data acquisition experience, head on over to their site and get involved today. The team has a grant application to the 2010 Knight News Challenge - their grant proposal is here - please rate it!

Federal Register XML Release

According to the Washington Post and BoingBoing, the Government Printing Office will today release, for the first time, the XML version of the Federal Register -- available to the public, online, featuring the Federal Register back to the year 2000.

This is a very important move.

The Federal Register is the primary traditional vehicle for public access to government information.  Government activity like rulemaking, public meeting notices, and Presidential actions, among many other things, all are required to be published in the Federal Register.  Its comprehensiveness, however, has led to its notorious inapproachability.  The substantive minutiae of government action have hidden in plain sight for too long, on pages like this one, out of reach for the layperson.

Sunlight has poked, prodded, questioned, advised, and even funded approaches to fixing this problem.  We have been particularly excited about Carl Malamud's work on the Federal Register, as he approaches solutions to transforming the structured Federal Register data.

Sunlight has also seen GovPulse among the winners in our Apps for America II contest, and devoted significant effort to our own LOUISdb.org that scrapes and parses, among other things, the Federal Register.

That scraping and parsing, which has too long been the stuff of third party government data wranglers, is all too familiar a limitation for citizen developers.  The story goes like this.  The government prepares data and documents (sometimes) in a useful structured format, like XML.  They then publish that information without the valuable structure, in a format like PDF or plain text.  Programmers then copy the text from a web page each day, and try to restructure the useful information to make government data more useful.

This is how NGOs have built access to the Federal Register (until today).  This is how GovTrack.us has given new life to THOMAS legislative data, and allowed other sites like OpenCongress.org and WashingtonWatch.com to innovate with the same data.  We've managed to get Congress to organize a bulk data task force to address this issue, but have yet to see bulk access to legislative information.

That's one reason this move from GPO is such a big deal.

If one piece of government information should be put up in XML first, it's probably the Federal Register.  First, because it's supposed to be the public face for a wide array of government information.  Perhaps more importantly, however, it should make it much easier to secure access to other structured data sources we've pursued, like bill data, or like the Constitution Annotated.  And while we've had success before, like getting the Senate to post its votes in XML, the Federal Register represents a much weightier challenge.

Now that the XML will be available, we can expect to see a renaissance of public reuse of Federal Register data.  Sites that let you follow government activity by geographical or issue area will now feature more reliable, more timely data, since all that scraping and parsing will now be unnecessary.  More advanced analysis will be possible as well, allowing for trends and patterns to more readily emerge from this vital collection of national information.
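As a rough illustration of why structure matters here, the Python sketch below groups one day's documents by agency and optionally filters by document type. The sample uses a simplified, made-up layout (issue, document, agency, subject); it is not GPO's actual Federal Register schema, and anyone working with the real files should check the tag names against GPO's documentation.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Simplified, invented stand-in for one day's issue. The real Federal
# Register XML uses its own tag set; these names are assumptions.
SAMPLE_ISSUE = """
<issue date="2009-10-05">
  <document type="rule">
    <agency>Environmental Protection Agency</agency>
    <subject>Air quality standards</subject>
  </document>
  <document type="notice">
    <agency>Department of Transportation</agency>
    <subject>Public meeting</subject>
  </document>
  <document type="rule">
    <agency>Department of Transportation</agency>
    <subject>Vehicle safety</subject>
  </document>
</issue>
"""


def subjects_by_agency(xml_text, doc_type=None):
    """Group document subjects by agency, optionally keeping one doc type."""
    root = ET.fromstring(xml_text.strip())
    grouped = defaultdict(list)
    for doc in root.iter("document"):
        # Skip documents of other types when a filter is requested.
        if doc_type is not None and doc.get("type") != doc_type:
            continue
        grouped[doc.findtext("agency")].append(doc.findtext("subject"))
    return dict(grouped)


if __name__ == "__main__":
    print(subjects_by_agency(SAMPLE_ISSUE, doc_type="rule"))
```

A site that tracks rulemaking by agency or issue area could run something like this each morning on the new file, with no scraper to repair whenever the page layout changes.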

Arranging for big bulky government institutions to hand over access to structured data is never simple.  They're institutionally resistant to change, although we've discovered that some of our biggest allies are people within government looking to make things work better.  In this case, the GPO, itself a legislative support agency, had to work with the Office of the Federal Register and the National Archives to prepare public access to the structured data.  GPO especially deserves our praise, for overcoming a morass of jurisdictional, legal, and technical challenges, and granting the public advanced access to the Federal Register.

Today's move bodes well for our collective ability to engage with our government, and sets a strong example as we look for our government to recognize its role in supporting the public sphere online.

I'm looking forward to writing similar posts about the Code of Federal Regulations, THOMAS bill data, and the Constitution Annotated, among many others.

Update: Here's the White House announcement.

When it was created 73 years ago, the Register was a tremendous advance in making government more open and accountable to the American people. But this "newspaper" is heavy reading. The text is dense and detailed and organized chronologically in a Department-by-Department and Agency-by-Agency format, making it more accessible in practice to avid government-watchers and experienced interest groups than the general public.

... You can find the Federal Register in XML each day at www.gpo.gov or on data.gov. We encourage enterprising readers to take advantage of this new format and turn their creativity to the task of making the Register even more readable, accessible, and user-friendly. We'll be looking for the best ideas to incorporate in how we publish this newspaper of our democracy.

CRS On Making the Constitution Annotated Available in XML

Last week, the Sunlight Foundation urged the Government Printing Office to publish the legal treatise Constitution Annotated (a.k.a. CONAN) online in XML. CONAN explains the U.S. Constitution section by section, describing in its usual (and legally required) non-partisan fashion how the U.S. Supreme Court has interpreted the Constitution's provisions. CONAN contains analysis of nearly 8,000 Supreme Court cases.

We contacted the Librarian of Congress, who has statutory responsibility for preparing CONAN, for his opinion on making the treatise available online in XML. (Although it is prepared in XML, GPO publishes CONAN online in plain text and PDF format, sans meta-data. As a result, the structured data is unavailable to those who may want to republish, remix, or otherwise engage with the treatise.)

The Congressional Research Service*, which is part of the Library of Congress and whose staff actually write CONAN, made themselves available to answer our questions, summarized below:

(1) Would CRS agree to making the Constitution Annotated available online in XML every two years, when the document is printed?

(2) Would CRS agree to making the Constitution Annotated available online in XML as that document is updated and released on Congress's intranet? (This would be more frequent than the every-other-year publication schedule.)

Here is CRS's response:
The Congressional Research Service and the Government Printing Office plan to discuss publication of the Constitution Annotated and possible future enhancements.
It is not entirely clear what this means. What we hope is that this statement indicates movement towards an arrangement whereby CRS regularly provides the XML file to GPO, and GPO makes that file -- untouched -- available for download on its website. Stay tuned.

Thanks to BoingBoing for the coverage.

* Disclosure: I used to work for CRS.

220+ Years Later, It's Time to Publish the Constitution Annotated Online in XML

Today, the Sunlight Foundation called upon the Government Printing Office to publish the legal treatise The Constitution Annotated online in XML format as it is updated. The Constitution Annotated has been written by the Library of Congress for nearly 100 years, and contains analysis of nearly 8,000 U.S. Supreme Court cases.

Over the decades, GPO has published print versions of this extraordinary resource every two years, with limited electronic versions available from the 1992 edition onward. Although the Library of Congress has drafted the Constitution Annotated in XML for a number of years, that data is no longer present when it is published online by GPO. [Update: To clarify, GPO has never published the XML data. However, CRS currently creates that document in XML format, and has done so for a number of years.] Releasing the treatise in XML would allow for the easy sharing of information between different kinds of computers, applications, and organizations, and provide a roadmap to the underlying data.

In addition to asking for The Constitution Annotated to be published online in XML, we are also asking that as the data is updated and made available to Congressional staff, it also be made available to the general public. For an example of what that could look like, see Cornell University Law School's transformation of the data.
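For readers wondering what a "roadmap to the underlying data" buys you in practice, here is a small Python sketch. The XML layout is entirely hypothetical (the treatise's real markup is not public, which is the point of this post), and the sample annotations are placeholder text written for this example, not excerpts from CONAN.

```python
import xml.etree.ElementTree as ET

# Invented structure and placeholder text. Tag names (treatise, provision,
# annotation, case) are assumptions made for this sketch.
SAMPLE_TREATISE = """
<treatise>
  <provision label="Article I, Section 8">
    <annotation>Placeholder discussion citing <case>Gibbons v. Ogden</case>
      and <case>Wickard v. Filburn</case>.</annotation>
  </provision>
  <provision label="First Amendment">
    <annotation>Placeholder discussion citing
      <case>New York Times Co. v. Sullivan</case>.</annotation>
  </provision>
</treatise>
"""


def cases_by_provision(xml_text):
    """Return {provision label: [case names]} in document order."""
    root = ET.fromstring(xml_text.strip())
    index = {}
    for provision in root.findall("provision"):
        # iter() walks every nested element, so cases buried inside
        # annotation text are found without any pattern matching.
        index[provision.get("label")] = [
            case.text for case in provision.iter("case")
        ]
    return index


if __name__ == "__main__":
    for label, cases in cases_by_provision(SAMPLE_TREATISE).items():
        print(label, "->", ", ".join(cases))
```

From plain text or PDF, building the same index means guessing where one section ends and which strings are case names. When the markup says so explicitly, a table of cases by provision falls out of a dozen lines.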

Today is the 222nd anniversary of the adoption of the Constitution. In 1787, it was made available to the American people by the most modern technology of the day. We should do no less today, and provide the Constitution (along with commentary) in XML.

Constitution Annotated Letter

The full text of the letter is after the jump.

The Honorable Robert C. Tapella
Public Printer of the United States
Government Printing Office
732 North Capitol Street, NW
Washington, DC 20401-0001

September 17, 2009

Dear Mr. Tapella:

Today is the 222nd anniversary of the adoption of the United States Constitution. It is in light of this momentous historical event that I am writing on behalf of the Sunlight Foundation to ask that the GPO begin to immediately publish the legal treatise "The Constitution of the United States, Analysis and Interpretation" (The Constitution Annotated) online in XML.

The Constitution Annotated is the oldest continuously published treatise on the Constitution, containing analysis of nearly 8,000 U.S. Supreme Court cases. Prepared by the Library of Congress for nearly 100 years, it provides a wealth of resources to scholars and laypersons alike.

The Library of Congress now transmits this document to your office in XML format for publication, so GPO needs only to electronically publish that file. Moreover, the GPO should publish the treatise as it is updated, and not every two years, as is current practice.

Publishing The Constitution Annotated online without encoding it in XML is analogous to printing it without a table of contents, index, chapter breaks, or footnotes. As you know, XML is a standard for laying out data in a format that allows other computers to easily parse that data. Releasing this document in XML would allow the easy sharing of information between different kinds of computers, applications, and organizations, and provide a roadmap to the underlying data.

GPO’s publication of The Constitution Annotated in XML will further the agency’s mandate of making available government information to the public in a timely fashion. Here, GPO can provide a substantive and timely view of the Constitution’s enduring role in our democracy, and uphold the President’s pledge to increase accessibility to government information.

If you have any questions regarding this request, please feel free to contact me.

Sincerely,

Ellen S. Miller
Executive Director

Updated: to add a "plus" sign

Weekly Media Roundup - May 8, 2009

Today, May 8th, marks the 125th birthday of Harry S Truman, our 33rd president. He once said, "Secrecy and a free, democratic government don't mix." Amen, Mr. President.

Here are a few of the more interesting media mentions of Sunlight and our friends and grantees from this week:

Monday morning, Tom Lee, a technology director at Sunlight, appeared on C-SPAN’s “Washington Journal” taking questions about Recovery.gov, the Web site set up to track spending under the federal government’s economic stimulus program. Tom is working on SubsidyScope, a project of The Pew Charitable Trusts, that looks at the role of federal subsidies in the economy.

Speaking of Recovery.gov, Matt Kelley with USA Today reported that the Web site won't have details on contracts and grants until October and may not be complete until next spring — halfway through the program. Kelley quotes Greg Elin, Sunlight’s chief evangelist, saying people accustomed to getting easily searchable information quickly could be frustrated. "If we have to wait until October to get the information or to the end of the year to get a powerful recovery.gov site, the Obama administration will have missed an important opportunity."

Katrina Vanden Heuvel, editor of The Nation, in an op-ed titled "Ways to Protect Our Democracy," highlights the work of Sunlight and Sunlight Labs, and mentions the Apps for America contest. Vanden Heuvel quotes Gabriela Schneider, "This is the next generation of civic engagement…We see it as a way to revitalize democracy. The transparency work is a catalyst for the greater democracy reform movement."

The U.S. Senate announced this week that it was going to start publishing roll call votes in XML, an online format that’s easily reusable by other programs. XML allows the data to be manipulated and organized in such a way that public interest groups can get a much more thorough picture of Senate voting patterns. In writing about the move, the Politico’s Victoria McGrane quoted John Wonderlich, Sunlight's policy director, as saying the Senate’s decision was “spectacular.” The Examiner newspapers editorialized that the move signals the Senate had finally joined the 21st Century. As encouraging and important as this step by the Senate is, I’d hold off on that designation until senators start disclosing campaign finance data online and in a timely manner.

The New York Times’ Stephanie Strom highlighted the campaign to get Congress to release Congressional Research Service reports to the public, noting the efforts of Open CRS, Center for Democracy and Technology, OpentheGovernment.org and Sunlight.

Jeanne Cummings at the Politico wrote about “lobbyist contact” disclosures posted on government department and agency Web sites. She made note of a review conducted by Paul Blumenthal, Sunlight’s senior writer, that found only 14 of a possible 29 departments and agencies have created Web pages to disclose lobbyist inquiries. On March 20, President Obama issued a memo to all agencies involved with the distribution of funds from the American Recovery and Reinvestment Act requiring them to disclose all communications between lobbyists and agency officials. John Fritze with USA Today wrote that Obama’s effort to make lobbying more transparent has shed little light on the behind-the-scenes, special-interests lobbying thus far. He quotes Melanie Sloan, director of Citizens for Responsibility and Ethics in Washington, "We're looking to have more disclosure, not less. If this was supposed to give us more disclosure, why is it that you're not seeing lobbyist communications?"

Mother Jones' Jonathan Stein profiled Lisa Rosenberg, Sunlight’s government affairs consultant, terming her "K Street's worst nightmare" and "the lobbyist lobbyists hate." He wrote that Lisa is "not your average influence peddler," but does the "unthinkable" by lobbying for more oversight and regulation of lobbying. Stein quotes Lisa, "I have no friends...My lobbyist colleagues are cringing at the things that I do."

Joshua Zumbrun at Forbes.com wrote about six ways Uncle Sam can help rescue newspapers. One of his proposals is for the government to help ease newspapers into nonprofit status, citing the Center for Responsive Politics and the Center for Public Integrity as examples of nonprofit organizations that are already making an impact.

Thanks, and see you next Friday!

Senate Reverses Policy, Posts Votes in XML

The US Senate has finally reversed its longstanding policy of restricting public access to raw data about how Senators vote, and is now posting XML of votes on Senate.gov.

This move follows a recent initiative, led by Senator DeMint, to request that the Senate Rules Committee post the vote data.

While this issue may seem to be arising out of the blue, with recent coverage in the Politico, Senate votes XML has been broached as a perennial roadblock. It would seem, however, that the number of people affected by the restriction grew to the point where they could no longer be ignored, and common sense prevailed.

Just as the recent rewriting of Web use restrictions has led to creative Internet use among Members of Congress, the new votes data should help fuel a renaissance of vote analysis and visualization. XML encourages advanced processing and analysis, making votes legible to both humans and computers, and giving us a new view on how Senators vote.
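Here is a minimal Python sketch of the kind of processing this opens up: a party-by-party breakdown of a single roll call. The element names (roll_call, member, party, vote) and the four sample senators are placeholders invented for this example, so check them against the files on Senate.gov before relying on them.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Invented sample roll call. Tag names and members are placeholders,
# not the Senate's published format.
SAMPLE_VOTE = """
<roll_call>
  <question>On Passage of the Bill</question>
  <member><name>Senator A</name><party>D</party><vote>Yea</vote></member>
  <member><name>Senator B</name><party>D</party><vote>Yea</vote></member>
  <member><name>Senator C</name><party>R</party><vote>Nay</vote></member>
  <member><name>Senator D</name><party>R</party><vote>Yea</vote></member>
</roll_call>
"""


def tally_by_party(xml_text):
    """Count votes keyed by (party, vote cast) for one roll call."""
    root = ET.fromstring(xml_text.strip())
    tally = Counter()
    for member in root.findall("member"):
        key = (member.findtext("party"), member.findtext("vote"))
        tally[key] += 1
    return tally


if __name__ == "__main__":
    for (party, vote), count in sorted(tally_by_party(SAMPLE_VOTE).items()):
        print(party, vote, count)
```

Run over every roll call in a session, the same few lines become the raw material for party-unity scores, crossover charts, and the visualizations we are hoping to see.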

Senator DeMint and Senator Durbin deserve praise for quickly acting to address the data issue, as do many staffers and administrators. Senator Lieberman expressed support for vote data access in the fall of 2007, and Jerry Brito wrote about the issue earlier that year as well.

This is what transparency reform looks like. Complicated, messy, confusing, often bipartisan, often initially unsuccessful, and helpfully spurred on through public involvement. If this case serves as any example at all, we should be very encouraged about future efforts.

For today, though, Nice Work, Senate!
