According to the Washington Post and BoingBoing, the Government Printing Office will today release, for the first time, the XML version of the Federal Register -- available to the public, online, featuring the Federal Register back to the year 2000.
This is a very important move.
The Federal Register is the primary traditional vehicle for public access to government information. Government activity like rulemaking, public meeting notices, and Presidential actions, among many other things, must all be published in the Federal Register. Its comprehensiveness, however, has led to its notorious inapproachability. The substantive minutiae of government action have hidden in plain sight for too long, on pages like this one, out of reach for the layperson.
Sunlight has poked, prodded, questioned, advised, and even funded approaches to fixing this problem. We have been particularly excited about Carl Malamud's work on the Federal Register, as he works toward ways of transforming the structured Federal Register data.
Sunlight has also seen GovPulse among the winners in our Apps for America II contest, and devoted significant effort to our own LOUISdb.org, which scrapes and parses, among other things, the Federal Register.
That scraping and parsing, which has too long been the stuff of third-party government data wranglers, is all too familiar a limitation for citizen developers. The story goes like this. The government prepares data and documents (sometimes) in a useful structured format, like XML. It then publishes that information without the valuable structure, in a format like PDF or plain text. Programmers then copy the text from a web page each day and try to rebuild the lost structure so that the data becomes useful again.
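For readers who have not done this kind of work, here is a minimal sketch of what "restructuring" plain text tends to look like. The sample notice, its layout, and the field labels are invented for illustration; they are assumptions, not the actual format of any Federal Register page, and this is not the code behind LOUISdb.org or any other site named here.

```python
import re

# Hypothetical plain-text notice, invented for illustration only.
SAMPLE_TEXT = """\
DEPARTMENT OF EXAMPLES

Office of Illustration

Notice of Public Meeting

AGENCY: Office of Illustration, Department of Examples.
ACTION: Notice of public meeting.
SUMMARY: The Office will hold a public meeting to discuss
sample rules.
"""

# Guess at field boundaries: a known label at the start of a line,
# running until the next "WORD:" line or the end of the text.
FIELD_PATTERN = re.compile(
    r"^(AGENCY|ACTION|SUMMARY):\s*(.*?)(?=^\w+:|\Z)",
    re.MULTILINE | re.DOTALL,
)


def scrape_fields(text):
    """Recover labeled fields from unstructured text by pattern matching."""
    fields = {}
    for label, value in FIELD_PATTERN.findall(text):
        # Collapse the line breaks that the plain-text layout introduced.
        fields[label.lower()] = " ".join(value.split())
    return fields


if __name__ == "__main__":
    for name, value in scrape_fields(SAMPLE_TEXT).items():
        print(f"{name}: {value}")
```

Every line of that pattern is a guess about layout. If a label is renamed or a line wraps differently, the scraper silently returns less data, which is why this approach is so fragile.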
This is how NGOs have built access to the Federal Register (until today). This is how GovTrack.us has given new life to THOMAS legislative data, and allowed other sites like OpenCongress.org and WashingtonWatch.com to innovate with the same data. We've managed to get Congress to organize a bulk data task force to address this issue, but have yet to see bulk access to legislative information.
That's one reason this move from GPO is such a big deal.
If one piece of government information should be put up in XML first, it's probably the Federal Register. First, because it's supposed to be the public face for a wide array of government information. Perhaps more importantly, however, this release should make it much easier to secure access to other structured data sources we've pursued, like bill data, or like the Constitution Annotated. And while we've had success before, like getting the Senate to post its votes in XML, the Federal Register represents a much weightier challenge.
Now that the XML will be available, we can expect to see a renaissance of public reuse of Federal Register data. Sites that let you follow government activity by geographical or issue area will now feature more reliable, more timely data, since all that scraping and parsing will now be unnecessary. More advanced analysis will also be possible, letting trends and patterns emerge more readily from this vital collection of national information.
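To give a sense of why structure matters, here is a rough sketch of counting documents per agency once the data arrives as XML. The element names (ISSUE, NOTICE, RULE, AGENCY, SUBJECT) and the sample content are assumptions made up for this example; the real GPO schema may well differ, so treat this as an illustration of the idea rather than working code for the actual files.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical XML fragment; element names are assumed, not GPO's schema.
SAMPLE_XML = """
<ISSUE>
  <NOTICE><AGENCY>Department of Examples</AGENCY><SUBJECT>Public meeting</SUBJECT></NOTICE>
  <RULE><AGENCY>Department of Examples</AGENCY><SUBJECT>Sample rule</SUBJECT></RULE>
  <NOTICE><AGENCY>Office of Illustration</AGENCY><SUBJECT>Comment period</SUBJECT></NOTICE>
</ISSUE>
"""


def documents_by_agency(xml_text):
    """Count the documents in one issue, grouped by issuing agency."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for document in root:
        # The agency is an explicit element, so no layout guessing is needed.
        agency = document.findtext("AGENCY")
        if agency:
            counts[agency.strip()] += 1
    return counts


if __name__ == "__main__":
    for agency, total in documents_by_agency(SAMPLE_XML).most_common():
        print(f"{agency}: {total}")
```

The point is less the counting than what is missing: no regular expressions, no assumptions about line breaks, and nothing that breaks when the page layout changes.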
Arranging for big, bulky government institutions to hand over access to structured data is never simple. They're institutionally resistant to change, although we've discovered that some of our biggest allies are people within government looking to make things work better. In this case, the GPO, itself a legislative support agency, had to work with the Office of the Federal Register and the National Archives to prepare public access to the structured data. GPO especially deserves our praise for overcoming a morass of jurisdictional, legal, and technical challenges, and granting the public advanced access to the Federal Register.
Today's move bodes well for our collective ability to engage with our government, and sets a strong example as we look for our government to recognize its role in supporting the public sphere online.
I'm looking forward to writing similar posts about the Code of Federal Regulations, THOMAS bill data, and the Constitution Annotated, among many others.
Update: Here's the White House announcement.
When it was created 73 years ago, the Register was a tremendous advance in making government more open and accountable to the American people. But this "newspaper" is heavy reading. The text is dense and detailed and organized chronologically in a Department-by-Department and Agency-by-Agency format, making it more accessible in practice to avid government-watchers and experienced interest groups than the general public. ... You can find the Federal Register in XML each day at www.gpo.gov or on data.gov. We encourage enterprising readers to take advantage of this new format and turn their creativity to the task of making the Register even more readable, accessible, and user-friendly. We'll be looking for the best ideas to incorporate in how we publish this newspaper of our democracy.