220+ Years Later, It’s Time to Publish the Constitution Annotated Online in XML

Today, the Sunlight Foundation called upon the Government Printing Office to publish the legal treatise The Constitution Annotated online in XML format as it is updated. The Constitution Annotated has been written by the Library of Congress for nearly 100 years, and contains analysis of nearly 8,000 U.S. Supreme Court cases.

Over the decades, GPO has published print versions of this extraordinary resource every two years, with limited electronic versions available from the 1992 edition onward. Although the Library of Congress has drafted the Constitution Annotated in XML for a number of years, that data is no longer present when it is published online by GPO. [Update: To clarify, GPO has never published the XML data. However, CRS currently creates that document in XML format, and has done so for a number of years.] Releasing the treatise in XML would allow for the easy sharing of information between different kinds of computers, applications, and organizations, and provide a roadmap to the underlying data.
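To make the "roadmap to the underlying data" point concrete, here is a minimal Python sketch of what a program could do with a structured version of the treatise. The actual schema used by CRS is not public in this post, so every element and attribute name in the sample (`annotated-constitution`, `clause`, `case`, and so on) is a made-up assumption for illustration, and `cases_by_clause` is a hypothetical helper, not part of any GPO or Library of Congress tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the element and attribute names below are invented
# for illustration and are NOT the real Constitution Annotated schema.
SAMPLE = """\
<annotated-constitution>
  <article number="I">
    <section number="8">
      <clause number="3" title="Commerce Clause">
        <annotation>
          <case name="Gibbons v. Ogden" citation="22 U.S. 1" year="1824"/>
          <case name="Wickard v. Filburn" citation="317 U.S. 111" year="1942"/>
        </annotation>
      </clause>
    </section>
  </article>
</annotated-constitution>
"""


def cases_by_clause(xml_text):
    """Map each clause to the list of cases cited in its annotation."""
    root = ET.fromstring(xml_text)
    index = {}
    for article in root.iter("article"):
        for section in article.iter("section"):
            for clause in section.iter("clause"):
                key = "Art. {}, Sec. {}, Cl. {}".format(
                    article.get("number"),
                    section.get("number"),
                    clause.get("number"),
                )
                # Collect (name, citation, year) for every case under this clause.
                index[key] = [
                    (c.get("name"), c.get("citation"), int(c.get("year")))
                    for c in clause.iter("case")
                ]
    return index


if __name__ == "__main__":
    for clause, cases in cases_by_clause(SAMPLE).items():
        print(clause)
        for name, citation, year in cases:
            print("  {} ({}, {})".format(name, citation, year))
```

With a PDF or plain-text edition, the same lookup would require fragile text scraping; with XML, the structure of articles, clauses, and cited cases is already explicit in the markup.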

In addition to asking for The Constitution Annotated to be published online in XML, we are also asking that as the data is updated and made available to Congressional staff, it also be made available to the general public. For an example of what that could look like, see Cornell University Law School’s transformation of the data.

Today is the 222nd anniversary of the adoption of the Constitution. In 1787, it was made available to the American people by the most modern technology of the day. We should do no less today, and provide the Constitution (along with commentary) in XML.

Constitution Annotated Letter

The full text of the letter is after the jump.

The Honorable Robert C. Tapella
Public Printer of the United States
Government Printing Office
732 North Capitol Street, NW
Washington, DC 20401-0001

September 17, 2009

Dear Mr. Tapella:

Today is the 222nd anniversary of the adoption of the United States Constitution. It is in light of this momentous historical event that I am writing on behalf of the Sunlight Foundation to ask that the GPO immediately begin publishing the legal treatise “The Constitution of the United States, Analysis and Interpretation” (The Constitution Annotated) online in XML.

The Constitution Annotated is the oldest continuously published treatise on the Constitution, containing analysis of nearly 8,000 U.S. Supreme Court cases. Prepared by the Library of Congress for nearly 100 years, it provides a wealth of resources to scholars and laypersons alike.

The Library of Congress now transmits this document to your office in XML format for publication, so GPO needs only to electronically publish that file. Moreover, the GPO should publish the treatise as it is updated, and not every two years, as is current practice.

Publishing The Constitution Annotated online without encoding it in XML is analogous to printing it without a table of contents, index, chapter breaks, or footnotes. As you know, XML is a standard for laying out data in a format that allows other computers to easily parse that data. Releasing this document in XML would allow the easy sharing of information between different kinds of computers, applications, and organizations, and provide a roadmap to the underlying data.

GPO’s publication of The Constitution Annotated in XML will further the agency’s mandate of making available government information to the public in a timely fashion. Here, GPO can provide a substantive and timely view of the Constitution’s enduring role in our democracy, and uphold the President’s pledge to increase accessibility to government information.

If you have any questions regarding this request, please feel free to contact me.

Sincerely,

Ellen S. Miller
Executive Director

Updated: to add a “plus” sign

  • This is a great idea, Daniel. But really it *all* should be in XML. My thoughts are here.

  • Chris:

    “No, no, you’re confusing HTML/XHTML with XML. XHTML is only one specific type of XML (although a hugely popular one).”

    Actually, XHTML is an XML dialect, defined by a DTD. It was also put out to pasture about a year and a half ago. XHTML is dead; the future is HTML5, which also is an XML dialect. XHTML never again shall grace us with its mime type confusions.

    XML is the correct choice here. XHTML has been relegated to a historic footnote, and is over.

    “Publishing the document in whatever XML format it’s currently in (assuming it’s valid) XML would be the HUGE win”

    Er, that’s exactly what they said they were going to do. You told them they didn’t mean XML, they meant XHTML, but they should use XML instead.

    No, they meant XML.

    “That task can be handled by XSLT after the fact.”

    CSS is entirely adequate; browsers can display XML directly, and have been able to do so for quite some time. There is no need to involve transformations for a document which does not change except to be statically added to; it should merely be presented in its final form.

    Thomas Bruce:

    “That said, I am not sure what magic W3C would bring to the party, but I’m willing to be educated.”

    Deep standards knowledge, awareness of cross-technology interoperability issues and the sophistication that comes with experience in marking up difficult documents.

  • Why does the guvmint have to do the conversion? Have I.Q.’s sharply dropped in the last 56 years?

  • As Daniel kindly notes, we’ve had an online version of this for some time — and great difficulty in obtaining the updates via GPO. The original data was donated to us by CRS some years ago.

    The underlying format is, indeed, XML.

    Layering semantic-web technology atop this needs to be done with care and an eye toward the fact that in law and with respect to constitutional questions in particular there is almost no such thing as a neutral label, categorization, or identifier — at least in the eye of some beholders. One of the remarkable things about the CRS document is the objectivity and neutrality with which it approaches its subject. That will be difficult to maintain. And it is impossible to overstate the level of minutiae that some people will find significant — take, for example, all of the tax-resister arguments that adhere to things like the capitalization of the word “citizen”.

    That said, I am not sure what magic W3C would bring to the party, but I’m willing to be educated.

  • Applaud! Would like non-snailmail version of his address…

  • Chris

    No, no, you’re confusing HTML/XHTML with XML. XHTML is only one specific type of XML (although a hugely popular one). Publishing the document in whatever XML format it’s currently in (assuming it’s valid) XML would be the HUGE win. There’s no need for the GPO to do the additional step of converting it to XHTML/XML just so it looks nice in a web browser. That task can be handled by XSLT after the fact.

  • miles

    snap!

    As a librarian I think that is absolutely amazing. I would love to import this into a database and be able to map it out.

    The possibilities are endless. And yes I agree with John that advisory from W3C would be essential.

  • I agree with the sentiment.

    I hope the people implementing this will take the time to research existing doctypes before creating one, and make sure that at least one team member is well versed in web ontological technologies to the point of being a borderline zealot. The way this is implemented will need to be set in stone once complete, and it’s important that the doctype provide a broad set of underlying markup data to ensure maximum accessibility and cross-referenceability.

    It may be worth seeking advisory from the W3C. This is too important to be done lightly, and I’m sure a W3C staffer will recognize said importance and put the team in touch with the authors of the requisite specifications.

    Bravo, Robert Tapella and/or Ellen Miller: you’re doing a thing of such fundamental importance that few people will recognize the effect of your plan until long after it’s executed.

    Thank you, Mr. Schuman, for making this and similar things better known.