Web Harvest Archive


I’m glad to have just found the archive of old Web sites from members of Congress, maintained by the Center for Legislative Archives (CLA) under the National Archives and Records Administration (NARA).

The collection seems well organized and easy to peruse, with solid explanations of the methodology and disclaimers about what is and isn’t available given how the sites were crawled.

My main suggestion is that the archiving happen more frequently, perhaps on a coordinated schedule so that it captures as much material as possible. Those responsible for the Web Harvest should also coordinate with the CAO, systems administrators, and vendors to be sure that the digital records management practices used in organizing member sites encourage easy crawling and archiving by NARA and CLA.

The House has a document laying out best practices for document management in House offices. I wonder whether its guidance on digital materials management should be expanded to cover digital materials availability, perhaps including standards like sitemapping, in order to ensure the preservation of member sites?
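To make the sitemapping idea concrete, here is a rough sketch of how a member site could publish a sitemap in the sitemaps.org XML format, which gives a crawler an explicit list of pages to capture. The URLs and dates are placeholders I made up for illustration, and the `build_sitemap` helper is hypothetical, not something the House or NARA is known to use.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML for an iterable of (url, last_modified) pairs."""
    # Serialize the sitemap namespace as the default (unprefixed) namespace
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


# Placeholder pages on an assumed member site (not real addresses)
example_pages = [
    ("https://www.example.com/", "2010-01-15"),
    ("https://www.example.com/press/", "2010-01-12"),
]

sitemap_xml = build_sitemap(example_pages)
print(sitemap_xml)
```

A file like this, kept current by whoever maintains the site, would tell an archiving crawler which pages exist and when they last changed, without relying on link-following alone.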

My other suggestion is to increase the exposure of the captured sites, perhaps by encouraging links from the bioguides or from current member sites, and to ensure that the collection itself can be crawled and indexed by search engines.
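One simple way to check whether a collection is open to search engines is to test its robots.txt rules. The sketch below uses Python's standard `urllib.robotparser` against a made-up robots.txt; the rules and URLs are assumptions for illustration only and do not reflect what the archive actually serves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents for an archive site (illustrative only)
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
# Parse the rules from text rather than fetching them over the network
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(url, agent="*"):
    """Return True if the given user agent may fetch the URL."""
    return parser.can_fetch(agent, url)


# Placeholder URLs on the assumed site
archive_ok = is_crawlable("https://www.example.com/archive/110th/")
admin_ok = is_crawlable("https://www.example.com/admin/")
print(archive_ok, admin_ok)
```

If the archived pages come back as not crawlable under rules like these, search engines that honor robots.txt will skip them, and the collection stays hard to find no matter how well it is organized.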