Introducing congressional-record! This is a project that can parse the flat text of the Congressional Record from the Government Printing Office's HTML files and produce bulk XML data for the entirety of the digital record: no database required.
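Floor speeches in the Record are typically introduced by the speaker's title and capitalized surname, which is what makes attribution from flat text feasible. The sketch below shows the text-to-XML step under a simplified assumption: each speech starts on a new line with a header such as `Mr. SMITH.`. The regex, the `record_to_xml` helper, and the `<record>`/`<speech>` element names are hypothetical illustrations. They are not congressional-record's actual parser or schema.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical speaker pattern: a title plus an all-caps surname and a period
# at the start of a line, e.g. "Mr. SMITH." or "Ms. O'NEILL."
SPEAKER_RE = re.compile(r"^\s*((?:Mr|Mrs|Ms)\. [A-Z][A-Z' -]+)\.\s+", re.MULTILINE)


def record_to_xml(flat_text: str) -> ET.Element:
    """Split flat Record text into speaker-attributed <speech> elements."""
    root = ET.Element("record")
    matches = list(SPEAKER_RE.finditer(flat_text))
    for i, m in enumerate(matches):
        # A speech runs until the next speaker header or the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(flat_text)
        speech = ET.SubElement(root, "speech", speaker=m.group(1))
        # Collapse the hard line breaks of the flat text into single spaces.
        speech.text = " ".join(flat_text[m.end():end].split())
    return root


if __name__ == "__main__":
    sample = "Mr. SMITH. I rise today\nto speak.\nMs. JONES. I yield back."
    print(ET.tostring(record_to_xml(sample), encoding="unicode"))
```

Real Record text uses many more speaker formats than this pattern handles. Examples include `Mr. SMITH of Texas.` and presiding-officer lines, so a production parser would need a much richer set of rules.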
Last Monday we launched an update to our Capitol Words project, which indexes and tokenizes the Congressional Record daily. With the launch behind us and the dust starting to settle, I'd like to walk through how we get from raw text to attributed, searchable quotations, and provide some examples of how you can interact with the data directly.
Before delving into how it works, though, it's important to acknowledge the myriad developers whose work on this project has made it possible. I'm only the most recent steward of the site; the bulk of the data legwork for this iteration was handled by Aaron Bycoffe and Jessy Kate Schingler, and the web interface owes its beauty to Caitlin Weber and Ali Felski. Timball provided the hardware, and the list continues from contributions to the scrapers all the way back to the original conception and implementation of the idea by Josh Ruihley and Garrett Schure. It's the combined efforts of everyone involved that brought us the site that's available today.
Now, without further ado...
More than three years ago, we launched a website called Capitol Words that gave an at-a-glance view of what word was most popular in Congress. Today, the Sunlight Foundation is unveiling the completely revamped and rewritten Capitol Words.
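At its simplest, finding the "most popular word" means tokenizing floor text and counting terms. The sketch below makes three assumptions: a naive lowercase regex tokenizer, a tiny hypothetical stopword list, and plain in-memory counting. It illustrates the idea only and is not Capitol Words' actual indexing pipeline.

```python
import re
from collections import Counter

# Assumed tokenizer: lowercase runs of letters and apostrophes.
TOKEN_RE = re.compile(r"[a-z']+")

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "i", "is", "for", "on", "that"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and return word tokens with stray apostrophes trimmed."""
    tokens = (t.strip("'") for t in TOKEN_RE.findall(text.lower()))
    return [t for t in tokens if t]


def most_popular(texts, n=1):
    """Count non-stopword tokens across all texts; return the n most common."""
    counts = Counter()
    for text in texts:
        counts.update(t for t in tokenize(text) if t not in STOPWORDS)
    return counts.most_common(n)


if __name__ == "__main__":
    speeches = ["The budget is the budget.", "We must pass a budget and cut the deficit."]
    print(most_popular(speeches))
```

A real system would also need to handle phrases (n-grams), stemming, and per-day or per-speaker counts. This sketch leaves all of those out.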
A year ago today, Congress’ Joint Committee on Printing directed that three sets of vital legislative and legal documents be...
Sen. Jon Kyl, R-Ariz., recently got into a bit of trouble when he falsely stated on the floor of the...
I’m rather late in sharing the news, but “enhanced access” to three of “our nation’s vital legislative and legal documents”...
More than a dozen lawmakers inserted statements supporting a biotechnology provision added to the House health care bill that was...
Back in May, in anticipation of the coming health care debate, Republican pollster, strategist and wordsmith Frank Luntz penned...