Introducing congressional-record! This is a project that can parse the flat text of the Congressional Record from the Government Printing Office's HTML files and produce bulk XML data for the entirety of the digital record — no database required.
The data behind Capitol Words
Last Monday we launched an update to our Capitol Words project, which indexes and tokenizes the Congressional Record daily. With the launch behind us and the dust starting to settle, I'd like to walk through how we get from raw text to attributed, searchable quotations, and provide some examples of how you can interact with the data directly.
Before delving into how it works, though, it's important to acknowledge the myriad developers whose work on this project has made it possible. I'm only the most recent steward of the site; the bulk of the data legwork for this iteration was handled by Aaron Bycoffe and Jessy Kate Schingler, and the web interface owes its beauty to Caitlin Weber and Ali Felski. Timball provided the hardware, and the list continues from contributions to the scrapers all the way back to the original conception and implementation of the idea by Josh Ruihley and Garrett Schure. It's the combined efforts of everyone involved that brought us the site that's available today.
Now, without further ado...
Announcing the Return of “Capitol Words”
More than three years ago, we launched a website called Capitol Words that gave an at-a-glance view of what word was most popular in Congress. Today, the Sunlight Foundation is unveiling the completely revamped and rewritten Capitol Words.
A Year Later, Little Progress on Digitizing Legislative Documents
A year ago today, Congress’ Joint Committee on Printing directed that three sets of vital legislative and legal documents be...
#notintendedtobeastatement
Sen. Jon Kyl, R-Ariz., recently got into a bit of trouble when he falsely stated on the floor of the...
JCP directs enhanced access to 3 of “our nation’s vital legislative and legal documents”
I’m rather late in sharing the news, but “enhanced access” to three of “our nation’s vital legislative and legal documents”...
Lobbyists Put On Ventriloquist Act
More than a dozen lawmakers inserted statements supporting a biotechnology provision added to the House health care bill that was...
Health Care Word Soup: Luntz Memo
Back in May, in anticipation of the coming health care debate, Republican pollster, strategist and wordsmith Frank Luntz penned...
Weekly Media Roundup – April 17, 2009
Here are a few of the more interesting media mentions of Sunlight and our friends and grantees from this week:...
Capitol Words 2.0
Want to know what lawmakers are talking about on Capitol Hill but you can’t figure out how to get any...