Yesterday, the House of Representatives massively improved its feed of live updates from the House floor. The House Clerk has been hosting a live floor feed for a long time, but this update breaks out related bills and votes more cleanly, adds times down to the second for each update, and drastically cleans up the HTML of the page.
But most wonderfully, the cleaner HTML doesn't really matter, because they also turned on a live XML feed. We've been scraping the HTML from this page for a while to serve our Real Time Congress API and the apps that use it, which has meant ongoing maintenance to keep up with changes to their system. An XML feed will open the doors for more people to make use of the information, and improve the quality of the offerings of others who already use it.
This move is part of a broader plan by the House to increase the availability and timeliness of its data, and to offer it in bulk and in machine readable formats wherever possible. We have been vocal advocates of this sort of progress for a long time, and it's gratifying to see concrete steps toward this goal.
Having worked with the House's XML feed myself, I can recommend a few improvements:
Provide an XML link that doesn't require a date in the URL and that always contains information for the latest legislative day. For now, I still have to briefly load the HTML version and search for the download link to figure out which legislative day is the most current. (Legislative "days" are not calendar days, and a session that runs past midnight has the same "legislative day" throughout.)
Link all related bills, resolutions, and reports for a floor update in the XML. Right now, only one bill is associated with an update, even if multiple are mentioned. I'm forced to create this linkage with regular expressions.
Link to related votes in the XML. The HTML version aggregates all votes for the day, but neither the HTML nor the XML version links them with individual updates. I'm forced to create this linkage with regular expressions.
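To make the first recommendation concrete, here is a minimal sketch of the kind of workaround it describes: load the HTML page, then look for the first link that points at an XML file. This is not our production code, and the sample markup and the `/floor/20110301.xml` path are made up for illustration; the real page's structure and URLs may well differ.

```python
from html.parser import HTMLParser


class XmlLinkFinder(HTMLParser):
    """Collects the href of every <a> tag whose target ends in .xml."""

    def __init__(self):
        super().__init__()
        self.xml_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.lower().endswith(".xml"):
            self.xml_links.append(href)


def find_xml_link(html):
    """Return the first .xml link found in the page, or None if there is none."""
    finder = XmlLinkFinder()
    finder.feed(html)
    return finder.xml_links[0] if finder.xml_links else None


# Hypothetical markup standing in for the floor page; not the real HTML.
SAMPLE_HTML = """
<html><body>
  <a href="/about">About</a>
  <a href="/floor/20110301.xml">Download XML</a>
</body></html>
"""

latest = find_xml_link(SAMPLE_HTML)
```

A stable, undated URL for the latest legislative day would make this whole step unnecessary.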
Feel free to check out our parsing code as an example of how you might work with this data.
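As a rough illustration of the regular expression approach mentioned in the recommendations, the sketch below pulls bill references and roll call numbers out of the text of a single floor update. It is a simplified stand-in rather than the parsing code linked above: the patterns, the helper names, and the sample sentence are all assumptions, and real floor updates may be worded differently.

```python
import re

# Assumed patterns: longer prefixes are listed first so that "H. Res." is not
# cut short as "H. R.". Real updates may use other abbreviations.
BILL_PATTERN = re.compile(
    r"\b(H\.\s?J\.\s?Res\.|H\.\s?Con\.\s?Res\.|H\.\s?Res\.|H\.\s?R\."
    r"|S\.\s?J\.\s?Res\.|S\.\s?Con\.\s?Res\.|S\.\s?Res\.|S\.)\s?(\d+)"
)
# Assumed wording for vote references, such as "Roll no. 77" or "Roll Call 77".
ROLL_PATTERN = re.compile(r"Roll\s+(?:Call\s+)?(?:No\.\s*)?(\d+)", re.IGNORECASE)


def normalize_bill(prefix, number):
    """Turn a prefix like 'H. Res.' and a number like '56' into 'hres56'."""
    letters = re.sub(r"[^a-z]", "", prefix.lower())
    return f"{letters}{number}"


def extract_bills(text):
    """Return every distinct bill id mentioned in the text, in order."""
    seen = []
    for prefix, number in BILL_PATTERN.findall(text):
        bill_id = normalize_bill(prefix, number)
        if bill_id not in seen:
            seen.append(bill_id)
    return seen


def extract_rolls(text):
    """Return every roll call number mentioned in the text, in order."""
    return [int(n) for n in ROLL_PATTERN.findall(text)]


# Hypothetical floor update text, written for this example.
update = (
    "On motion to suspend the rules and pass H.R. 1234 and H. Res. 56, "
    "agreed to by the Yeas and Nays: 240 - 180 (Roll no. 77)."
)
bills = extract_bills(update)   # ["hr1234", "hres56"]
rolls = extract_rolls(update)   # [77]
```

Patterns like these are brittle (the bare `S.` prefix would also match something like "U.S. 5"), which is exactly why explicit links in the XML would be better.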
Even with those issues, the House is miles and miles ahead of the Senate, which has never provided an adequate feed of floor activity. There is an official summary of the Senate floor's events, but it is only updated the day after the fact, and doesn't even contain timestamps for archival purposes. The Republican caucus runs its own live floor feed, but its language is occasionally partisan, and it is more a selection of quotes than a stream of activity.
The version we currently use in our own services (inspired by @senatefloor) is the Senate Periodical Press Gallery. It's updated throughout the day, but lacks timestamps and has a big disclaimer at the top about being unofficial, despite running on a senate.gov domain. It also doesn't mention related vote numbers, making it impossible to extract links to roll call votes. We make it work, but it's far from ideal.
The House has done a great job in setting an example of making its activities available to citizens in a rapid and flexible way. If the Senate can swallow its pride and look to the House's version as a model, it too can walk a little taller in the 21st century.