Capitol Words

 

How Congress Talks about Sunshine Week

Sunshine Week 2013 is well underway. While yesterday we looked at how well our state government made information available online, today we turn our attention to Capitol Hill.

This week is about supporting policies that maintain our right to know and the importance of open government. So how does Congress do just that? Using Sunlight’s Capitol Words tool, let’s take a look at how lawmakers talk about Sunshine Week.

Democrats say “FOIA” the most in Congress, with Sen. Patrick Leahy (D-VT) leading the pack. The senator is also at the top when “right to know” is mentioned in the Congressional Record.

The “open government” chart (see above) on Capitol Words illustrates how Congress began using the term more and more in recent years. You can see lots of spikes in 2007 when Congress was debating and passing the Honest Leadership and Open Government Act.

And since Sunshine Week coincides with the birthday of James Madison, here is how Congress remembers our fourth president.

Does Congress Love You?

In celebration of Valentine’s Day, we wanted to see what Congress keeps close to its heart. Using Capitol Words, we searched the Congressional Record to see what lawmakers love, from the good ol’ U.S. of A. to baked goods to, well, themselves.

Comparing these common declarations, Capitol Words shows that Democrats and Republicans are pretty even when it comes to love (50% to 49%, respectively), while Republicans say “I hate” more than their colleagues across the aisle (54% to 45%).

Congress regularly mentions a love of country and our democratic principles. Since 1996, the phrase “I love my country” is more popular among Democrats than Republicans. However, GOP lawmakers say “I love the Constitution” more.

Both Senate Majority Leader Harry Reid and Minority Leader Mitch McConnell love their chamber, but former Sen. Robert Byrd (D-WV) said “I love the Senate” more than anyone else. When the current Hawaii governor Neil Abercrombie served in Congress, he said “I love the House” the most.

Our elected officials like to talk about their personal interests, too. Former comedian Sen. Al Franken (D-MN) loves TV, and Reps. Donna Edwards (D-MD) and Louise Slaughter (D-NY) love football. And former Rep. Todd Akin (R-MO) loves pie (but not pie charts).

Have a very Happy Valentine’s Day courtesy of Capitol Words. And don’t forget Congress loves you.

Learn How to Use the Capitol Words API on Codecademy

We're excited to announce that we've partnered with Codecademy to make our Capitol Words API easier to learn and use. Codecademy is an online learning environment that currently supports novice coders learning Ruby, Python, JavaScript and HTML/CSS. They also run Codeyear, which sends you a lesson a week in a project of your choice. They even have some notable participants:

Today they're launching a new spate of courses, including an introduction to HTTP and APIs and courses in other partner APIs, including Bit.ly and NPR. To participate, we've submitted lessons on using our Capitol Words API in Python. You'll learn the basics of making requests to the API, how to use the different endpoints, how to sort and paginate results, and how to use different query parameters. It's geared towards a beginner to intermediate Python student with minimal knowledge of RESTful APIs and JSON (all of which can be gleaned from the intro course mentioned above).

We hope to add more languages and APIs in the future, but we thought Capitol Words would be a fun API to kick off with. So, go take a look! And don't forget to check out our other APIs here.

Gun Control and Gun Rights: Legislation, Policy and Influence

The tragedy at Sandy Hook Elementary has brought gun policy back to the forefront of our national conversation. As a nonpartisan nonprofit, Sunlight takes no stance on the issue, but we have put together a collection of resources looking at the legislation, policy and influence around gun rights and gun control, plus the groups and lawmakers involved.

The Gun Lobby

Sunlight Foundation Senior Fellow Lee Drutman reviews the political influence of the National Rifle Association and the leading gun control group, the Brady Campaign to Prevent Gun Violence. Read his full analysis in this blog post.

Lee notes that when it comes to the debate on gun policy, Congress is pretty much only hearing from one side. The NRA spends 66 times what the Brady Campaign spends on lobbying, and 4,143 times what the Brady Campaign spends on campaign contributions. Since 2011, the NRA spent at least $24.28 million: $16.83 million through its political action committee, plus $7.45 million through its affiliated Institute for Legislative Action.

According to Influence Explorer records, the Brady Campaign spent $5,800 this election cycle and reported $60,000 in lobbying costs.


The 12 Days of APIs

‘Tis the season for application programming interfaces. Sunlight is in a festive mood. Not only are we hosting a pretty rad open house this week, but we have the perfect present for the open data developer in your life: a Sunlight Labs API key!

Here are our “12 days of APIs,” with a few bulk data sets thrown in to round it out. No singing required! Be sure to also check out some new additions and better accessibility we’ll have available in 2013.

12 minutes spent researching our API offerings on Sunlight Academy, which includes a brief tutorial video.

11 television markets reported more than 1,500 political ad filings this election. Download data about who bought more than $3 billion in political ads in 2012 from Political Ad Sleuth.

10 methods provided in the Sunlight Congress API. Our most popular API includes basic information on members of Congress, legislator IDs and lookups between places and the politicians that represent them.

9 political races had more than $20 million in outside spending this election. Download the bulk data on the money spent by super PACs, unions, corporations, nonprofits and other groups this cycle at Follow the Unlimited Money.

8 data sets covered by the Influence Explorer API (née TransparencyData), which includes federal and state campaign contributions, federal lobbying, government grants and contracts, EPA violations, federal regulations and more.

7 collections presented in the Real Time Congress API. Get as close to real-time data as possible on bills, votes, amendments, videos, floor updates, committee hearings and documents.

6 standard arguments to query in the Capitol Words API. Search the Congressional Record since 1996 and filter your results by state, party, chamber, date, start date or end date.

5(0) states available in the Open States API, which also covers D.C. and Puerto Rico. Use the RESTful API or bulk download to access the only comprehensive collection of state legislative data in the U.S.

4 ways to get Political Party Time data. Use the JSON feed, CSV file, RSS feed or relational zip file to know when politicians are fundraising and who is hosting the events.

3 mobile apps powered by our APIs: Real Time Congress for iPhone, Congress for Android and OpenStates for iPhone and iPad. (And check out Call on Congress if you don’t have a smartphone.)

2 options to get Scout alerts, by email or via text message. Scout uses a variety of Sunlight APIs (Capitol Words, Real Time Congress and Open States) to deliver real-time policy alerts on state and national issues, and it has a special user option for developers.

And a listserv to follow what’s happening in Sunlight Labs.

Flickr photo of partridge in a pear tree light display by K. van Santen.

Stay in the loop! Legislative Hill briefing on Scout TODAY

Are you a Hill staffer working on health care? The Farm Bill? DISCLOSE?

Or do you track legislation for your member?

If so, Sunlight is doing a briefing on our latest legislative alert tool, Scout, for Capitol Hill staff today, Friday July 13th at 2pm in Rayburn room 2203. (You can RSVP online here)

We built Scout to help everyone -- including busy Capitol Hill staffers -- better monitor the issues and policies important to them. The tool combines a variety of sources including the Congressional Record, THOMAS, GovTrack, the Federal Register and Sunlight’s Open States project to create legislative searches about issues you care about. From federal legislation, speeches and regulations to bills across all fifty states, Scout lets you search and create alerts all in one place.

The briefing will also include an overview on Sunlight’s other legislative and governmental tools such as Capitol Words and our Congress mobile app.

We hope you can join us today Friday the 13th in Rayburn 2203 (just avoid stepping on cracks or walking under ladders).

Rabbit foot optional.

Congress far from exemplary in SAT word proficiency

(This post was prepared in collaboration with Dan Drinkard)

If you do well on your SAT test, then you will ____ your chance of becoming a member of the U.S. Congress some day.

A. Vindicate

B. Scrutinize

C. Compromise

D. Discredit

E. Enhance

While the correct answer should probably be E (Enhance), the reality is that it might be closer to C (Compromise) or D (Discredit). At least, when it comes to the 112th Congress, top SAT words are few and far between.

We find that only 10 members of Congress have used at least 20 of the Kaplan 100 Most Common SAT Words so far in the 112th Congress, and that only 92 members of Congress have used at least 10 of these words. More than half of the members of Congress have used five or fewer. And 32 members did not use a single Kaplan 100 word, while 52 members only said one. In total, 0.046% of all words spoken in the Congressional Record were Kaplan 100 words.

For an analysis of how Congressional speech has dropped by a full grade level since 2005, click here.

Among the Kaplan 100, the word spoken most frequently in Congress is “compromise.” It had been uttered 1,820 times this Congress as of the end of April, far more as an aspiration than a description. Majority Leader Sen. Harry Reid (D-NV) has uttered the word 142 times, more than anyone else. Unfortunately, speaking it does not make it so.

Likewise, the other top words – prosperity (923 times), integrity (883 times), and exemplary (582) – also seem far more hopeful than reality-based. Table 1 (below) shows the Kaplan 100 words spoken most frequently in the 112th Congress.

Table 1. Top 20 most-spoken Kaplan 100 words, 112th Congress

Of the Kaplan 100, 14 words are missing entirely from the Congressional Record for the 112th Congress so far. They are: abbreviate; conformist; enervating; evanescent; florid; hackneyed; haughty; hedonist; ostentatious; perfidious; pretentious; querulous; sagacity; submissive.

For the full list of the top 100 words and how much they’ve been spoken and by whom, click here.

Who’s used the most unique SAT words in the 112th Congress? That distinction belongs to Senator Patrick Leahy (D-VT), who, as of April 2012, had used 27 of the Kaplan 100, putting him just ahead of fellow Senator Dick Durbin (D-IL), who has verbalized 26 of the 100 words so far, and Sen. Orrin Hatch (R-UT), who has uttered 25. Leahy has also used Kaplan 100 words a total of 127 times, also just edging out Durbin, who used the words 122 times.

Rounding out the top ten list for most unique Kaplan 100 words spoken are Sen. Mitch McConnell (R-KY), Sen. Benjamin Cardin (D-MD), Rep. Dennis Kucinich (D-OH), Rep. Steve King (R-IA), Sen. Dianne Feinstein (D-CA), Sen. John McCain (R-AZ), and Sen. Olympia Snowe (R-ME). All have got there by speaking at least 100,000 words so far in the 112th Congress. Of the top ten list, Snowe has both the highest grade level for her speech (14th grade), and the highest number of Kaplan 100 words per 100,000 words spoken: 76.5.

 

Table 2. Members who speak the most unique Kaplan 100 words

For a full list of how all members compare, click here.

The changing complexity of congressional speech

(This post was prepared in collaboration with Dan Drinkard)

Congress now speaks at almost a full grade level lower than it did just seven years ago, with the most conservative members of Congress speaking on average at the lowest grade level, according to a new Sunlight Foundation analysis of the Congressional Record using Capitol Words.

Of course, what some might interpret as a dumbing down of Congress, others will see as more effective communications. And lawmakers of both parties still speak above the heads of the average American, who reads at between an 8th and 9th grade level.

Today’s Congress speaks at about a 10.6 grade level, down from 11.5 in 2005. By comparison, the U.S. Constitution is written at a 17.8 grade level, the Federalist Papers at a 17.1 grade level, and the Declaration of Independence at a 15.1 grade level. The Gettysburg Address comes in at an 11.2 grade level and Martin Luther King’s “I Have a Dream” speech is at a 9.4 grade level. Most major newspapers are written at between an 11th and 14th grade level. (You can find more comparisons here)

All these analyses use the Flesch-Kincaid test, which produces the 'reads at an n-th grade level' terminology that is likely familiar to many readers. At its core, Flesch-Kincaid equates higher grade levels with longer words and longer sentences. It is important to understand the limitations of this metric: it tells us nothing about the clarity or correctness of a passage of text. But although an admittedly crude tool, Flesch-Kincaid can nonetheless provide insights into how different legislators speak, and how Congressional speech has been changing.

To see how different legislators rank, click here for a full database of all current members of Congress.

To see how many top SAT words lawmakers speak, click here.

Historical trends

Overall, the complexity of speech in the Congressional Record has declined steadily since 2005, with the drop among Republicans slightly outpacing that for Democrats (see Figure 1). Through April 25, 2012, this year's Congressional Record clocks in at a 10.6 grade level, down from 11.5 in 2005.

Between 1996 and 2005, Republicans overall consistently spoke about two-tenths of a grade level higher than Democrats, except in 2001, when a rare moment of national unity also seems to have extended to speaking at the same grade level. But following 2005, something happened, and Congressional speech has been on the decline since. For Republicans as a whole, the decline was from an 11.6 grade level to a 10.3 grade level in 2011 (up slightly to 10.4 in 2012 so far). For Democrats, it was a decline from 11.4 to 10.6 in 2011 (also up slightly to 10.8 in 2012 so far).

Figure 1. Congressional speech grade level by year

 

 

 

Ideology and speech complexity

To analyze the relationship between ideology and speech level, we took the first dimension DW-Nominate scores (DW1) for the current Congress, as of April 25, 2012. For the non-political scientists in the audience, DW1 scores take roll call voting data to place members of Congress on a liberal-conservative scale. On this scale, -1 is most liberal and 1 is most conservative. A negative value on the scale implies that the member votes most often with Democrats; a positive value implies that the member votes most often with Republicans.

Turning to Figure 2, we can immediately notice that grade level of Congressional Record speeches declines among Republicans as the voting record becomes more conservative. Among Republicans, the drop from the most moderate to most conservative is, on average, almost three whole grade levels, from 13th to 10th grade.

Among Democrats, the scatterplot does not reveal any relationship between grade level and ideology. However, when we hold all other factors constant in the regression analysis (see further below), we find that being on the far left is associated with lower speech grade levels. There is also a clearer correlation between further left voting score and lower grade level among more junior members.

Figure 2. The relationship of ideology to speech grade level

 

 

 

Changing members and members’ changes

It’s hard to pinpoint the exact cause of the decline. Perhaps it reflects lawmakers speaking more in talking points, and increasingly packaging their floor speeches for YouTube. Gone, perhaps, are the golden days when legislators spoke to persuade each other, thoughtfully wrestled with complex policy trade-offs, and regularly quoted Shakespeare.

The data indicate that part of the decline has to do with new junior members speaking at a lower grade level than more senior members, and some of it has to do with individual senior members simplifying their speech over time.

Figure 3 (below) breaks Congress into four seniority cohorts and details the relationship between ideology and grade level for speeches in the 112th Congress.

Here, a telling pattern emerges. Among the newest members (those with 1-3 years in their seat), there is a drop-off in speech level as we move from the center out to either extreme of the political spectrum, though the pattern is more pronounced on the far right. For the next cohort (4-10 years of experience), the same pattern continues on both the political right and left, though the relationship is much stronger among Republicans.

For the next cohort (11-20 years in their seat), the pattern on the right (more conservative, simpler speech) remains, but the pattern on the left reverses (there is a slight correlation between more liberalism and higher speech grade level). In the most senior cohort (more than 20 years in their seat), Republicans speak, on average, at a higher level than Democrats, with only the slightest relationship between conservatism and more simple speech.

Figure 3. Ideology and Seniority

 

 

At the individual level, prior to the 109th Congress (2005-2006), both individual Democrats and Republicans on average grew more sophisticated in their speech with each passing session of Congress. Individual Democrats gained on average 0.06 grade levels per session, and Republicans gained on average 0.12 grade levels per session. Then, starting with the 109th Congress, the trends reversed. Individual Democrats began dropping 0.07 grade levels of speech per session and individual Republicans began dropping 0.12 grade levels per session.

 

Table 1. Average estimated effect of each passing Congress on individual member grade level

(results from regression analysis estimating annual member change with member fixed effects)

 

The top and bottom lawmakers by grade level

Table 2 (below) shows the 20 members of Congress with the lowest grade level score for their Congressional Record corpus dating back to 1996. Of them, 85% (17 of 20) are Republicans; 65% (13/20) are freshmen, and another 15% (3/20) are sophomores. Additionally, 90% (18/20) are House members. The two Senators to make the bottom 20 are Rand Paul (R-KY) and Ron Johnson (R-WI), both Tea Party-supported freshmen.

Table 2. Bottom 20 speakers by grade level (all speeches since 1996)

Republicans also outnumber Democrats among the members who speak at the highest grade levels. Among the top 20, 12 are Republicans, 7 are Democrats, and one (Joe Lieberman) is an Independent. And eight of the top ten are Republicans. There are also 14 House members and six Senators. And perhaps most notably, there are only two freshmen and three sophomores. More than half of the members have been in their seat for at least 15 years, which is well above the median of nine years across all members of the 112th Congress.

Table 3. Top 20 speakers by grade level (all speeches since 1996)

Regression analysis

To estimate the effects of all the different factors (holding all the other factors constant), we estimated two ordinary least squares regression models. Model 1 uses the different factors to explain the variation in the grade level of individual members’ combined speeches since 1996. Model 2 uses the same factors to explain the grade level of member speeches just in the 112th Congress. (The correlation between all speeches since 1996 and just 112th Congress speeches for non-freshmen members is 0.74. For freshmen, these two measures will obviously be the same.)

For Democrats, moving from most moderate (DW1 score of 0) to most liberal (DW1 score of -1) is associated with a decrease of 1.59 grade levels for all speeches since 1996 combined, and a decrease of 1.35 grade levels for speeches from just the 112th Congress, all else being equal. This estimate is statistically significant.

For Republicans, moving from most moderate (0) to most conservative (1) is associated with a decrease of 2.07 grade levels in speech for all speeches since 1996 combined, and 2.06 grade levels for just the 112th Congress, all else being equal. Both are statistically significant. That the estimates for the relationship between ideology and grade level are consistent across the two models shows that this is both a current and a historic phenomenon.

Another takeaway point from the regression analysis is that the more a member speaks overall, the more simply that member is likely to speak, all else being equal. For just the 112th Congress, going from least to most talkative is associated with a decrease of almost a grade level and a half. For the historic corpus, going from the least talkative to most talkative member is associated with a decrease of a full grade level.

Socioeconomic status of member district does not play much of a role, so there is no story to tell of members speaking to their constituents. If anything, the reverse is true. Having a higher percentage of high school graduates in the district or state is associated with members speaking at a slightly lower grade level (though since half of the districts have high school graduation levels between 82% and 90%, this doesn’t add up to all that much). District median income (which is closely correlated with education generally) has no relationship to speech grade level. There is also no statistically significant difference between chambers. Members of Congress from the Northeast speak at a slightly higher grade level than their colleagues from the rest of the country.

Of course, a fair amount of variation remains unexplained. There are many reasons why members speak at different levels, and these explanations only tell part of the story.

 

Table 4. OLS Regression explaining member speech level (standard errors in parentheses, significant variables bolded)

Does it matter?

Earlier this year, the University of Minnesota’s Smart Politics noted that Obama’s 2012 State of the Union address clocked in at an eighth-grade level for the third year in a row, and that Obama’s average grade level of 8.4 was well below the average of 10.7 for the previous 67 addresses. Fox News ran the story alongside the image of a child in a dunce cap, and right-wing blogs mocked the President’s intelligence.

Others pointed out that maybe speaking clearly was a good thing. After all, the SOTU speech was pretty much right at the average American’s reading level. And writing gurus like George Orwell (“If it is possible to cut a word out, always cut it out”) and Strunk & White (“omit needless words”) famously advise simplicity.

But whether you see it as plain speak or you see it as a dumbing down, the data are clear: The overall complexity of speech in the Congressional Record has dropped almost a full grade level since 2005. And those on the political extremes, especially those on the far right, tend to be associated with the most simple speech patterns.

Methodology for generating grade level scores

(by Dan Drinkard)

Grade levels were calculated using Flesch-Kincaid readability tests applied to various facets of text queries against the Capitol Words API. For example, Barbara Lee's entire corpus of words spoken can be retrieved by paging through the following url: http://capitolwords.org/api/text.json?bioguide_id=L000551&apikey=####.
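As a rough illustration of what "paging through" that URL could look like, here is a minimal Python sketch. The `page` query parameter, the zero-based page numbering, and the `fetch_page` callable are assumptions made for illustration; they are not confirmed by this post, and the response format is deliberately left to the caller.

```python
from urllib.parse import urlencode

# Endpoint taken from the example URL above.
TEXT_ENDPOINT = "http://capitolwords.org/api/text.json"


def text_page_url(bioguide_id, api_key, page):
    """Build the URL for one page of a member's text results.

    The `page` parameter name is an assumption, not confirmed by the post.
    """
    query = urlencode({"bioguide_id": bioguide_id, "apikey": api_key, "page": page})
    return TEXT_ENDPOINT + "?" + query


def collect_all_pages(fetch_page, max_pages=1000):
    """Call fetch_page(0), fetch_page(1), ... until a page comes back empty.

    `fetch_page` is a hypothetical caller-supplied function that downloads
    and parses one page, returning a list of results (empty when done).
    """
    results = []
    for page in range(max_pages):
        batch = fetch_page(page)
        if not batch:
            break
        results.extend(batch)
    return results
```

Keeping the download step behind `fetch_page` means the paging loop can be exercised without network access or a real API key.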

Flesch-Kincaid scores can be determined as: 0.39 * (Words/Sentences) + 11.8 * (Syllables/Words) - 15.59.

To derive counts: The Python Natural Language Toolkit (NLTK)'s sentence tokenizer was used to count sentences, the Capitol Words ngram tokenizer was used to count words, and the Carnegie Mellon pronouncing dictionary was used to count syllables. For fallback syllable counting when a word wasn't present in the dictionary, three different sets of calculations employing different methods were tried: discarding unknown words, treating unknown words as the average word of 1.66 syllables, and using a trained fallback syllable counter from NLTK_Contrib. We found the results of each method to be nearly indistinguishable from the others. An example F-K calculator (this one using the aforementioned 'padding with averages' method) can be found at https://gist.github.com/2483508.
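To make the formula concrete, here is a small, self-contained Python sketch of a Flesch-Kincaid calculation. It is not the calculator linked above: to stay dependency-free it swaps in a regular-expression sentence splitter and a crude vowel-group syllable heuristic in place of NLTK and the Carnegie Mellon dictionary, so its scores will differ somewhat from the ones reported in this post. All function names are illustrative.

```python
import re

VOWEL_GROUPS = re.compile(r"[aeiouy]+")
SENTENCE_END = re.compile(r"[.!?]+")
WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def count_syllables(word):
    """Heuristic stand-in for a pronouncing dictionary: count vowel groups,
    dropping a silent trailing 'e' (but not a trailing 'le')."""
    word = word.lower()
    count = len(VOWEL_GROUPS.findall(word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text):
    """0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59"""
    words = WORD.findall(text)
    sentences = [s for s in SENTENCE_END.split(text) if s.strip()]
    if not words or not sentences:
        raise ValueError("text must contain at least one word and one sentence")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words))
            - 15.59)
```

Because the score depends only on the three counts, any of the fallback strategies described above can be substituted for `count_syllables` without touching the formula itself.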

 

As senator, Santorum was obsessed with abortion

The United States Senate deals with a wide range of issues, both foreign and domestic, but the ones that preoccupied Rick Santorum the most during his tenure appear to have been gynecological. An examination of the surging GOP presidential contender's record using the Sunlight Foundation's Capitol Words reveals the degree to which Santorum favored topics such as abortion, fetuses and wombs when he was serving in Congress' upper chamber.

According to our analysis, between January 1, 1996 and January 3, 2007 (his last day as a member of the Senate), the then-junior senator from Pennsylvania spoke the following words more than anybody else in the Senate: abortion, partial-birth, fetus, fetal, womb. He also uttered the following phrases more than anyone else: “base of the skull,” and “life of the mother.”

Term | Total Santorum utterances (1/1/1996-1/3/2007) | Total Senate utterances (1/1/1996-1/3/2007) | Santorum % | Rank
abortion | 1014 | 8328 | 12.2% | #1
partial-birth | 379 | 1787 | 21.2% | #1
fetus | 145 | 780 | 18.6% | #1
"partial birth" | 116 | 466 | 24.9% | #1
fetal | 99 | 1134 | 8.7% | #1
womb | 90 | 369 | 24.4% | #1
"base of the skull" | 34 | 48 | 70.8% | #1
"life of the mother" | 74 | 307 | 24.1% | #1

Though he was just one of 100 senators, Santorum was responsible for approximately one in eight utterances of “abortion” during the eleven years covered by our analysis, and approximately one in five utterances of “fetus” and “partial-birth.”

As a senator, Santorum sponsored the Partial-Birth Abortion Ban Act of 2003, which criminalized so-called “partial-birth abortion,” as opponents term a controversial procedure for ending late-term pregnancies. Doctors who perform this procedure now face a fine and up to two years in prison. He was also a co-sponsor of a number of bills that would have prohibited children from crossing state lines to receive an abortion, and would have required abortion providers to make pregnant women aware that the abortion would cause their unborn child pain.

As a candidate, Santorum has built a campaign around his strong opposition to abortion and gay rights. He wants to ban all abortions and to ban all same-sex marriage.

Santorum recently went so far as to compare himself to “a Jesus candidate.” He was not, however, the “Jesus” Senator. According to Capitol Words, that distinction belongs to Robert C. Byrd.

 

HOW WE DID THIS:

The numbers in this post were generated using Capitol Words, a Sunlight Foundation project that analyzes the frequency with which different terms appear in the Congressional Record. We used the Capitol Words API to calculate how often Sen. Santorum used each phrase versus the entire Senate during the time periods in which Santorum was in office and for which we have data. An example query is:

http://capitolwords.org/api/dates.json?apikey=API_KEY_GOES_HERE&phrase=partial-birth&start_date=1996-01-01&end_date=2007-01-03&chamber=Senate&bioguide_id=S000059

To get the term count for the entire Senate, we simply removed the "bioguide_id" parameter. You can find more information about the Capitol Words API in this blog post.
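For readers who want to reproduce the comparison, the following Python sketch builds the two query URLs (with and without "bioguide_id") and computes a share from two counts. The helper names are illustrative, and the sketch does not parse the API response, since its format is not described in this post; the counts in the usage note come from the table above.

```python
from urllib.parse import urlencode

# Endpoint and parameter names are taken from the example query above.
BASE_URL = "http://capitolwords.org/api/dates.json"


def dates_query_url(phrase, api_key, bioguide_id=None):
    """Build a dates.json query for the Senate over the analysis window.

    Leave bioguide_id as None to count the phrase for the entire Senate.
    """
    params = {
        "apikey": api_key,
        "phrase": phrase,
        "start_date": "1996-01-01",
        "end_date": "2007-01-03",
        "chamber": "Senate",
    }
    if bioguide_id is not None:
        params["bioguide_id"] = bioguide_id
    return BASE_URL + "?" + urlencode(params)


def share_percent(member_count, chamber_count):
    """Member's share of all utterances, as a percentage with one decimal."""
    return round(100.0 * member_count / chamber_count, 1)
```

For example, `share_percent(1014, 8328)` gives 12.2, matching the "abortion" row in the table, and `dates_query_url("partial-birth", "API_KEY_GOES_HERE", "S000059")` reproduces the example query.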

The data behind Capitol Words

Last Monday we launched an update to our Capitol Words project, which indexes and tokenizes the Congressional Record daily. With the launch behind us and the dust starting to settle, I'd like to walk through how we get from raw text to attributed, searchable quotations, and provide some examples of how you can interact with the data directly.

Before delving into how it works, though, it's important to acknowledge the myriad developers whose work on this project has made it possible. I'm only the most recent steward of the site; the bulk of the data legwork for this iteration was handled by Aaron Bycoffe and Jessy Kate Schingler, and the web interface owes its beauty to Caitlin Weber and Ali Felski. Timball provided the hardware, and the list continues from contributions to the scrapers all the way back to the original conception and implementation of the idea by Josh Ruihley and Garrett Schure. It's the combined efforts of everyone involved that brought us the site that's available today.

Now, without further ado...
