Sunlight analysis reveals only 15 percent of congressional websites are HTTPS ready
About a month ago, Sunlight’s [Nicko Margolies](http://sunlightfoundation.com/team/nmargolies/) noticed [something strange](https://twitter.com/SFnicko/status/582645903099621376). As a concerned netizen, he had installed [EFF’s HTTPS Everywhere plugin](https://www.eff.org/HTTPS-EVERYWHERE), which will automatically send you to the HTTPS version of a page if it’s available. But as he was visiting various Senate websites, he noticed that many would bring up Sen. Barbara Boxer’s [website](https://www.boxer.senate.gov/) instead of the intended Web page. Curious, he hailed me, Sunlight’s system administrator, and I started to investigate.
First, I examined the SSL certificate for the California Democrat’s site, then those of several other senators. It quickly became apparent that the entire Senate relied on [one misconfigured SSL certificate](https://gist.github.com/timball/0e63c2bb6e429d585eb3) and some poor Web server settings, causing the HTTPS Everywhere plugin to redirect to Boxer’s site even when the user requested a different senator’s page. We tweeted out our finding; the [official Senate Sergeant at Arms](https://twitter.com/SenateSAA/status/582678896690229248) replied that they would look into it. Job done. Let the professionals get on with it.
Last week, I decided to check up on the current state of SSL support in the Senate and in Congress generally. What I found was that it was not substantially better: In fact, only 15 percent of congressional websites are completely ready for HTTPS. To quantify the problem, I wrote some [code to measure it](https://github.com/sunlightlabs/capitolhttpstester).
In this article we will describe the methodology of the survey and present the survey results. We will also offer a brief analysis of what can be done to address the situation. It is important to note that this evaluation is not, and should not be read as, a reflection on individual members of Congress or their websites; it reflects the entities that host those websites. We know this because the 652 websites surveyed were served from only 24 IP addresses:
```python
>>> import socket
>>> from urllib3.util import parse_url  # assumed source of parse_url
>>> ips = set()
>>> for site in survey_group:
...     ips.add(socket.gethostbyname(parse_url(site['url']).host))
...
>>> ips
set(['22.214.171.124', '126.96.36.199', '188.8.131.52', '184.108.40.206', '220.127.116.11', '18.104.22.168', '22.214.171.124', '126.96.36.199', '188.8.131.52', '184.108.40.206', '220.127.116.11', '18.104.22.168', '22.214.171.124', '126.96.36.199', '188.8.131.52', '184.108.40.206', '220.127.116.11', '18.104.22.168', '22.214.171.124', '126.96.36.199', '188.8.131.52', '184.108.40.206', '220.127.116.11', '18.104.22.168'])
>>> len(ips)
24
```

Of those 24 unique IP addresses, just two (22.214.171.124 and 126.96.36.199) serve the bulk of those homepages: 609 out of 652.
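To see how concentrated the hosting is, one could also count homepages per address rather than just collecting the unique addresses. The following is a minimal Python 3 sketch, not the survey's actual code; the function name `homepages_per_ip` and the `resolve` parameter are hypothetical and exist only so the lookup can be swapped out for testing.

```python
import socket
from collections import Counter
from urllib.parse import urlparse


def homepages_per_ip(urls, resolve=socket.gethostbyname):
    """Count how many homepage URLs resolve to each IPv4 address.

    `resolve` defaults to a real DNS lookup; pass any callable that maps
    a hostname to an address to run this without network access.
    """
    counts = Counter()
    for url in urls:
        counts[resolve(urlparse(url).hostname)] += 1
    return counts
```

Calling `homepages_per_ip(urls).most_common(2)` on the survey's URL list would then list the two busiest addresses along with the number of homepages each one serves.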
What follows is my report.
### What is HTTPS?
[Hypertext Transfer Protocol Secure (HTTPS)](http://en.wikipedia.org/wiki/HTTPS) is the method by which data is transferred over the Web in a secure manner. Users know they’re on an HTTPS webpage when a small padlock icon shows up in the address bar. The padlock, like the HTTPS in the URL, means that the Web browser and the Web server have agreed to encrypt the full contents of the Web page and that the user can feel confident that none of the information was at risk of compromise during the transmission of that Web page from server to browser. It’s a little bit like the difference between using a postcard and sealing a letter in an envelope. Someone other than the sender can easily add text to a postcard; to tamper with a letter, one must open the envelope.
And HTTPS is coming. Two [major Web](http://googleonlinesecurity.blogspot.com/2014/08/https-as-ranking-signal_6.html) [browser companies](https://blog.mozilla.org/security/2015/04/30/deprecating-non-secure-http/) have decided that Secure Sockets Layer (SSL) connections will be mandatory and that visiting non-SSL Web pages will be considered an error. While the author of this analysis does not necessarily endorse this view completely, he nevertheless decided to undertake a survey of congressional websites to see if they were ready for HTTPS. Of the 652 websites surveyed, only 98 (15 percent) passed completely.
Both [House](https://gist.github.com/timball/b04f4f8867f0d763e568) and [Senate](https://gist.github.com/timball/0e63c2bb6e429d585eb3) certificates use similar levels of encryption, that being [TLSv1/SSLv3](https://www.chromium.org/Home/chromium-security/education/tls) with [AES256-SHA](http://en.wikipedia.org/wiki/Advanced_Encryption_Standard). Technically, [SSLv3 is considered obsolete](https://isc.sans.edu/forums/diary/SSLv3+POODLE+Vulnerability+Official+Release/18827/) and more robust protocols and ciphers should be used. We did not penalize a Web page’s grade because of this flaw, [even though](http://en.wikipedia.org/wiki/POODLE) the [Internet considers](https://www.us-cert.gov/ncas/alerts/TA14-290A) it a [major problem](https://community.qualys.com/blogs/securitylabs/2014/10/15/ssl-3-is-dead-killed-by-the-poodle-attack).
### How did we analyze the HTTPS status of congressional websites?
For this analysis, we examined 652 websites, including those of all senators, representatives, leadership offices, congressional committees, and congressional support offices. The [software](https://github.com/sunlightlabs/capitolhttpstester) was written to check individual website SSL certificates and make HTTPS requests. Once those requests were served, the resulting page content was examined for [mixed content](https://wiki.mozilla.org/Security/Features/Mixed_Content_Blocker#1._Feature_overview) and [non-relative internal introspective links](http://en.wikipedia.org/wiki/Uniform_resource_locator#Protocol-relative_URLs), which are signifiers of whether a site was ready for HTTPS. Once the survey was complete, a scoring metric was calculated for each website.
[The software](https://github.com/sunlightlabs/capitolhttpstester) to produce the survey had four main functions:
- Gather a list of relevant congressional homepages to test.
- Retrieve and examine each website SSL certificate.
- Retrieve the homepage and examine both the server response and homepage content.
- Calculate a scoring metric based upon those three previous steps.
The Sunlight Foundation provides an API to [retrieve information about members of Congress](https://sunlightlabs.github.io/congress/legislators.html). A secondary source was needed to [retrieve information about various congressional committees](https://github.com/unitedstates/congress-legislators). From these sources the software gathered:
- entity name;
- entity chamber;
- whether entity was a member of Congress or a congressional committee; and
- the URL for the entity’s homepage.
Daniel Schuman of the [Congressional Data Coalition](http://congressionaldata.org/) provided the author with a list of legislative branch committees and office URLs, and had [written elsewhere](http://www.rollcall.com/issues/57_13/campus-notebook-outdated-websites-mocked-207657-1.html?pos=hbtxt) about the need for this kind of analysis. (These were added to the [unitedstates](https://github.com/unitedstates/congress-legislators) GitHub repository.)
Once this information was gathered, each entity’s homepage SSL certificate was retrieved via [Python’s `ssl` library](https://docs.python.org/2/library/ssl.html) and examined. Critical data at this step included:
- whether the CommonName (CN) or SubjectAltName (SAN) matched the homepage hostname;
- certificate expiration date; and
- certificate cipher information.
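For readers who want to try this step themselves, here is a minimal sketch of how those three pieces of data could be pulled out with Python 3's standard library. It is an illustration written for this article's readers, not the code used in the survey; the helper names `fetch_certificate`, `certificate_names` and `expires_at` are hypothetical.

```python
import socket
import ssl
from datetime import datetime, timezone


def fetch_certificate(host, port=443, timeout=10):
    """Return (cert_dict, cipher_tuple) for host. Needs network access."""
    context = ssl.create_default_context()
    # Skip the built-in hostname check so a certificate issued for a
    # different name can still be inspected. The chain is still verified.
    context.check_hostname = False
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            # cipher() is (cipher name, protocol version, secret bits).
            return tls.getpeercert(), tls.cipher()


def certificate_names(cert):
    """Collect the CommonName and every DNS SubjectAltName entry."""
    names = [value
             for rdn in cert.get("subject", ())
             for key, value in rdn
             if key == "commonName"]
    names += [value
              for kind, value in cert.get("subjectAltName", ())
              if kind == "DNS"]
    return names


def expires_at(cert):
    """Parse the certificate's notAfter field into an aware datetime."""
    seconds = ssl.cert_time_to_seconds(cert["notAfter"])
    return datetime.fromtimestamp(seconds, tz=timezone.utc)
```

One caveat: `getpeercert()` only returns a populated dictionary when the certificate chain validates. A survey that also needs to look at untrusted or expired certificates would have to request the binary form with `getpeercert(binary_form=True)` and parse it separately.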
For our more technically minded readers (others can skip this paragraph), another way to have done this step would be to use the openssl command line tool. From a UNIX command prompt, the following example can be used to examine a certificate, like so:
```sh
$ echo -n | openssl s_client -connect www.example.com:443 | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' > example.crt
$ openssl x509 -text -in example.crt
```

Here are links for the text of both U.S. Senate and U.S. House HTTPS certificates:
- [Senate HTTPS SSL certificate text](https://gist.github.com/timball/0e63c2bb6e429d585eb3);
- [House HTTPS SSL certificate text](https://gist.github.com/timball/b04f4f8867f0d763e568).
A single “forced” request to retrieve the entity’s HTTPS homepage was made. It was forced in the sense that the URL scheme for the homepage was explicitly set to `https://`, but encryption was not enforced. This technique allowed us to survey websites that did not have valid encryption settings. This step examined:
- the server’s HTTP response;
- whether the client was redirected to another Web page; and
- whether the resulting Web page contained mixed https/http content.
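The last two checks can be illustrated without touching the network. The sketch below is a simplified Python 3 illustration, not the survey's implementation: it assumes the page has already been fetched, and the names `was_downgraded`, `find_mixed_content` and `RESOURCE_ATTRS` are hypothetical. It only looks at a handful of tags, so a real checker would need to cover more (CSS `url()` references, for instance).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Tag -> attribute that makes the browser fetch a subresource.
# Deliberately small; ordinary <a href> links are not mixed content.
RESOURCE_ATTRS = {
    "img": "src",
    "script": "src",
    "iframe": "src",
    "link": "href",
}


def was_downgraded(final_url):
    """True if a request that started as https:// ended up elsewhere."""
    return urlparse(final_url).scheme != "https"


class MixedContentFinder(HTMLParser):
    """Collect subresource URLs that would load over plain http."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.insecure = []

    def handle_starttag(self, tag, attrs):
        wanted = RESOURCE_ATTRS.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                # Resolve relative and protocol-relative URLs first.
                absolute = urljoin(self.page_url, value)
                if urlparse(absolute).scheme == "http":
                    self.insecure.append(absolute)


def find_mixed_content(page_url, html):
    """Return the insecure subresource URLs found in an HTTPS page."""
    finder = MixedContentFinder(page_url)
    finder.feed(html)
    return finder.insecure
```

Relative and protocol-relative references inherit the page's `https://` scheme and are therefore not flagged; only resources with a hard-coded `http://` address are.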
Once all the data from the three previous steps were gathered, a scoring metric was applied based on the homepage’s security profile, and a table was generated. From preliminary tests, 11 unique security states were found to exist. They are specified in the [`grade_summary`](https://github.com/sunlightlabs/capitolhttpstester/blob/master/maketable.py#L26) variable of the HTML table generating code. Those security states were then translated into grades.
In order to make the grades emotionally meaningful, each grade has an associated emoji, ranging from a green check mark to a red “X.”
### What did we find?
The results are not stellar. Take a look at the graphic below to view our findings.
Of the 652 websites, just 98 (or 15 percent) had no issues and worked correctly. Ignoring the non-relative URL and mixed content warnings raises the share of Capitol Hill homepages with somewhat functional HTTPS to 37.7 percent. A worryingly common behavior was to force visitors back to the non-SSL Web page of the chamber itself, either `www.house.gov` or `www.senate.gov`. Because this could confuse users, we considered this to be a failure. Sadly, 22.2 percent of all congressional pages were just broken, most likely due to server misconfiguration.
### Analyzing the HTTPS status of the House versus the Senate
Examining chambers individually skews results wildly in favor of the House. The number of House member websites implementing SSL correctly was 86, or 19.6 percent. Ignoring non-relative URL and mixed content warnings raises that percentage to 43.8 percent. The most common behavior for secure House member websites, comprising 47 percent, was to force redirection to the insecure Web pages, thereby nullifying the intended purpose of the secure Web server.
By contrast, in the Senate, a mere 2 percent of homepages worked with SSL correctly: [Boxer’s](https://www.boxer.senate.gov/) and [Sen. Richard Durbin’s](https://www.durbin.senate.gov/), D-Ill., proving that it is possible to have a valid configuration in the Senate infrastructure. Expanding the results to include sites with non-relative URL and mixed content warnings raises that to 19 percent. Among the more secure Senate websites, the most common flaw, found in 46 percent of them, was for the Web server to throw an error saying “Forbidden.”
The House is also bolstered by the number of members whose hostname matched the house.gov SSL certificate. Only one member failed that test: Rep. Robert Aderholt, R-Ala., whose website failed to even connect in our repeated tests. The other 437 member and delegate websites passed. We attribute this high pass rate to the House SSL certificate having a SAN of `*.house.gov`, which allowed any host ending in `.house.gov` to match its SSL certificate.
The subject of Aderholt’s homepage is confounding. In some browser and operating system combinations, his page worked under HTTPS as intended. In others, it would simply never connect, resulting in a client-side timeout. Unfortunately, our software always experienced the client timeout, meaning we could not fairly evaluate Aderholt’s homepage, and it was therefore scored as a failure. This may well be an unfair evaluation on our part, but it is worth highlighting because it may indicate some underlying server configuration issues.
On the other hand, the Senate had only 38 matching member hostnames, leaving the other 62 senators unable to present a valid certificate at all. That severely impacted scores. Unlike the House SSL certificate, the Senate certificate explicitly listed entities in its SAN field. This limited the total number of hosts that could be matched. Unfortunately, a wildcard certificate like the House’s (`*.senate.gov`, for example) would not work, because the Senate seems to enforce use of the `www.` prefix, and so-called “wildcard” certificates only authenticate one level of subdomain per domain. For example, `one.domain.com` would match a wildcard certificate for `*.domain.com`, but `two.one.domain.com` would not.
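The one-level rule is easy to demonstrate. The following Python 3 sketch is a deliberately simplified illustration of wildcard matching (the function name `wildcard_matches` is hypothetical, and real TLS libraries apply additional rules); it treats a leading `*` as standing in for exactly one hostname label.

```python
def wildcard_matches(pattern, hostname):
    """Return True if hostname matches a certificate name.

    A leading "*" label stands in for exactly one label, so the
    pattern and the hostname must have the same number of labels.
    """
    pattern_labels = pattern.lower().split(".")
    host_labels = hostname.lower().split(".")
    if len(pattern_labels) != len(host_labels):
        return False
    if pattern_labels[0] == "*":
        # Everything after the wildcard label must match exactly.
        return pattern_labels[1:] == host_labels[1:]
    return pattern_labels == host_labels


print(wildcard_matches("*.domain.com", "one.domain.com"))        # True
print(wildcard_matches("*.domain.com", "two.one.domain.com"))    # False
print(wildcard_matches("*.senate.gov", "www.boxer.senate.gov"))  # False
```

The last line shows the Senate's problem: a hostname such as `www.boxer.senate.gov` has one label too many for `*.senate.gov` to cover.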
Thankfully, the Senate will have to fix all these issues by Dec. 11, 2015; after that day, the current SSL certificate will expire, forcing all 100 senators to have invalid SSL certificates. The House’s SSL certificate expires two months after that on Feb. 6, 2016.
### What have we learned?
In general, you should close unused and nonfunctioning [ports](http://en.wikipedia.org/wiki/Port_(computer_networking)) to the general public. If a port is open, you should be using it properly — otherwise close it. The current state of HTTPS support for members of Congress needs work. It’s more than likely that several server configurations will have to be adjusted. Thankfully, the [Mozilla Foundation has a set of recommendations](https://wiki.mozilla.org/Security/Server_Side_TLS). Leaving misconfigured ports open — even if they are secure — is worse than having them closed.
Finally, Sunlight is interested in seeing Congress take sound steps to properly secure its — and the American people’s — information. This author, in particular, hopes that lawmakers will read this analysis and ponder some of the questions that have been raised, potentially making changes to improve their security practices. To that end, we’ll run these tests again periodically to identify any changes that they may or may not make. See you all very soon!