Editor’s note: We never would have guessed the response this post would generate. Clearly, a lot is at stake for the future of the Internet. We understand the frustration of any group trying to engage the public in an arcane policymaking process, particularly when the process is flawed. Since we posted our analysis, there have been reports that the Federal Communications Commission (FCC) now acknowledges that the data is incomplete. We hope this generates an open and constructive conversation about the need for better, more complete, and machine-readable open government data.
A letter-writing campaign that appears to have been organized by a shadowy organization with ties to the Koch Brothers inundated the Federal Communications Commission with missives opposed to net neutrality (NN), an analysis by the Sunlight Foundation reveals.
Over the past several months, the Federal Communications Commission has been working toward a new set of net neutrality rules, and a large part of that process has been accepting comments from the public. In September, [we reported on our analysis of the comments from the first comment period of this rulemaking](http://sunlightfoundation.com/blog/2014/09/02/what-can-we-learn-from-800000-public-comments-on-the-fccs-net-neutrality-plan/), and we’d now like to look at the comments from the second, which the FCC [released in bulk in October](http://www.fcc.gov/blog/fcc-releases-open-internet-reply-comments-public). We again used natural language processing techniques to examine the roughly 1.67 million comments we were able to extract from this batch, both to surface the main topics discussed and to group similar comments together.
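As a rough illustration of what grouping similar comments involves, here is a minimal, dependency-free sketch: TF-IDF vectors compared by cosine similarity, with a greedy single-pass grouping. It is not the gensim-based pipeline used for this analysis; the tokenizer, the smoothed IDF formula, and the 0.8 similarity threshold are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Crude tokenizer: lowercase, keep runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def tfidf_vectors(docs):
    """One sparse TF-IDF vector (term -> weight) per document, with smoothed IDF."""
    token_lists = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for tokens in token_lists for term in set(tokens))
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    return [{t: c * idf[t] for t, c in Counter(tokens).items()} for tokens in token_lists]


def cosine(a, b):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_similar(docs, threshold=0.8):
    """Greedy single pass: each document joins the first group whose seed it resembles."""
    vectors = tfidf_vectors(docs)
    groups = []  # each group is a list of document indices; group[0] is its seed
    for i, vec in enumerate(vectors):
        for group in groups:
            if cosine(vec, vectors[group[0]]) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])
    return groups


comments = [
    "Please stop the FCC's dangerous new regulations.",
    "Please stop the FCC's dangerous new regulations!",
    "Reclassify broadband under Title II as a common carrier.",
]
print(group_similar(comments))  # [[0, 1], [2]]
```

Comparing each document only against group seeds keeps the sketch short, but it is order-dependent; a real pipeline over 1.6 million documents would use proper clustering and an indexed similarity search.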
Among our key findings from round two:
* In marked contrast to the first round, anti-net neutrality commenters mobilized in force for this round and made up the majority of comments submitted, at 60%. We attribute this shift almost entirely to the form-letter campaigns of a single organization, American Commitment, which was single-handedly responsible for 56.5% of the comments in this round.
* Largely because of this campaign, the share of comments we believe were form-letter submissions was significantly higher in this round than in the last one: 88%.
* Non-form-letter submissions had a sentiment distribution similar to that of the first round’s comments, with less than 1% opposed to net neutrality.
* In general, many more comments were difficult to classify in this round than in the first. Some of the new anti-net neutrality campaigns appear to have been crafted to echo the language of the successful pro-neutrality campaigns of the first round while arguing for the opposite conclusion, and many non-form-letter comments mixed talking points from both camps, leaving their ultimate intent unclear.
* As with the last round, the corpus also included submissions on behalf of telecommunications firms, advocacy organizations, etc., which were written using formal legal language that set them apart from the bulk of the comments. Again, these were a tiny fraction of a percent of overall comments.
* Combined with the first round comments, we characterize 41% of the total comments submitted as being anti-net neutrality (with the balance being a mix of pro-NN and comments with no clear opinion), and we estimate that 79% of submissions came as part of form letter campaigns.
Below is a revised version of our comment visualization tool, this time exploring the data from the second comment period.
We again took a deep dive into the topics this model surfaced. As expected, many of the same topics recurred in this round:
* Opposition to paid priority or tiered speed was again commonly discussed in pro-NN comments. Form letter campaigns discussing this topic included those from Free Press, BattleForTheNet, Credo, Daily Kos and the Sierra Club.
* Many commenters again discussed various legal rationales for net neutrality, with phrases like “common carrier,” “title II,” and “public utility.” Such phrases occurred in about half of the comments in this round.
* Arguments about the economy were common in both pro-NN and anti-NN comments, with disagreements as to which policy best favored economic growth.
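A figure like “about half of the comments” comes from a simple phrase tally. A minimal sketch, assuming a case-insensitive regex match on the three phrases quoted above is an acceptable stand-in for however the original count was made:

```python
import re

# The three legal-rationale phrases quoted above; word boundaries keep "title ii"
# from matching "title iii".
LEGAL_PHRASES = re.compile(r"\b(?:common carrier|title ii|public utility)\b", re.IGNORECASE)


def share_mentioning(comments, pattern=LEGAL_PHRASES):
    """Fraction of comments that contain at least one match for `pattern`."""
    if not comments:
        return 0.0
    hits = sum(1 for text in comments if pattern.search(text))
    return hits / len(comments)


sample = [
    "Classify ISPs under Title II.",
    "Treat the Internet as a public utility.",
    "Keep the government's hands off the Internet.",
]
print(f"{share_mentioning(sample):.0%}")  # 67%
```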
Additionally, particularly on the now-better-represented anti-net neutrality side, some new framings were apparent:
* A set of arguments similar to, but less ambiguous than, the messaging that emerged from Tea Party groups in round one dominated the anti-NN comments in round two; we believe it originated with the conservative activist organization American Commitment. Comments from this campaign shared a template, with different targeted messages inserted between the second and third paragraphs. Those targeted messages covered topics as far-ranging as personal freedoms, economic threats, the poor state of US public utilities, and the characterization of pro-NN advocates as extreme leftists (Free Press’s Robert McChesney is portrayed as a Communist).
* A separate, smaller contingent that opposed FCC action on net neutrality suggested that while net neutrality regulation might be within the government’s purview, it would be better left to Congress. Most of the comments in this group came from a form letter campaign organized by TechFreedom.
Our identification of form letters followed the same approach as in the last round: identify clusters with particularly low variance and read through them to confirm shared boilerplate language. The task was much easier in the second round because there was less noise within each cluster. Because the corpus consisted mostly of form letters, partitioning it into clean “neighborhoods” was not difficult, and the uniformity of the comments submitted through campaigns like American Commitment’s, TechFreedom’s, and BattleForTheNet’s made clustering them straightforward. American Commitment’s clusters were especially well behaved: their shared boilerplate was distinctive enough to keep them out of other groupings, hence the large blue supercluster that houses nearly all of them. Their tendency to come in clusters of approximately 32,000 comments made them easy to spot, too.
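A minimal sketch of the low-variance test, assuming each comment has already been turned into a numeric vector (for example, topic weights) and assigned to a cluster. Measuring variance as mean squared distance to the centroid, and the 0.05 cutoff, are illustrative assumptions; flagged clusters still need a human read to confirm shared boilerplate.

```python
from statistics import fmean


def centroid(vectors):
    """Component-wise mean of equal-length numeric vectors."""
    return [fmean(column) for column in zip(*vectors)]


def within_cluster_variance(vectors):
    """Mean squared Euclidean distance from each member to the cluster centroid."""
    center = centroid(vectors)
    return fmean(sum((x - m) ** 2 for x, m in zip(vec, center)) for vec in vectors)


def form_letter_candidates(clusters, max_variance=0.05, min_size=2):
    """Ids of clusters tight (and big) enough to read by hand as likely form letters."""
    return sorted(
        cid
        for cid, vectors in clusters.items()
        if len(vectors) >= min_size and within_cluster_variance(vectors) <= max_variance
    )


# Hypothetical two-dimensional vectors for two toy clusters.
clusters = {
    "near_duplicates": [[1.0, 0.0], [1.0, 0.01], [0.99, 0.0]],
    "mixed_opinions": [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
}
print(form_letter_candidates(clusters))  # ['near_duplicates']
```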
For comparison purposes, here are simplified versions of the form letter visuals from parts one and two, side-by-side:
## A new get-out-the-comments player
The clear takeaway in examining the comments from round two is the way in which the campaigns we attribute to American Commitment completely changed the balance of opinions expressed. With their comments excluded, the corpus would have looked quite a bit like the first round:
* About 728,000 total comments (vs. about 800,000 in round one)
* 75% of comments would have been form letters (compared to about 60% from the first round)
* About 4% of comments would have opposed net neutrality, only a slight increase from the first comment round
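These figures can be cross-checked with a little arithmetic: if about 728,000 comments remain once a group accounting for 56.5% of the round is removed, the full round must have held about 728,000 / (1 - 0.565) comments. Both inputs are rounded, so the result is only approximate, but it lines up with the roughly 1.67 million documents discussed in the data-quality note.

```python
ac_share = 0.565      # American Commitment's share of round-two comments, per the findings above
remaining = 728_000   # approximate round-two total with American Commitment's comments excluded

implied_total = remaining / (1 - ac_share)
implied_ac = implied_total - remaining

print(f"implied round-two total: {implied_total:,.0f}")            # 1,673,563
print(f"implied American Commitment comments: {implied_ac:,.0f}")  # 945,563
```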
Perhaps just as striking as the scale of American Commitment’s efforts was the breadth; most form letter organizers drove large-scale submission of a single comment template, and while many allowed submitters to customize their comments, most submitters apparently chose not to do so. This resulted in one group of nearly identical submissions for most campaign organizers (this kind of behavior is also typical of our experience with form letters in other regulatory arenas). A few more sophisticated campaigners had more than one template, or allowed submitters to plug variant sentences into a single template, but this was generally the extent of the per-submitter variation.
American Commitment, by contrast, had at least 30 different comment variants, many offering wildly different rationales for their position and taking stances from across the political spectrum in their specifics. The submission counts and timing are almost identical across templates, which we believe most likely reflects random assignment of prospective submitters to different comment pools, perhaps to test which messaging drew more submitters, or possibly to evade the kind of automated form letter grouping we and others did in the first round. Here is the comment template:
Dear Mr. Wheeler,
As an American citizen, I wanted to voice my opposition to the FCC’s crippling new regulations that would put federal bureaucrats in charge of internet freedom, and urge you to stop these regulations before they’re enacted.
If the federal government goes through these plans to regulate the internet, I know that the internet will change — and not for the better.
[ INSERT VARIANT PARAGRAPH COMMENT HERE ]
Like many Americans, I believe that the internet should remain free of government control and unnecessary regulation — just as it has for the last twenty years of unprecedented growth.
Please stop the FCC’s dangerous new regulations, and protect the future of internet freedom here in America.
[APPLICANT HOME ADDRESS]
…and here’s a sampling of the variant paragraphs, along with their submission counts:

| Variant paragraph | Submissions |
|---|---|
| The Internet is not broken, and does not need to be fixed. Left-wing extremists have been crying wolf for the past decade about the harm to the Internet if the Federal government didn’t regulate it. Not only were they wrong, but the Internet has exploded with innovation. Do not regulate the Internet. The best way to keep it open and free is what has kept it open and free all along — no government intervention. | 150,654 |
| Americans have been getting faster and faster Internet speeds because of competition in the free economy, not because of anything the government has done. The Internet does not need the federal government’s “help” and neither the American people nor their elected representatives are asking for the federal government to place political controls over the Internet. The people calling for government control over the Internet are a tiny minority of far-left political activists, and the FCC knows it. Any effort by the FCC to regulate the Internet will be seen by the vast majority of the American people for what it is — another lawless Obama Administration power grab. | 32,281 |
| The Internet is the biggest economic, intellectual, and artistic success story of the century, and it rose up because of free people, not stifling government. The federal government needs to keep its hands off the Internet. It is not broken, and it does not need to be fixed. It is the federal government, not the Internet, that is broken, and in need of fixing. | 32,257 |
| Before our government can handcuff a citizen, it must have some reasonable evidence that they have done something wrong. Before the FCC places regulatory handcuffs on Internet providers, shouldn’t the government present evidence that they have actually done something wrong? If the police were to handcuff someone because they might, theoretically, maybe, kind of do something wrong someday, there would be justifiable outrage. Such is the case with the FCC’s attempt to place regulatory handcuffs on Internet providers — just in case they might do something wrong someday. The FCC’s rulemaking in the absence of any actual problem, any actual misbehavior on the part of Internet providers, or any consumer harm is beneath the dignity of an expert agency. | 32,412 |
| The ideological leader of the angry liberals calling for you to reduce the Internet to a public utility is Robert McChesney, the avowed Marxist founder of the socialist group Free Press. In an interview with SocialistProject.ca, McChesney said: “What we want to have in the U.S. and in every society is an Internet that is not private property, but a public utility…At the moment, the battle over network neutrality is not to completely eliminate the telephone and cable companies. We are not at that point yet. But the ultimate goal is to get rid of the media capitalists in the phone and cable companies and to divest them from control.” In a country of over 300 million people, even an extremist like McChesney can find, perhaps, millions of followers. But you should know better than to listen to them. | 32,198 |
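Because every letter in such a campaign shares the same opening and closing paragraphs, the inserted variant can be pulled out and tallied mechanically. A minimal sketch, assuming paragraphs are separated by blank lines and that any paragraph present in at least 90% of a batch is boilerplate (a variant sent to nearly everyone in the batch would be misread as boilerplate under this rule); the sample letters are hypothetical.

```python
from collections import Counter


def paragraphs(text):
    """Split a letter on blank lines, dropping empty pieces."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def count_variants(letters, boilerplate_share=0.9):
    """Count which inserted paragraph each templated letter carries.

    Paragraphs found in at least `boilerplate_share` of the letters are treated as
    shared boilerplate. The first paragraph left over is taken to be the variant;
    per-submitter text such as a home address comes after it in this template.
    """
    split = [paragraphs(letter) for letter in letters]
    doc_freq = Counter(p for paras in split for p in set(paras))
    cutoff = boilerplate_share * len(letters)
    counts = Counter()
    for paras in split:
        leftover = [p for p in paras if doc_freq[p] < cutoff]
        if leftover:
            counts[leftover[0]] += 1
    return counts


head = "Dear Mr. Wheeler,\n\nPlease stop these regulations."
tail = "Protect the future of internet freedom."
letters = [
    f"{head}\n\nThe Internet is not broken.\n\n{tail}\n\n1 Main St",
    f"{head}\n\nThe Internet is not broken.\n\n{tail}\n\n2 Oak Ave",
    f"{head}\n\nKeep the government out.\n\n{tail}\n\n3 Elm Rd",
]
print(count_variants(letters).most_common())
# [('The Internet is not broken.', 2), ('Keep the government out.', 1)]
```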
## Estimating sentiment percentages in non-form comments
Our overall estimate of the roughly 60/40 split between anti-NN and pro-NN comments was relatively easy to make, since we could confidently classify 88% of comments after reading the 50 form letters that served as prototypes for their clusters. Still, we were curious about the makeup of the non-form-letter comments. Not only do these remaining documents represent a significant chunk of the corpus, but they’re also potentially the most interesting: they reflect their authors’ personal interpretations and give a sense of how different advocacy messages are shaping public thinking about this complex issue.
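Once each low-variance cluster’s prototype letter has been read, extending its label to the whole cluster is just a size-weighted tally. A minimal sketch with made-up cluster names and sizes (the real run involved 50 prototypes); clusters with no prototype stay unlabeled.

```python
from collections import Counter


def propagate_labels(cluster_sizes, prototype_labels):
    """Give every comment in a cluster the label of its hand-read prototype letter.

    cluster_sizes:    cluster id -> number of comments in that cluster
    prototype_labels: cluster id -> label assigned after reading one representative letter
    Clusters with no prototype label (e.g. noisy, non-form clusters) count as "unlabeled".
    """
    totals = Counter()
    for cluster_id, size in cluster_sizes.items():
        totals[prototype_labels.get(cluster_id, "unlabeled")] += size
    return totals


# Hypothetical clusters and labels, for illustration only.
sizes = {"template_a": 32_000, "template_b": 32_000, "campaign_c": 20_000, "noisy": 16_000}
labels = {"template_a": "anti", "template_b": "anti", "campaign_c": "pro"}
totals = propagate_labels(sizes, labels)
n = sum(totals.values())
print({label: f"{count / n:.0%}" for label, count in totals.items()})
# {'anti': '64%', 'pro': '20%', 'unlabeled': '16%'}
```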
A brief aside: of the 12% of documents that were not form letters, 14,999 (about 1% of the corpus) looked like this:
To Chairman Tom Wheeler and the FCC Commissioners,
No Content Found — Please specify some content
This submission is obviously an error. Submissions like this appear as the lone gray circle in the form letter visual above. It appears that all of these submissions were just filled in with name and address information, and no actual content. We were able to locate what we think was the source of this phenomenon: Daily Kos specifically directed participants to write their own comment, rather than using a form letter, in [this campaign](http://campaigns.dailykos.com/p/dia/action3/common/public/index.sjs?action_KEY=1035). It appears that about 15,000 respondents didn’t read the instructions and submitted what were essentially blank documents.
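Screening these out is a simple text check. A minimal sketch, assuming the comment body is available separately from the name and address metadata and that the salutation and placeholder match the example above; the regexes are our own, not an FCC format specification.

```python
import re

SALUTATION = re.compile(r"To Chairman Tom Wheeler and the FCC Commissioners,", re.IGNORECASE)
# \W+ tolerates whatever dash or spacing separates the two halves of the placeholder.
PLACEHOLDER = re.compile(r"No Content Found\W+Please specify some content", re.IGNORECASE)


def is_blank_submission(text):
    """True if nothing but the salutation and the placeholder message is left."""
    body = PLACEHOLDER.sub("", SALUTATION.sub("", text))
    return re.search(r"\w", body) is None


blank = "To Chairman Tom Wheeler and the FCC Commissioners,\n\nNo Content Found \u2014 Please specify some content"
written = "To Chairman Tom Wheeler and the FCC Commissioners,\n\nPlease keep the Internet open."
print(is_blank_submission(blank), is_blank_submission(written))  # True False
```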
The remaining 11% of comments (184,120 documents) presented a problem. There were only two of us working on this project, and reading them all would have kept us busy for quite a while. Instead, we decided to manually read and classify a random sample of roughly 1,840 documents (about 1% of those 184,120) to build a training set for an automatic classifier, a typical text-mining approach to this kind of problem. We trained a similar text classifier in our earlier post to estimate the number of expert and non-expert comments among the non-form letters.
We selected a random 20% of the comments in each of the high-variance clusters, which were predominantly non-form-letter clusters, and from that pool drew a random 1,844 documents to classify by hand. Unfortunately, anti-NN examples were very rare (9 documents); the rest of the set was split between pro-NN (1,575 documents) and comments that were too vague or inscrutable to classify (260 documents). These data were too unbalanced to train an automatic classifier, so we instead treated the hand-labeled sample as a rough estimate of the makeup of the non-form comment pool: 85.4% pro-NN, 14.1% unclear, and 0.5% anti-NN.
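The two-stage sampling and the resulting shares can be sketched as follows. The cluster structure, the seed, and the function names are illustrative assumptions; only the 20% per-cluster fraction, the 1,844-document sample size, and the label counts come from the text.

```python
import random
from collections import Counter


def sample_for_labeling(clusters, per_cluster_frac=0.2, n_final=1844, seed=0):
    """Draw a random fraction of every cluster, then a random subset of that pool to read by hand."""
    rng = random.Random(seed)
    pool = []
    for docs in clusters.values():
        pool.extend(rng.sample(docs, round(len(docs) * per_cluster_frac)))
    return rng.sample(pool, min(n_final, len(pool)))


def label_shares(labels):
    """Share of each hand-assigned label in the sample."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# The hand-labeled counts reported above.
labels = ["pro"] * 1575 + ["unclear"] * 260 + ["anti"] * 9
for label, share in label_shares(labels).items():
    print(f"{label}: {share:.1%}")
# pro: 85.4%
# unclear: 14.1%
# anti: 0.5%
```

With only nine anti-NN documents in the sample, that last share is the least stable of the three estimates.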
This is hardly a scientific approach, but it’s not very surprising to find a preponderance of pro-NN sentiment in the non-form-letter comments. Free Press organized the submission of over 100,000 comments that included the applicant’s name and a short, unprompted message. Furthermore, as mentioned above, there is evidence that Daily Kos specifically asked its participants to write non-form-letter comments.
## Public dialogue or public rant?
Our experience analyzing these comments has given us a unique vantage point on the public’s relationship to regulatory bodies like the FCC, and the role that advocacy organizations play in mediating that relationship. The FCC’s Electronic Comments Filing System is not primarily designed to serve as a platform for debating regulators’ role in serving the public. Nonetheless, when the public was invited to comment upon rules that many believe would have serious consequences for the business community and consumers alike, it naturally gave rise to one of those elusive “national conversations” about a complex and contentious issue.
The term “conversation” might be a bit generous in this case, but if there was one, it’s easy to imagine that the original participant, the FCC itself, might consider it completely derailed. Very few of the comments address specific elements of Chairman Wheeler’s proposed rules. Instead, they focus on the general notion that network neutrality is something that should be either protected or eschewed, depending on a commenter’s personal or professional concerns. These concerns, however, are not always directly relevant to the issue at hand.
On the pro-NN side, arguments include network neutrality’s role in protecting our right to free speech and preventing Internet providers from charging consumers higher fees for faster service.
There can be no freedom if you favor one product over another. Net neutrality is important for protecting free speech, innovation, and healthy competition. Don’t let something this unjust happen to the world. (6018210841-8285)
Without net neutrality, people of the lower middle class wouldn’t be able to afford internet fees, so they’d be stunted on their growth as a race of technology. In this day and age, internet connection is so unbelievably vital to being with the goings on, whether it be email, internet news, articles, job applications, the internet play an enormous role in today’s society, that if most of America couldn’t afford, we’d be setting back our progress as a nation. Plus, you might get riots. (6018211177-9593)
Needless to say, private companies are under no obligation to uphold the First Amendment, and it’s already an ISP’s prerogative to charge its customers more or less according to the speed of their connections. These areas of discussion are at best secondary to the main issues that Wheeler’s proposed rules would tackle. The FCC has also shown no willingness (or, frankly, technological capacity) to fulfill the surveillance-culture nightmares mentioned in other pro-NN comments:
Protect A Free Net. We Don’t Need The FCC To Turn Into An NSA 2.0. (6018211039-5702)
Arguments from anti-NN commenters are at times similarly outside the scope of the FCC’s request. Some commenters seem to understand an “open Internet” to be an Internet without any security:
Open Internet sounds in theory like the right thing to do. Of course! But what about terrorists who creep into our every day lives, no matter how much we protect ourselves? Who’s going to protect the Open Internet?
Anti-NN comments are also sometimes fearful of invasions of privacy:
The internet is fine the way it is. please leave it alone so the common people can enjoy it. big brother NSA should concentrate on the true enemy of the country not all Americans! (6018305588)
On the other hand, the majority of anti-NN comments seem to take issue primarily with the fact that the FCC regulates anything at all:
I do not understand my government’s “need” to fix what isn’t broken. Please keep your hands and your laws off the Internet. I see the Internet as a place where the best and worst can exist side by side without hurting anyone. Please, considering it is possibly the last vestige of free speech in the world, allow it to create itself according to the needs of its varied users. Thank you. (02-047-005216)
As is often the case with complex issues in the public sphere, framing is everything. Both sides in this digital debate appeal to universally cherished values like freedom, personal choice, security, and economic prosperity. It’s easy to see how those foundational American ideals can be used to generate the submission of millions of passionate responses. What’s less clear is whether or not these concerns, often tangential to the issue at hand, are likely to aid the FCC (which, as we’ve pointed out before, is under no obligation to read all comments) in making its final ruling.
## A note about data quality
As with the first round of filings, the number of comments we’re including here is short of what the FCC says it released. The bulk download on which this archive was based contains, according to the FCC, about 2.5 million comments, but as best as we can determine, there simply aren’t that many comments in the archive. It’s difficult for us to be sure, however, because the format in which the comments were released was extremely challenging to parse.
The first seven files in the zip archive contained about 725,000 comments, which aligns with the number of submissions the FCC announcement said were posted to the agency’s Electronic Comment Filing System. But the FCC also said it was including email comments that didn’t make it into its main system, and we surmise that the remaining files in the zip archive contained these comments. This chunk of comments, however, was concatenated together and then arbitrarily split into output files, with no delimiters between consecutive comments or between metadata fields, which made it almost impossible to separate one comment from another.
As was true in round one, we fail to see how the FCC arrived at the count that was widely publicized. Clearly, 1.67 million documents is far short of 2.5 million (the number reported in the commission’s [blog post](http://www.fcc.gov/blog/fcc-releases-open-internet-reply-comments-public)). We spent enough time with these files to be reasonably sure that the FCC’s comment counts are incorrect and that our analysis is representative of what’s there, but the fact that we can’t know for sure is problematic. While we laud the FCC for its good intentions in releasing this data in bulk, we expect better-quality releases from federal agencies. The technical difficulties that have hampered the FCC’s collection of public feedback in this rulemaking are, at this point, well documented, and it’s clearer than ever that the agency needs to make a serious investment in technical infrastructure if it wants the community to engage seriously with its data. Thankfully, FCC technical staff seem to be aware of these problems, because this kind of release just isn’t good enough.
## Comment data
As with the first round, we’re pleased to make available a cleaned-up version of this round’s bulk comments. We’ve split the comments from the FCC dump into individual JSON files (one per comment), including both the ECFS comments and the mangled email messages, and we also parsed and split an aggregate submission from Free Press representing several thousand comments that appeared as one unintelligible comment in the FCC data.
As a general rule in processing and counting documents, we treated each document submitted to ECFS or received by email as one submission. In certain cases where it was clear that a single submission contained large numbers of distinct comments aggregated together, we made a best effort where feasible to separate those comments into individual records in our data. Petitions, or other circumstances where a single comment was paired with a list of names, were treated as single comments.
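The one-file-per-comment layout is simple to reproduce for anyone building a similar archive. This is a hypothetical sketch rather than the code behind the release; the record fields ("id", "text") are assumptions, not the actual schema of the published files.

```python
import json
from pathlib import Path


def write_comment_files(comments, out_dir):
    """Write each comment record to its own <id>.json file and return how many were written.

    Each record is assumed to be a dict with at least an "id" key that is safe to use
    as a filename (e.g. "6018210841-8285").
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for record in comments:
        path = out / f"{record['id']}.json"
        path.write_text(json.dumps(record, ensure_ascii=False, indent=2), encoding="utf-8")
    return len(comments)
```

Keeping `ensure_ascii=False` preserves non-ASCII characters in comment text as written rather than escaping them.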
We still haven’t been able to pursue many of the avenues for further investigation we identified during the first round, and we heartily encourage researchers interested in this data to download the scrubbed versions and take them up.
We’d again like to thank Radim Řehůřek, maintainer of the [gensim](http://radimrehurek.com/gensim/) library, which was crucial to our text analysis.