Clearing up the confusion about our analysis of net neutrality comments to the FCC


The debate over net neutrality is anything but neutral. That’s abundantly clear from the intense response and questions unleashed by Sunlight’s latest analysis of public comments on the Federal Communications Commission’s proposal to regulate Internet traffic.

Our commitment to transparency, to open data and to the informed use of that data prompted us to write this follow-up addressing some of the concerns and reactions:

  1. A number of groups on the pro-net neutrality side of the debate are telling us that they submitted far more comments than we found in the download from the FCC. As we pointed out in our initial post, there’s a big discrepancy between the number of comments the FCC says it received and the number we were able to find in the files the agency released to the public.
  2. The conservative group that appears to have generated the vast majority of comments in the second set we analyzed said we confirmed it “won” the comment period. In fact, as we were careful to point out in both our first and second posts, these numbers cannot be read like a baseball score. That’s partly because of noise in the data (more on that below) and partly because of the way those numbers were generated, both factors we took pains to elucidate.

Also important to underscore: Sunlight, a nonpartisan nonprofit that advocates for more transparency, undertook this analysis in service of that mission. Our aim is to lift the veil on how government decision makers are influenced, who is doing the influencing and what, if any, special interests or agendas might be behind those efforts.

Now, to the data.

As of now, it looks like there’s a major discrepancy between the number of comments the FCC reported receiving and the number we actually found in the files it released to the public. This is something we pointed out in our earlier posts, but since it has become an issue, let’s be crystal clear: At this point, there’s a difference of 1,124,656 between what the FCC is reporting and what we counted in the files the agency provided.

Moreover, groups such as Battle For the Net (Free Press, Fight for the Future and Demand Progress) and ColorOfChange.org insist they sent the FCC far more comments than we were able to find in the data released.

Fight for the Future, in particular, disagreed with the counts in our analysis, claiming that one of its form letter campaigns produced 367,000 comments in the FCC’s dump. Upon further examination, we believe Fight for the Future didn’t actually count the number of distinct documents from its campaign that occurred in the dump, but rather did only a rudimentary full-text search for key phrases from the campaign to see how many times those phrases occurred. This failed to account, however, for the fact that some comments are duplicated (that is to say, occur more than once with identical text, submitter and unique ID number) within the data, likely because of sloppy exporting processes on the part of the FCC. Indeed, after closer inspection to confirm our numbers, we found 96,263 comments that are included more than once in different parts of the export file, which we had correctly excluded from our analysis during our initial data-cleaning process. We thus stand by our numbers as reported, and continue to maintain that they accurately characterize the data as it appears in the dump produced by the FCC.
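
To make the distinction concrete, here is a minimal sketch, in Python, of the two counting approaches. The field names (`id`, `text`) and the file name are hypothetical assumptions for illustration; the FCC’s actual export schema varies from file to file:

```python
import csv

def naive_phrase_count(rows, phrase):
    """Count every record whose text contains a campaign's key phrase.
    This overcounts whenever the export duplicates records."""
    return sum(1 for row in rows if phrase in (row.get("text") or ""))

def deduplicated_count(rows, phrase):
    """Count distinct comments from the campaign, keyed on the unique
    ID, so a record repeated across export files is counted once."""
    seen = set()
    for row in rows:
        if phrase in (row.get("text") or "") and row.get("id") not in seen:
            seen.add(row.get("id"))
    return len(seen)

# Hypothetical file and field names; the real bulk files are far messier.
with open("fcc_export.csv", newline="", encoding="utf-8", errors="replace") as f:
    rows = list(csv.DictReader(f))

phrase = "a key phrase from the form-letter campaign"
print(naive_phrase_count(rows, phrase))    # inflated by duplicate records
print(deduplicated_count(rows, phrase))    # one count per unique comment
```

Something like the second approach, which collapses repeated records, is what separates a raw phrase-match total from a count of distinct comments.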

Whether the FCC’s dump actually includes all documents they received is another story, but figuring out whether there are missing comments and why is a project for the FCC, not Sunlight. Our experience working with these records, however, suggests several possible explanations:

  1. Counting comments versus counting signatures: Some submissions are single PDF documents containing numerous comments and signatures. They list a number near the beginning that the FCC may be using to perform its counts. That number is the total combined count of (a) comments with signatures and (b) a separate list consisting only of signatures. For our purposes, we were interested in counting comments and did not take signature-only submissions into account. This isn’t to suggest that signature-only submissions shouldn’t be counted, but the focus of our report meant that we discarded them. In many cases, the difference between comment-plus-signature counts and comment-only counts is extreme. In this document from Free Press, for instance, the reported count is 60,269, whereas we identified 14,848 comments. Presumably, the difference (45,421) is attributable to the list of signatures that begins on page 1,240 and continues to the end of the document. Once again, we’re not impugning the count provided by Free Press, but for our project, we explicitly chose to count only comment submissions.
  2. Faulty import processes for CSV submissions: All of the reported cases of under-representation involve campaigns that opted for CSV submission. It seems possible that some of these submissions were never fully imported into the FCC’s system, especially given the agency’s well-documented technical challenges.
  3. Faulty export processes for CSV submissions: Even if the submission process went smoothly, another point of possible failure would be the export process that generated the downloadable bulk files. As we’ve already reported, these files showed inconsistent formats, terrible encoding mistakes and often lacked any record or field delimiters. It’s reasonable to suspect that sufficient care was not taken in producing the bulk downloads. (A sketch of the kind of sanity checks one might run on such files follows this list.)
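
As a rough illustration of the second and third scenarios, here is a sketch of the kind of sanity checks one might run on a bulk export before counting anything. The delimiter, the expected field count and the position of the ID column are placeholders for illustration, not the FCC’s actual specification:

```python
from collections import Counter

def audit_export(path, expected_fields=5, delimiter=","):
    """Flag symptoms of a sloppy bulk export: bytes that fail to
    decode, records with the wrong number of fields (a sign of
    missing or stray delimiters) and repeated unique IDs."""
    bad_encoding = 0
    bad_fields = 0
    ids = Counter()
    with open(path, "rb") as f:
        for raw in f:
            try:
                line = raw.decode("utf-8")
            except UnicodeDecodeError:
                bad_encoding += 1       # encoding mistakes in the export
                continue
            parts = line.rstrip("\r\n").split(delimiter)
            if len(parts) != expected_fields:
                bad_fields += 1         # likely a missing or extra delimiter
                continue
            ids[parts[0]] += 1          # assumes column 0 holds the unique ID
    duplicates = sum(n - 1 for n in ids.values() if n > 1)
    return bad_encoding, bad_fields, duplicates

# Example usage against a hypothetical export file:
bad_enc, bad_fld, dupes = audit_export("fcc_bulk_export.csv")
print(f"undecodable lines: {bad_enc}, malformed records: {bad_fld}, "
      f"duplicated IDs: {dupes}")
```

Totals like these can’t explain where missing comments went, but they help show how much of a discrepancy is attributable to the export files themselves rather than to the original submissions.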

As we pointed out at the beginning of this post, we relied upon FCC data. It’s worth repeating: There’s a big discrepancy between the number of comments the FCC says it received and the number we were able to find in the files released to the public. Moreover, it’s worth reiterating that our purpose was never to declare the “winners” or “losers” of the public comment period on net neutrality. Others may have used our analysis to make such a call, but we never did.

If nothing else, this exercise points out both the value and the shortcomings of government data and suggests that for the sake of decision makers, and the taxpayers who pay them, it might be worth investing in a more modern, reliable way of compiling and reporting this information.