Reasons to Not Release Data, Part 8: Privacy


Earlier this month, we shared a crowdsourced collection of the top concerns data advocates have heard when they’ve raised an open data project with government officials at the federal, state, and local levels, and we asked you to share how you’ve responded. Dozens of you contributed to the project, sharing your thoughts on social media, our public Google doc, and even on the Open Data Stack Exchange, where eight threads were opened to dive deeper into specific subjects.


Drawing from your input, our own experience, and existing materials from our peers at the National Neighborhood Indicators Partnership and some data warriors from the UK, we’ve compiled a number of answers — discussion points, if you will — to help unpack and respond to some of the most commonly cited open data concerns. This mash-up of expertise is a work in progress, but we bet you’ll find it a useful conversation starter (or continuer) for your own data advocacy efforts.

Click here to see other posts in this series.

Over the next few weeks, we’ll be sharing challenges and responses from our #WhyOpenData list that correspond to different themes. Today’s theme is Privacy.


34. We can’t release data because of privacy concerns.
A. It’s classified or confidential / We can’t provide that dataset because one part is classified
  • “Is it actually confidential? Is it already published in some disaggregated way?”

  • “Why are certain parts classified? Has this information been balance tested to see whether it is in the public interest to release it?”

  • “Is it possible to redact just the parts that cannot be released? Can the confidential parts be excluded, leaving something that’s still useful? There could be valuable information in the parts of the dataset that are not subject to privacy or security concerns.”

  • The U.S. Freedom of Information Act (FOIA) includes a segregability provision: any reasonably segregable portion of a record must be provided after the exempt portions are deleted or redacted. A confidential or classified part of a record does not justify withholding the entire record.
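
For the technically inclined, here is a minimal sketch of what "redact just the parts that cannot be released" can look like for a tabular dataset. The column names (`name`, `ssn`, `permit_type`, `issue_year`) and the sample rows are hypothetical, made up purely for illustration; a real release would need the agency's own review of which fields are actually exempt.

```python
import csv
import io

# Hypothetical column names, chosen only for illustration.
CONFIDENTIAL_FIELDS = {"name", "ssn"}


def redact_rows(rows, confidential=CONFIDENTIAL_FIELDS, marker="[REDACTED]"):
    """Return copies of the rows with confidential fields masked."""
    return [
        {key: (marker if key in confidential else value) for key, value in row.items()}
        for row in rows
    ]


# Made-up sample data standing in for a real agency export.
SAMPLE_CSV = """name,ssn,permit_type,issue_year
Jane Doe,123-45-6789,building,2013
John Roe,987-65-4321,electrical,2012
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
released = redact_rows(rows)

for row in released:
    print(row)
```

The non-confidential columns pass through untouched, so the released file is still useful even though part of each record is withheld.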

B. We’re worried about the mosaic effect.
  • “What information specifically leads you to this concern? What do you think are steps that could be taken to address this? Remember, these should be balance tested so that the public interest in this data can be examined. By choosing data ranges or redacting information for data fields of concern, we might be able to address this issue.”
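
As a rough illustration of "choosing data ranges or redacting information for data fields of concern," the sketch below replaces exact ages with ten-year ranges, drops a street address, and then flags any combination of remaining fields shared by very few records. The field names, the sample records, and the threshold of two are all assumptions made for this example. A check like this is a conversation starter, not a guarantee against re-identification.

```python
from collections import Counter


def age_range(age, width=10):
    """Map an exact age to a range label such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def generalize(records, width=10):
    """Replace exact age with a range and leave out the street address."""
    return [
        {
            "age_range": age_range(r["age"], width),
            "zip": r["zip"],
            "service": r["service"],
        }
        for r in records
    ]


def small_groups(records, keys=("age_range", "zip"), minimum=2):
    """Return combinations of `keys` shared by fewer than `minimum` records."""
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    return {combo: n for combo, n in counts.items() if n < minimum}


# Made-up records; field names are hypothetical.
records = [
    {"age": 34, "zip": "20001", "address": "1 A St", "service": "clinic"},
    {"age": 37, "zip": "20001", "address": "2 B St", "service": "clinic"},
    {"age": 62, "zip": "20002", "address": "3 C St", "service": "shelter"},
]

released = generalize(records)
risky = small_groups(released)
print(risky)
```

Any combination that shows up in `risky` is a candidate for wider ranges or further redaction before release.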

C. We don’t have a communication strategy for explaining why this is good to release even though it looks private / People might feel like we’re releasing private information / It might look like we’re being careless with people’s information
  • “When you release the data, include an explainer of why it is being released. Acknowledge potential perceptions that the data is private, and walk people through the balancing test that was made in the decision to release the information. Explain why it’s in the public interest for the data to be released.”

  • Even though the risk of public frustration might feel bigger, there’s a lot of positive attention waiting in the wings for government agencies willing to open information that was once entirely closed because of privacy concerns. Communication strategies that use community events, like hackathons and meetups, to explore the new dataset and talk about its protections can help build public comfort and familiarity with the newly available data, while also acknowledging all the work that goes into making the data suitable for release. For example, the U.S. Department of Health and Human Services participates in an annual Health Datapalooza, which includes a conference and hackathon centered on the theme of liberating public health data.


Stay tuned for tomorrow’s #WhyOpenData post on “Already” Public Data.