When can we legally share protected data?

(Photo credit: Thomas Leth-Olsen/Flickr)

As Sunlight explores the public use of individual-level data, with a special focus on how this works in the criminal justice system, we are also exploring the practical, legal and ethical challenges of working with it. One major reason individual-level data is often not shared publicly is to protect individual privacy. In thinking about this tension between privacy rights and open government data, we have so far focused on the legal and technical context underlying the open sharing of individual-level “microdata.”

At the same time as we investigate open data sharing (meaning data that anyone can download and analyze anytime, anywhere), we recognize that it can also be useful to consider how the public can benefit from data shared privately, not openly. Though average citizens cannot access a number of datasets maintained by government because of privacy concerns, they nonetheless benefit from the work of researchers, both inside and outside of government, who have been able to share that data with each other. For instance, many states have created integrated justice information systems to allow law enforcement, courts and corrections to coordinate more effectively. States like Washington and Oregon have also used interagency data-sharing programs to evaluate the effectiveness of programs such as re-entry programs for prisoners and rehabilitation for minors with mental health concerns. While these datasets cannot be legally shared in the open, they can be shared privately in highly valuable ways.

Because of privacy law, some datasets are not only not “open” but are in fact very “closed,” made available only through highly restrictive procedures. Under typical privacy law, data users must meet high standards and establish relationships with the data providers to create credibility and trust. While one of the strengths of the open data movement is the potential for unexpected benefits that come from the free and unrestricted release of data, the open release of individual-level data can come with certain risks. Fears of harm stemming from the release of identified data have led to many legal restrictions on publishing it openly.

In the arena of criminal justice, there is an unusual degree of variation in whether personally identifiable information (PII) is protected. On the one hand, some criminal justice data is entirely unprotected: think of mugshots on page one of the morning paper or details from public courtroom testimony. On the other hand, the field has a great many datasets that are protected with the same care and confidentiality granted to medical records. While the criminal justice system disseminates sensitive, PII-containing data every day, it restricts access to much of it. Knowing this, we sought to understand what is currently happening with this restricted information. How are the agencies that hold this valuable, individual-level data currently getting value from it?

To reduce the risk that the privacy rights of citizens will be compromised, government agencies most often release data to a known group of users who’ve demonstrated that they will honor the terms under which data is disclosed. Government agencies release most of their sensitive data to analysts and researchers working in other government agencies, or to researchers in external research institutions who have passed rigorous application processes.

Separate clauses within data privacy laws create space for these two types of data users to use information that contains PII. First, internal governmental research use that occurs across agency boundaries is often justified in terms of improving the effectiveness of public programs. This cross-agency collaboration happens extensively in the context of juvenile justice, which remains an important focus for many data integration efforts. For example, Arizona is known for pioneering work in juvenile justice data integration that has allowed it to analyze trends and effectiveness. Oregon has also been able to use its data integration to analyze the interaction of mental health and juvenile detention through the lens of its juvenile population’s access to social services. Data integration thereby provides useful insights into the interplay of mental health, available services, and justice processes and outcomes. Early data integration success has paved the way for calls to integrate more data across agencies. For example, the frequent intersections between health services and criminal justice agencies (from the need to provide mental and physical health care to correctional inmates, to the law enforcement officer’s need to know whether there’s a medical reason for a person’s erratic behavior) are recognized as a space where there is high potential value in better integrating individual-level criminal justice and health data.

Meanwhile, external institutional researchers who satisfy strict application requirements can also gain access to restricted data under provisions that permit sharing data for study or research use. In both cases, the data users have won access to the datasets not by increasing the availability of the data, but by satisfying existing requirements for legal data-sharing. This work can also produce important dividends for government services. For example, where state and local governments would like to use social impact bonds or other “pay for success” mechanisms to fund program improvement, data integration is essential: Researchers must be able to link data in order to demonstrate that a program has had a measurable effect.

In upcoming posts, we will explore the regulatory infrastructure that leads institutional researchers to be viewed as trustworthy data-sharing partners for sensitive microdata, the legal means by which internal governmental data-sharing is enabled, and some of the challenges that governments face as they work to transition to a data-sharing model.