Sharing sensitive data within government

by Emily Shaw


Feb 11, 2015 6:12 pm

Graphic showing globally connected online devices — (Image credit: Free Press/Flickr)

While institutional research projects depend on review by Institutional Review Boards (IRBs), data-sharing that occurs within governments is authorized in a different way. Governments have regular formal and informal practices that aid in the sharing of sensitive information, including writing Memorandums of Understanding (MoUs), passing enabling legislation, or developing special limited agreements. In addition, the relationships governments develop over time with trusted external partners let researchers work jointly with government, safely creating significant public value from individual-level data.

Data-sharing mechanisms

Data-sharing agreements are frequently necessary because privacy laws restrict sharing within government as well as between government and outside actors, generally limiting the sharing of personally identifiable information (PII) to the agency that collected it. However, there are regular exceptions to this rule, such as for the purpose of evaluating publicly funded services. With a specific exception in mind, agencies can sign MoUs with other agencies to describe the specific, legally compliant rationale for sharing data, to ensure that the privacy and confidentiality of the data will be maintained, and to describe the roles and responsibilities of the agencies in the data-sharing process.

MoU-assisted interagency data-sharing varies by policy sector and location. Juvenile justice is a popular area for data-sharing, but a Juvenile Justice GPS survey found that only 27 states coordinate data-sharing across their child welfare and juvenile justice systems; the majority of those states (19) use MoUs as the formal mechanism of collaboration. Health and corrections systems are another space for increased cooperation through MoUs, with Texas and Connecticut being good examples of states that have signed data-sharing MoUs for the purpose of better supporting inmates’ health. However, there is substantial room for increasing data-sharing between health and corrections agencies, and the Bureau of Justice Assistance provides helpful resources for government agencies that have yet to negotiate these agreements.

Where state and local agencies do not (or cannot) implement MoUs, state and federal privacy laws can be modified by state and local legislation. If a state agency is interested in having access to data from another agency, for example, and that sharing is potentially prohibited by a data privacy law, the agencies can make their case to the state legislature and get a bill passed to make data-sharing possible. (Minnesota, for example, recently enabled limited interagency county-level data-sharing through legislation.)

Legislation also represented an important early approach to achieving interagency data-sharing before these arrangements were more commonplace. Juvenile justice has been a leading area in interagency information sharing, with legally sanctioned data-sharing occurring between courts, detention, probation and social service agencies. Statewide juvenile justice information sharing initiatives emerged in the early 1990s, when 35 states passed laws to permit data-sharing or to seek data-based evidence in order to improve outcomes connected to juvenile justice. (The federal Office of Juvenile Justice and Delinquency Prevention has for some time offered guidelines for the development of juvenile justice information sharing programs.) State Justice Information Sharing initiatives, which are efforts to link the databases of state law enforcement, prosecutors, courts and corrections agencies, have also been under development since the 1990s. Like the juvenile justice initiatives, these interagency arrangements were mainly created through state law.

Where interagency data-sharing relationships are established, governments can produce highly useful, policy-relevant analyses with the data they hold. For example, the Washington State Department of Corrections uses integrated criminal justice data, workforce data and some health and social service data to perform research (as well as to make data available for external research). As a result, the department is able to investigate such questions as the relationship between education programs and employment outcomes for its inmates.

Cooperation between governments and research institutions

By putting together the mechanisms governing government data-sharing and researcher data-management practices, governments and external research organizations are able to find legal ways to work together. Research institutions, vetted by their IRBs and sometimes working as a research consortium, are eligible to respond to formal governmental RFPs connected to data analysis or to propose their own projects using sensitive government data. Governments can then sign data-sharing MoUs with research institutions that let them legally share sensitive data and allow them to get more effective and efficient use out of their data sets, since these external research partners are more likely to have the necessary expertise to get the most from governmental administrative data. Meanwhile, although research consortia are potentially attractive partners for governmental data-holders, they can also create complications legally, since MoUs and data-sharing agreements can be challenging to write as multi-party documents.

While not a formal element, mutual trust seems to play a critical role in the development of productive public-private data-sharing partnerships. MoUs are written with specific language both to limit the potential for releasing PII and to ensure that government won’t be faced with additional liabilities for sharing data with an external group: for example, agencies can require that organizations submit written drafts to the government for pre-approval. By building relationships and trust over time, public-private collaborations can scale up over a series of projects as external groups demonstrate their utility and reliability with the data.

As governments seek to get the most benefit from the data they hold, they often need external partners to help them merge data sets across departmental lines. Common data-integration projects are “longitudinal data systems” (which generally refers to an education-focused data integration project) or “integrated data systems” (which refers to other cross-agency data integrations). Because this work occurs between, and not within, individual agencies, and because of the specific expertise required to perform data merging, many governments are choosing to create data-sharing relationships with external research institutions in order to achieve the best outcome. Public-private partnerships for data integration aim to use the most current techniques for matching records across data sets, cleaning data and assuring its quality, ensuring the continued protection of PII within merged data, and using the appropriate statistical methods to gain policy-relevant insights from the data.
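To make "matching records across data sets" concrete, here is a minimal sketch of exact-key matching on a normalized name and date of birth. It is illustrative only: the field names (`name`, `dob`, `release_year`, `employed`), the helper functions and the sample records are all hypothetical, and real integration projects generally need more robust matching than an exact key can provide.

```python
def match_key(record):
    """Build a normalized key from name and date of birth.

    Lowercasing and dropping non-letters lets "Ann O'Neil" and
    "ann oneil" compare as equal. Field names are hypothetical.
    """
    name = "".join(ch for ch in record["name"].lower() if ch.isalpha())
    return (name, record["dob"])


def link_records(left, right):
    """Pair records from two data sets that share the same match key."""
    index = {}
    for rec in right:
        index.setdefault(match_key(rec), []).append(rec)
    return [(l, r) for l in left for r in index.get(match_key(l), [])]


# Made-up records for illustration; not real data.
corrections = [{"name": "Ann O'Neil", "dob": "1990-04-02", "release_year": 2014}]
workforce = [
    {"name": "ann oneil", "dob": "1990-04-02", "employed": True},
    {"name": "Bob Ray", "dob": "1985-01-15", "employed": False},
]

pairs = link_records(corrections, workforce)
```

Note that the match key itself is built from PII, so in practice a step like this would run only inside the environment authorized by the data-sharing agreement.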

California’s partnership with research organizations around county correctional data provides a clear example of how external collaboration can substantially improve public data collection and use. As a result of California’s passage of realignment legislation in 2011, the state diverted tens of thousands of inmates from the state to county correctional systems. A major focus of the legislation, in addition to reducing the overcrowding in California prisons, was to identify ways to reduce recidivism. However, in order both to track how the policy change has affected the total state correctional population and to identify any potential interventions that improve recidivism rates, the counties need to implement new methods of data collection and storage. The Public Policy Institute of California has partnered with a number of California counties to help them standardize their correctional data collection and storage, thereby enabling the state to achieve its stated goals.

While the data-sharing that enables these collaborations is not open to the public, it is nonetheless very important to be aware of it because of the benefits that these integrations can achieve for creating policy-relevant information. In addition, this trend suggests that a research institution can serve as a kind of data intermediary which can eventually provide open access to data that’s transformed or aggregated in a way that both retains useful qualities and protects individual anonymity. For example, the Ohio Education Research Center supports the creation of Ohio’s state longitudinal data system and also stores the data as the state longitudinal data archive, a data source to which researchers can apply for access. Because the center has expertise in managing data confidentiality, it is trusted to ensure that the data it releases to researchers is appropriately secured. It does this in two ways: First, it requires researchers to apply for access to the data, including requiring a legal data-sharing agreement and review by the researcher’s IRB. Second, the data the center releases is deidentified and stripped of any variables that could allow individual reidentification.
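As a rough illustration of the second step, stripping identifying variables before release, the sketch below drops a configured set of fields from each record. It is not the Ohio Education Research Center's actual procedure: the field names and the `deidentify` helper are hypothetical, and removing direct identifiers alone does not guarantee that the remaining variables cannot be combined to reidentify someone.

```python
# Hypothetical list of fields treated as identifying; a real list would
# come from the data-sharing agreement and a disclosure review.
DIRECT_IDENTIFIERS = frozenset({"name", "ssn", "date_of_birth", "address"})


def deidentify(rows, drop_fields=DIRECT_IDENTIFIERS):
    """Return copies of the rows with the identifying fields removed."""
    return [
        {field: value for field, value in row.items() if field not in drop_fields}
        for row in rows
    ]


# Made-up record for illustration; not real data.
rows = [
    {"name": "Jane Doe", "ssn": "000-00-0000", "grade_level": 9, "test_score": 82},
]

released = deidentify(rows)
```

The function returns new dictionaries rather than editing the input, so the identified source records stay intact for the data holder.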

The National Opinion Research Center (NORC) at the University of Chicago also serves an intermediary function for researchers, although much more broadly and ambitiously. Recognizing both the value of increasing researcher access to individual-level data and the legal concerns about individual identification, the institute has created a “data enclave” that allows researchers who have applied and been accepted for access to use individual-level data from NORC’s extensive collection of government-sponsored survey and administrative data. Researchers access the data through a virtual terminal that keeps all of the data physically located on NORC’s servers. In this way, without having direct access to the files, researchers are nonetheless able to query the data and receive statistical results, after results have been checked to ensure that releasing them does not violate individual confidentiality.

Research organizations can also serve as intermediaries not just between researchers and identified government data, but also between identified government data and the general policy audience. For example, the Providence Plan, which has signed data-sharing agreements with Rhode Island agencies in order to support their data integration efforts, also performs analyses that it then turns into “data stories.” Data stories, such as an exploration of factors predicting youth involvement in the juvenile justice system, use locally gathered data to answer policy questions in an interactive and easily shared format. With external sharing of data, even in an aggregated and transformed state, organizations need to be especially cautious not to inadvertently release any data that could lead to individuals’ identification, so organizations like the Providence Plan have a series of controls on the data they publish, including conservative cell-size restrictions and qualitative checks that specifically look to eradicate any potential for reidentification.
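A cell-size restriction can be sketched as follows: counts below a chosen threshold are withheld instead of published. This is only an illustration of the general idea, not the Providence Plan's actual rules; the threshold of 10, the `neighborhood` field and the `suppressed_counts` helper are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical threshold; real limits are set by each organization's policy.
MIN_CELL_SIZE = 10


def suppressed_counts(records, key, min_cell_size=MIN_CELL_SIZE):
    """Count records per category, replacing small counts with None."""
    counts = Counter(record[key] for record in records)
    return {
        category: (n if n >= min_cell_size else None)
        for category, n in counts.items()
    }


# Made-up records for illustration; not real data.
example = [{"neighborhood": "A"}] * 25 + [{"neighborhood": "B"}] * 3

table = suppressed_counts(example, "neighborhood")
```

Here the count for neighborhood "B" is withheld because only three records fall in that cell. A threshold check like this does not catch every risk (for instance, a suppressed cell can sometimes be recovered from published totals), which is one reason it is paired with qualitative checks.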

While there are certainly many successes across the restricted data-sharing movement, it’s important to note that there are also a number of challenges that must be addressed. In an upcoming post, we will explore how one state became entangled in a common set of issues that can temporarily derail even a popular data integration project.