Next month we’re going to be at the Center for Data Innovation talking with an excellent group about the social impacts of open data. The event will provide a forum for open data advocates to answer a question we hear quite frequently, especially from people outside the developer community: Open data sounds nice, but what do we do with it? What kinds of effects does open data have on the world?
There are many ways to respond to these questions. At Sunlight, we’ve developed a brief list of some of our favorite examples that demonstrate open data helping people and institutions achieve a wide variety of benefits. However, one of the things I’m most looking forward to at next month’s event is the opportunity to have a conversation about impact beyond the individual example: to discuss the systemic impact of open data, and the specific pathways by which open data can produce the kind of broad impacts we want to see.
As preparation for our chat, I wanted to start listing some broad sectors where free, widely available, interoperable data, contributed and reused by multiple parties, is spurring social change. I’m beginning with the example that I think is the biggest.
When thinking about the impact of open data to spur systemic change, it is impossible to overlook the case of health data.
For my purposes, health data is interesting because of the way its use illustrates principles for the effective use of open data. More broadly, though, it is impossible to overlook health data because the sector is enormous by any measure. Consider the volume of data it generates: one recent project funded by the NIH, for example, was predicted to create yottabytes of data. (A yottabyte is one quadrillion gigabytes. As of 2010, the combined storage space of all of the hard drives in the world did not yet amount to a yottabyte.) The amount of money involved in the sector is also huge, with $6.5 trillion spent globally on healthcare in 2012. This measure of the sector’s significance is particularly relevant for the US, which spent $2.8 trillion on healthcare in 2012 all by itself. Finally, it is a sector that nearly everyone interacts with personally: over 80 percent of US adults and over 90 percent of US children visited a medical professional within the last year.
The size and significance of the healthcare sector point to the first lesson generated by the health data case: Consider how data can be used to solve real-world problems faced by large numbers of people.
The macroeconomic importance of the sector is one piece of healthcare’s significance, but healthcare matters to most of us because of the significant experiences nearly all of us have had with it individually. The problems that data can address are all the more serious because so many people experience them firsthand. Americans experience unacceptably high levels of adverse events in healthcare, including a far larger number of “never events” than the label would suggest. With an increasing number of Americans depending on high-deductible health care plans, the public has been collectively dumbfounded to learn that hospitals can charge apparently anything they want for any service, with zero price transparency, and then bill you after the fact. Data-based solutions are welcome precisely because there is widespread agreement that these are large social problems in need of solutions.
A second observation generated by the health data case is that only governments have the power to mandate open data on such a broad scale. Explicit governmental policies to collect data and make it broadly available created the necessary conditions for the current sectoral shift to public and inter-institutional data-sharing. In many cases, the data has been there; it just hasn’t been made available. Private health insurers and health providers have been collecting data for many decades. Similarly, we have decades of evidence that data-sharing among providers can help improve health care itself. However, the problems inherent in coordinating a widely fragmented and internally competitive field, not to mention the challenges of navigating privacy rules, made it impossible for individual actors to bring about this level of change.
The pace and timing of change in access to health data reveal how governmental policy-making has provided the fundamental impetus for systemic change. In the last few decades, state and federal governments have enacted a large number of data-relevant regulations, both in the form of broad reform efforts and narrower new disclosure requirements. In some areas, such as innovation for equitable access to health care, states have taken the lead. In others, such as the achievement of health care cost transparency for cost containment, leadership from the federal government has been critical.
Government incentives have also been necessary to create the conditions for improved data collection. One major impediment to data sharing has been the fragmented nature of data collection. In 2009, the portion of the American Recovery and Reinvestment Act known as the HITECH Act greatly accelerated medical service providers’ adoption of electronic health records (EHRs). Collecting data in digital form at the source is a major objective for many fields that want to improve data quality and timeliness, and in advancing the use of EHRs the federal government achieved a substantial win for data availability. Not only have EHRs made wide-scale, near real-time data sharing possible, but they also help directly improve patient care by integrating with digital Clinical Decision Support, through which providers can receive timely reminders about potential drug interactions or new information about available treatments.
The 2010 passage of the Patient Protection and Affordable Care Act represented another milestone event for health care data collection. The major health care reform act includes a number of new mandates for data collection and reporting for the purposes of improving quality and access and reducing the cost of health care delivery. Searching the act’s language for new mandates to “report” (or for mentions of the word “internet”) demonstrates the range of the newly mandated sources of data.
In addition to the seriousness of the problem and the benefits of government leadership, a third important insight embedded in the open health data case is that decentralized work can be aligned through broad agreement on large goals. While governments can (and do) use open data themselves, a critical part of the utility of open data is the fact that non-governmental actors are also accessing and using it. The health data case reveals that a large group of individual, disconnected actors can contribute to a joint public project most meaningfully where there is a common understanding of the point of all of the work.
In healthcare, the set of goals shared widely throughout the field is known as “the Triple Aim”: improving the individual experience of care, improving population health, and reducing the cost of care. Across the wide array of initiatives undertaken by health care data users, the great majority seem to fall within the scope of at least one aspect of the Triple Aim. Below is a set of examples that show how data, both open and not, is being used to achieve its elements.
The use of open data to reduce costs:
The US Centers for Medicare and Medicaid Services (CMS) provided a critical boost to efforts to understand variation in health care pricing by releasing data about millions of payments to hospitals and individual physicians treating Medicare patients. Within CMS itself, this data release has powered visualization tools intended to inform and empower health care consumers; externally, researchers and journalists have used the CMS data to bring attention to unexplained variation in the cost of medical service provision. By highlighting this variation, these analyses bring pressure to bear on high-cost places and providers.
Federal action has increased the pressure on states to improve collection of, and provide access to, their health care cost data. Non-profit organizations focused on health care cost have observed that while states still have a long way to go before they provide adequate information about health care costs to their citizens, there has been a substantial amount of policy action and some concrete improvements just in the last year. Cooperation among associations of health care providers, insurance groups, and evaluative NGOs has produced efforts to develop comprehensive standards for health care cost transparency.
Public-led efforts to increase cost transparency have led to additional non-governmental efforts to create both public-facing databases and price-check tools for private insurance subscribers, adding to the points of access for learning about variation in healthcare costs.
The use of open data to improve quality of care:
Using open data on a substantial set of individual hospital quality measures, CMS created a hospital comparison tool that allows consumers to compare average quality-of-care outcomes across their local hospitals.
Non-profit organizations also survey hospitals and have used the results to provide another national measure of hospital quality that consumers can use to select a high-quality hospital.
In the UK, the National Health Service is actively working towards defining concrete metrics to evaluate how the system as a whole is moving towards improved quality. In the US, the federal Agency for Healthcare Research and Quality developed a set of National Quality Measures intended to play the same role.
The broad cultural shift towards data-sharing in healthcare appears to have facilitated additional secure sharing in order to achieve the joint goal of improving healthcare quality and effectiveness. The current effort to securely network millions of patient records through the federal PCORI system has the potential to advance understanding of disease treatment at an unprecedented pace.
Through third-party tools, people are able to use the products of aggregated patient data in order to begin diagnosing their own symptoms more accurately, giving them a head start in understanding how to optimize their visit to a provider.
The use of open data to improve population health:
Of the three elements of the Triple Aim, population health may have the longest and deepest relationship with open data. Public datasets like those collected by the Centers for Disease Control and Prevention and the US Census Bureau have for decades been used to monitor disease prevalence, verify access to health insurance, and track mortality and morbidity statistics.
Population health improvement has been a major focus for newer developments as well. Health data has been a regular feature in tech efforts to improve the ways that governments, including local health departments, reach their constituencies. The use of data in new communication tools improves population health by increasing public awareness of local health trends and disease prevention opportunities. Two examples of this work in action are the Chicago Health Atlas, which combines health data with tools for healthcare consumer problem-solving, and Philadelphia’s map interface to city data about available flu vaccines.
One final observation for open data advocates to take from health data concerns the way that the sector encourages two-way information flow: it embraces the notion that data users can also be data producers. Open data ecosystems are properly characterized by multi-directional relationships among governmental and non-governmental actors, with opportunities for feedback, correction and augmentation of open datasets. That this happens at the scale of health data is important and meaningful for open data advocates, who often face push-back when they ask their governments to ingest externally generated data.
The Blue Button, a symbol on health data websites that gives individuals an access point for downloading all of their individual records, is one of the most visible manifestations of the idea that data is a shared possession between data systems and individuals. It was developed in response to a call from stakeholder organizations to “consider individuals as information participants—not as mere recipients, but as information contributors, knowledge creators, and shared decision makers and care planners.” Originally available only for veterans on VA health sites, the Blue Button is now available to healthcare consumers across an increasing number of platforms. (The VA itself, meanwhile, is working to address its own problems in part by increasing access to its performance data.)
In addition to allowing individuals to download their own information from a data collector, another feature of open data ecosystems should be the possibility that individuals contribute data to governments directly, with the expectation that governments will aggregate and publish it accurately and responsibly. We’ve seen several examples of this type of citizen-to-government data publication model in the health data field that could provide models for civic data production in other sectors.
The city of Louisville provides one example of how this works. Participating residents suffering from asthma use “smart inhalers” that record, and later transmit, GPS-linked data about air quality at the point of inhaler use. When combined with a network of stationary air quality monitors, the inhaler data creates a comprehensive map of air quality and its effects on population health. The information can then be fed back out to residents as a map of air quality and as electronic alerts, and can be used to understand where targeted interventions are likely to have the greatest effect.
The UK’s NHS offers another example of how governments can share data submitted by the public: the “friends and family test,” a one-question survey asking patients whether they would recommend their provider and service to friends and family. The data from this survey is openly available and provides an additional way to think about healthcare provider quality.
It is clear from looking at the health data case that successful demonstrations of the impact of open data generate further energy and momentum. At least in the health data case, one concrete product of the use of data appears to be more data, which in turn has the potential to multiply the benefits of use. I suspect this virtuous cycle may be especially rapid where a data-using group shares common goals, since those clearly-telegraphed aims make it easier for governments to understand the most useful direction for further data collection and release.
The overview above is really just the beginning of an exploration of this topic, and there are certainly additional important lessons for open data advocates to take from health data, particularly around what not to do in the process of opening data. Still, it provides a basis for beginning a conversation about the impact of opening health data on social outcomes. I hope you’ll come out to our event next month, where we’ll have the opportunity to continue the discussion in person!