Better Living Through Transparency: The Importance of Models

[Image: Craniometry, a human skull and measurement device from 1902.]

At Sunlight we spend a lot of time exploring ways to open up data sets and make them more accessible. The idea is that data enables us to act collectively, making better informed decisions and building a more effective public sector. When we talk about transparency the focus is often on the possibilities that data offers. But this discussion sometimes ignores the fact that translating data into action is hard.

There’s a reason for this: data alone doesn’t provide answers.

Coming up with solutions to real life problems — like designing an effective and fair tax code or improving health care — requires an understanding of how real life works. Unfortunately, more often than not real life is messy and complicated. In order to make sense of this complexity we need models — approximations of the world that define fundamental mechanics of a given process and reduce it to understandable and meaningful terms.

As Joshua Epstein writes in a clever essay on scientific inquiry, every time we use data to draw a conclusion we also use a model. Sometimes explicitly: when a meteorologist makes a prediction about the weather, they use a rigorously designed framework for translating observational data into a forecast. Sometimes not: when I look at the sky and make a prediction, I’m using an implicit model based on a mix of past experience and a rather poor understanding of atmospheric processes. Both of us are using models to interpret data, and both models are based on assumptions about how weather works. I’m just not sure I could explain how mine functions, nor do I have any sense of how well it works.

Having access to good observational data is incredibly important to arriving at useful answers. But well-designed and transparent models are equally important. In fact, having a good model is often a prerequisite to determining what to observe and how. If I want to predict the weather, should I measure the temperature? Pressure? Wind direction? Where and how frequently? Without a solid theoretical framework it’s often impossible to know where to begin, and it’s even harder to know when I’ve made a wrong turn.

When we use a model we embed its assumptions into the results. If key assumptions are incorrect, good data turns into supporting evidence for a potentially misguided answer. Or a bad model might drive the collection of useless data.

It’s rare to find a problem where the proper set of assumptions is obvious and agreed upon by everyone involved. But the higher the stakes, the more important it becomes that we get those assumptions right.

A case in point: the LA Times’ release of an analysis of the performance of individual teachers in the Los Angeles Unified School District (LAUSD). The paper’s database names over six thousand elementary school teachers and provides a model, known as value-added measurement (VAM), for evaluating their impact in the classroom. The model compares a student’s test scores against the previous year’s performance, and from the difference calculates the “value” that a specific teacher added over the year.
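
To make the mechanics concrete, here is a minimal sketch of that calculation in Python. Everything in it is a simplification: the student records are invented, and production value-added models rely on richer regression machinery with controls for student and classroom characteristics, not the bare residual-averaging shown here.

    # A minimal, hypothetical sketch of a gain-score style value-added
    # calculation. The core idea: compare each student's score to an
    # expectation based on their prior-year performance, then average
    # those surprises by teacher.
    from statistics import mean

    # Invented records: (teacher, prior_year_score, current_year_score)
    records = [
        ("Alice", 610, 640), ("Alice", 540, 565), ("Alice", 700, 720),
        ("Bob",   600, 590), ("Bob",   650, 660), ("Bob",   480, 470),
    ]

    # Step 1: fit a district-wide baseline by ordinary least squares,
    # predicting this year's score from last year's.
    xs = [prior for _, prior, _ in records]
    ys = [current for _, _, current in records]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar

    # Step 2: a teacher's "value added" is the average amount by which
    # their students beat (or miss) the district-wide prediction.
    residuals_by_teacher = {}
    for teacher, prior, current in records:
        residual = current - (intercept + slope * prior)
        residuals_by_teacher.setdefault(teacher, []).append(residual)

    for teacher, residuals in sorted(residuals_by_teacher.items()):
        print(f"{teacher}: {mean(residuals):+.1f} points vs. expectation")

Even this toy version forces choices: which baseline to fit, which subjects count, and how to handle students who change classrooms. Each choice is an assumption baked into the result.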

On one hand this project is a tremendous validation of the transparency movement and an enterprising piece of journalism. The Times was able to build its database thanks to the disclosure of seven years’ worth of student testing data — just the kind of granular, high-value data that we hope to see released throughout government. And there’s little question about the importance of such performance reviews. By many accounts, the LAUSD has serious problems. Evaluating teacher performance is considered a key component of reform, and the school district’s own review has been bogged down by a debate with its union. The availability of testing data allowed the Times to analyze teacher performance directly, bypassing gridlock within the school system and empowering the public to hold individual teachers accountable for their performance.

On the other hand, there’s a reason why the school system has been slow to develop its own evaluation system: developing an objective, robust model for evaluating teacher performance is incredibly hard. Developing an objective, robust model that everyone agrees on is nearly impossible.

The value-added metric employed by the Times is probably helpful in assessing performance. A substantial body of research has indicated that this sort of modeling is a potentially useful indicator of a teacher’s impact in the classroom. However, it is based on several assumptions, including:

  • Standardized test scores in math and English are useful indicators of student achievement;
  • Year-over-year changes in student scores are an accurate measure of a teacher’s impact;
  • In aggregate, examining changes in an individual student’s scores reduces the need to control for the unique educational challenges students might face;
  • Value-added measurements allow comparisons between teachers and schools with vastly different student populations.

The implications of these assumptions are by no means agreed on within the educational community. As a result, there are a number of different ways to do value-added measurement, and which one is best is not a settled matter. Even in the ideal case these methods offer a spectrum of certainty based on the quality of the inputs.
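
A small, invented illustration of why the choice of method matters: two plausible variants of the metric, a raw gain score and a residual against an assumed expected-gain curve, can rank the same two teachers in opposite orders.

    # Hypothetical data showing two value-added variants disagreeing.
    from statistics import mean

    # (teacher, prior_year_score, current_year_score)
    records = [
        ("Carol", 450, 480), ("Carol", 470, 500),  # low-baseline students
        ("Dan",   700, 715), ("Dan",   720, 735),  # high-baseline students
    ]

    def expected_gain(prior):
        # Invented curve: students who start high are expected to gain
        # less (about 29 points at 450, about 7 points at 700).
        return 70 - 0.09 * prior

    raw_gains, adjusted = {}, {}
    for teacher, prior, current in records:
        raw_gains.setdefault(teacher, []).append(current - prior)
        adjusted.setdefault(teacher, []).append(
            (current - prior) - expected_gain(prior))

    for teacher in ("Carol", "Dan"):
        print(teacher,
              f"raw gain: {mean(raw_gains[teacher]):+.1f},",
              f"adjusted: {mean(adjusted[teacher]):+.1f}")

    # Carol looks stronger on raw gains; Dan looks stronger once gains
    # are adjusted for where students started. Same data, different story.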

More significantly, there are open questions about how to use these techniques in high-stakes situations. Very few school districts currently employ value-added measurement in their performance reviews. Where it is used, it is only a part of a larger evaluation process — in many cases it contributes about one third of a teacher’s final score.[1]
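
As a rough illustration of what that weighting means in practice, a composite evaluation might look like the sketch below. The weights and inputs are invented for illustration, not any district’s actual policy.

    # Hypothetical composite evaluation in which value-added contributes
    # about a third of a teacher's final score.
    def composite_score(vam_percentile, observation_score, other_measures):
        """All inputs on a 0-100 scale; returns a weighted composite."""
        return (vam_percentile * 1 / 3
                + observation_score * 1 / 2
                + other_measures * 1 / 6)

    print(composite_score(vam_percentile=62,
                          observation_score=80,
                          other_measures=75))  # about 73.2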

The Times mentions these concerns in its reporting but takes the stance that releasing the results from its value-added model is a useful step forward, even if it is an imperfect or incomplete measure. Because the district was not proactive in developing these methodologies itself, some kind of public intervention is now required.

It seems clear that something needs to be done, but it’s important to recognize exactly what the Times is contributing to this debate. They’re not just releasing data; they’re taking a stand on how to evaluate teacher performance. This distinction is important and perhaps not immediately obvious.

For example, Kevin Drum at Mother Jones defended the paper’s actions by saying, “The data is public, and either you believe that the press should disseminate public data or you don’t.” He later revised that comment, saying he meant that it’s the paper’s right and responsibility to “disseminate meaningful public data.”

I don’t disagree with those assertions. But, at least in this instance, I think the premise is flawed — if this were only about the dissemination of public data there would be no debate. What’s at issue is the definition of “meaningful.” Because the district failed to develop a definition itself, it is now up to the paper and its readers to decide whether value-added measurement offers meaningful results. Drum thinks this is OK:

If the tests really are poor indicators of short-term student performance, perhaps this project will make that clear. Parents, principals, and fellow teachers probably have a pretty good sense already of who the good and bad teachers are, and if the value-added testing metric used by the Times turns out to be wildly at variance with this sense, it should provoke a serious rethink.

This seems to be a likely outcome, but an unfortunate one as well. We are at once arguing that we need objective, quantitatively driven metrics for analyzing performance, and at the same time, in the absence of agreement on how those metrics should work, falling back on the informal opinions of parents and colleagues (an implicit and arguably flawed model) to calibrate our approach.

We can and should expect better.

The same quantitative tools that allow us to build a value-added measurement system also allow us to interrogate and refine our methodologies. Unfortunately, this sort of process is often seen as inconveniently technical and at odds with the desire for concrete answers. Jay Mathews at the Washington Post offers an interesting discussion of how this tension can play out in journalistic settings, and of the importance of getting it right in cases like the Times’ database.

Those of us in the transparency movement have a responsibility to illuminate not only the data, but the ways in which data are collected and analyzed. After all, our goal is about more than providing an answer; it is about making transparent the processes through which answers are derived.

  1. For an overview of the debate regarding value-added measurement, see “Getting Value Out of Value-Added” from the National Academies Press. Or see this for an interesting discussion of some of the methodological implications involved.