Some history and context on VAM in teacher evaluation

In the Columbia Journalism Review’s Tested: Covering schools in the age of micro-measurement, LynNell Hancock provides a rich survey of the history and context of the current debate over value-added modeling in teacher evaluation, with a particular focus on LA and NY.

Here are some key points from the critique:

1. In spite of their complexity, value-added models are based on very limited sources of data: who taught the students, without regard to how or under what conditions, and standardized tests, which are a very narrow and imperfect measure of learning:

No allowance is made for many “inside school” factors… Since the number is based on manipulating one-day snapshot tests—the value of which is a matter of debate—what does it really measure?

2. Value-added modeling is an imprecise method whose parameters and outcomes are highly dependent on the assumptions built into the model.

In February, two University of Colorado, Boulder researchers caused a dustup when they called the Times’s data “demonstrably inadequate.” After running the same data through their own methodology, controlling for added factors such as school demographics, the researchers found about half the reading teachers’ scores changed. On the extreme ends, about 8 percent were bumped from ineffective to effective, and 12 percent bumped the other way. To the researchers, the added factors were reasonable, and the fact that they changed the results so dramatically demonstrated the fragility of the value-added method.

3. Value-added modeling is inappropriate to use as grounds for firing teachers or calculating merit pay.

Nearly every economist who weighed in agreed that districts should not use these indicators to make high-stakes decisions, like whether to fire teachers or add bonuses to paychecks.

Further, it’s questionable how effective it is as a policy to focus simply on individual teacher quality, when poverty has a greater impact on a child’s learning:

The federal Coleman Report issued [in 1966] found that a child’s family economic status was the most telling predictor of school achievement. That stubborn fact remains discomfiting—but undisputed—among education researchers today.

These should all be familiar concerns by now. What this article adds is a much richer picture of the historical and political context for the many players in the debate. I’m deeply disturbed that NYS Supreme Court Judge Cynthia Kern ruled that “there is no requirement that data be reliable for it to be disclosed.” At least Trontz at the NY Times acknowledges the importance of publishing reliable information as opposed to spurious claims, although he seems to overlook all the arguments against the merits of the data:

If we find the data is so completely botched, or riddled with errors that it would be unfair to release it, then we would have to think very long and hard about releasing it.

That’s the whole point: applying value-added modeling to standardized test scores to fire or reward teachers is unreliable to the point of being unfair. Adding noise and confusion to the conversation isn’t “a net positive,” as Arthur Browne from The Daily News seems to believe; it degrades the discussion, at great harm to the individual teachers, their students, the institutions that house them, and the society that purports to sustain them and benefit from them.

What the reformers aren’t reforming

Back in December I shared this Answer Sheet blog post with some friends. The ensuing discussion revealed a disconnect between how those of us who work in education perceive the issues and how the lay public perceives the issues. This is my attempt to bridge the disconnect.

In the above post Valerie Strauss and Jay Mathews, both veteran journalists on the education beat, debate the merits of KIPP, Teach for America, and other aspects of education reform. Mathews tends to support the current wave of reforms—standardized testing, teacher merit pay, charter schools—while Strauss tends to be a skeptic. My non-education friends tended to side with Mathews, at least on the point that these reforms are better than no reforms. Ming Ling and I sided with Strauss. [ML: Specifically, I agreed with Strauss’s concerns that current high-stakes accountability systems miss a lot of important information about teaching, that effective teaching requires ongoing training and support, and that improving education requires systemic policy. But I also agree with Mathews’ observations that KIPP, TfA, and charter schools have demonstrated many worthwhile achievements, and that “Fixing schools is going to need many varied approaches from people with different ideas.”]

These reforms focus on the incentive and regulatory structure of the school system. Proponents of these reforms believe that loosening the restrictions on schools and teacher pay, coupled with incentives and punishments, will let market forces take over. And market forces will lead the nation’s finest minds to the education industry, who will then find ways to produce a better-quality “product” at a lower price so they can reap the rewards. The idea is intuitively appealing, but the collection of specific reform proposals contains serious flaws and doesn’t address key structural failures in our education system. There are reasons to believe these reforms are worse than no reforms at all.

The Flaws

Those pushing to change the incentive structure through test-based accountability, including merit pay, are assuming the tests are adequately measuring important educational outcomes. Questioning this orthodoxy is like standing next to a straw man with a large bounty on his head. To quote Jay Mathews from the post linked above,

Test-driven accountability is here to stay. Politicians who try to campaign against it are swiftly undercut by opponents who say, “What? You don’t want to make our schools accountable for all the money we spend on them?”

Nobody is against holding schools accountable for the taxpayer money spent. But holding schools accountable to a bad standard, particularly a high-stakes one, can distort people’s behavior in a counterproductive way. This effect is known as Campbell’s Law, and social scientists have documented it in many areas of life, including a recent finding that texting bans may actually increase accident rates.

One doesn’t have to look very far to find examples of school districts outright cheating or otherwise trying to game the system. The Death and Life of the Great American School System by Diane Ravitch is full of such examples. Something about the standardized-testing-driven incentive structure is clearly not working as intended, but rather than stopping and developing better ways of measuring learning, the current reform crowd wants to plow ahead and raise the stakes again by linking teacher pay to this flawed system of accountability. They are not interested in asking the hard questions about what they’re really measuring and influencing.

But even if we assume the tests are good and the incentives are in the right places, it is difficult to see why market forces will necessarily improve the education system in the way those reformers are claiming. In politics, market forces are supposed to expand the pie, not equalize the slices. Despite very strong incentives to be elite athletes and coaches, a significant portion of the population can’t even run a mile without getting winded. But those who support market-based education reforms still use egalitarian rhetoric—appeals to closing the achievement gap, equipping all citizens with the tools they need to succeed.

In-school vs. non-school factors

From How to fix our schools, in which Richard Rothstein, of the Economic Policy Institute, critiques Joel Klein’s and Michelle Rhee’s approach of focusing only on firing incompetent teachers as a means to improve schools:

“Differences in school quality can explain about 1/3 of the variation in student achievement. But the other 2/3 come from non-school factors.” In-school factors go beyond teacher quality: school leadership, curriculum quality, teacher collaboration. Non-school factors include economic consequences of parental underemployment, such as geographic disruption, malnutrition, stress, poor health.

