Some history and context on VAM in teacher evaluation

In the Columbia Journalism Review’s “Tested: Covering schools in the age of micro-measurement,” LynNell Hancock provides a rich survey of the history and context of the current debate over value-added modeling in teacher evaluation, with a particular focus on LA and NY.

Here are some key points from the critique:

1. In spite of their complexity, value-added models are based on very limited sources of data: who taught the students, without regard to how or under what conditions, and standardized tests, which are a very narrow and imperfect measure of learning:

No allowance is made for many “inside school” factors… Since the number is based on manipulating one-day snapshot tests—the value of which is a matter of debate—what does it really measure?

2. Value-added modeling is an imprecise method whose parameters and outcomes are highly dependent on the assumptions built into the model.

In February, two University of Colorado, Boulder researchers caused a dustup when they called the Times’s data “demonstrably inadequate.” After running the same data through their own methodology, controlling for added factors such as school demographics, the researchers found about half the reading teachers’ scores changed. On the extreme ends, about 8 percent were bumped from ineffective to effective, and 12 percent bumped the other way. To the researchers, the added factors were reasonable, and the fact that they changed the results so dramatically demonstrated the fragility of the value-added method.

3. Value-added modeling is inappropriate to use as grounds for firing teachers or calculating merit pay.

Nearly every economist who weighed in agreed that districts should not use these indicators to make high-stakes decisions, like whether to fire teachers or add bonuses to paychecks.

Further, it’s questionable how effective it is as a policy to focus simply on individual teacher quality, when poverty has a greater impact on a child’s learning:

The federal Coleman Report issued [in 1966] found that a child’s family economic status was the most telling predictor of school achievement. That stubborn fact remains discomfiting—but undisputed—among education researchers today.

These should all be familiar concerns by now. What this article adds is a much richer picture of the historical and political context for the many players in the debate. I’m deeply disturbed that NYS Supreme Court Judge Cynthia Kern ruled that “there is no requirement that data be reliable for it to be disclosed.” At least Trontz at the NY Times acknowledges the importance of publishing reliable information as opposed to spurious claims, though he seems to overlook all the arguments against the merits of the data:

If we find the data is so completely botched, or riddled with errors that it would be unfair to release it, then we would have to think very long and hard about releasing it.

That’s the whole point: applying value-added modeling to standardized test scores to fire or reward teachers is unreliable to the point of being unfair. Adding noise and confusion to the conversation isn’t “a net positive,” as Arthur Browne from The Daily News seems to believe; it degrades the discussion, at great harm to the individual teachers, their students, the institutions that house them, and the society that purports to sustain them and benefit from them.


About Ming Ling
I’m an educator and researcher (trained as a cognitive scientist) who is passionate about understanding and improving how people learn. In my professional and personal lives, I seek to integrate research on learning with real-life practices that actually make a difference in how learning happens.

One Response to Some history and context on VAM in teacher evaluation

  1. I can’t help but wonder if part of the allure of VAM is that it’s seemingly objective. Unlike other methods of teacher evaluation, in which a principal or other administrator’s personal relationship with a teacher may have an impact, in VAM you just put numbers into a model and it spits out a rating. Furthermore, it relieves administrators of the onus of doing a real performance evaluation, something pretty much everyone hates doing.

    What people don’t realize is that the building of the model itself is partially subjective. Each researcher, each statistician, has biases about what variables should be included and how to weight them. And as I noted in my previous post, small changes in the model can result in drastic differences in individual ratings. So the model isn’t so much objective as it is detached and impersonal: shielded from biases for or against any particular teacher, but potentially biased as a whole. And like any computer-based system, it is prone to spitting out results that don’t make any sense to a thinking person (as IBM’s Watson recently illustrated, by answering “What is Toronto?” for the category “US Cities”).

    What people also don’t appreciate is the magnitude of statistical noise that such a model can contain. The lay public seems to have relatively little understanding of statistical noise in general, or of the fact that seemingly meaningful differences can appear purely by chance. In my work, the biggest difference between me and my non-scientist colleagues is that I begin with the assumption that small differences are statistical noise, while they spend a lot of time trying to decipher 2-3% changes in student performance measures, particularly when two measures that theoretically should move in tandem don’t (e.g., if one goes up by 2% and the other goes down by 3%). I spend a lot of time arguing that these differences are not meaningful, with inferential statistics to back it up, and it often falls on deaf ears.

    Some of these models are so noisy that it’s the equivalent of deciding the fate of one teacher at each school (or even each grade level) with a flip of a coin. Even if the model is perfectly justifiable and fair for the rest of the teachers, why should anyone’s career advancement be governed by randomness?
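The point about noise in the comment above can be illustrated with a small simulation. This is only a sketch under assumed numbers, none of which come from the article or the comment: a class of 25 students, a fixed 70% chance that each student passes a test, and no real change at all between two years. The function name and parameters are hypothetical. It shows how much a class's pass rate can move from one year to the next when nothing but chance is at work.

```python
import random
import statistics


def simulate_pass_rate_changes(n_students=25, true_pass_rate=0.70,
                               n_classes=10_000, seed=0):
    """Year-over-year change in pass rate (percentage points) per simulated class.

    All parameter values are illustrative assumptions. The underlying pass
    probability is identical in both years, so every change is pure noise.
    """
    rng = random.Random(seed)
    changes = []
    for _ in range(n_classes):
        # Count how many of the class's students pass in each year.
        year1 = sum(rng.random() < true_pass_rate for _ in range(n_students))
        year2 = sum(rng.random() < true_pass_rate for _ in range(n_students))
        changes.append(100.0 * (year2 - year1) / n_students)
    return changes


changes = simulate_pass_rate_changes()

# Typical size of a chance-only swing, in percentage points.
spread = statistics.pstdev(changes)

# Share of classes whose pass rate moved by at least 3 points in either direction.
share_at_least_3 = sum(abs(c) >= 3 for c in changes) / len(changes)

print(f"standard deviation of change: {spread:.1f} percentage points")
print(f"share of classes moving 3+ points: {share_at_least_3:.0%}")
```

Under these assumptions the expected standard deviation of the change is sqrt(2 × 0.7 × 0.3 / 25), or about 13 percentage points, so a 2-3% shift for a single class sits well inside what chance alone produces. Larger groups shrink the noise, but only in proportion to the square root of the number of students.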
