If assessments are diagnoses, what are the prescriptions?

I happen to like statistics. I appreciate qualitative observations, too; data of all sorts can be deeply illuminating. But I also believe that the most important part of interpreting them is understanding what they do and don’t measure. And in terms of policy, it’s important to consider what one will do with the data once they are collected, organized, analyzed, and interpreted. What do the data tell us that we didn’t know before? Now that we have this knowledge, how will we apply it to achieve the desired change?

In an eloquent, impassioned open letter to President Obama, Education Secretary Arne Duncan, Bill Gates and other billionaires pouring investments into business-driven education reforms (revised version at Washington Post), elementary teacher and literacy coach Peggy Robertson argues that all these standardized tests don’t give her more information than what she already knew from observing her students directly. She also argues that the money that would go toward administering all these tests would be better spent on basic resources such as stocking school libraries with books for the students and reducing poverty.

She doesn’t go so far as to question the current most-talked-about proposals for using those test data: performance-based pay, tenure, and firing decisions. But I will. I can think of a much more immediate and important use for the streams of data many are proposing on educational outcomes and processes: Use them to improve teachers’ professional development, not just to evaluate, reward and punish them.

Simply put, teachers deserve formative assessment too.

Using student evaluations to measure teaching effectiveness

I came across a fascinating discussion on the use of student evaluations to measure teaching effectiveness upon following this Observational Epidemiology blog post by Mark, a statistical consultant. The original paper by Scott Carrell and James West uses value-added modeling to estimate teachers’ contributions to students’ grades in introductory courses and in subsequent courses, then analyzes the relationship between those contributions and student evaluations. (An ungated version of the paper is also available.) Key conclusions are:

Student evaluations are positively correlated with contemporaneous professor value‐added and negatively correlated with follow‐on student achievement. That is, students appear to reward higher grades in the introductory course but punish professors who increase deep learning (introductory course professor value‐added in follow‐on courses).

We find that less experienced and less qualified professors produce students who perform significantly better in the contemporaneous course being taught, whereas more experienced and highly qualified professors produce students who perform better in the follow‐on related curriculum.

Not having closely followed the research on this, I’ll simply note some key comments from other blogs.

Direct examination:

Several have posted links that suggest an endorsement of this paper’s conclusion, such as George Mason University professor of economics Tyler Cowen, Harvard professor of economics Greg Mankiw, and Northwestern professor of managerial economics Sandeep Baliga. Michael Bishop, a contributor to Permutations (“official blog of the Mathematical Sociology Section of the American Sociological Association”), provides some more detail in his analysis:

In my post on Babcock’s and Marks’ research, I touched on the possible unintended consequences of student evaluations of professors.  This paper gives new reasons for concern (not to mention much additional evidence, e.g. that physical attractiveness strongly boosts student evaluations).

That said, the scary thing is that even with random assignment, rich data, and careful analysis there are multiple, quite different, explanations.

The obvious first possibility is that inexperienced professors, (perhaps under pressure to get good teaching evaluations) focus strictly on teaching students what they need to know for good grades.  More experienced professors teach a broader curriculum, the benefits of which you might take on faith but needn’t because their students do better in the follow-up course!

After citing this alternative explanation from the authors:

Students of low value added professors in the introductory course may increase effort in follow-on courses to help “erase” their lower than expected grade in the introductory course.

Bishop also notes that motivating students to invest more effort in future courses would be a desirable effect of good professors as well. (But how to distinguish between “good” and “bad” methods for producing this motivation isn’t obvious.)


Others critique the article and defend the usefulness of student evaluations with observations that provoke further fascinating discussions.

Andrew Gelman, Columbia professor of statistics and political science, expresses skepticism about the claims:

Carrell and West estimate that the effects of instructors on performance in the follow-on class is as large as the effects on the class they’re teaching. This seems hard to believe, and it seems central enough to their story that I don’t know what to think about everything else in the paper.

At Education Sector, Forrest Hinton expresses strong reservations about the conclusions and the methods:

If you’re like me, you are utterly perplexed by a system that would mostly determine the quality of a Calculus I instructor by students’ performance in a Calculus II or aeronautical engineering course taught by a different instructor, while discounting students’ mastery of Calculus I concepts.

The trouble with complex value-added models, like the one used in this report, is that the number of people who have the technical skills necessary to participate in the debate and critique process is very limited—mostly to academics themselves, who have their own special interests.

Jeff Ely, Northwestern professor of economics, objects to the authors’ interpretation of their results:

I don’t see any way the authors have ruled out the following equally plausible explanation for the statistical findings.  First, students are targeting a GPA.  If I am an outstanding teacher and they do unusually well in my class they don’t need to spend as much effort in their next class as those who had lousy teachers, did poorly this time around, and have some catching up to do next time.  Second, students recognize when they are being taught by an outstanding teacher and they give him good evaluations.

In agreement, Ed Dolan, an economist who was also for ten years “a teacher and administrator in a graduate business program that did not have tenure,” comments on Jeff Ely’s blog:

I reject the hypothesis that students give high evaluations to instructors who dumb down their courses, teach to the test, grade high, and joke a lot in class. On the contrary, they resent such teachers because they are not getting their money’s worth. I observed a positive correlation between overall evaluation scores and a key evaluation-form item that indicated that the course required more work than average. Informal conversations with students known to be serious tended to confirm the formal evaluation scores.


Dean Eckles, PhD candidate at Stanford’s CHIMe lab, offers this response to Andrew Gelman’s blog post (linked above):

Students like doing well on tests etc. This happens when the teacher is either easier (either through making evaluations easier or teaching more directly to the test) or more effective.

Conditioning on this outcome, is conditioning on a collider that introduces a negative dependence between teacher quality and other factors affecting student satisfaction (e.g., how easy they are).
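To make the collider point concrete, here is a minimal simulation sketch. It is not drawn from Carrell and West’s data or model. It assumes two independent, made-up teacher traits (`quality` and `easiness`) that both raise a hypothetical `results` score, then compares the correlation between the traits across all teachers with the correlation among only the teachers whose students did well. All names, distributions, and the top-quarter cutoff are illustrative assumptions.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def simulate(n=20000, seed=0):
    """Return (correlation over all teachers, correlation among high-result teachers)."""
    rng = random.Random(seed)
    # Hypothetical teacher traits, drawn independently of each other.
    quality = [rng.gauss(0, 1) for _ in range(n)]
    easiness = [rng.gauss(0, 1) for _ in range(n)]
    # Assumed collider: student results rise with either trait, plus noise.
    results = [q + e + rng.gauss(0, 0.5) for q, e in zip(quality, easiness)]

    r_all = pearson(quality, easiness)

    # Condition on the collider by keeping only the top quarter of results.
    cutoff = sorted(results)[int(0.75 * n)]
    kept = [(q, e) for q, e, r in zip(quality, easiness, results) if r >= cutoff]
    r_cond = pearson([q for q, _ in kept], [e for _, e in kept])
    return r_all, r_cond


if __name__ == "__main__":
    r_all, r_cond = simulate()
    print(f"all teachers:         r = {r_all:+.2f}")
    print(f"high-result teachers: r = {r_cond:+.2f}")
```

Under these assumptions the two traits are unrelated overall but negatively related once we look only at teachers with good results: among that group, a teacher who is not especially effective must have been easier, and vice versa. Selecting on a cutoff is only one simple way of conditioning; a regression adjustment on `results` would be another.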

From Jeff Ely’s blog, a comment by Brian Moore raises this critical question:

“Second, students recognize when they are being taught by an outstanding teacher and they give him good evaluations.”

Do we know this for sure? Perhaps they know when they have an outstanding teacher, but by definition, those are relatively few.

Closing thoughts:

These discussions raise many key questions, namely:

  • how to measure good teaching;
  • tensions between short-term and long-term assessment and evaluation[1];
  • how well students’ grades measure learning, and how grades impact their perception of learning;
  • the relationship between learning, motivation, and affect (satisfaction);
  • but perhaps most deeply, the question of student metacognition.

The anecdotal comments others have provided about how students respond on evaluations are more fairly couched in the terms “some students.” Given the considerable variability among students, interpreting student evaluations needs to account for those individual differences in teasing out the actual teaching and learning that underlie self-reported perceptions. Buried within those evaluations may be a valuable signal masked by a lot of noise, or more problematically, multiple signals that cancel and drown each other out.
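As a rough illustration of how signals can cancel, the sketch below invents two groups of students: one that rates rigorous courses higher and one that rates them lower. The group names, slopes, and noise levels are assumptions made up for this example, not estimates from any of the studies discussed above.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def simulate_ratings(n_per_group=5000, seed=1):
    """Return (within-group correlations, pooled correlation) of rigor vs. rating."""
    rng = random.Random(seed)
    # Hypothetical student groups with opposite reactions to course rigor.
    slopes = {"rewards_rigor": 1.0, "penalizes_rigor": -1.0}
    data = {}
    for name, slope in slopes.items():
        rigor = [rng.gauss(0, 1) for _ in range(n_per_group)]
        rating = [slope * r + rng.gauss(0, 1) for r in rigor]
        data[name] = (rigor, rating)

    within = {name: pearson(rigor, rating) for name, (rigor, rating) in data.items()}

    # Pool both groups, as an evaluation summary that ignores student type would.
    pooled_rigor = [r for rigor, _ in data.values() for r in rigor]
    pooled_rating = [y for _, rating in data.values() for y in rating]
    return within, pearson(pooled_rigor, pooled_rating)


if __name__ == "__main__":
    within, pooled = simulate_ratings()
    for name, r in within.items():
        print(f"{name}: r = {r:+.2f}")
    print(f"pooled: r = {pooled:+.2f}")
```

In this toy setup each group carries a strong signal, yet the pooled correlation sits near zero. An analyst who saw only the pooled number might conclude that rigor has nothing to do with ratings.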

[1] For example, see this review of research demonstrating that training which produces better short-term performance can produce worse long-term learning:
Schmidt, R.A., & Bjork, R.A. (1992). New conceptualizations of practice: Common principles in three paradigms suggest new concepts for training. Psychological Science, 3, 207-217.

Teach for America is not a volunteer organization…

…it’s a teacher-placement service. And depending on how you feel about Teach for America’s mission and effectiveness, potentially a very expensive one.

There seems to be a common misconception that TFA is a volunteer organization like Peace Corps and AmeriCorps, where corps members receive only a small living allowance and no wage. This editorial prompted me to try to help clear that up. While TFA corps members are considered members of AmeriCorps, this only means TFA members are eligible for the loan forbearance and post-service education awards all AmeriCorps members receive.

  1. Teach for America teachers are full employees of the school district in which they work and are paid out of district budgets. The school district pays corps members a full teaching salary plus benefits, just like any other teacher. TFA reports corps member salaries between $30,000 and $51,000.
  2. In some cases, school districts may also pay Teach for America a placement fee for each teacher hired from the corps. This seems to be a regional determination: this Rethinking Schools article by Barbara Miner (pdf) reports St. Louis schools paid TFA $2000 per placement; Birmingham schools reportedly paid TFA $5000 per placement.
  3. In 2008, the funding for about 25% of TFA’s operating expenses (or nearly $25 million) came from government grants. TFA also recently won a 5-year, $50 million grant in the Department of Education Investing in Innovation competition.

Add up all the taxpayer money spent, and then remember the entire 2010 TFA corps contains only 4,500 teachers. [Note: This number counts only new recruits for 2010. The total number of active TFA corps members is around 8000.]
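For a rough sense of scale, here is a back-of-the-envelope sketch using only figures quoted in this post. It is not an accounting of TFA’s actual costs: the post says placement fees apply only “in some cases,” so charging the same fee for every new recruit is a what-if ceiling, and spreading the 5-year grant evenly across years is likewise an assumption. The function names are made up for illustration.

```python
# Figures quoted in this post; how they are combined below is an assumption.
NEW_RECRUITS_2010 = 4_500
PLACEMENT_FEES = {  # dollars per placement, as reported by each source
    "Mathematica study": 1_500,
    "St. Louis": 2_000,
    "Birmingham": 5_000,
}
I3_GRANT_TOTAL = 50_000_000  # Investing in Innovation grant, dollars
I3_GRANT_YEARS = 5


def placement_fee_total(fee_per_teacher, teachers=NEW_RECRUITS_2010):
    """Total fees if every teacher carried the same fee (a what-if ceiling)."""
    return fee_per_teacher * teachers


def annualized(total, years):
    """Spread a multi-year grant evenly across its years (an assumption)."""
    return total / years


if __name__ == "__main__":
    for source, fee in PLACEMENT_FEES.items():
        print(f"{source} fee applied to all new recruits: ${placement_fee_total(fee):,}")
    print(f"Grant per year: ${annualized(I3_GRANT_TOTAL, I3_GRANT_YEARS):,.0f}")
```

These totals leave out the largest item, the district-paid salaries and benefits, which districts would pay for any teacher filling those positions.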

And then consider the middling results of Stanford’s 6-year study of TFA teachers in Houston (press summary, pdf full text), which found that uncertified TFA teachers only performed equivalently to other uncertified teachers and were out-performed by fully-certified teachers (as measured by student performance on standardized tests), after controlling for teacher backgrounds and student population characteristics. Even after TFA teachers become certified, they “generally perform[ed] on par” with traditionally-certified teachers.

Updated: Commenter Michael Bishop mentioned this 2004 Mathematica Policy Research study of Teach for America (pdf), which used random assignment of students to teachers. This was a one-year comparison study of TFA teachers to non-TFA teachers (novice and veteran) and found significant effects of TFA status for math results, but not for reading or behavioral outcomes.

And for those keeping score at home, the Mathematica study reports school districts paid TFA $1500 per teacher.

What the reformers aren’t reforming

Back in December I shared this Answer Sheet blog post with some friends. The ensuing discussion revealed a disconnect between how those of us who work in education perceive the issues and how the lay public perceives the issues. This is my attempt to bridge the disconnect.

In the above post Valerie Strauss and Jay Mathews, both veteran journalists on the education beat, debate the merits of KIPP, Teach for America, and other aspects of education reform. Mathews tends to support the current wave of reforms (standardized testing, teacher merit pay, charter schools), while Strauss tends to be a skeptic. My non-education friends tended to side with Mathews, at least on the point that these reforms are better than no reforms. Ming Ling and I sided with Strauss. [ML: Specifically, I agreed with Strauss’s concerns that current high-stakes accountability systems miss a lot of important information about teaching, that effective teaching requires ongoing training and support, and that improving education requires systemic policy. But I also agree with Mathews’ observations that KIPP, TfA, and charter schools have demonstrated many worthwhile achievements, and that “Fixing schools is going to need many varied approaches from people with different ideas.”]

These reforms focus on the incentive and regulatory structure of the school system. Proponents of these reforms believe that loosening the restrictions on schools and teacher pay, coupled with incentives and punishments, will let market forces take over. And market forces will lead the nation’s finest minds to the education industry, who will then find ways to produce a better-quality “product” at a lower price so they can reap the rewards. The idea is intuitively appealing, but the collection of specific reform proposals contains serious flaws and doesn’t address key structural failures in our education system. There are reasons to believe these reforms are worse than no reforms at all.

The Flaws
Those pushing to change the incentive structure through test-based accountability, including merit pay, are assuming the tests are adequately measuring important educational outcomes. Questioning this orthodoxy is like standing next to a straw man with a large bounty on his head. To quote Jay Mathews from the post linked above,

Test-driven accountability is here to stay. Politicians who try to campaign against it are swiftly undercut by opponents who say, “What? You don’t want to make our schools accountable for all the money we spend on them?”

Nobody is against holding schools accountable for the taxpayer money spent. But holding schools accountable to a bad standard, particularly a high-stakes one, can distort people’s behavior in a counterproductive way. It’s called Campbell’s Law and social scientists have documented the effect in all areas of life, including a recent finding that texting bans may actually increase accident rates.

One doesn’t have to look very far to find examples of school districts outright cheating or otherwise trying to game the system. The Death and Life of the Great American School System by Diane Ravitch is full of such examples. Something about the standardized-testing-driven incentive structure is clearly not working as intended, but rather than stopping and developing better ways of measuring learning, the current reform crowd wants to plow ahead and raise the stakes again by linking teacher pay to this flawed system of accountability. They are not interested in asking the hard questions about what they’re really measuring and influencing.

But even if we assume the tests are good and the incentives are in the right places, it is difficult to see why market forces will necessarily improve the education system in the way those reformers are claiming. In politics, market forces are supposed to expand the pie, not equalize the slices. Despite very strong incentives to be elite athletes and coaches, a significant portion of the population can’t even run a mile without getting winded. But those who support market-based education reforms still use egalitarian rhetoric—appeals to closing the achievement gap, equipping all citizens with the tools they need to succeed.

Instruction matters, and community matters

On “In Massachusetts, Brockton High Becomes Success Story”:

Engaging the community of teachers and students can be much more effective than stripping it down and weeding it out. Bottom line: “Achievement rose when leadership teams focused thoughtfully and relentlessly on improving the quality of instruction.”

Concerns about the LA Times teacher ratings

On “L.A. Times analysis rates teachers’ effectiveness”:

A Times analysis, using data largely ignored by LAUSD, looks at which educators help students learn, and which hold them back.

I’m a huge fan of organizing, analyzing, and sharing data, but I have real concerns about figuring out the best means for conveying and acting upon those results. Not just data quality (what gets assessed, how scores are calculated and weighed), but contextualizing results (triangulation with qualitative data) and professional development (social comparison, ongoing support).

The importance of good teachers for all

On “Poor quality teachers may prevent children from reaching reading potential”:

However much we all might wish that great teachers leveled the playing field and brought every student up to impressive levels of achievement, this paper suggests otherwise: Bad teaching hinders all students, and good teaching helps all students, just not necessarily by the same amount or to the same level. It also contradicts the notion that capable students just teach themselves and don’t need good teachers.

Quite simply, good teaching helps all students reach their potential.

This also implies that we should be careful in how we measure achievement gaps: Variability in meeting basic skill levels (which we may reasonably expect of all students) is problematic, but overall variability (in reaching higher levels of achievement) may actually be a sign of good teaching.

(Full article available via subscription at http://www.sciencemag.org/cgi/content/full/328/5977/512?rss=1.)

On good teachers

On “What Makes a Good Teacher?”:

For years, the secrets to great teaching have seemed more like alchemy than science, a mix of motivational mumbo jumbo and misty-eyed tales of inspiration and dedication. But for more than a decade, one organization has been tracking hundreds of thousands of kids, and looking at why some teachers can move them three grade levels ahead in a year and others can’t. Now, as the Obama administration offers states more than $4 billion to identify and cultivate effective teachers, Teach for America is ready to release its data.

Really fascinating. I’m looking forward to reading TfA’s report. I wish the journalist had maintained her focus on reporting behaviors and attitudes (monitoring understanding, setting & striving for goals, grit) rather than traits and past history (GPA, leadership), but there’s still a great list in there. I hope that policy will also attend closely to the process variables.

Note that the research comes from an aggregate sample of master’s programs. There could well be distinguishing factors in some master’s programs that are beneficial, which is why I’d want more process data. It’s true that they focused on the classroom (due to the emphasis on what teachers do), but they did at least mention the importance of having teachers reach out to students’ families. It’d be interesting to find out what kind of impact parent-education programs could have.

Wouldn’t it be wonderful to read an article about how teachers learn and improve, instead of what makes them “great”?
