Problems with pay-for-performance

If pay-for-performance doesn’t work in medicine, what should our expectations be for its success in education?

“No matter how we looked at the numbers, the evidence was unmistakable; by no measure did pay-for-performance benefit patients with hypertension,” says lead author Brian Serumaga.

Interestingly, hypertension is “a condition where other interventions such as patient education have been shown to be very effective.”

According to Anthony Avery… “Doctor performance is based on many factors besides money that were not addressed in this program: patient behavior, continuing MD training, shared responsibility and teamwork with pharmacists, nurses and other health professionals. These are factors that reach far beyond simple monetary incentives.”

It’s not hard to complete the analogy: doctor = teacher; patient = student; MD training = pre-service and in-service professional development; pharmacists, nurses and other health professionals = lots of other education professionals.

One may question whether the problem is that money is an insufficient motivator, that pay-for-performance amounts to ambiguous global feedback rather than specific local feedback, or that too many factors outside the doctor’s control obscure any effect. Still, these results should give pause to efforts to incentivize teachers by paying them for their students’ good test scores.

Serumaga, B., Ross-Degnan, D., Avery, A. J., Elliott, R. A., Majumdar, S. R., Zhang, F., & Soumerai, S. B. (2011). Effect of pay for performance on the management and outcomes of hypertension in the United Kingdom: interrupted time series study. BMJ, 342, d108. DOI: 10.1136/bmj.d108


Improving medical (and educational) research

On “Lies, Damned Lies, and Medical Science”:

Much of what medical researchers conclude in their studies is misleading, exaggerated, or flat-out wrong. So why are doctors—to a striking extent—still drawing upon misinformation in their everyday practice? Dr. John Ioannidis has spent his career challenging his peers by exposing their bad science.

The research funding and dissemination mechanisms need serious overhaul. I think the research enterprise also needs to institute more formal appreciation for methodologically sound replications, null results, and meta-analyses. If the goal of research is genuinely to improve the knowledge base, then its incentive structure should mirror that.

Ioannidis laid out a detailed mathematical proof that, assuming modest levels of researcher bias, typically imperfect research techniques, and the well-known tendency to focus on exciting rather than highly plausible theories, researchers will come up with wrong findings most of the time.
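Ioannidis’s point can be made concrete with a little arithmetic. Below is a minimal positive-predictive-value sketch in that spirit (a simplified version of his model; the function and the specific numbers are illustrative assumptions, not figures from his paper):

```python
def ppv(prior, power=0.8, alpha=0.05, bias=0.0):
    """Chance that a claimed positive finding is actually true.

    prior: probability that a tested hypothesis is true before the study
    power: probability of detecting a true effect (1 - beta)
    alpha: false-positive rate
    bias:  fraction of would-be negative results reported as positive anyway
    """
    true_pos = prior * (power + bias * (1 - power))
    false_pos = (1 - prior) * (alpha + bias * (1 - alpha))
    return true_pos / (true_pos + false_pos)

# Exciting but implausible hypotheses (low prior) plus modest bias yield
# mostly false findings; plausible hypotheses fare much better.
print(round(ppv(0.05, bias=0.2), 2))  # about 0.16
print(round(ppv(0.50), 2))            # about 0.94
```

Under these assumed numbers, a field chasing long-shot hypotheses is wrong most of the time even before outright fraud enters the picture.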

On the education side, we also need to help the general populace become more critical in how it receives news stories, without tipping it over to the other extreme of distrusting all research. More statistics education, please! We need a more skeptical audience to help stop the news media from overplaying stories about slight or random effects.

Drawing inferences from data is limited by what the data measure

In “Why Genomics Falls Short as a Medical Tool,” Matt Ridley points out that tracking genetic associations hasn’t yielded as much explanatory power for medical applications as hoped:

It’s a curious fact that genomics has always been sold as a medical story, yet it keeps underdelivering useful medical knowledge and overdelivering other stuff. … True, for many rare inherited diseases, genomics is making a big difference. But not for most of the common ailments we all get. Nor has it explained the diversity of the human condition in things like height, intelligence and extraversion.

He notes that even something as straightforward and heritable as height has been difficult to predict from the genes identified:

Your height, for example, is determined something like 90% by the tallness of your parents—so long as you and they were decently well fed as children. … In the case of height, more than 50 genetic variants were identified, but together they could account for only 5% of the heritability. Where was the other 95%?
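The arithmetic behind that “missing heritability” gap is worth spelling out, since heritability and variance explained are easy to conflate. A quick sketch using the figures quoted above (treated as illustrative round numbers, not exact GWAS results):

```python
heritability = 0.90      # share of height variation attributable to genes
share_captured = 0.05    # fraction of that heritability the ~50 variants explain

# Fraction of total height variation those variants actually account for:
variance_explained = heritability * share_captured
print(round(variance_explained, 3))  # 0.045, i.e. about 4.5% of all variation
```

So even a trait that is overwhelmingly genetic can leave its identified variants predicting almost nothing about any individual.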

Some may argue that it’s a case of needing to search more thoroughly for all the relevant genes:

A recent study of height has managed to push the explained heritability up to about half, by using a much bigger sample. But still only half.

Or, perhaps there are so many genetic pathways that affect height that it would be difficult to identify and generalize from them all:

Others… think that heritability is hiding in rare genetic variants, not common ones—in “private mutations,” genetic peculiarities that are shared by just a few people each. Under this theory, as Tolstoy might have put it, every tall person would be tall in a different way.

Ridley closes by emphasizing that genes influence outcomes through complex interactions and network effects.

If we expect education research and application to emulate medical research and application, then we need to recognize and beware of its limitations as well. Educational outcomes are even more multiply determined than height, personality, and intelligence. If we seek to understand and control subtle environmental influences, we need to do much more than simply measure achievement on standardized tests and manipulate teacher incentives.

Analogies between pharmaceutical development and education

In “Research Universities and Big Pharma’s Wicked Problem,” neuroscientist Regis Kelly draws comparisons from the manufacture of biofuels to the development of new pharmaceuticals, suggesting that both are

a “wicked” problem, defined conventionally as a problem that is almost insoluble because it requires the expertise of many stakeholders with disparate backgrounds and non-overlapping goals to work well together to address an important societal problem.

He then critiques the pharmaceutical industry for its imprecise knowledge, poor outcome measures, and lack of theoretical grounding:

The key issue is that we are far from having biological knowledge at anywhere close to the precision that we have engineering knowledge. We cannot generate a blueprint specifying how the human body works. … The pharmaceutical industry usually lacks good measures of the efficacy of its interventions. … We also lack a theory of drug efficacy.

His recommendations for improvement target the above weaknesses, in addition to endorsing more collaboration between researchers, engineers, and practitioners:

The pharmaceutical industry needs a much more precise blueprint for the human body; greater knowledge of its interlocking regulatory systems; and accurate monitors of functional defects. It needs clinical doctors working with research scientists and bioengineers.

If we in the education field continue to seek analogies to medicine, then we should heed these criticisms and recommendations. We too need more precise understanding of the processes by which students learn, greater knowledge of how those systems interact, and better assessment of skill and understanding. We also need closer collaboration between educational researchers, learning-environment designers, school administrators, and teachers.

Doctor quality, meet teacher quality

The New York Times ran an article on doctor “pay for performance”:

Health care experts applauded these early [“pay for performance”] initiatives and the new focus on patient outcomes. But over time, many of the same experts began tempering their earlier enthusiasm. In opinion pieces published in medical journals, they have voiced concerns about pay-for-performance ranging from the onerous administrative burden of collecting such large amounts of patient data to the potential worsening of clinician morale and widening disparities in health care access.

But there has been no research to date that has directly studied the one concern driving all of these criticisms — that the premise of the evaluation criteria is flawed. Patient outcomes may not be as inextricably linked to doctors as many pay-for-performance programs presume.

Now a study published this month in The Journal of the American Medical Association has confirmed these experts’ suspicions. Researchers from the Massachusetts General Hospital in Boston and Harvard Medical School have found that whom doctors care for can have as much of an influence on pay-for-performance rankings as what those doctors do.

Replace every instance of “doctor” with “teacher” and “patient” with “student”, and you have a cogent explanation of why teacher merit pay based on students’ standardized test scores is a terrible idea.
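That core finding, that whom you care for can matter as much as what you do, is easy to illustrate with a toy simulation (entirely made-up numbers; a sketch of the statistical point, not a reproduction of the JAMA analysis):

```python
import random
random.seed(0)

# Toy model: each doctor's measured outcome blends true skill with the
# baseline health of the patients they happen to see, in equal parts.
n = 1000
skill = [random.gauss(0, 1) for _ in range(n)]
patient_mix = [random.gauss(0, 1) for _ in range(n)]
outcome = [s + p for s, p in zip(skill, patient_mix)]

# Who lands in the top decile of the pay-for-performance ranking,
# versus who is genuinely in the top decile of skill?
ranked_top = set(sorted(range(n), key=lambda i: outcome[i], reverse=True)[:100])
truly_top = set(sorted(range(n), key=lambda i: skill[i], reverse=True)[:100])
overlap = len(ranked_top & truly_top)
print(overlap)  # well under 100: patient mix displaces skill in the ranking
```

In runs of this simulation, many of the most skilled doctors never reach the top decile of the outcome ranking; swap in teachers and students and the same statistical logic applies to merit pay.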

Evidence in educational research

On “Classroom Research and Cargo Cults”:

After many years of educational research, it is disconcerting that we have little dependable research guidance for school policy. We have useful statistics in the form of test scores…. But we do not have causal analyses of these data that could reliably lead to significant improvement.

This offers powerful reading for anyone with an interest in education. Hirsch starts off on a controversial note, but he moves toward principles on which we can all converge: evidence matters, AND theoretical description of causal mechanism matters.

The challenge of completing the analogy between educational research and medical research (i.e., finding the education-research analogue to the germ theory of disease) lies in developing precise assessments of knowledge. The prior knowledge that so strongly influences how people learn does not map directly onto a particular location, or even a particular pattern of connectivity, in the brain. There is no neural “germ” or “molecule” that represents some element of knowledge.

Other tidbits:

  1. Intention to learn may sometimes be a condition for learning, but it is not a necessary or sufficient condition.
  2. Neisser’s law:

    You can get a good deal from rehearsal
    If it just has the proper dispersal.
    You would just be an ass
    To do it en masse:
    Your remembering would turn out much worsal.

  3. I wouldn’t characterize the chick-sexing experiments as the triumph of explicit over implicit learning, but rather, that of carefully structured over wholly naturalistic environments. One can implicitly learn quite effectively from the presentation of examples across boundaries, from prototypes and attractors, and from extremes.