So-called “data-driven” decision-making isn’t always smart

ASCD writes about reformers’ obsession with data and how it’s leading to a new kind of stupid:

Today’s enthusiastic embrace of data has waltzed us directly from a petulant resistance to performance measures to a reflexive and unsophisticated reliance on a few simple metrics—namely, graduation rates, expenditures, and the reading and math test scores of students in grades 3 through 8. The result has been a nifty pirouette from one troubling mind-set to another; with nary a misstep, we have pivoted from the “old stupid” to the “new stupid.”

The article goes on to describe the three major characteristics of “the new stupid”:

  1. Misusing data
  2. Oversimplifying and over-applying findings from research
  3. Fixating on performance data and failing to consider management data

The importance of this last point is grossly underappreciated by reformers.

Existing achievement data are of limited utility for management purposes. State tests tend to provide results that are too coarse to offer more than a snapshot of student and school performance, and few district data systems link student achievement metrics to teachers, practices, or programs in a way that can help determine what is working. More significant, successful public and private organizations monitor their operations extensively and intensively. FedEx and UPS know at any given time where millions of packages are across the United States and around the globe. Yet few districts know how long it takes to respond to a teaching applicant, how frequently teachers use formative assessments, or how rapidly school requests for supplies are processed and fulfilled.

For all of our attention to testing and assessment, student achievement measures are largely irrelevant to judging the performance of many school district employees. It simply does not make sense to evaluate the performance of a payroll processor or human resources recruiter—or even a foreign language instructor—primarily on the basis of reading and math test scores for grades 3 through 8.

Just as hospitals employ large numbers of administrative and clinical personnel to support doctors and the military employs accountants, cooks, and lawyers to support its combat personnel, so schools have a “long tail” of support staff charged with ensuring that educators have the tools they need to be effective. Just as it makes more sense to judge the quality of army chefs on the quality of their kitchens and cuisines rather than on the outcome of combat operations, so it is more sensible to focus on how well district employees perform their prescribed tasks than on less direct measures of job performance. The tendency to casually focus on student achievement, especially given the testing system’s heavy emphasis on reading and math, allows a large number of employees to either be excused from results-driven accountability or be held accountable for activities over which they have no control. This undermines a performance mindset and promises to eventually erode confidence in management.

Emphasis mine.

Very little of the education reform conversation focuses on school management and how it helps or hurts teachers. Through my work, I know that teachers frequently have to work around last-minute schedule changes and other requests, and wait out sometimes months-long processes to acquire new printer toner or chairs that are the right height for their tables. The sanctity of the classroom is not respected: PA announcements, administrators, fellow teachers, and visitors regularly break the flow of a lesson. Improving school management would likely go a long way toward making the work of the teacher easier, but apparently teachers are the only school personnel who need to be held accountable for anything.

In-school vs. non-school factors

From “How to fix our schools,” in which Richard Rothstein, of the Economic Policy Institute, critiques Joel Klein’s and Michelle Rhee’s approach of focusing only on firing incompetent teachers as a means to improve schools:

“Differences in school quality can explain about 1/3 of the variation in student achievement. But the other 2/3 come from non-school factors.” In-school factors go beyond teacher quality: school leadership, curriculum quality, teacher collaboration. Non-school factors include economic consequences of parental underemployment, such as geographic disruption, malnutrition, stress, poor health.
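
To make that split concrete, here is a toy simulation in Python (my sketch, not Rothstein’s analysis; the variances are made up solely to reproduce the rough 1/3 vs. 2/3 proportions). It shows why even a dramatic improvement on the school side moves overall achievement less than you might expect:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Achievement as the sum of independent in-school and non-school components,
# with variances chosen so school factors account for roughly 1/3 of the total.
school = rng.normal(0, np.sqrt(1 / 3), n)      # leadership, curriculum, teachers, ...
non_school = rng.normal(0, np.sqrt(2 / 3), n)  # health, stress, family economics, ...
achievement = school + non_school

print(f"share of variance from school factors: {school.var() / achievement.var():.2f}")  # ~0.33

# Even a full standard-deviation improvement on the school side shifts
# overall achievement by well under one overall standard deviation.
improved = (school + np.sqrt(1 / 3)) + non_school
shift_in_sds = (improved.mean() - achievement.mean()) / achievement.std()
print(f"shift from a 1-SD school-quality improvement: {shift_in_sds:.2f} SDs")  # ~0.58
```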

Drawing inferences from data is limited by what the data measure

In “Why Genomics Falls Short as a Medical Tool,” Matt Ridley points out that tracking genetic associations hasn’t yielded as much explanatory power for medical applications as was hoped:

It’s a curious fact that genomics has always been sold as a medical story, yet it keeps underdelivering useful medical knowledge and overdelivering other stuff. … True, for many rare inherited diseases, genomics is making a big difference. But not for most of the common ailments we all get. Nor has it explained the diversity of the human condition in things like height, intelligence and extraversion.

He notes that even something as straightforward and heritable as height has been difficult to predict from the genes identified:

Your height, for example, is determined something like 90% by the tallness of your parents—so long as you and they were decently well fed as children. … In the case of height, more than 50 genetic variants were identified, but together they could account for only 5% of the heritability. Where was the other 95%?
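
Spelling out the gap (my arithmetic, not Ridley’s): if height is roughly 90% heritable and the identified variants account for only 5% of that heritability, then those variants explain about 0.90 × 0.05 ≈ 0.045, or 4.5%, of the total variation in height. The rest of the genetic contribution remains unaccounted for.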

Some may argue that it’s a case of needing to search more thoroughly for all the relevant genes:

A recent study of height has managed to push the explained heritability up to about half, by using a much bigger sample. But still only half.

Or, perhaps there are so many genetic pathways that affect height that it would be difficult to identify and generalize from them all:

Others… think that heritability is hiding in rare genetic variants, not common ones—in “private mutations,” genetic peculiarities that are shared by just a few people each. Under this theory, as Tolstoy might have put it, every tall person would be tall in a different way.

Ridley closes by emphasizing that genes influence outcomes through complex interactions and network effects.

If we expect education research and application to emulate medical research and application, then we need to recognize and be wary of medicine’s limitations as well. Educational outcomes are even more multiply determined than height, personality, and intelligence. If we seek to understand and control subtle environmental influences, we need to do much more than simply measure achievement on standardized tests and manipulate teacher incentives.

Analogies between pharmaceutical development and education

In “Research Universities and Big Pharma’s Wicked Problem,” neuroscientist Regis Kelly draws comparisons between the manufacture of biofuels and the development of new pharmaceuticals, suggesting that each is

a “wicked” problem, defined conventionally as a problem that is almost insoluble because it requires the expertise of many stakeholders with disparate backgrounds and non-overlapping goals to work well together to address an important societal problem.

He then critiques the pharmaceutical industry for its imprecise knowledge, poor outcome measures, and lack of theoretical grounding:

The key issue is that we are far from having biological knowledge at anywhere close to the precision that we have engineering knowledge. We cannot generate a blueprint specifying how the human body works. … The pharmaceutical industry usually lacks good measures of the efficacy of its interventions. … We also lack a theory of drug efficacy.

His recommendations for improvement target these weaknesses and call for closer collaboration between researchers, engineers, and practitioners:

The pharmaceutical industry needs a much more precise blueprint for the human body; greater knowledge of its interlocking regulatory systems; and accurate monitors of functional defects. It needs clinical doctors working with research scientists and bioengineers.

If we in the education field continue to seek analogies to medicine, then we should heed these criticisms and recommendations. We too need more precise understanding of the processes by which students learn, greater knowledge of how those systems interact, and better assessment of skill and understanding. We also need closer collaboration between educational researchers, learning-environment designers, school administrators, and teachers.

But what do the data say?

“Perhaps this is the time for a counter-reformation” summarizes some choice tidbits on charter schools, test-based metrics & value-added modeling, and performance-based pay and firing, from a statistician’s perspective.

On charter schools:

The majority of the 5,000 or so charter schools nationwide appear to be no better, and in many cases worse, than local public schools when measured by achievement on standardized tests.

On value-added modeling:

A study [using VAM] found that students’ fifth grade teachers were good predictors of their fourth grade test scores… [which] can only mean that VAM results are based on factors other than teachers’ actual effectiveness.
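
To see what that kind of falsification check looks like in practice, here is a minimal sketch in Python with simulated data and made-up column names (not the cited study’s code). The question it asks is simply how much of last year’s scores this year’s teacher assignment appears to “explain”:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_teachers, class_size = 50, 25

# Simulate tracking: students are grouped into classes by prior ability,
# so fifth-grade teacher assignment ends up correlated with fourth-grade scores.
prior_ability = np.sort(rng.normal(0, 1, n_teachers * class_size))
df = pd.DataFrame({
    "teacher_5th": np.repeat(np.arange(n_teachers), class_size),
    "score_4th": prior_ability + rng.normal(0, 0.5, n_teachers * class_size),
})

# ANOVA-style check: share of variance in LAST year's scores "explained"
# by THIS year's teacher assignment (between-class variance / total variance).
class_means = df.groupby("teacher_5th")["score_4th"].transform("mean")
r_squared = class_means.var() / df["score_4th"].var()
print(f"4th-grade variance 'explained' by 5th-grade teacher: {r_squared:.2f}")
```

If that share is substantial, the apparent “teacher effect” is largely picking up how students are sorted into classrooms rather than anything their teachers did.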

On performance-based pay and firing:

There is not strong evidence to indicate either that the departing teachers would actually be the weakest teachers, or that the departing teachers would be replaced by more effective ones.

[A study] conducted by the National Center on Performance Incentives at Vanderbilt… found no significant difference between the test results from classes led by teachers eligible for bonuses and those led by teachers who were ineligible.

In summary:

Just for the record, I believe that charter schools, increased use of metrics, merit pay and a streamlined process for dismissing bad teachers do have a place in education, but all of these things can do more harm than good if badly implemented and, given the current state of the reform movement, badly implemented is pretty much the upper bound.

I’m less pessimistic than Mark, the statistician quoted above, about the quality of implementation of these initiatives, but I agree that how effectively well-intentioned reforms are implemented is always a crucial concern.

Doctor quality, meet teacher quality

The New York Times ran an article on doctor “pay for performance”:

Health care experts applauded these early [“pay for performance”] initiatives and the new focus on patient outcomes. But over time, many of the same experts began tempering their earlier enthusiasm. In opinion pieces published in medical journals, they have voiced concerns about pay-for-performance ranging from the onerous administrative burden of collecting such large amounts of patient data to the potential worsening of clinician morale and widening disparities in health care access.

But there has been no research to date that has directly studied the one concern driving all of these criticisms — that the premise of the evaluation criteria is flawed. Patient outcomes may not be as inextricably linked to doctors as many pay-for-performance programs presume.

Now a study published this month in The Journal of the American Medical Association has confirmed these experts’ suspicions. Researchers from the Massachusetts General Hospital in Boston and Harvard Medical School have found that whom doctors care for can have as much of an influence on pay-for-performance rankings as what those doctors do.

Replace every instance of “doctor” with “teacher” and “patient” with “student”, and you have a cogent explanation of why teacher merit pay based on students’ standardized test scores is a terrible idea.
