Linda Darling-Hammond on TFA and teacher preparation

Linda Darling-Hammond’s article on teacher preparation in this week’s EdWeek should be required reading for anyone interested in education policy. The article was written in recognition of Teach for America’s 20th anniversary.

Yes, my vision is that in 10 years, the United States, like other high-achieving nations, will recruit top teaching candidates, prepare them well in state-of-the-art training programs (free of charge), and support them for career-long success in high-quality schools. Today, by contrast, teachers go into debt to enter a career that pays noticeably less than their alternatives—especially if they work in high-poverty schools—and reach the profession through a smorgasbord of training options, from excellent to awful, often followed by little mentoring or help. As a result, while some teachers are well prepared, many students in needy schools experience a revolving door of inexperienced and underprepared teachers.

Darling-Hammond is probably best known for her criticism of Teach for America’s crash-course, full-steam-ahead approach to teacher preparation. She goes on to criticize the cost/benefit ratio of the program.

Where some studies have shown better outcomes for TFA teachers—generally in high school, in mathematics, and in comparison with less prepared teachers in the same high-need schools—others have found that students of new TFA teachers do less well than those of fully prepared beginners, especially in elementary grades, in fields such as reading, and with Latino students and English-language learners.

The small number of TFA-ers who stay in teaching (fewer than 20 percent by year four, according to state and district data) do become as effective as other fully credentialed teachers and, often, more effective in teaching mathematics. However, this small yield comes at substantial cost to the public for recruitment, training, and replacement. A recent estimate places recurring costs at more than $70,000 per recruit, enough to have trained numerous effective career teachers.

She doesn’t provide a source for the last figure, unfortunately, and it’s not clear exactly what is included in the recurring cost per TFA recruit. Even disregarding that, it is still true that TFA teachers are more expensive than traditionally certified teachers, are not obviously more effective, and leave the profession at higher rates.

Darling-Hammond is not anti-TFA. She simply believes that it (and the rest of public education) would better serve students by focusing more on quality teacher preparation, preparation that sets the stage for a lifelong career of successful teaching, not just a two-year commitment.

TFA teachers are committed, work hard, and want to do a good job. Many want to stay in the profession, but feel their lack of strong preparation makes it difficult to do so. For these reasons, alumni like Megan Hopkins have proposed that TFA evolve into a teacher-residency model that would offer recruits a full year of training under the wing of an expert urban teacher while completing tightly connected coursework for certification. Such teacher residencies, operating as partnerships with universities in cities like Chicago, Boston, and Denver, have produced strong urban teachers who stay in the profession at rates of more than 80 percent, as have many universities that have developed new models of recruitment and training.

On the occasion of its 20th anniversary, we should be building on what works for TFA and marrying it to what works for dozens of strong preparation programs to produce the highly qualified, effective teachers we need for every community in the 21st century.

Regression models and the 2010 Brown Center report

To continue our discussion of value-added modeling, I’d like to point readers to the recently released 2010 Brown Center Report on American Education. The report’s overall theme is that we should be cognizant of the strengths and limitations of any standardized assessment—domestic or international—when interpreting the resulting scores and rankings.

In this post I will focus on part 2 of the report, which describes two different regression models used to create a type of value-added measure of each state’s education system after controlling for the state’s demographic characteristics and prior academic achievement. In simpler terms, each model takes prior NAEP scores and demographic variables, predicts how well a state “should have” done, then compares that prediction with how well the state actually did and produces a value-added number. These numbers are normed so that a state performing exactly as predicted nets a score of zero; states doing better than predicted get a positive score, and vice versa.
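
To make that procedure concrete, here is a minimal sketch of how such a residual-based, state-level value-added score could be computed. The data below are invented and the ordinary least-squares specification is my own assumption; the report does not publish code, and its actual models are more elaborate.

```python
# Minimal sketch of a residual-based "value-added" score for states.
# The data are invented and the OLS specification is an assumption,
# not the Brown Center report's actual model.
import numpy as np

# One row per hypothetical state: prior NAEP score, % free/reduced-price lunch.
X = np.array([
    [235.0, 40.0],
    [228.0, 55.0],
    [242.0, 30.0],
    [231.0, 48.0],
    [238.0, 35.0],
    [226.0, 60.0],
])
# Observed average NAEP gain per year for each state.
y = np.array([0.74, 0.55, 0.62, 0.48, 0.66, 0.58])

# Ordinary least squares with an intercept term.
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# Predicted gain given each state's demographics and prior achievement.
predicted = A @ coef

# Value-added = actual minus predicted: zero means "exactly as predicted",
# positive means better than predicted, negative means worse.
value_added = y - predicted

for name, va in zip("ABCDEF", value_added):
    print(f"State {name}: value-added = {va:+.3f}")
```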

The first model uses all available NAEP scores through 2009. This is as much as 19 years of data for some states—state participation in NAEP was optional until 2003. The second model only uses scores from 2003-2009, when all states had to participate. The models are equally rigorous ways of looking at state achievement data, but they have slightly different emphases. To quote from the report:

The Model 1 analysis is superior in utilizing all state achievement data collected by NAEP. It analyzes trends over a longer period of time, up to nineteen years. But it may also produce biased estimates if states that willingly began participating in NAEP in the 1990s are different in some way than states that were compelled to join the assessment in 2003—and especially if that “some way” is systematically related to achievement….

Model 2 has the virtue of placing all states on equal footing, time-wise, by limiting the analysis to 2003-2009. But that six-year period may be atypical in NAEP’s history—No Child Left Behind dominated national policy discussions—and by discarding more than half of available NAEP data (all of the data collected before 2003), the model could produce misleading estimates of longer term correlations. (p. 16)

There are some similarities in the results from the two models. Seven states—Florida, Maryland, Massachusetts, Kentucky, New Jersey, Hawaii, and Pennsylvania—and the District of Columbia appear in the top ten of both models, while Iowa, Nebraska, West Virginia, and Michigan appear at the bottom of both.

However, there are also wild swings in the ratings and rankings of many states. Five states (or 10% of the sample) rise or fall in the rankings by 25 or more places—and there are only 51 places total. Fewer than half the states are placed into the same quintile in both models.

Keep in mind the two models are qualitatively similar, using the same demographic variables and the same outcome measure. The main difference is that Model 2 uses a subset of the data used in Model 1. A third model that included different measures and outcome variables could produce results that differ markedly from both of these.

To further complicate things, a sharp drop in rankings from Model 1 to Model 2 can still reflect an absolute gain in student performance in that state, as can a negative value-added score. The Brown Report highlights New York as an example. Over the full period, New York (an early adopter of NAEP) gained an average of 0.74 NAEP scale points per year, compared with 0.65 points per year for the rest of the states; its gains exceeded what the model predicted, and it earned a value-added score of 0.58. But between 2003 and 2009, New York gained only 0.38 NAEP scale points per year, while the other states’ gains held roughly steady at 0.62. In this period, New York’s gains fell short of the model’s prediction, yielding a value-added score of -1.21. Yet in terms of absolute scores, New York finished better than it started, no matter which time frame you look at.
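
A tiny numerical illustration of this point, using the 2003-2009 New York figures quoted above. The prediction below is a stand-in (I simply use the other states’ average gain), and the resulting number is not the report’s normed -1.21 score; only the signs matter here.

```python
# Illustration: a state can gain ground in absolute terms yet receive a
# negative value-added score, because value-added compares the gain to a
# model's prediction.  The "prediction" here is just the other states'
# average gain (a simplification); the report's actual -1.21 score is on
# a normed scale.
ny_gain = 0.38          # New York's average NAEP gain per year, 2003-2009
predicted_gain = 0.62   # stand-in for what the model expected of New York

print(f"Absolute trend: {ny_gain:+.2f} points/year (scores still rising)")
print(f"Gain minus prediction: {ny_gain - predicted_gain:+.2f} points/year (negative value-added)")
```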

I wanted to focus on these regression models from the Brown Report because they clearly illustrate some of the problems with using value-added models for high-stakes decisions like teacher contracts and pay. While the large disparities in rankings caused by relatively small differences in the models are interesting to researchers trying to understand the underpinnings of education, they are also exactly why value-added modeling is difficult to defend as a fair and reliable method of teacher evaluation.

Teach for America is not a volunteer organization…

…it’s a teacher-placement service. And depending on how you feel about Teach for America’s mission and effectiveness, potentially a very expensive one.

There seems to be a common misconception that TFA is a volunteer organization like the Peace Corps and AmeriCorps, where corps members receive only a small living allowance and no wage. This editorial prompted me to try to help clear that up. While TFA corps members are considered members of AmeriCorps, this only means they are eligible for the loan forbearance and post-service education awards all AmeriCorps members receive.

  1. Teach for America teachers are full employees of the school district in which they work and are paid out of district budgets. The school district pays corps members a full teaching salary plus benefits, just like any other teacher. TFA reports corps member salaries between $30,000 and $51,000.
  2. In some cases, school districts may also pay Teach for America a placement fee for each teacher hired from the corps. This seems to be a regional determination: this Rethinking Schools article by Barbara Miner (pdf) reports St. Louis schools paid TFA $2,000 per placement; Birmingham schools reportedly paid TFA $5,000 per placement.
  3. In 2008, funding for about 25% of TFA’s operating expenses (nearly $25 million) came from government grants. TFA also recently won a five-year, $50 million grant in the Department of Education’s Investing in Innovation competition.

Add up all the taxpayer money spent, and then remember that the entire 2010 TFA corps contains only 4,500 teachers. [Note: This is the number of new recruits for 2010. The total number of active TFA corps members is around 8,000.]
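
For a rough sense of scale, here is a back-of-envelope sum of only the public outlays named above. The midpoint placement fee is my assumption, district salaries are excluded because corps members are paid like any other teacher, and this is not the source of the $70,000-per-recruit estimate quoted earlier, which covers a broader set of recruitment, training, and replacement costs.

```python
# Rough back-of-envelope arithmetic using only figures cited in this post
# (my own calculation, not a number from any study).
new_recruits_2010 = 4_500       # new corps members placed in 2010
placement_fee = 3_000           # assumed midpoint of the reported $1,500 to $5,000 range
gov_grants_2008 = 25_000_000    # government grants, ~25% of TFA's 2008 operating budget

per_recruit = placement_fee + gov_grants_2008 / new_recruits_2010
total = placement_fee * new_recruits_2010 + gov_grants_2008

print(f"Placement fees plus grants, per new recruit: about ${per_recruit:,.0f}")
print(f"Total, before counting district-paid salaries: about ${total:,.0f}")
```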

And then consider the middling results of Stanford’s 6-year study of TFA teachers in Houston (press summary, pdf full text), which found that uncertified TFA teachers performed only equivalently to other uncertified teachers and were outperformed by fully certified teachers (as measured by student performance on standardized tests), after controlling for teacher backgrounds and student population characteristics. Even after TFA teachers became certified, they “generally perform[ed] on par” with traditionally certified teachers.

Updated: Commenter Michael Bishop mentioned this 2004 Mathematica Policy Research study of Teach for America (pdf), which used random assignment of students to teachers. This was a one-year comparison study of TFA teachers to non-TFA teachers (novice and veteran) and found significant effects of TFA status for math results, but not for reading or behavioral outcomes.

And for those keeping score at home, the Mathematica study reports school districts paid TFA $1500 per teacher.

The value-added wave is a tsunami

EdWeek ran an article earlier this week in which economist Douglas N. Harris attempts to encourage economists and educators to get along.

He unfortunately lost me in the 3rd paragraph.

Drawing on student-level achievement data across years, linked to individual teachers, statistical techniques can be used to estimate how much each teacher contributed to student scores—the value-added measure of teacher performance. These measures in turn can be given to teachers and school leaders to inform professional development and curriculum decisions, or to make arguably higher-stakes decisions about performance pay, tenure, and dismissal.

Emphasis mine.

Economists and their education reform allies frequently make this claim, but it is not true, at least not yet. Value-added measures are based on standardized-test scores, and neither currently provides information an educator can actually use to make professional development or curriculum decisions. When the scores are released, administrators and teachers receive a composite score and a handful of subscores for each student. In math, these subscores might be for topics like “Number and Operation Sense” and “Geometry”.

It does not do an educator any good to know last year’s students struggled with a topic as broad as “Number and Operation Sense”. Which numbers? Integers? Decimals? Did the students have problems with basic place value? Which operations? The non-commutative ones? Or did they have specific problems with regrouping and carrying? In what way are the students struggling? What errors are they making? What misconceptions might these errors point to? None of this information is contained in a score report. So, as an educator faced with test scores low in “Number and Operation Sense” (and which might be low in other areas as well), where do you start? Do you throw out the entire curriculum? If not, how do you know which parts of it need to be re-examined?

People trained in education recognize a difference between formative assessment (information collected for the purpose of improving instruction and student learning) and summative assessment (information collected to determine whether a student or other entity has reached a desired endpoint). Standardized tests are summative assessments—bad scores on them are like knowing that your football team keeps losing its games. That information is not sufficient to help the team improve.

Why do economists see the issue so differently?

An economist myself, let me try to explain. Economists tend to think like well-meaning business people. They focus more on bottom-line results than processes and pedagogy, care more about preparing students for the workplace than the ballot box or art museum, and worry more about U.S. economic competitiveness. Economists also focus on the role financial incentives play in organizations, more so than the other myriad factors affecting human behavior. From this perspective, if we can get rid of ineffective teachers and provide financial incentives for the remainder to improve, then students will have higher test scores, yielding more productive workers and a more competitive U.S. economy.

This logic makes educators and education scholars cringe: Do economists not see that drill-and-kill has replaced rich, inquiry-based learning? Do they really think test preparation is the solution to the nation’s economic prosperity? Economists do partly recognize these concerns, as the quotations from the recent reports suggest. But they also see the motivation and goals of human behavior somewhat differently from the way most educators do.

This false dichotomy makes me cringe. As a trained education research scientist who is no stranger to statistical models, I can say that value-added is not ready for prime time because its primary input—standardized test scores—is deeply flawed. In science and statistics, if you put garbage data into your model, you get garbage conclusions out. It has nothing to do with valuing art over economic competitiveness, and everything to do with the integrity of the science.

The divide between economists and others might be more productive if any of the reports provided specific recommendations. For example, creating better student assessments and combining value-added with classroom assessments are musts.

Thank you. Here is where I start agreeing—if only that had been the central point of the article. I don’t dismiss value-added modeling as a technique, but I do not believe we have anything resembling good measures of teaching and learning.

We also have to avoid letting the tail wag the dog: Some states and districts are trying to expand testing to nontested grades and subjects, and to change test instruments so the scores more clearly reflect student growth for value-added calculations. This thinking is exactly backwards.

I agree completely, but that won’t stop states and districts from desperately trying to game the system. Since economists focus so much on financial incentives, this should be easy for them to understand: when the penalty for having low standardized test scores (or low value-added scores) is losing your funding, you will do whatever gets those scores up fastest. In most cases, that means changing the rules by which the scores are computed. Welcome to Campbell’s law.

What the reformers aren’t reforming

Back in December I shared this Answer Sheet blog post with some friends. The ensuing discussion revealed a disconnect between how those of us who work in education perceive the issues and how the lay public perceives the issues. This is my attempt to bridge the disconnect.

In the above post, Valerie Strauss and Jay Mathews, both veteran journalists on the education beat, debate the merits of KIPP, Teach for America, and other aspects of education reform. Mathews tends to support the current wave of reforms—standardized testing, teacher merit pay, charter schools—while Strauss tends to be a skeptic. My non-education friends tended to side with Mathews, at least on the point that these reforms are better than no reforms. Ming Ling and I sided with Strauss. [ML: Specifically, I agreed with Strauss’s concerns that current high-stakes accountability systems miss a lot of important information about teaching, that effective teaching requires ongoing training and support, and that improving education requires systemic policy. But I also agree with Mathews’ observations that KIPP, TfA, and charter schools have demonstrated many worthwhile achievements, and that “Fixing schools is going to need many varied approaches from people with different ideas.”]

These reforms focus on the incentive and regulatory structure of the school system. Proponents believe that loosening the restrictions on schools and teacher pay, coupled with incentives and punishments, will let market forces take over. Market forces will then draw the nation’s finest minds to the education industry, where they will find ways to produce a better-quality “product” at a lower price and reap the rewards. The idea is intuitively appealing, but the collection of specific reform proposals contains serious flaws and does not address key structural failures in our education system. There are reasons to believe these reforms are worse than no reforms at all.

The Flaws
Those pushing to change the incentive structure through test-based accountability, including merit pay, are assuming the tests are adequately measuring important educational outcomes. Questioning this orthodoxy is like standing next to a straw man with a large bounty on his head. To quote Jay Mathews from the post linked above,

Test-driven accountability is here to stay. Politicians who try to campaign against it are swiftly undercut by opponents who say, “What? You don’t want to make our schools accountable for all the money we spend on them?”

Nobody is against holding schools accountable for the taxpayer money spent. But holding schools accountable to a bad standard, particularly a high-stakes one, can distort people’s behavior in counterproductive ways. It’s called Campbell’s Law, and social scientists have documented the effect in all areas of life, including a recent finding that texting bans may actually increase accident rates.

One doesn’t have to look very far to find examples of school districts outright cheating or otherwise trying to game the system. The Death and Life of the Great American School System by Diane Ravitch is full of such examples. Something about the standardized-testing-driven incentive structure is clearly not working as intended, but rather than stopping and developing better ways of measuring learning, the current reform crowd wants to plow ahead and raise the stakes again by linking teacher pay to this flawed system of accountability. They are not interested in asking the hard questions about what they’re really measuring and influencing.

But even if we assume the tests are good and the incentives are in the right places, it is difficult to see why market forces will necessarily improve the education system in the way those reformers are claiming. In politics, market forces are supposed to expand the pie, not equalize the slices. Despite very strong incentives to be elite athletes and coaches, a significant portion of the population can’t even run a mile without getting winded. But those who support market-based education reforms still use egalitarian rhetoric—appeals to closing the achievement gap, equipping all citizens with the tools they need to succeed.


The (other) dark side of standardized testing

Dan DiMaggio, professional standardized test-scorer, writes harrowingly in Monthly Review:

No matter at what pace scorers work, however, tests are not always scored with the utmost attentiveness. The work is mind numbing, so scorers have to invent ways to entertain themselves. The most common method seems to be staring blankly at the wall or into space for minutes at a time. But at work this year, I discovered that no one would notice if I just read news articles while scoring tests. So every night, while scoring from home, I would surf the Internet and cut and paste loads of articles—reports on Indian Maoists, scientific speculation on whether animals can be gay, critiques of standardized testing—into what typically came to be an eighty-page, single-spaced Word document. Then I would print it out and read it the next day while I was working at the scoring center. This was the only way to avoid going insane. I still managed to score at the average rate for the room and perform according to “quality” standards. While scoring from home, I routinely carry on three or four intense conversations on Gchat. This is the reality of test scoring.

The central assumption behind the push to link student test performance to teacher merit pay is that test scores accurately reflect student learning. We apparently need a system like this because we don’t trust teachers to fairly score their own students’ work. But we trust these guys.

Supplementary reading by Todd Farley:

One of the tests I scored had students read a passage about bicycle safety. They were then instructed to draw a poster that illustrated a rule that was indicated in the text. We would award one point for a poster that included a correct rule and zero for a drawing that did not.

The first poster I saw was a drawing of a young cyclist, a helmet tightly attached to his head, flying his bike over a canal filled with flaming oil, his two arms waving wildly in the air. I stared at the response for minutes. Was this a picture of a helmet-wearing child who understood the basic rules of bike safety? Or was it meant to portray a youngster killing himself on two wheels?

And the followup…

Since 1994, when I first got hired as a lowly temp for measly wages to spend mere seconds glancing at and scoring standardized tests, until the release of my non‐bestselling book last fall, I had steadfastly believed that large‐scale assessment was a lame measure of student learning that really only benefitted the multi‐national corporations paid millions upon millions upon millions of dollars to write and score the tests. I began to see the error of my ways last Thanksgiving, however, just as soon as my huge son popped from his mother’s womb, keening and wailing, demanding massive amounts of food, a closet full of clothing, and the assistance of various costly household staff (baby‐sitter, music teacher, test‐prep tutor, etc.). Only then, as my little boy first began his mantra of “more, more, more,” did I finally see standardized testing for what it really is: a growth industry.

Diluting the meaning of “highly qualified” teachers

Valerie Strauss posts:

Senators have included in key legislation language that would allow teachers still in training to be considered “highly qualified” so they can meet a standard set in the federal No Child Left Behind law.

In an era when the education mantra is that all kids deserve great teachers, some members of Congress want it to be the law of the land that a neophyte teacher who has demonstrated “satisfactory progress” toward full state certification is “highly qualified.”

Is it just me, or have I been transported to 1984? The original definition of “highly qualified teacher” in No Child Left Behind already represented what in most high-achieving countries would be a bare minimum qualification for beginning a teaching residency. Allowing teachers-in-training to be classified as “highly qualified” seems ridiculous on its face.

Strauss sees this as a giveaway to political darling Teach for America:

Teachers still in training programs are disproportionately concentrated in schools serving low-income students and students of color, the very children who need the very best the teaching profession has to offer. In California alone, nearly a quarter of such teachers work in schools with 98-100 percent of minority students, while some affluent districts have none. Half of California’s teachers still in training teach special education.

Allowing non-certified teachers to be considered “highly qualified” would be a gift to programs such as Teach for America, which gives newly graduated college students from elite institutions five weeks of summer training before sending them into low-performing schools.

In other words, test-based value-added doesn’t make sense

Education Week posts a defense of value-added metrics, attempting to address concerns about their use and reliability. By the authors’ own admission, it all hinges on a central assumption:

If student test achievement is the desired outcome, value-added is superior to other existing methods of classifying teachers.

What if student test achievement isn’t the desired outcome?

So-called “data-driven” decision-making isn’t always smart

ASCD writes about reformers’ obsession with data and how it’s leading to a new kind of stupid:

Today’s enthusiastic embrace of data has waltzed us directly from a petulant resistance to performance measures to a reflexive and unsophisticated reliance on a few simple metrics—namely, graduation rates, expenditures, and the reading and math test scores of students in grades 3 through 8. The result has been a nifty pirouette from one troubling mind-set to another; with nary a misstep, we have pivoted from the “old stupid” to the “new stupid.”

The article goes on to describe the three major characteristics of “the new stupid”:

  1. Misusing data
  2. Oversimplifying and over-applying findings from research
  3. Fixating on performance data and failing to consider management data

The importance of this last point is grossly underappreciated by reformers.

Existing achievement data are of limited utility for management purposes. State tests tend to provide results that are too coarse to offer more than a snapshot of student and school performance, and few district data systems link student achievement metrics to teachers, practices, or programs in a way that can help determine what is working. More significant, successful public and private organizations monitor their operations extensively and intensively. FedEx and UPS know at any given time where millions of packages are across the United States and around the globe. Yet few districts know how long it takes to respond to a teaching applicant, how frequently teachers use formative assessments, or how rapidly school requests for supplies are processed and fulfilled.

For all of our attention to testing and assessment, student achievement measures are largely irrelevant to judging the performance of many school district employees. It simply does not make sense to evaluate the performance of a payroll processor or human resources recruiter—or even a foreign language instructor—primarily on the basis of reading and math test scores for grades 3 through 8.

Just as hospitals employ large numbers of administrative and clinical personnel to support doctors and the military employs accountants, cooks, and lawyers to support its combat personnel, so schools have a “long tail” of support staff charged with ensuring that educators have the tools they need to be effective. Just as it makes more sense to judge the quality of army chefs on the quality of their kitchens and cuisines rather than on the outcome of combat operations, so it is more sensible to focus on how well district employees perform their prescribed tasks than on less direct measures of job performance. The tendency to casually focus on student achievement, especially given the testing system’s heavy emphasis on reading and math, allows a large number of employees to either be excused from results-driven accountability or be held accountable for activities over which they have no control. This undermines a performance mindset and promises to eventually erode confidence in management.

Emphasis mine.

Very little of the education reform conversation focuses on school management and how it helps or hurts teachers. Through my work, I know teachers frequently have to accommodate last-minute schedule changes and other requests, and wait out processes that can take months to acquire new printer toner or chairs that are the right height for the tables. The sanctity of the classroom is not respected: PA announcements, administrators, fellow teachers, and visitors regularly break the flow of a lesson. Improving school management would likely go a long way toward making the work of the teacher easier, but apparently teachers are the only school personnel who need to be held accountable for anything.

Doctor quality, meet teacher quality

The New York Times ran an article on doctor “pay for performance”:

Health care experts applauded these early [“pay for performance”] initiatives and the new focus on patient outcomes. But over time, many of the same experts began tempering their earlier enthusiasm. In opinion pieces published in medical journals, they have voiced concerns about pay-for-performance ranging from the onerous administrative burden of collecting such large amounts of patient data to the potential worsening of clinician morale and widening disparities in health care access.

But there has been no research to date that has directly studied the one concern driving all of these criticisms — that the premise of the evaluation criteria is flawed. Patient outcomes may not be as inextricably linked to doctors as many pay-for-performance programs presume.

Now a study published this month in The Journal of the American Medical Association has confirmed these experts’ suspicions. Researchers from the Massachusetts General Hospital in Boston and Harvard Medical School have found that whom doctors care for can have as much of an influence on pay-for-performance rankings as what those doctors do.

Replace every instance of “doctor” with “teacher” and “patient” with “student”, and you have a cogent explanation of why teacher merit pay based on students’ standardized test scores is a terrible idea.
