Linda Darling-Hammond on TFA and teacher preparation

Linda Darling-Hammond’s article on teacher preparation in this week’s EdWeek should be required reading for anyone interested in education policy. The article was written in recognition of Teach for America’s 20th anniversary.

Yes, my vision is that in 10 years, the United States, like other high-achieving nations, will recruit top teaching candidates, prepare them well in state-of-the-art training programs (free of charge), and support them for career-long success in high-quality schools. Today, by contrast, teachers go into debt to enter a career that pays noticeably less than their alternatives—especially if they work in high-poverty schools— and reach the profession through a smorgasbord of training options, from excellent to awful, often followed by little mentoring or help. As a result, while some teachers are well prepared, many students in needy schools experience a revolving door of inexperienced and underprepared teachers.

Darling-Hammond is probably best known for her criticism of Teach for America’s crash-course, full-steam-ahead approach to teacher preparation. She goes on to criticize the program’s cost/benefit ratio.

Where some studies have shown better outcomes for TFA teachers—generally in high school, in mathematics, and in comparison with less prepared teachers in the same high-need schools—others have found that students of new TFA teachers do less well than those of fully prepared beginners, especially in elementary grades, in fields such as reading, and with Latino students and English-language learners.

The small number of TFA-ers who stay in teaching (fewer than 20 percent by year four, according to state and district data) do become as effective as other fully credentialed teachers and, often, more effective in teaching mathematics. However, this small yield comes at substantial cost to the public for recruitment, training, and replacement. A recent estimate places recurring costs at more than $70,000 per recruit, enough to have trained numerous effective career teachers.

She doesn’t provide a source for the last figure, unfortunately, and it’s not clear exactly what is included in the recurring cost per TFA recruit. Even setting that aside, it is still true that TFA teachers are more expensive than traditionally certified teachers, not obviously more effective, and more likely to leave the profession.

Darling-Hammond is not anti-TFA. She simply believes that TFA (and the rest of public education) would serve students better by focusing more on quality teacher preparation: preparation that sets the stage for a lifelong career of successful teaching, not just a two-year commitment.

TFA teachers are committed, work hard, and want to do a good job. Many want to stay in the profession, but feel their lack of strong preparation makes it difficult to do so. For these reasons, alumni like Megan Hopkins have proposed that TFA evolve into a teacher-residency model that would offer recruits a full year of training under the wing of an expert urban teacher while completing tightly connected coursework for certification. Such teacher residencies, operating as partnerships with universities in cities like Chicago, Boston, and Denver, have produced strong urban teachers who stay in the profession at rates of more than 80 percent, as have many universities that have developed new models of recruitment and training.

On the occasion of its 20th anniversary, we should be building on what works for TFA and marrying it to what works for dozens of strong preparation programs to produce the highly qualified, effective teachers we need for every community in the 21st century.

Statistical issues with applying VAM

There’s a wonderful statistical discussion of Michael Winerip’s NYT article critiquing the use of value-added modeling in evaluating teachers, which I referenced in a previous post. I wanted to highlight some of the key statistical problems raised in that discussion, since I think these concepts are both important and understandable for the general public.

  • Margin of error: Ms. Isaacson’s 7th percentile score actually ranged from 0 to 52, yet the state is disregarding that uncertainty in making its employment recommendations. This is why I dislike the article’s headline, or more generally the saying, “Numbers don’t lie.” No, they don’t lie, but they do approximate, and can thus mislead, if those approximations aren’t adequately conveyed and recognized.
  • Reversion to the mean: (You may be more familiar with this concept as “regression to the mean,” but since it applies more broadly than linear regression, “reversion” is a more suitable term.) A single measurement can be influenced by many randomly varying factors, so one extreme value could reflect an unusual cluster of chance events. Measuring it again is likely to yield a value closer to the mean, simply because those chance events are unlikely to coincide again to produce another extreme value. Ms. Isaacson’s students could have been lucky in their high scores the previous year, causing their scores in the subsequent year to look low compared to predictions.
  • Using only 4 discrete categories (or ranks) for grades:
    • The first problem with this is the imprecision that results. The model exaggerates the impact of between-grade transitions (e.g., improving from a 3 to a 4) but ignores within-grade changes (e.g., improving from a low 3 to a high 3).
    • The second problem is that this exacerbates the nonlinearity of the assessment (discussed next). When changes that produce grade transitions are more likely than changes that don’t produce grade transitions, having so few possible grade transitions further inflates their impact.
      Another instantiation of this problem is that the imprecision also exaggerates the ceiling effects mentioned below, in that benefits to students already earning the maximum score become invisible (as noted in a comment by journalist Steve Sailer):

      Maybe this high IQ 7th grade teacher is doing a lot of good for students who were already 4s, the maximum score. A lot of her students later qualify for admission to Stuyvesant, the most exclusive public high school in New York.
      But, if she is, the formula can’t measure it because 4 is the highest score you can get.

  • Nonlinearity: Not all grade transitions are equally likely, but the model treats them as such. Here are two major reasons why some transitions are more likely than others.
    • Measurement ceiling effects: Improving at the top of the range is more difficult, and less likely, than improving in the middle of the range, as discussed in this comment:

      Going from 3.6 to 3.7 is much more difficult than going from 2.0 to 2.1, simply due to the upper-bound scoring of 4.

      However, the commenter then gives an example of a natural ceiling rather than a measurement ceiling. Natural ceilings (e.g., diminishing improvements in weight loss, long jump, or reaction time as the values become more extreme) do translate into nonlinearity, but because of physiological limits rather than limits of the measurement. That said, the quote above still holds because of the measurement ceiling, which masks upper-bound variability among students who could have scored higher while inflating the relative lower-bound variability from missing a question (whether from carelessness, a bad day, or bad luck in which questions appeared on the test). These students have more opportunities to be hurt by bad luck than helped by good luck, because the test imposes a ceiling: it doesn’t ask all the harder questions they might have been able to answer.

    • Unequal responses to feedback: The students and teachers all know that some grade transitions are more important than others. Just as students invest extra effort to turn an F into a D, so do teachers invest extra resources in moving students from below-basic to basic scores.
      More generally, a fundamental tenet of assessment is to inform the students in advance of the grading expectations. That means that there will always be nonlinearity, since now the students (and teachers) are “boundary-conscious” and behaving in ways to deliberately try to cross (or not cross) certain boundaries.
  • Definition of “value”: The value-added model described compares students’ current scores against predictions based on their prior-year scores. That implies that earning a 3 in 4th grade has no more value than earning a 3 in 3rd grade. As noted in this comment:

    There appears to be a failure to acknowledge that students must make academic progress just to maintain a high score from one year to the next, assuming all of the tests are grade level appropriate.

    Perhaps students can earn the same (high or moderate) score year after year on badly designed tests simply through good test-taking strategies, but presumably the tests being used in these models are believed to measure actual learning. A teacher who helps “proficient” students earn “proficient” scores the next year is still teaching them something worthwhile, even if there’s room for more improvement.
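A small simulation can make two of these points concrete. The sketch below is purely illustrative and rests on assumptions of my own, not on the state’s model: each observed score is a student’s stable true skill plus independent, normally distributed year-to-year luck, and the four performance levels use made-up cut points (1.5, 2.5, 3.5). The helper names `simulate_reversion` and `to_level` are hypothetical.

```python
import random
import statistics

# Hypothetical model for illustration only: observed score = stable true
# skill + independent year-to-year luck. This is not the state's model.

def simulate_reversion(n_students=10_000, noise_sd=1.0, seed=0):
    """Return (year-1 mean, year-2 mean) for students in the top 10% in year 1."""
    rng = random.Random(seed)
    true_skill = [rng.gauss(0.0, 1.0) for _ in range(n_students)]
    year1 = [s + rng.gauss(0.0, noise_sd) for s in true_skill]
    year2 = [s + rng.gauss(0.0, noise_sd) for s in true_skill]
    # Select the students who looked best in year 1.
    cutoff = sorted(year1)[int(0.9 * n_students)]
    top = [i for i in range(n_students) if year1[i] >= cutoff]
    return (statistics.mean(year1[i] for i in top),
            statistics.mean(year2[i] for i in top))

def to_level(scale_score, cuts=(1.5, 2.5, 3.5)):
    """Collapse a continuous score into one of 4 levels using hypothetical cut points."""
    return 1 + sum(scale_score >= cut for cut in cuts)

before, after = simulate_reversion()
print(f"top group, year 1: {before:.2f}; same students, year 2: {after:.2f}")

# A tiny change across a cut point counts as a full level...
print(to_level(2.49), "->", to_level(2.51))   # 2 -> 3
# ...while a large change within a level counts as nothing,
print(to_level(2.51), "->", to_level(3.49))   # 3 -> 3
# and any gain above the top cut is invisible (the ceiling).
print(to_level(3.6), "->", to_level(9.9))     # 4 -> 4
```

Students selected for extreme year-1 scores drift back toward the mean in year 2 even though nothing about their skill (or their teacher) changed, and the level function rewards a 0.02-point change while ignoring a 0.98-point one.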

These criticisms can be addressed by several recommendations:

  1. Margin of error. Don’t base high-stakes decisions on highly uncertain metrics.
  2. Reversion to the mean. Use multiple measures. These could be estimates across multiple years (multiyear smoothing, as another commenter suggested), or values from multiple different assessments.
  3. Few grading categories. At the very least, use more scoring categories. Better yet, use the raw scores.
  4. Ceiling effect. Use tests with a higher ceiling. This could be an interesting application for using a form of dynamic assessment for measuring learning potential, although that might be tricky from a psychometric or educational measurement perspective.
  5. Nonlinearity of feedback. Draw from a broader pool of assessments that measure learning in a variety of ways, to discourage “gaming the system” on just one test (being overly sensitive to one set of arbitrary scoring boundaries).
  6. Definition of “value.” Change the baseline expectation (either in the model itself or in the interpretation of its results) to reflect the reality that earning the same score on a harder test actually does demonstrate learning.
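To see why recommendations 1 and 2 go together, here is a minimal sketch of a normal-approximation confidence interval for a teacher’s average residual. The function name, the residual standard deviation of 1.0, and the class sizes are assumptions for illustration; real value-added models use more elaborate error estimates, and students in one classroom are not truly independent.

```python
import math
from statistics import NormalDist

def interval(mean_residual, residual_sd, n_students, confidence=0.95):
    """Normal-approximation confidence interval for a teacher's average residual.

    Assumes student residuals are independent with a common standard
    deviation, which real classrooms do not guarantee.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * residual_sd / math.sqrt(n_students)
    return mean_residual - half_width, mean_residual + half_width

# One year of 25 students vs. three pooled years of 25 students each
# (pooling assumes the teacher's true effect is stable across years).
one_year = interval(-0.2, 1.0, n_students=25)
three_years = interval(-0.2, 1.0, n_students=75)
print("one year:   ", tuple(round(x, 2) for x in one_year))
print("three years:", tuple(round(x, 2) for x in three_years))
```

Tripling the number of student observations narrows the interval only by a factor of √3, about 1.7, and in this example both intervals still include zero.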

Those are just the statistical issues. Don’t forget all the other problems we’ve mentioned, especially: the flaws in applying aggregate inferences to the individual; the imperfect link between student performance and teacher effectiveness; the lack of usable information provided to teachers; and the importance of attracting, training, and retaining good teachers.

Some history and context on VAM in teacher evaluation

In the Columbia Journalism Review’s Tested: Covering schools in the age of micro-measurement, LynNell Hancock provides a rich survey of the history and context of the current debate over value-added modeling in teacher evaluation, with a particular focus on LA and NY.

Here are some key points from the critique:

1. In spite of their complexity, value-added models rest on very limited sources of data: who taught the students (without regard to how or under what conditions), and standardized tests, which are a very narrow and imperfect measure of learning:

No allowance is made for many “inside school” factors… Since the number is based on manipulating one-day snapshot tests—the value of which is a matter of debate—what does it really measure?

2. Value-added modeling is an imprecise method whose parameters and outcomes are highly dependent on the assumptions built into the model.

In February, two University of Colorado, Boulder researchers caused a dustup when they called the Times’s data “demonstrably inadequate.” After running the same data through their own methodology, controlling for added factors such as school demographics, the researchers found about half the reading teachers’ scores changed. On the extreme ends, about 8 percent were bumped from ineffective to effective, and 12 percent bumped the other way. To the researchers, the added factors were reasonable, and the fact that they changed the results so dramatically demonstrated the fragility of the value-added method.

3. Value-added modeling is inappropriate to use as grounds for firing teachers or calculating merit pay.

Nearly every economist who weighed in agreed that districts should not use these indicators to make high-stakes decisions, like whether to fire teachers or add bonuses to paychecks.

Further, it’s questionable how effective it is as a policy to focus simply on individual teacher quality, when poverty has a greater impact on a child’s learning:

The federal Coleman Report issued [in 1966] found that a child’s family economic status was the most telling predictor of school achievement. That stubborn fact remains discomfiting—but undisputed—among education researchers today.

These should all be familiar concerns by now. What this article adds is a much richer picture of the historical and political context for the many players in the debate. I’m deeply disturbed that NYS Supreme Court Judge Cynthia Kern ruled that “there is no requirement that data be reliable for it to be disclosed.” At least Trontz at the NY Times acknowledges the importance of publishing reliable information as opposed to spurious claims, although he seems to overlook all the arguments against the merits of the data:

If we find the data is so completely botched, or riddled with errors that it would be unfair to release it, then we would have to think very long and hard about releasing it.

That’s the whole point: applying value-added modeling to standardized test scores to fire or reward teachers is unreliable to the point of being unfair. Adding noise and confusion to the conversation isn’t “a net positive,” as Arthur Browne from The Daily News seems to believe; it degrades the discussion, at great harm to the individual teachers, their students, the institutions that house them, and the society that purports to sustain them and benefit from them.

Look for the story behind the numbers, not the numbers alone

This time I’ll let the journalists get away with their fondness for reporting the compelling individual story, since the single counterexample is the whole point here.

High-stakes testing was bad enough. But high-stakes evaluating and hiring? This is a great example of the dangers of applying quantitative metrics inappropriately. While value-added modeling may be able to capture properties of the aggregate, it makes occasional errors at the level of the individual. Just one error (whether it’s a factual or exaggerated case, it still illustrates the point) demonstrates the ethical and managerial problems in firing the wrong person based on aggregated data.
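The gap between aggregate accuracy and individual errors is easy to reproduce in a toy simulation. In the hypothetical setup below, true teacher effects and estimation noise are both standard normal (noise_sd=1.0), so the estimates correlate about 0.7 with true effectiveness, which looks respectable in aggregate. The sketch flags the bottom 10% by estimate and counts how many flagged teachers are not actually in the bottom 10%. The function name and all numbers are illustrative assumptions, not measurements of any real value-added system.

```python
import random

def wrongly_flagged_share(n_teachers=1000, noise_sd=1.0, flag_share=0.10, seed=1):
    """Share of teachers flagged as 'bottom 10%' by a noisy estimate
    who are not actually in the bottom 10% of true effectiveness."""
    rng = random.Random(seed)
    true_effect = [rng.gauss(0.0, 1.0) for _ in range(n_teachers)]
    estimate = [t + rng.gauss(0.0, noise_sd) for t in true_effect]
    k = int(flag_share * n_teachers)
    flagged = set(sorted(range(n_teachers), key=lambda i: estimate[i])[:k])
    truly_low = set(sorted(range(n_teachers), key=lambda i: true_effect[i])[:k])
    return len(flagged - truly_low) / k

for sd in (0.0, 0.5, 1.0):
    share = wrongly_flagged_share(noise_sd=sd)
    print(f"noise sd {sd}: {share:.0%} of flagged teachers misidentified")
```

With no noise nobody is misidentified; with noise as large as the spread of true effects, a substantial fraction of the flagged teachers are the wrong people, even though the model still "works" on average.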

Nor do I understand the political eagerness to fire teachers so readily. I’m not convinced that teachers are such an abundant resource that we can afford to burn through them so callously. With teacher shortages in multiple areas and a national teacher attrition rate of 15-20%, we would do better to keep, train, and support the teachers we already have, rather than toss them out and discourage new recruits from joining an increasingly unfriendly profession.

While I agree that it’s important to judge teaching by its merits rather than just the years spent, we need to formulate those measurements carefully. Test scores alone give an illusion of greater precision than they actually have.

Regression models and the 2010 Brown Center report

To continue our discussion of value-added modeling, I’d like to point readers to the recently released 2010 Brown Center Report on American Education. The overall theme of the report is to be cognizant of the strengths and limitations of any standardized assessment, domestic or international, when interpreting the resulting scores and rankings.

In this post I will focus on part 2 of this report, which describes two different regression models used to create a type of value-added measure of each state’s education system after controlling for each state’s demographic characteristics and prior academic achievement. In simpler terms, the model considers prior NAEP scores and demographic variables, predicts how well a state “should have” done, then looks at how well the state actually did and spits out a value-added number. These numbers were then normed so a state performing exactly as predicted would net a score of zero. States doing better than predicted would have a positive score and vice versa.

The first model uses all available NAEP scores through 2009. This is as much as 19 years of data for some states—state participation in NAEP was optional until 2003. The second model only uses scores from 2003-2009, when all states had to participate. The models are equally rigorous ways of looking at state achievement data, but they have slightly different emphases. To quote from the report:

The Model 1 analysis is superior in utilizing all state achievement data collected by NAEP. It analyzes trends over a longer period of time, up to nineteen years. But it may also produce biased estimates if states that willingly began participating in NAEP in the 1990s are different in some way than states that were compelled to join the assessment in 2003—and especially if that “some way” is systematically related to achievement….

Model 2 has the virtue of placing all states on equal footing, time-wise, by limiting the analysis to 2003-2009. But that six-year period may be atypical in NAEP’s history—No Child Left Behind dominated national policy discussions—and by discarding more than half of available NAEP data (all of the data collected before 2003), the model could produce misleading estimates of longer term correlations. (p. 16)

There are some similarities in the results from the two models. Seven states (Florida, Maryland, Massachusetts, Kentucky, New Jersey, Hawaii, and Pennsylvania) and the District of Columbia appear in the top ten of both models, while five states, including Iowa, Nebraska, West Virginia, and Michigan, appear near the bottom of both models.

However, there are also wild swings in the ratings and rankings of many states. Five states (or 10% of the sample) rise or fall in the rankings by 25 or more places—and there are only 51 places total. Fewer than half the states are placed into the same quintile in both models.
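The two summary statistics quoted above, large rank moves and quintile agreement, are straightforward to compute for any pair of model outputs. The sketch below assumes rank 1 is the highest score, breaks ties by input order, and uses hypothetical function names; it is not the report’s code.

```python
def ranks(scores):
    """Rank 1 = highest score; ties keep their input order (a simplification)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0] * len(scores)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def quintile(rank, n):
    """Quintile 1 = top fifth of the rankings, quintile 5 = bottom fifth."""
    return (rank - 1) * 5 // n + 1

def compare_models(scores_a, scores_b, big_move=25):
    """Return (units moving >= big_move places, share in the same quintile)."""
    n = len(scores_a)
    ra, rb = ranks(scores_a), ranks(scores_b)
    big_movers = sum(abs(a - b) >= big_move for a, b in zip(ra, rb))
    same_quintile = sum(quintile(a, n) == quintile(b, n) for a, b in zip(ra, rb))
    return big_movers, same_quintile / n

# Hypothetical outputs of two models for ten units, ranked in opposite order.
print(compare_models(list(range(10)), list(range(9, -1, -1)), big_move=5))  # (6, 0.2)
```

With 51 units (50 states plus DC) and the default big_move=25, compare_models returns the count of units moving 25 or more places and the share landing in the same quintile in both models.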

Keep in mind the two models are qualitatively similar, using the same demographic variables and the same outcome measure. The main difference is that Model 2 uses a subset of Model 1’s data. A third model that included different measures and outcome variables could produce results that differ substantially from both of these models.

To further complicate things, a sharp drop in rankings from Model 1 to Model 2 can still reflect an absolute gain in student performance in that state, as can a negative value-added score. The Brown Report highlights New York as an example. Over the full period, New York (an early NAEP participant) gained an average of 0.74 NAEP scale points per year, compared to 0.65 for the rest of the states. Over that period, New York’s gains exceed what the model predicted, with a value-added score of 0.58. But between 2003 and 2009, New York gained only 0.38 NAEP scale points per year, while the other states’ gains held roughly steady at 0.62. In this period, New York’s gains fall short of what the model predicted, with a value-added score of -1.21. Yet in terms of absolute scores, New York finished better than it started, no matter which time frame you look at.
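The report’s value-added figures come from its regression model, which isn’t reproduced here. But even a crude benchmark using only the per-year averages quoted above shows how a state can improve in absolute terms while falling short of a comparison group. This is simple arithmetic on the quoted figures, treating 2003 to 2009 as six years, and is not the report’s method.

```python
# Per-year NAEP gains quoted from the report for 2003-2009.
NY_GAIN_PER_YEAR = 0.38
OTHER_STATES_GAIN_PER_YEAR = 0.62
YEARS = 2009 - 2003

ny_total = NY_GAIN_PER_YEAR * YEARS                  # New York's absolute change
others_total = OTHER_STATES_GAIN_PER_YEAR * YEARS    # other states' average change
gap = ny_total - others_total                        # crude relative benchmark

print(f"New York absolute gain: {ny_total:+.2f} scale points")
print(f"Gap vs. other states:   {gap:+.2f} scale points")
```

The absolute change is positive while the relative one is negative; a value-added score, like this crude gap, measures only the second.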

I wanted to focus on these regression models from the Brown Report because they clearly illustrate some of the problems with using value-added models for high-stakes decisions like teacher contracts and pay. While the large disparities in rankings caused by relatively small differences in the models are interesting to researchers trying to understand the underpinnings of education, they are also exactly why value-added modeling is difficult to defend as a fair and reliable method of teacher evaluation.

Problems with pay-for-performance

If pay-for-performance doesn’t work in medicine, what should our expectations be for its success in education?

“No matter how we looked at the numbers, the evidence was unmistakable; by no measure did pay-for-performance benefit patients with hypertension,” says lead author Brian Serumaga.

Interestingly, hypertension is “a condition where other interventions such as patient education have shown to be very effective.”

According to Anthony Avery… “Doctor performance is based on many factors besides money that were not addressed in this program: patient behavior, continuing MD training, shared responsibility and teamwork with pharmacists, nurses and other health professionals. These are factors that reach far beyond simple monetary incentives.”

It’s not hard to complete the analogy: doctor = teacher; patient = student; MD training = pre-service and in-service professional development; pharmacists, nurses and other health professionals = lots of other education professionals.

One may question whether the problem is that money is an insufficient motivator, that pay-for-performance amounts to ambiguous global rather than specific local feedback, or that there are too many other factors not well under the doctor’s control to reveal an effect. Still, this does give pause to efforts to incentivize teachers by paying them for their students’ good test scores.

B. Serumaga, D. Ross-Degnan, A. J. Avery, R. A. Elliott, S. R. Majumdar, F. Zhang, S. B. Soumerai. Effect of pay for performance on the management and outcomes of hypertension in the United Kingdom: interrupted time series study. BMJ 2011;342:d108. DOI: 10.1136/bmj.d108


Some limitations of value-added modeling

Following this discussion on teacher evaluation led me to a fascinating analysis by Jim Manzi.

We’ve already discussed some concerns with using standardized test scores as the outcome measures in value-added modeling; Manzi points out other problems with the model and the inputs to the model.

  1. Teaching is complex.
  2. It’s difficult to make good predictions about achievement across different domains.
  3. It’s unrealistic to attribute success or failure only to a single teacher.
  4. The effects of teaching extend beyond one school year, and therefore measurements capture influences that go back beyond one year and one teacher.

I’m not particularly fond of the above list. While I agree with all the claims, they’re not explained very clearly, and they don’t capture the following key issues, which he discusses in more depth.

  1. Inferences about the aggregate are not inferences about an individual.
    More deeply, the model is valid at the aggregate level, “but any one data point cannot be validated.” This is a fundamental problem, true of stereotypes, of generalizations, and of averages. While they may enable you to make broad claims about a population of people, you can’t apply those claims to policies about a particular individual with enough confidence to justify high-stakes outcomes such as firing decisions. As Manzi summarizes it, an evaluation system works to help an organization achieve an outcome, not to be fair to the individuals within that organization.

    This is also related to problems with data mining: by throwing a bunch of data into a model and turning the crank, you can end up with all kinds of difficult-to-interpret correlations which are excellent predictors but which don’t make a whole lot of sense from a theoretical standpoint.

  2. Basing decisions on single instead of multiple measures is flawed.
    From a statistical modeling perspective, it’s easier to work with a single precise, quantitative measure than with multiple measures. But this inflates the influence of that one measure, which is often limited in time and scale. Figuring out how to combine multiple measures into a single metric requires subjective judgment (and thus organizational agreement), and, in Manzi’s words, “is very unlikely to work” with value-added modeling. (I do wish he’d expanded on this point further, though.)

  3. All assessments are proxies.
    If the proxy is given more value than the underlying phenomenon it’s supposed to measure, this can incentivize “teaching to the test.” With much at stake, some people will try to game the system. This may motivate those who construct and rely on the model to periodically change the metrics, but that introduces more instability in interpreting and calibrating the results across implementations.
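Manzi’s point that combining measures requires subjective judgment can be illustrated with a weighted composite. In this hypothetical sketch, three teachers have standardized scores on two measures (a value-added estimate and an observation rating, both invented), and the only thing that changes is the weighting.

```python
# Hypothetical standardized scores: (value-added estimate, observation rating).
teachers = {
    "A": (1.0, -0.5),
    "B": (0.2, 0.8),
    "C": (-0.4, 1.2),
}

def composite(measures, weights):
    """Weighted sum of several already-standardized measures for one teacher."""
    return sum(w * m for w, m in zip(weights, measures))

def top_teacher(weights):
    """Name of the teacher with the highest composite under the given weights."""
    return max(teachers, key=lambda name: composite(teachers[name], weights))

for weights in [(0.8, 0.2), (0.5, 0.5), (0.3, 0.7)]:
    print(weights, "->", top_teacher(weights))
```

Under weights of (0.8, 0.2), (0.5, 0.5), and (0.3, 0.7), teachers A, B, and C respectively come out on top, so the choice of weights is itself a policy decision that the data cannot make for us.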

In highlighting these weaknesses of value-added modeling, Manzi concludes by arguing that improving teacher evaluation requires a lot more careful interpretation of its results, within the context of better teacher management. I would very much welcome hearing more dialogue about what that management and leadership should look like, instead of so much hype about impressive but complex statistical tools expected to solve the whole problem on their own.

Teach for America is not a volunteer organization…

…it’s a teacher-placement service. And depending on how you feel about Teach for America’s mission and effectiveness, potentially a very expensive one.

There seems to be a common misconception that TFA is a volunteer organization like the Peace Corps and AmeriCorps, where corps members receive only a small living allowance and no wage. This editorial prompted me to try to help clear that up. While TFA corps members are considered members of AmeriCorps, this only means they are eligible for the loan forbearance and post-service education awards all AmeriCorps members receive.

  1. Teach for America teachers are full employees of the school district in which they work and are paid out of district budgets. The school district pays corps members a full teaching salary plus benefits, just like any other teacher. TFA reports corps member salaries between $30,000 and $51,000.
  2. In some cases, school districts may also pay Teach for America a placement fee for each teacher hired from the corps. This seems to be a regional determination: this Rethinking Schools article by Barbara Miner (pdf) reports St. Louis schools paid TFA $2000 per placement; Birmingham schools reportedly paid TFA $5000 per placement.
  3. In 2008, government grants funded about 25% of TFA’s operating expenses (nearly $25 million). TFA also recently won a 5-year, $50 million grant in the Department of Education’s Investing in Innovation competition.

Add up all the taxpayer money spent, and then remember the entire 2010 TFA corps contains only 4,500 teachers. [Note: This number counts only new recruits for 2010. The total number of active TFA corps members is around 8,000.]
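Here is a rough back-of-envelope version of that sum, using only the figures quoted above. It is hypothetical in an important way: it assumes government grants at the 2008 level continued alongside one-fifth of the $50 million grant each year, it mixes figures from different years, and it leaves out placement fees (reported at $1,500 to $5,000 per teacher) and the salaries districts pay any teacher.

```python
# Figures quoted in the post; the years do not line up, so treat results as rough.
GOV_GRANTS_PER_YEAR = 25_000_000   # 2008 government grants ("nearly $25 million")
I3_GRANT_TOTAL = 50_000_000        # 5-year Investing in Innovation grant
I3_YEARS = 5
NEW_RECRUITS = 4_500               # 2010 incoming corps
ACTIVE_MEMBERS = 8_000             # "around 8000" active corps members

def per_head(total_dollars, head_count):
    """Average public dollars per person."""
    return total_dollars / head_count

# Assumption: 2008-level grants recur each year alongside one-fifth of the I3 grant.
annual_public_funding = GOV_GRANTS_PER_YEAR + I3_GRANT_TOTAL / I3_YEARS

print(f"per new recruit:   ${per_head(annual_public_funding, NEW_RECRUITS):,.0f}")
print(f"per active member: ${per_head(annual_public_funding, ACTIVE_MEMBERS):,.0f}")
```

Under those assumptions this comes to roughly $7,800 per new recruit, or about $4,400 per active corps member, before any placement fees.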

And then consider the middling results of Stanford’s 6-year study of TFA teachers in Houston (press summary, pdf full text), which found that uncertified TFA teachers performed only as well as other uncertified teachers and were outperformed by fully certified teachers (as measured by student performance on standardized tests), after controlling for teacher backgrounds and student population characteristics. Even after TFA teachers became certified, they “generally perform[ed] on par” with traditionally certified teachers.

Updated: Commenter Michael Bishop mentioned this 2004 Mathematica Policy Research study of Teach for America (pdf), which used random assignment of students to teachers. This was a one-year comparison study of TFA teachers to non-TFA teachers (novice and veteran) and found significant effects of TFA status for math results, but not for reading or behavioral outcomes.

And for those keeping score at home, the Mathematica study reports school districts paid TFA $1500 per teacher.

The value-added wave is a tsunami

EdWeek ran an article earlier this week in which economist Douglas N. Harris attempts to encourage economists and educators to get along.

He unfortunately lost me in the 3rd paragraph.

Drawing on student-level achievement data across years, linked to individual teachers, statistical techniques can be used to estimate how much each teacher contributed to student scores—the value-added measure of teacher performance. These measures in turn can be given to teachers and school leaders to inform professional development and curriculum decisions, or to make arguably higher-stakes decisions about performance pay, tenure, and dismissal.

Emphasis mine: the claim that these measures can “inform professional development and curriculum decisions.”

Economists and their education reform allies frequently make this claim, but it is not true, at least not yet. Value-added measures are based on standardized-test scores, and neither currently provides information an educator can actually use to make professional development or curriculum decisions. When the scores are released, administrators and teachers receive a composite score and a handful of subscores for each student. In math, these subscores might cover topics like “Number and Operation Sense” and “Geometry.”

It does not do an educator any good to know last year’s students struggled with a topic as broad as “Number and Operation Sense”. Which numbers? Integers? Decimals? Did the students have problems with basic place value? Which operations? The non-commutative ones? Or did they have specific problems with regrouping and carrying? In what way are the students struggling? What errors are they making? What misconceptions might these errors point to? None of this information is contained in a score report. So, as an educator faced with test scores low in “Number and Operation Sense” (and which might be low in other areas as well), where do you start? Do you throw out the entire curriculum? If not, how do you know which parts of it need to be re-examined?

People trained in education recognize a difference between formative assessment (information collected to improve instruction and student learning) and summative assessment (information collected to determine whether a student or other entity has reached a desired endpoint). Standardized tests are summative assessments: bad scores on them are like knowing that your football team keeps losing its games. That information alone is not enough to help the team improve.

Why do economists see the issue so differently?

An economist myself, let me try to explain. Economists tend to think like well-meaning business people. They focus more on bottom-line results than processes and pedagogy, care more about preparing students for the workplace than the ballot box or art museum, and worry more about U.S. economic competitiveness. Economists also focus on the role financial incentives play in organizations, more so than the other myriad factors affecting human behavior. From this perspective, if we can get rid of ineffective teachers and provide financial incentives for the remainder to improve, then students will have higher test scores, yielding more productive workers and a more competitive U.S. economy.

This logic makes educators and education scholars cringe: Do economists not see that drill-and-kill has replaced rich, inquiry-based learning? Do they really think test preparation is the solution to the nation’s economic prosperity? Economists do partly recognize these concerns, as the quotations from the recent reports suggest. But they also see the motivation and goals of human behavior somewhat differently from the way most educators do.

This false dichotomy makes me cringe. As a trained education research scientist who is no stranger to statistical models, I can say that value-added is not ready for prime time because its primary input, standardized test scores, is deeply flawed. In science and statistics, if you put garbage data into your model, you get garbage conclusions out. It has nothing to do with valuing art over economic competitiveness, and everything to do with the integrity of the science.

The divide between economists and others might be more productive if any of the reports provided specific recommendations. For example, creating better student assessments and combining value-added with classroom assessments are musts.

Thank you. Here’s where I start agreeing. If only that had been the central point of the article! I don’t dismiss value-added modeling as a technique, but I do not believe we have anything resembling good measures of teaching and learning.

We also have to avoid letting the tail wag the dog: Some states and districts are trying to expand testing to nontested grades and subjects, and to change test instruments so the scores more clearly reflect student growth for value-added calculations. This thinking is exactly backwards.

I agree completely, but that won’t stop states and districts from desperately trying to game the system. Since economists focus so much on financial incentives, this should be easy for them to understand: when the penalty for low standardized test scores (or low value-added scores) is losing your funding, you will do whatever gets those scores up fastest. In most cases, that means changing the rules by which the scores are computed. Welcome to Campbell’s law.

Diluting the meaning of “highly qualified” teachers

Valerie Strauss posts:

Senators have included in key legislation language that would allow teachers still in training to be considered “highly qualified” so they can meet a standard set in the federal No Child Left Behind law.

In an era when the education mantra is that all kids deserve great teachers, some members of Congress want it to be the law of the land that a neophyte teacher who has demonstrated “satisfactory progress” toward full state certification is “highly qualified.”

Is it just me, or have I been transported to 1984? The original definition of “highly qualified teacher” in No Child Left Behind already represented what in most high-achieving countries would be a bare minimum qualification for beginning a teaching residency. Allowing teachers-in-training to be classified as “highly qualified” seems ridiculous on its face.

Strauss sees this as a giveaway to political darling Teach for America:

Teachers still in training programs are disproportionately concentrated in schools serving low-income students and students of color, the very children who need the very best the teaching profession has to offer. In California alone, nearly a quarter of such teachers work in schools with 98-100 percent of minority students, while some affluent districts have none. Half of California’s teachers still in training teach special education.

Allowing non-certified teachers to be considered “highly qualified” would be a gift to programs such as Teach for America, which gives newly graduated college students from elite institutions five weeks of summer training before sending them into low-performing schools.
