Teach for America is not a volunteer organization…

…it’s a teacher-placement service. And depending on how you feel about Teach for America’s mission and effectiveness, potentially a very expensive one.

There seems to be a common misconception that TFA is a volunteer organization like Peace Corps and Americorps, where corps members receive only a small living allowance and no wage. This editorial prompted me to try to help clear that up. While TFA corps members are considered members of Americorps, this only means TFA members are eligible for the loan forbearance and post-service education awards all Americorps members receive.

  1. Teach for America teachers are full employees of the school district in which they work and are paid out of district budgets. The school district pays corps members a full teaching salary plus benefits, just like any other teacher. TFA reports corps member salaries between $30,000 and $51,000.
  2. In some cases, school districts may also pay Teach for America a placement fee for each teacher hired from the corps. This seems to be a regional determination: this Rethinking Schools article by Barbara Miner (pdf) reports St. Louis schools paid TFA $2000 per placement; Birmingham schools reportedly paid TFA $5000 per placement.
  3. In 2008, the funding for about 25% of TFA’s operating expenses (or nearly $25 million) came from government grants. TFA also recently won a 5-year, $50 million grant in the Department of Education Investing in Innovation competition.

Add up all the taxpayer money spent, and then remember the entire 2010 TFA corps contains only 4,500 teachers. [Note: This number is of new recruits for 2010. The total number of active TFA corps members is around 8000.]
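To make "add up all the taxpayer money" concrete, here is a rough back-of-the-envelope sketch using only the figures quoted above. Pairing the 2008 grant total with the roughly 8,000 active corps members is an assumption made for illustration (the two numbers come from different years), the function name is hypothetical, and the sketch leaves out salaries, benefits, and the Investing in Innovation grant.

```python
# Figures quoted in the post. Dividing the 2008 grant total by the 2010
# active-member count is an illustrative assumption, not a reported statistic.
GOVERNMENT_GRANTS_2008 = 25_000_000   # "nearly $25 million"
ACTIVE_CORPS_MEMBERS = 8_000          # "around 8000"
PLACEMENT_FEES = {"St. Louis": 2_000, "Birmingham": 5_000}  # reported per-placement fees


def public_cost_per_member(placement_fee,
                           grants=GOVERNMENT_GRANTS_2008,
                           members=ACTIVE_CORPS_MEMBERS):
    """Grant dollars per corps member plus the district's placement fee.

    Excludes the salary and benefits the district pays in any case.
    """
    return grants / members + placement_fee


for district, fee in PLACEMENT_FEES.items():
    cost = public_cost_per_member(fee)
    print(f"{district}: about ${cost:,.0f} per corps member, on top of salary")
```

Under these assumptions the grant money alone works out to a little over $3,000 per active corps member before any placement fee is counted.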

And then consider the middling results of Stanford’s 6-year study of TFA teachers in Houston (press summary, pdf full text), which found that uncertified TFA teachers only performed equivalently to other uncertified teachers and were out-performed by fully-certified teachers (as measured by student performance on standardized tests), after controlling for teacher backgrounds and student population characteristics. Even after TFA teachers become certified, they “generally perform[ed] on par” with traditionally-certified teachers.

Updated: Commenter Michael Bishop mentioned this 2004 Mathematica Policy Research study of Teach for America (pdf), which used random assignment of students to teachers. This was a one-year comparison study of TFA teachers to non-TFA teachers (novice and veteran) and found significant effects of TFA status for math results, but not for reading or behavioral outcomes.

And for those keeping score at home, the Mathematica study reports school districts paid TFA $1500 per teacher.

Retrieval is only part of the picture

The latest educational research to make the rounds has been reported variously as “Test-Taking Cements Knowledge Better Than Studying,” “Simple Recall Exercises Make Science Learning Easier,” “Practising Retrieval is Best Tool for Learning,” and “Learning Science: Actively Recalling Information from Memory Beats Elaborate Study Methods.” Before anyone gets carried away seeking to apply these findings to practice, let’s correct the headlines and clarify what the researchers actually studied.

First, the “test-taking” vs. “studying” dichotomy presented by the NYT is too broad. The winning condition was “retrieval practice”, described fairly as “actively recalling information from memory” or even “simple recall exercises.” The multiple-choice questions popular on so many standardized tests don’t qualify because they assess recognition of information, not recall. In this study, participants had to report as much information as they could remember from the text, a more generative task than picking the best among the possible answers presented to them.

Nor were the comparison conditions merely “studying.” While the worst-performing conditions asked students to read (and perhaps reread) the text, they were dropped from the second experiment, which contrasted retrieval practice against “elaborative concept-mapping.” Thus, the “elaborate” (better read as “elaborative”) study methods reported in the ScienceDaily headline are overly broad, since concept-mapping is only one of many kinds of elaborative study methods. That the researchers found no benefit for students who had previous concept-mapping experience may simply mean that it requires more than one or two exposures to be useful.

The premise underlying concept-mapping as a learning tool is that re-representing knowledge in another format helps students identify and understand relationships between the concepts. But producing a new representation on paper (or some other external medium) doesn’t require constructing a new internal mental representation. In focusing on producing a concept map, students may simply have copied the information from the text to their diagram without deeply processing what they were writing or drawing. By scoring the concept maps by completeness (number of ideas) rather than quality (appropriateness of node placement and links), this study did not fully safeguard against this.

To a certain extent that may be the exact point the researchers wanted to make: That concept-mapping can be executed in an “active” yet non-generative fashion. Even reviewing a concept map (as the participants were encouraged to do with any remaining time) can be done very superficially, simply checking to make sure that all the information is present, rather than reflecting on the relationships represented—similar to making a “cheat sheet” for a test and trusting that all the formulas and definitions are there, instead of evaluating the conditions and rationale for applying them.

One may construe this as an argument against concept-mapping as a study technique, if it is so difficult to utilize it effectively. But just because a given tool can be used poorly does not mean it should be avoided completely; that could be true of any teaching or learning approach. Nor does this necessarily constitute an argument against other elaborative study methods. Explaining a text or diagram, whether to oneself or to others, is another form of elaboration that has been well documented for its effectiveness in supporting learning[1]. This constitutes an interesting hybrid between elaboration and retrieval, insofar as explanation adds information beyond the source but may also demand partial recall of the contents of the source even when present. If the value of explanation is solely in the retrieval involved, then it should fare worse against pure retrieval and better against pure elaboration.

All of this raises the question, “Better for what?” The tests in this study primarily measured retrieval, with 84% of the points counting the presence of ideas and the rest (from only two questions) assessing inference. Yet even those inference questions depended partially on retrieval, making it ambiguous whether wrong answers reflected a failure to retrieve, comprehend, or apply knowledge. What this study showed most clearly was that retrieval practice is valuable for improving retrieval. Elaboration and other activities may still be valuable for promoting transfer and inference. There could also be an interaction whereby elaboration and retrieval mutually enhance each other, since remembering and conducting inferences is easier with robust knowledge structures. The lesson may not be that elaborative activities are a poor use of time, but that they need to incorporate retrieval practice to be most effective.

I don’t at all doubt the validity of the finding, or the importance of retrieval in promoting learning. I share the authors’ frustration with the often-empty trumpeting of “active learning,” which can assume ineffective and meaningless forms [2][3]. I also recognize the value of knowing certain information in order to utilize it efficiently and flexibly. My concerns are in interpreting and applying this finding sensibly to real-life teaching and learning.

  • Retrieval is only part of the picture. Educators need to assess and support multiple skills, including and beyond retrieval. There’s a great danger of forgetting other learning goals (such as understanding, applying, creating, evaluating, etc.) when pressured to document success in retrieval.
  • Is it retrieving knowledge or generating knowledge? I also wonder whether “retrieval” may be too narrow a label for the broader phenomenon of generating knowledge. This may be a specific instance of the well-documented generation effect [4], and it may not always be most beneficial to focus only on retrieving the particular facts. There could be a similar advantage to other generative tasks, such as inventing a new application of a given phenomenon, writing a story incorporating new vocabulary words, or creating a problem that could almost be solved by a particular strategy. None of these require retrieving the phenomenon, the definitions, or the solution method to be learned, but they all require elaborating upon the knowledge-to-be-learned by generating new information and deeper understanding of it. Knowledge is more than a list of disconnected facts [5]; it needs a structure to be meaningful [6]. Focusing too heavily on retrieving the list downplays the importance of developing the supporting structure.
  • Retrieval isn’t recognition, and not all retrieval is worthwhile. Most important, I’m concerned that the mainstream media’s reporting of this finding makes it too easy to misinterpret. It would be a shame if this were used to justify more multiple-choice testing, or if a well-meaning student thought that accurately reproducing a graph from a textbook from memory constituted better studying than explaining the relationships embedded within that graph.

For the sake of a healthy relationship between research and practice, I hope the general public and policymakers will take this finding in context and not champion it into the latest silver bullet that will save education. Careless conversion of research into practice undermines the scientific process, effective policymaking, and teachers’ professional judgment, all of which need to collaborate instead of collide.

J. D. Karpicke, J. R. Blunt. Retrieval Practice Produces More Learning than Elaborative Studying with Concept Mapping. Science, 2011; DOI: 10.1126/science.1199327

[1] Chi, M.T.H., de Leeuw, N., Chiu, M.H., & LaVancher, C. (1994). Eliciting self-explanations improves understanding. Cognitive Science, 18, 439-477.
[2] For example, see the “Teacher A” model described in:
Scardamalia, M., & Bereiter, C. (1991). Higher levels of agency for children in knowledge building: A challenge for the design of new knowledge media. Journal of the Learning Sciences, 1, 37-68.
(There’s also a “Johnny Appleseed” project description I once read that’s a bit of a caricature of poorly-designed project-based learning, but I can’t seem to find it now. If anyone knows of this example, please share it with me!)
[3] This is one reason why some educators now advocate “minds-on” rather than simply “hands-on” learning. Of course, what those minds are focused on still deserves better clarification.
[4] e.g., Slamecka, N.J., & Graf, P. (1978). The generation effect: Delineation of a phenomenon. Journal of Experimental Psychology: Human Learning and Memory, 4, 592-604.
[5] In the following study, some gifted students outscored historians in their fact recall, but could not evaluate and interpret claims as effectively:
Wineburg, S.S. (1991). Historical problem solving: A study of the cognitive processes used in the evaluation of documentary and pictorial evidence. Journal of Educational Psychology, 83, 73-87.
[6] For a fuller description of the importance of structured knowledge representations, see:
Bransford, J.D., Brown, A.L., & Cocking, R.R. (2000). How people learn: Brain, mind, experience, and school (Expanded edition). Washington DC: National Academy Press, pp. 31-50 (Ch. 2: How Experts Differ from Novices). 

The value-added wave is a tsunami

Edweek ran an article earlier this week in which economist Douglas N. Harris attempts to encourage economists and educators to get along.

He unfortunately lost me in the 3rd paragraph.

Drawing on student-level achievement data across years, linked to individual teachers, statistical techniques can be used to estimate how much each teacher contributed to student scores—the value-added measure of teacher performance. These measures in turn can be given to teachers and school leaders to inform professional development and curriculum decisions, or to make arguably higher-stakes decisions about performance pay, tenure, and dismissal.

Emphasis mine.

Economists and their education reform allies frequently make this claim, but it is not true, at least not yet. Value-added measures are based on standardized-test scores, and neither currently provides information an educator can actually use to make professional development or curriculum decisions. When the scores are released, administrators and teachers receive a composite score and a handful of subscores for each student. In math, these subscores might be for topics like “Number and Operation Sense” and “Geometry”.

It does not do an educator any good to know last year’s students struggled with a topic as broad as “Number and Operation Sense”. Which numbers? Integers? Decimals? Did the students have problems with basic place value? Which operations? The non-commutative ones? Or did they have specific problems with regrouping and carrying? In what way are the students struggling? What errors are they making? What misconceptions might these errors point to? None of this information is contained in a score report. So, as an educator faced with test scores low in “Number and Operation Sense” (and which might be low in other areas as well), where do you start? Do you throw out the entire curriculum? If not, how do you know which parts of it need to be re-examined?

People trained in education recognize a difference between formative assessment—information collected for the purpose of improving instruction and student learning, and summative assessment—information collected to determine whether a student or other entity has reached a desired endpoint. Standardized tests are summative assessments—bad scores on them are like knowing that your football team keeps losing its games. This information is not sufficient for helping the team improve.

Why do economists see the issue so differently?

An economist myself, let me try to explain. Economists tend to think like well-meaning business people. They focus more on bottom-line results than processes and pedagogy, care more about preparing students for the workplace than the ballot box or art museum, and worry more about U.S. economic competitiveness. Economists also focus on the role financial incentives play in organizations, more so than the other myriad factors affecting human behavior. From this perspective, if we can get rid of ineffective teachers and provide financial incentives for the remainder to improve, then students will have higher test scores, yielding more productive workers and a more competitive U.S. economy.

This logic makes educators and education scholars cringe: Do economists not see that drill-and-kill has replaced rich, inquiry-based learning? Do they really think test preparation is the solution to the nation’s economic prosperity? Economists do partly recognize these concerns, as the quotations from the recent reports suggest. But they also see the motivation and goals of human behavior somewhat differently from the way most educators do.

This false dichotomy makes me cringe. As a trained education research scientist who is no stranger to statistical models, I consider value-added not ready for prime time because its primary input, standardized test scores, is deeply flawed. In science and statistics, if you put garbage data into your model, you will get garbage conclusions out. It has nothing to do with valuing art over economic competitiveness, and everything to do with the integrity of the science.
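The garbage-in, garbage-out point can be illustrated with a toy simulation. This is only a sketch under strong assumptions: it models a flawed test purely as random measurement noise (the objection above is broader and also concerns what the tests measure), it uses the mean class gain as a crude stand-in for a real value-added model, and every name and parameter in it is hypothetical.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def estimate_quality(noise_sd, teachers=50, students=25, seed=0):
    """Correlation between true teacher effects and naive estimates.

    Each student's measured gain is the teacher's true effect plus
    test noise; the estimate for a teacher is the mean gain of the class.
    """
    rng = random.Random(seed)
    true_effects = [rng.gauss(0, 3) for _ in range(teachers)]
    estimates = []
    for effect in true_effects:
        gains = [effect + rng.gauss(0, noise_sd) for _ in range(students)]
        estimates.append(sum(gains) / students)
    return pearson(true_effects, estimates)


for noise_sd in (1, 10, 50):
    print(f"test noise sd={noise_sd}: correlation with true effect "
          f"{estimate_quality(noise_sd):.2f}")
```

With a precise test the estimates track the true effects closely; as the noise grows, the same calculation still produces a confident-looking number for every teacher, but that number says less and less about the teacher.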

The divide between economists and others might be more productive if any of the reports provided specific recommendations. For example, creating better student assessments and combining value-added with classroom assessments are musts.

Thank you. Here is where I start agreeing; if only that had been the central point of the article. I don’t dismiss value-added modeling as a technique, but I do not believe we have anything resembling good measures of teaching and learning.

We also have to avoid letting the tail wag the dog: Some states and districts are trying to expand testing to nontested grades and subjects, and to change test instruments so the scores more clearly reflect student growth for value-added calculations. This thinking is exactly backwards.

I agree completely, but that won’t stop states and districts from desperately trying to game the system. Since economists focus so much on financial incentives, this should be easy for them to understand: when the penalty for having low standardized test scores (or low value-added scores) is losing your funding, you will do whatever will get those scores up fastest. In most cases, that is changing the rules by which the scores are computed. Welcome to Campbell’s law.

What the reformers aren’t reforming

Back in December I shared this Answer Sheet blog post with some friends. The ensuing discussion revealed a disconnect between how those of us who work in education perceive the issues and how the lay public perceives the issues. This is my attempt to bridge the disconnect.

In the above post Valerie Strauss and Jay Mathews, both veteran journalists on the education beat, debate the merits of KIPP, Teach for America, and other aspects of education reform. Mathews tends to support the current wave of reforms—standardized testing, teacher merit pay, charter schools—while Strauss tends to be a skeptic. My non-education friends tended to side with Mathews, at least on the point these reforms are better than no reforms. Ming Ling and I sided with Strauss. [ML: Specifically, I agreed with Strauss’s concerns that current high-stakes accountability systems miss a lot of important information about teaching, that effective teaching requires ongoing training and support, and that improving education requires systemic policy. But I also agree with Mathews’ observations that KIPP, TfA, and charter schools have demonstrated many worthwhile achievements, and that “Fixing schools is going to need many varied approaches from people with different ideas.”]

These reforms focus on the incentive and regulatory structure of the school system. Proponents of these reforms believe that loosening the restrictions on schools and teacher pay, coupled with incentives and punishments, will let market forces take over. And market forces will lead the nation’s finest minds to the education industry, who will then find ways to produce a better-quality “product” at a lower price so they can reap the rewards. The idea is intuitively appealing, but the collection of specific reform proposals contains serious flaws and doesn’t address key structural failures in our education system. There are reasons to believe these reforms are worse than no reforms at all.

The Flaws
Those pushing to change the incentive structure through test-based accountability, including merit pay, are assuming the tests are adequately measuring important educational outcomes. Questioning this orthodoxy is like standing next to a straw man with a large bounty on his head. To quote Jay Mathews from the post linked above,

Test-driven accountability is here to stay. Politicians who try to campaign against it are swiftly undercut by opponents who say, “What? You don’t want to make our schools accountable for all the money we spend on them?”

Nobody is against holding schools accountable for the taxpayer money spent. But holding schools accountable to a bad standard, particularly a high-stakes one, can distort people’s behavior in a counterproductive way. It’s called Campbell’s Law and social scientists have documented the effect in all areas of life, including a recent finding that texting bans may actually increase accident rates.

One doesn’t have to look very far to find examples of school districts outright cheating or otherwise trying to game the system. The Death and Life of the Great American School System by Diane Ravitch is full of such examples. Something about the standardized-testing-driven incentive structure is clearly not working as intended, but rather than stopping and developing better ways of measuring learning, the current reform crowd wants to plow ahead and raise the stakes again by linking teacher pay to this flawed system of accountability. They are not interested in asking the hard questions about what they’re really measuring and influencing.

But even if we assume the tests are good and the incentives are in the right places, it is difficult to see why market forces will necessarily improve the education system in the way those reformers are claiming. In politics, market forces are supposed to expand the pie, not equalize the slices. Despite very strong incentives to be elite athletes and coaches, a significant portion of the population can’t even run a mile without getting winded. But those who support market-based education reforms still use egalitarian rhetoric—appeals to closing the achievement gap, equipping all citizens with the tools they need to succeed.

The (other) dark side of standardized testing

Dan DiMaggio, professional standardized test-scorer, writes harrowingly in Monthly Review:

No matter at what pace scorers work, however, tests are not always scored with the utmost attentiveness. The work is mind numbing, so scorers have to invent ways to entertain themselves. The most common method seems to be staring blankly at the wall or into space for minutes at a time. But at work this year, I discovered that no one would notice if I just read news articles while scoring tests. So every night, while scoring from home, I would surf the Internet and cut and paste loads of articles—reports on Indian Maoists, scientific speculation on whether animals can be gay, critiques of standardized testing—into what typically came to be an eighty-page, single-spaced Word document. Then I would print it out and read it the next day while I was working at the scoring center. This was the only way to avoid going insane. I still managed to score at the average rate for the room and perform according to “quality” standards. While scoring from home, I routinely carry on three or four intense conversations on Gchat. This is the reality of test scoring.

The central assumption behind the push to link student test performance to teacher merit pay is that test scores accurately reflect student learning. We apparently need a system like this because we don’t trust teachers to fairly score their own students’ work. But we trust these guys.

Supplementary reading by Todd Farley:

One of the tests I scored had students read a passage about bicycle safety. They were then instructed to draw a poster that illustrated a rule that was indicated in the text. We would award one point for a poster that included a correct rule and zero for a drawing that did not.

The first poster I saw was a drawing of a young cyclist, a helmet tightly attached to his head, flying his bike over a canal filled with flaming oil, his two arms waving wildly in the air. I stared at the response for minutes. Was this a picture of a helmet-wearing child who understood the basic rules of bike safety? Or was it meant to portray a youngster killing himself on two wheels?

And the followup…

Since 1994, when I first got hired as a lowly temp for measly wages to spend mere seconds glancing at and scoring standardized tests, until the release of my non‐bestselling book last fall, I had steadfastly believed that large‐scale assessment was a lame measure of student learning that really only benefitted the multi‐national corporations paid millions upon millions upon millions of dollars to write and score the tests. I began to see the error of my ways last Thanksgiving, however, just as soon as my huge son popped from his mother’s womb, keening and wailing, demanding massive amounts of food, a closet full of clothing, and the assistance of various costly household staff (baby‐sitter, music teacher, test‐prep tutor, etc.). Only then, as my little boy first began his mantra of “more, more, more,” did I finally see standardized testing for what it really is: a growth industry.

Diluting the meaning of “highly qualified” teachers

Valerie Strauss posts:

Senators have included in key legislation language that would allow teachers still in training to be considered “highly qualified” so they can meet a standard set in the federal No Child Left Behind law.

In an era when the education mantra is that all kids deserve great teachers, some members of Congress want it to be the law of the land that a neophyte teacher who has demonstrated “satisfactory progress” toward full state certification is “highly qualified.”

Is it just me, or have I been transported to 1984? The original definition of “highly qualified teacher” in No Child Left Behind already represented what in most high-achieving countries would be a bare minimum qualification for beginning a teaching residency. Allowing teachers-in-training to be classified as “highly qualified” seems ridiculous on its face.

Strauss sees this as a giveaway to political darling Teach for America:

Teachers still in training programs are disproportionately concentrated in schools serving low-income students and students of color, the very children who need the very best the teaching profession has to offer. In California alone, nearly a quarter of such teachers work in schools with 98-100 percent of minority students, while some affluent districts have none. Half of California’s teachers still in training teach special education.

Allowing non-certified teachers to be considered “highly qualified” would be a gift to programs such as Teach for America, which gives newly graduated college students from elite institutions five weeks of summer training before sending them into low-performing schools.

In other words, test-based value-added doesn’t make sense

Education Week posts a defense of value-added metrics, attempting to address concerns about their use and reliability. By their admission, it all hinges on a central assumption:

If student test achievement is the desired outcome, value-added is superior to other existing methods of classifying teachers.

What if student test achievement isn’t the desired outcome?

Judging books by their covers

On “Corruption in textbook-adoption proceedings: ‘Judging Books by Their Covers’”:

In 1964 the eminent physicist Richard Feynman served on the State of California’s Curriculum Commission and saw how the Commission chose math textbooks for use in California’s public schools. In his acerbic memoir of that experience, titled “Judging Books by Their Covers,” Feynman analyzed the Commission’s idiotic method of evaluating books, and he described some of the tactics employed by schoolbook salesmen who wanted the Commission to adopt their shoddy products. “Judging Books by Their Covers” appeared as a chapter in “Surely You’re Joking, Mr. Feynman!” — Feynman’s autobiographical book that was published in 1985 by W.W. Norton & Company.

The memoir illustrates the perils of averaging (or poorly selected crowd-sourcing), biased presentations, and careless writing and reviewing.

Improving medical (and educational) research

On “Lies, Damned Lies, and Medical Science”:

Much of what medical researchers conclude in their studies is misleading, exaggerated, or flat-out wrong. So why are doctors—to a striking extent—still drawing upon misinformation in their everyday practice? Dr. John Ioannidis has spent his career challenging his peers by exposing their bad science.

The research funding and dissemination mechanisms need serious overhaul. I think the research enterprise also needs to institute more formal appreciation for methodologically sound replications, null results, and meta-analyses. If the goal of research is genuinely to improve the knowledge base, then its incentive structure should mirror that.

Ioannidis laid out a detailed mathematical proof that, assuming modest levels of researcher bias, typically imperfect research techniques, and the well-known tendency to focus on exciting rather than highly plausible theories, researchers will come up with wrong findings most of the time.
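The core of that argument can be sketched in a few lines. The function below computes the share of "significant" findings that are actually true, given the pre-study odds that a tested relationship is real, the study's power, and the significance threshold. This is the simplest version of the calculation and deliberately omits the researcher-bias term mentioned in the quote; the function and parameter names are my own, and the example inputs are illustrative rather than taken from the article.

```python
def positive_predictive_value(prior_odds, power, alpha=0.05):
    """Fraction of statistically significant findings that are true.

    prior_odds: ratio of true to false relationships among those tested
    power:      probability a true relationship yields a significant result
    alpha:      probability a false relationship yields a significant result
    """
    true_positives = power * prior_odds   # per one false relationship tested
    false_positives = alpha               # per one false relationship tested
    return true_positives / (true_positives + false_positives)


# A well-powered test of a plausible hypothesis (even odds, 80% power)
print(round(positive_predictive_value(prior_odds=1.0, power=0.8), 2))   # 0.94

# An underpowered test of a long-shot hypothesis (1-to-10 odds, 20% power)
print(round(positive_predictive_value(prior_odds=0.1, power=0.2), 2))   # 0.29
```

Even with no bias at all, an exciting but implausible hypothesis tested with a small study yields findings that are wrong more often than right, which is the pattern the quote describes.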

On the education side, we also need to help the general populace become more critical in its acceptance of news stories without tipping it over to the other extreme of distrusting all research. More statistics education, please! We need a more skeptical audience to help stop the news media from overplaying stories about slight or random effects.

So-called “data-driven” decision-making isn’t always smart

ASCD writes about reformers’ obsession with data and how it’s leading to a new kind of stupid:

Today’s enthusiastic embrace of data has waltzed us directly from a petulant resistance to performance measures to a reflexive and unsophisticated reliance on a few simple metrics—namely, graduation rates, expenditures, and the reading and math test scores of students in grades 3 through 8. The result has been a nifty pirouette from one troubling mind-set to another; with nary a misstep, we have pivoted from the “old stupid” to the “new stupid.”

The article goes on to describe the three major characteristics of “the new stupid”:

  1. Misusing data
  2. Oversimplifying and over-applying findings from research
  3. Fixating on performance data and failing to consider management data

The importance of this last point is grossly underappreciated by reformers.

Existing achievement data are of limited utility for management purposes. State tests tend to provide results that are too coarse to offer more than a snapshot of student and school performance, and few district data systems link student achievement metrics to teachers, practices, or programs in a way that can help determine what is working. More significant, successful public and private organizations monitor their operations extensively and intensively. FedEx and UPS know at any given time where millions of packages are across the United States and around the globe. Yet few districts know how long it takes to respond to a teaching applicant, how frequently teachers use formative assessments, or how rapidly school requests for supplies are processed and fulfilled.

For all of our attention to testing and assessment, student achievement measures are largely irrelevant to judging the performance of many school district employees. It simply does not make sense to evaluate the performance of a payroll processor or human resources recruiter—or even a foreign language instructor—primarily on the basis of reading and math test scores for grades 3 through 8.

Just as hospitals employ large numbers of administrative and clinical personnel to support doctors and the military employs accountants, cooks, and lawyers to support its combat personnel, so schools have a “long tail” of support staff charged with ensuring that educators have the tools they need to be effective. Just as it makes more sense to judge the quality of army chefs on the quality of their kitchens and cuisines rather than on the outcome of combat operations, so it is more sensible to focus on how well district employees perform their prescribed tasks than on less direct measures of job performance. The tendency to casually focus on student achievement, especially given the testing system’s heavy emphasis on reading and math, allows a large number of employees to either be excused from results-driven accountability or be held accountable for activities over which they have no control. This undermines a performance mindset and promises to eventually erode confidence in management.

Emphasis mine.

Very little of the education reform conversation focuses on school management and how it helps or hurts teachers. Through my work, I know teachers frequently have to work around last-minute schedule changes and other requests and wait out sometimes months-long processes for acquiring new printer toner or chairs that are the right height for the tables. The sanctity of the classroom is not respected, with PA announcements, administrators, fellow teachers, and visitors regularly breaking the flow of a lesson. Improving school management would likely go a long way to making the work of the teacher easier, but apparently teachers are the only school personnel who need to be held accountable for anything.
