Retrieval is only part of the picture

The latest educational research to make the rounds has been reported variously as “Test-Taking Cements Knowledge Better Than Studying,” “Simple Recall Exercises Make Science Learning Easier,” “Practising Retrieval is Best Tool for Learning,” and “Learning Science: Actively Recalling Information from Memory Beats Elaborate Study Methods.” Before anyone gets carried away seeking to apply these findings to practice, let’s correct the headlines and clarify what the researchers actually studied.

First, the “test-taking” vs. “studying” dichotomy presented by the NYT is too broad. The winning condition was “retrieval practice,” fairly described as “actively recalling information from memory” or even “simple recall exercises.” The multiple-choice questions popular on so many standardized tests don’t qualify, because they assess recognition of information rather than recall. In this study, participants had to report as much information as they could remember from the text, a more generative task than picking the best among the answers presented to them.

Nor were the comparison conditions merely “studying.” While the worst-performing conditions asked students to read (and perhaps reread) the text, they were dropped from the second experiment, which contrasted retrieval practice against “elaborative concept-mapping.” Thus, the “elaborate” (better read as “elaborative”) study methods reported in the ScienceDaily headline are overly broad, since concept-mapping is only one of many kinds of elaborative study methods. That the researchers found no benefit for students who had previous concept-mapping experience may simply mean that it requires more than one or two exposures to be useful.

The premise underlying concept-mapping as a learning tool is that re-representing knowledge in another format helps students identify and understand relationships between the concepts. But producing a new representation on paper (or some other external medium) doesn’t require constructing a new internal mental representation. In focusing on producing a concept map, students may simply have copied the information from the text to their diagram without deeply processing what they were writing or drawing. Because the concept maps were scored by completeness (number of ideas) rather than quality (appropriateness of node placement and links), this study did not fully safeguard against that possibility.

To a certain extent that may be the exact point the researchers wanted to make: That concept-mapping can be executed in an “active” yet non-generative fashion. Even reviewing a concept map (as the participants were encouraged to do with any remaining time) can be done very superficially, simply checking to make sure that all the information is present, rather than reflecting on the relationships represented—similar to making a “cheat sheet” for a test and trusting that all the formulas and definitions are there, instead of evaluating the conditions and rationale for applying them.

One may construe this as an argument against concept-mapping as a study technique, if it is so difficult to utilize effectively. But just because a given tool can be used poorly does not mean it should be avoided completely; that could be true of any teaching or learning approach. Nor does this necessarily constitute an argument against other elaborative study methods. Explaining a text or diagram, whether to oneself or to others, is another form of elaboration whose effectiveness in supporting learning is well documented [1]. Explanation constitutes an interesting hybrid between elaboration and retrieval, insofar as it adds information beyond the source but may also demand partial recall of the source’s contents even when the source remains available. If the value of explanation lies solely in the retrieval involved, then it should fare worse than pure retrieval and better than pure elaboration.

All of this raises the question, “Better for what?” The tests in this study primarily measured retrieval, with 84% of the points counting the presence of ideas and the rest (from only two questions) assessing inference. Yet even those inference questions depended partially on retrieval, making it ambiguous whether wrong answers reflected a failure to retrieve, comprehend, or apply knowledge. What this study showed most clearly was that retrieval practice is valuable for improving retrieval. Elaboration and other activities may still be valuable for promoting transfer and inference. There could also be an interaction whereby elaboration and retrieval mutually enhance each other, since remembering and drawing inferences are easier with robust knowledge structures. The lesson may not be that elaborative activities are a poor use of time, but that they need to incorporate retrieval practice to be most effective.

I don’t at all doubt the validity of the finding, or the importance of retrieval in promoting learning. I share the authors’ frustration with the often-empty trumpeting of “active learning,” which can assume ineffective and meaningless forms [2][3]. I also recognize the value of knowing certain information in order to utilize it efficiently and flexibly. My concerns are in interpreting and applying this finding sensibly to real-life teaching and learning.

  • Retrieval is only part of the picture. Educators need to assess and support multiple skills, including and beyond retrieval. There’s a great danger of forgetting other learning goals (such as understanding, applying, creating, evaluating, etc.) when pressured to document success in retrieval.
  • Is it retrieving knowledge or generating knowledge? I also wonder whether “retrieval” may be too narrow a label for the broader phenomenon of generating knowledge. This may be a specific instance of the well-documented generation effect [4], and it may not always be most beneficial to focus only on retrieving the particular facts. There could be a similar advantage to other generative tasks, such as inventing a new application of a given phenomenon, writing a story incorporating new vocabulary words, or creating a problem that could almost be solved by a particular strategy. None of these require retrieving the phenomenon, the definitions, or the solution method to be learned, but they all require elaborating upon the knowledge-to-be-learned by generating new information and deeper understanding of it. Knowledge is more than a list of disconnected facts [5]; it needs a structure to be meaningful [6]. Focusing too heavily on retrieving the list downplays the importance of developing the supporting structure.
  • Retrieval isn’t recognition, and not all retrieval is worthwhile. Most important, I’m concerned that the mainstream media’s reporting makes this finding too easy to misinterpret. It would be a shame if it were used to justify more multiple-choice testing, or if a well-meaning student thought that accurately reproducing a graph from a textbook by memory constituted better studying than explaining the relationships embedded within that graph.

For the sake of a healthy relationship between research and practice, I hope the general public and policymakers will take this finding in context and not champion it into the latest silver bullet that will save education. Careless conversion of research into practice undermines the scientific process, effective policymaking, and teachers’ professional judgment, all of which need to collaborate instead of collide.

Karpicke, J.D., & Blunt, J.R. (2011). Retrieval practice produces more learning than elaborative studying with concept mapping. Science. DOI: 10.1126/science.1199327

[1] Chi, M.T.H., de Leeuw, N., Chiu, M.H., & LaVancher, C. (1994). Eliciting self-explanations improves understanding. Cognitive Science, 18, 439-477.
[2] For example, see the “Teacher A” model described in:
Scardamalia, M., & Bereiter, C. (1991). Higher levels of agency for children in knowledge building: A challenge for the design of new knowledge media. Journal of the Learning Sciences, 1, 37-68.
(There’s also a “Johnny Appleseed” project description I once read that’s a bit of a caricature of poorly-designed project-based learning, but I can’t seem to find it now. If anyone knows of this example, please share it with me!)
[3] This is one reason why some educators now advocate “minds-on” rather than simply “hands-on” learning. Of course, what those minds are focused on still deserves better clarification.
[4] e.g., Slamecka, N.J., & Graf, P. (1978). The generation effect: Delineation of a phenomenon. Journal of Experimental Psychology: Human Learning and Memory, 4, 592-604.
[5] In the following study, some gifted students outscored historians in their fact recall, but could not evaluate and interpret claims as effectively:
Wineburg, S.S. (1991). Historical problem solving: A study of the cognitive processes used in the evaluation of documentary and pictorial evidence. Journal of Educational Psychology, 83, 73-87.
[6] For a fuller description of the importance of structured knowledge representations, see:
Bransford, J.D., Brown, A.L., & Cocking, R.R. (2000). How people learn: Brain, mind, experience, and school (Expanded edition). Washington DC: National Academy Press, pp. 31-50 (Ch. 2: How Experts Differ from Novices). 

Improving medical (and educational) research

On “Lies, Damned Lies, and Medical Science”:

Much of what medical researchers conclude in their studies is misleading, exaggerated, or flat-out wrong. So why are doctors—to a striking extent—still drawing upon misinformation in their everyday practice? Dr. John Ioannidis has spent his career challenging his peers by exposing their bad science.

The research funding and dissemination mechanisms need serious overhaul. I think the research enterprise also needs to institute more formal appreciation for methodologically sound replications, null results, and meta-analyses. If the goal of research is genuinely to improve the knowledge base, then its incentive structure should mirror that.

Ioannidis laid out a detailed mathematical proof that, assuming modest levels of researcher bias, typically imperfect research techniques, and the well-known tendency to focus on exciting rather than highly plausible theories, researchers will come up with wrong findings most of the time.

On the education side, we also need to help the general populace become more critical in its acceptance of news stories without tipping them over to the other extreme of distrusting all research. More statistics education, please! We need a more skeptical audience to help stop the news media from overplaying stories about slight or random effects.

Drawing inferences from data is limited by what the data measure

In “Why Genomics Falls Short as a Medical Tool,” Matt Ridley points out that tracking genetic associations hasn’t yielded as much explanatory power for medical applications as hoped:

It’s a curious fact that genomics has always been sold as a medical story, yet it keeps underdelivering useful medical knowledge and overdelivering other stuff. … True, for many rare inherited diseases, genomics is making a big difference. But not for most of the common ailments we all get. Nor has it explained the diversity of the human condition in things like height, intelligence and extraversion.

He notes that even something as straightforward and heritable as height has been difficult to predict from the genes identified:

Your height, for example, is determined something like 90% by the tallness of your parents—so long as you and they were decently well fed as children. … In the case of height, more than 50 genetic variants were identified, but together they could account for only 5% of the heritability. Where was the other 95%?

Some may argue that it’s a case of needing to search more thoroughly for all the relevant genes:

A recent study of height has managed to push the explained heritability up to about half, by using a much bigger sample. But still only half.

Or, perhaps there are so many genetic pathways that affect height that it would be difficult to identify and generalize from them all:

Others… think that heritability is hiding in rare genetic variants, not common ones—in “private mutations,” genetic peculiarities that are shared by just a few people each. Under this theory, as Tolstoy might have put it, every tall person would be tall in a different way.

Ridley closes by emphasizing that genes influence outcomes through complex interactions and network effects.

If we expect education research and application to emulate medical research and application, then we need to recognize and guard against medicine’s limitations as well. Educational outcomes are even more multiply determined than height, personality, and intelligence. If we seek to understand and control subtle environmental influences, we need to do much more than simply measure achievement on standardized tests and manipulate teacher incentives.

Analogies between pharmaceutical development and education

In “Research Universities and Big Pharma’s Wicked Problem,” neuroscientist Regis Kelly draws comparisons from the manufacture of biofuels to the development of new pharmaceuticals, suggesting that both are

a “wicked” problem, defined conventionally as a problem that is almost insoluble because it requires the expertise of many stakeholders with disparate backgrounds and non-overlapping goals to work well together to address an important societal problem.

He then critiques the pharmaceutical industry for its imprecise knowledge, poor outcome measures, and lack of theoretical grounding:

The key issue is that we are far from having biological knowledge at anywhere close to the precision that we have engineering knowledge. We cannot generate a blueprint specifying how the human body works. … The pharmaceutical industry usually lacks good measures of the efficacy of its interventions. … We also lack a theory of drug efficacy.

His recommendations for improvement target the above weaknesses, in addition to endorsing more collaboration between researchers, engineers, and practitioners:

The pharmaceutical industry needs a much more precise blueprint for the human body; greater knowledge of its interlocking regulatory systems; and accurate monitors of functional defects. It needs clinical doctors working with research scientists and bioengineers.

If we in the education field continue to seek analogies to medicine, then we should heed these criticisms and recommendations. We too need more precise understanding of the processes by which students learn, greater knowledge of how those systems interact, and better assessment of skill and understanding. We also need closer collaboration between educational researchers, learning-environment designers, school administrators, and teachers.

Evidence in educational research

On “Classroom Research and Cargo Cults”:

After many years of educational research, it is disconcerting that we have little dependable research guidance for school policy. We have useful statistics in the form of test scores…. But we do not have causal analyses of these data that could reliably lead to significant improvement.

This offers powerful reading for anyone with an interest in education. Hirsch starts off on a controversial note, but he moves toward principles on which we can all converge: Evidence matters, AND theoretical description of causal mechanisms matters.

The challenge of completing the analogy between educational research and medical research (i.e., finding the education-research analogue to the germ theory of disease) is in developing precise assessment of knowledge. The prior knowledge that is so important in influencing how people learn does not map directly onto a particular location or even pattern of connectivity in the brain. There is no neural “germ” or “molecule” that represents some element of knowledge.

Other tidbits:

  1. Intention to learn may sometimes be a condition for learning, but it is not a necessary or sufficient condition.
  2. Neisser’s law:

    You can get a good deal from rehearsal
    If it just has the proper dispersal.
    You would just be an ass
    To do it en masse:
    Your remembering would turn out much worsal.

  3. I wouldn’t characterize the chick-sexing experiments as the triumph of explicit over implicit learning, but rather, that of carefully structured over wholly naturalistic environments. One can implicitly learn quite effectively from the presentation of examples across boundaries, from prototypes and attractors, and from extremes.

Reporting education research

On “Research Upends Traditional Thinking on Study Habits”:

Psychologists have discovered that some of the most hallowed advice on study habits is flat out wrong.

My thoughts:

  1. This is old news; psychologists have known for decades that spacing and variability promote robust learning.
  2. Learning scientists need to do a better job disseminating their findings.
  3. The real problem is all the popular fluff that lacks grounding in education research, or directly contradicts it, yet gets published anyway.
  4. I should just be glad that this line of work is getting mass media attention now.