Interleaved Practice Improves Mathematics Learning

Doug Rohrer, Robert F. Dedrick, and Sandra Stershic

Published:
ERCT Check Date:
DOI: 10.1037/edu0000001
  • mathematics
  • K12
  • US

Abstract

A typical mathematics assignment consists primarily of practice problems requiring the strategy introduced in the immediately preceding lesson (e.g., a dozen problems that are solved by using the Pythagorean theorem). This means that students know which strategy is needed to solve each problem before they read the problem. In an alternative approach known as interleaved practice, problems from the course are rearranged so that a portion of each assignment includes different kinds of problems in an interleaved order. Interleaved practice requires students to choose a strategy on the basis of the problem itself, as they must do when they encounter a problem during a comprehensive examination or subsequent course. In the experiment reported here, 126 seventh-grade students received the same practice problems over a 3‑month period, but the problems were arranged so that skills were learned by interleaved practice or by the usual blocked approach. The practice phase concluded with a review session, followed 1 or 30 days later by an unannounced test. Compared with blocked practice, interleaved practice produced higher scores on both the immediate and delayed tests (Cohen’s ds = 0.42 and 0.79, respectively).
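
To make the blocked/interleaved contrast concrete, the short Python sketch below rearranges one hypothetical pool of practice problems both ways. The skill labels and problem counts are invented for illustration and are not the paper's actual assignment contents.

    # Hypothetical illustration of blocked vs. interleaved arrangement of the
    # same practice problems. Skill labels and counts are invented; the paper's
    # assignments are not reproduced here.

    # Four practice problems for each of three skills (labels are hypothetical).
    problems = {
        "graph": [f"graph-{i}" for i in range(1, 5)],
        "slope": [f"slope-{i}" for i in range(1, 5)],
        "proportion": [f"proportion-{i}" for i in range(1, 5)],
    }

    # Blocked practice: every problem of one kind before moving on to the next,
    # so the needed strategy is known before a problem is read.
    blocked = [p for plist in problems.values() for p in plist]

    # Interleaved practice: the identical problems in round-robin order, so each
    # problem requires choosing a strategy from the problem itself.
    interleaved = [p for trio in zip(*problems.values()) for p in trio]

    print("blocked:    ", blocked)
    print("interleaved:", interleaved)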


ERCT Criteria Breakdown

  • Level 1 Criteria

    • C

      Class-level RCT

      • Randomization was conducted at the class level with intact classes assigned to each condition.
      • “Two of the classes were designated as ‘honors/gifted’ by the school, and these two classes were randomly assigned to different groups. The remaining seven classes were deemed by the school as being at the same level, and each of these classes was randomly assigned to one of the two groups.” (p. 5)
      • Analysis: Randomization occurred at the class level, with intact classes assigned to either interleaved or blocked practice. Under the ERCT standard, class-level randomization prevents contamination between conditions.
      • Verdict: Criterion C is met because the study randomized intact classes to treatment and control groups.
    • E

      Exam-based Assessment

      • The study employed custom-designed tests of graphing and slope problems rather than a recognized standardized exam.
      • “The test booklet included a cover sheet, a sheet of paper with three graph problems, and a sheet of paper with three slope problems. We created six versions of the test by reordering the problems within each page ... None of the problems had appeared in either a practice assignment or the review.” (p. 6)
      • Analysis: Outcomes were measured with tests created by the authors for this study, not with any widely recognized standardized exam; there is no indication that a state or national test instrument was used.
      • Verdict: Criterion E is not met because the study used custom tests rather than standardized examinations.
    • T

      Term Duration

      • The study measured outcomes after approximately three months, satisfying the term-duration requirement.
      • “Students received the 10 assignments on Days 1, 6, 14, 32–33, 33 or 35, 35 or 38, 45–46, 72–75, 81–82, and 86–88. Every student received the same review on Day 93, about 5 days after the last assignment. Students were tested 1 or 30 days after the review.” (pp. 5–6)
      • Analysis: The intervention and outcome measurements spanned approximately 93 days, about one full academic term (three months), and final assessments were conducted after that term-long interval.
      • Verdict: Criterion T is met because outcomes were measured at least one full academic term after the intervention began.
    • D

      Documented Control Group

      • The control group’s size and baseline comparability (NAEP scores) are documented in detail.
      • “Practice schedule was a counterbalanced within-subject variable. Students in Group 1 received interleaved practice … and Group 2 received the reverse. Group 1 (n = 59) … Group 2 (n = 67) …” (p. 5)
      • “The two groups scored similarly well on a test consisting of six multiple-choice problems from the Grade‑8 NAEP … 76% (SD 22%) vs. 79% (SD 22%), t(108) = 0.62, p = .54, Cohen’s d = 0.12.” (p. 5)
      • Analysis: The control (blocked) group’s size and baseline comparability via NAEP scores are documented, and both groups received identical instructional materials and review. A worked sketch of the effect-size arithmetic appears after this breakdown.
      • Verdict: Criterion D is met because the control group’s makeup, size, and baseline performance are clearly documented.
  • Level 2 Criteria

    • S

      School-level RCT

      • Randomization was at the class level within one school, not across multiple schools.
      • “Two of the classes were designated as ‘honors/gifted’ … randomly assigned … The remaining seven classes … were randomly assigned …” (p. 5)
      • Analysis: Randomization was at the class level within a single school; there is no evidence that entire schools were randomized.
      • Verdict: Criterion S is not met because randomization did not occur at the school level.
    • I

      Independent Conduct

      • The study was conducted and scored by the authors, with no independent external evaluation.
      • “One or more authors visited the school and scored each student’s assignment …” (p. 5)
      • “Each test was scored … by two raters who were blind to condition …” (p. 6)
      • Analysis: The authors designed, delivered, and scored the intervention and assessments themselves; no independent third-party evaluation is reported.
      • Verdict: Criterion I is not met because the same research team conducted and assessed the intervention without external oversight.
    • Y

      Year Duration

      • Outcomes were measured within three months, not tracked over an academic year.
      • “Students received the 10 assignments … Day 93 … Students were tested 1 or 30 days after the review.” (pp. 5–6)
      • Analysis: The study lasted roughly three months, far shorter than a full academic year, and outcomes were not tracked over a year.
      • Verdict: Criterion Y is not met because the study did not track outcomes for a full academic year.
    • B

      Balanced Resources

      • Both groups received the same number of assignments, problems, and review sessions, ensuring balanced time and resources.
      • “In the experiment reported here, 126 seventh-grade students received the same practice problems over a 3‑month period, but the problems were arranged so that skills were learned by interleaved practice or by the usual blocked approach. The practice phase concluded with a review session, followed 1 or 30 days later by an unannounced test.” (pp. 1–2)
      • Analysis: Both groups received identical assignments and review sessions, differing only in whether problems were ordered in an interleaved or blocked fashion; neither group received additional resources or instructional time.
      • Verdict: Criterion B is met because both groups had equal instructional time and materials.
  • Level 3 Criteria

    • R

      Reproduced Results

      • There is no evidence of an independent replication study by a different research team that confirms these findings.
      • “Although test scores showed positive effects for those who received the interleaved practice style over those who received blocked practice, this was not statistically significant.” (Ostrow et al., 2015)
      • Analysis: An independent classroom study of interleaved practice (Ostrow et al., 2015) found only a non-significant trend favoring interleaving, and no other external team has fully replicated this study’s findings.
      • Verdict: Criterion R is not met because no independent replication has confirmed the results.
    • A

      All Exams

      • The study only assessed mathematics graphing and slope problems, not a full range of subjects.
      • “Students received graph problems and slope problems … None of the problems had appeared in either a practice assignment or the review.” (pp. 1, 6)
      • Analysis: The outcomes focused exclusively on graphing and slope problems in mathematics; no other subjects were assessed.
      • Verdict: Criterion A is not met because only graphing and slope problems were assessed.
    • G

      Graduation Tracking

      • Follow-up ended at 30 days post-review, with no tracking until graduation.
      • “Students were tested 1 or 30 days after the review …” (p. 6)
      • Analysis: Follow-up ended 30 days after the review, with no tracking of students through graduation.
      • Verdict: Criterion G is not met because no long-term tracking through graduation was conducted.
    • P

      Pre-Registered Protocol

      • The paper does not report a pre-registration or registry before the intervention began.
      • Analysis: The paper does not reference any pre-registration or registry before data collection, and no pre-study registration was found elsewhere.
      • Verdict: Criterion P is not met because the study lacks any evidence of pre-registration.
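
For readers less familiar with the effect-size metric cited above (Cohen’s d = 0.12 at baseline, ds = 0.42 and 0.79 at test), the Python sketch below shows how d and an independent-samples t statistic can be computed from summary statistics, using the baseline figures quoted under criterion D. The group sizes entered here (59 and 67) are the full enrolment figures; because the paper reports t(108), slightly fewer students actually took the baseline test, so the output is illustrative rather than an exact reproduction of the reported values.

    # Illustrative only: recomputes a baseline comparison like the one quoted
    # under criterion D (76% vs. 79%, SD = 22% in both groups). Group sizes are
    # the full enrolment figures (59 and 67); the paper's df = 108 implies that
    # slightly fewer students took the baseline test, so this will not exactly
    # match the reported t = 0.62, p = .54, d = 0.12.
    from math import sqrt

    from scipy.stats import ttest_ind_from_stats

    m1, sd1, n1 = 76.0, 22.0, 59   # group 1 mean, SD, size (size assumed)
    m2, sd2, n2 = 79.0, 22.0, 67   # group 2 mean, SD, size (size assumed)

    # Pooled standard deviation and Cohen's d = mean difference / pooled SD.
    pooled_sd = sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    d = (m2 - m1) / pooled_sd

    # Independent-samples (Student's) t-test from summary statistics.
    t, p = ttest_ind_from_stats(m1, sd1, n1, m2, sd2, n2, equal_var=True)

    print(f"Cohen's d = {d:.2f}, t = {t:.2f}, p = {p:.2f}")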
