Personalized Learning Initiative Interim Report: Findings from 2023-24

Monica P. Bhatt, Terence Chau, Barbara Condliffe, Rebecca Davis, Jean Grossman, Jonathan Guryan, Jens Ludwig, Matteo Magnaricotte, Shira Kolnik Mattera, Fatemeh Momeni, Philip Oreopoulos, and Greg Stoddard

  • mathematics
  • reading
  • K12
  • US
  • blended learning
  • EdTech platform
  • digital assessment

Abstract

This report summarizes the ongoing work by the Personalized Learning Initiative (PLI) research team to understand whether and how scaling high dosage tutoring (HDT) works in the post-pandemic environment. The study involved a large-scale randomized controlled trial with eight partners across the US in the 2023-24 school year. Findings indicate that tutoring is effective overall, with impacts ranging from 0.06 to 0.09 SD. Lower-cost models were found to be just as effective as higher-cost models, and virtual tutoring appeared comparable to in-person tutoring.
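For readers unfamiliar with effect sizes, impacts of 0.06 to 0.09 SD can be read as differences in group means scaled by the standard deviation of test scores. A minimal sketch of that arithmetic, using hypothetical data that is not from the PLI study:

```python
# Illustrative only: computing a treatment impact in SD units.
# Scores below are hypothetical, not PLI data.
import statistics

def effect_size_sd(treatment_scores, control_scores):
    """Difference in means, scaled by the control-group standard deviation."""
    diff = statistics.mean(treatment_scores) - statistics.mean(control_scores)
    return diff / statistics.stdev(control_scores)

# Hypothetical example: treated students score slightly higher on average.
control = [480, 500, 510, 495, 505, 490, 515, 485]
treated = [485, 505, 512, 500, 508, 496, 518, 492]
print(round(effect_size_sd(treated, control), 3))
```

Reported studies typically estimate this quantity from a regression with covariates rather than a raw mean difference, but the interpretation of the SD unit is the same.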

Full Article

ERCT Criteria Breakdown

  • Level 1 Criteria

    • C

      Class-level RCT

      • Randomization includes student-level lotteries, but because the intervention is tutoring, the ERCT tutoring exception applies.
      • Relevant Quotes:
        1) "We randomly assigned eligible students to one of three conditions—HDT, SHDT, or a business as usual group—using individual-level, classroom-level, and teacher-level lotteries." (p. 28)
        2) "HDT - 4:1 tutoring – At 23 of our study schools, tutors worked with groups of four students." (p. 22)
      • Detailed Analysis: Criterion C requires randomization at the class level or higher to reduce contamination, with an exception for personal instruction such as tutoring, where student-level randomization remains acceptable. This study evaluates high dosage tutoring delivered in small groups (for example, 4:1). Randomization is often at the student level (and sometimes at the classroom or grade level), but because the intervention is tutoring, the exception applies. Criterion C is met because the intervention is tutoring and the ERCT tutoring exception allows student-level randomization.
    • E

      Exam-based Assessment

      • Outcomes are measured using established standardized assessments such as state tests, NWEA MAP, i-Ready, STAR, and PSAT/SAT.
      • Relevant Quotes:
        1) "The following assessments are included in the index outcome for Chicago: STAR, i-Ready, Illinois Assessment of Readiness (IAR), PSAT, and SAT." (p. 26)
        2) "The following assessments are included in the index outcome for Fulton: NWEA MAP, Georgia Milestones, i-Ready." (p. 32)
      • Detailed Analysis: Criterion E requires outcomes measured with standardized, widely recognized exams rather than researcher-created tests. Across sites, the report lists standardized state assessments (for example, Georgia Milestones) and widely used standardized tests (for example, NWEA MAP, i-Ready, STAR, PSAT/SAT) used to construct the primary end-of-year outcome indices. Criterion E is met because outcomes rely on established standardized assessments rather than custom tests.
    • T

      Term Duration

      • Interventions and end-of-year outcome measurement span at least an academic term from intervention start to testing.
      • Relevant Quotes:
        1) "We piloted HDT in nine middle schools and SHDT in one school, starting in the second semester (January-May 2024)." (p. 46)
        2) "Tutoring was scheduled for at least 90 minutes per week over 32-36 weeks." (p. 50)
        3) "Tutoring began in February 2024 and lasted for 12 weeks." (p. 60)
      • Detailed Analysis: Criterion T requires that outcomes are measured at least one academic term (approximately 3-4 months) after the intervention begins. The report's outcome measures are end-of-year standardized tests. Several sites ran spring pilots (January-May 2024), which span roughly a term, and other sites ran longer (up to 32-36 weeks). Although some interventions were relatively short (for example, 12 weeks after a February start), the combination of a spring start and end-of-year testing implies a term-scale window from intervention start to outcome measurement. Criterion T is met because outcomes are measured at end of year after interventions that, in the shortest cases, span approximately a term.
    • D

      Documented Control Group

      • The BAU control condition is defined, and detailed baseline balance tables document control group characteristics.
      • Relevant Quotes:
        1) "HDT stands for high dosage tutoring, SHDT stands for sustainable high dosage tutoring, and BAU stands for business as usual (our control group)." (p. 13)
        2) "Table 2: Baseline balance, pooled analysis sample 2023-24, HDT" (p. 14)
      • Detailed Analysis: Criterion D requires a clearly defined and described control group, including baseline characteristics. The report defines BAU as the control group and provides extensive baseline balance tables for pooled and site-specific samples, reporting control means, sample sizes, and baseline covariates (for example, demographics and prior scores). Criterion D is met because the BAU control group is defined and its baseline characteristics are documented in detailed balance tables.
  • Level 2 Criteria

    • S

      School-level RCT

      • Randomization is at the student, classroom, teacher, or grade level, not at the school level.
      • Relevant Quotes:
        1) "Randomization Level... Student... Classroom... Grade" (Table 1, p. 13)
      • Detailed Analysis: Criterion S requires randomization at the school level. Table 1 reports that randomization occurred at the student, classroom, teacher, or grade level across sites; no site is described as using school-level randomization. Criterion S is not met because the study does not randomize at the school level.
    • I

      Independent Conduct

      • Researchers co-designed the tutoring models with partner districts, so the evaluation was not fully independent of intervention design.
      • Relevant Quotes:
        1) "In each of our PLI sites, we partnered with policymakers and district and school practitioners to co-design how to operationalize the tenets of HDT and SHDT to fit their local context." (p. 12)
      • Detailed Analysis: Criterion I requires that evaluators are independent of the intervention's design and delivery. The report states that the research team partnered with district and school practitioners to co-design how the tutoring models would be operationalized, indicating direct involvement in intervention design rather than evaluation of an externally designed, independently delivered program. Criterion I is not met because the research team co-designed the interventions with practitioners.
    • Y

      Year Duration

      • Several sites implemented tutoring only in spring 2024 (or for about 12 weeks), which is shorter than a full academic year.
      • Relevant Quotes:
        1) "We piloted HDT in nine middle schools and SHDT in one school, starting in the second semester (January-May 2024)." (p. 46)
        2) "Tutoring began in February 2024 and lasted for 12 weeks." (p. 60)
      • Detailed Analysis: Criterion Y requires that outcomes are measured at least one full academic year after the intervention begins. Several sites implemented tutoring only in the second semester, or for around 12 weeks beginning in February 2024, which is substantially shorter than an academic year from intervention start to outcome measurement. Criterion Y is not met because key sites implemented tutoring for less than a full academic year.
    • B

      Balanced Control Group

      • The study explicitly tests the effect of providing additional tutoring resources relative to business as usual.
      • Relevant Quotes:
        1) "HDT stands for high dosage tutoring, SHDT stands for sustainable high dosage tutoring, and BAU stands for business as usual (our control group)." (p. 13)
        2) "Randomization status was also preserved as we see low control crossover overall—less than 3% of students assigned to the control group received tutoring services, though they did receive all other services a school had to offer." (p. 16)
        3) "We randomly assigned eligible students to one of three conditions—HDT, SHDT, or a business as usual group—using individual-level, classroom-level, and teacher-level lotteries." (p. 28)
      • Detailed Analysis: Criterion B requires balanced time and resources across treatment and control, unless the study explicitly tests additional resources as the treatment variable. Here, the intervention is tutoring itself, which necessarily adds instructional time and staffing relative to BAU, and the report frames BAU as the control contrasted with assignment to the tutoring conditions. This is a classic case where additional instructional resources (tutoring time, tutors) are the treatment variable by design. Criterion B is met because the study explicitly tests the impact of providing additional tutoring resources relative to BAU.
  • Level 3 Criteria

    • R

      Reproduced

      • No independent peer-reviewed replication of this specific PLI 2023-24 study design was identified.
      • Relevant Quotes:
        1) "This interim report presents the impact of tutoring on this sample of students in the 2023-2024 school year, with the subsequent sections focusing on site-by-site impacts." (p. 21)
      • Detailed Analysis: Criterion R requires an independent replication by other authors in another context, published as a peer-reviewed study. This document is an interim multi-site impact report; it neither presents itself as a replication of a prior study nor cites an independent replication of this specific PLI design, and a web search did not identify peer-reviewed replication studies by independent teams. Criterion R is not met because there is no evidence of an independent peer-reviewed replication of this specific study.
    • A

      All-subject Exams

      • Primary outcomes are standardized tests in the tutored subject, not across all core subjects.
      • Relevant Quotes:
        1) "The outcome of interest is an index of all available relevant EOY standardized test scores in the tutored subject." (p. 7)
      • Detailed Analysis: Criterion A requires measuring standardized exam outcomes across all core subjects, not only the tutored subject. The report consistently defines the primary outcome as end-of-year standardized tests in the tutored subject (math or reading, depending on site) and does not report standardized outcomes for non-tutored core subjects. Criterion A is not met because outcomes are limited to the tutored subject rather than all core subjects.
    • G

      Graduation Tracking

      • This interim report presents end-of-year outcomes and does not track students to graduation; additionally, Criterion Y is not met.
      • Relevant Quotes:
        1) "This interim report presents the impact of tutoring on this sample of students in the 2023-2024 school year, with the subsequent sections focusing on site-by-site impacts." (p. 21)
        2) "After sample enrollment and program implementation is complete in 2026, future reports will present findings about the average impact across all years and sites..." (p. 21)
      • Detailed Analysis: Criterion G requires tracking participants until graduation. This document is an interim report focused on end-of-year outcomes for the 2023-24 school year and describes planned future reporting after enrollment and implementation end. No graduation follow-up is reported here, and because Criterion Y (year duration) is not met, Criterion G cannot be met under the ERCT dependency rule. A web search for follow-up publications by the same authors did not identify any papers reporting graduation outcomes for the PLI cohorts. Criterion G is not met because the study does not track students to graduation (and Y is not met).
    • P

      Pre-Registered

      • An OSF link is provided, but the pre-registration timestamp could not be verified to precede the study start.
      • Relevant Quotes:
        1) "The pre-analysis plan for this study is posted on Open Science Framework at the link below. https://osf.io/fkjmn/" (p. 66)
      • Detailed Analysis: Criterion P requires that the study protocol (or analysis plan) is publicly registered before the study begins, and that this timing can be verified from the registry record (for example, a registration date prior to data collection or randomization). The report provides an OSF link to a pre-analysis plan, but the OSF project page and file metadata (creation and upload dates) could not be accessed reliably during this review, so the registration timestamp relative to the study start could not be independently verified. Criterion P is not met because an OSF link alone is not sufficient without a verifiable pre-registration date preceding the study start.
