Mapping the Mechanisms of Interdisciplinary Learning Transfer from Reading to Math Achievement: Evidence from a Large-Scale Randomized Controlled Trial

Joshua B. Gilbert and James S. Kim

Published:
ERCT Check Date:
DOI: 10.26300/nxqq-jc13
  • reading
  • mathematics
  • K12
  • US
  • parent involvement
  • EdTech app
ERCT Level: 1
  • C

    Criterion C is met because random assignment occurred at the school level, which is class-level or stronger.

    "Thirty schools were randomly assigned to treatment and control conditions." (p. 6)

  • E

    Criterion E is met because outcomes include standardized exam-based measures (EOG and MAP).

    "EOG and MAP are standardized tests administered by the state in G3 and G4 in both math and reading and report high internal consistencies (α ≈ .90)." (p. 9)

  • T

    Criterion T is met because outcomes were measured well beyond one term after the intervention began.

    "Immediate outcomes were assessed in the spring of G3 and long-term outcomes were assessed one year following program implementation in spring G4." (p. 6)

  • D

    Criterion D is met because the control condition and baseline comparability information are explicitly documented.

    "The MORE intervention was then implemented in treatment schools from G1 spring to Grade 2 (G2) spring while control schools received business as usual instruction." (p. 6)

  • S

    Criterion S is met because randomization occurred at the school level.

    "Thirty schools were randomly assigned to treatment and control conditions." (p. 6)

  • I

    Criterion I is not met because the paper does not provide explicit evidence that the evaluation was conducted independently from the intervention designers.

    "The design and theory of change of the MORE intervention has been previously described in various publications (Gilbert et al., 2023; J. S. Kim et al., 2023, 2024; Mosher & Kim, 2025)." (p. 5)

  • Y

    Criterion Y is met because outcomes were tracked across multiple school years from the start of implementation through spring of Grade 4.

    "Immediate outcomes were assessed in the spring of G3 and long-term outcomes were assessed one year following program implementation in spring G4." (p. 6)

  • B

    Criterion B is met because the additional resources (PD, lessons, books, and a digital app) are integral to the MORE intervention package being tested against business-as-usual.

    "In short, MORE emphasizes the development of schemas to build domain knowledge in science and social studies through the implementation of teacher professional development, lessons, read alouds, provision of books to the home, and a digital app for students to practice reading skills." (p. 5)

  • R

    Criterion R is not met because no independent replication by a different research team is identified or documented.

    "This study analyzes data from the Model of Reading Engagement (MORE), a sustained content literacy intervention implemented in Grades 1-3 that demonstrated positive treatment effects on both near transfer reading and far transfer math outcomes in a prior study." (p. 1)

  • A

    Criterion A is not met because the standardized exam outcomes reported are limited to reading and math, not all main subjects.

    "EOG and MAP are standardized tests administered by the state in G3 and G4 in both math and reading..." (p. 9)

  • G

    Criterion G is not met because follow-up is reported through spring of Grade 4, not through graduation.

    "Immediate outcomes were assessed in the spring of G3 and long-term outcomes were assessed one year following program implementation in spring G4." (p. 6)

  • P

    Criterion P is not met because no pre-registration ID/link and pre-intervention registration date are provided or verifiable from accessible sources.

Abstract

Far transfer—the application of learning across distant domains—remains elusive in intervention research, and even when it is found, its mechanisms remain unclear or unexplored. This study analyzes data from the Model of Reading Engagement (MORE), a sustained content literacy intervention implemented in Grades 1-3 that demonstrated positive treatment effects on both near transfer reading and far transfer math outcomes in a prior study. Here, we extend the original analysis to examine the potential mechanisms of the far transfer effects previously observed on math. Latent mediation analysis shows that approximately 50% of the treatment effect on Grade 4 math is explained by Grade 3 reading, leaving the remainder attributable to other factors. The indirect effects on math are driven by broad standardized reading measures rather than narrower content-specific reading comprehension or background knowledge, suggesting that interventions targeting broad, cross-disciplinary skills may be most effective for supporting far transfer. Results are robust to high levels of unobserved confounding, alternative mediators representing reading engagement and social-emotional learning, and alternative model specifications. We conclude with a discussion of how the appropriate methodological choices for assessing transfer depend on intervention characteristics and substantive research questions.
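
The "approximately 50%" figure in the abstract can be read as a proportion mediated. The following is a minimal sketch, assuming the standard product-of-coefficients decomposition (total effect = indirect effect + direct effect). The paper's latent SEM may compute this quantity differently, and the function name and all numeric values below are hypothetical illustrations, not estimates from the study.

```python
def proportion_mediated(a: float, b: float, direct: float) -> float:
    """Share of the total effect that flows through the mediator.

    a      -- effect of treatment on the mediator (e.g. Grade 3 reading)
    b      -- effect of the mediator on the outcome (e.g. Grade 4 math)
    direct -- effect of treatment on the outcome that bypasses the mediator
    """
    indirect = a * b
    total = indirect + direct
    if total == 0:
        raise ValueError("total effect is zero; proportion mediated is undefined")
    return indirect / total


# Hypothetical coefficients, chosen only so that the indirect and direct
# paths are equal in size (they are NOT taken from the paper).
share = proportion_mediated(a=0.2, b=0.5, direct=0.1)
print(f"{share:.2f}")
```

When the indirect and direct paths are equal, as in these made-up numbers, half of the total effect runs through the mediator, which is the kind of split the abstract describes.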

ERCT Criteria Breakdown

  • Level 1 Criteria

    • C

      Class-level RCT

      • Criterion C is met because random assignment occurred at the school level, which is class-level or stronger.
      • "Thirty schools were randomly assigned to treatment and control conditions." (p. 6)
      • Relevant Quotes:
        1) "Thirty schools were randomly assigned to treatment and control conditions." (p. 6)
        2) "Because students were nested in 30 schools and the randomization was carried out at the school level, we apply cluster-robust standard errors at the school level, following the original study." (p. 12)
      • Detailed Analysis: Criterion C requires that the study be an RCT with randomization at the class level (or stronger, e.g., school-level), to reduce within-class contamination. The paper explicitly states that "Thirty schools were randomly assigned," which is stronger than class-level assignment because entire schools (and thus their classrooms) were assigned to a condition. The second quote reiterates that randomization "was carried out at the school level," confirming the unit of randomization. Criterion C is met because the unit of randomization is the school, which is stronger than class-level randomization.
    • E

      Exam-based Assessment

      • Criterion E is met because outcomes include standardized exam-based measures (EOG and MAP).
      • "EOG and MAP are standardized tests administered by the state in G3 and G4 in both math and reading and report high internal consistencies (α ≈ .90)." (p. 9)
      • Relevant Quotes:
        1) "Our analytic sample is comprised of 2073 students with G4 math outcome data in either the End of Grade (EOG) or Measure of Academic Progress (MAP) assessments." (p. 9)
        2) "EOG and MAP are standardized tests administered by the state in G3 and G4 in both math and reading and report high internal consistencies (α ≈ .90)." (p. 9)
      • Detailed Analysis: Criterion E requires the use of standardized, exam-based assessments, not only researcher-created tests tailored to the intervention. The paper identifies EOG and MAP assessments for the key academic outcomes and explicitly labels them "standardized tests" used in both reading and math. The paper also includes researcher-developed reading measures, but the presence of standardized EOG and MAP assessments satisfies the requirement for exam-based assessment. Criterion E is met because standardized exams (EOG and MAP) are used for academic outcomes.
    • T

      Term Duration

      • Criterion T is met because outcomes were measured well beyond one term after the intervention began.
      • "Immediate outcomes were assessed in the spring of G3 and long-term outcomes were assessed one year following program implementation in spring G4." (p. 6)
      • Relevant Quotes:
        1) "The MORE intervention was then implemented in treatment schools from G1 spring to Grade 2 (G2) spring while control schools received business as usual instruction." (p. 6)
        2) "Immediate outcomes were assessed in the spring of G3 and long-term outcomes were assessed one year following program implementation in spring G4." (p. 6)
        3) "Perhaps most surprisingly, the results also showed positive far transfer effects on state standardized math tests persisting in G4, 14 months following the end of the intervention." (p. 6)
      • Detailed Analysis: Criterion T requires that outcomes be measured at least one academic term after the intervention begins (roughly 3 to 4 months, depending on context). The intervention began in "G1 spring" and outcomes were measured in "spring of G3" and again in "spring G4," which is far beyond a single term after the start of the intervention. The paper also notes persistence of effects "14 months following the end of the intervention," further indicating substantial follow-up time. Criterion T is met because the study measures outcomes far more than one term after intervention start.
    • D

      Documented Control Group

      • Criterion D is met because the control condition and baseline comparability information are explicitly documented.
      • "The MORE intervention was then implemented in treatment schools from G1 spring to Grade 2 (G2) spring while control schools received business as usual instruction." (p. 6)
      • Relevant Quotes:
        1) "The MORE intervention was then implemented in treatment schools from G1 spring to Grade 2 (G2) spring while control schools received business as usual instruction." (p. 6)
        2) "Table 2 provides a balance test of the baseline test score and demographic variables by treatment condition for the full sample." (p. 14)
        3) "There are no significant differences on the demographic variables." (p. 14)
      • Detailed Analysis: Criterion D requires a documented control group, including what the control group received and sufficient baseline information to support meaningful comparisons. The paper clearly states the control condition as "business as usual instruction," which documents what the control schools received during the intervention window. The paper also reports baseline comparability via a balance test ("Table 2") and explicitly summarizes demographic balance in the text, indicating the control and treatment groups are documented and compared at baseline. Criterion D is met because the control condition and baseline balance information are explicitly documented.
  • Level 2 Criteria

    • S

      School-level RCT

      • Criterion S is met because randomization occurred at the school level.
      • "Thirty schools were randomly assigned to treatment and control conditions." (p. 6)
      • Relevant Quotes:
        1) "Thirty schools were randomly assigned to treatment and control conditions." (p. 6)
        2) "Because students were nested in 30 schools and the randomization was carried out at the school level, we apply cluster-robust standard errors at the school level, following the original study." (p. 12)
      • Detailed Analysis: Criterion S requires that randomization occur among schools (i.e., that the RCT is school-level), not only among classes or students within schools. The paper explicitly indicates that "Thirty schools were randomly assigned," and it reiterates that randomization was "carried out at the school level." Criterion S is met because the unit of random assignment is the school.
    • I

      Independent Conduct

      • Criterion I is not met because the paper does not provide explicit evidence that the evaluation was conducted independently from the intervention designers.
      • "The design and theory of change of the MORE intervention has been previously described in various publications (Gilbert et al., 2023; J. S. Kim et al., 2023, 2024; Mosher & Kim, 2025)." (p. 5)
      • Relevant Quotes:
        1) "The design and theory of change of the MORE intervention has been previously described in various publications (Gilbert et al., 2023; J. S. Kim et al., 2023, 2024; Mosher & Kim, 2025)." (p. 5)
        2) "To explore the mechanisms of far transfer from reading to math achievement, we use structural equation modeling (SEM), implemented with lavaan software in R (Rosseel, 2012)." (p. 11)
      • Detailed Analysis: Criterion I requires quoted evidence that the study was conducted independently from the designers/providers of the intervention, such as an external evaluation team, or a clear disclosure that the provider did not participate in data collection and analysis. The paper positions MORE as an intervention whose "design and theory of change" are described in prior publications that include the present authors' names (Gilbert; Kim), suggesting close involvement with the intervention's development and research program. The paper describes the analytic methods used in this paper, but it does not provide a clear statement that an independent third party implemented the intervention and/or independently conducted data collection and analysis. Criterion I is not met because independence from the intervention designers is not explicitly documented in the paper.
    • Y

      Year Duration

      • Criterion Y is met because outcomes were tracked across multiple school years from the start of implementation through spring of Grade 4.
      • "Immediate outcomes were assessed in the spring of G3 and long-term outcomes were assessed one year following program implementation in spring G4." (p. 6)
      • Relevant Quotes:
        1) "The MORE intervention was then implemented in treatment schools from G1 spring to Grade 2 (G2) spring while control schools received business as usual instruction." (p. 6)
        2) "The treatment-control contrast was therefore a 3-year “full spiral” (G1, G2, G3) of MORE for treatment students compared to a 1-year “partial spiral” (G3 only) for control students." (p. 6)
        3) "Immediate outcomes were assessed in the spring of G3 and long-term outcomes were assessed one year following program implementation in spring G4." (p. 6)
      • Detailed Analysis: Criterion Y requires that outcomes be measured at least 75% of an academic year after the intervention begins. The paper describes a longitudinal design with implementation starting in Grade 1 (G1) and outcomes assessed in spring of Grade 3 and again in spring of Grade 4, spanning multiple school years from the start of implementation to the final follow-up. The "3-year 'full spiral' (G1, G2, G3)" description further indicates the study duration is well beyond the year-length threshold. Criterion Y is met because follow-up timing spans multiple school years, exceeding 75% of an academic year.
    • B

      Balanced Control Group

      • Criterion B is met because the additional resources (PD, lessons, books, and a digital app) are integral to the MORE intervention package being tested against business-as-usual.
      • "In short, MORE emphasizes the development of schemas to build domain knowledge in science and social studies through the implementation of teacher professional development, lessons, read alouds, provision of books to the home, and a digital app for students to practice reading skills." (p. 5)
      • Relevant Quotes:
        1) "In short, MORE emphasizes the development of schemas to build domain knowledge in science and social studies through the implementation of teacher professional development, lessons, read alouds, provision of books to the home, and a digital app for students to practice reading skills." (p. 5)
        2) "The MORE intervention was then implemented in treatment schools from G1 spring to Grade 2 (G2) spring while control schools received business as usual instruction." (p. 6)
        3) "As a response to the COVID-19 pandemic in 2020, MORE lessons and materials were provided to all Grade 3 (G3) students in both conditions during online schooling (Relyea et al., 2025)." (p. 6)
        4) "The treatment-control contrast was therefore a 3-year “full spiral” (G1, G2, G3) of MORE for treatment students compared to a 1-year “partial spiral” (G3 only) for control students." (p. 6)
      • Detailed Analysis: Criterion B asks whether time and resources are balanced between treatment and control, unless the additional resources are the explicit, integral treatment being tested. The paper defines MORE as a multi-component intervention package that includes teacher professional development, structured lessons, read-alouds, home book provision, and a digital app. These components imply additional materials and supports relative to typical instruction, but they are described as core elements of the intervention itself (i.e., integral to what "MORE" is). The control condition is described as "business as usual instruction," which is a valid comparison when the intended estimand is the effect of adopting the MORE package versus typical practice. The COVID-period note indicates that, in Grade 3, MORE materials were provided to both conditions, making the key contrast one of cumulative exposure ("full spiral" vs "partial spiral"), not a one-time resource shock. Criterion B is met because the extra resources are integral to the intervention package being evaluated against business-as-usual.
  • Level 3 Criteria

    • R

      Reproduced

      • Criterion R is not met because no independent replication by a different research team is identified or documented.
      • "This study analyzes data from the Model of Reading Engagement (MORE), a sustained content literacy intervention implemented in Grades 1-3 that demonstrated positive treatment effects on both near transfer reading and far transfer math outcomes in a prior study." (p. 1)
      • Relevant Quotes:
        1) "This study analyzes data from the Model of Reading Engagement (MORE), a sustained content literacy intervention implemented in Grades 1-3 that demonstrated positive treatment effects on both near transfer reading and far transfer math outcomes in a prior study." (p. 1)
        2) "We analyze data testing the efficacy of a sustained content literacy intervention called the Model of Reading Engagement (MORE) through a longitudinal randomized controlled trial (RCT) (J. S. Kim et al., 2024)." (p. 5)
      • Detailed Analysis: Criterion R requires evidence that the study (or the central experimental claim) has been independently reproduced by other researchers (i.e., a different author team) in a different context and published in a peer-reviewed outlet. The paper references a "prior study" and identifies the original RCT results as coming from "J. S. Kim et al., 2024," but this is not an independent replication; it is the same research program and is referenced as the original evidence base that this paper extends. An internet search was conducted for independent replications of the MORE intervention; no clearly independent, peer-reviewed replication study by a different research team was identified that reproduces this specific 30-school cluster RCT. Criterion R is not met because independent replication is not documented or identified.
    • A

      All-subject Exams

      • Criterion A is not met because the standardized exam outcomes reported are limited to reading and math, not all main subjects.
      • "EOG and MAP are standardized tests administered by the state in G3 and G4 in both math and reading..." (p. 9)
      • Relevant Quotes:
        1) "EOG and MAP are standardized tests administered by the state in G3 and G4 in both math and reading and report high internal consistencies (α ≈ .90)." (p. 9)
        2) "Perhaps most surprisingly, the results also showed positive far transfer effects on state standardized math tests persisting in G4, 14 months following the end of the intervention." (p. 6)
      • Detailed Analysis: Criterion A requires standardized exam-based assessment across all main subjects taught at the relevant educational level, not only the focal subject(s), to detect cross-subject tradeoffs. The paper explicitly describes standardized testing coverage for "both math and reading" and frames the key near- and far-transfer outcomes around reading and math. The paper does not document standardized exam outcomes for additional core subjects (such as science or social studies) as primary outcomes in the study, nor does it provide an explicit rationale for treating only reading and math as "all main subjects." Because Criterion A is about breadth of subject coverage (and not merely whether multiple subjects were tested), the available quotes support that the study's standardized exam outcomes are limited to reading and math. Criterion A is not met because standardized exam outcomes are not reported across all main subjects.
    • G

      Graduation Tracking

      • Criterion G is not met because follow-up is reported through spring of Grade 4, not through graduation.
      • "Immediate outcomes were assessed in the spring of G3 and long-term outcomes were assessed one year following program implementation in spring G4." (p. 6)
      • Relevant Quotes:
        1) "Immediate outcomes were assessed in the spring of G3 and long-term outcomes were assessed one year following program implementation in spring G4." (p. 6)
        2) "Perhaps most surprisingly, the results also showed positive far transfer effects on state standardized math tests persisting in G4, 14 months following the end of the intervention." (p. 6)
      • Detailed Analysis: Criterion G requires tracking participants until graduation from the relevant educational stage (e.g., end of elementary, secondary, or high school, as applicable). The paper documents follow-up through spring of Grade 4, including a follow-up described as "14 months following the end of the intervention." This is meaningful longer-term follow-up, but it is not tracking to graduation. An internet search for follow-up publications by the same authors that track this cohort to graduation did not identify any such publication providing graduation outcomes for this cohort. Criterion G is not met because the study's reported follow-up ends in Grade 4 and does not track students to graduation.
    • P

      Pre-Registered

      • Criterion P is not met because no pre-registration ID/link and pre-intervention registration date are provided or verifiable from accessible sources.
      • Relevant Quotes:
        1) (No pre-registration registry ID, link, or registration date is provided in the paper.)
      • Detailed Analysis: Criterion P requires quoted evidence that the protocol was pre-registered before data collection or intervention began, including a registry identifier (or link) and a registration date that can be compared to study start. The PDF contains no mention of pre-registration, no registry ID, and no registration date. An attempt was made to verify a possible AEA RCT Registry entry via publicly linked sources; however, the registry pages were not accessible for verification in this check environment, and no alternative accessible official registry record with dates was found. Criterion P is not met because pre-registration cannot be verified from the paper or accessible registry records.
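
The twelve verdicts above can be tallied into an overall level. The following is a minimal sketch, assuming that a level counts as achieved only when every criterion at that level and at all lower levels is met. That rule, and the names `RESULTS`, `LEVELS`, and `erct_level`, are assumptions made for illustration rather than something stated on this page.

```python
# Verdicts as reported in the criteria breakdown (True = met).
RESULTS = {
    "C": True, "E": True, "T": True, "D": True,       # Level 1 criteria
    "S": True, "I": False, "Y": True, "B": True,      # Level 2 criteria
    "R": False, "A": False, "G": False, "P": False,   # Level 3 criteria
}

# Grouping of criteria by level, in the order used in the breakdown.
LEVELS = [
    ("C", "E", "T", "D"),
    ("S", "I", "Y", "B"),
    ("R", "A", "G", "P"),
]


def erct_level(results: dict[str, bool]) -> int:
    """Return the highest level whose criteria, and all lower ones, are met.

    Assumed rule: levels are cumulative, so a single unmet criterion
    stops the count at the previous level.
    """
    level = 0
    for number, criteria in enumerate(LEVELS, start=1):
        if not all(results[c] for c in criteria):
            break
        level = number
    return level


met = sorted(c for c, ok in RESULTS.items() if ok)
print(f"criteria met: {len(met)} of {len(RESULTS)}")
print(f"level under the assumed rule: {erct_level(RESULTS)}")
```

Under this assumed rule, the single unmet Level 2 criterion (I) holds the study at Level 1 even though S, Y, and B are met.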
