Efficacy of Zearn Math over two years in grades 3 to 5: An experiment in Texas

John F. Pane, Christopher Doss, Ivy Todd, Dorothy Seaman

Published:
ERCT Check Date:
DOI: 10.26300/e3bq-7g59
  • mathematics
  • K12
  • US
  • gamification
  • blended learning
  • EdTech platform
  • digital assessment
  • formative assessment
2
  • C

    The study is randomized at the school level, which satisfies class-level randomization.

    "We randomly selected 64 of those schools to comprise the study sample. We used blocked randomization, stratifying by schoolwide percentages of students who were economically disadvantaged, English-learners, and proficient in mathematics in the 2020-2021 academic year." (p. 14)

  • E

    Outcomes were measured using Texas STAAR, a standardized state assessment.

    "The STAAR assessment, as the state accountability test, is designed to measure student proficiency on Texas grade-level mathematics standards starting in grade 3." (p. 14)

  • T

    The study ran across two school years, exceeding the minimum one-term requirement.

    "Sixty-four schools within one urban school district in Texas participated in this study over the 2022-2023 and 2023-2024 school years." (p. 14)

  • D

    The control condition is clearly described and supported by detailed baseline and implementation documentation.

    "Control group schools were asked not to use Zearn Math and to continue with business as usual for the two years." (p. 14)

  • S

    Random assignment was implemented at the school level (64 schools).

    "Within each block, half of the schools were assigned to the treatment group and half to the control group, resulting in 32 schools assigned to each condition." (p. 14)

  • I

    RAND conducted the evaluation independently of Zearn, supported by IES funding.

    "The RAND team then received a grant from the Institute of Education Sciences, U.S. Department of Education, to conduct the study independently of Zearn." (p. 2)

  • Y

    Outcomes were tracked over two full academic years (2022-2023 and 2023-2024).

    "Sixty-four schools within one urban school district in Texas participated in this study over the 2022-2023 and 2023-2024 school years." (p. 14)

  • B

    Although Zearn provided additional implementation supports, teachers reported similar total math instructional time across groups and the supports appear integral to the tested intervention package.

    "In both years, teachers in both groups reported spending similar amounts of time on math instruction, including time spent on math-related supplemental technology products outside of the regular math period." (p. 31)

  • R

    An independent, peer-reviewed randomized study by a different author compared Zearn Math to another program, serving as an external replication effort on the intervention.

    "To ensure that students benefit from high-quality learning experiences, the current conceptual replication of Wang and Woodworth, a randomized control trial, evaluated the relative impacts of two computer programs used in a school district as supplements to students' regular education math instruction, DreamBox Learning and Zearn Math." (Foster, Abstract)

  • A

    The paper reports standardized outcomes only for mathematics, not for all core subjects.

    "RQ1. (Confirmatory) What is Zearn Math's cumulative effect over two academic years on student achievement on grade-level mathematics content, as measured by the Texas STAAR assessment?" (p. 13)

  • G

    The study reports outcomes through the end of the second study year and does not track students to graduation.

    "Y_itsd represents student's Texas STAAR score measured at the end of study year 2 of student i, with teacher t, in school s, in random assignment block d." (p. 20)

  • P

    The paper states a REES registry ID, but the public registry entry date could not be verified to be before study start.

    "To increase transparency and credibility of this study and meet funder requirements, we clearly stated our primary research questions by designating the first two as confirmatory and preregistering them along with our analysis plan at the Registry of Efficacy and Effectiveness Studies (REES, undated), under Registry ID: 17280.1v1." (p. 13)

Abstract

Zearn Math is a popular software platform for K-8 mathematics learning, designed to enable all students to successfully access grade-level content. RAND researchers collaborated with Zearn, the product's developer, to design this evaluation. Then RAND conducted the study independently, randomly assigning 64 schools in an urban Texas district to either supplement classroom instruction with Zearn Math in grades 3-5 for two years - or to continue with business-as-usual, which included various other supplemental technology products. High proportions of economically disadvantaged, Hispanic, English-learner, and below-proficient students made up the primary sample of 10,000+ students. The study preregistered two confirmatory research questions about Zearn Math's effects on Texas STAAR math assessment scores, for all students and students below proficient at baseline. Those results were positive but not statistically significant; equivalent to raising a control group student from the median to the 53rd or 54th percentile. Although this study did not yield confirmatory evidence that Zearn Math improves student learning, consistent positive signals across all estimated confirmatory and exploratory effects, including on the MAP adaptive mathematics assessment, suggest it holds promise to do so.
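
The percentile framing in the abstract can be related to a standardized effect size with a short calculation. The sketch below is illustrative only: it assumes control-group scores are approximately normally distributed, and the function names are hypothetical. The effect sizes it produces are implied by that assumption, not values quoted from the paper.

```python
from statistics import NormalDist

# Assumption: control-group scores follow a standard normal distribution
# after standardization. Function names are hypothetical.
STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def implied_effect_size(percentile: float) -> float:
    """Effect size (in SD units) that would move a median control student
    to the given percentile of the control distribution."""
    return STANDARD_NORMAL.inv_cdf(percentile / 100.0)


def percentile_after_shift(effect_size: float) -> float:
    """Percentile reached by a median control student after a shift of
    `effect_size` standard deviations."""
    return 100.0 * STANDARD_NORMAL.cdf(effect_size)


if __name__ == "__main__":
    for pct in (53, 54):
        d = implied_effect_size(pct)
        print(f"{pct}th percentile corresponds to roughly {d:.2f} SD")
```

Under the normality assumption, moving from the median to the 53rd or 54th percentile corresponds to a shift of a little under, or about, one tenth of a standard deviation.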


ERCT Criteria Breakdown

  • Level 1 Criteria

    • C

      Class-level RCT

      • The study is randomized at the school level, which satisfies class-level randomization.
      • "We randomly selected 64 of those schools to comprise the study sample. We used blocked randomization, stratifying by schoolwide percentages of students who were economically disadvantaged, English-learners, and proficient in mathematics in the 2020-2021 academic year." (p. 14)
      • Relevant Quotes:
        1) "We randomly selected 64 of those schools to comprise the study sample. We used blocked randomization, stratifying by schoolwide percentages of students who were economically disadvantaged, English-learners, and proficient in mathematics in the 2020-2021 academic year." (p. 14)
        2) "Within each block, half of the schools were assigned to the treatment group and half to the control group, resulting in 32 schools assigned to each condition." (p. 14)
      • Detailed Analysis: Criterion C requires randomization at the class level or stronger. The paper explicitly describes blocked randomization at the school level (64 schools, 32 per condition). Because school-level randomization is stronger than class-level randomization, it satisfies Criterion C by definition under the ERCT standard. Criterion C is met because randomization occurred at the school level.
    • E

      Exam-based Assessment

      • Outcomes were measured using Texas STAAR, a standardized state assessment.
      • "The STAAR assessment, as the state accountability test, is designed to measure student proficiency on Texas grade-level mathematics standards starting in grade 3." (p. 14)
      • Relevant Quotes:
        1) "The STAAR assessment, as the state accountability test, is designed to measure student proficiency on Texas grade-level mathematics standards starting in grade 3." (p. 14)
        2) "In collaboration with Zearn we chose the STAAR as the outcome for confirmatory RQs 1 & 2, because it has high relevance for administrators, educators and families, and its focus on grade-level content might be well-suited to capturing effects of Zearn Math's grade-level approach to supporting all students to succeed at learning grade-level content." (p. 14)
      • Detailed Analysis: Criterion E requires a standardized exam-based assessment rather than a bespoke test created for the study. The paper identifies Texas STAAR as the "state accountability test" used for the confirmatory outcomes, which is a standardized assessment aligned to Texas grade-level standards. Criterion E is met because the primary outcome uses the standardized STAAR exam.
    • T

      Term Duration

      • The study ran across two school years, exceeding the minimum one-term requirement.
      • "Sixty-four schools within one urban school district in Texas participated in this study over the 2022-2023 and 2023-2024 school years." (p. 14)
      • Relevant Quotes:
        1) "Sixty-four schools within one urban school district in Texas participated in this study over the 2022-2023 and 2023-2024 school years." (p. 14)
        2) "Treatment group schools gained access to Zearn Math, received training for building leaders, instructional leaders, and teachers, and were asked to integrate Zearn Math into their instructional practice for the two-year study period." (p. 14)
      • Detailed Analysis: Criterion T requires outcomes to be measured at least one full academic term after the intervention begins. The paper states the study spanned two school years (2022-2023 and 2023-2024) and describes a two-year study period. Two academic years necessarily exceed one term. Criterion T is met because the tracking period spans two school years.
    • D

      Documented Control Group

      • The control condition is clearly described and supported by detailed baseline and implementation documentation.
      • "Control group schools were asked not to use Zearn Math and to continue with business as usual for the two years." (p. 14)
      • Relevant Quotes:
        1) "Control group schools were asked not to use Zearn Math and to continue with business as usual for the two years." (p. 14)
        2) "Table 1: Descriptive statistics and group balance for primary sample" (p. 43)
        3) "The most common products selected by control group teachers changed from Imagine Math, Kahoot, and Go Math! in year 1 to ST Math, i-Ready, and IXL in year 2." (p. 32)
      • Detailed Analysis: Criterion D requires a well-documented control group, including what the control group received and evidence that the groups are comparable at baseline. The paper explicitly defines the control condition as "business as usual" and states that control schools were asked not to use Zearn. It also provides a baseline balance table (Table 1) with sample sizes and demographic and baseline achievement characteristics. Finally, the paper documents what supplemental products control teachers used, by listing common products across years. Criterion D is met because the control condition, baseline composition, and typical control-group resources are documented.
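
The blocked randomization quoted under Criterion C (schools grouped into blocks, then half of each block assigned to each condition) can be sketched as follows. This is a minimal illustration, not the authors' procedure: the block sizes, the mapping of schools to blocks, the school identifiers, and the seed are all hypothetical, since the excerpts above do not state how many blocks were formed.

```python
import random
from collections import defaultdict


def blocked_assignment(schools, block_of, seed=0):
    """Assign half of the schools in each block to treatment, half to control.

    `schools` is a list of school identifiers and `block_of` maps each
    identifier to a block label. Both are hypothetical inputs.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible

    # Group schools by their randomization block.
    blocks = defaultdict(list)
    for school in schools:
        blocks[block_of[school]].append(school)

    assignment = {}
    for block_id in sorted(blocks):
        members = sorted(blocks[block_id])
        rng.shuffle(members)  # random order within the block
        half = len(members) // 2
        for school in members[:half]:
            assignment[school] = "treatment"
        for school in members[half:]:
            assignment[school] = "control"
    return assignment


# Hypothetical example: 64 schools placed into 8 blocks of 8 schools each.
schools = [f"school_{i:02d}" for i in range(64)]
block_of = {school: i // 8 for i, school in enumerate(schools)}
assignment = blocked_assignment(schools, block_of)
```

With evenly sized blocks, this yields 32 schools per condition, matching the split reported in the paper, while keeping the two arms balanced on whatever characteristics define the blocks.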
  • Level 2 Criteria

    • S

      School-level RCT

      • Random assignment was implemented at the school level (64 schools).
      • "Within each block, half of the schools were assigned to the treatment group and half to the control group, resulting in 32 schools assigned to each condition." (p. 14)
      • Relevant Quotes:
        1) "We randomly selected 64 of those schools to comprise the study sample. We used blocked randomization, stratifying by schoolwide percentages of students who were economically disadvantaged, English-learners, and proficient in mathematics in the 2020-2021 academic year." (p. 14)
        2) "Within each block, half of the schools were assigned to the treatment group and half to the control group, resulting in 32 schools assigned to each condition." (p. 14)
      • Detailed Analysis: Criterion S requires school-level randomization. The paper describes blocked randomization of schools, with 64 schools selected and allocated to treatment and control conditions (32 per condition). This directly satisfies the school-level RCT requirement. Criterion S is met because the unit of randomization is the school.
    • I

      Independent Conduct

      • RAND conducted the evaluation independently of Zearn, supported by IES funding.
      • "The RAND team then received a grant from the Institute of Education Sciences, U.S. Department of Education, to conduct the study independently of Zearn." (p. 2)
      • Relevant Quotes:
        1) "RAND researchers collaborated with Zearn, the product's developer, to design a randomized experiment to evaluate Zearn Math's effects over two years on student mathematics achievement in grades 3 to 5." (p. 2)
        2) "The RAND team then received a grant from the Institute of Education Sciences, U.S. Department of Education, to conduct the study independently of Zearn." (p. 2)
      • Detailed Analysis: Criterion I requires that the study be conducted independently from the intervention's developer to reduce bias. The paper acknowledges collaboration with Zearn on design, then explicitly states that RAND received an IES grant "to conduct the study independently of Zearn." This is strong, direct evidence of independent conduct as defined by the ERCT standard. Criterion I is met because the paper explicitly states independent conduct by RAND.
    • Y

      Year Duration

      • Outcomes were tracked over two full academic years (2022-2023 and 2023-2024).
      • "Sixty-four schools within one urban school district in Texas participated in this study over the 2022-2023 and 2023-2024 school years." (p. 14)
      • Relevant Quotes:
        1) "Sixty-four schools within one urban school district in Texas participated in this study over the 2022-2023 and 2023-2024 school years." (p. 14)
        2) "Treatment group schools gained access to Zearn Math, received training for building leaders, instructional leaders, and teachers, and were asked to integrate Zearn Math into their instructional practice for the two-year study period." (p. 14)
      • Detailed Analysis: Criterion Y requires outcomes to be measured at least one academic year after the intervention begins. The paper states the study covered two school years and explicitly frames the implementation as a two-year study period. This exceeds the one-year minimum. Criterion Y is met because the study spans two academic years.
    • B

      Balanced Control Group

      • Although Zearn provided additional implementation supports, teachers reported similar total math instructional time across groups and the supports appear integral to the tested intervention package.
      • "In both years, teachers in both groups reported spending similar amounts of time on math instruction, including time spent on math-related supplemental technology products outside of the regular math period." (p. 31)
      • Relevant Quotes:
        1) "Treatment group schools gained access to Zearn Math, received training for building leaders, instructional leaders, and teachers, and were asked to integrate Zearn Math into their instructional practice for the two-year study period." (p. 14)
        2) "Zearn provided implementation support to the treatment group schools throughout the study period." (p. 9)
        3) "Zearn also provided implementation coaching to each treatment group school, which typically comprised bi-weekly calls between a Zearn coach and the school's implementation point-of-contact to review school usage data, highlight areas of success, and discuss any challenges hindering usage." (p. 9)
        4) "During the spring of each study year, Zearn facilitated challenges to motivate on-or-above-grade-level lesson completion. These challenges offered prizes to students, classrooms, and schools based on lesson completion." (p. 9)
        5) "In both years, teachers in both groups reported spending similar amounts of time on math instruction, including time spent on math-related supplemental technology products outside of the regular math period." (p. 31)
      • Detailed Analysis: Criterion B asks whether time and resources are balanced between treatment and control unless additional resources are explicitly part of what is being tested. Here, the treatment condition includes access to Zearn plus explicit implementation supports (training, coaching, and incentive challenges with prizes). These are additional resources relative to the control condition. However, the paper frames these supports as part of the delivered intervention package (implementation support and coaching are described as what Zearn "provided" to treatment schools). In addition, the study reports that teachers in both groups spent similar total time on math instruction, including time on supplemental technology, which reduces the risk that the estimated effects are simply driven by more total math time in treatment. Under the ERCT decision rule, this is best categorized as "resources are integral to the design" and not merely optional add-ons. The study is therefore estimating the effect of adopting Zearn Math as implemented, including its implementation supports, relative to business as usual. Criterion B is met because total instructional time is reported as similar across groups and the added supports appear integral to the tested intervention package.
  • Level 3 Criteria

    • R

      Reproduced

      • An independent, peer-reviewed randomized study by a different author compared Zearn Math to another program, serving as an external replication effort on the intervention.
      • "To ensure that students benefit from high-quality learning experiences, the current conceptual replication of Wang and Woodworth, a randomized control trial, evaluated the relative impacts of two computer programs used in a school district as supplements to students' regular education math instruction, DreamBox Learning and Zearn Math." (Foster, Abstract)
      • Relevant Quotes:
        1) "To ensure that students benefit from high-quality learning experiences, the current conceptual replication of Wang and Woodworth, a randomized control trial, evaluated the relative impacts of two computer programs used in a school district as supplements to students' regular education math instruction, DreamBox Learning and Zearn Math." (Foster, Abstract)
      • Detailed Analysis: Criterion R requires an independent replication published in a peer-reviewed journal by a different research team. Foster (Journal of Research on Educational Effectiveness, volume 17) is authored by Matthew E. Foster (not the RAND author team) and describes a randomized trial involving Zearn Math in a different setting and sample. While it is not a replication of this Texas district study design, it is an independent randomized evaluation involving the same intervention (Zearn Math), which constitutes relevant reproduction evidence for the intervention. Criterion R is met because an independent peer-reviewed randomized study evaluated Zearn Math in another context.
    • A

      All-subject Exams

      • The paper reports standardized outcomes only for mathematics, not for all core subjects.
      • "RQ1. (Confirmatory) What is Zearn Math's cumulative effect over two academic years on student achievement on grade-level mathematics content, as measured by the Texas STAAR assessment?" (p. 13)
      • Relevant Quotes:
        1) "RQ1. (Confirmatory) What is Zearn Math's cumulative effect over two academic years on student achievement on grade-level mathematics content, as measured by the Texas STAAR assessment?" (p. 13)
        2) "RQ4. (Exploratory) What are the effects of Zearn Math on mathematics achievement growth, as measured by the MAP adaptive assessment, overall and for the above-listed subgroups?" (p. 13)
      • Detailed Analysis: Criterion A requires standardized exam-based outcomes across all main subjects to detect spillovers (for example, gains in math at the expense of reading). The research questions and outcomes in this paper are exclusively mathematics-focused (STAAR math and MAP math). There is no evidence of standardized outcomes reported for reading, language arts, science, or social studies. Criterion A is not met because only mathematics outcomes are assessed.
    • G

      Graduation Tracking

      • The study reports outcomes through the end of the second study year and does not track students to graduation.
      • "Y_itsd represents student's Texas STAAR score measured at the end of study year 2 of student i, with teacher t, in school s, in random assignment block d." (p. 20)
      • Relevant Quotes:
        1) "Sixty-four schools within one urban school district in Texas participated in this study over the 2022-2023 and 2023-2024 school years." (p. 14)
        2) "Y_itsd represents student's Texas STAAR score measured at the end of study year 2 of student i, with teacher t, in school s, in random assignment block d." (p. 20)
      • Detailed Analysis: Criterion G requires tracking participants until graduation, and it also depends on Criterion Y (which is met here). The paper describes a two-year study window and defines the primary outcome as measured at the end of study year 2. It does not describe follow-up beyond year 2, nor does it report graduation outcomes for the cohort. A web search for follow-up publications by the same author team that track this cohort to a graduation endpoint did not identify any such papers. Criterion G is not met because tracking ends at the end of study year 2 with no graduation follow-up reported.
    • P

      Pre-Registered

      • The paper states a REES registry ID, but the public registry entry date could not be verified to be before study start.
      • "To increase transparency and credibility of this study and meet funder requirements, we clearly stated our primary research questions by designating the first two as confirmatory and preregistering them along with our analysis plan at the Registry of Efficacy and Effectiveness Studies (REES, undated), under Registry ID: 17280.1v1." (p. 13)
      • Relevant Quotes:
        1) "To increase transparency and credibility of this study and meet funder requirements, we clearly stated our primary research questions by designating the first two as confirmatory and preregistering them along with our analysis plan at the Registry of Efficacy and Effectiveness Studies (REES, undated), under Registry ID: 17280.1v1." (p. 13)
        2) "To make REES even more secure, you will need to sign in using your ICPSR Researcher Passport." (REES website)
        3) "The sample was determined by enrollment on the date of the state's official enrollment snapshot at the beginning of study year 1 (October 28, 2022)." (p. 15)
      • Detailed Analysis: Criterion P requires that the study protocol be preregistered before the study begins, and that this timing be verifiable. The paper asserts that confirmatory questions and the analysis plan were preregistered at REES (Registry ID: 17280.1v1). However, to validate Criterion P as specified, we must verify the registry entry date precedes the start of the study. The REES site indicates that viewing registry records may require signing in, and the specific public registry record for 17280.1v1 with a visible registration date could not be located and verified as publicly accessible. Without a verifiable preregistration date that is prior to the study start (with the sample defined at the beginning of study year 1 on October 28, 2022), the criterion cannot be confirmed under the ERCT rules. Criterion P is not met because the preregistration timing could not be independently verified to precede study start.
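
The criterion verdicts above can be tallied by level with a short script. This is a hedged sketch: the rule used here (a study reaches a level only when every criterion at that level and at all lower levels is met) is an assumption about how ERCT levels are derived, and the names `LEVEL_CRITERIA`, `verdicts`, and `erct_level` are hypothetical.

```python
# Criteria grouped by level, as listed in the breakdown above.
LEVEL_CRITERIA = {
    1: ["C", "E", "T", "D"],
    2: ["S", "I", "Y", "B"],
    3: ["R", "A", "G", "P"],
}

# Verdicts taken from the breakdown above (True = met, False = not met).
verdicts = {
    "C": True, "E": True, "T": True, "D": True,
    "S": True, "I": True, "Y": True, "B": True,
    "R": True, "A": False, "G": False, "P": False,
}


def erct_level(verdicts):
    """Return the highest level whose criteria, and all lower levels'
    criteria, are met. Assumed rule, not quoted from the ERCT standard."""
    level = 0
    for lvl in sorted(LEVEL_CRITERIA):
        if all(verdicts[c] for c in LEVEL_CRITERIA[lvl]):
            level = lvl
        else:
            break  # a failed level blocks all higher levels
    return level


if __name__ == "__main__":
    print(erct_level(verdicts))
```

Under that assumed rule, the verdicts recorded here place the study at level 2, since all Level 1 and Level 2 criteria are met while A, G, and P are not.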
