Abstract
The use of self-led educational technologies holds significant potential for improving student learning at scale, but sustaining student engagement with these platforms remains a challenge. We present results from an experimental evaluation implemented following the scale-up of a math platform in Peru, where primary school teachers received weekly WhatsApp messages summarizing their students’ platform activity and encouraging them to promote engagement. The messages increased the average weekly share of students using the platform by 5 percentage points (a 17% increase) and the average share of math exercises completed by 4 percentage points (a 16% increase). Effects dissipated once the messages stopped, suggesting that salience and simplified monitoring are likely mechanisms. We find little evidence of impact heterogeneity based on teacher characteristics or students’ prior platform use and achievement. Non-experimental evidence suggests that increased use of the student math platform improved math learning. Overall, our findings indicate that light-touch communication with teachers can cost-effectively strengthen engagement with EdTech platforms scaled through the education system.
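As a back-of-envelope check on the abstract's figures, the sketch below backs out the comparison-group means implied by each pair of numbers (percentage-point change and relative increase). The helper `implied_baseline` is a hypothetical name introduced for illustration, and because the abstract's inputs are rounded, the results are only approximate.

```python
# A change of `pp_change` percentage points that equals a relative increase of
# `relative_increase` implies a comparison mean of roughly pp_change / relative_increase.
# Inputs are the rounded figures from the abstract, so outputs are approximate.
def implied_baseline(pp_change: float, relative_increase: float) -> float:
    """Return the comparison-group mean (in percent) implied by the two figures."""
    return pp_change / relative_increase

# Share of students using the platform each week: +5 pp, described as a 17% increase.
usage_baseline = implied_baseline(5, 0.17)

# Share of math exercises completed: +4 pp, described as a 16% increase.
exercise_baseline = implied_baseline(4, 0.16)

print(f"Implied weekly usage baseline:    {usage_baseline:.1f}%")
print(f"Implied exercise-completion base: {exercise_baseline:.1f}%")
```

Under these rounded inputs, the implied baselines are roughly 29% weekly usage and 25% exercise completion, which suggests the two pairs of figures are internally consistent.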
ERCT Criteria Breakdown
-
Level 1 Criteria
-
C
Class-level RCT
- The unit of randomization is the teacher (classroom), meeting the class-level RCT requirement.
- Teachers were randomly assigned to either a treatment group, which received weekly WhatsApp messages about their students’ platform use, or to a control group.
Relevant Quotes:
1) "To evaluate whether this type of teacher-facing messaging can improve student engagement with complementary learning tools, we conducted a randomized controlled trial with 853 teachers of grades 4 through 6 in Peru." (p. 2)
2) "Teachers were randomly assigned to either a treatment group, which received weekly WhatsApp messages about their students’ platform use, or to a control group." (p. 2)
3) "We report heteroskedasticity-robust standard errors clustered at the teacher level, the unit of randomization." (p. 5)
Detailed Analysis:
The ERCT C criterion requires random assignment at the classroom level or higher (or the tutoring exception). Here, the paper states that teachers were randomized. In primary schooling, a teacher corresponds to a classroom, so treatment assignment is effectively at the class level, not within-class student assignment. The analysis also treats the teacher as the unit of randomization (clustered standard errors), which is consistent with a class-level RCT.
Final sentence: Criterion C is met because teachers (classrooms) were randomly assigned to treatment and control.
-
E
Exam-based Assessment
- The RCT’s primary outcomes are platform usage logs, not standardized exam scores; any test-score analysis is non-experimental.
- The experimental variation was not designed to allow measurement of downstream impacts on test scores.
Relevant Quotes:
1) "The experimental variation was not designed to allow measurement of downstream impacts on test scores." (p. 2)
2) "We construct three primary outcomes." (p. 4)
3) "However, we can use non-experimental variation to examine learning outcomes." (p. 2)
Detailed Analysis:
Criterion E requires that the RCT measures outcomes with standardized exam-based assessments. The paper explicitly states that the experiment was not designed to measure impacts on test scores. The experimental outcomes are administrative usage metrics from the platform (connection and exercises). The paper does report an "endline math test" for a separate, non-experimental topic-exposure analysis, but this is not the experimental outcome and is not described as a standardized national exam. Therefore, the study does not satisfy the ERCT requirement for exam-based assessment within the RCT.
Final sentence: Criterion E is not met because the RCT outcomes are platform-use metrics rather than standardized exams.
-
T
Term Duration
- Outcomes are tracked from the May 30 start through November 2022, exceeding one academic term.
- The platform’s administrative data span March to November 2022, covering the weeks before, during, and after the intervention.
Relevant Quotes:
1) "The 8-week intervention began the week of May 30 and concluded in mid-July 2022." (p. 4)
2) "We use comprehensive administrative data recorded by the math learning platform, spanning from March to November 2022." (p. 4)
3) "Panel C: Delayed post-intervention (weeks 19–27)" (Table 2, p. 6)
Detailed Analysis:
Criterion T requires outcome measurement at least one full academic term (roughly 3–4 months) after the intervention begins. The intervention started the week of May 30. The dataset spans through November 2022, and the main experimental results are reported through a delayed post-intervention window of weeks 19–27. From late May to late November is about six months, which exceeds a term.
Final sentence: Criterion T is met because the follow-up period from the start of the intervention to the latest outcomes is about six months.
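The duration arithmetic behind this verdict can be sketched as follows. The end date of November 30 is an assumption (the paper says only "November 2022"), as is the 4-month threshold, which takes the upper end of the "roughly 3–4 months" range quoted above.

```python
from datetime import date

# Start of the intervention: "the week of May 30" (p. 4).
intervention_start = date(2022, 5, 30)

# Assumption: the paper reports data through "November 2022" without a day,
# so the last day of the month is used as the latest possible end point.
data_end = date(2022, 11, 30)

# Assumption: upper end of the "roughly 3-4 months" term length.
TERM_MONTHS = 4

elapsed_days = (data_end - intervention_start).days
elapsed_months = elapsed_days / 30.44  # average calendar month length in days

meets_term = elapsed_months >= TERM_MONTHS

print(f"Elapsed: {elapsed_days} days (~{elapsed_months:.1f} months)")
print(f"Meets term-duration threshold: {meets_term}")
```

Even with an earlier end date in November, the elapsed time stays above five months, so the verdict does not depend on the assumed day.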
-
D
Documented Control Group
- The control group is clearly defined (no messages) and baseline characteristics for treatment and control are documented.
- The control group received no messages.
Relevant Quotes:
1) "The control group received no messages." (p. 4)
2) "Table 1 presents summary statistics on pre-treatment school and teacher characteristics for both the treatment and control groups, along with balance tests." (p. 5)
Detailed Analysis:
Criterion D requires that the control condition and its baseline characteristics are documented well enough to assess comparability. The paper explicitly states the control condition (no WhatsApp messages). It also provides baseline balance and summary statistics for treatment and control, including school and teacher characteristics and pre-treatment platform-use measures (Table 1).
Final sentence: Criterion D is met because the control condition is clearly defined and baseline balance is documented.
-
Level 2 Criteria
-
S
School-level RCT
- Randomization was performed at the teacher level, not at the school level.
- Among the 853 teachers in the final sample, 600 were randomly assigned to the treatment group and 253 to the control group.
Relevant Quotes:
1) "Teachers were randomly assigned to either a treatment group, which received weekly WhatsApp messages about their students’ platform use, or to a control group." (p. 2)
2) "Among the 853 teachers in the final sample, 600 were randomly assigned to the treatment group and 253 to the control group." (p. 4)
3) "We report heteroskedasticity-robust standard errors clustered at the teacher level, the unit of randomization." (p. 5)
Detailed Analysis:
Criterion S requires school-level randomization. The paper repeatedly identifies the teacher as the unit of randomization and clusters standard errors at the teacher level. There is no statement that entire schools were assigned to treatment or control. Therefore, this is not a school-level RCT.
Final sentence: Criterion S is not met because randomization occurred at the teacher (classroom) level rather than the school level.
-
I
Independent Conduct
- One of the authors owns the company that holds the IP rights to the software being promoted, so the study is not independent of the intervention’s owner/designer.
- Roberto Araya is a researcher at Universidad de Chile and the owner of Automind, the company that holds the intellectual property rights of the Conecta Ideas software.
Relevant Quotes:
1) "Roberto Araya is a researcher at Universidad de Chile and the owner of Automind, the company that holds the intellectual property rights of the Conecta Ideas software." (p. 11)
2) "The implementation of the Conecta Ideas Peru program was carried out by the research center GRADE." (p. 11)
Detailed Analysis:
Criterion I requires independent conduct, meaning the evaluation should be carried out independently from the intervention designers/owners to reduce conflicts of interest. The competing-interests statement says that one of the authors, Roberto Araya, owns the company holding the platform’s intellectual property. Even though implementation in Peru was carried out by GRADE, an author’s direct ownership stake in the platform being promoted means the evaluation is not independent under this criterion.
Final sentence: Criterion I is not met because a co-author owns the IP for the platform central to the study.
-
Y
Year Duration
- The study tracks outcomes for less than a full academic year (March–November), falling short of year-duration tracking.
- The platform’s administrative data span March to November 2022, while the school year runs from March to December.
Relevant Quotes:
1) "The school year runs from March to December." (p. 3)
2) "We use comprehensive administrative data recorded by the math learning platform, spanning from March to November 2022." (p. 4)
3) "The 8-week intervention began the week of May 30 and concluded in mid-July 2022." (p. 4)
Detailed Analysis:
Criterion Y requires outcome measurement for at least one full academic year after the intervention begins. The paper describes the Peruvian school year as running from March to December. The platform data used in the study span March through November, and the intervention begins at the end of May. Follow-up from the intervention start is therefore about six months, well short of a full academic year, and the outcome data do not cover even the complete March-to-December school year.
Final sentence: Criterion Y is not met because outcomes are tracked for only part of the school year (up to November) rather than a full academic year.
-
B
Balanced Control Group
- The only additional resource is the WhatsApp messaging itself, which is the treatment being tested; the control group is business-as-usual by design.
- This study estimates the impact of sending weekly WhatsApp messages to teachers with information about their students’ use of the learning platform.
Relevant Quotes:
1) "This study estimates the impact of sending weekly WhatsApp messages to teachers with information about their students’ use of the learning platform." (p. 4)
2) "The control group received no messages." (p. 4)
Detailed Analysis:
Criterion B checks whether extra time/budget/resources given to the intervention group are balanced in the control group, unless those extra resources are explicitly the treatment variable. Here, the intervention is the provision of weekly WhatsApp messages to teachers. The control condition is explicitly "no messages." The messages are not an auxiliary add-on to a different educational intervention; they are the intervention being tested. Under the ERCT decision tree, this corresponds to the case where extra resources are present and are integral to the design (the resource is the treatment), so an unaugmented control group is acceptable.
Final sentence: Criterion B is met because the additional resource (messages) is the treatment variable being tested.
-
Level 3 Criteria
-
R
Reproduced
- No peer-reviewed independent replication of this WhatsApp-to-teachers messaging RCT was found in a web search as of the ERCT check date.
- An earlier experimental evaluation in Chile found platform use led to a 0.27 standard deviation (s.d.) increase in test scores (Araya et al., 2025).
Relevant Quotes:
1) "An earlier experimental evaluation in Chile found platform use led to a 0.27 standard deviation (s.d.) increase in test scores (Araya et al., 2025)." (p. 2)
Detailed Analysis:
Criterion R requires independent reproduction of this study (the teacher WhatsApp messaging intervention) by other authors in a peer-reviewed journal. The paper cites prior experimental work on the underlying platform in Chile, but that is a different intervention (platform use in school time) and involves overlapping authorship. Using web searches for this paper’s title, DOI, and keywords (WhatsApp, teacher messaging, Conecta Ideas, Peru), no peer-reviewed replication studies by independent teams were identified. Because the existence of a replication cannot be established from the available sources, the criterion remains not met.
Final sentence: Criterion R is not met because no independent peer-reviewed replication of the messaging intervention was found.
-
A
All-subject Exams
- Only mathematics outcomes are discussed, and criterion E is not met, so the all-subject standardized exam requirement is not satisfied.
- The Conecta Ideas platform offered weekly math problems to primary school students, aligned with the national curriculum.
Relevant Quotes:
1) "The national curriculum includes core subjects such as math, language, science, and social studies, along with specialized subjects like art and physical education." (p. 3)
2) "The Conecta Ideas platform offered weekly math problems to primary school students, aligned with the national curriculum." (p. 2)
Detailed Analysis:
Criterion A requires standardized exam-based assessment across all main subjects. The paper’s intervention and outcome discussion are focused on mathematics. More importantly, ERCT rules make criterion E a prerequisite: if the study does not use standardized exam-based outcomes in the RCT, criterion A cannot be met. Since the RCT outcomes are platform usage rather than standardized exams (criterion E not met), the study necessarily fails criterion A. There is also no evidence that standardized exams were collected across language, science, or other core subjects.
Final sentence: Criterion A is not met because the study does not provide all-subject standardized exam outcomes, and criterion E is not met.
-
G
Graduation Tracking
- Students are followed only within the 2022 school year and not until graduation; criterion Y is also not met, which implies G cannot be met.
- After the messages ended in week 8, the gap in platform use between treatment and control classrooms gradually narrowed and was no longer statistically significant by the end of the school year.
Relevant Quotes:
1) "After the messages ended in week 8, the gap in platform use between treatment and control classrooms gradually narrowed and was no longer statistically significant by the end of the school year." (p. 2)
2) "We use comprehensive administrative data recorded by the math learning platform, spanning from March to November 2022." (p. 4)
Detailed Analysis:
Criterion G requires tracking participants until graduation, potentially described in follow-up publications by the same authors. In this paper, tracking and reported effects are confined to the 2022 school year (data through November). A targeted web search for follow-up studies reporting graduation outcomes for this Peruvian cohort (grades 4–6 in 2022) did not identify any publication that follows the same participants through the end of primary school. Additionally, ERCT rules specify that if criterion Y (year duration) is not met, criterion G cannot be met.
Final sentence: Criterion G is not met because the study does not track students to graduation and no follow-up graduation paper was found.
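The two prerequisite rules cited in this breakdown (A cannot be met without E, and G cannot be met without Y) can be sketched as a small consistency check. The verdicts are copied from the "Final sentence" lines on this page, including Criterion P below. The names `VERDICTS`, `PREREQUISITES`, and `prerequisite_violations` are hypothetical and introduced only for illustration. This is not the ERCT standard's own tooling, and only the two rules mentioned in the text are encoded.

```python
# Verdicts as stated in this breakdown (True = met, False = not met).
VERDICTS = {
    "C": True,  "E": False, "T": True,  "D": True,   # Level 1
    "S": False, "I": False, "Y": False, "B": True,   # Level 2
    "R": False, "A": False, "G": False, "P": False,  # Level 3
}

# Only the two prerequisite rules mentioned in the text:
# criterion -> criterion that must be met first.
PREREQUISITES = {"A": "E", "G": "Y"}

def prerequisite_violations(verdicts: dict, prerequisites: dict) -> list:
    """Return criteria marked as met even though their prerequisite is not met."""
    return [
        criterion
        for criterion, required in prerequisites.items()
        if verdicts[criterion] and not verdicts[required]
    ]

violations = prerequisite_violations(VERDICTS, PREREQUISITES)
print("Prerequisite violations:", violations or "none")
```

A non-empty result would signal an inconsistent set of verdicts, for example A marked as met while E is not.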
-
P
Pre-Registered
- No pre-registration identifier or registry entry could be located for this RCT in the paper or via a web search.
Relevant Quotes:
1) No statement about trial pre-registration or a registry ID was found in the paper.
Detailed Analysis:
Criterion P requires that the study protocol be pre-registered before the study begins, and that the registry record can be verified (registration date before data collection). The paper does not mention pre-registration, a registry name, or an ID. A web search for a corresponding registry entry (including the AEA RCT Registry / Social Science Registry) using the paper title, DOI, author names, and keywords did not yield a public pre-registration record that can be verified for timing.
Final sentence: Criterion P is not met because no verifiable pre-registration record was found.