Abstract
We examine whether highlighting streaks—instances of repeated and consecutive behavior when completing learning tasks—encourages 4th to 6th grade students in Peru to increase their use of an online math platform and improve learning. 60,000 students were randomly assigned to receive messages that (i) highlighted streaks, (ii) provided personalized reminders with positive reinforcement, (iii) provided generic reminders, or (iv) to a control group. Highlighting streaks and providing personalized reminders significantly increased platform use compared to generic reminders and the control group, with streaks more effective on the intensive margin and personalized reminders more effective on the extensive margin. Highlighting streaks also significantly improved math achievement compared to the control group among the 1,500 students who took an endline test, although differences with other treatment arms were not significant.
ERCT Criteria Breakdown
-
Level 1 Criteria
-
C
Class-level RCT
- The study randomized at the student level rather than the class or school level.
- "60,000 students were randomly assigned to receive messages that (i) highlighted streaks, (ii) provided personalized reminders... or (iv) to a control group." (p. 1)
Relevant Quotes:
1) "60,000 students were randomly assigned to receive messages that (i) highlighted streaks, (ii) provided personalized reminders... or (iv) to a control group." (p. 1)
2) "Randomization to treatment was carried out after the baseline test... 24,000 students were assigned to the control group..." (p. 4)
Detailed Analysis:
The ERCT standard requires randomization to occur at the class or school level to prevent contamination, unless the intervention is strictly one-to-one tutoring. Here, the intervention involves sending app notifications (messages) to individual students/households. The quotes explicitly state that "students were randomly assigned," indicating a student-level randomization. While the intervention is digital and personal (app notifications), it is not described as personal tutoring, and the standard prefers class-level assignment to minimize peer effects or contamination. As randomization was at the student level, this criterion is not met.
Criterion C is not met because the study randomized individual students rather than classes or schools.
-
E
Exam-based Assessment
- The study used a custom endline test administered through the app rather than a widely recognized standardized exam.
- "The baseline test and endline test were administered through the app ... They were closely aligned with the exercises included in the six-week treatment period..." (p. 3)
Relevant Quotes:
1) "The baseline test and endline test were administered through the app and had a very similar format. They were closely aligned with the exercises included in the six-week treatment period..." (p. 3)
2) "We use data on the endline test to generate a standardized measure of math academic achievement by subtracting the fraction of correct responses from the mean of the control group..." (p. 3)
Detailed Analysis:
Criterion E requires the use of widely recognized standardized assessments (e.g., state or national exams) rather than tests designed specifically for the study. The paper states the tests were "administered through the app" and were "closely aligned with the exercises included in the six-week treatment period." While the authors standardized the scores statistically (z-scores), the assessment tool itself was custom-made for this intervention context and is not a standard, external exam.
Criterion E is not met because the study used a custom assessment aligned with the intervention rather than a standard external exam.
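The standardization mentioned in the analysis can be illustrated with a minimal sketch. It assumes the conventional control-group z-score, (score minus control mean) divided by the control standard deviation; the quoted passage is truncated, so the division step, the function name `standardize_against_control`, and the sample scores are assumptions made for illustration, not details from the paper.

```python
from statistics import mean, pstdev


def standardize_against_control(scores, control_scores):
    """Return z-scores of `scores` using the control group's mean and SD.

    Assumption: the divisor is the control group's (population) standard
    deviation; the paper's exact formula is not visible in the quote.
    """
    mu = mean(control_scores)
    sigma = pstdev(control_scores)
    if sigma == 0:
        raise ValueError("control scores have zero variance")
    return [(s - mu) / sigma for s in scores]


# Hypothetical fractions of correct responses, not data from the study.
control = [0.40, 0.50, 0.60]
treated = [0.60, 0.70]

z_control = standardize_against_control(control, control)
z_treated = standardize_against_control(treated, control)
```

Under this convention the control group has mean zero by construction, so a treatment effect is read directly in control-group standard deviations. Rescaling in this way does not change what the test measures, which is why the criterion still turns on the instrument itself.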
-
T
Term Duration
- The intervention duration was six weeks, which is shorter than the required full academic term.
- "Our intervention took place during six weeks of the summer break..." (p. 3)
Relevant Quotes:
1) "Our intervention took place during six weeks of the summer break starting from the week of January 17, 2022, and until the week of February 21, 2022." (p. 3)
Detailed Analysis:
The ERCT standard defines a term as approximately 3-4 months. This intervention explicitly "took place during six weeks." While it occurred during the summer break, the duration falls significantly short of the minimum requirement for a full academic term (usually 12+ weeks). Therefore, the study is considered short-term.
Criterion T is not met because the intervention lasted only six weeks, which is shorter than a full academic term.
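The duration check can be reproduced from the quoted dates. The sketch below treats January 17 and February 21, 2022 as the starting days of the first and last treatment weeks, and takes the 12-week minimum from the analysis above; the variable names are illustrative.

```python
from datetime import date

# Week-start dates taken from the quoted passage (p. 3).
first_week = date(2022, 1, 17)
last_week = date(2022, 2, 21)

# Count treatment weeks inclusively: both the first and last week count.
weeks = (last_week - first_week).days // 7 + 1

# Minimum length of a full term as used in the analysis (12+ weeks).
MIN_TERM_WEEKS = 12
meets_term = weeks >= MIN_TERM_WEEKS

print(weeks, meets_term)
```

The inclusive count gives six weeks, half of the 12-week minimum.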
-
D
Documented Control Group
- The control group is clearly defined as receiving no messages, and its demographics and baseline performance are well documented.
- "During the six-week treatment period, the control group did not receive any further notifications..." (p. 3)
Relevant Quotes:
1) "During the six-week treatment period, the control group did not receive any further notifications..." (p. 3)
2) "Table 1 Baseline Balance... Control (1)... Multigrade 5.91... Female 51.12... Math achievement in 2021 -0.00..." (p. 5)
Detailed Analysis:
Criterion D requires detailed documentation of the control group, including demographics and conditions. The paper provides a dedicated column in Table 1 listing the baseline characteristics (demographics, prior math achievement, school type) for the control group. Furthermore, the condition of the control group is explicitly described: they had access to the app but "did not receive any further notifications" specific to the treatment (streaks/reminders).
Criterion D is met because the paper provides detailed baseline statistics and a clear description of the conditions for the control group.
-
Level 2 Criteria
-
S
School-level RCT
- The study utilized student-level randomization, not school-level randomization.
- "60,000 students were randomly assigned..." (p. 1)
Relevant Quotes:
1) "60,000 students were randomly assigned to receive messages..." (p. 1)
2) "Randomization to treatment was carried out after the baseline test... stratified based on whether the student had completed the baseline test..." (p. 4)
Detailed Analysis:
Criterion S requires randomization at the school level to simulate real-world implementation and account for school-wide factors. The quotes confirm that randomization was performed at the student level ("students were randomly assigned"), which does not meet the stronger school-level requirement.
Criterion S is not met because the unit of randomization was the student, not the school.
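The student-level, stratified assignment described in the quotes can be sketched as follows. This is a hypothetical illustration, not the authors' procedure: the 40% control share follows from 24,000 of 60,000 students, while the equal split across the three message arms, the arm labels, the seed, and the function name `assign_within_strata` are assumptions.

```python
import random

ARMS = ["control", "streaks", "personalized", "generic"]
# Shares in tenths: 4/10 control (24,000 of 60,000, per the quote);
# the equal 2/10 split of the remaining arms is an assumption.
SHARES_IN_TENTHS = [4, 2, 2, 2]


def assign_within_strata(students, seed=0):
    """Randomly assign students to arms separately within each stratum.

    `students` is an iterable of (student_id, completed_baseline) pairs;
    the stratum is whether the student completed the baseline test.
    """
    rng = random.Random(seed)
    strata = {}
    for student_id, completed_baseline in students:
        strata.setdefault(completed_baseline, []).append(student_id)

    assignment = {}
    for ids in strata.values():
        rng.shuffle(ids)
        n = len(ids)
        start = 0
        cumulative = 0
        for arm, share in zip(ARMS, SHARES_IN_TENTHS):
            cumulative += share
            end = cumulative * n // 10  # integer cut point, ends at n
            for sid in ids[start:end]:
                assignment[sid] = arm
            start = end
    return assignment


# Hypothetical roster: 100 students, half of whom completed the baseline.
roster = [(i, i % 2 == 0) for i in range(100)]
assignment = assign_within_strata(roster)
```

Because the unit passed to the shuffle is the individual student, classmates can land in different arms. Under school-level randomization, whole schools would be shuffled instead.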
-
I
Independent Conduct
- The study appears to have been conducted by the same authors who designed the intervention, with no independent third-party evaluator stated.
- "We compare the effect of sending messages which (i) highlighted streaks..." (p. 2)
Relevant Quotes:
1) "We compare the effect of sending messages which (i) highlighted streaks..." (p. 2)
2) "The learning platform used in this study, called 'Conecta Ideas,' included a student app that was developed at the Center for Advanced Research on Education at the Universidad de Chile..." (p. 3)
3) "Author statement... We confirm that the manuscript has been read and approved by all named authors..." (p. 9)
Detailed Analysis:
Criterion I requires the study to be conducted independently of the authors who designed the intervention. The paper gives no indication that an external evaluation firm or independent third party collected or analyzed the data. The authors themselves appear to be responsible for both the design of the message intervention and the analysis of the results. While the app was developed by a center (with which some authors are affiliated), the specific intervention (messaging) and the evaluation were conducted by the research team.
Criterion I is not met because there is no evidence of independent conduct or external oversight separate from the intervention designers.
-
Y
Year Duration
- The intervention lasted six weeks, failing the one-year duration requirement.
- "Our intervention took place during six weeks of the summer break..." (p. 3)
Relevant Quotes:
1) "Our intervention took place during six weeks of the summer break starting from the week of January 17, 2022, and until the week of February 21, 2022." (p. 3)
Detailed Analysis:
Criterion Y requires the study to span at least one full academic year. As noted in criterion T, this study was a six-week summer intervention. This does not meet the requirement for a year-long duration.
Criterion Y is not met because the study duration was significantly shorter than one academic year.
-
B
Balanced Control Group
- The intervention specifically tested the impact of "nudges" (messages) as the treatment variable, so the absence of messages in the control group is by design and does not constitute an imbalance.
- "We compare the effect of sending messages which (i) highlighted streaks... and (iv) to a control group which did not receive any messages." (p. 2)
Relevant Quotes:
1) "We compare the effect of sending messages which (i) highlighted streaks... and (iv) to a control group which did not receive any messages." (p. 2)
2) "To encourage the use of the online math platform, notifications were sent through the app to the smartphones..." (p. 3)
3) "During the six-week treatment period, the control group did not receive any further notifications, while the three treatment groups received different types of app notifications..." (p. 3)
Detailed Analysis:
Criterion B checks for balanced resources (time/budget) between groups unless the extra resource is the variable being tested. In this study, the "resource" provided to the treatment group is the set of notifications (messages) designed to nudge behavior (highlighting streaks). The underlying educational resource (the math platform and exercises) was available to the control group (who had the app), but they did not receive the specific notifications. Since the study explicitly intends to test the impact of these messages (the "nudge") as the treatment variable, the absence of messages in the control group is the correct counterfactual and does not constitute an invalid imbalance.
Criterion B is met because the additional resource (messages) was the explicit treatment variable being tested.
-
Level 3 Criteria
-
R
Reproduced
- The paper states this is the first experimental study of its kind in this context, and no independent replication is cited.
- "To our knowledge, this is the first experimental study examining the effects of highlighting streaks in an educational context." (p. 2)
Relevant Quotes:
1) "To our knowledge, this is the first experimental study examining the effects of highlighting streaks in an educational context." (p. 2)
Detailed Analysis:
Criterion R requires the study to be independently replicated. The authors explicitly state that this is the first study of its kind in an educational context. There are no references to subsequent independent replication studies in the text provided.
Criterion R is not met because the study identifies itself as the first of its kind and cites no independent replication.
-
A
All-subject Exams
- The study only assessed math achievement and did not measure outcomes in other main subjects like reading or science.
- "We also analyze effects on academic achievement... generate a standardized measure of math academic achievement..." (p. 3)
Relevant Quotes:
1) "We also analyze effects on academic achievement... generate a standardized measure of math academic achievement..." (p. 3)
2) "The exercises included... 30 exercises covering all the topics presented during the treatment period (i.e., 15 exercises for numeracy, and 5 for geometry, probability, and patterns, respectively)." (p. 3)
Detailed Analysis:
Criterion A requires assessment of all main subjects (e.g., math, science, language arts) to check for negative spillovers. This study focused exclusively on mathematics ("math academic achievement"). No other subjects were assessed. Furthermore, Criterion E (standardized exam), a prerequisite for this criterion, was not met.
Criterion A is not met because only math outcomes were measured and the prerequisite Criterion E was not met.
-
G
Graduation Tracking
- The study followed up immediately after the six-week intervention; no graduation tracking was conducted.
- "We cannot assess persistence following the end of treatment due to data limitations..." (p. 6)
Relevant Quotes:
1) "We cannot assess persistence following the end of treatment due to data limitations..." (p. 6)
2) "We also conducted a baseline test and an endline test, which took place in the weeks immediately before and after the six-week treatment period." (p. 3)
Detailed Analysis:
Criterion G requires tracking students until graduation to assess long-term impacts. The paper explicitly states that data limitations prevented assessment of persistence following the end of the treatment. Tracking ended with the endline test immediately following the six-week summer program.
Criterion G is not met because data collection stopped immediately after the intervention without long-term tracking.
-
P
Pre-Registered
- There is no evidence of a pre-registered protocol with hypotheses and analysis plans in a public registry before data collection.
- "We received approval for this project from the Via Libre... and registered the Conecta Ideas data with the Ministry of Justice in Peru." (p. 1)
Relevant Quotes:
1) "We received approval for this project from the Via Libre (Comite Institutional de Bioetica) ethics board and registered the Conecta Ideas data with the Ministry of Justice in Peru." (p. 1)
Detailed Analysis:
Criterion P requires pre-registration of the full study protocol (hypotheses, methods, analysis plan) before data collection begins. The authors mention registering the *data* with the Ministry of Justice (likely for privacy/data protection compliance) and receiving ethics approval. However, there is no mention of registering the study protocol on a standard registry like ClinicalTrials.gov, OSF, or the AEA RCT Registry.
Criterion P is not met because the paper cites no pre-registered study protocol in a public registry.