Streaks to success: The effects of highlighting streaks on student effort and learning

Raphaëlle Aulagnon, Julian Cristia, Santiago Cueto, Ofer Malamud

Published:
ERCT Check Date:
DOI: 10.1016/j.econedurev.2025.102721
  • mathematics
  • K12
  • Latam
  • gamification
  • online homework
  • EdTech app
  • mobile learning
0
  • C

    The study randomized at the student level rather than at the class or school level.

    "60,000 students were randomly assigned to receive messages that (i) highlighted streaks, (ii) provided personalized reminders with positive reinforcement, (iii) provided generic reminders, or (iv) to a control group." (p. 1)

  • E

    Outcomes were measured with app-administered tests closely aligned to the treatment exercises rather than a standardized external exam.

    "The baseline test and endline test were administered through the app and had a very similar format. They were closely aligned with the exercises included in the six-week treatment period..." (p. 3)

  • T

    The intervention and outcome window was six weeks, which is shorter than a full academic term.

    "Our intervention took place during six weeks of the summer break starting from the week of January 17, 2022, and until the week of February 21, 2022." (p. 3)

  • D

    The control group is clearly described, including its notification condition and baseline characteristics.

    "During the six-week treatment period, the control group did not receive any further notifications, while the three treatment groups received different types of app notifications each week." (p. 3)

  • S

    Randomization occurred at the student level rather than the school level.

    "60,000 students were randomly assigned to receive messages that (i) highlighted streaks, (ii) provided personalized reminders with positive reinforcement, (iii) provided generic reminders, or (iv) to a control group." (p. 1)

  • I

    The paper does not provide a clear statement that the evaluation was conducted by an independent third party separate from the intervention team.

    "The learning platform used in this study, called 'Conecta Ideas,' included a student app that was developed at the Center for Advanced Research on Education at the Universidad de Chile (and owned by the firm AutoMind)." (p. 3)

  • Y

    The intervention and measurement period was six weeks, not an academic year.

    "Our intervention took place during six weeks of the summer break starting from the week of January 17, 2022, and until the week of February 21, 2022." (p. 3)

  • B

    The additional input (messages/notifications) is the treatment being tested, so a no-message control is the appropriate comparison.

    "During the six-week treatment period, the control group did not receive any further notifications, while the three treatment groups received different types of app notifications each week." (p. 3)

  • R

    No independent replication study by other authors was found for this specific intervention.

    "To our knowledge, this is the first experimental study examining the effects of highlighting streaks in an educational context." (p. 2)

  • A

    The study does not measure standardized outcomes across all core subjects, and criterion E is not met.

    "We also analyze effects on academic achievement, focusing on the subsample of students who completed the endline test." (p. 3)

  • G

    The study does not track participants to graduation and also fails the year-duration prerequisite (criterion Y).

    "While we cannot assess persistence following the end of treatment due to data limitations, we explore student behavior in the week following treatment when the endline test was administered." (p. 6)

  • P

    The paper provides ethics approval and data registration statements, but no evidence of a pre-registered study protocol.

    "We received approval for this project from the Via Libre (Comite Institutional de Bioetica) ethics board and registered the Conecta Ideas data with the Ministry of Justice in Peru." (p. 1)

Abstract

We examine whether highlighting streaks - instances of repeated and consecutive behavior when completing learning tasks - encourages 4th to 6th grade students in Peru to increase their use of an online math platform and improve learning. 60,000 students were randomly assigned to receive messages that (i) highlighted streaks, (ii) provided personalized reminders with positive reinforcement, (iii) provided generic reminders, or (iv) to a control group. Highlighting streaks and providing personalized reminders significantly increased platform use compared to generic reminders and the control group, with streaks more effective on the intensive margin and personalized reminders more effective on the extensive margin. Highlighting streaks also significantly improved math achievement compared to the control group among the 1,500 students who took an endline test, although differences with other treatment arms were not significant.
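
The assignment described in the abstract can be illustrated with a minimal Python sketch. It assumes the 40/20/20/20 split and the two stratification variables reported on p. 4 of the paper (baseline-test completion and weeks of platform use in 2021). The field names, the toy data, the seed, and the slot-pattern approach are hypothetical choices made for illustration, not the authors' actual procedure.

```python
import random
from collections import defaultdict

# Slot pattern giving 40% control and 20% to each treatment arm,
# matching the shares reported in the paper (p. 4).
ARM_PATTERN = ["control", "control", "streak", "personalized", "generic"]


def assign_arms(students, seed=0):
    """Stratified random assignment (illustrative only).

    `students` is a list of dicts with the hypothetical keys
    'id', 'took_baseline' (bool) and 'weeks_used_2021' (int).
    Returns a dict mapping student id -> arm name.
    """
    rng = random.Random(seed)

    # Group student ids by stratum.
    strata = defaultdict(list)
    for s in students:
        strata[(s["took_baseline"], s["weeks_used_2021"])].append(s["id"])

    # Shuffle within each stratum, then deal students into the arm slots.
    assignment = {}
    for key in sorted(strata):
        ids = strata[key]
        rng.shuffle(ids)
        for i, sid in enumerate(ids):
            assignment[sid] = ARM_PATTERN[i % len(ARM_PATTERN)]
    return assignment


# Toy data: 1,000 made-up students in four equal strata.
students = [
    {"id": i, "took_baseline": i % 2 == 0, "weeks_used_2021": i % 4}
    for i in range(1000)
]
arms = assign_arms(students)
counts = {arm: list(arms.values()).count(arm) for arm in set(ARM_PATTERN)}

# Arm sizes implied by the reported shares for the full sample of 60,000.
control_n = 60_000 * 2 // 5
treatment_n = 60_000 // 5
```

Because students are shuffled within each stratum before being dealt into arms, the arms are balanced on the stratification variables by construction, which is the usual motivation for stratifying.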

ERCT Criteria Breakdown

  • Level 1 Criteria

    • C

      Class-level RCT

      • The study randomized at the student level rather than at the class or school level.
      • "60,000 students were randomly assigned to receive messages that (i) highlighted streaks, (ii) provided personalized reminders with positive reinforcement, (iii) provided generic reminders, or (iv) to a control group." (p. 1)
      • Relevant Quotes: 1) "60,000 students were randomly assigned to receive messages that (i) highlighted streaks, (ii) provided personalized reminders with positive reinforcement, (iii) provided generic reminders, or (iv) to a control group." (p. 1) 2) "Randomization to treatment was carried out after the baseline test and stratified based on whether the student had completed the baseline test and the number of weeks that the student used the platform in 2021." (p. 4)
      • Detailed Analysis: Criterion C requires randomization at the class level (or, more strongly, at the school level), unless the intervention is explicitly one-to-one tutoring. The paper states that "students were randomly assigned" and describes student-level stratification, which indicates that the unit of randomization is the individual student rather than a class or a school. The intervention is app notifications (messages), not personal tutoring, so the tutoring exception does not apply. Criterion C is not met because randomization was at the student level rather than the class (or school) level.
    • E

      Exam-based Assessment

      • Outcomes were measured with app-administered tests closely aligned to the treatment exercises rather than a standardized external exam.
      • "The baseline test and endline test were administered through the app and had a very similar format. They were closely aligned with the exercises included in the six-week treatment period..." (p. 3)
      • Relevant Quotes: 1) "The baseline test and endline test were administered through the app and had a very similar format. They were closely aligned with the exercises included in the six-week treatment period and included 30 exercises covering all the topics presented during the treatment period (i.e., 15 exercises for numeracy, and 5 for geometry, probability, and patterns, respectively)." (p. 3)
      • Detailed Analysis: Criterion E requires a widely recognized standardized exam, not a custom instrument created for or tightly aligned to the intervention. The paper explicitly states that the baseline and endline tests were administered through the app and were "closely aligned" with the treatment-period exercises. That indicates a study-specific assessment rather than an external standardized exam. Criterion E is not met because the study used app-based tests closely aligned with the intervention rather than a standardized external exam.
    • T

      Term Duration

      • The intervention and outcome window was six weeks, which is shorter than a full academic term.
      • "Our intervention took place during six weeks of the summer break starting from the week of January 17, 2022, and until the week of February 21, 2022." (p. 3)
      • Relevant Quotes: 1) "Our intervention took place during six weeks of the summer break starting from the week of January 17, 2022, and until the week of February 21, 2022." (p. 3) 2) "We also conducted a baseline test and an endline test, which took place in the weeks immediately before and after the six-week treatment period." (p. 3)
      • Detailed Analysis: Criterion T requires outcomes to be measured at least one full academic term after the intervention begins (typically about 3-4 months). The paper specifies a six-week intervention period, with the endline test occurring immediately after. This does not meet the term duration requirement. Criterion T is not met because outcomes were measured about six weeks after the start of the intervention rather than at least one full academic term after it.
    • D

      Documented Control Group

      • The control group is clearly described, including its notification condition and baseline characteristics.
      • "During the six-week treatment period, the control group did not receive any further notifications, while the three treatment groups received different types of app notifications each week." (p. 3)
      • Relevant Quotes: 1) "During the six-week treatment period, the control group did not receive any further notifications, while the three treatment groups received different types of app notifications each week." (p. 3) 2) "Forty percent of the sample, or 24,000 students, were assigned to the control group, while 20 percent of the sample, or 12,000 students, were assigned to each of the three treatment groups: Streak, Personalized Reminder, and Generic Reminder." (p. 4) 3) "Table 1 examines baseline balance across the four treatment arms for two relevant samples: all students, and students who took the endline test (in panels A and B, respectively). Column 1 presents means for the control group..." (p. 4)
      • Detailed Analysis: Criterion D requires a well-documented control group, including the control condition and baseline information. The paper states that the control group received no further notifications during treatment and provides the control group size and assignment shares. It also states that Table 1 reports baseline balance with control-group means, which supplies the needed baseline documentation. Criterion D is met because the control condition is explicitly defined and baseline control-group statistics are documented in the paper.
  • Level 2 Criteria

    • S

      School-level RCT

      • Randomization occurred at the student level rather than the school level.
      • "60,000 students were randomly assigned to receive messages that (i) highlighted streaks, (ii) provided personalized reminders with positive reinforcement, (iii) provided generic reminders, or (iv) to a control group." (p. 1)
      • Relevant Quotes: 1) "60,000 students were randomly assigned to receive messages that (i) highlighted streaks, (ii) provided personalized reminders with positive reinforcement, (iii) provided generic reminders, or (iv) to a control group." (p. 1)
      • Detailed Analysis: Criterion S requires school-level randomization. The quoted sentence identifies students as the randomized unit, and the paper does not describe random assignment of schools. Criterion S is not met because the unit of randomization was students rather than schools.
    • I

      Independent Conduct

      • The paper does not provide a clear statement that the evaluation was conducted by an independent third party separate from the intervention team.
      • "The learning platform used in this study, called 'Conecta Ideas,' included a student app that was developed at the Center for Advanced Research on Education at the Universidad de Chile (and owned by the firm AutoMind)." (p. 3)
      • Relevant Quotes: 1) "The learning platform used in this study, called 'Conecta Ideas,' included a student app that was developed at the Center for Advanced Research on Education at the Universidad de Chile (and owned by the firm AutoMind)." (p. 3) 2) "This project is the result of a large collaborative effort." (p. 1)
      • Detailed Analysis: Criterion I requires explicit evidence that the study was conducted independently from the designers or implementers of the intervention. The paper describes the platform's development and ownership but contains no clear statement that an independent evaluation organization implemented the experiment or performed the analysis. In the absence of an explicit independence statement, the criterion is not met. Criterion I is not met because the paper does not provide clear evidence of independent third-party conduct of the evaluation.
    • Y

      Year Duration

      • The intervention and measurement period was six weeks, not an academic year.
      • "Our intervention took place during six weeks of the summer break starting from the week of January 17, 2022, and until the week of February 21, 2022." (p. 3)
      • Relevant Quotes: 1) "Our intervention took place during six weeks of the summer break starting from the week of January 17, 2022, and until the week of February 21, 2022." (p. 3)
      • Detailed Analysis: Criterion Y requires outcomes to be tracked for at least one full academic year after the intervention begins. The paper specifies a six-week summer intervention and does not report year-long tracking. Criterion Y is not met because the study duration is six weeks rather than at least one academic year.
    • B

      Balanced Control Group

      • The additional input (messages/notifications) is the treatment being tested, so a no-message control is the appropriate comparison.
      • "During the six-week treatment period, the control group did not receive any further notifications, while the three treatment groups received different types of app notifications each week." (p. 3)
      • Relevant Quotes: 1) "We compare the effect of sending messages which (i) highlighted streaks of completed assignments, to messages which (ii) provided personalized reminders with positive reinforcement, (iii) provided generic reminders, and (iv) to a control group which did not receive any messages." (p. 2) 2) "During the six-week treatment period, the control group did not receive any further notifications, while the three treatment groups received different types of app notifications each week." (p. 3)
      • Detailed Analysis: Criterion B checks whether the intervention group receives extra time, budget, or resources that are not matched in the control group, unless those extra resources are explicitly the treatment variable being tested. Here, the intervention is defined as sending app messages. The control group keeps access to the same platform but does not receive the messages. This is exactly the intended counterfactual for testing the causal effect of the messages themselves, so the resource difference is integral to the treatment design. Criterion B is met because the messages are the treatment variable, making a no-message control appropriate.
  • Level 3 Criteria

    • R

      Reproduced

      • No independent replication study by other authors was found for this specific intervention.
      • "To our knowledge, this is the first experimental study examining the effects of highlighting streaks in an educational context." (p. 2)
      • Relevant Quotes: 1) "To our knowledge, this is the first experimental study examining the effects of highlighting streaks in an educational context." (p. 2)
      • Detailed Analysis: Criterion R requires independent reproduction by other researchers in a different context, published as a peer-reviewed replication. The paper positions itself as the first experimental educational study on this specific streak-highlighting intervention. A targeted web search for replications and follow-up RCTs of this exact intervention did not identify an independent replication paper by a separate author team. Criterion R is not met because no independent replication study was found for this intervention.
    • A

      All-subject Exams

      • The study does not measure standardized outcomes across all core subjects, and criterion E is not met.
      • "We also analyze effects on academic achievement, focusing on the subsample of students who completed the endline test." (p. 3)
      • Relevant Quotes: 1) "We also analyze effects on academic achievement, focusing on the subsample of students who completed the endline test." (p. 3) 2) "The main outcomes in the study are measures of platform use and student math scores on the endline test." (p. 3)
      • Detailed Analysis: Criterion A requires standardized exam-based measurement across all main subjects, and it explicitly depends on criterion E being met. This study reports only mathematics achievement from an app-based endline test and does not report standardized exams across multiple subjects. Since criterion E is not met, criterion A cannot be met. Criterion A is not met because outcomes are limited to math (and are not measured via standardized all-subject exams), and criterion E is not met.
    • G

      Graduation Tracking

      • The study does not track participants to graduation and also fails the year-duration prerequisite (criterion Y).
      • "While we cannot assess persistence following the end of treatment due to data limitations, we explore student behavior in the week following treatment when the endline test was administered." (p. 6)
      • Relevant Quotes: 1) "While we cannot assess persistence following the end of treatment due to data limitations, we explore student behavior in the week following treatment when the endline test was administered." (p. 6)
      • Detailed Analysis: Criterion G requires tracking participants until graduation and has criterion Y as a prerequisite. The paper explicitly states that it cannot assess persistence after the end of treatment and focuses only on the immediate post-treatment week. In addition, criterion Y is not met because the study does not track outcomes over a full academic year. A targeted search did not identify a subsequent follow-up paper by the same author team reporting graduation tracking for this cohort. Criterion G is not met because there is no graduation tracking (and criterion Y is not met).
    • P

      Pre-Registered

      • The paper provides ethics approval and data registration statements, but no evidence of a pre-registered study protocol.
      • "We received approval for this project from the Via Libre (Comite Institutional de Bioetica) ethics board and registered the Conecta Ideas data with the Ministry of Justice in Peru." (p. 1)
      • Relevant Quotes: 1) "We received approval for this project from the Via Libre (Comite Institutional de Bioetica) ethics board and registered the Conecta Ideas data with the Ministry of Justice in Peru." (p. 1)
      • Detailed Analysis: Criterion P requires a publicly accessible pre-registered protocol (hypotheses, methods, and analysis plan) registered before data collection began, typically with an identifiable registry entry or ID. The paper mentions ethics approval and registration of the data with a government ministry, but it does not cite a pre-registration registry entry for the study protocol. A registry entry could not be verified in the AEA RCT Registry, so no protocol registration could be confirmed beyond what is stated in the paper. Criterion P is not met because the paper does not provide evidence of a pre-registered study protocol.
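
The verdicts in the breakdown above, together with the two prerequisite rules it mentions (A depends on E, G depends on Y), can be tallied in a short Python sketch. The level rule used here, that a level counts only when every criterion in it and in all lower levels is met, is an assumption made for illustration and is not stated on this page. The function and variable names are likewise hypothetical.

```python
# Verdicts as stated in the breakdown above (True = met).
VERDICTS = {
    "C": False, "E": False, "T": False, "D": True,   # Level 1
    "S": False, "I": False, "Y": False, "B": True,   # Level 2
    "R": False, "A": False, "G": False, "P": False,  # Level 3
}

# Criteria grouped by level, in the order used on this page.
LEVELS = [("C", "E", "T", "D"), ("S", "I", "Y", "B"), ("R", "A", "G", "P")]

# Prerequisites named in the analysis: A requires E, G requires Y.
PREREQS = {"A": "E", "G": "Y"}


def apply_prereqs(verdicts):
    """Return a copy in which a criterion fails if its prerequisite fails."""
    out = dict(verdicts)
    for criterion, needed in PREREQS.items():
        if not out[needed]:
            out[criterion] = False
    return out


def erct_level(verdicts):
    """Count consecutive fully met levels, starting from Level 1.

    ASSUMPTION: level k is reached only if every criterion in
    levels 1..k is met; this rule is not taken from the page.
    """
    checked = apply_prereqs(verdicts)
    level = 0
    for group in LEVELS:
        if all(checked[c] for c in group):
            level += 1
        else:
            break
    return level


checked = apply_prereqs(VERDICTS)
met = [c for group in LEVELS for c in group if checked[c]]
level = erct_level(VERDICTS)
```

Under that assumption the study reaches no level: only criteria D and B are met, and Level 1 already fails on C, E, and T.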
