Level 1 Criteria
-
C Class-level RCT
- Randomisation was conducted at the individual‑student level rather than at the class level, so the Class‑level RCT criterion is not satisfied.
- "The 619 participants were individually randomized into treatment and control groups with 305 students in the control and 314 in the treatment group."
- Relevant Quotes: 1) "The 619 participants were individually randomized into treatment and control groups with 305 students in the control and 314 in the treatment group." (p. 9) 2) "Randomization was stratified by center‑batch preferences." (p. 9) Detailed Analysis: These quotes show that randomisation occurred at the individual‑student level, stratified by center‑batch preference, rather than at the class level. The ERCT Standard’s Class‑level RCT criterion requires entire classes (or schools) to be randomised, so that treated and control students do not share a classroom and contaminate each other's outcomes. Because students within the same cohort were assigned individually, criterion C is not met.
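The distinction between the two designs can be illustrated with a minimal sketch. It is not the study's actual procedure: the student IDs, class labels, alternating assignment rule, and all function names are hypothetical, and classes stand in for the strata purely for illustration.

```python
import random
from collections import defaultdict


def randomise_individuals(students, stratum_of, seed=0):
    """Assign each student to an arm, shuffling separately within each stratum."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for student in students:
        by_stratum[stratum_of[student]].append(student)
    assignment = {}
    for members in by_stratum.values():
        rng.shuffle(members)
        # Alternate arms down the shuffled list (an assumed, simple rule).
        for i, student in enumerate(members):
            assignment[student] = "treatment" if i % 2 == 0 else "control"
    return assignment


def randomise_classes(class_of, seed=0):
    """Assign whole classes to an arm; each student inherits the class's arm."""
    rng = random.Random(seed)
    classes = sorted(set(class_of.values()))
    rng.shuffle(classes)
    arm_of_class = {
        c: ("treatment" if i % 2 == 0 else "control")
        for i, c in enumerate(classes)
    }
    return {student: arm_of_class[c] for student, c in class_of.items()}


def classes_with_mixed_arms(assignment, class_of):
    """Return the classes that contain both treatment and control students."""
    arms = defaultdict(set)
    for student, arm in assignment.items():
        arms[class_of[student]].add(arm)
    return sorted(c for c, seen in arms.items() if len(seen) > 1)


# Hypothetical data: 24 students in 4 classes of 6.
students = [f"s{i}" for i in range(24)]
class_of = {s: f"class{i // 6}" for i, s in enumerate(students)}

individual = randomise_individuals(students, class_of)
clustered = randomise_classes(class_of)

mixed_individual = classes_with_mixed_arms(individual, class_of)
mixed_clustered = classes_with_mixed_arms(clustered, class_of)
```

Under individual assignment every hypothetical class ends up containing both arms, which is the situation the Class‑level RCT criterion is meant to exclude. Under class‑level assignment no class is mixed.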
-
E Exam-based Assessment
- The study used researcher‑designed custom tests rather than a standardized, widely recognized exam, so the Exam‑based Assessment criterion is not satisfied.
- "The tests were designed independently by the research team and intended to capture a wide range of student achievement."
- Relevant Quotes: 1) "The tests were designed independently by the research team and intended to capture a wide range of student achievement." (p. 11) 2) "Test items ranged in difficulty from 'very easy' questions ... to 'grade‑appropriate' competencies found in international assessments." (p. 11) Detailed Analysis: The assessment instruments were designed by the research team itself. Although some items target competencies found in international assessments, the tests are bespoke instruments, not an established exam. The ERCT Exam‑based Assessment criterion requires a standardized, widely recognized exam, so criterion E is not met.
-
T Term Duration
- Outcomes were measured after a 4.5‑month intervention period, which covers at least one term, satisfying the Term Duration criterion.
- "We measure program impacts using ... tests ... before and after the 4.5‑month‑long intervention."
- Relevant Quotes: 1) "We measure program impacts using ... tests ... before and after the 4.5‑month‑long intervention." (p. 3) 2) "Baseline assessments in September 2015 ... endline in February 2016." (p. 5) Detailed Analysis: The intervention ran for approximately 4.5 months, from baseline in September 2015 to endline in February 2016, which exceeds a typical academic term of 3–4 months. The ERCT Term Duration criterion requires outcomes to be measured after at least one full term, so criterion T is met.
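The duration check above amounts to simple arithmetic, sketched below. The 4‑month threshold is an assumption taken from the upper end of the 3–4 month range used in this analysis, not a value fixed by the source, and the helper name is hypothetical. Exact test dates are not quoted, so only calendar months are compared.

```python
def months_between(start_year, start_month, end_year, end_month):
    """Count calendar months from the start month to the end month."""
    return (end_year - start_year) * 12 + (end_month - start_month)


# Assumed threshold: upper end of the "typical term" range used in the analysis.
ASSUMED_TERM_MONTHS = 4

# Duration reported in the paper (p. 3).
reported_duration_months = 4.5

# Baseline September 2015, endline February 2016 (p. 5).
calendar_span_months = months_between(2015, 9, 2016, 2)

# The reported duration should fit inside the calendar span.
duration_is_consistent = reported_duration_months <= calendar_span_months

# Term Duration criterion: outcomes measured after at least one full term.
meets_term_duration = reported_duration_months >= ASSUMED_TERM_MONTHS
```

The September‑to‑February span covers five calendar months, which is consistent with the reported 4.5‑month intervention, and that duration clears the assumed threshold.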
-
D Documented Control Group
- The study provides detailed baseline characteristics and assessment outcomes for the control group, fulfilling the Documented Control Group criterion.
- "The treatment and control groups did not differ significantly at baseline on gender, socioeconomic status (SES), or baseline test scores (Table 1, panel A)."
- Relevant Quotes: 1) "The treatment and control groups did not differ significantly at baseline on gender, socioeconomic status (SES), or baseline test scores (Table 1, panel A)." (p. 9) 2) "Students not chosen by lottery ... completed an endline assessment." (p. 5) Detailed Analysis: Table 1 (panel A) and the accompanying text report the control group’s gender, SES, and baseline test scores, and the students not chosen by lottery also completed an endline assessment. This documentation of the control group’s characteristics and performance satisfies the Documented Control Group criterion, so criterion D is met.