Can Discussions about Girls’ Education Improve Academic Outcomes? Evidence from a Randomized Development Project

Christopher S. Cotton, Ardyn Nordstrom, Jordan Nanowski, and Eric Richert

Published:
ERCT Check Date:
DOI: 10.1093/wber/lhae021
  • mathematics
  • reading
  • K12
  • Africa
  • parent involvement
ERCT Level: 2

Abstract

This article evaluates the impact that facilitated discussions about girls’ education have on education outcomes for students in rural Zimbabwe. The staggered implementation of components of a randomized education project allowed for the causal analysis of a dialogue-based engagement campaign. This campaign involved regular discussions between trained facilitators and parents, teachers, and youth about girls’ rights, the importance of attending school, and the barriers girls face in pursuing education. The campaigns increased mathematics performance and enrollment in the year after implementation. There was no similar improvement in literacy performance during this period. Longer-term data on the broader project suggest that adding additional education-focused interventions did not further increase mathematics performance and enrollment beyond what can be attributed to the dialogue campaigns alone.

Full Article

ERCT Criteria Breakdown

  • Level 1 Criteria

    • C

      Class-level RCT

      • Summary: Randomization was at the school level, which satisfies the class-level RCT requirement.
      • Relevant quotes:
        1) "The program was implemented in randomly selected schools across 10 rural districts in Zimbabwe and is estimated to have reached 48,773 girls." (p. 214)
      • Detailed analysis: Criterion C requires randomization at the class level or stronger. The paper states that schools were "randomly selected," implying a school-level cluster-randomized design. School-level randomization is stronger than class-level randomization and therefore satisfies criterion C.
      • Final summary: Criterion C is met because randomization occurred at the school level.
    • E

      Exam-based Assessment

      • Summary: Outcomes were measured using EGRA and EGMA, which are standardized assessment systems.
      • Relevant quotes:
        1) "Girls also took the Early Grade Reading Assessment (EGRA) and Early Grade Mathematics Assessment (EGMA)." (p. 218)
        2) "The EGRA and EGMA have strict development guidelines that ensure the difficulty level is standardized across versions." (p. 218)
      • Detailed analysis: Criterion E requires standardized, widely recognized exam-based assessment rather than custom-made tests. The study used EGRA and EGMA and explicitly describes them as having "strict development guidelines" that ensure standardization across versions, supporting that outcomes were assessed with standardized instruments.
      • Final summary: Criterion E is met because EGRA and EGMA are standardized assessment systems used to measure outcomes.
    • T

      Term Duration

      • Summary: Midline outcomes were measured about 1.5 years after implementation began, exceeding one academic term.
      • Relevant quotes:
        1) "Baseline data collection occurred before implementation began in February 2014. Midline data collection took place a year and a half later, in June–August 2015." (p. 216)
        2) "The staggered implementation and its alignment with data collection points allow us to measure the short-term impact of the dialogue-based engagement efforts by midline, nine months to one year after dialogues were introduced." (p. 213)
      • Detailed analysis: Criterion T requires that outcomes are measured at least one full academic term (typically 3–4 months) after the intervention begins. Implementation began in February 2014 and midline measurement occurred in June–August 2015, about 16–18 months later. The paper also describes measuring impacts "nine months to one year after dialogues were introduced," which exceeds one term.
      • Final summary: Criterion T is met because outcomes were measured well over one term after the intervention began.
    • D

      Documented Control Group

      • Summary: The control group is described with sample sizes and baseline characteristics, including a baseline balance table.
      • Relevant quotes:
        1) "The evaluation sample included 37 treatment and 28 control locations." (p. 217)
        2) "At midline, 385 and 557 girls are in the control and treatment samples, respectively." (p. 217)
        3) "Table 1. Baseline Summary Statistics" (p. 218)
      • Detailed analysis: Criterion D requires that the control group is well documented (size and baseline characteristics). The paper reports the number of treatment and control locations, provides midline sample sizes for each group, and includes a baseline summary statistics table (Table 1) comparing observable characteristics between the control and treatment groups.
      • Final summary: Criterion D is met because the control group is clearly described and baseline characteristics are reported.
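The kind of baseline balance comparison credited under criterion D can be sketched in a few lines. This is a minimal illustration with synthetic data and a hypothetical variable name ("age"), not the paper's actual Table 1; only the group sizes (557 treatment, 385 control girls at midline) come from the quotes above.

```python
import random
import statistics

random.seed(0)

# Synthetic baseline records (hypothetical values): 557 treatment and
# 385 control girls, mirroring the midline sample sizes quoted above.
treatment = [{"age": random.gauss(10, 2)} for _ in range(557)]
control = [{"age": random.gauss(10, 2)} for _ in range(385)]

def balance_row(var, treat_rows, ctrl_rows):
    """One row of a balance table: group means and their difference."""
    t_mean = statistics.mean(r[var] for r in treat_rows)
    c_mean = statistics.mean(r[var] for r in ctrl_rows)
    return {"variable": var,
            "treatment_mean": round(t_mean, 2),
            "control_mean": round(c_mean, 2),
            "difference": round(t_mean - c_mean, 2)}

print(balance_row("age", treatment, control))
```

A real balance table would repeat this row for every baseline characteristic and attach a standard error or p-value to each difference; the point here is only the structure of the comparison.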
  • Level 2 Criteria

    • S

      School-level RCT

      • Summary: Treatment was assigned at the school level, meeting the school-level RCT requirement.
      • Relevant quotes:
        1) "The program was implemented in randomly selected schools across 10 rural districts in Zimbabwe and is estimated to have reached 48,773 girls." (p. 214)
        2) "Since the treatment was applied at the school level, and participation in the interventions was voluntary, this specification estimates the project’s intent-to-treat effect." (p. 219)
      • Detailed analysis: Criterion S requires randomization at the school level. The paper states that the program was implemented in "randomly selected schools" and explicitly notes that "treatment was applied at the school level," directly confirming that assignment was at the school (cluster) level.
      • Final summary: Criterion S is met because the unit of treatment assignment was the school.
    • I

      Independent Conduct

      • Summary: Data collection involved an external firm, and the authors analyze secondary data rather than directly implementing the intervention.
      • Relevant quotes:
        1) "A preliminary exploration of the data by the research team was funded by World Vision Canada and conducted through Limestone Analytics, where Cotton holds a secondary affiliation and where Nordstrom and Nanowski previously worked." (p. 211)
        2) "The data collection was funded by the UK government’s Girls’ Education Challenge fund and conducted by Miske Witt & Associates in collaboration with World Vision, the program implementer, and PriceWaterhouseCoopers LLC, the UK Aid Girls’ Education Challenge fund manager." (p. 211)
        3) "World Vision provided the data set to the research team for academic research;" (p. 211)
      • Detailed analysis: Criterion I requires that the study is conducted independently from the intervention designers. The intervention was implemented by a consortium led by World Vision, while data collection was conducted by professional enumerators from a Zimbabwe-based firm and, per the acknowledgments, by Miske Witt & Associates in collaboration with the implementer and fund manager. The authors are academics who analyze secondary data provided for academic research. The acknowledgments disclose that a preliminary exploration was funded by World Vision Canada and conducted through Limestone Analytics, a potential source of influence, but the paper positions the analysis as academic research on secondary data rather than an internal self-evaluation by the implementers.
      • Final summary: Criterion I is met because the analysis was conducted by an external academic team using secondary data, with disclosed relationships.
    • Y

      Year Duration

      • Summary: Midline outcomes were measured about 1.5 years after implementation began, exceeding one academic year.
      • Relevant quotes:
        1) "Baseline data collection occurred before implementation began in February 2014. Midline data collection took place a year and a half later, in June–August 2015." (p. 216)
      • Detailed analysis: Criterion Y requires outcomes to be measured at least one full academic year after the intervention starts. The interval from implementation start (February 2014) to midline measurement (June–August 2015) is about 16–18 months, which exceeds a typical academic year.
      • Final summary: Criterion Y is met because the time from implementation start to outcome measurement exceeds one academic year.
    • B

      Balanced Control Group

      • Summary: The intervention adds facilitated discussions, and those added resources are the treatment being tested rather than an uncontrolled add-on.
      • Relevant quotes:
        1) "This article evaluates the impact that facilitated discussions about girls’ education have on education outcomes for students in rural Zimbabwe." (p. 211)
        2) "The dialogue sessions provided a setting for participants to comfortably discuss a guided set of topics in the presence of a trained facilitator." (p. 215)
        3) "The program provided no financial assistance or other resources to these groups or their members." (p. 215)
      • Detailed analysis: Criterion B requires comparable time and resources in treatment and control unless the additional resources are themselves the treatment variable. The intervention consists of facilitated, structured discussions led by trained facilitators, which add time and personnel relative to business as usual. Because the paper explicitly frames the causal question as the impact of these facilitated discussions, the additional time and facilitation resources are integral to the intervention definition, not a confounding add-on that would need to be matched in the control condition.
      • Final summary: Criterion B is met because the added resources (facilitated discussions) are the central treatment being evaluated.
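Because assignment was at the school level (criterion S) and participation was voluntary, the paper reports intent-to-treat effects. A cluster-respecting ITT comparison can be sketched as the difference in school-level mean outcomes by assigned treatment. The data below are synthetic and purely illustrative (only the 37/28 location counts come from the quotes above), and the paper's own specification is a regression, not this simple difference in means.

```python
import random
import statistics

random.seed(1)

# Synthetic cluster-randomized data: 37 treatment and 28 control schools,
# matching the location counts quoted above (all scores are invented).
schools = []
for i in range(65):
    assigned = 1 if i < 37 else 0
    shift = 0.2 if assigned else 0.0   # assumed true effect, in SD units
    scores = [random.gauss(shift, 1) for _ in range(20)]
    schools.append({"assigned": assigned,
                    "mean_score": statistics.mean(scores)})

# Intent-to-treat: compare by assignment, not by actual participation,
# and aggregate to school means so the school is the unit of analysis.
treat_means = [s["mean_score"] for s in schools if s["assigned"]]
ctrl_means = [s["mean_score"] for s in schools if not s["assigned"]]
itt = statistics.mean(treat_means) - statistics.mean(ctrl_means)
print(f"ITT estimate: {itt:.2f} SD")
```

Averaging within schools before comparing groups keeps inference at the level of randomization, which is the same concern cluster-robust standard errors address in a regression framework.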
  • Level 3 Criteria

    • R

      Reproduced

      • Summary: No independent reproduction of this specific study was identified in the paper, and none could be verified from accessible sources.
      • Relevant quotes:
        1) "However, to the knowledge of the authors, this is the first paper to present causal evidence of the impact that information provided as part of a dialogue can have on education outcomes." (p. 214)
      • Detailed analysis: Criterion R requires an independent replication by other authors in a different context, published in a peer-reviewed venue. The paper positions itself as the first to provide causal evidence for this mechanism in education and does not cite any independent replication of the same intervention and evaluation. No clear independent reproduction of this specific study was verifiable from the sources reviewed for this check.
      • Final summary: Criterion R is not met because independent reproduction was not identified or verifiable.
    • A

      All-subject Exams

      • Summary: The study measures mathematics and reading only, not all core subjects.
      • Relevant quotes:
        1) "Girls also took the Early Grade Reading Assessment (EGRA) and Early Grade Mathematics Assessment (EGMA)." (p. 218)
      • Detailed analysis: Criterion A requires standardized assessments across all main subjects. The paper reports outcomes using EGMA (mathematics) and EGRA (reading) only and does not report standardized outcomes for other core subject areas.
      • Final summary: Criterion A is not met because only mathematics and reading were assessed.
    • G

      Graduation Tracking

      • Summary: Participants were followed through endline in 2016, which does not track the cohort to graduation.
      • Relevant quotes:
        1) "Endline data collection occurred in November–December 2016, at the end of the project." (p. 216)
        2) "The analysis is limited to girls in grade seven or below (i.e., in primary school) at baseline for clarity of interpretation." (p. 217)
      • Detailed analysis: Criterion G requires tracking participants until graduation. The study’s endline data collection occurred at the end of the project in late 2016, and the analyzed cohort includes girls in grade seven or below at baseline, so many participants would not have reached graduation by the endline window. The paper provides no evidence of follow-up until graduation for the full cohort.
      • Final summary: Criterion G is not met because tracking ends at project endline rather than at participant graduation.
    • P

      Pre-Registered

      • Summary: The study reports AEA RCT Registry registration, but the timing of the registry entry could not be verified from accessible sources for this check.
      • Relevant quotes:
        1) "This trial was registered with the American Economic Association’s registry for randomized controlled trials." (p. 217)
        2) "The registry record is available at https://www.socialscienceregistry.org/trials/7963." (p. 217)
        3) "Baseline data collection occurred before implementation began in February 2014." (p. 216)
      • Detailed analysis: Criterion P requires that the protocol is pre-registered before the study begins. The paper states that the trial was registered and provides a registry URL, and it reports that baseline data collection occurred before implementation began in February 2014. However, the AEA registry entry itself was not accessible during this check, so the registration date relative to February 2014 could not be confirmed.
      • Final summary: Criterion P is not met because pre-registration timing could not be verified from accessible sources for this check.
