Can teaching be taught? Improving teachers' pedagogical skills at scale in rural Peru

Juan F. Castro, Paul Glewwe, Alexandra Heredia-Mayo, Stephanie Majerowicz, Ricardo Montero

DOI: 10.3982/QE2079
  • mathematics
  • reading
  • K12
  • Latam
ERCT Level: 2
  • C

    Randomization was conducted at the school level, satisfying the class-level requirement.

    "We evaluate this teacher coaching program, exploiting random assignment of that program's expansion to 3797 rural schools in 2016."

  • E

    The study uses the ECE, Peru's national standardized assessment for primary schools.

    "The measure of student learning outcomes is the National Student Evaluation (henceforth, ECE, its Spanish acronym) primary school exam that assesses students' mathematics and reading comprehension skills."

  • T

    Outcomes were measured approximately 9 months after the intervention began, exceeding the one-term requirement.

    "APM schools began the program in February of 2016... We look at effects on students' 2016 and 2018 test scores..."

  • D

    The control group is clearly defined as the non-participating schools, and its baseline characteristics are extensively documented in Table 2.

    "The other 2421 schools, the control group, which we call non-APM schools, did not participate in any coaching program in 2016, 2017, and 2018."

  • S

    Randomization occurred at the school level across 6,218 schools.

    "Of the 6218 eligible schools... 3797 were randomly assigned to the treatment group... The other 2421 schools, the control group..."

  • I

    The evaluation was conducted by independent academics, separate from the Ministry of Education that designed the intervention.

    "Juan F. Castro: Department of Economics, Universidad del Pacifico... Paul Glewwe: Department of Applied Economics, University of Minnesota..."

  • Y

    The study measured outcomes after 1 and 3 years of program implementation.

    "We look at effects on students' 2016 and 2018 test scores, 1 year and 3 years after the program started."

  • B

    The intervention explicitly tests the impact of adding significant resources (coaching), making the resource imbalance integral to the study design.

    "The coaching program is a substantial investment by Peru's government, costing over US$ 130 million per year."

  • R

    The study has not been independently reproduced.

    "To our knowledge, no prior study has evaluated the effects on pedagogy and student learning of a large-scale teacher coaching program in a developing country."

  • A

    Assessments were limited to mathematics and reading, omitting other core subjects like science.

    "The measure of student learning outcomes is the National Student Evaluation (henceforth, ECE, its Spanish acronym) primary school exam that assesses students' mathematics and reading comprehension skills."

  • G

    Tracking ended at Grade 4, prior to primary school graduation.

    "We look at effects on students' 2016 and 2018 test scores... This implies that, for our cohort of students, we have test score data at the student level first in 2016 when they were in second grade, and again in 2018 in fourth grade."

  • P

    No pre-registration of the study protocol is mentioned.

Abstract

We evaluate the impact of a large-scale teacher coaching program in Peru, a context with high teacher turnover, on teachers' pedagogical skills and student learning. Previous studies find that small-scale coaching programs can improve teaching of reading and science in developing countries. However, scaling up can reduce programs' effectiveness, and teacher turnover can erode compliance and cause spillovers onto non-program schools. We develop a framework that defines different treatment effects when teacher turnover is present, and explains which effects can be estimated. We evaluate this teacher coaching program, exploiting random assignment of that program's expansion to 3797 rural schools in 2016. After two years, teachers assigned to the program increased their aggregate pedagogical skills by 0.20 standard deviations. The program also increased student learning; after 1 year, Grade 2 students' mathematics and reading scores increased by 0.106 and 0.075 standard deviations (of the distributions of those test scores), respectively. After three years, the cumulative effect increases slightly, to 0.114 and 0.100, respectively. One reason why these impacts are low is that some uncoached teachers moved into treated schools in years 2 and 3. Following our framework, we estimate that the impacts on students of having a "fully" coached teacher for all three years are 0.18 and 0.16 standard deviations for mathematics and reading comprehension, respectively.
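The effect sizes quoted in the abstract can be put side by side with a few lines of arithmetic. The sketch below is only a descriptive comparison: it divides the reported cumulative three-year effects by the reported effects of having a "fully" coached teacher for all three years. The ratio is not a quantity the paper reports, and the variable and function names are hypothetical.

```python
# Effect sizes in standard deviations, transcribed from the abstract.
cumulative_three_year = {"mathematics": 0.114, "reading": 0.100}
fully_coached_three_year = {"mathematics": 0.18, "reading": 0.16}


def effect_ratio(subject):
    """Reported cumulative three-year effect divided by the fully coached effect.

    Assumption: this ratio is purely illustrative and is not an estimate
    taken from the paper's framework.
    """
    return cumulative_three_year[subject] / fully_coached_three_year[subject]


for subject in cumulative_three_year:
    print(f"{subject}: ratio = {effect_ratio(subject):.3f}")
```

For both subjects the reported cumulative effect is a little under two thirds of the fully coached effect, which is consistent with the abstract's remark that uncoached teachers moving into treated schools lowered the measured impacts.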

ERCT Criteria Breakdown

  • Level 1 Criteria

    • C

      Class-level RCT

      • Randomization was conducted at the school level, satisfying the class-level requirement.
      • "We evaluate this teacher coaching program, exploiting random assignment of that program's expansion to 3797 rural schools in 2016."
      • Relevant Quotes:
        1) "We evaluate this teacher coaching program, exploiting random assignment of that program's expansion to 3797 rural schools in 2016." (p. 1)
        2) "Of the 6218 eligible schools... 3797 were randomly assigned to the treatment group... The other 2421 schools... control group..." (p. 9)
      • Detailed Analysis: The study explicitly states that the unit of randomization was the school. According to the ERCT standard, a school-level RCT is considered stronger than a class-level RCT and automatically satisfies this criterion.
      • Conclusion: The criterion is met because the randomization was conducted at the school level, which satisfies the requirement for class-level (or stronger) randomization.
    • E

      Exam-based Assessment

      • The study uses the ECE, Peru's national standardized assessment for primary schools.
      • "The measure of student learning outcomes is the National Student Evaluation (henceforth, ECE, its Spanish acronym) primary school exam that assesses students' mathematics and reading comprehension skills."
      • Relevant Quotes:
        1) "The measure of student learning outcomes is the National Student Evaluation (henceforth, ECE, its Spanish acronym) primary school exam that assesses students' mathematics and reading comprehension skills." (p. 11)
        2) "It has been implemented annually since 2007 and is comparable across years. All schools with five or more students in the tested grade take the exam..." (p. 11)
      • Detailed Analysis: The study uses the National Student Evaluation (ECE), which is the official standardized national exam in Peru. It is widely recognized, implemented annually, and comparable across years, fitting the definition of a standard exam-based assessment rather than a custom researcher-designed test.
      • Conclusion: The criterion is met because the study uses the National Student Evaluation (ECE), a recognized standardized national exam, to measure student outcomes.
    • T

      Term Duration

      • Outcomes were measured approximately 9 months after the intervention began, exceeding the one-term requirement.
      • "APM schools began the program in February of 2016... We look at effects on students' 2016 and 2018 test scores..."
      • Relevant Quotes:
        1) "APM schools began the program in February of 2016... We look at effects on students' 2016 and 2018 test scores..." (p. 11)
        2) "The school year begins in March, and the standardized tests are taken in November." (p. 11)
      • Detailed Analysis: The intervention started in February 2016, and the first outcomes (test scores) were measured in November 2016. This duration is approximately 9 months, covering a full academic year, which significantly exceeds the one-term (3-4 months) minimum requirement.
      • Conclusion: The criterion is met because the interval between the intervention start and the first measurement is approximately 9 months, exceeding the one-term requirement.
    • D

      Documented Control Group

      • The control group is clearly defined as the non-participating schools, and its baseline characteristics are extensively documented in Table 2.
      • "The other 2421 schools, the control group, which we call non-APM schools, did not participate in any coaching program in 2016, 2017, and 2018."
      • Relevant Quotes:
        1) "The other 2421 schools, the control group, which we call non-APM schools, did not participate in any coaching program in 2016, 2017, and 2018." (p. 9)
        2) "Table 2. Balance table for experimental sample... Control Mean/(SD)... Math score... Poverty rates... % teachers with degree..." (p. 27-28)
      • Detailed Analysis: The paper explicitly identifies the control group ("non-APM schools") and confirms they received no coaching. Furthermore, Table 2 provides detailed documentation of the control group's baseline characteristics, including sample size, mean test scores, demographics, and infrastructure metrics, facilitating comparison.
      • Conclusion: The criterion is met because the study provides a detailed description of the control group's baseline characteristics and explicitly states they did not participate in the program.
  • Level 2 Criteria

    • S

      School-level RCT

      • Randomization occurred at the school level across 6,218 schools.
      • "Of the 6218 eligible schools... 3797 were randomly assigned to the treatment group... The other 2421 schools, the control group..."
      • Relevant Quotes:
        1) "We evaluate this teacher coaching program, exploiting random assignment of that program's expansion to 3797 rural schools in 2016." (p. 1)
        2) "Of the 6218 eligible schools... 3797 were randomly assigned to the treatment group... The other 2421 schools... control group..." (p. 9)
      • Detailed Analysis: The study clearly states that the randomization unit was the school (3,797 treated vs 2,421 control). This fulfills the requirement for a school-level RCT, where the intervention is implemented and assigned at the level of the educational institution.
      • Conclusion: The criterion is met because the random assignment was conducted at the school level involving over 6,000 schools.
    • Y

      Year Duration

      • The study measured outcomes after 1 and 3 years of program implementation.
      • "We look at effects on students' 2016 and 2018 test scores, 1 year and 3 years after the program started."
      • Relevant Quotes:
        1) "We look at effects on students' 2016 and 2018 test scores, 1 year and 3 years after the program started." (p. 11)
        2) "APM schools began the program in early 2016 and operated it for 3 consecutive years." (p. 11)
      • Detailed Analysis: The intervention was a multi-year program starting in early 2016. Outcomes were measured in late 2016 (1 year) and late 2018 (3 years). This duration meets and exceeds the requirement of one full academic year.
      • Conclusion: The criterion is met because the study tracks outcomes over a period of up to three years, satisfying the one-year minimum.
    • B

      Balanced Control Group

      • The intervention explicitly tests the impact of adding significant resources (coaching), making the resource imbalance integral to the study design.
      • "The coaching program is a substantial investment by Peru's government, costing over US$ 130 million per year."
      • Relevant Quotes:
        1) "The coaching program is a substantial investment by Peru's government, costing over US$ 130 million per year." (p. 7)
        2) "This version of the program alone... cost the government about US$ 40 million in 2016... This implies an annual cost of US$ 228 per student..." (p. 8)
        3) "We evaluate the impact of a large-scale teacher coaching program... The other 2421 schools, the control group... did not participate in any coaching program..." (p. 1, p. 9)
      • Detailed Analysis: The intervention explicitly tests the impact of providing additional resources (a costly coaching program involving hired coaches, travel, and training) compared to the status quo. The extra resources *are* the treatment variable being tested ("impact of a large-scale teacher coaching program"). According to the ERCT standard decision tree, if resources are the treatment variable, the control group may remain "business as usual" without violating the balance criterion.
      • Conclusion: The criterion is met because the study explicitly evaluates the impact of the additional resources provided by the coaching program as the primary treatment variable, justifying the business-as-usual control group.
    • I

      Independent Conduct

      • The evaluation was conducted by independent academics, separate from the Ministry of Education that designed the intervention.
      • "Juan F. Castro: Department of Economics, Universidad del Pacifico... Paul Glewwe: Department of Applied Economics, University of Minnesota..."
      • Relevant Quotes:
        1) "Juan F. Castro: Department of Economics, Universidad del Pacifico... Paul Glewwe: Department of Applied Economics, University of Minnesota..." (p. 1)
        2) "The randomized evaluation was planned by Peru's Ministry of Education. The student assessments and teacher observation instrument used in this study were designed by the Ministry of Education..." (p. 1)
        3) "We used anonymized data provided by the Ministry of Education." (p. 1)
      • Detailed Analysis: The authors are academic researchers from various universities (Pacifico, Minnesota, Andes) and are independent of the Ministry of Education, which designed the intervention. Although the Ministry planned the randomization and collected the data (as is common with administrative data), the authors conducted the study (the analysis and evaluation presented in the paper) independently of the intervention designers. This separation satisfies the requirement for independent conduct.
      • Conclusion: The criterion is met because the authors are independent academic researchers who evaluated the program designed and implemented by the Ministry of Education.
  • Level 3 Criteria

    • A

      All-subject Exams

      • Assessments were limited to mathematics and reading, omitting other core subjects like science.
      • "The measure of student learning outcomes is the National Student Evaluation (henceforth, ECE, its Spanish acronym) primary school exam that assesses students' mathematics and reading comprehension skills."
      • Relevant Quotes:
        1) "The measure of student learning outcomes is the National Student Evaluation (henceforth, ECE, its Spanish acronym) primary school exam that assesses students' mathematics and reading comprehension skills." (p. 11)
        2) "ECE scores are reported both as levels of subject mastery and as a Rasch score..." (p. 11)
      • Detailed Analysis: The study assesses outcomes using the ECE, which covers Mathematics and Reading Comprehension. While these are core subjects, primary education typically includes other main subjects like Science or Social Studies (often "Ciencia y Ambiente" or "Personal Social" in Peru). The criterion requires measuring impact on *all* main subjects to detect potential negative spillovers. Since only Math and Reading were assessed, this criterion is not met.
      • Conclusion: The criterion is not met because the study only measured outcomes for mathematics and reading, excluding other potential main subjects like science or social studies.
    • G

      Graduation Tracking

      • Tracking ended at Grade 4, prior to primary school graduation.
      • "We look at effects on students' 2016 and 2018 test scores... This implies that, for our cohort of students, we have test score data at the student level first in 2016 when they were in second grade, and again in 2018 in fourth grade."
      • Relevant Quotes:
        1) "We look at effects on students' 2016 and 2018 test scores, 1 year and 3 years after the program started." (p. 11)
        2) "Initially, the ECE tested students at the end of the second grade... but, starting in 2018, it was shifted to fourth grade." (p. 11)
      • Detailed Analysis: The study tracks students from Grade 2 (2016) to Grade 4 (2018). Primary education in Peru typically lasts through Grade 6. The tracking stopped at Grade 4, meaning the students were not tracked until graduation from primary school.
      • Conclusion: The criterion is not met because the study tracked students only until Grade 4, rather than through graduation from primary school (Grade 6).
    • R

      Reproduced

      • The study has not been independently reproduced.
      • "To our knowledge, no prior study has evaluated the effects on pedagogy and student learning of a large-scale teacher coaching program in a developing country."
      • Relevant Quotes:
        1) "To our knowledge, no prior study has evaluated the effects on pedagogy and student learning of a large-scale teacher coaching program in a developing country." (p. 5)
      • Detailed Analysis: The authors explicitly state that no prior study has evaluated a program of this scale in this context. There is no mention of an independent team replicating this specific large-scale Peruvian APM evaluation in a peer-reviewed journal. While general coaching literature exists, this specific intervention's results have not been reproduced by an independent team.
      • Conclusion: The criterion is not met because there is no evidence of an independent replication of this specific large-scale intervention study.
    • P

      Pre-Registered

      • No pre-registration of the study protocol is mentioned.
      • Relevant Quotes:
        1) "The randomized evaluation was planned by Peru's Ministry of Education." (p. 1)
        2) "The replication package for this paper is at: https://doi.org/10.5281/zenodo.13738582." (p. 1)
      • Detailed Analysis: There is no mention in the paper of a pre-registration ID (e.g., AEA RCT Registry, ClinicalTrials.gov) or a pre-analysis plan filed before data collection began in 2016. While the evaluation was "planned" by the Ministry, the criterion requires a formal, public pre-registration of the protocol.
      • Conclusion: The criterion is not met because the paper does not cite a public pre-registration of the study protocol or analysis plan established prior to data collection.
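The verdicts in the breakdown can be combined into an overall level and the quoted figures re-checked with a short script. This is a minimal sketch, not the official ERCT scoring procedure: it assumes that levels are cumulative (a level counts only if every criterion at that level and at all lower levels is met), and the names `VERDICTS`, `LEVELS`, `erct_level`, and `months_elapsed` are hypothetical.

```python
# Verdicts transcribed from the criteria breakdown (True = criterion met).
VERDICTS = {
    "C": True, "E": True, "T": True, "D": True,      # Level 1
    "S": True, "Y": True, "B": True, "I": True,      # Level 2
    "A": False, "G": False, "R": False, "P": False,  # Level 3
}

# Criteria grouped by level, in the order the breakdown lists them.
LEVELS = {1: "CETD", 2: "SYBI", 3: "AGRP"}


def erct_level(verdicts):
    """Highest level reached, assuming levels are cumulative (an assumption)."""
    level = 0
    for n in sorted(LEVELS):
        if all(verdicts[c] for c in LEVELS[n]):
            level = n
        else:
            break  # a failed level blocks every higher level
    return level


def months_elapsed(start_year, start_month, end_year, end_month):
    """Whole calendar months between two (year, month) pairs."""
    return (end_year - start_year) * 12 + (end_month - start_month)


# Figures quoted in the breakdown.
treated_schools, control_schools, eligible_schools = 3797, 2421, 6218

print("ERCT level:", erct_level(VERDICTS))
print("Months from program start to first test:", months_elapsed(2016, 2, 2016, 11))
print("School counts add up:", treated_schools + control_schools == eligible_schools)
```

Under these assumptions the script reports Level 2, a nine-month gap between the February 2016 program start and the November 2016 exam, and school counts that sum to the 6,218 eligible schools.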
