Abstract
Using embedded paraprofessionals to provide personalized instruction is a promising model for differentiating instruction within the classroom. This study examines two randomized controlled trials of paraprofessional-led tutoring in early-grade math and literacy. However, intent-to-treat (ITT) analyses revealed no overall achievement impacts for either program. We then explore two mechanisms that have surfaced in the tutoring literature as central efficacy moderators—dosage and tailoring—as plausible explanations to these results. While dosage was low for both programs, we estimate significant benefits from treatment assignment at higher-dosage campuses in numeracy (i.e., up to 0.28 SD at 80% progression) but no effect at any level of observed dosage on literacy. Curricular analysis revealed the literacy program's rigid structure may have impeded adaptation to student proficiency while student skill did not predict differences in numeracy program impacts. Supplemented by tutor survey data, these findings suggest that successful implementation of para-tutoring may depend on role prioritization, instructional coordination, and the use of student data to provide responsive instruction.
ERCT Criteria Breakdown
-
Level 1 Criteria
-
C
Class-level RCT
- Student-level randomization is acceptable here because the intervention is tutoring.
- Within each classroom and baseline reading-level stratum, researchers randomly assigned 103 students to treatment and 182 students to the control.
Relevant Quotes:
1) "This study examines two randomized controlled trials of paraprofessional-led tutoring in early-grade math and literacy." (p. 1)
2) "Within each classroom and baseline reading-level stratum, researchers randomly assigned 103 students to treatment and 182 students to the control." (p. 7)
3) "Within each classroom, researchers randomly assigned 384 students to the treatment group and 849 students to the control group." (p. 7)
Detailed Analysis:
Criterion C requires randomization at the class level to avoid contamination, but the ERCT standard explicitly allows student-level randomization when the intervention is personal tutoring.
This paper evaluates paraprofessional-led tutoring. The randomization procedures show eligible students were randomized within classrooms for both the literacy and numeracy studies. Because the treatment is tutoring, the class-level unit is not required under the tutoring exception.
Criterion C is met because the intervention is tutoring, so student-level randomization within classrooms is allowed under the ERCT exception.
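The within-classroom, within-stratum assignment described in the quotes above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' procedure: the function name `assign_within_strata`, the record keys, the `treat_share` parameter, and the fixed seed are all hypothetical, and the paper does not report its exact allocation rule.

```python
import random
from collections import defaultdict


def assign_within_strata(students, treat_share=0.5, seed=0):
    """Randomly assign students to arms inside each (classroom, stratum) cell.

    `students` is a list of dicts with hypothetical keys 'id', 'classroom'
    and 'stratum' (for example a baseline reading level).
    Returns a dict mapping student id to 'treatment' or 'control'.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible

    # Group student ids by the cell in which randomization happens.
    cells = defaultdict(list)
    for s in students:
        cells[(s["classroom"], s["stratum"])].append(s["id"])

    assignment = {}
    for key in sorted(cells):
        ids = sorted(cells[key])
        rng.shuffle(ids)
        # Assumed rule: a fixed share of each cell goes to treatment.
        n_treat = round(len(ids) * treat_share)
        for position, sid in enumerate(ids):
            assignment[sid] = "treatment" if position < n_treat else "control"
    return assignment


# Toy roster: 2 classrooms x 2 strata, 4 students per cell.
roster = [
    {"id": i, "classroom": f"room{i % 2}", "stratum": "low" if i % 4 < 2 else "mid"}
    for i in range(16)
]
result = assign_within_strata(roster, treat_share=0.5)
```

Because assignment happens inside each classroom, treated and control students share the same teacher and room, which is why the tutoring exception matters for Criterion C.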
-
E
Exam-based Assessment
- Outcomes were measured with standardized assessments (DIBELS-8 and i-Ready).
- We preregistered end-of-year test scores as the main outcome for each study. For literacy this is a Dynamic Indicators of Basic Early Literacy Skills (DIBELS-8) summative score, and for numeracy it is an i-Ready Math Diagnostic (i-Ready) summative score.
Relevant Quotes:
1) "We preregistered end-of-year test scores as the main outcome for each study. For literacy this is a Dynamic Indicators of Basic Early Literacy Skills (DIBELS-8) summative score, and for numeracy it is an i-Ready Math Diagnostic (i-Ready) summative score." (p. 6)
Detailed Analysis:
Criterion E requires outcomes to be measured with standardized, widely used exam-based assessments rather than researcher-created tests.
The paper specifies DIBELS-8 for literacy and the i-Ready Math Diagnostic for numeracy as the main end-of-year outcomes. Both are established standardized assessments used in U.S. schools.
Criterion E is met because the primary outcomes are standardized assessments (DIBELS-8 and i-Ready), not bespoke researcher-made tests.
-
T
Term Duration
- Outcomes were measured at the end of the year, well over one term after the intervention started.
- We preregistered end-of-year test scores as the main outcome for each study. For literacy this is a Dynamic Indicators of Basic Early Literacy Skills (DIBELS-8) summative score, and for numeracy it is an i-Ready Math Diagnostic (i-Ready) summative score.
Relevant Quotes:
1) "Due to the pilot nature of this program, implementation did not begin until November, three months into the school year." (p. 5)
2) "We preregistered end-of-year test scores as the main outcome for each study. For literacy this is a Dynamic Indicators of Basic Early Literacy Skills (DIBELS-8) summative score, and for numeracy it is an i-Ready Math Diagnostic (i-Ready) summative score." (p. 6)
Detailed Analysis:
Criterion T requires that the primary outcome is measured at least one full academic term after the intervention begins (roughly 3-4 months).
For the literacy program, implementation began in November (three months into the school year). The primary outcome is defined as end-of-year test scores, which necessarily occur at the end of the academic year. This implies a gap of multiple months between start and outcome measurement, exceeding one term.
The numeracy program is described as beginning in fall 2023 with the same end-of-year outcome structure, which also implies at least one term of tracking from the start of implementation to the primary outcome.
Criterion T is met because outcomes are end-of-year measures taken many months after implementation begins (well beyond a single term).
-
D
Documented Control Group
- The control group is described as business-as-usual and baseline characteristics are reported.
- Control students continued to receive all supportive services and interventions they usually would during their “business-as-usual” (BaU) course of schooling.
Relevant Quotes:
1) "Control students continued to receive all supportive services and interventions they usually would during their “business-as-usual” (BaU) course of schooling." (p. 7)
2) "Female 0.500 0.475 0.0329 0.498 0.500 0.0113" (p. 25)
3) "Black 0.683 0.646 0.0157 0.618 0.590 -0.0205" (p. 25)
Detailed Analysis:
Criterion D requires a clearly described control condition plus sufficient baseline documentation (for example demographics and baseline scores) to support comparability.
The paper explicitly defines the control condition as business-as-usual. It also provides baseline characteristic tables by treatment assignment (Table A2), reporting demographic composition (for example gender and race) and other baseline indicators for treatment and control groups.
Criterion D is met because the control condition is described as business-as-usual and the paper reports baseline characteristics for treatment and control groups.
-
Level 2 Criteria
-
S
School-level RCT
- Randomization occurred within classrooms rather than at the school level.
- The experimental design for both studies stratified randomization within classrooms to assign eligible-students into either the treatment (i.e., paraprofessional-led tutoring) or control condition.
Relevant Quotes:
1) "The experimental design for both studies stratified randomization within classrooms to assign eligible-students into either the treatment (i.e., paraprofessional-led tutoring) or control condition." (p. 7)
2) "Within each classroom, researchers randomly assigned 384 students to the treatment group and 849 students to the control group." (p. 7)
Detailed Analysis:
Criterion S requires school-level randomization (whole schools assigned to treatment or control).
The paper states that randomization was stratified within classrooms and gives examples of within-classroom student assignment. This is not school-level randomization.
Criterion S is not met because assignment to treatment and control occurred within classrooms rather than at the school level.
-
I
Independent Conduct
- The paper evaluates externally developed, district-implemented programs rather than a researcher-designed intervention.
- Both interventions directed paraprofessionals' time towards receiving training and personalizing instruction to below-grade level students using externally-developed, highly structured programs.
Relevant Quotes:
1) "Both interventions directed paraprofessionals' time towards receiving training and personalizing instruction to below-grade level students using externally-developed, highly structured programs." (p. 5)
2) "The para-tutors received weekly synchronous coaching from the program developer, using tutoring session recordings and dashboards to monitor student progress." (p. 5)
Detailed Analysis:
Criterion I requires that the evaluation is conducted independently from the intervention designers.
The interventions are described as using externally developed programs, and the literacy program includes coaching from a program developer. This indicates that the entity that designed and supported the programs is separate from the research team. The paper positions the authors as evaluating district-implemented programs rather than as the intervention developers.
The paper does not provide a formal independence statement (for example, an explicit declaration of no role in program design), but the described separation between district/program developer and the evaluation supports treating the evaluation as independent under the ERCT criterion.
Criterion I is met because the interventions are externally developed and the paper presents an evaluation of district-implemented programs rather than a designer-led evaluation.
-
Y
Year Duration
- Implementation began in November, so the study does not span a full academic year from the start of the year.
- Due to the pilot nature of this program, implementation did not begin until November, three months into the school year.
Relevant Quotes:
1) "Due to the pilot nature of this program, implementation did not begin until November, three months into the school year." (p. 5)
Detailed Analysis:
Criterion Y requires that outcomes are measured at least one full academic year after the intervention begins.
The literacy program did not begin until November, which the paper describes as three months into the school year. This indicates the intervention was not implemented for a full academic-year cycle from the beginning of the year.
The numeracy study is also described as a school-year intervention with a fall start and end-of-year outcomes, but the paper does not document a full-year implementation window from the start of the academic year through its end.
Criterion Y is not met because implementation began in November, so the study does not document a full academic year of tracking from the start of the school year.
-
B
Balanced Control Group
- Extra tutoring time is the treatment being tested, so a business-as-usual control is acceptable.
- Control students continued to receive all supportive services and interventions they usually would during their “business-as-usual” (BaU) course of schooling.
Relevant Quotes:
1) "students were expected to receive 15-minute in-class, one-on-one early literacy sessions daily." (p. 5)
2) "paraprofessional instruction was to occur in 20-minute increments to small groups three times a week." (p. 6)
3) "Control students continued to receive all supportive services and interventions they usually would during their “business-as-usual” (BaU) course of schooling." (p. 7)
Detailed Analysis:
Criterion B asks whether the intervention and control conditions are balanced in time and resources, unless the additional resources are explicitly the treatment variable.
Both interventions add instructional resources: daily one-on-one tutoring in literacy and small-group tutoring sessions in numeracy. The control condition is business-as-usual with no matched additional instructional time.
Here, the additional tutoring time and staffing model are the core components being evaluated (two para-tutoring programs). Under the ERCT exception, when extra instructional time is integral to the treatment being tested, a business-as-usual control is acceptable as long as the study intent is to test the effects of providing that additional tutoring resource.
Criterion B is met because the added tutoring time is integral to the treatment being tested, so a business-as-usual control is acceptable under the ERCT rule.
-
Level 3 Criteria
-
R
Reproduced
- No independent replication of this paper's para-tutoring implementations and findings was found.
- In fall 2023, the district introduced a para-led numeracy intervention adapted from an established model and curriculum. Small-scale pilot evaluations of the program had found positive impacts in other districts (Clarke et al., 2016, 2020)
Relevant Quotes:
1) "In fall 2023, the district introduced a para-led numeracy intervention adapted from an established model and curriculum. Small-scale pilot evaluations of the program had found positive impacts in other districts (Clarke et al., 2016, 2020)" (p. 6)
2) "A total of 29 classrooms were randomly assigned to treatment (ROOTS) or control (standard district practices) conditions." (Clarke et al., 2016, PDF p. 2)
3) "The purpose of this study was to conduct a replication study of a kindergarten mathematics intervention, ROOTS, delivered within the context of a research base core program." (Clarke et al., 2022, PDF p. 2)
Detailed Analysis:
Criterion R requires an independent replication of the study's intervention and findings in a different context by a different research team, ideally in peer-reviewed outlets.
The numeracy intervention in this paper is described as adapted from an established program with prior evaluations in other districts. External peer-reviewed studies of the ROOTS kindergarten math intervention exist, including an RCT with positive effects and later replication work that found null effects in a different context.
However, those studies evaluate ROOTS as a math intervention, not the specific delivery model studied here (embedded paraprofessionals in a large urban district with implementation changes such as reduced coaching), and they do not replicate the combined two-intervention (literacy plus numeracy) design of this paper. The literacy program is explicitly described as a first-ever implementation, and no independent replication of this paper's results was located.
Criterion R is not met because no independent study was found that replicates this paper's para-tutoring implementations and findings.
-
A
All-subject Exams
- Only literacy and math outcomes are measured, not all core subjects.
- We preregistered end-of-year test scores as the main outcome for each study. For literacy this is a Dynamic Indicators of Basic Early Literacy Skills (DIBELS-8) summative score, and for numeracy it is an i-Ready Math Diagnostic (i-Ready) summative score.
Relevant Quotes:
1) "We preregistered end-of-year test scores as the main outcome for each study. For literacy this is a Dynamic Indicators of Basic Early Literacy Skills (DIBELS-8) summative score, and for numeracy it is an i-Ready Math Diagnostic (i-Ready) summative score." (p. 6)
Detailed Analysis:
Criterion A requires assessing outcomes across all main core subjects using standardized exams (and it cannot be met if criterion E is not met).
The paper defines the preregistered primary outcomes as a literacy assessment (DIBELS-8) and a math assessment (i-Ready). It does not describe standardized end-of-year outcomes in other core subjects (for example science or social studies), nor does it report cross-subject spillovers for each cohort.
Criterion A is not met because the study reports only literacy and math outcomes rather than standardized exams across all main subjects.
-
G
Graduation Tracking
- The study reports end-of-year outcomes only and does not track students through graduation.
Relevant Quotes:
1) "We preregistered end-of-year test scores as the main outcome for each study. For literacy this is a Dynamic Indicators of Basic Early Literacy Skills (DIBELS-8) summative score, and for numeracy it is an i-Ready Math Diagnostic (i-Ready) summative score." (p. 6)
Detailed Analysis:
Criterion G requires tracking participants until graduation from the relevant educational stage. Under the ERCT rules, if criterion Y (Year Duration) is not met, criterion G cannot be met either.
The paper defines its primary outcomes as end-of-year test scores and does not describe any multi-year follow-up through later grade completion or graduation. Given that the intervention does not meet the ERCT Year Duration requirement, graduation tracking is automatically not satisfied for this study.
A web search for follow-up publications by the same authors reporting longer-run tracking for these cohorts did not identify any graduation-tracking results associated with this study.
Criterion G is not met because the study measures only end-of-year outcomes and does not track students through graduation (and Y is not met).
-
P
Pre-Registered
- The paper claims preregistration but provides no registry link, ID, or date that can be verified.
- We preregistered end-of-year test scores as the main outcome for each study.
Relevant Quotes:
1) "We preregistered end-of-year test scores as the main outcome for each study." (p. 6)
Detailed Analysis:
Criterion P requires a publicly verifiable pre-registered protocol with an identifier/link and a registration date that precedes data collection.
The paper states that outcomes were preregistered but does not provide a registry name, URL, registration identifier, or registration date.
A targeted web search did not locate a public preregistration entry for this study that can be verified against the study timeline (for example on AEA RCT Registry or OSF Registries). Without a verifiable record and timing, the criterion cannot be marked as met.
Criterion P is not met because no verifiable preregistration record (link, ID, and date prior to study start) is provided or could be confirmed.
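The verdicts above, together with the two dependency rules stated in this breakdown (A cannot be met without E, and G cannot be met without Y), can be collected into a small consistency check. This is a sketch only: the data layout and the helper names `dependency_violations` and `met_by_level` are hypothetical, and it does not attempt to compute an overall ERCT level, since that rule is not stated here.

```python
# Verdicts transcribed from the breakdown above (True = met), grouped by level.
VERDICTS = {
    1: {"C": True, "E": True, "T": True, "D": True},
    2: {"S": False, "I": True, "Y": False, "B": True},
    3: {"R": False, "A": False, "G": False, "P": False},
}

# Dependencies stated in the breakdown: a criterion (key) cannot be met
# unless its prerequisite (value) is also met.
PREREQUISITES = {"A": "E", "G": "Y"}


def dependency_violations(verdicts, prerequisites=PREREQUISITES):
    """Return criteria marked as met whose prerequisite is not met."""
    flat = {name: met for level in verdicts.values() for name, met in level.items()}
    return [
        name
        for name, required in prerequisites.items()
        if flat[name] and not flat[required]
    ]


def met_by_level(verdicts):
    """Return, for each level, the list of criteria marked as met."""
    return {
        level: [name for name, met in criteria.items() if met]
        for level, criteria in verdicts.items()
    }


violations = dependency_violations(VERDICTS)
summary = met_by_level(VERDICTS)
print(violations)  # []
print(summary)     # {1: ['C', 'E', 'T', 'D'], 2: ['I', 'B'], 3: []}
```

An empty violation list confirms that the verdicts respect both stated dependencies: G is marked not met alongside Y, and A is not met even though E is.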