Investigating the additive effects of opportunities to spell words on word reading for students with significant reading difficulties

Nathan H. Clemens; Sharon Vaughn; Alexis N. Boucher; Marcia A. Barnes; Gregory Roberts; Megan Osbon; Anna-Mari Fall; J. E. Miller; Nancy Scammacca

Published: Feb 9, 2026

ERCT Check Date: Apr 14, 2026

DOI: 10.1007/s11145-025-10729-6

reading
language arts
K12
US

C

Students were randomized to tutoring-style small groups and then those groups were randomized to conditions, which satisfies the ERCT tutoring/personal-teaching exception to class-level randomization.

"Within grade levels, students were randomly assigned to intervention groups of 2–4 students based on the times provided by the school for intervention and the availability of tutors."
E

The outcomes include standardized, commercially published reading assessments (TOWRE-2, TOSWRF-2, TOSREC, DIBELS ORF), not only researcher-developed tests.

"The SWE subtest from the TOWRE-2 is test of word reading efficiency."
T

The intervention lasted 10 weeks and outcomes were measured at posttest; this does not reach a full academic term and no term-later follow-up is reported.

"Students in both conditions received 10 weeks of word reading intervention."
D

The comparison condition (Decoding) and the treatment condition (Decoding+Spelling) are clearly described, and baseline demographics and baseline equivalence are reported in tables.

"This study compared two intervention conditions on students’ word reading skills, which we referred to as Decoding and Decoding+Spelling."
S

Randomization was not at the school level; students were grouped and then groups were randomized to conditions across multiple schools.

"Within grade levels, students were randomly assigned to intervention groups of 2–4 students..."
I

The paper shows the authors led tutor training and project leads monitored adherence, but it does not document an independent third-party evaluation team.

"Training involved modeling both conditions by the first and second authors..."
Y

The intervention and measurement window was 10 weeks, far short of 75% of an academic year; additionally, ERCT rules imply Y cannot be met when T is not met.

"Students in both conditions received 10 weeks of word reading intervention."
B

The Decoding+Spelling condition intentionally added time, and the paper explicitly frames this added time as integral to the tested contrast (adding spelling as implemented in practice), so the resource imbalance is by design.

"We decided on option b... allow the spelling component to naturally add time to the intervention and evaluate the effects of decoding instruction and practice when spelling was added."
R

No independent replication of this specific RCT was found in the paper, and an internet search did not identify an external replication study as of the ERCT check date.
A

Outcomes focus on literacy-related skills only and do not include standardized assessments across all core school subjects.

"The SWE subtest from the TOWRE-2 is test of word reading efficiency."
G

The study reports only pretest and posttest around a 10-week intervention and provides no evidence of tracking outcomes through graduation; additionally, ERCT rules imply G cannot be met when Y is not met.

"Of the 70 students who met the inclusion criteria, 68 students received pretests, treatment, and posttests..."
P

The paper provides no registry link/identifier or dated statement showing the protocol was pre-registered before data collection.

Abstract

Integrating spelling with word reading instruction is commonly recommended. However, few studies have determined the unique effect of spelling activities on word reading skills for students with reading difficulties. Considering the limited time available for intervention and the fact that several words can be read in the time it takes to spell one, additional research is needed to optimize intervention time. We randomly assigned 68 students with significant word-reading difficulties in Grades 2 through 4 to one of two intervention conditions in which decoding instruction and practice were identical, but in one condition, students also practiced spelling the words (with feedback). While controlling for pretest scores and clustering, group comparisons at post-test indicated statistically significant differences favoring the Decoding+Spelling condition on silent word recognition (g=0.36) and handwriting fluency (g=0.70). Non-statistically significant effects favored the Decoding+Spelling condition on word reading efficiency (g=0.14) and oral reading fluency at students’ grade level (g=0.17). Effect sizes were small to negligible on proximal measures of word reading, spelling, letter combinations targeted in the intervention, and a third-grade passage of oral reading fluency. Non-significant effects favored the decoding-only condition on norm-referenced silent sentence verification fluency (g=−0.16). Results offer some support for integrating spelling within decoding intervention, but the modest effects suggest that the time needed to include spelling should be considered carefully.

ERCT Criteria Breakdown

Level 1 Criteria
- C
  Class-level RCT
  - Students were randomized to tutoring-style small groups and then those groups were randomized to conditions, which satisfies the ERCT tutoring/personal-teaching exception to class-level randomization.
  - "Within grade levels, students were randomly assigned to intervention groups of 2–4 students based on the times provided by the school for intervention and the availability of tutors."
  - Relevant Quotes:
    1) "Within grade levels, students were randomly assigned to intervention groups of 2–4 students based on the times provided by the school for intervention and the availability of tutors." (PDF p. 5)
    2) "Groups were then randomly assigned to one of the two conditions." (PDF p. 5)
    3) "Tutors were current or former educators with experience implementing reading interventions or working with elementary-aged students." (PDF p. 9)
  - Detailed Analysis: Criterion C requires randomization at the class level (or stronger) to reduce contamination, with an exception for interventions that are tutoring/personal-teaching in nature. This study did not randomize intact classrooms. Instead, it formed "intervention groups of 2–4 students" and then randomized those groups to conditions. The intervention was delivered by "tutors" (current or former educators), which indicates a supplemental, small-group tutoring format rather than whole-class instruction. In this context, the ERCT exception applies: the key issue is avoiding within-class contamination when only some students in the same instructional setting receive the intervention. Here, the intervention is delivered in small groups by tutors, and the randomization procedure is clearly described. Criterion C is met because the study is a tutoring-style small-group RCT with clearly described random assignment.
- E
  Exam-based Assessment
  - The outcomes include standardized, commercially published reading assessments (TOWRE-2, TOSWRF-2, TOSREC, DIBELS ORF), not only researcher-developed tests.
  - "The SWE subtest from the TOWRE-2 is test of word reading efficiency."
  - Relevant Quotes:
    1) "The SWE subtest from the TOWRE-2 is test of word reading efficiency." (PDF p. 10)
    2) "The Test of Silent Word Reading Fluency, second edition (TOSWRF-2; Mather et al., 2014) is a test of word recognition efficiency..." (PDF p. 10)
    3) "The Test of Silent Reading Efficiency and Comprehension (TOSREC; Wagner et al., 2010) is a group-administered test of efficiency with sentence comprehension." (PDF p. 11)
    4) "We used Oral Reading Fluency (ORF) passages from the Dynamic Indicators of Basic Early Literacy Skills, eighth edition (University of Oregon, 2018)." (PDF p. 11)
  - Detailed Analysis: Criterion E requires standardized exam-based (i.e., externally developed, widely used) assessments rather than only custom, researcher-created measures aligned to the intervention. The paper explicitly reports multiple commercial standardized measures (TOWRE-2, TOSWRF-2, TOSREC, and DIBELS ORF) and provides reliability/validity context for them. Although the study also includes researcher-developed proximal measures, the presence and use of these standardized commercial outcomes satisfies the ERCT requirement. Criterion E is met because the study uses widely recognized, standardized commercial reading assessments as outcomes.
- T
  Term Duration
  - The intervention lasted 10 weeks and outcomes were measured at posttest; this does not reach a full academic term and no term-later follow-up is reported.
  - "Students in both conditions received 10 weeks of word reading intervention."
  - Relevant Quotes:
    1) "Students in both conditions received 10 weeks of word reading intervention." (PDF p. 6)
    2) "Of the 70 students who met the inclusion criteria, 68 students received pretests, treatment, and posttests..." (PDF p. 5)
    3) "Several limitations should be acknowledged. The intervention involved a small amount of time each day (≈ 15 min) across 10 weeks." (PDF p. 19)
  - Detailed Analysis: Criterion T requires that outcomes be measured at least one full academic term after the intervention begins (typically about 3–4 months), or that a short intervention still includes term-long follow-up tracking. The paper states the intervention duration was "10 weeks" and that participants received "pretests, treatment, and posttests," which indicates the main outcome point is an immediate posttest at the end of the 10-week period. Ten weeks is typically shorter than one academic term, and the paper does not describe an additional outcome measurement occurring a term after the intervention start. Criterion T is not met because outcomes are measured at posttest after a 10-week window and no term-later follow-up is reported.
- D
  Documented Control Group
  - The comparison condition (Decoding) and the treatment condition (Decoding+Spelling) are clearly described, and baseline demographics and baseline equivalence are reported in tables.
  - "This study compared two intervention conditions on students’ word reading skills, which we referred to as Decoding and Decoding+Spelling."
  - Relevant Quotes:
    1) "This study compared two intervention conditions on students’ word reading skills, which we referred to as Decoding and Decoding+Spelling." (PDF p. 6)
    2) "Instruction and practice in both were identical with one exception—students in the Decoding+Spelling condition engaged in a spelling activity in which they spelled all words included in instruction and practice." (PDF p. 6)
    3) "Participant demographic characteristics of the final sample are reported in Table 1." (PDF p. 6)
    4) "Descriptive statistics are provided in Table 2. Results of the baseline equivalence analyses are reported in Table 3." (PDF p. 13)
  - Detailed Analysis: Criterion D requires that the control/comparison condition be documented well enough to understand what it received and to assess baseline comparability. The paper provides a clear, parallel description of the two conditions and specifies the single intended difference (the presence/absence of spelling practice during the lesson sequence). It also reports where to find baseline demographic information and baseline equivalence/statistics (Tables 1–3), which supports assessing comparability. Criterion D is met because the control/comparison condition and baseline comparability information are clearly documented.
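
As a rough illustration of the Term Duration reasoning above, the sketch below compares the study's 10-week window with the lower end of the "about 3–4 months" term length mentioned in the analysis. The function name `meets_term_duration`, the 3-month default, and the 52/12 weeks-per-month conversion are assumptions made for illustration; they are not taken from the paper or from an official ERCT specification.

```python
def meets_term_duration(weeks_observed: float, term_months: float = 3.0) -> bool:
    """Return True if the observation window spans at least one term.

    Assumption: a term is approximated as `term_months` calendar months,
    converted to weeks with 52 weeks per 12 months.
    """
    term_weeks = term_months * 52 / 12  # 3 months is about 13 weeks
    return weeks_observed >= term_weeks


study_weeks = 10  # "10 weeks of word reading intervention" (PDF p. 6)
print(meets_term_duration(study_weeks))  # False: 10 weeks < 13 weeks
```

Even under the shorter 3-month reading of a term, the 10-week window falls short, which is consistent with the "not met" verdict for Criterion T.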
Level 2 Criteria
- S
  School-level RCT
  - Randomization was not at the school level; students were grouped and then groups were randomized to conditions across multiple schools.
  - "Within grade levels, students were randomly assigned to intervention groups of 2–4 students..."
  - Relevant Quotes:
    1) "Participants were recruited from five schools and three school districts in the Southwestern United States." (PDF p. 5)
    2) "Within grade levels, students were randomly assigned to intervention groups of 2–4 students..." (PDF p. 5)
    3) "Groups were then randomly assigned to one of the two conditions." (PDF p. 5)
    4) "Given the relatively small number of schools in the sample (n=5), schools were treated as fixed effects rather than random effects..." (PDF p. 13)
  - Detailed Analysis: Criterion S requires that whole schools (the implementing units) are randomized to conditions. This study recruited students from multiple schools, but the randomized unit was the intervention group (small groups of students), not the school. The analysis further supports this by noting schools were handled as fixed effects rather than being the randomized unit. Criterion S is not met because schools were not randomized to conditions.
- I
  Independent Conduct
  - The paper shows the authors led tutor training and project leads monitored adherence, but it does not document an independent third-party evaluation team.
  - "Training involved modeling both conditions by the first and second authors..."
  - Relevant Quotes:
    1) "Training involved modeling both conditions by the first and second authors while tutors observed, followed by guided practice, partner practice, and independent rehearsal of both conditions." (PDF p. 9)
    2) "Tutors then delivered the intervention to project leads, who evaluated their adherence to the lesson protocols." (PDF p. 9)
    3) "Forms for all measures were double scored by two independent scorers." (PDF p. 13)
  - Detailed Analysis: Criterion I requires clear documentation that the evaluation was conducted independently from the intervention designers (e.g., external evaluators responsible for implementation oversight, data collection, and/or analysis). The paper indicates that the "first and second authors" modeled intervention procedures during training and that "project leads" evaluated adherence. This describes internal study-team oversight, not independent conduct. While measures were double scored by "two independent scorers," that is a measurement-quality step and does not establish that the overall trial was independently conducted (e.g., external evaluation organization, independent analysis team, or independence-from-designer statement). Criterion I is not met because independent third-party conduct of the trial/evaluation is not clearly documented.
- Y
  Year Duration
  - The intervention and measurement window was 10 weeks, far short of 75% of an academic year; additionally, ERCT rules imply Y cannot be met when T is not met.
  - "Students in both conditions received 10 weeks of word reading intervention."
  - Relevant Quotes:
    1) "Students in both conditions received 10 weeks of word reading intervention." (PDF p. 6)
    2) "Several limitations should be acknowledged. The intervention involved a small amount of time each day (≈ 15 min) across 10 weeks." (PDF p. 19)
  - Detailed Analysis: Criterion Y requires that outcomes be measured at least 75% of an academic year after the intervention begins. The paper clearly indicates a 10-week intervention with pretest and posttest around that window, which is far shorter than a school year. In addition, under the ERCT dependency rule, Criterion Y cannot be met when Criterion T is not met, and Criterion T is not met for this study. Criterion Y is not met because the study duration is only 10 weeks (and Criterion T is not met).
- B
  Balanced Control Group
  - The Decoding+Spelling condition intentionally added time, and the paper explicitly frames this added time as integral to the tested contrast (adding spelling as implemented in practice), so the resource imbalance is by design.
  - "We decided on option b... allow the spelling component to naturally add time to the intervention and evaluate the effects of decoding instruction and practice when spelling was added."
  - Relevant Quotes:
    1) "When designing the study we recognized that we had two options regarding how to treat the addition of the spelling component in terms of time: (a) hold intervention time constant between conditions and therefore intentionally add additional word reading practice to account for the time used for spelling in the Decoding+Spelling condition, or (b) allow the spelling component to naturally add time to the intervention and evaluate the effects of decoding instruction and practice when spelling was added." (PDF p. 10)
    2) "We decided on option b." (PDF p. 10)
    3) "In the Decoding condition, across 68 observations, the mean length per intervention session was 14.23 min (SD=5.54). In the Decoding+Spelling condition, across 62 observations, mean length per session was 18.70 min (SD=4.84)." (PDF p. 10)
    4) "The mean time for the spelling component was 6.24 min (SD=1.45)." (PDF p. 10)
  - Detailed Analysis: Criterion B compares the nature, quantity, and quality of resources (time, adult support, materials) between intervention and control, unless the study explicitly frames additional resources/time as integral to the treatment contrast being tested. Here, additional instructional time is clearly present in the Decoding+Spelling condition (about 4.5 minutes longer per session on average). The paper explicitly states that the researchers considered holding time constant but chose to allow spelling to "naturally add time" so they could evaluate the practical value (including time costs) of adding spelling in real intervention contexts. Under the ERCT Criterion B decision logic, extra resources are present and not negligible, but they are explicitly integral to the treatment definition being evaluated (adding spelling as it is implemented, with associated time demands). Therefore, given this design intent, the control condition is not required to match time/budget for Criterion B to be met.
Criterion B is met because the added time/resources are an intentional, integral part of the treatment contrast being tested.
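
The "about 4.5 minutes longer per session" figure in the Criterion B analysis can be checked directly from the session lengths quoted on PDF p. 10. The sketch below is only an arithmetic check; the variable names are illustrative, and the relative-increase figure is a derived quantity that the paper itself does not report.

```python
# Mean session lengths in minutes, as quoted from the paper (PDF p. 10).
decoding_mean_min = 14.23           # Decoding condition, 68 observations
decoding_spelling_mean_min = 18.70  # Decoding+Spelling condition, 62 observations

# Absolute and relative difference in mean session length.
extra_min = decoding_spelling_mean_min - decoding_mean_min
relative_increase = extra_min / decoding_mean_min

print(f"{extra_min:.2f} extra minutes per session")       # 4.47 extra minutes per session
print(f"{relative_increase:.1%} longer than Decoding")    # 31.4% longer than Decoding
```

The observed difference (4.47 min) is smaller than the reported mean duration of the spelling component (6.24 min); the quoted passages do not explain the gap, so no interpretation is offered here.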
Level 3 Criteria
- R
  Reproduced
  - No independent replication of this specific RCT was found in the paper, and an internet search did not identify an external replication study as of the ERCT check date.
  - Relevant Quotes: (No quotes in the paper claim an independent replication of this specific experiment.)
  - Detailed Analysis: Criterion R requires an independent replication by a different research team in a different context, published in a peer-reviewed outlet. The paper does not claim that this specific two-condition study (Decoding vs. Decoding+Spelling, Grades 2–4, 10 weeks, small-group tutoring format) has been replicated by an independent team. I also conducted an internet search (on 2026-04-14) for peer-reviewed replication studies that explicitly reproduce this specific experiment; no such independent replication was identified. Criterion R is not met because independent reproduction of this specific study was not found.
- A
  All-subject Exams
  - Outcomes focus on literacy-related skills only and do not include standardized assessments across all core school subjects.
  - "The SWE subtest from the TOWRE-2 is test of word reading efficiency."
  - Relevant Quotes:
    1) "The SWE subtest from the TOWRE-2 is test of word reading efficiency." (PDF p. 10)
    2) "The Test of Silent Word Reading Fluency, second edition (TOSWRF-2; Mather et al., 2014) is a test of word recognition efficiency..." (PDF p. 10)
    3) "The Test of Silent Reading Efficiency and Comprehension (TOSREC; Wagner et al., 2010) is a group-administered test of efficiency with sentence comprehension." (PDF p. 11)
    4) "We used Oral Reading Fluency (ORF) passages..." (PDF p. 11)
  - Detailed Analysis: Criterion A requires standardized exam-based assessment across all main subjects taught at the relevant level (to detect tradeoffs), and it depends on Criterion E being met (Criterion E is met here). The paper’s reported standardized outcomes are all literacy-based (word reading efficiency, silent word reading fluency, silent reading efficiency/comprehension, oral reading fluency) plus spelling/handwriting-related measures. It does not report standardized outcomes in other core subjects (e.g., mathematics, science, social studies). Criterion A is not met because the study does not assess all core subjects using standardized exams.
- G
  Graduation Tracking
  - The study reports only pretest and posttest around a 10-week intervention and provides no evidence of tracking outcomes through graduation; additionally, ERCT rules imply G cannot be met when Y is not met.
  - "Of the 70 students who met the inclusion criteria, 68 students received pretests, treatment, and posttests..."
  - Relevant Quotes:
    1) "Of the 70 students who met the inclusion criteria, 68 students received pretests, treatment, and posttests..." (PDF p. 5)
    2) "Students in both conditions received 10 weeks of word reading intervention." (PDF p. 6)
  - Detailed Analysis: Criterion G requires tracking participants until graduation (of the relevant educational stage) and may be supported by follow-up publications by the same authors. This paper reports a short-term design (pretest and posttest around a 10-week intervention) and does not describe long-term follow-up or graduation-related tracking. I also searched for follow-up papers by the same author team that track this cohort to graduation; none were found. In any case, under the ERCT dependency rule, Criterion G cannot be met when Criterion Y is not met, and Criterion Y is not met for this study. Criterion G is not met because graduation tracking is not reported (and Criterion Y is not met).
- P
  Pre-Registered
  - The paper provides no registry link/identifier or dated statement showing the protocol was pre-registered before data collection.
  - Relevant Quotes: (No relevant quotes about pre-registration were found in the paper.)
  - Detailed Analysis: Criterion P requires explicit evidence that the study protocol was publicly pre-registered before data collection began (typically a registry name plus an ID/link and a registration date). I searched the paper for any mention of pre-registration (e.g., "pre-registered," "registered report," OSF, or a registry ID) and found none. I also conducted an internet search for a registry entry tied to this paper (by title/DOI/authors) and did not locate a verifiable pre-registration record with a date preceding study start. Criterion P is not met because there is no quoted evidence of pre-registration and no verifiable registry entry was found.
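
The dependency rules referenced in the breakdown (Y cannot be met without T, G cannot be met without Y, and A depends on E) can be sketched as a small lookup. This is a minimal illustration under stated assumptions: the `PREREQUISITES` table encodes only the three dependencies mentioned on this page, `apply_dependencies` is a hypothetical helper rather than an official ERCT tool, and the raw verdicts are transcribed from the per-criterion conclusions above.

```python
# Dependencies as described in this breakdown: Y needs T, G needs Y, A needs E.
PREREQUISITES = {"Y": "T", "G": "Y", "A": "E"}


def apply_dependencies(raw: dict[str, bool]) -> dict[str, bool]:
    """Return final verdicts; a criterion counts as met only if its own
    raw verdict is met and its prerequisite (if any) is also met."""
    final: dict[str, bool] = {}

    def resolve(letter: str) -> bool:
        if letter not in final:
            prereq = PREREQUISITES.get(letter)
            final[letter] = raw[letter] and (prereq is None or resolve(prereq))
        return final[letter]

    for letter in raw:
        resolve(letter)
    return final


# Raw verdicts for this study, taken from the conclusions in the breakdown.
raw_verdicts = {
    "C": True, "E": True, "T": False, "D": True,   # Level 1
    "S": False, "I": False, "Y": False, "B": True,  # Level 2
    "R": False, "A": False, "G": False, "P": False, # Level 3
}

final_verdicts = apply_dependencies(raw_verdicts)
met = "".join(letter for letter, ok in final_verdicts.items() if ok)
print(met)  # CEDB
```

For this study the dependency rules do not change any verdict, since Y, G, and A were already judged not met on their own evidence; the rules only add a second, independent reason for Y and G.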
