Lexical inference training for homonyms: Two randomized controlled trials for children with English as a first and an additional language

Sophie A. Booton; Julia M. H. Birchenough; Katie Gilligan-Lee; Fiona Jelley; Victoria A. Murphy

Published: Jan 5, 2026

ERCT Check Date: Feb 22, 2026

DOI: 10.1111/bjep.70056

reading
language arts
K12
UK

C: Class-level RCT (met)

Randomization is at the child level, but the intervention is delivered in small-group/individual formats with an experimenter, matching the ERCT tutoring-style exception.

"During the intervention, children worked individually (inference condition) or in pairs (reading condition) with the experimenter." (p. 13)

E: Exam-based Assessment (not met)

The primary outcomes rely on researcher-created homonym measures rather than a widely recognized standardized exam.

"The Homonyms: receptive test was created to assess children's recognition of the meanings of the selected homonyms..." (p. 4)

T: Term Duration (not met)

Outcomes were measured about one week after completing a short (~2-week) intervention period, which is far shorter than an academic term.

"Interventions were then conducted in groups of four; each group then completed four 30-min intervention sessions across a two-week period..." (p. 7)

D: Documented Control Group (met)

The paper documents the control conditions and reports baseline comparisons between conditions, enabling interpretation of control-group comparability.

"Two spatial training conditions served as a control..." (p. 5)

S: School-level RCT (not met)

Schools were not randomized to conditions; participants were assigned within schools.

"Participants were randomly assigned by the primary investigator to one of two types of training..." (p. 4)

I: Independent Conduct (not met)

The paper does not document an independent, third-party evaluation team; assignment and delivery are described as being done by the study researchers.

"Participants were randomly assigned by the primary investigator..." (p. 4)

Y: Year Duration (not met)

Outcomes were measured within weeks rather than at least 75% of an academic year after the intervention began; additionally, because T is not met, Y cannot be met.

"Then, one week later, children completed a posttest battery including the same tests." (p. 7)

B: Balanced Control Group (not met)

Study 1 uses a time-matched active control, but Study 2 provides greater per-child adult support in the intervention (individual delivery) than the control (pairs), creating an input imbalance not framed as the treatment variable.

"During the intervention, children worked individually (inference condition) or in pairs (reading condition) with the experimenter." (p. 13)

R: Reproduced (not met)

No independent replication by a different research team was identified; the paper reports two trials by the same author team.

"The efficacy of a novel inference intervention for teaching homonyms was demonstrated across two pre-post randomized controlled trials..." (p. 18)

A: All-subject Exams (not met)

Because criterion E is not met, criterion A cannot be met; the study also does not assess all core subjects via standardized exams.

G: Graduation Tracking (not met)

The paper reports only short-term post-tests and does not track students to graduation; additionally, because Y is not met, G cannot be met.

"Then, one week later, children completed a posttest battery including the same tests." (p. 7)

P: Pre-Registered (not met)

Study 2 is stated to be pre-registered on OSF, but the paper does not report a registration date that can be verified as occurring before data collection.

"This study's method and analyses were pre-registered on the Open Science Framework..." (p. 9)

Abstract

Background: Many words have multiple meanings, which present challenges to learning, yet research has yet to identify effective interventions for homonyms. Lexical inference may be a promising strategy. Aim: To evaluate a brief, novel lexical inference intervention for homonyms. Samples: Children aged 7–8 years (Study 1: N = 180, Study 2: N = 76). Study 2 included children with English as an Additional Language (EAL, n = 37). Methods: In two randomized controlled trials, participants were assigned to either inference training or control (Study 1: spatial training; Study 2: implicit exposure through reading). Their receptive knowledge of taught and untaught homonyms was measured before and after the intervention, and in Study 2, metacognitive and inference skills too. Results: Those in the inference interventions showed greater gains in receptive knowledge than control groups. In Study 2, children also showed improvement in the inference test with homonyms, and while children with EAL had a specific challenge with receptive knowledge of homonyms compared to their EL1 peers, the intervention was equally effective for both groups. Receptive knowledge and inference with homonyms predicted unique variance in reading comprehension. The intervention showed limited transfer to untaught words, although patterns of errors provided some indication of improved understanding. Conclusions: A brief inference training is effective for gaining knowledge of homonyms, with limited transfer to untaught words, and the intervention is equally effective for children with EAL and EL1. The findings also showed the importance of homonym understanding and inference for children's reading comprehension.

ERCT Criteria Breakdown

Level 1 Criteria
- C
  Class-level RCT
  - Randomization is at the child level, but the intervention is delivered in small-group/individual formats with an experimenter, matching the ERCT tutoring-style exception.
  - "During the intervention, children worked individually (inference condition) or in pairs (reading condition) with the experimenter." (p. 13)
  - Relevant Quotes:
    1) "Participants were randomly assigned by the primary investigator to one of two types of training: inference training (n=60) or spatial (control) training (n=120)." (p. 4)
    2) "Interventions were then conducted in groups of four; each group then completed four 30-min intervention sessions across a two-week period." (p. 7)
    3) "During the intervention, children worked individually (inference condition) or in pairs (reading condition) with the experimenter." (p. 13)
  - Detailed Analysis: Criterion C requires class-level (or stronger) randomization to reduce classroom contamination, but it includes an exception when the intervention is personal teaching or tutoring. The paper describes participant-level assignment, not class- or school-level assignment. However, delivery is experimenter-led in small groups (Study 1) and individually or in pairs (Study 2), which is much closer to tutoring or personal teaching than to a classroom-wide instructional change delivered by classroom teachers. Under the ERCT exception, student-level randomization is acceptable for a tutoring-style intervention. Criterion C is met because the intervention is delivered in tutoring-like formats (small-group/individual) where the ERCT exception applies.
- E
  Exam-based Assessment
  - The primary outcomes rely on researcher-created homonym measures rather than a widely recognized standardized exam.
  - "The Homonyms: receptive test was created to assess children's recognition of the meanings of the selected homonyms..." (p. 4)
  - Relevant Quotes:
    1) "The Homonyms: receptive test was created to assess children's recognition of the meanings of the selected homonyms based on a previous assessment (Booton, Hodgkiss et al., 2022)." (p. 4)
    2) "Children's ability to infer the intended meaning (primary or secondary) of homonyms in context was assessed as a further aspect of word knowledge which could be affected by the intervention." (p. 10)
    3) "To examine how homonym knowledge relates to reading comprehension, the York Assessment of Reading for Comprehension (YARC; Snowling et al., 2009) passage reading subtest form A was used." (p. 11)
  - Detailed Analysis: Criterion E requires that outcomes be measured with a standardized, widely recognized exam-based assessment rather than a custom instrument created for the study. The paper explicitly states that the main receptive homonym test was "created" for this work, and it also uses a researcher-scored homonym inference measure. Although Study 2 also includes a standardized reading comprehension test (YARC), the intervention's direct learning outcomes (receptive homonym knowledge and related measures) are not measured by a standardized exam. Criterion E is not met because the primary outcome measures are custom homonym tests rather than standardized exams.
- T
  Term Duration
  - Outcomes were measured about one week after completing a short (~2-week) intervention period, which is far shorter than an academic term.
  - "Interventions were then conducted in groups of four; each group then completed four 30-min intervention sessions across a two-week period..." (p. 7)
  - Relevant Quotes:
    1) "A week before the training, children completed a pretest battery including the vocabulary test..." (p. 6)
    2) "Interventions were then conducted in groups of four; each group then completed four 30-min intervention sessions across a two-week period." (p. 7)
    3) "Then, one week later, children completed a posttest battery including the same tests." (p. 7)
    4) "Three sessions of 15–25 min were completed over approximately 2 weeks." (p. 13)
    5) "Then, one week later, children completed the post-test measures in a fixed order in one session." (p. 13)
  - Detailed Analysis: Criterion T requires that outcomes be measured at least one full academic term (roughly 3–4 months) after the intervention begins. In Study 1, children completed pretesting about a week before training, then completed four short sessions across two weeks, and were post-tested one week later. In Study 2, children completed the baseline and pretest measures two weeks before the interventions, then completed three sessions across about two weeks, and were post-tested one week later. In both studies, the window from intervention start to primary measurement is therefore about three weeks, far shorter than a term. Criterion T is not met because the intervention-to-posttest interval is weeks rather than at least one academic term.
- D
  Documented Control Group
  - The paper documents the control conditions and reports baseline comparisons between conditions, enabling interpretation of control-group comparability.
  - "Two spatial training conditions served as a control..." (p. 5)
  - Relevant Quotes:
    1) "Participants were randomly assigned by the primary investigator to one of two types of training: inference training (n=60) or spatial (control) training (n=120)." (p. 4)
    2) "Two spatial training conditions served as a control: Children completed four training sessions of spatial visualization training with or without physical manipulatives." (p. 5)
    3) "There was no difference in age (t (177)=1.69, p=.092), mother's education (t (177)=1.34, p=.182), gender (χ² (178)=0.003, p=.959) or bilingual status (χ² (145)=0.02, p=.897) between the two conditions." (p. 7)
    4) "The reading intervention was designed as an active control group with implicit instruction: children were exposed to the taught key vocabulary in the same supportive paragraph contexts (see Figure 3b) as used in the inference intervention but embedded within stories." (p. 13)
  - Detailed Analysis: Criterion D requires that the control group be well documented, including what it received, its size, and information supporting baseline comparability. In Study 1, the control condition (spatial training) is described, group sizes are reported, and baseline demographic comparisons are presented. In Study 2, the active control (implicit reading) is described, including the nature of the exposure and the activities. These details let a reader understand what the control group did and evaluate its comparability. Criterion D is met because the control conditions are clearly described and baseline comparisons are reported.
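The Term Duration check is simple arithmetic on each study's timeline: Criterion T asks for at least one academic term (roughly 3–4 months) between intervention start and outcome measurement, while both studies ran about two weeks of sessions followed by a post-test one week later. The sketch below assumes a term is about 13 weeks (the lower end of that range) and counts from the first intervention session. The `Schedule` class and `meets_term_duration` function are hypothetical illustrations, not part of any official ERCT tool.

```python
from dataclasses import dataclass

# Assumption: one academic term is taken as about 13 weeks (roughly 3 months),
# the lower end of the "roughly 3-4 months" used for Criterion T.
TERM_WEEKS = 13


@dataclass
class Schedule:
    """Timeline of one study, in weeks, counted from the first intervention session."""
    name: str
    intervention_weeks: float  # length of the intervention period
    weeks_to_posttest: float   # gap between the last session and the post-test

    def weeks_from_start_to_posttest(self) -> float:
        return self.intervention_weeks + self.weeks_to_posttest


def meets_term_duration(schedule: Schedule, term_weeks: float = TERM_WEEKS) -> bool:
    """True if the outcome is measured at least one term after the intervention starts."""
    return schedule.weeks_from_start_to_posttest() >= term_weeks


# Timelines as reported: about two weeks of sessions, post-test one week later.
studies = [
    Schedule("Study 1", intervention_weeks=2, weeks_to_posttest=1),
    Schedule("Study 2", intervention_weeks=2, weeks_to_posttest=1),
]

for s in studies:
    print(s.name, s.weeks_from_start_to_posttest(), meets_term_duration(s))
```

Both timelines come out at about three weeks from first session to post-test, far below the assumed 13-week threshold; taking a term as 4 months instead would only widen the gap.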
Level 2 Criteria
- S
  School-level RCT
  - Schools were not randomized to conditions; participants were assigned within schools.
  - "Participants were randomly assigned by the primary investigator to one of two types of training..." (p. 4)
  - Relevant Quotes:
    1) "Participants were Year 3 children (aged 7 to 8 years) from six medium-to-large English state primary schools (n=20 to 51 children per school) in areas of low deprivation." (p. 4)
    2) "Participants were randomly assigned by the primary investigator to one of two types of training: inference training (n=60) or spatial (control) training (n=120)." (p. 4)
    3) "Participants with EAL (n=37) or EL1 (n=39) were assigned by stratified randomization into one of two interventions: inference (n=40) or reading (n=36)." (p. 9)
  - Detailed Analysis: Criterion S requires school-level randomization, in which entire schools are assigned to intervention or control. Although multiple schools took part, the paper describes assignment of individual children (including stratified randomization in Study 2), not assignment of whole schools. Criterion S is not met because the unit of randomization is the participant, not the school.
- I
  Independent Conduct
  - The paper does not document an independent, third-party evaluation team; assignment and delivery are described as being done by the study researchers.
  - "Participants were randomly assigned by the primary investigator..." (p. 4)
  - Relevant Quotes:
    1) "Participants were randomly assigned by the primary investigator to one of two types of training: inference training (n=60) or spatial (control) training (n=120)." (p. 4)
    2) "Data collection was completed in a quiet area of children's schools by trained psychology research assistants." (p. 6)
    3) "The research was conducted in a quiet area in the child's school by two postdoctoral researchers in developmental psychology." (p. 13)
    4) "Participants with EAL (n=37) or EL1 (n=39) were assigned by stratified randomization... via the random number generator formula in Excel by the researchers delivering the interventions." (p. 9)
  - Detailed Analysis: Criterion I requires evidence that the evaluation was conducted independently of the intervention developers and authors (or at least by a clearly independent third party), reducing the risk of bias in delivery, measurement, and analysis. The paper states that randomization was performed by the "primary investigator" (Study 1) and that stratified assignment was performed by "the researchers delivering the interventions" (Study 2). Data collection is described as carried out by research assistants (Study 1) and postdoctoral researchers (Study 2), but there is no explicit statement that these staff were independent of the intervention research team. Criterion I is not met because the independence of the evaluation team from the intervention developers is not clearly documented.
- Y
  Year Duration
  - Outcomes were measured within weeks rather than at least 75% of an academic year after the intervention began; additionally, because T is not met, Y cannot be met.
  - "Then, one week later, children completed a posttest battery including the same tests." (p. 7)
  - Relevant Quotes:
    1) "Interventions were then conducted in groups of four; each group then completed four 30-min intervention sessions across a two-week period." (p. 7)
    2) "Then, one week later, children completed a posttest battery including the same tests." (p. 7)
    3) "Three sessions of 15–25 min were completed over approximately 2 weeks." (p. 13)
    4) "Then, one week later, children completed the post-test measures in a fixed order in one session." (p. 13)
  - Detailed Analysis: Criterion Y requires outcome measurement at least 75% of an academic year after the intervention begins. In both studies, the intervention lasted approximately two weeks and post-testing took place one week later, far short of the required window. ERCT dependency rules also apply: if Criterion T is not met, Criterion Y is not met, and T is not met here. Criterion Y is not met because measurement occurs within weeks, not most of an academic year, after the intervention starts.
- B
  Balanced Control Group
  - Study 1 uses a time-matched active control, but Study 2 provides greater per-child adult support in the intervention (individual delivery) than the control (pairs), creating an input imbalance not framed as the treatment variable.
  - "During the intervention, children worked individually (inference condition) or in pairs (reading condition) with the experimenter." (p. 13)
  - Relevant Quotes:
    1) "Two spatial training conditions served as a control: Children completed four training sessions of spatial visualization training with or without physical manipulatives." (p. 5)
    2) "They completed interactive activities and received feedback during training, as in the intervention condition, with activities related to mental rotation, mental transformation, and object completion..." (p. 5)
    3) "The reading intervention was designed as an active control group with implicit instruction: children were exposed to the taught key vocabulary in the same supportive paragraph contexts (see Figure 3b) as used in the inference intervention but embedded within stories." (p. 13)
    4) "During the intervention, children worked individually (inference condition) or in pairs (reading condition) with the experimenter." (p. 13)
  - Detailed Analysis: Criterion B compares the nature, quantity, and quality of resources (including time and adult support) provided to the intervention and control groups, unless the extra resources are explicitly the treatment variable. Study 1 appears balanced on time and structure: both groups complete four training sessions, and the control is described as interactive with feedback "as in the intervention condition," suggesting comparable engagement and adult-led structure. Study 2 uses an active control with comparable exposure to the taught words in the same paragraph contexts. However, delivery differs on a key resource, adult attention per child: the inference condition is delivered individually, while the reading control is delivered in pairs with the experimenter. The paper does not frame this difference as the treatment variable (it is not presented as a deliberate test of additional adult time or support). Study 2 therefore has an unbalanced input of adult instructional support that could confound the estimated effect of the instructional approach. Criterion B is not met because Study 2 provides greater per-child adult support in the intervention than in the control without framing this as the intended treatment variable.
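The per-child support imbalance in Study 2 can be made concrete with a rough calculation of adult minutes per child. This sketch assumes that both arms of each study received the same number and length of sessions (the quoted passages do not state this for every arm), uses 20 minutes as a stand-in for the reported 15–25 minute sessions in Study 2, and assumes both Study 1 arms were run in groups of four with 30-minute sessions. `Condition` and `support_ratio` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Condition:
    """Delivery format of one study arm (session figures are assumptions, see lead-in)."""
    label: str
    sessions: int
    minutes_per_session: float
    children_per_adult: int  # 1 = individual, 2 = pairs, 4 = groups of four

    def adult_minutes_per_child(self) -> float:
        # Total adult contact time, shared evenly among the children in the group.
        return self.sessions * self.minutes_per_session / self.children_per_adult


def support_ratio(intervention: Condition, control: Condition) -> float:
    """How many times more adult time per child the intervention arm receives."""
    return intervention.adult_minutes_per_child() / control.adult_minutes_per_child()


# Study 1 (assumed): four 30-minute sessions in groups of four for both arms.
study1_inference = Condition("inference (groups of 4)", 4, 30, 4)
study1_control = Condition("spatial (groups of 4)", 4, 30, 4)

# Study 2 (assumed): three 20-minute sessions for both arms; only group size differs.
inference = Condition("inference (individual)", 3, 20, 1)
reading = Condition("reading (pairs)", 3, 20, 2)

print(support_ratio(study1_inference, study1_control))  # 1.0
print(inference.adult_minutes_per_child())              # 60.0
print(reading.adult_minutes_per_child())                # 30.0
print(support_ratio(inference, reading))                # 2.0
```

When session counts and lengths are equal across arms, the ratio reduces to the ratio of group sizes (2 children per adult versus 1), so the 20-minute stand-in does not change the conclusion that inference children in Study 2 received roughly twice the adult time per child.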
Level 3 Criteria
- R
  Reproduced
  - No independent replication by a different research team was identified; the paper reports two trials by the same author team.
  - "The efficacy of a novel inference intervention for teaching homonyms was demonstrated across two pre-post randomized controlled trials..." (p. 18)
  - Relevant Quotes:
    1) "The efficacy of a novel inference intervention for teaching homonyms was demonstrated across two pre-post randomized controlled trials..." (p. 18)
  - Detailed Analysis: Criterion R requires independent replication by a different research team in a different context, published in a peer-reviewed outlet. This paper contains two randomized controlled trials, but both are conducted and reported within the same article by the same author team, and therefore do not constitute independent reproduction. An internet search for later or external replication studies of the same "Word Detectives" lexical inference intervention for homonyms did not identify a peer-reviewed replication by an independent research team as of the ERCT check date. Criterion R is not met because independent reproduction evidence was not found.
- A
  All-subject Exams
  - Because criterion E is not met, criterion A cannot be met; the study also does not assess all core subjects via standardized exams.
  - Relevant Quotes:
    1) "The Homonyms: receptive test was created to assess children's recognition of the meanings of the selected homonyms..." (p. 4)
    2) "To examine how homonym knowledge relates to reading comprehension, the York Assessment of Reading for Comprehension (YARC; Snowling et al., 2009) passage reading subtest form A was used." (p. 11)
  - Detailed Analysis: Criterion A requires standardized exams across all subjects and has an explicit prerequisite: if Criterion E is not met, Criterion A is not met. Here, the primary learning outcomes are measured with researcher-created homonym assessments, and the only standardized assessment described is a reading comprehension test, not a full set of core subjects such as mathematics and science. Criterion A is not met because Criterion E is not met and because the paper does not report standardized exams across all main subjects.
- G
  Graduation Tracking
  - The paper reports only short-term post-tests and does not track students to graduation; additionally, because Y is not met, G cannot be met.
  - "Then, one week later, children completed a posttest battery including the same tests." (p. 7)
  - Relevant Quotes:
    1) "Then, one week later, children completed a posttest battery including the same tests." (p. 7)
    2) "Then, one week later, children completed the post-test measures in a fixed order in one session." (p. 13)
    3) "Both studies have some limitations worth noting. The intervention and timescale for post-tests was brief..." (p. 18)
  - Detailed Analysis: Criterion G requires tracking participants until graduation, a long-term educational endpoint. The paper describes brief interventions with post-testing one week later and explicitly characterizes the post-test timescale as brief. No graduation or end-of-stage follow-up (for example, administrative tracking to the end of primary school) is reported in this article. ERCT dependency rules also apply: if Criterion Y is not met, Criterion G is not met, and Y is not met here. Criterion G is not met because there is no graduation tracking and the follow-up window is only short-term.
- P
  Pre-Registered
  - Study 2 is stated to be pre-registered on OSF, but the paper does not report a registration date that can be verified as occurring before data collection.
  - "This study's method and analyses were pre-registered on the Open Science Framework..." (p. 9)
  - Relevant Quotes:
    1) "This study's method and analyses were pre-registered on the Open Science Framework ... and any exploratory deviations from this are noted." (p. 9)
    2) "Two weeks before the interventions, children completed the battery of baseline tasks and pretest measures, one-to-one with the experimenter..." (p. 13)
  - Detailed Analysis: Criterion P requires that the full protocol be pre-registered before data collection begins, which typically means verifying that the registry record predates baseline testing. The paper states that Study 2 was pre-registered and provides an OSF link, but it does not give a registration date in the text. An attempt to verify the date through the OSF registry link did not yield a registration or creation date that could be quoted verbatim here. The timing of registration relative to the start of data collection therefore cannot be confirmed from the paper text and verifiable registry metadata. Criterion P is not met because the pre-registration date cannot be verified as preceding data collection.
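The three prerequisite rules cited in the analyses above (T before Y, Y before G, and E before A) can be expressed as a small resolver. This is a sketch under the assumption that a failed prerequisite simply forces the dependent criterion to "not met", regardless of its own evidence; `PREREQUISITES` and `apply_dependencies` are hypothetical names, not an official ERCT implementation.

```python
# Prerequisites as stated in the criterion analyses: criterion -> required criterion.
PREREQUISITES = {"Y": "T", "G": "Y", "A": "E"}


def apply_dependencies(raw: dict[str, bool]) -> dict[str, bool]:
    """Return final verdicts, forcing a criterion to False when its prerequisite fails.

    `raw` maps each criterion letter to the verdict it would get on its own evidence.
    Every prerequisite of a listed criterion must also be present in `raw`.
    Prerequisites are resolved recursively, so chains such as T -> Y -> G propagate.
    """
    final: dict[str, bool] = {}

    def resolve(criterion: str) -> bool:
        if criterion not in final:
            prereq = PREREQUISITES.get(criterion)
            final[criterion] = raw[criterion] and (prereq is None or resolve(prereq))
        return final[criterion]

    for criterion in raw:
        resolve(criterion)
    return final


# Hypothetical input: even if Y and G looked positive on their own evidence,
# a failed T propagates through the T -> Y -> G chain.
hypothetical = {"T": False, "Y": True, "G": True, "E": True, "A": True}
print(apply_dependencies(hypothetical))
# {'T': False, 'Y': False, 'G': False, 'E': True, 'A': True}
```

Resolving prerequisites recursively, rather than in a single pass, keeps the result correct whatever order the criteria are listed in.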
