A Cluster Randomized Controlled Trial Comparing the Efficacy of Pre-School Language Interventions - Building Early Sentences Therapy and an Adapted Derbyshire Language Scheme

Cristina McKean, Christine Jack, Sean Pert, Carolyn Letts, Helen Stringer, Mark Masidlover, Anastasia Trebacz, Robert Rush, Emily Armstrong, Kate Conn, Jenny Sandham, Elaine Ashton, and Naomi Rose

Published:
ERCT Check Date:
DOI: 10.1111/1460-6984.70036
  • language arts
  • pre-K
  • UK
  • parent involvement
  • homework
  • C

    The trial randomized whole Early Years Settings, which satisfies the class-level (or stronger) randomization requirement.

    We conducted a pre-registered cluster randomized controlled trial in 20 EYS randomized to receive BEST or A-DLS.

  • E

    The study used NRDLS, described as a standardized and validated assessment, as the primary outcome measure.

    NRDLS is a standardized, normed reliable and valid omnibus language assessment that measures young children's comprehension and production abilities yielding SSs (Edwards et al. 2011).

  • T

    The study included follow-up about 9 weeks after an 8-week intervention, providing roughly term-long tracking from start to T3.

    Children were assessed by RAs blind to treatment arm allocation for eligibility (T0), before the intervention (T1), immediately after the intervention (T2) and at follow-up (T3 approximately 9 weeks after T2).

  • D

    The study removed the treatment-as-usual control arm, leaving no business-as-usual control group.

    The most significant change was the removal of a TAU arm.

  • S

    Entire Early Years Settings were randomized, meeting the school-level RCT requirement.

    Randomization of EYS to one of two intervention arms was conducted by a statistician not involved with the study.

  • I

    Intervention developers (paper authors) trained and supervised delivery, so the evaluation was not independent.

    Prior to intervention delivery in the trial, RAs were trained in the interventions by C.M. and S.P.; this included video recording their delivery of interventions, reflecting on their fidelity and receiving feedback from S.P. and C.M. using that rating scale.

  • Y

    Outcomes were tracked only to a 9-week post-test follow-up, far short of a full academic year.

    Children were assessed by RAs blind to treatment arm allocation for eligibility (T0), before the intervention (T1), immediately after the intervention (T2) and at follow-up (T3 approximately 9 weeks after T2).

  • B

    The two arms were explicitly matched on dosage/delivery and both included comparable homework resources, balancing time and inputs.

    We created an adapted version of DLS (A-DLS) which could be delivered with high treatment fidelity and reliability in a research context and which matched BEST as closely as possible in terms of dosage and delivery whilst retaining DLS key principles and characteristics.

  • R

    No peer-reviewed independent replication of the 2025 BEST vs A-DLS trial was found.

  • A

    The study measured language and communication only, not standardized outcomes across all main subjects/domains.

    Outcomes were NRDLS comprehension and production standard scores (SS), measures of language structures targeted in the interventions and communicative participation (FOCUS-34).

  • G

    The study followed children only for weeks to a few months post-intervention, with no tracking to graduation.

    Longer term follow-up is needed to test how long such benefits might be present for a child

  • P

    The ISRCTN registry record shows registration before first enrolment, indicating prospective pre-registration.

    [X] Prospectively registered

Abstract

Children's language abilities set the stage for their education, psychosocial development and life chances across the life course. Aims: To compare the efficacy of two preschool language interventions delivered with low dosages in early years settings (EYS): Building Early Sentences Therapy (BEST) and an Adapted Derbyshire Language Scheme (A-DLS). The former is informed by usage-based linguistic theory, the latter by typical language developmental patterns. Methods: We conducted a pre-registered cluster randomized controlled trial in 20 EYS randomized to receive BEST or A-DLS. Children aged 3;05-4;05, who were monolingual, with comprehension and/or production scores ≤ 16th centile (New Reynell Developmental Language Scales-NRDLS) and no sensorineural hearing impairment, severe visual impairment or learning disability were eligible. A total of 102 children received the intervention. Speech and language therapists delivered interventions with high fidelity in 15-min group sessions twice weekly for 8 weeks. Baseline (T1), outcome (T2), and follow-up (T3) measures were completed blind to the intervention arm. Outcomes were NRDLS comprehension and production standard scores (SS), measures of language structures targeted in the interventions and communicative participation (FOCUS-34). Results: Both interventions were associated with significant change from T1 to T2 and from T1 to T3 in all outcomes. There were no differences between interventions in gains in NRDLS comprehension SS at T2 or T3. BEST produced greater gains in NRDLS production SS between T1-T2 (d = 0.40) and T1-T3 (d = 0.55) and in BEST-targeted sentences (d = 0.77).

ERCT Criteria Breakdown

  • Level 1 Criteria

    • C

      Class-level RCT

      • The trial randomized whole Early Years Settings, which satisfies the class-level (or stronger) randomization requirement.
      • We conducted a pre-registered cluster randomized controlled trial in 20 EYS randomized to receive BEST or A-DLS.
      • Relevant Quotes:
        1) "We conducted a pre-registered cluster randomized controlled trial in 20 EYS randomized to receive BEST or A-DLS." (p. 1)
        2) "Randomization of EYS to one of two intervention arms was conducted by a statistician not involved with the study." (p. 4)
      • Detailed Analysis: The paper explicitly describes a cluster randomized controlled trial where the unit of randomization is the Early Years Setting (EYS). Because whole settings (schools) were randomized (a stronger design than class-level randomization), the ERCT Class-level RCT criterion is satisfied.
      • Final sentence: Criterion C is met because entire Early Years Settings (schools) were randomized to intervention arms.
    • E

      Exam-based Assessment

      • The study used NRDLS, described as a standardized and validated assessment, as the primary outcome measure.
      • NRDLS is a standardized, normed reliable and valid omnibus language assessment that measures young children's comprehension and production abilities yielding SSs (Edwards et al. 2011).
      • Relevant Quotes:
        1) "Outcomes were NRDLS comprehension and production standard scores (SS), measures of language structures targeted in the interventions and communicative participation (FOCUS-34)." (p. 1)
        2) "NRDLS is a standardized, normed reliable and valid omnibus language assessment that measures young children's comprehension and production abilities yielding SSs (Edwards et al. 2011)." (p. 4)
      • Detailed Analysis: The primary outcomes include NRDLS comprehension and production standard scores. The paper describes NRDLS as a standardized, normed, reliable and valid assessment, which matches ERCT's requirement to use a recognized standardized exam-based measure rather than a bespoke test.
      • Final sentence: Criterion E is met because the primary outcomes use the standardized NRDLS assessment.
    • T

      Term Duration

      • The study included follow-up about 9 weeks after an 8-week intervention, providing roughly term-long tracking from start to T3.
      • Children were assessed by RAs blind to treatment arm allocation for eligibility (T0), before the intervention (T1), immediately after the intervention (T2) and at follow-up (T3 approximately 9 weeks after T2).
      • Relevant Quotes:
        1) "Treatment was delivered by research assistants (RAs), who were qualified speech and language therapists (SLTs), to 102 preschool children twice a week for 8 weeks: 10 EYS in each of two waves." (p. 4)
        2) "Children were assessed by RAs blind to treatment arm allocation for eligibility (T0), before the intervention (T1), immediately after the intervention (T2) and at follow-up (T3 approximately 9 weeks after T2)." (p. 4)
      • Detailed Analysis: The intervention lasted 8 weeks, and an additional follow-up assessment took place about 9 weeks after the post-intervention assessment. This implies that the final outcomes (T3) were measured about 17 weeks after the intervention began (8 + 9 weeks), which is roughly 4 months and meets the "at least one academic term" requirement. Although the immediate post-test (T2) occurs at 8 weeks, ERCT allows short interventions if there is term-long follow-up tracking.
      • Final sentence: Criterion T is met because outcomes were tracked to T3, about 17 weeks after the intervention started.
    • D

      Documented Control Group

      • The study removed the treatment-as-usual control arm, leaving no business-as-usual control group.
      • The most significant change was the removal of a TAU arm.
      • Relevant Quotes:
        1) "The most significant change was the removal of a TAU arm." (p. 4)
        2) "This study compares the efficacy of two interventions: Building Early Sentences Therapy (BEST) (McKean et al. 2013) and an adaptation of the Derbyshire Language Scheme (DLS) (Knowles and Masidlover 1982)." (p. 2)
      • Detailed Analysis: ERCT criterion D requires a documented control group that receives only standard schooling ("business as usual") with no special treatment. This trial removed the treatment-as-usual (TAU) arm and instead compares two active interventions (BEST vs A-DLS). Because no group received only usual provision, the study does not include the required documented business-as-usual control condition.
      • Final sentence: Criterion D is not met because the treatment-as-usual control arm was removed and both groups received active interventions.
  • Level 2 Criteria

    • S

      School-level RCT

      • Entire Early Years Settings were randomized, meeting the school-level RCT requirement.
      • Randomization of EYS to one of two intervention arms was conducted by a statistician not involved with the study.
      • Relevant Quotes:
        1) "A total of 20 EYS were allocated to receive either BEST or A-DLS in two waves to avoid contamination within an EYS and enable group delivery." (p. 4)
        2) "Randomization of EYS to one of two intervention arms was conducted by a statistician not involved with the study." (p. 4)
      • Detailed Analysis: The unit of randomization is the Early Years Setting (EYS), which is the educational site implementing the intervention. Entire settings were allocated to BEST or A-DLS, which satisfies the ERCT requirement that randomization occurs at the school/site level rather than within a school.
      • Final sentence: Criterion S is met because whole Early Years Settings were randomized to intervention arms.
    • I

      Independent Conduct

      • Intervention developers (paper authors) trained and supervised delivery, so the evaluation was not independent.
      • Prior to intervention delivery in the trial, RAs were trained in the interventions by C.M. and S.P.; this included video recording their delivery of interventions, reflecting on their fidelity and receiving feedback from S.P. and C.M. using that rating scale.
      • Relevant Quotes:
        1) "Funding for the development of Building Early Sentences Therapy (BEST) was provided by Newcastle University." (p. 1)
        2) "Prior to intervention delivery in the trial, RAs were trained in the interventions by C.M. and S.P.; this included video recording their delivery of interventions, reflecting on their fidelity and receiving feedback from S.P. and C.M. using that rating scale." (p. 6)
        3) "Masidlover, one of the original creators of DLS, created new DLS materials for each activity and provided detailed feedback and advice in the development of the manual and approach." (p. 6)
      • Detailed Analysis: ERCT criterion I requires the evaluation to be conducted independently of the intervention designers. Here, the paper reports that BEST development was funded by Newcastle University and that named authors (C.M. and S.P.) trained the delivering staff and provided fidelity feedback. The paper also states that an original DLS creator (Masidlover) contributed to materials development for A-DLS. This indicates that intervention developers were directly involved in implementation support, so the conduct is not independent, despite the use of an external statistician for randomization.
      • Final sentence: Criterion I is not met because intervention developers were directly involved in training, fidelity feedback, and materials development for the evaluated interventions.
    • Y

      Year Duration

      • Outcomes were tracked only to a 9-week post-test follow-up, far short of a full academic year.
      • Children were assessed by RAs blind to treatment arm allocation for eligibility (T0), before the intervention (T1), immediately after the intervention (T2) and at follow-up (T3 approximately 9 weeks after T2).
      • Relevant Quotes:
        1) "Treatment was delivered by research assistants (RAs), who were qualified speech and language therapists (SLTs), to 102 preschool children twice a week for 8 weeks: 10 EYS in each of two waves." (p. 4)
        2) "Children were assessed by RAs blind to treatment arm allocation for eligibility (T0), before the intervention (T1), immediately after the intervention (T2) and at follow-up (T3 approximately 9 weeks after T2)." (p. 4)
      • Detailed Analysis: The longest follow-up reported is approximately 9 weeks after the immediate post-intervention assessment, following an 8-week intervention. This corresponds to about 17 weeks of tracking from the start of intervention to the final measurement, which is far less than an academic year (roughly 9-10 months). Therefore the Year Duration requirement is not satisfied.
      • Final sentence: Criterion Y is not met because participant outcomes were tracked for only about 17 weeks, not a full academic year.
    • B

      Balanced Control Group

      • The two arms were explicitly matched on dosage/delivery and both included comparable homework resources, balancing time and inputs.
      • We created an adapted version of DLS (A-DLS) which could be delivered with high treatment fidelity and reliability in a research context and which matched BEST as closely as possible in terms of dosage and delivery whilst retaining DLS key principles and characteristics.
      • Relevant Quotes:
        1) "Comparing interventions delivered with the same dosage, delivery context, level of treatment fidelity and similar resources, tests whether it is the specific learning mechanisms/active ingredients exploited by the interventions which promote change or simply 'therapy general' effects (Frizelle and McKean 2022)." (p. 2)
        2) "We created an adapted version of DLS (A-DLS) which could be delivered with high treatment fidelity and reliability in a research context and which matched BEST as closely as possible in terms of dosage and delivery whilst retaining DLS key principles and characteristics." (p. 6)
        3) "Following each session parents are given a homework booklet containing pictures of the verbs targeted in the session with a range of agents and patients." (p. 6)
        4) "Homework packs for each activity were developed and provided together with guidance videos for parents" (p. 6)
      • Detailed Analysis: This is a head-to-head comparison of two active interventions rather than an intervention vs business-as-usual design. Both arms received a comparable overall dose (group sessions twice weekly for 8 weeks) and both included parent homework materials. The authors explicitly state that A-DLS was adapted to match BEST in dosage and delivery, supporting balance of time and resource inputs across arms. Under the updated ERCT criterion B decision logic, extra resources are present, but the comparator arm matches them, so the comparison is not confounded by extra time/budget.
      • Final sentence: Criterion B is met because the two study arms were explicitly matched for dosage, delivery, and homework resources.
  • Level 3 Criteria

    • R

      Reproduced

      • No peer-reviewed independent replication of the 2025 BEST vs A-DLS trial was found.
      • Relevant Quotes:
        1) "In a quasi-experimental pilot study, Trebacz et al. (2024) found that BEST produced greater standard scores (SSs) gains in expressive language than a treatment-as-usual control but not comprehension." (p. 2)
        2) "In an RCT, Broomfield and Dodd (2011) demonstrated that DLS was associated with improvements in comprehension but not production when compared to a wait-list control." (p. 2)
      • Detailed Analysis: ERCT criterion R requires an independent replication of this study (or its core intervention comparison) by a different research team in a different context, published in a peer-reviewed journal. The current paper cites a prior BEST pilot study (Trebacz et al., 2024) and a prior DLS RCT (Broomfield and Dodd, 2011). However, Trebacz is an author on the current paper, so this pilot is not independent of the current author team. The Broomfield and Dodd trial evaluates DLS in a different design and does not replicate this paper's head-to-head BEST vs A-DLS cluster RCT. An internet search did not identify any peer-reviewed, independent replications of this 2025 head-to-head trial as of the ERCT check date.
      • Final sentence: Criterion R is not met because no independent, peer-reviewed replication of this study was found.
    • A

      All-subject Exams

      • The study measured language and communication only, not standardized outcomes across all main subjects/domains.
      • Outcomes were NRDLS comprehension and production standard scores (SS), measures of language structures targeted in the interventions and communicative participation (FOCUS-34).
      • Relevant Quotes:
        1) "Outcomes were NRDLS comprehension and production standard scores (SS), measures of language structures targeted in the interventions and communicative participation (FOCUS-34)." (p. 1)
        2) "Vineland Adaptive Behaviour Scales-Vineland-3 were completed by the child's teacher at T0, to characterize children's non-verbal and broader developmental profiles (Sparrow et al. 2016)." (p. 6)
      • Detailed Analysis: Criterion A requires standardized exam-based assessment across all main subjects (or an explicitly justified exception). This preschool language trial evaluates outcomes in language and communicative participation only (NRDLS and related language measures). Vineland-3 is used at baseline to characterize participants, not as an outcome across subject areas. Therefore the study does not assess effects across the broader set of curriculum domains, so criterion A is not satisfied. (Criterion E is met, but A has the additional requirement of multi-subject coverage.)
      • Final sentence: Criterion A is not met because outcomes were limited to language and communication rather than all main subjects/domains.
    • G

      Graduation Tracking

      • The study followed children only for weeks to a few months post-intervention, with no tracking to graduation.
      • Longer term follow-up is needed to test how long such benefits might be present for a child
      • Relevant Quotes:
        1) "Children were assessed by RAs blind to treatment arm allocation for eligibility (T0), before the intervention (T1), immediately after the intervention (T2) and at follow-up (T3 approximately 9 weeks after T2)." (p. 4)
        2) "Longer term follow-up is needed to test how long such benefits might be present for a child" (p. 12)
        3) "Follow up assessments will now be conducted 2-3 months after the end of the intervention." (ISRCTN record, p. 2)
      • Detailed Analysis: ERCT criterion G requires tracking participants until graduation. This study reports only short-term follow-up (weeks to a few months) after the intervention, and explicitly states that longer-term follow-up is needed. In addition, ERCT specifies that if criterion Y is not met, then G is not met; here, Y is not met because tracking is far shorter than an academic year. A search for subsequent peer-reviewed follow-up papers by the same author team tracking this cohort to school-stage graduation did not identify any such publications as of the ERCT check date.
      • Final sentence: Criterion G is not met because the cohort was not tracked to any graduation milestone, only short-term follow-up.
    • P

      Pre-Registered

      • The ISRCTN registry record shows registration before first enrolment, indicating prospective pre-registration.
      • [X] Prospectively registered
      • Relevant Quotes:
        1) "This pre-registered cluster randomized controlled trial took place in three local authorities (LAs) in England between January 2020 and June 2022 (ISRCTN10974028) (McKean et al 2020)" (p. 3)
        2) "Registration date" (ISRCTN record, p. 1)
        3) "08/01/2020" (ISRCTN record, p. 1)
        4) "Date of first enrolment" (ISRCTN record, p. 14)
        5) "01/02/2020" (ISRCTN record, p. 14)
      • Detailed Analysis: The paper reports that the trial was pre-registered and provides the public registry identifier (ISRCTN10974028). The ISRCTN registry record shows a registration date of 08/01/2020 and a first enrolment date of 01/02/2020, meaning the record was registered before recruitment began. This satisfies ERCT criterion P's requirement that the protocol be registered before the study starts.
      • Final sentence: Criterion P is met because ISRCTN10974028 was registered before the first participant was enrolled.
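
The duration arithmetic behind criteria T and Y and the date comparison behind criterion P can be sketched in a few lines of Python. This is only an illustration, not ERCT's own tooling. The function names and the week thresholds (12 weeks for a term, 39 weeks for an academic year) are assumptions chosen for the example, as is reading the registry dates as day/month/year. Only the 8-week intervention, the roughly 9-week follow-up, and the two ISRCTN dates come from the breakdown above.

```python
from datetime import datetime

# Assumed thresholds, for illustration only (not taken from the ERCT standard).
TERM_WEEKS = 12           # hypothetical minimum length of one academic term
ACADEMIC_YEAR_WEEKS = 39  # roughly 9 months, the lower end of "9-10 months"


def tracking_weeks(intervention_weeks, follow_up_weeks):
    """Weeks from the start of the intervention to the last assessment."""
    return intervention_weeks + follow_up_weeks


def registered_before_enrolment(registration, first_enrolment, fmt="%d/%m/%Y"):
    """True if the registration date is strictly earlier than first enrolment.

    Dates are assumed to be day/month/year strings, as is usual for UK records.
    """
    reg = datetime.strptime(registration, fmt)
    enrol = datetime.strptime(first_enrolment, fmt)
    return reg < enrol


# Values reported in the breakdown: 8-week intervention, T3 about 9 weeks after T2.
weeks = tracking_weeks(8, 9)
meets_term = weeks >= TERM_WEEKS           # criterion T style check
meets_year = weeks >= ACADEMIC_YEAR_WEEKS  # criterion Y style check

# Dates quoted from the ISRCTN record.
prospective = registered_before_enrolment("08/01/2020", "01/02/2020")

print(weeks, meets_term, meets_year, prospective)
```

With these assumed thresholds the sketch reproduces the verdicts given above: about 17 weeks of tracking clears a term but not an academic year, and registration precedes first enrolment.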
