The Effects of Immersive Virtual Reality on Language Learning: Causal Effects on CEFR-Aligned Grammatical and Lexical Performance and Within-VR Comparative Effects on Communication and Intercultural Competence

Huijun Niu

Published: Jan 28, 2026

ERCT Check Date: Apr 16, 2026

DOI: 10.1177/07356331261419686

L2 languages
higher education
China
Asia
gamification
EdTech platform

C

Randomization occurred at the individual student level rather than by intact classes (and the intervention is not one-to-one tutoring), so the class-level RCT requirement is not satisfied.

Participants were randomized at the individual level to one of four arms using computer-generated permuted blocks (1:1:1:1).
E

The primary outcome is an author-developed, CEFR-aligned proficiency battery rather than a widely recognized standardized exam, so the exam-based assessment requirement is not satisfied.

This randomized controlled trial tested the effects of immersive Virtual Reality (VR) enhanced with artificial intelligence on English language development, operationalized as performance on an author-developed, CEFR-aligned language proficiency battery emphasizing grammatical and lexical performance...
T

The paper reports the dosage in sessions (15 × 90 minutes) but does not clearly document calendar start and outcome measurement dates showing at least one full academic term elapsed from start to measurement.

Each experimental group underwent 15 sessions of 90 minutes each, focusing on specific pedagogical interventions tailored to address different facets of language teaching and learning.
D

The control condition is clearly described (traditional instruction), with sample size and baseline/posttest descriptives reported in tables, satisfying the documented control group requirement.

Participants were assigned to NLP-enhanced VR, ML-enhanced VR, SA-enhanced VR, or a traditional instruction control condition.
S

Randomization was not conducted at the school (or site) level; participants were randomized individually, so the school-level RCT requirement is not satisfied.

Participants were randomized at the individual level to one of four arms using computer-generated permuted blocks (1:1:1:1).
I

The intervention platform was custom-developed for the study and there is no clear statement that an independent external evaluator conducted the trial, so independent conduct is not established.

The fully immersive VR environment (360° headset-based) was custom-developed in Unity 3D specifically for this study and deployed on Oculus Quest 2 headsets (256 GB model).
Y

The paper does not provide start and measurement dates demonstrating outcome measurement at least 75% of an academic year after the intervention began, and criterion T is also not met.

Each experimental group underwent 15 sessions of 90 minutes each, focusing on specific pedagogical interventions tailored to address different facets of language teaching and learning.
B

Instructional time appears matched across arms (15 × 90-minute sessions) and the added technology resources are integral to the intervention being tested versus business-as-usual, so the balanced control requirement is satisfied.

Table 1 shows the control condition had "15 × 90-min sessions" and the VR conditions also had "15 × 90-min sessions."
R

No independent replication by a different research team in a different context could be identified for this 2026 study.
A

Because the study does not meet criterion E (it uses an author-developed assessment rather than standardized exams), it cannot meet the all-subject standardized exams requirement.

Language Proficiency (Author-Developed, CEFR-Aligned; Pretest-Posttest). This author-developed test was based on the Common European Framework of Reference for Languages (CEFR)...
G

The study does not report tracking participants through to graduation, and because criterion Y is not met, graduation tracking is also not satisfied.
P

No explicit pre-registration statement or registry identifier is provided showing the protocol was registered before data collection began.

Abstract

This randomized controlled trial tested the effects of immersive Virtual Reality (VR) enhanced with artificial intelligence on English language development, operationalized as performance on an author-developed, CEFR-aligned language proficiency battery emphasizing grammatical and lexical performance, in undergraduate Chinese EFL learners (N = 477). Participants were assigned to NLP-enhanced VR, ML-enhanced VR, SA-enhanced VR, or a traditional instruction control condition. Posttest scores on an author-developed, CEFR-aligned proficiency measure were analyzed using mixed-effects ANCOVA to account for recurring laboratory sections. The NLP-enhanced VR condition yielded substantially greater grammatical and lexical gains than all other conditions (F(3,473) = 1139.45, p < .001, η² = .88), with post hoc tests confirming its superiority. Communication competence and intercultural competence were measured only within the three VR arms. No reliable between-arm differences were detected for communication competence (F(2,354) = 0.02, p = .982) or intercultural competence (F(2,354) = 1.06, p = .349), so no causal claims are made versus the control group for these outcomes. Findings indicate that context-sensitive, NLP-driven conversational support in immersive VR can causally enhance foundational linguistic subsystems (vocabulary, grammar, and sentence-level syntax) as measured by the CEFR-aligned assessment, while the durability and communicative transfer of these gains require verification through delayed and independent measures.
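
As a rough consistency check on the statistics quoted in the abstract, an effect size can be recovered from an F statistic and its degrees of freedom. The sketch below assumes the reported value behaves like a partial eta squared from a fixed-effects test, which is a simplification: the paper describes a mixed-effects ANCOVA, so the match is indicative only. The function names are hypothetical, and the group count check assumes the error degrees of freedom follow the simple N minus k pattern.

```python
def eta_squared_from_f(f_value, df_effect, df_error):
    """Partial eta squared implied by an F statistic (fixed-effects approximation)."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


def implied_sample_size(df_error, n_groups):
    """Sample size implied by error df, ASSUMING df_error = N - k (no covariate df)."""
    return df_error + n_groups


# Values quoted in the abstract.
proficiency = eta_squared_from_f(1139.45, 3, 473)    # four-arm comparison
communication = eta_squared_from_f(0.02, 2, 354)     # three VR arms only
intercultural = eta_squared_from_f(1.06, 2, 354)     # three VR arms only

print(f"proficiency eta squared:   {proficiency:.3f}")
print(f"communication eta squared: {communication:.4f}")
print(f"intercultural eta squared: {intercultural:.4f}")

# Under the N - k assumption, the df agree with N = 477 overall and with
# 477 minus the 120 control participants listed in Table 1 for the VR arms.
print(implied_sample_size(473, 4), implied_sample_size(354, 3))
```

Under these assumptions the proficiency value rounds to .88, in line with the abstract, while both within-VR effects are close to zero.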

ERCT Criteria Breakdown

Level 1 Criteria
- C
  Class-level RCT
  - Randomization occurred at the individual student level rather than by intact classes (and the intervention is not one-to-one tutoring), so the class-level RCT requirement is not satisfied.
  - Participants were randomized at the individual level to one of four arms using computer-generated permuted blocks (1:1:1:1).
  - Relevant Quotes: 1) "Participants were randomized at the individual level to one of four arms using computer-generated permuted blocks (1:1:1:1)." (p. 7) 2) "Instruction and data collection occurred in recurring sections/lab blocks (J = [insert]), which can induce within-section correlation. All confirmatory analyses therefore modeled section as a random intercept." (p. 7) 3) "Scheduling and clusters. Participants attended recurring sections/lab blocks (J = 16; mean cluster size m̄ = 29.8, SD = 2.1, range = 26–33)." (p. 7) Detailed Analysis: Criterion C requires random assignment at the class level (or stronger) to reduce contamination between treatment and control. The paper explicitly states individual-level assignment using permuted blocks, which indicates that students in the same recurring lab/section structure could be assigned to different arms. The ERCT exception allowing student-level randomization applies to personal tutoring/one-to-one teaching interventions. Here the intervention is delivered in recurring lab blocks with shared settings and instructors, and is not described as one-to-one tutoring. Final sentence: Criterion C is not met because the unit of randomization is individual students rather than intact classes (and no tutoring exception applies).
- E
  Exam-based Assessment
  - The primary outcome is an author-developed, CEFR-aligned proficiency battery rather than a widely recognized standardized exam, so the exam-based assessment requirement is not satisfied.
  - This randomized controlled trial tested the effects of immersive Virtual Reality (VR) enhanced with artificial intelligence on English language development, operationalized as performance on an author-developed, CEFR-aligned language proficiency battery emphasizing grammatical and lexical performance...
  - Relevant Quotes: 1) "This randomized controlled trial tested the effects of immersive Virtual Reality (VR) enhanced with artificial intelligence on English language development, operationalized as performance on an author-developed, CEFR-aligned language proficiency battery emphasizing grammatical and lexical performance..." (p. 1) 2) "Language Proficiency (Author-Developed, CEFR-Aligned; Pretest-Posttest). This author-developed test was based on the Common European Framework of Reference for Languages (CEFR)..." (p. 12) 3) "Items were adapted from CEFR-aligned standardized language assessments (e.g., Cambridge English Qualifications) to target VR-enhanced learning contexts." (p. 12) 4) "Because the author-developed CEFR-aligned battery is intentionally focused on grammatical accuracy and lexical range..." (p. 12) Detailed Analysis: Criterion E requires outcome measurement using standardized, widely recognized exams rather than researcher-built instruments. Although the measure is CEFR-aligned and draws on items adapted from standardized assessments, the paper repeatedly describes the primary outcome as "author-developed." A CEFR alignment and item adaptation strategy can improve construct alignment, but it does not turn an author-assembled battery into a widely recognized standardized exam administered under a standard external testing program. Final sentence: Criterion E is not met because the primary outcome is an author-developed, CEFR-aligned battery rather than a standardized external exam.
- T
  Term Duration
  - The paper reports the dosage in sessions (15 × 90 minutes) but does not clearly document calendar start and outcome measurement dates showing at least one full academic term elapsed from start to measurement.
  - Each experimental group underwent 15 sessions of 90 minutes each, focusing on specific pedagogical interventions tailored to address different facets of language teaching and learning.
  - Relevant Quotes: 1) "Each experimental group underwent 15 sessions of 90 minutes each, focusing on specific pedagogical interventions tailored to address different facets of language teaching and learning." (p. 6) 2) "The control group engaged in standard language learning activities, serving as a baseline for comparative analysis." (p. 6) 3) "The duration of the intervention (15 × 90 minute intervention sessions) allowed participants sufficient time to engage with the personalized exercises and assimilate the feedback." (p. 10) Detailed Analysis: Criterion T requires outcome measurement at least one full academic term after the intervention begins, which typically requires clear calendar dates (or an explicit term/semester framing) for the start of the intervention and the posttest (or other primary outcome measurement). The paper specifies instructional dosage (15 sessions of 90 minutes) but does not provide, in the quoted text, explicit calendar start and end dates (e.g., month-to-month) or an explicit statement that the 15 sessions span a full semester/term. Final sentence: Criterion T is not met because the paper does not clearly document a term-length calendar interval from intervention start to outcome measurement.
- D
  Documented Control Group
  - The control condition is clearly described (traditional instruction), with sample size and baseline/posttest descriptives reported in tables, satisfying the documented control group requirement.
  - Participants were assigned to NLP-enhanced VR, ML-enhanced VR, SA-enhanced VR, or a traditional instruction control condition.
  - Relevant Quotes: 1) "Participants were assigned to NLP-enhanced VR, ML-enhanced VR, SA-enhanced VR, or a traditional instruction control condition." (p. 1) 2) "The study was anchored in a randomized experimental framework, incorporating three experimental groups and a control group." (p. 6) 3) "The control group engaged in standard language learning activities, serving as a baseline for comparative analysis." (p. 6) 4) "Table 1. Experimental Group Characteristics" with "Control Traditional instruction Grammar/Vocabulary Classroom lectures 15 × 90-min sessions No technology 120 (58/62)" (p. 9) 5) "Table 2. Descriptive statistics by group" includes language proficiency "Pretest" and "Posttest" values for "Control" with n, M, SD, and 95% CI. (p. 13) Detailed Analysis: Criterion D requires that the control group be sufficiently documented so readers can understand what the control condition received and assess baseline comparability. The paper clearly identifies the control condition as traditional instruction / standard language learning activities. It reports the control group sample size and demographics (Table 1) and provides baseline and posttest descriptive statistics for the primary outcome (Table 2). Final sentence: Criterion D is met because the control condition, sample size, and baseline/posttest descriptives are explicitly documented.
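
The individual-level allocation quoted under criterion C ("computer-generated permuted blocks (1:1:1:1)") can be illustrated with a minimal sketch. The paper excerpt does not state the block size, seed, or software used, so the block size of four, the seed, the arm labels, and the function name below are all assumptions made for illustration and should not be read as the author's procedure.

```python
import random
from collections import Counter

# Arm labels are shorthand for the four conditions named in the abstract.
ARMS = ["NLP-VR", "ML-VR", "SA-VR", "Control"]


def permuted_block_allocation(n_participants, arms=ARMS, seed=0):
    """Assign participants to arms in shuffled blocks containing each arm once.

    ASSUMPTION: block size equals the number of arms (4); the paper excerpt
    does not report the actual block size.
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    allocation = []
    while len(allocation) < n_participants:
        block = list(arms)
        rng.shuffle(block)     # random order of arms within each block
        allocation.extend(block)
    return allocation[:n_participants]  # the last block may be incomplete


allocation = permuted_block_allocation(477)
counts = Counter(allocation)
print(counts)
```

With 477 participants and blocks of four, arm sizes can differ by at most one. Because each participant is assigned separately, students in the same lab section can end up in different arms, which is the reason the class-level requirement is not satisfied.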
Level 2 Criteria
- S
  School-level RCT
  - Randomization was not conducted at the school (or site) level; participants were randomized individually, so the school-level RCT requirement is not satisfied.
  - Participants were randomized at the individual level to one of four arms using computer-generated permuted blocks (1:1:1:1).
  - Relevant Quotes: 1) "Participants were randomized at the individual level to one of four arms using computer-generated permuted blocks (1:1:1:1)." (p. 7) 2) "Participants were enlisted from a pool of undergraduates majoring in Teaching English as a Foreign Language." (p. 7) Detailed Analysis: Criterion S requires randomization at the level of the implementing institution/site (e.g., schools, centers, campuses, or other comparable delivery sites). The paper describes a single higher-education participant pool with individual-level randomization, not random assignment across multiple sites. The mention of recurring lab blocks indicates clustering for analysis, but it does not indicate that lab blocks or sites were randomized as the unit of assignment. Final sentence: Criterion S is not met because randomization was conducted at the individual participant level rather than at the institutional site level.
- I
  Independent Conduct
  - The intervention platform was custom-developed for the study and there is no clear statement that an independent external evaluator conducted the trial, so independent conduct is not established.
  - The fully immersive VR environment (360° headset-based) was custom-developed in Unity 3D specifically for this study and deployed on Oculus Quest 2 headsets (256 GB model).
  - Relevant Quotes: 1) "The fully immersive VR environment (360° headset-based) was custom-developed in Unity 3D specifically for this study and deployed on Oculus Quest 2 headsets (256 GB model)." (p. 8) 2) "Trained observers diligently documented participants’ communicative behaviors during interactions within the immersive VR environment." (p. 15) 3) "Theme refinement through peer debriefing with two independent researchers" (p. 17) Detailed Analysis: Criterion I requires clear evidence that the evaluation was conducted independently of the intervention designers/providers. The paper describes an intervention environment custom-developed specifically for this study, which strongly suggests the research team (or close collaborators) were involved in intervention development. While the paper mentions trained observers and "two independent researchers" for qualitative peer debriefing, these statements do not establish that the overall evaluation (implementation, data collection, and/or analysis) was led by a third-party external evaluation team independent of the intervention development. Final sentence: Criterion I is not met because the paper does not clearly document independent external conduct of the evaluation separate from the intervention development.
- Y
  Year Duration
  - The paper does not provide start and measurement dates demonstrating outcome measurement at least 75% of an academic year after the intervention began, and criterion T is also not met.
  - Each experimental group underwent 15 sessions of 90 minutes each, focusing on specific pedagogical interventions tailored to address different facets of language teaching and learning.
  - Relevant Quotes: 1) "Each experimental group underwent 15 sessions of 90 minutes each, focusing on specific pedagogical interventions tailored to address different facets of language teaching and learning." (p. 6) 2) "The duration of the intervention (15 × 90 minute intervention sessions) allowed participants sufficient time to engage with the personalized exercises and assimilate the feedback." (p. 10) Detailed Analysis: Criterion Y requires outcome measurement at least 75% of an academic year after the intervention begins, which requires clear calendar start and outcome measurement dates (or an explicit academic-year span). The quoted text provides session dosage but does not provide a calendar interval. Additionally, per the ERCT dependency rule, if criterion T is not met then criterion Y is not met. Final sentence: Criterion Y is not met because year-scale timing is not documented in dates and because criterion T is not met.
- B
  Balanced Control Group
  - Instructional time appears matched across arms (15 × 90-minute sessions) and the added technology resources are integral to the intervention being tested versus business-as-usual, so the balanced control requirement is satisfied.
  - Table 1 shows the control condition had "15 × 90-min sessions" and the VR conditions also had "15 × 90-min sessions."
  - Relevant Quotes: 1) "Each experimental group underwent 15 sessions of 90 minutes each..." (p. 6) 2) "The control group engaged in standard language learning activities, serving as a baseline for comparative analysis." (p. 6) 3) "All experimental groups used identical core VR environments and scenario-based activities, differing solely in their specified technological augmentations to ensure fair comparison." (p. 6) 4) "Table 1. Experimental Group Characteristics" shows the control as "15 × 90-min sessions" and the VR groups as "15 × 90-min sessions" (p. 9) 5) "The fully immersive VR environment (360° headset-based) was custom-developed in Unity 3D specifically for this study and deployed on Oculus Quest 2 headsets (256 GB model)." (p. 8) Detailed Analysis: Criterion B evaluates whether differences in time/budget/material resources between intervention and control could confound the causal contrast, unless those additional resources are explicitly the treatment being tested. Here, the VR arms clearly involve substantial extra material and infrastructure resources (VR headsets, custom software, cloud and AI components). However, these resources are not incidental; they define the intervention itself (AI-enhanced immersive VR) in contrast to traditional instruction. Importantly, the paper indicates comparable instructional time exposure across arms via the common "15 × 90-min sessions" dosage reported for both the VR groups and the control group, which reduces the most common imbalance (extra time-on-task). Final sentence: Criterion B is met because session time is reported as matched across arms and the added technology inputs are integral to the treatment being tested against business-as-usual instruction.
Level 3 Criteria
- R
  Reproduced
  - No independent replication by a different research team in a different context could be identified for this 2026 study.
  - Relevant Quotes: 1) (No statement in the paper excerpt indicates that this study has been independently replicated by another research team.) Detailed Analysis: Criterion R requires an independent replication of this study (or a clearly identified reproduction of its central experimental claim using the same intervention approach) by a different research team in a different context, published in a peer-reviewed outlet. The provided paper excerpt does not report any prior replication. An internet search using the DOI and full title did not identify any clearly independent replication studies of this exact trial. Final sentence: Criterion R is not met because no independent replication of this study could be found in the paper or via an internet search.
- A
  All-subject Exams
  - Because the study does not meet criterion E (it uses an author-developed assessment rather than standardized exams), it cannot meet the all-subject standardized exams requirement.
  - Language Proficiency (Author-Developed, CEFR-Aligned; Pretest-Posttest). This author-developed test was based on the Common European Framework of Reference for Languages (CEFR)...
  - Relevant Quotes: 1) "Language Proficiency (Author-Developed, CEFR-Aligned; Pretest-Posttest). This author-developed test was based on the Common European Framework of Reference for Languages (CEFR)..." (p. 12) 2) "This randomized controlled trial tested the effects ... operationalized as performance on an author-developed, CEFR-aligned language proficiency battery emphasizing grammatical and lexical performance..." (p. 1) Detailed Analysis: Criterion A requires standardized exam-based assessment across all main subjects and explicitly depends on criterion E being met. Here, the primary outcome is an author-developed battery, so criterion E is not met, which automatically prevents meeting criterion A. Additionally, the study focuses on L2 language outcomes rather than assessing across all core subjects for the educational program. Final sentence: Criterion A is not met because criterion E is not met and the study does not assess all subjects using standardized exams.
- G
  Graduation Tracking
  - The study does not report tracking participants through to graduation, and because criterion Y is not met, graduation tracking is also not satisfied.
  - Relevant Quotes: 1) (No statement in the paper excerpt describes following participants until graduation from their program or educational stage.) Detailed Analysis: Criterion G requires follow-up tracking until graduation. The provided paper excerpt focuses on pretest-posttest outcomes and does not describe longer-term follow-up through participants’ graduation. Per the ERCT dependency rule, if criterion Y is not met then criterion G is not met. An internet search for follow-up publications by the same author reporting graduation outcomes for this cohort did not identify any such follow-up paper. Final sentence: Criterion G is not met because graduation tracking is not reported, no follow-up paper with graduation outcomes was found, and criterion Y is not met.
- P
  Pre-Registered
  - No explicit pre-registration statement or registry identifier is provided showing the protocol was registered before data collection began.
  - Relevant Quotes: 1) (No pre-registration link, registry name/ID, or registration date is stated in the paper excerpt.) Detailed Analysis: Criterion P requires a clearly identified, time-stamped pre-registration in a registry (e.g., OSF Registrations, ClinicalTrials.gov, ISRCTN), with registration occurring before data collection began. The provided paper excerpt contains detailed methods (including randomization and analysis plans) but does not include a pre-registration statement, registry name, registration ID, or a registration date. An internet search using the DOI and title did not reveal a clearly linked public preregistration record for this study. Final sentence: Criterion P is not met because no verifiable pre-registration record is cited in the paper and none was found via internet search.
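
The dependency rules invoked in the breakdown (Y requires T, G requires Y, A requires E) can be sketched as a small lookup. Only the three dependencies stated on this page are encoded; the dictionary and function names are hypothetical, and the verdicts are copied from the criteria above rather than derived independently.

```python
# Verdicts as recorded in the breakdown above (True = met).
DIRECT = {
    "C": False, "E": False, "T": False, "D": True,   # Level 1
    "S": False, "I": False, "Y": False, "B": True,   # Level 2
    "R": False, "A": False, "G": False, "P": False,  # Level 3
}

# Dependencies stated on this page: criterion -> prerequisite criterion.
REQUIRES = {"Y": "T", "G": "Y", "A": "E"}


def apply_dependencies(direct, requires):
    """Return final verdicts: a criterion holds only if its prerequisite also holds."""
    final = {}

    def resolve(criterion):
        if criterion not in final:
            prerequisite = requires.get(criterion)
            final[criterion] = direct[criterion] and (
                prerequisite is None or resolve(prerequisite)
            )
        return final[criterion]

    for criterion in direct:
        resolve(criterion)
    return final


final = apply_dependencies(DIRECT, REQUIRES)
print(sorted(c for c, met in final.items() if met))

# Hypothetical scenario: even if year-long timing and graduation tracking
# were documented, Y and G would still fail while T is not met.
hypothetical = apply_dependencies(dict(DIRECT, Y=True, G=True), REQUIRES)
print(hypothetical["Y"], hypothetical["G"])
```

For this study the dependencies do not change any verdict, since Y, G, and A are also judged not met on their own evidence; only D and B are satisfied.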
