Effectiveness of VR and traditional training in medical education for mass casualty management: an OSCE-based randomized controlled trial

Zhe Li, Wan Chen, Guozheng Qiu, Lei Shi, Yutao Tang, Xibin Xu, Sanshan Zhu, and Liwen Lyu

Published: Feb 7, 2026

ERCT Check Date: Mar 13, 2026

DOI: 10.1186/s12909-026-08759-x

science
higher education
China
EdTech platform

C

Participants were randomized as individuals, not as intact classes (and this is not a one-to-one tutoring exception).

"All eligible participants were randomly assigned to either the VR-based Training Group or the Lecture-based Training Group in an equal 1:1 ratio, with 23 participants in each group." (p. 3)
E

Outcomes were measured with a study-constructed multiple-choice test and an OSCE checklist developed for this study, not a widely recognized standardized exam.

"The test consisted of 20 questions designed to assess competency in areas such as triage protocols, injury identification, and emergency management according to standardized mass casualty guidelines." (p. 4)
T

Outcomes were assessed immediately after a single 2-hour training session rather than at least one academic term after the intervention began.

"Each educational session—whether VR-based or lecture-based—lasts for two hours, and all participants will be given the same amount of class time." (p. 2)
D

The lecture-based control condition is clearly described and the paper reports group sizes and baseline characteristics.

"The baseline characteristics of the participants in both groups were comparable (Table 1)." (p. 6)
S

The trial randomized individual trainees rather than randomizing schools (or other institution-level sites) to conditions.

"All eligible participants were randomly assigned to either the VR-based Training Group or the Lecture-based Training Group..." (p. 3)
I

The VR program was developed by the same hospital department that ran the study, and the paper does not document an external independent evaluation team (despite using blinded raters and an independent statistician for some tasks).

"The program—Road Traffic Injury VR Software 1.0—was developed by the Emergency Department of Guangxi Zhuang Autonomous Region People’s Hospital." (p. 3)
Y

Outcomes were assessed immediately after a 2-hour intervention, so year-long tracking is absent; additionally, since criterion T is not met, criterion Y is automatically not met.

"Furthermore, the training duration was limited to two hours, and study outcomes were assessed immediately after the intervention." (p. 11)
B

Both arms received equal instructional time (2 hours) and the same core curriculum content; the VR hardware/software is integral to the intended treatment contrast.

"Each educational session—whether VR-based or lecture-based—lasts for two hours, and all participants will be given the same amount of class time." (p. 2)
R

No independent replication by a non-overlapping author team was found for this specific study/intervention.

"However, comparative evidence between VR and conventional instructional approaches remains limited, especially in the field of mass casualty management." (p. 2)
A

Because criterion E is not met, criterion A is automatically not met; additionally, the study does not assess standardized exams across all core subjects.

"The primary endpoint involves assessing students’ performance in simulated mass casualty scenarios using an Objective Structured Clinical Examination (OSCE) and a written examination to evaluate theoretical knowledge." (p. 2)
G

The paper does not track participants to graduation, and because criterion Y is not met, criterion G is automatically not met.

"Future research should therefore include extended training exposure and longitudinal follow-up to evaluate skill durability over time." (p. 11)
P

The study reports a registration date of February 5, 2025, which is after the stated study period (January 2024 to October 2024), so the protocol was not pre-registered before the study began.

"The study was subsequently registered in the Chinese Clinical Trial Registry (ChiCTR2500096725) with a registration date of February 5, 2025." (p. 3)

Abstract

Objective: This study aimed to evaluate the effectiveness of VR-based training compared to traditional lecture-based training for medical trainees in managing MCIs, specifically focusing on road traffic accidents. The primary assessment was performed using an Objective Structured Clinical Examination (OSCE) and a theoretical knowledge test.

Methods: A randomized controlled trial was conducted with 46 medical trainees receiving emergency medicine training. Participants were randomly assigned to either a VR-based Training Group or a Lecture-based Training Group, with each group receiving a 2-hour training session on mass casualty management. The training effectiveness was evaluated through pre- and post-training knowledge tests, OSCE performance, and post-training feedback questionnaires. Statistical analyses were performed to compare the two groups.

Results: Baseline characteristics were well-matched between groups. The VR-based Training Group demonstrated significantly higher post-test scores (83.96 ± 13.11) compared to the Lecture-based Training Group (72.17 ± 20.89, p = 0.03). The learning gain was also significantly greater in the VR-based Training Group (40.26 ± 15.61) compared to the Lecture-based Training Group (28.26 ± 17.04, p = 0.02). OSCE results showed that the VR-based Training Group demonstrated better performance than the Lecture-based Training Group across all stations, with significant improvements in triage, injury assessment, and overall scene management. Additionally, feedback from the post-training questionnaire revealed that the VR-based Training Group reported greater confidence in performing critical tasks.

Conclusion: VR-based training may represent a more effective and engaging approach for teaching mass casualty management relative to traditional lecture-based methods. It may enhance both theoretical knowledge and practical skills, which could help better prepare medical trainees for high-pressure scenarios such as MCIs. As VR technology continues to evolve, its integration into medical education holds considerable potential for improving preparedness and clinical performance in emergency settings.
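
As a rough plausibility check on the group comparisons reported in the abstract, the sketch below recomputes a two-sample t statistic and a standardized effect size from the published summary figures. It rests on several assumptions that the abstract does not confirm: that the "±" values are standard deviations, that each group has 23 participants (as stated elsewhere in this evaluation), and that an independent-samples comparison is appropriate. The function names are hypothetical and this is not the authors' analysis.

```python
import math

def t_statistic(m1, s1, n1, m2, s2, n2):
    """Two-sample t statistic (unpooled standard error) from summary statistics."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)
    return (m1 - m2) / se

def cohens_d(m1, s1, n1, m2, s2, n2):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var)

N = 23  # per-group size reported in the paper (p. 3)

# Figures from the abstract; "±" is ASSUMED to denote a standard deviation.
post_test = (83.96, 13.11, N, 72.17, 20.89, N)      # VR group vs lecture group
learning_gain = (40.26, 15.61, N, 28.26, 17.04, N)  # VR group vs lecture group

t_post, d_post = t_statistic(*post_test), cohens_d(*post_test)
t_gain, d_gain = t_statistic(*learning_gain), cohens_d(*learning_gain)

print(f"post-test:     t = {t_post:.2f}, d = {d_post:.2f}")
print(f"learning gain: t = {t_gain:.2f}, d = {d_gain:.2f}")
```

Under these assumptions the t statistics come out at roughly 2.3 and 2.5, which is broadly consistent with the reported p-values of 0.03 and 0.02 for groups of this size, and the effect sizes are roughly 0.7. With only 23 participants per group, such estimates carry wide uncertainty.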

ERCT Criteria Breakdown

Level 1 Criteria
- C
  Class-level RCT
  - Participants were randomized as individuals, not as intact classes (and this is not a one-to-one tutoring exception).
  - "All eligible participants were randomly assigned to either the VR-based Training Group or the Lecture-based Training Group in an equal 1:1 ratio, with 23 participants in each group." (p. 3)
  - Relevant Quotes:
    1) "Participants are randomly assigned to either the VR-based Training Group or the Lecture-based Training Group, with the allocation ratio being 1:1." (p. 2)
    2) "All eligible participants were randomly assigned to either the VR-based Training Group or the Lecture-based Training Group in an equal 1:1 ratio, with 23 participants in each group." (p. 3)
  - Detailed Analysis: Criterion C requires randomization at the class level (or stronger, such as school/site-level), unless the intervention is clearly a personal tutoring / one-to-one teaching intervention. Here, the unit randomized is the individual "participants" within a single hospital training context, and the intervention is a modality comparison (VR-based training vs lecture-based training), not a one-to-one tutoring program. The paper does not describe intact classes, cohorts, course sections, or other group units being randomized as clusters.
  - Final Summary: Criterion C is not met because randomization was at the individual participant level rather than at the class (or school/site) level.
- E
  Exam-based Assessment
  - Outcomes were measured with a study-constructed multiple-choice test and an OSCE checklist developed for this study, not a widely recognized standardized exam.
  - "The test consisted of 20 questions designed to assess competency in areas such as triage protocols, injury identification, and emergency management according to standardized mass casualty guidelines." (p. 4)
  - Relevant Quotes:
    1) "Students in both groups were administered a written multiple-choice exam both before and after the training as a measure of their theoretical knowledge of mass casualty management." (p. 4)
    2) "The test consisted of 20 questions designed to assess competency in areas such as triage protocols, injury identification, and emergency management according to standardized mass casualty guidelines." (p. 4)
    3) "Each OSCE station was evaluated using a structured checklist developed by three senior emergency medicine educators." (p. 5)
    4) "The questionnaire was developed specifically for this study and is provided as a supplementary file." (p. 5)
  - Detailed Analysis: Criterion E requires standardized, widely recognized exam-based assessments (externally established and broadly comparable), not instruments authored/assembled specifically for the trial. The paper describes a written multiple-choice exam consisting of "20 questions designed" for this assessment, which indicates a study-constructed test even if the content aligns with "standardized mass casualty guidelines." The OSCE is scored using a "structured checklist developed by three senior emergency medicine educators," which is likewise a locally developed instrument for this trial rather than a named, external standardized exam. The post-training questionnaire is explicitly "developed specifically for this study," further reinforcing that outcome measurement relies on study-specific tools.
  - Final Summary: Criterion E is not met because the paper does not use a clearly identified, widely recognized standardized exam-based assessment.
- T
  Term Duration
  - Outcomes were assessed immediately after a single 2-hour training session rather than at least one academic term after the intervention began.
  - "Each educational session—whether VR-based or lecture-based—lasts for two hours, and all participants will be given the same amount of class time." (p. 2)
  - Relevant Quotes:
    1) "Each educational session—whether VR-based or lecture-based—lasts for two hours, and all participants will be given the same amount of class time." (p. 2)
    2) "To evaluate performance change while minimizing potential practice effects, the OSCE was administered twice—once before and once after the training intervention." (p. 5)
    3) "Furthermore, the training duration was limited to two hours, and study outcomes were assessed immediately after the intervention." (p. 11)
  - Detailed Analysis: Criterion T requires that outcomes be measured at least one full academic term (about 3–4 months) after the intervention begins. The intervention here is explicitly a single two-hour session, and the OSCE is administered "once before and once after the training intervention," indicating an immediate pre/post design rather than term-long follow-up. The discussion further confirms outcomes were "assessed immediately after the intervention."
  - Final Summary: Criterion T is not met because outcomes are measured immediately after a 2-hour session rather than after at least one academic term.
- D
  Documented Control Group
  - The lecture-based control condition is clearly described and the paper reports group sizes and baseline characteristics.
  - "The baseline characteristics of the participants in both groups were comparable (Table 1)." (p. 6)
  - Relevant Quotes:
    1) "The Lecture-based Training Group received instruction through standard classroom-based, didactic lectures..." (p. 4)
    2) "All eligible participants were randomly assigned to either the VR-based Training Group or the Lecture-based Training Group in an equal 1:1 ratio, with 23 participants in each group." (p. 3)
    3) "The baseline characteristics of the participants in both groups were comparable (Table 1)." (p. 6)
    4) "Table 1 Baseline characteristics of participants" (p. 6)
  - Detailed Analysis: Criterion D requires a well-documented control group, including what it received and baseline characteristics. The paper explicitly describes what the control group received (lecture-based training) and provides clear group sizes (23 per group). It also states baseline characteristics were comparable and provides Table 1, which documents baseline demographics and work experience for both groups.
  - Final Summary: Criterion D is met because the control condition and baseline characteristics are clearly documented.
Level 2 Criteria
- S
  School-level RCT
  - The trial randomized individual trainees rather than randomizing schools (or other institution-level sites) to conditions.
  - "All eligible participants were randomly assigned to either the VR-based Training Group or the Lecture-based Training Group..." (p. 3)
  - Relevant Quotes:
    1) "The study will be conducted at Guangxi Zhuang Autonomous Region People’s Hospital..." (p. 2)
    2) "All eligible participants were randomly assigned to either the VR-based Training Group or the Lecture-based Training Group in an equal 1:1 ratio, with 23 participants in each group." (p. 3)
  - Detailed Analysis: Criterion S requires randomization at the school/site level, i.e., whole schools or analogous sites (institutions/centers/programs) are assigned to conditions. This study is conducted at a single hospital and randomizes individual trainees within that site. The paper does not describe multiple sites being randomized as clusters.
  - Final Summary: Criterion S is not met because randomization was not conducted at a school/site (cluster) level.
- I
  Independent Conduct
  - The VR program was developed by the same hospital department that ran the study, and the paper does not document an external independent evaluation team (despite using blinded raters and an independent statistician for some tasks).
  - "The program—Road Traffic Injury VR Software 1.0—was developed by the Emergency Department of Guangxi Zhuang Autonomous Region People’s Hospital." (p. 3)
  - Relevant Quotes:
    1) "The program—Road Traffic Injury VR Software 1.0—was developed by the Emergency Department of Guangxi Zhuang Autonomous Region People’s Hospital." (p. 3)
    2) "Two independent raters scored all participants’ performances. Both raters were blinded to group allocation..." (p. 5)
    3) "All statistical analyses were conducted... under guidance from an independent statistician to ensure the appropriateness of the methods." (p. 5)
    4) "Z.L. (Zhe Li) designed the study, performed data analysis, and wrote the main manuscript text." (p. 12)
  - Detailed Analysis: Criterion I requires that the evaluation be conducted independently from the designers of the intervention, typically via a third-party evaluator or clearly external/independent conduct. The intervention software was developed by the same hospital department where the study was conducted, and an author is stated to have designed the study and performed the analysis. The study includes bias-reducing steps (blinded OSCE raters and guidance from an independent statistician), but these do not document that the overall conduct and evaluation were independent from the intervention developers.
  - Final Summary: Criterion I is not met because the evaluation is not clearly independent from the intervention developers/institution.
- Y
  Year Duration
  - Outcomes were assessed immediately after a 2-hour intervention, so year-long tracking is absent; additionally, since criterion T is not met, criterion Y is automatically not met.
  - "Furthermore, the training duration was limited to two hours, and study outcomes were assessed immediately after the intervention." (p. 11)
  - Relevant Quotes:
    1) "The study will be conducted... between January 2024 and October 2024." (p. 2)
    2) "Each educational session—whether VR-based or lecture-based—lasts for two hours..." (p. 2)
    3) "Furthermore, the training duration was limited to two hours, and study outcomes were assessed immediately after the intervention." (p. 11)
  - Detailed Analysis: Criterion Y requires outcomes to be measured at least 75% of one academic year after the intervention begins. This study’s intervention is a single two-hour session, and the paper explicitly states outcomes were assessed immediately after the intervention. This is far shorter than 75% of an academic year. The ERCT dependency rule also applies: if criterion T is not met, then criterion Y is not met. Since term-long follow-up is absent, year-long tracking is necessarily absent.
  - Final Summary: Criterion Y is not met because the study measures outcomes immediately after a 2-hour session, and T is not met.
- B
  Balanced Control Group
  - Both arms received equal instructional time (2 hours) and the same core curriculum content; the VR hardware/software is integral to the intended treatment contrast.
  - "Each educational session—whether VR-based or lecture-based—lasts for two hours, and all participants will be given the same amount of class time." (p. 2)
  - Relevant Quotes:
    1) "Each educational session—whether VR-based or lecture-based—lasts for two hours, and all participants will be given the same amount of class time." (p. 2)
    2) "Both groups are exposed to the same curriculum content focused on managing mass casualty incidents..." (p. 2)
    3) "Participants in both the VR and Lecture-based Training Groups received the same core curriculum content... Each training session lasted 2 h." (p. 3)
    4) "Theoretical concepts covered during the VR session were identical to those presented in the lecture-based group, ensuring content equivalence across both formats." (p. 4)
    5) "Both groups received equal time for exposure to the curriculum direction and were evaluated through the same methods of assessment post-training." (p. 4)
  - Detailed Analysis: Criterion B compares the nature, quantity, and quality of resources (time, materials, staff support) provided to intervention and control conditions, and asks whether the control condition offers a comparable substitute for the intervention’s inputs, unless extra resources are explicitly the treatment variable. This paper explicitly states both groups had the same amount of class time ("two hours") and received the same core curriculum content, with content equivalence explicitly asserted. The VR group necessarily uses VR equipment and software, but these resources are integral to the intervention being tested (VR-based training modality). The main confound that Criterion B targets in educational studies—additional instructional time—is explicitly controlled by equal time in both arms.
  - Final Summary: Criterion B is met because instructional time and core content are explicitly balanced, and the VR-specific resources are integral to the intended treatment contrast.
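
The dependency rules that this breakdown invokes (Y requires T, G requires Y, and A requires E) can be sketched as a small lookup. This is only an illustration: the dictionary of per-criterion findings mirrors the verdicts reported in this evaluation, while the names `REQUIRES`, `resolve`, and `direct_findings` are hypothetical and do not come from any ERCT tooling.

```python
# Dependency rules as stated in this breakdown: a criterion on the left
# cannot be met unless the criterion on the right is also met.
REQUIRES = {"Y": "T", "G": "Y", "A": "E"}

# Per-criterion findings as reported in this evaluation (True = met on its own merits).
direct_findings = {
    "C": False, "E": False, "T": False, "D": True,   # Level 1
    "S": False, "I": False, "Y": False, "B": True,   # Level 2
    "R": False, "A": False, "G": False, "P": False,  # Level 3
}

def resolve(findings):
    """Apply the dependency rules on top of the direct findings."""
    def met(criterion):
        prerequisite = REQUIRES.get(criterion)
        if prerequisite is not None and not met(prerequisite):
            return False  # fails by dependency regardless of the direct finding
        return findings[criterion]
    return {criterion: met(criterion) for criterion in findings}

final = resolve(direct_findings)
met_criteria = [c for c, ok in final.items() if ok]
print(met_criteria)

# Hypothetical case: even if G were judged met directly, it would still fail,
# because Y fails (and Y in turn depends on T).
hypothetical = resolve({**direct_findings, "G": True})
print(hypothetical["G"])
```

For this study the rules change nothing in practice, since Y, G, and A are each judged not met on their own merits as well; the hypothetical case at the end shows where the rule would matter.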
Level 3 Criteria
- R
  Reproduced
  - No independent replication by a non-overlapping author team was found for this specific study/intervention.
  - "However, comparative evidence between VR and conventional instructional approaches remains limited, especially in the field of mass casualty management." (p. 2)
  - Relevant Quotes:
    From this paper:
    1) "However, comparative evidence between VR and conventional instructional approaches remains limited, especially in the field of mass casualty management." (p. 2)
    2) "Few studies have rigorously evaluated VR-based training for large-scale accident scenarios..." (p. 2)
    From other sources found during an internet search:
    3) "Road Traffic Injury VR Software 1.0" (Frontiers in Virtual Reality, Li et al., 2025)
    4) "The software, named Road Traffic Injury VR Software 1.0, was specifically designed to simulate road traffic injury scenarios..." (PLOS ONE, Li et al., 2025)
  - Detailed Analysis: Criterion R requires independent replication (a different research team, different context) published in a peer-reviewed journal. An internet search located related publications that use the same named software ("Road Traffic Injury VR Software 1.0"), including a 2025 Frontiers paper and a 2025 PLOS ONE paper. However, these are not independent replications of the present BMC Medical Education RCT because they share overlapping authors (e.g., Zhe Li, Lei Shi, Wan Chen, Yutao Tang, Guozheng Qiu). No replication study by an author team with no overlap with this paper’s authors was identified in the searched sources as of 2026-03-13.
  - Final Summary: Criterion R is not met because no independent (non-overlapping author team) replication of this specific study/intervention was found.
- A
  All-subject Exams
  - Because criterion E is not met, criterion A is automatically not met; additionally, the study does not assess standardized exams across all core subjects.
  - "The primary endpoint involves assessing students’ performance in simulated mass casualty scenarios using an Objective Structured Clinical Examination (OSCE) and a written examination to evaluate theoretical knowledge." (p. 2)
  - Relevant Quotes:
    1) "The primary endpoint involves assessing students’ performance in simulated mass casualty scenarios using an Objective Structured Clinical Examination (OSCE) and a written examination to evaluate theoretical knowledge." (p. 2)
    2) "The test consisted of 20 questions designed to assess competency..." (p. 4)
  - Detailed Analysis: Criterion A requires standardized exam-based assessment across all main subjects, and ERCT rules state: if criterion E is not met, then criterion A is not met. This study assesses mass casualty management performance via OSCE stations and a study-constructed written test, rather than using standardized exams across a multi-subject curriculum.
  - Final Summary: Criterion A is not met because criterion E is not met and the study does not use standardized exams across all core subjects.
- G
  Graduation Tracking
  - The paper does not track participants to graduation, and because criterion Y is not met, criterion G is automatically not met.
  - "Future research should therefore include extended training exposure and longitudinal follow-up to evaluate skill durability over time." (p. 11)
  - Relevant Quotes:
    From this paper:
    1) "Furthermore, the training duration was limited to two hours, and study outcomes were assessed immediately after the intervention." (p. 11)
    2) "Future research should therefore include extended training exposure and longitudinal follow-up to evaluate skill durability over time." (p. 11)
  - Detailed Analysis: Criterion G requires tracking participants until graduation from the relevant educational stage. The paper explicitly states outcomes were assessed immediately and frames longitudinal follow-up as future work, implying no long-term tracking (let alone tracking to graduation) was performed. The ERCT dependency rule also applies: if criterion Y (Year Duration) is not met, then criterion G is not met. Since this study lacks year-long tracking, it necessarily lacks graduation tracking. An internet search for follow-up publications by the same author team did not identify any publication that tracks this cohort to graduation.
  - Final Summary: Criterion G is not met because there is no graduation tracking, and Y is not met (so G fails by dependency as well).
- P
  Pre-Registered
  - The study reports a registration date of February 5, 2025, which is after the stated study period (January 2024 to October 2024), so the protocol was not pre-registered before the study began.
  - "The study was subsequently registered in the Chinese Clinical Trial Registry (ChiCTR2500096725) with a registration date of February 5, 2025." (p. 3)
  - Relevant Quotes:
    From this paper:
    1) "The study will be conducted... between January 2024 and October 2024." (p. 2)
    2) "The study was subsequently registered in the Chinese Clinical Trial Registry (ChiCTR2500096725) with a registration date of February 5, 2025." (p. 3)
  - Detailed Analysis: Criterion P requires a publicly registered protocol before data collection begins. The paper states the study timeframe is "between January 2024 and October 2024" and separately states the trial was registered with a "registration date of February 5, 2025." Taken at face value, the registration date is after the study period described in the paper, which cannot qualify as pre-registration. An attempt to locate the public registry entry via open web search using the identifier "ChiCTR2500096725" did not return a publicly accessible registry record page; therefore, the assessment relies on the paper’s own stated registration date and stated study timeframe.
  - Final Summary: Criterion P is not met because the reported registry date is after the reported study period, so the protocol was not pre-registered.
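
The date comparison behind the criterion P verdict can be written out as a short check. The paper reports the study period only to the month, so the specific days used for the start and end of the period below are assumptions made for illustration, and `is_preregistered` is a hypothetical helper rather than part of any ERCT tooling.

```python
from datetime import date

# Study period as reported in the paper (p. 2). Only months are given,
# so the day values here are ASSUMED (first and last day of the month).
study_start = date(2024, 1, 1)
study_end = date(2024, 10, 31)

# Registration date as reported in the paper (p. 3), ChiCTR2500096725.
registered = date(2025, 2, 5)

def is_preregistered(registration: date, data_collection_start: date) -> bool:
    """Hypothetical helper: registration must precede the start of data collection."""
    return registration < data_collection_start

days_after_end = (registered - study_end).days

print(is_preregistered(registered, study_start))
print(days_after_end)
```

Whatever days are assumed within the reported months, the registration date falls after the end of the study period, so the check fails for any reading of the stated timeframe.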
