Evaluating a Problem-Based Learning Model Integrated with 3D Anatomy Software and Software-Assisted Annotation in Undergraduate Spinal Surgery Education: a Randomized Controlled Trial

Wenbo Li, Ziyao Ding, Shuo Feng, Wenkang Xu, Haixu Qi, Qirui Zhu, Bingxu Xiao, Shaoyu Zhu, Maji Sun, Feng Yuan

Published: Jan 26, 2026

ERCT Check Date: Feb 22, 2026

DOI: 10.1186/s12909-026-08652-7

science
higher education
China
EdTech app

C

Random allocation is described at the individual participant level rather than by intact classes (or schools), so class-level randomization is not demonstrated.

"Participants were randomly allocated via random number table assignment into experimental (n=60) and control (n=60) groups."
E

The primary outcome assessments are internally assembled (course-derived written exam plus a study-developed questionnaire) rather than a widely recognized standardized exam.

"This questionnaire was developed specifically for this study to measure perceived mastery (Domain A) and learning experience/satisfaction (Domain B)."
T

Although an August 2024 start is stated, the paper does not provide a clear end date or duration showing outcomes were measured at least one academic term after the intervention began.

"At the conclusion of the instructional period, a structured nine-item questionnaire was administered across two domains (Tables 2 and 3)..."
D

The control condition and demographics are described, but baseline academic performance (pre-intervention achievement) is not reported for the control group.

"The control group included 30 males and 30 females aged 21–24 years (mean 22.5±0.8 years)."
S

The study occurs within one institution and randomizes individuals, not whole schools/sites, so school-level randomization is not shown.

"A cohort of 120 spine surgery interns who commenced training at our institution in August 2024 was enrolled."
I

The authors who designed the study also carried out data collection and analysis, and no independent external evaluation team is documented.

"Feng Yuan and Maji Sun conceived and designed the study."
Y

The paper does not report an outcome measurement time point at least 75% of an academic year after intervention start, and criterion T is also not met.

"A cohort of 120 spine surgery interns who commenced training at our institution in August 2024 was enrolled."
B

The intervention explicitly tests a combined 3D+PBL+annotation teaching package where the additional resources are integral to the treatment definition, making a business-as-usual lecture control appropriate.

"Participants were divided into an experimental group (n=60, receiving 3D+PBL+annotation-assisted teaching) and a control group (n=60, receiving traditional LBL)."
R

No independent replication by a different research team in a different context could be identified.
A

Because standardized exam-based assessment is not documented (criterion E not met), the study cannot meet the all-subject standardized exam requirement.
G

The study focuses on short-term outcomes and provides no graduation tracking, and criterion G also fails automatically because criterion Y is not met.

"This study acknowledges limitations including ... (3) exclusive focus on short-term outcomes without assessing long-term knowledge/skill retention."
P

The paper reports ChiCTR registration with an ID and registration date that precedes the stated August 2024 study start.

"Chinese Clinical Trial Registry (ChiCTR), ChiCTR2400082568. Registered on 01 April 2024."

Abstract

Background: Traditional lecture-based learning (LBL) faces limitations in teaching complex spinal anatomy and surgical procedures. This study aimed to evaluate the efficacy of a novel Problem-Based Learning (PBL) model integrated with three-dimensional (3D) anatomy software and software-assisted annotation in spinal surgery education.

Methods: A randomized controlled trial included 120 fifth-year clinical medicine undergraduates, starting in August 2024. Participants were divided into an experimental group (n = 60, receiving 3D + PBL + annotation-assisted teaching) and a control group (n = 60, receiving traditional LBL). Outcomes were assessed via written tests (objective/subjective questions) and standardized questionnaires evaluating knowledge mastery, learning motivation, academic atmosphere, teacher-student interaction, and knowledge retention.

Results: The experimental group scored significantly higher on subjective questions (case analysis) than the control group (39.33 ± 5.38 vs. 32.08 ± 4.79, P < 0.001). Questionnaire results indicated that the experimental group reported significantly higher self-rated mastery in spinal endoscopic procedures, anatomy, Michigan State University (MSU) classification of lumbar disc herniation (LDH), and differential diagnosis (all P < 0.05). In addition, students in the experimental group expressed greater satisfaction with learning motivation, academic atmosphere, teacher–student interaction, and knowledge retention (all P < 0.05).

Conclusions: Integrating 3D anatomy visualization, software-assisted annotation, and PBL significantly enhances clinical reasoning, spatial understanding, and student engagement in spinal surgery education. This multimodal approach addresses the limitations of traditional methods and is recommended for broader application in orthopedic training.

Trial registration: Chinese Clinical Trial Registry (ChiCTR), ChiCTR2400082568. Registered on 01 April 2024.

ERCT Criteria Breakdown

Level 1 Criteria
- C
  Class-level RCT
  - Random allocation is described at the individual participant level rather than by intact classes (or schools), so class-level randomization is not demonstrated.
  - "Participants were randomly allocated via random number table assignment into experimental (n=60) and control (n=60) groups."
  - Relevant Quotes:
    1) "Participants were randomly allocated via random number table assignment into experimental (n=60) and control (n=60) groups." (p. 2)
    2) "A cohort of 120 spine surgery interns who commenced training at our institution in August 2024 was enrolled." (p. 2)
  - Detailed Analysis: Criterion C requires randomization at the class level (or a stronger level, such as school or site) to reduce contamination, unless a tutoring or one-to-one teaching exception applies. The paper states that "Participants were randomly allocated" into two groups using a random number table, but it does not state that intact classes, rotations, tutorial groups, or other pre-existing class units were randomized. The setting is a single cohort at one institution, which is consistent with individual-level assignment. The tutoring exception does not apply, because the intervention is a group teaching model.
  - Conclusion: Criterion C is not met because the reported unit of randomization is the individual participant, not the class (or school).
- E
  Exam-based Assessment
  - The primary outcome assessments are internally assembled (course-derived written exam plus a study-developed questionnaire) rather than a widely recognized standardized exam.
  - "This questionnaire was developed specifically for this study to measure perceived mastery (Domain A) and learning experience/satisfaction (Domain B)."
  - Relevant Quotes:
    1) "The summative assessment employed a closed-book format comprising two equally weighted sections, each accounting for 50% of the total 100-point score." (p. 4)
    2) "Examination content was rigorously derived from didactic materials, with objective items (primarily multiple-choice questions) curated from historical medical practitioner databases to evaluate factual knowledge acquisition." (p. 4)
    3) "The structured nine-item questionnaire was used to assess students’ mastery of spinal surgery concepts and satisfaction with the integrated 3D-PBL-annotation pedagogy. This questionnaire was developed specifically for this study to measure perceived mastery (Domain A) and learning experience/satisfaction (Domain B)." (p. 5)
  - Detailed Analysis: Criterion E requires a standardized, widely recognized exam-based assessment, i.e., one not assembled specifically for the study. The paper describes a closed-book exam whose content is "derived from didactic materials," combining objective items curated from a question database with faculty-authored subjective case analyses. This is an internally constructed assessment aligned to the course, not a widely recognized external exam. The paper also states explicitly that the questionnaire "was developed specifically for this study," so it is a non-standardized instrument.
  - Conclusion: Criterion E is not met because the exam is internally assembled and the questionnaire is study-developed rather than standardized.
- T
  Term Duration
  - Although an August 2024 start is stated, the paper does not provide a clear end date or duration showing outcomes were measured at least one academic term after the intervention began.
  - "At the conclusion of the instructional period, a structured nine-item questionnaire was administered across two domains (Tables 2 and 3)..."
  - Relevant Quotes:
    1) "A cohort of 120 spine surgery interns who commenced training at our institution in August 2024 was enrolled." (p. 2)
    2) "At the conclusion of the instructional period, a structured nine-item questionnaire was administered across two domains (Tables 2 and 3)..." (p. 4)
    3) "Pre-session preparation involved distributing representative clinical cases (lumbar spinal stenosis, disc herniation, and vertebral compression fractures) and spine endoscopy videos one week prior..." (p. 4)
  - Detailed Analysis: Criterion T requires that outcomes be measured at least one full academic term after the intervention begins, with the timing from start to outcome measurement clearly documented. The paper gives a start point ("commenced training... in August 2024") and states that outcomes were collected "at the conclusion of the instructional period," but it does not give the date of that conclusion, the length of the instructional period, or whether the period spans a semester or term (approximately 3–4 months). The "one week prior" statement refers to distributing materials before an individual session, not to the overall interval from intervention start to outcome measurement.
  - Conclusion: Criterion T is not met because the paper does not document an interval of at least one academic term between intervention start and outcome measurement.
- D
  Documented Control Group
  - The control condition and demographics are described, but baseline academic performance (pre-intervention achievement) is not reported for the control group.
  - "The control group included 30 males and 30 females aged 21–24 years (mean 22.5±0.8 years)."
  - Relevant Quotes:
    1) "The control group included 30 males and 30 females aged 21–24 years (mean 22.5±0.8 years)." (p. 2)
    2) "All participants were clinical interns with no statistically significant intergroup differences in age or sex distribution (P>0.05)." (p. 2)
    3) "The control cohort received conventional educator-centered pedagogy. Prior to each session, participants were required to review designated educational materials." (p. 4)
  - Detailed Analysis: Criterion D requires a well-documented control group: at minimum, who the control participants are, what they received, and baseline performance or achievement data that support comparability. The paper reports the control group’s size and demographics (sex, age) and describes the control condition as conventional lecture-based teaching, including the requirement to review materials before each session. However, it reports no baseline academic performance (e.g., a pre-test, prior course grades, or another achievement measure) for the control group or for either group; the only baseline comparison covers age and sex.
  - Conclusion: Criterion D is not met because the baseline academic performance of the control group is not documented.
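Criterion C turns on the unit of randomization. The sketch below contrasts the design the paper reports with what class-level randomization would look like. It is a minimal illustration under stated assumptions: Python's `random` module stands in for the paper's random number table, the participant IDs are hypothetical, and the four classes of 30 passed to the hypothetical `class_level` function are invented for illustration, since the paper reports no class structure.

```python
import random


def individual_level(participant_ids, seed=0):
    """Shuffle individuals and split them in half (the design the paper reports)."""
    rng = random.Random(seed)  # PRNG used here in place of a printed random number table
    ids = list(participant_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    return {"experimental": ids[:half], "control": ids[half:]}


def class_level(classes, seed=0):
    """Shuffle intact classes and assign whole classes to each arm (what criterion C asks for).

    `classes` maps a hypothetical class label to the list of its members.
    """
    rng = random.Random(seed)
    labels = sorted(classes)
    rng.shuffle(labels)
    half = len(labels) // 2
    return {
        "experimental": [pid for label in labels[:half] for pid in classes[label]],
        "control": [pid for label in labels[half:] for pid in classes[label]],
    }


if __name__ == "__main__":
    ids = list(range(1, 121))  # 120 participants, as in the trial
    arms = individual_level(ids)
    print(len(arms["experimental"]), len(arms["control"]))  # 60 60

    # Hypothetical: four intact classes of 30 students each.
    classes = {f"class_{k}": ids[k * 30:(k + 1) * 30] for k in range(4)}
    arms = class_level(classes)
    print(len(arms["experimental"]), len(arms["control"]))  # 60 60
```

Under individual-level allocation, students who share a teaching environment can land in both arms, which is the contamination risk criterion C is meant to reduce; under class-level allocation, every member of a class shares one arm.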
Level 2 Criteria
- S
  School-level RCT
  - The study occurs within one institution and randomizes individuals, not whole schools/sites, so school-level randomization is not shown.
  - "A cohort of 120 spine surgery interns who commenced training at our institution in August 2024 was enrolled."
  - Relevant Quotes:
    1) "A cohort of 120 spine surgery interns who commenced training at our institution in August 2024 was enrolled." (p. 2)
    2) "Participants were randomly allocated via random number table assignment into experimental (n=60) and control (n=60) groups." (p. 2)
  - Detailed Analysis: Criterion S requires randomization at the school or site level, i.e., whole educational units implementing the intervention are randomized. The paper describes a single-institution cohort in which individual participants were randomized into experimental and control groups. Nothing indicates that multiple schools or sites took part, or that entire educational units were randomized.
  - Conclusion: Criterion S is not met because the unit of randomization is the individual participant, not the school (or an equivalent site).
- I
  Independent Conduct
  - The authors who designed the study also carried out data collection and analysis, and no independent external evaluation team is documented.
  - "Feng Yuan and Maji Sun conceived and designed the study."
  - Relevant Quotes:
    1) "Feng Yuan and Maji Sun conceived and designed the study." (p. 7)
    2) "Wenbo Li, Ziyao Ding, Wenkang Xu, and Haixu Qi were responsible for material preparation, data collection, and analysis." (p. 7)
    3) "Subjective components consisted of clinical case analyses authored by orthopedics faculty and independently graded by non-participating instructors to ensure impartiality." (p. 4)
  - Detailed Analysis: Criterion I requires that the evaluation be conducted independently of the intervention designers, to reduce bias in implementation, measurement, and analysis. The author contributions state that two authors conceived and designed the study and that other members of the author team handled "material preparation, data collection, and analysis," so the same team designed and ran the evaluation. The paper does include one safeguard: subjective case analyses were graded by "non-participating instructors." This reduces grading bias for that component, but it does not establish that the trial as a whole was conducted by an independent external evaluation team.
  - Conclusion: Criterion I is not met because the paper does not document independent external conduct of the evaluation.
- Y
  Year Duration
  - The paper does not report an outcome measurement time point at least 75% of an academic year after intervention start, and criterion T is also not met.
  - "A cohort of 120 spine surgery interns who commenced training at our institution in August 2024 was enrolled."
  - Relevant Quotes:
    1) "A cohort of 120 spine surgery interns who commenced training at our institution in August 2024 was enrolled." (p. 2)
    2) "At the conclusion of the instructional period, a structured nine-item questionnaire was administered..." (p. 4)
  - Detailed Analysis: Criterion Y requires that outcomes be measured at least 75% of an academic year after the intervention begins, supported by clearly stated start and measurement dates or a clearly stated duration. The paper gives a start point (August 2024) but neither a dated end point nor a duration for the instructional period, so the start-to-measurement interval cannot be verified as year-scale. In addition, ERCT rules specify that criterion Y is not met whenever criterion T is not met, and T is not met here because the timing is unclear.
  - Conclusion: Criterion Y is not met because the paper does not document a year-scale measurement window, and criterion T is not met.
- B
  Balanced Control Group
  - The intervention explicitly tests a combined 3D+PBL+annotation teaching package where the additional resources are integral to the treatment definition, making a business-as-usual lecture control appropriate.
  - "Participants were divided into an experimental group (n=60, receiving 3D+PBL+annotation-assisted teaching) and a control group (n=60, receiving traditional LBL)."
  - Relevant Quotes:
    1) "Participants were divided into an experimental group (n=60, receiving 3D+PBL+annotation-assisted teaching) and a control group (n=60, receiving traditional LBL)." (p. 1)
    2) "Pre-session preparation involved distributing representative clinical cases ... and spine endoscopy videos one week prior, with instructions to download the Complete Anatomy software to review 3D anatomical models;" (p. 4)
    3) "The spine endoscopic anatomical structures were annotated using QuPath software (Open-source bioimage analysis software), with a total of 200 annotated images prepared for student instruction." (p. 3)
    4) "The control cohort received conventional educator-centered pedagogy. Prior to each session, participants were required to review designated educational materials." (p. 4)
  - Detailed Analysis: Criterion B evaluates whether time and resources are balanced between the intervention and control conditions, unless the additional resources are explicitly the treatment being tested. This paper defines the experimental condition as "3D+PBL+annotation-assisted teaching" and describes additional inputs that are core to that package: 3D anatomy software and QuPath-based annotated images. These are not incidental add-ons; they are part of the intervention construct that the study evaluates against traditional lecture-based learning. The control group received conventional instruction and was required to review designated educational materials. Although the paper does not quantify total time-on-task in either condition, the key resource differences (software-supported 3D visualization and annotation-guided learning) are the intended treatment contrast rather than an uncontrolled imbalance. Under the ERCT criterion-B decision rule, this is therefore a case in which the additional resources are integral to the treatment being tested against business-as-usual instruction.
  - Conclusion: Criterion B is met because the extra resources are integral to the intervention being tested rather than a confounding, non-integral imbalance.
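The timing reasoning behind criteria T and Y, including the rule that Y cannot be met when T is not, can be expressed as a small check. This is a sketch under assumptions: the thresholds (90 days for one term, 270 days for an academic year, so 75% is about 203 days) are illustrative values chosen here, not official ERCT definitions; the start day of 1 August 2024 is assumed because the paper gives only the month; and a missing end date is treated as undocumented, i.e. not met. The function names are hypothetical.

```python
from datetime import date
from typing import Optional

# Illustrative thresholds (assumptions, not official ERCT values).
TERM_DAYS = 90             # roughly one academic term (~3 months)
ACADEMIC_YEAR_DAYS = 270   # roughly a 9-month academic year


def interval_days(start: date, end: Optional[date]) -> Optional[int]:
    """Days from intervention start to outcome measurement, or None if undocumented."""
    if end is None:
        return None
    return (end - start).days


def meets_t(start: date, end: Optional[date]) -> bool:
    """Term duration: a documented interval of at least one term."""
    days = interval_days(start, end)
    return days is not None and days >= TERM_DAYS


def meets_y(start: date, end: Optional[date]) -> bool:
    """Year duration: requires T, then at least 75% of an academic year."""
    if not meets_t(start, end):  # ERCT rule: if T is not met, Y is not met
        return False
    days = interval_days(start, end)
    assert days is not None  # guaranteed because meets_t passed
    return days >= 0.75 * ACADEMIC_YEAR_DAYS


if __name__ == "__main__":
    start = date(2024, 8, 1)  # paper states only "August 2024"; day assumed
    end = None                # no end date or duration is reported
    print(meets_t(start, end), meets_y(start, end))  # False False

    # Hypothetical: had outcomes been measured on 15 December 2024 (136 days).
    print(meets_t(start, date(2024, 12, 15)), meets_y(start, date(2024, 12, 15)))  # True False
```

The key design choice is that an undocumented end date fails the check outright rather than being guessed, which mirrors how criteria T and Y are assessed for this paper.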
Level 3 Criteria
- R
  Reproduced
  - No independent replication by a different research team in a different context could be identified.
  - Relevant Quotes:
    1) "This study acknowledges limitations including (1) constrained sample size and demographic homogeneity (n = 120 from a single institution)..." (p. 7)
  - Detailed Analysis: Criterion R requires evidence that the same intervention or claim has been independently replicated by a different research team, in a different context, in a peer-reviewed publication. This article reports a single-institution RCT and does not present itself as a replication. An internet search for trials replicating the same combined intervention (3D anatomy software + PBL + software-assisted annotation) in undergraduate spinal surgery education found no peer-reviewed replication of this intervention and study design by an independent author team.
  - Conclusion: Criterion R is not met because no independent replication of this study was identified.
- A
  All-subject Exams
  - Because standardized exam-based assessment is not documented (criterion E not met), the study cannot meet the all-subject standardized exam requirement.
  - Relevant Quotes:
    1) "Examination content was rigorously derived from didactic materials..." (p. 4)
    2) "This questionnaire was developed specifically for this study..." (p. 5)
  - Detailed Analysis: Criterion A requires standardized exam-based outcomes across all main subjects, and ERCT rules specify that criterion A is not met whenever criterion E is not met. The paper does not document a widely recognized standardized exam-based assessment (and explicitly reports a study-developed questionnaire), so criterion E is not met, which automatically rules out criterion A.
  - Conclusion: Criterion A is not met because criterion E is not met.
- G
  Graduation Tracking
  - The study focuses on short-term outcomes and provides no graduation tracking, and criterion G also fails automatically because criterion Y is not met.
  - "This study acknowledges limitations including ... (3) exclusive focus on short-term outcomes without assessing long-term knowledge/skill retention."
  - Relevant Quotes:
    1) "This study acknowledges limitations including ... (3) exclusive focus on short-term outcomes without assessing long-term knowledge/skill retention." (p. 7)
  - Detailed Analysis: Criterion G requires tracking participants until graduation from the relevant educational stage. The paper explicitly acknowledges an "exclusive focus on short-term outcomes," which is inconsistent with a design that follows learners to a graduation milestone, and it describes no graduation outcome data or tracking procedures. In addition, ERCT rules specify that criterion G is not met whenever criterion Y is not met, and Y is not met here.
  - Conclusion: Criterion G is not met because graduation tracking is not reported and criterion Y is not met.
- P
  Pre-Registered
  - The paper reports ChiCTR registration with an ID and registration date that precedes the stated August 2024 study start.
  - "Chinese Clinical Trial Registry (ChiCTR), ChiCTR2400082568. Registered on 01 April 2024."
  - Relevant Quotes:
    1) "Chinese Clinical Trial Registry (ChiCTR), ChiCTR2400082568. Registered on 01 April 2024." (p. 2)
    2) "A randomized controlled trial included 120 fifth-year clinical medicine undergraduates, starting in August 2024." (p. 1)
  - Detailed Analysis: Criterion P requires that the protocol be registered before data collection begins. The paper gives the registry (ChiCTR), the registration identifier (ChiCTR2400082568), and the registration date (01 April 2024), and it states that the RCT started in August 2024, so registration preceded the reported start by roughly four months. A direct lookup of the ChiCTR record was attempted, but the registry's public search interface did not allow the specific record to be retrieved in this review environment, so this assessment relies on the paper's explicit registration statement.
  - Conclusion: Criterion P is met because the paper reports prospective trial registration with an ID and a registration date preceding the study start.
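Two pieces of reasoning in the criteria above are mechanical enough to sketch: the prerequisite rules stated in the breakdown (A requires E, Y requires T, G requires Y) and the pre-registration timing check for P. The sketch encodes the verdicts reported on this page; the dictionary layout and the function names `blocked_by_prerequisite` and `preregistered` are hypothetical. Because the paper gives only "August 2024" as the start, the start day is assumed to be 1 August; any day in August gives the same result.

```python
from datetime import date

# Verdicts reported on this page for the criteria that act as prerequisites.
verdicts = {"E": False, "T": False, "Y": False}

# ERCT prerequisite rules stated in the breakdown: A needs E, Y needs T, G needs Y.
PREREQUISITES = {"A": "E", "Y": "T", "G": "Y"}


def blocked_by_prerequisite(criterion: str, verdicts: dict) -> bool:
    """True if the criterion automatically fails because its prerequisite is not met."""
    prereq = PREREQUISITES.get(criterion)
    return prereq is not None and not verdicts.get(prereq, False)


def preregistered(registered_on: date, study_start: date) -> bool:
    """P timing check: registration must come before the study starts."""
    return registered_on < study_start


if __name__ == "__main__":
    for c in ("A", "G"):
        print(c, "blocked:", blocked_by_prerequisite(c, verdicts))  # both True
    # ChiCTR2400082568: registered 01 April 2024; start assumed 1 August 2024.
    print(preregistered(date(2024, 4, 1), date(2024, 8, 1)))  # True
```

Criteria without a listed prerequisite (such as C or P) are never blocked by this rule and must be judged on their own evidence.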
