Artificial intelligence-based image recognition in bronchoscopy: software development and randomized controlled trial for training evaluation in intensive care residents

Beatrice Brunoni, Francesco Zadek, Federica Pampurini, Marco Vettorello, Francesco Baccoli, Federico Cabitza, Roberto Fumagalli, and Thomas Langer

Published: Feb 24, 2026

ERCT Check Date: Apr 14, 2026

DOI: 10.1186/s12909-026-08817-4

higher education
EU
EdTech app

C

Although randomization was at the individual level, this is acceptable here because the intervention is an individual, personal-training activity rather than a class-delivered program.

"Each resident had 20 min of individual training and watched the individual training sessions of the other residents of her/his group." (p. 5)
E

Outcomes were measured with a modified skills assessment tool rather than a widely recognized standardized exam-based assessment.

"Bronchoscopy skills were assessed using the modified Bronchoscopy Skill and Task Assessment Tool (BSTAT) before and after training." (p. 1)
T

Outcomes were measured immediately after brief training, not at least one academic term after the intervention began.

"At the end of the training, each resident repeated the modified BSTAT." (p. 5)
D

The paper documents both groups, sample sizes, baseline characteristics, and comparative results.

"Twenty-two second-year anesthesia and intensive care residents (aged 28 ± 2 years old, 59% female) were enrolled and randomized in two groups of 11 individuals each." (p. 6)
S

Randomization was not conducted at an institution or site level; it was individual-level allocation.

"Participants were thereafter randomized in a 1:1 ratio using sealed envelopes." (p. 5)
I

The paper does not document an independent external evaluation team; the authors designed, delivered, and analyzed the study themselves.

"Authors’ contributions Concept and study design: TL, BB, RF; AI-software development: BB, FP, MV; Training and data recruitment: BB, FP, FB, MV; Data analysis and interpretation: BB, FZ, FP..." (p. 10)
Y

Outcomes were not measured over at least 75% of an academic year, and criterion T is also not met.

"At the end of the training, each resident repeated the modified BSTAT." (p. 5)
B

The study appears to keep training exposure (time and peer observation) comparable across groups, while the guidance modality (AI vs expert) is the intended treatment difference.

"Each resident had 20 min of individual training and watched the individual training sessions of the other residents of her/his group." (p. 5)
R

No independent replication of this specific custom-made software and trial by a different research team was found.

"Two recent studies share similarities with our research, as they explore the role of AI-assisted feedback in improving bronchoscopy performance in a simulated setting [12, 13]." (p. 8)
A

Criterion E is not met, so criterion A is not met, and the outcomes are not all-subject standardized exams.
G

The study does not track participants to graduation, and criterion Y is not met so criterion G is not met.

"At the end of the training, each resident repeated the modified BSTAT." (p. 5)
P

The trial registration date is after the study began, so the protocol was not pre-registered before data collection.

"The randomized controlled trial was retrospectively registered with the ISRCTN registry (ISRCTN63799884)." (p. 2)

Abstract

Flexible bronchoscopy is an essential tool for airway management and both diagnostic and therapeutic interventions, particularly in critical care. Accurate identification of tracheobronchial structures is crucial but challenging for less experienced clinicians, often leading to prolonged procedures and increased complication risks. Simulation-based training using virtual reality or manikins has shown promise, and recent studies suggest that artificial intelligence (AI)-based training outperforms self-directed learning. Limited data exist comparing AI-based bronchoscopy training to expert-led instruction. This study aimed to develop and evaluate a custom-made AI-based software for identifying key tracheobronchial structures and assessing its effectiveness as a training tool for anesthesia and intensive care residents.

ERCT Criteria Breakdown

Level 1 Criteria
- C
  Class-level RCT
  - Although randomization was at the individual level, this is acceptable here because the intervention is an individual, personal-training activity rather than a class-delivered program.
  - "Each resident had 20 min of individual training and watched the individual training sessions of the other residents of her/his group." (p. 5)
  - Relevant Quotes:
    1) "Participants were thereafter randomized in a 1:1 ratio using sealed envelopes." (p. 5)
    2) "Each resident had 20 min of individual training and watched the individual training sessions of the other residents of her/his group." (p. 5)
  - Detailed Analysis: Criterion C requires random assignment at the level of intact classes (or school-level randomization), primarily to reduce contamination when an intervention is delivered to whole classes. However, the ERCT standard explicitly allows an exception when the intervention is designed as personal teaching (tutoring-like), where individual-level randomization is acceptable. Here the intervention is simulation-based bronchoscopy training provided to individual residents ("Each resident had 20 min of individual training..."). Given the individualized delivery and the absence of a classroom-level instructional program being implemented, individual randomization does not create the class contamination problem that criterion C is meant to prevent.
  - Final summary sentence: Criterion C is met because this is an individually delivered training intervention where the ERCT personal-teaching exception makes individual randomization acceptable.
- E
  Exam-based Assessment
  - Outcomes were measured with a modified skills assessment tool rather than a widely recognized standardized exam-based assessment.
  - "Bronchoscopy skills were assessed using the modified Bronchoscopy Skill and Task Assessment Tool (BSTAT) before and after training." (p. 1)
  - Relevant Quotes:
    1) "Bronchoscopy skills were assessed using the modified Bronchoscopy Skill and Task Assessment Tool (BSTAT) before and after training." (p. 1)
    2) "Specifically, as previously reported by several authors [18–20], we used a modified version, which evaluates only proximal tracheobronchial structures (Supplementary materials), on which the AI-based software was specifically trained." (p. 5)
  - Detailed Analysis: Criterion E requires outcomes to be measured using standardized, widely recognized exam-based assessments (not custom or study-tailored measures). The paper assesses outcomes using a "modified" BSTAT, and the paper explicitly notes the modified version focuses on proximal structures "on which the AI-based software was specifically trained." This creates a close alignment between what the intervention targets and what the outcome measure emphasizes, and it is not described as an external, standardized exam comparable across institutions.
  - Final summary sentence: Criterion E is not met because the primary outcome measure is a modified skills tool rather than a standardized exam-based assessment.
- T
  Term Duration
  - Outcomes were measured immediately after brief training, not at least one academic term after the intervention began.
  - "At the end of the training, each resident repeated the modified BSTAT." (p. 5)
  - Relevant Quotes:
    1) "After a 1-hour frontal lecture on bronchoscopy and bronchial anatomy, the baseline bronchoscopy skills of all participants were tested using the Bronchoscopy Skill and Task Assessment Tool (BSTAT) [17]." (p. 5)
    2) "Each resident had 20 min of individual training and watched the individual training sessions of the other residents of her/his group." (p. 5)
    3) "At the end of the training, each resident repeated the modified BSTAT." (p. 5)
  - Detailed Analysis: Criterion T requires outcome measurement at least one full academic term (roughly 3 to 4 months) after the intervention begins. The paper describes a single short training session (a lecture, brief individual training, and immediate post-training retesting). The quote "At the end of the training..." indicates the post-test occurs immediately after the brief intervention, with no term-long follow-up window.
  - Final summary sentence: Criterion T is not met because outcomes were assessed immediately after a short training session rather than at least one academic term after intervention start.
- D
  Documented Control Group
  - The paper documents both groups, sample sizes, baseline characteristics, and comparative results.
  - "Twenty-two second-year anesthesia and intensive care residents (aged 28 ± 2 years old, 59% female) were enrolled and randomized in two groups of 11 individuals each." (p. 6)
  - Relevant Quotes:
    1) "In a randomized trial, 22 second-year anesthesia residents with limited bronchoscopy experience were assigned to either AI-based unsupervised training (n=11) or traditional human-led training (n=11)." (p. 1)
    2) "Twenty-two second-year anesthesia and intensive care residents (aged 28 ± 2 years old, 59% female) were enrolled and randomized in two groups of 11 individuals each." (p. 6)
    3) "No difference in age and sex was observed between the two groups (p = 0.49 and p = 0.14, respectively)." (p. 6)
    4) "Pre- and post-training results of the modified BSTAT examination are reported in Table 1." (p. 6)
  - Detailed Analysis: Criterion D requires that the control/comparison group be clearly described, including group sizes and baseline information so readers can judge comparability. The paper provides explicit group sizes (11 and 11), baseline characteristics (age, sex, and prior bronchoscopy experience), and reports pre/post outcomes with comparative statistics (including in Table 1).
  - Final summary sentence: Criterion D is met because the comparison groups and their baseline and outcome data are documented in the text and Table 1.
Level 2 Criteria
- S
  School-level RCT
  - Randomization was not conducted at an institution or site level; it was individual-level allocation.
  - "Participants were thereafter randomized in a 1:1 ratio using sealed envelopes." (p. 5)
  - Relevant Quotes:
    1) "This study was performed at the Niguarda Hospital and at the University of Milano Bicocca, both located in Milano, Italy." (p. 2)
    2) "Participants were thereafter randomized in a 1:1 ratio using sealed envelopes." (p. 5)
    3) "Twenty-two second-year anesthesia and intensive care residents (aged 28 ± 2 years old, 59% female) were enrolled and randomized in two groups of 11 individuals each." (p. 6)
  - Detailed Analysis: Criterion S requires randomization among schools/sites (i.e., the educational institution or equivalent unit implementing the intervention). Although the study occurs at named institutions, the unit of randomization is individual residents, not institutions, sites, or intact cohorts.
  - Final summary sentence: Criterion S is not met because allocation is at the individual resident level, not the site/school level.
- I
  Independent Conduct
  - The paper does not document an independent external evaluation team; the authors designed, delivered, and analyzed the study themselves.
  - "Authors’ contributions Concept and study design: TL, BB, RF; AI-software development: BB, FP, MV; Training and data recruitment: BB, FP, FB, MV; Data analysis and interpretation: BB, FZ, FP..." (p. 10)
  - Relevant Quotes:
    1) "The assessment of the modified BSTAT was always performed by the same person blinded for group allocation." (p. 5)
    2) "Authors’ contributions Concept and study design: TL, BB, RF; AI-software development: BB, FP, MV; Training and data recruitment: BB, FP, FB, MV; Data analysis and interpretation: BB, FZ, FP; FC Drafting of article: BB, TL; Designing of figures: BB, FZ, FP; Critical review, editing and approval of article: all authors." (p. 10)
  - Detailed Analysis: Criterion I requires that the evaluation be conducted independently from the intervention designers/developers to reduce bias. The paper reports assessor blinding to group allocation, which helps reduce measurement bias, but it does not establish independence. The author contribution statement indicates the authors designed the study, developed the AI software, conducted training/recruitment, and performed the analysis.
  - Final summary sentence: Criterion I is not met because the evaluation was not conducted by an independent external team.
- Y
  Year Duration
  - Outcomes were not measured over at least 75% of an academic year, and criterion T is also not met.
  - "At the end of the training, each resident repeated the modified BSTAT." (p. 5)
  - Relevant Quotes:
    1) "Each resident had 20 min of individual training and watched the individual training sessions of the other residents of her/his group." (p. 5)
    2) "At the end of the training, each resident repeated the modified BSTAT." (p. 5)
  - Detailed Analysis: Criterion Y requires outcomes measured at least 75% of one academic year after the intervention begins. The paper describes only immediate post-training assessment, and it does not describe any long-term follow-up window. Additionally, per ERCT rule, if criterion T is not met, criterion Y is not met.
  - Final summary sentence: Criterion Y is not met because measurement occurs immediately after training rather than over an academic-year timescale.
- B
  Balanced Control Group
  - The study appears to keep training exposure (time and peer observation) comparable across groups, while the guidance modality (AI vs expert) is the intended treatment difference.
  - "Each resident had 20 min of individual training and watched the individual training sessions of the other residents of her/his group." (p. 5)
  - Relevant Quotes:
    1) "The first group received classical training performed by an expert bronchoscopy instructor." (p. 5)
    2) "The second group performed unsupervised training using the AI-based image recognition software." (p. 5)
    3) "Each resident had 20 min of individual training and watched the individual training sessions of the other residents of her/his group." (p. 5)
  - Detailed Analysis: Criterion B compares the nature, quantity, and quality of resources (time, materials, instructor attention) provided to intervention and control conditions, and asks whether the control condition offers a comparable substitute for the intervention inputs, unless resource differences are explicitly the treatment variable. Here, the intended treatment contrast is the training modality itself: expert-led guidance versus AI-based guidance. The paper describes the same training exposure structure in a way that applies to participants generally ("Each resident had 20 min..." and peer observation), while the key difference is whether guidance is provided by an expert instructor or by the AI software. There is no evidence that one group received extra training time or additional non-integral resources beyond what defines the modality. Thus, the resource differences are integral to the intervention being tested rather than a confounding add-on.
  - Final summary sentence: Criterion B is met because training time and exposure appear comparable while the AI-versus-expert guidance is the intended treatment difference.
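
Several verdicts on this page rest partly on a prerequisite rule rather than on direct evidence alone: Y is not met when T is not met, A is not met when E is not met, and G is not met when Y is not met. The following is a minimal Python sketch of that cascade, using the twelve direct judgments recorded on this page. The names `PREREQUISITES`, `apply_prerequisites`, and `direct_findings` are hypothetical illustrations and are not part of any official ERCT tooling.

```python
# Illustrative sketch only: not an official ERCT implementation.

# Prerequisite rules as stated in this evaluation:
# Y requires T, A requires E, G requires Y.
PREREQUISITES = {"Y": "T", "A": "E", "G": "Y"}


def apply_prerequisites(direct: dict[str, bool]) -> dict[str, bool]:
    """Return final verdicts per criterion.

    A criterion holds only if it was judged met directly AND its
    prerequisite (if it has one) also holds after the same check.
    Every prerequisite must be present as a key in `direct`.
    """
    final: dict[str, bool] = {}

    def resolve(code: str) -> bool:
        if code not in final:
            parent = PREREQUISITES.get(code)
            final[code] = direct[code] and (parent is None or resolve(parent))
        return final[code]

    for code in direct:
        resolve(code)
    return final


# Direct judgments copied from the criterion-by-criterion findings on this page.
direct_findings = {
    "C": True, "E": False, "T": False, "D": True,
    "S": False, "I": False, "Y": False, "B": True,
    "R": False, "A": False, "G": False, "P": False,
}

verdicts = apply_prerequisites(direct_findings)
met = [code for code, ok in verdicts.items() if ok]
print(met)
```

In this study the cascade does not change any verdict, because Y, A, and G also fail on direct evidence; the rule would matter only if one of them had been judged met while its prerequisite was not.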
Level 3 Criteria
- R
  Reproduced
  - No independent replication of this specific custom-made software and trial by a different research team was found.
  - "Two recent studies share similarities with our research, as they explore the role of AI-assisted feedback in improving bronchoscopy performance in a simulated setting [12, 13]." (p. 8)
  - Relevant Quotes:
    1) "Two recent studies share similarities with our research, as they explore the role of AI-assisted feedback in improving bronchoscopy performance in a simulated setting [12, 13]." (p. 8)
  - Detailed Analysis: Criterion R requires an independent replication of the same study claim by a different research team in a different context, published in a peer-reviewed journal. The paper notes prior related studies with "similarities" but does not claim those studies replicated this custom-made AI software or this specific RCT. An internet search for independent replication of this specific Brunoni et al. (2026) custom-made AI-based software trial did not identify any peer-reviewed replication study by a different author team as of the ERCT check date.
  - Final summary sentence: Criterion R is not met because independent replication of this specific trial and software was not found.
- A
  All-subject Exams
  - Criterion E is not met, so criterion A is not met, and the outcomes are not all-subject standardized exams.
  - Relevant Quotes:
    1) "Bronchoscopy skills were assessed using the modified Bronchoscopy Skill and Task Assessment Tool (BSTAT) before and after training." (p. 1)
  - Detailed Analysis: Criterion A requires all-subject standardized exams and explicitly depends on meeting criterion E. This study uses a modified skills assessment tool rather than standardized exams, and it assesses bronchoscopy-related skills rather than multiple core subject areas.
  - Final summary sentence: Criterion A is not met because criterion E is not met and the study does not use all-subject standardized exams.
- G
  Graduation Tracking
  - The study does not track participants to graduation, and criterion Y is not met so criterion G is not met.
  - "At the end of the training, each resident repeated the modified BSTAT." (p. 5)
  - Relevant Quotes:
    1) "At the end of the training, each resident repeated the modified BSTAT." (p. 5)
  - Detailed Analysis: Criterion G requires tracking participants until graduation. The paper reports only immediate post-training outcomes ("At the end of the training...") with no longer-term follow-up, and no outcomes related to completion of residency/training are reported. In addition, per ERCT rule, if criterion Y is not met, criterion G is not met. An internet search for follow-up publications by the same author team tracking this cohort to program completion/graduation did not identify any such follow-up study as of the ERCT check date.
  - Final summary sentence: Criterion G is not met because the study does not include graduation tracking and the duration is far shorter than required.
- P
  Pre-Registered
  - The trial registration date is after the study began, so the protocol was not pre-registered before data collection.
  - "The randomized controlled trial was retrospectively registered with the ISRCTN registry (ISRCTN63799884)." (p. 2)
  - Relevant Quotes:
    1) "The randomized controlled trial was retrospectively registered with the ISRCTN registry (ISRCTN63799884)." (p. 2)
    2) "The study was performed in February 2024." (p. 6)
    3) "Registration date 06/02/2026" (ISRCTN63799884 record, p. 1)
    4) "Date of first enrolment 01/02/2024" (ISRCTN63799884 record, p. 4)
  - Detailed Analysis: Criterion P requires that the study protocol be registered before the study begins (i.e., before first enrolment / data collection). The paper itself states the trial was "retrospectively registered," which directly indicates it was not pre-registered. The ISRCTN record lists a registration date of 06/02/2026, while also listing a "Date of first enrolment" of 01/02/2024. This confirms registration occurred after the study started.
  - Final summary sentence: Criterion P is not met because registry registration occurred after first enrolment and the paper explicitly describes retrospective registration.
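
The date comparison behind the criterion P verdict can be checked mechanically. The sketch below assumes the two registry dates quoted above are in day/month/year order, which is consistent with the paper's statement that the study was performed in February 2024, and it treats "pre-registered" as meaning registration strictly before first enrolment. The helper `parse_registry_date` and the variable names are hypothetical.

```python
from datetime import date, datetime


def parse_registry_date(text: str) -> date:
    """Parse a registry date string, assuming DD/MM/YYYY order."""
    return datetime.strptime(text, "%d/%m/%Y").date()


# Dates as quoted from the ISRCTN63799884 record.
registration = parse_registry_date("06/02/2026")
first_enrolment = parse_registry_date("01/02/2024")

# Assumption: pre-registration means the protocol was registered
# strictly before the first participant was enrolled.
pre_registered = registration < first_enrolment

# Positive lag means registration came after enrolment began.
lag_days = (registration - first_enrolment).days

print(f"pre-registered: {pre_registered}, lag: {lag_days} days")
```

Under these assumptions the registration falls roughly two years after first enrolment, matching the paper's own description of the registration as retrospective.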
