Level 1 Criteria
-
C Class-level RCT
- Randomization is at the student level, but the intervention is an AI-based personalized learning platform used by each student individually, which fits ERCT's personal teaching exception.
- "Methods: A prospective randomized controlled trial (RCT) design was adopted, enrolling 40 full-time medical undergraduates who were stratified by baseline academic performance and then randomly assigned via computer-generated block randomization (block size = 4) into an experimental group (n = 20, AI intervention) and a control group (n = 20, traditional instruction)." (p. 1)
- Relevant Quotes: 1) "Methods: A prospective randomized controlled trial (RCT) design was adopted, enrolling 40 full-time medical undergraduates who were stratified by baseline academic performance and then randomly assigned via computer-generated block randomization (block size = 4) into an experimental group (n = 20, AI intervention) and a control group (n = 20, traditional instruction)." (p. 1) 2) "The experimental group received a 12-week personalized learning intervention through the Coze platform, with specific measures including: Dynamic learning path optimization: Weekly adjustment of learning content difficulty and sequence based on diagnostic test results; Affective sensing support: Real-time identification of learning emotions through natural language processing (NLP) with triggered motivational feedback; Intelligent resource recommendation: Integration of a 2,800-case medical database utilizing BERT models to match personalized learning resources; Clinical simulation interaction: Embedded virtual case system providing real-time operational guidance." (p. 1)
- Detailed Analysis: Criterion C prefers class-level randomization to prevent contamination, but ERCT allows an exception when the intervention is designed for personal teaching or tutoring. Here, the unit of randomization is the individual student ("40 full-time medical undergraduates ... randomly assigned"). However, the intervention is explicitly an individualized "personalized learning" system with adaptive paths and personalized recommendations, so it functions as a personal tutoring tool rather than a class-wide instructional method. Therefore, despite student-level randomization, the intervention fits the ERCT personal teaching exception.
- Criterion C is met because the intervention is individualized personal teaching, so student-level randomization is acceptable under the ERCT exception.
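To make the quoted allocation procedure concrete, the following is a minimal sketch of stratified block randomization with a block size of 4. It is an illustration under stated assumptions, not the authors' procedure: the function name `stratified_block_randomize`, the median split used to form two strata, the seed values, and the simulated baseline scores are all hypothetical, since the paper does not say how strata were defined.

```python
import random

def stratified_block_randomize(scores, block_size=4, seed=0):
    """Assign participants to two arms in shuffled blocks within strata.

    scores: dict mapping participant id -> baseline score.
    Strata are a hypothetical median split (assumption; not from the paper).
    """
    rng = random.Random(seed)
    ranked = sorted(scores, key=scores.get)
    half = len(ranked) // 2
    strata = [ranked[:half], ranked[half:]]  # lower and upper half by baseline score
    assignment = {}
    for stratum in strata:
        rng.shuffle(stratum)  # random entry order within the stratum
        for start in range(0, len(stratum), block_size):
            block = stratum[start:start + block_size]
            # Each full block holds equal numbers of both arms, in random order.
            arms = ["experimental", "control"] * (block_size // 2)
            rng.shuffle(arms)
            for participant, arm in zip(block, arms):
                assignment[participant] = arm
    return assignment

# Simulated baseline scores for 40 participants (hypothetical data).
demo_rng = random.Random(42)
scores = {f"S{i:02d}": demo_rng.gauss(70, 10) for i in range(40)}

assignment = stratified_block_randomize(scores)
counts = {arm: list(assignment.values()).count(arm)
          for arm in ("experimental", "control")}
print(counts)
```

With 40 participants split into two strata of 20, every block is complete, so the sketch yields 20 participants per arm, which matches the group sizes quoted above. Strata whose size is not a multiple of the block size would leave a partial block and could produce a small imbalance.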
-
E Exam-based Assessment
- The study reports using a standardized test bank (LCME) with reliability and validity evidence rather than a bespoke study-created exam.
- "This study employs the standardized test bank of the Accreditation Council for Medical Education (LCME) to assess learning effectiveness." (p. 5)
- Relevant Quotes: 1) "This study employs the standardized test bank of the Accreditation Council for Medical Education (LCME) to assess learning effectiveness." (p. 5) 2) "The tool includes three parallel test sets (A/B/C), covers Bloom's taxonomy levels, and demonstrates high reliability (α = 0.89) and validity (CVI = 0.91). Scoring used IRT calibration and double-blind marking." (p. 5)
- Detailed Analysis: Criterion E requires exam-based assessment using a standardized, externally defined instrument, not an ad hoc test created solely for this study. The paper explicitly describes the outcome test as a "standardized test bank" attributed to LCME, and provides psychometric evidence (reliability, validity) plus standardized scoring practices (IRT calibration, double-blind marking). Although the name the paper gives for the source of the test bank is unusual, the instrument is still presented as a standardized test bank rather than a custom exam authored for this trial.
- Criterion E is met because the study states it used a standardized test bank (with reliability and validity evidence) rather than a bespoke study-specific exam.
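For readers unfamiliar with the reliability figure quoted above (α = 0.89), the sketch below shows how a Cronbach's alpha coefficient is conventionally computed from item-level scores. This assumes the reported α is Cronbach's alpha, which the quote does not state explicitly. The function name `cronbach_alpha` and the score matrix are hypothetical; the paper does not publish item-level data, so nothing here reproduces its value.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha for a score matrix (one row per respondent, one column per item)."""
    k = len(rows[0])  # number of items
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical item scores for five respondents on four items.
example_scores = [
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
    [4, 4, 5, 4],
]
alpha = cronbach_alpha(example_scores)
print(f"alpha = {alpha:.2f}")
```

Values closer to 1 indicate that the items vary together, which is the sense in which the paper describes its reliability as high.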
-
T Term Duration
- Outcomes were measured at Week 12 after the intervention began, matching a term-length (approximately 12 weeks) follow-up window.
- "The experimental group received a 12-week personalized learning intervention through the Coze platform..." (p. 1)
- Relevant Quotes: 1) "The experimental group received a 12-week personalized learning intervention through the Coze platform..." (p. 1) 2) "Data was collected at: Baseline (Week 0): Demographics, pre-test scores, learning behavior. Intervention Period (Weeks 4, 8, 12): Platform logs, diagnostic tests, classroom recordings, engagement metrics. Endpoint (Week 12): Post-test, satisfaction survey, motivation scales." (p. 5)
- Detailed Analysis: Criterion T requires that outcomes are measured at least one academic term after the intervention begins. A term is typically around 3-4 months. The intervention is explicitly described as "12-week", and the measurement schedule specifies an endpoint at Week 12, when the post-test and other outcomes are collected. Twelve weeks is approximately three months and is commonly treated as term-length in many academic calendars.
- Criterion T is met because outcomes were measured at Week 12 after the intervention began, which is consistent with a term-length follow-up period.
-
D Documented Control Group
- The control condition is described in detail and baseline characteristics are reported for both groups.
- "The control group followed the traditional lecture-based model: 4 h/week of teacher-centered instruction using standardized textbooks (e.g., Systematic Anatomy)." (p. 4)
- Relevant Quotes: 1) "The control group followed the traditional lecture-based model: 4 h/week of teacher-centered instruction using standardized textbooks (e.g., Systematic Anatomy). Learning reinforcement included weekly quizzes (multiple-choice, short-answer) graded uniformly by the teaching office. No digital tools or personalized feedback were used." (p. 4) 2) "The baseline academic performance data showed no statistically significant difference in pre-test knowledge reserves between the two groups (Experimental Group: 70.40 ± 8.96 points vs. Control Group: 70.20 ± 11.40 points, p = 0.950). In terms of demographic characteristics, the gender distribution was balanced (Experimental Group: male:female = 12:8; Control Group: 11:9, χ² = 0.06, p = 0.812), and age indicators also showed high consistency (Experimental group 18.10 ± 0.97 years vs. Control group 18.15 ± 0.81 years, t(38) = 0.36, p = 0.724)." (p. 4)
- Detailed Analysis: Criterion D requires that the control group is well documented, including what it received and its baseline comparability with the intervention group. The paper provides a concrete description of the control condition (lecture-based instruction, hours per week, materials, reinforcement, and the explicit absence of digital tools). It also reports baseline academic and demographic characteristics (scores, gender, age) that demonstrate comparability between groups.
- Criterion D is met because the paper clearly describes the control condition and reports baseline characteristics enabling meaningful comparison.
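As a supplementary illustration of what baseline comparability means in practice, the sketch below computes standardized mean differences (SMDs) from the means and standard deviations quoted above. This check is not part of the paper or of the ERCT criterion. The function name `standardized_mean_difference` is hypothetical, and the pooled standard deviation assumes equal group sizes (n = 20 per group, as reported).

```python
import math

def standardized_mean_difference(mean_a, sd_a, mean_b, sd_b):
    """SMD using a pooled SD; assumes the two groups have equal sizes."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd

# Summary statistics as quoted from the paper (experimental vs. control).
smd_pretest = standardized_mean_difference(70.40, 8.96, 70.20, 11.40)
smd_age = standardized_mean_difference(18.10, 0.97, 18.15, 0.81)

print(f"pre-test SMD: {smd_pretest:.3f}")
print(f"age SMD: {smd_age:.3f}")
```

Both values are small in absolute terms (roughly 0.02 for pre-test scores and -0.06 for age). An absolute SMD below 0.1 is a common rule of thumb for negligible imbalance, although that threshold is a convention rather than something the paper or ERCT specifies.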