Evaluation of the impact of AI-driven personalized learning platform on medical students' learning performance

Yajun Chen

DOI: 10.3389/fmed.2025.1610012
  • science
  • higher education
  • China
  • blended learning
  • EdTech platform
  • C (Class-level RCT)

    Randomization is at the student level, but the intervention is an individualized, tutoring-style AI learning platform, which fits ERCT's personal-teaching exception.

    "Methods: A prospective randomized controlled trial (RCT) design was adopted, enrolling 40 full-time medical undergraduates who were stratified by baseline academic performance and then randomly assigned via computer-generated block randomization (block size = 4) into an experimental group (n = 20, AI intervention) and a control group (n = 20, traditional instruction)." (p. 1)

  • E (Exam-based Assessment)

    The study reports using a standardized test bank (LCME) with reliability and validity evidence rather than a bespoke study-created exam.

    "This study employs the standardized test bank of the Accreditation Council for Medical Education (LCME) to assess learning effectiveness." (p. 5)

  • T (Term Duration)

    Outcomes were measured at Week 12 after the intervention began, matching a term-length (approximately 12 weeks) follow-up window.

    "The experimental group received a 12-week personalized learning intervention through the Coze platform..." (p. 1)

  • D (Documented Control Group)

    The control condition is described in detail and baseline characteristics are reported for both groups.

    "The control group followed the traditional lecture-based model: 4 h/week of teacher-centered instruction using standardized textbooks (e.g., Systematic Anatomy)." (p. 4)

  • S (School-level RCT)

    Randomization occurs at the student level rather than assigning whole schools or sites.

    "Methods: A prospective randomized controlled trial (RCT) design was adopted, enrolling 40 full-time medical undergraduates..." (p. 1)

  • I (Independent Conduct)

    The paper does not provide evidence that the evaluation was conducted by an independent team distinct from the intervention designer.

    "The platform was built on the Coze open-source framework (v2.4.1)..." (p. 5)

  • Y (Year Duration)

    Despite stating a one-year study period, outcomes are reported at Week 12 rather than after a full academic year of follow-up.

    "The study period was from August 10, 2024, to August 10, 2025." (p. 3)

  • B (Balanced Control Group)

    The intervention intentionally adds an AI platform as the treatment, so the control remains business-as-usual by design under ERCT's resource treatment exception.

    "The experimental group used the Coze-based AI Personalized Learning Platform (AI-PLP) alongside 4 h/week of traditional instruction." (p. 4)

  • R (Reproduced)

    No independent replication study of this specific intervention trial was found in available sources at the time of this ERCT check.

  • A (All-subject Exams)

    Outcomes are limited to a single course or domain rather than standardized exams across all main subjects.

    "The control group followed the traditional lecture-based teaching model ... using standardized textbooks (e.g., Systematic Anatomy)." (p. 4)

  • G (Graduation Tracking)

    The study does not track participants until graduation and, since Year Duration is not met, Graduation Tracking cannot be met under ERCT rules.

    "Future directions include ... longitudinal tracking..." (p. 2)

  • P (Pre-Registered)

    The paper does not report a pre-registered protocol or registry entry that can be verified as registered prior to data collection.

Abstract

This study aims to evaluate the comprehensive impact of an artificial intelligence (AI)-driven personalized learning platform based on the Coze platform on medical students' learning outcomes, learning satisfaction, and self-directed learning abilities. It seeks to explore its practical application value in medical education and provide empirical evidence for the digital transformation of education.


ERCT Criteria Breakdown

  • Level 1 Criteria

    • C

      Class-level RCT

      • Randomization is at the student level, but the intervention is an individualized, tutoring-style AI learning platform, which fits ERCT's personal-teaching exception.
      • "Methods: A prospective randomized controlled trial (RCT) design was adopted, enrolling 40 full-time medical undergraduates who were stratified by baseline academic performance and then randomly assigned via computer-generated block randomization (block size = 4) into an experimental group (n = 20, AI intervention) and a control group (n = 20, traditional instruction)." (p. 1)
      • Relevant Quotes: 1) "Methods: A prospective randomized controlled trial (RCT) design was adopted, enrolling 40 full-time medical undergraduates who were stratified by baseline academic performance and then randomly assigned via computer-generated block randomization (block size = 4) into an experimental group (n = 20, AI intervention) and a control group (n = 20, traditional instruction)." (p. 1) 2) "The experimental group received a 12-week personalized learning intervention through the Coze platform, with specific measures including: Dynamic learning path optimization: Weekly adjustment of learning content difficulty and sequence based on diagnostic test results; Affective sensing support: Real-time identification of learning emotions through natural language processing (NLP) with triggered motivational feedback; Intelligent resource recommendation: Integration of a 2,800-case medical database utilizing BERT models to match personalized learning resources; Clinical simulation interaction: Embedded virtual case system providing real-time operational guidance." (p. 1) Detailed Analysis: Criterion C prefers class-level randomization to prevent contamination, but ERCT allows an exception when the intervention is designed for personal teaching or tutoring. Here, the unit of randomization is individual students ("40 full-time medical undergraduates ... randomly assigned"). However, the intervention is explicitly an individualized "personalized learning" system with adaptive paths and personalized recommendations, functioning as a personal tutoring style tool rather than a class-wide instructional method. Therefore, despite student-level randomization, the intervention fits the ERCT personal teaching exception. Final sentence explaining if criterion C is met/not met because the intervention is individualized personal teaching, so student-level randomization is acceptable under the ERCT exception.
    • E

      Exam-based Assessment

      • The study reports using a standardized test bank (LCME) with reliability and validity evidence rather than a bespoke study-created exam.
      • "This study employs the standardized test bank of the Accreditation Council for Medical Education (LCME) to assess learning effectiveness." (p. 5)
      • Relevant Quotes: 1) "This study employs the standardized test bank of the Accreditation Council for Medical Education (LCME) to assess learning effectiveness." (p. 5) 2) "The tool includes three parallel test sets (A/B/C), covers Bloom's taxonomy levels, and demonstrates high reliability (α = 0.89) and validity (CVI = 0.91). Scoring used IRT calibration and double-blind marking." (p. 5) Detailed Analysis: Criterion E requires exam-based assessment using a standardized, externally-defined instrument, not an ad-hoc test created solely for this study. The paper explicitly describes the outcome test as a "standardized test bank" attributed to LCME, and provides psychometric evidence (reliability, validity) plus standardized scoring practices (IRT calibration, double- blind marking). While the paper's description is unusual in naming, it is still presented as a standardized test bank rather than a custom exam authored for this trial. Final sentence explaining if criterion E is met/not met because the study states it used a standardized test bank (with reliability and validity evidence) rather than a bespoke study-specific exam.
    • T

      Term Duration

      • Outcomes were measured at Week 12 after the intervention began, matching a term-length (approximately 12 weeks) follow-up window.
      • "The experimental group received a 12-week personalized learning intervention through the Coze platform..." (p. 1)
      • Relevant Quotes: 1) "The experimental group received a 12-week personalized learning intervention through the Coze platform..." (p. 1) 2) "Data was collected at: Baseline (Week 0): Demographics, pre-test scores, learning behavior. Intervention Period (Weeks 4, 8, 12): Platform logs, diagnostic tests, classroom recordings, engagement metrics. Endpoint (Week 12): Post-test, satisfaction survey, motivation scales." (p. 5) Detailed Analysis: Criterion T requires that outcomes are measured at least one academic term after the intervention begins. A term is typically around 3-4 months. The intervention is explicitly stated as "12-week", and the measurement schedule specifies an endpoint at Week 12, where the post-test and other outcomes are collected. Twelve weeks is approximately three months and is commonly treated as term-length in many academic calendars. Final sentence explaining if criterion T is met/not met because outcomes were measured at Week 12 after the intervention began, which is consistent with a term-length follow-up period.
    • D

      Documented Control Group

      • The control condition is described in detail and baseline characteristics are reported for both groups.
      • "The control group followed the traditional lecture-based model: 4 h/week of teacher-centered instruction using standardized textbooks (e.g., Systematic Anatomy)." (p. 4)
      • Relevant Quotes: 1) "The control group followed the traditional lecture-based model: 4 h/week of teacher-centered instruction using standardized textbooks (e.g., Systematic Anatomy). Learning reinforcement included weekly quizzes (multiple-choice, short-answer) graded uniformly by the teaching office. No digital tools or personalized feedback were used." (p. 4) 2) "The baseline academic performance data showed no statistically significant difference in pre-test knowledge reserves between the two groups (Experimental Group: 70.40 ± 8.96 points vs. Control Group: 70.20 ± 11.40 points, p = 0.950). In terms of demographic characteristics, the gender distribution was balanced (Experimental Group: male:female = 12:8; Control Group: 11:9, χ^{2} = 0.06, p = 0.812), and age indicators also showed high consistency (Experimental group 18.10 ± 0.97 years vs. Control group 18.15 ± 0.81 years, t(38) = 0.36, p = 0.724)." (p. 4) Detailed Analysis: Criterion D requires that the control group is well documented, including what it received and baseline comparability. The paper provides a concrete description of the control condition (lecture-based instruction, hours per week, materials, reinforcement, and explicit absence of digital tools). It also reports baseline academic and demographic characteristics (scores, gender, age) demonstrating comparability between groups. Final sentence explaining if criterion D is met/not met because the paper clearly describes the control condition and reports baseline characteristics enabling meaningful comparison.
  • Level 2 Criteria

    • S

      School-level RCT

      • Randomization occurs at the student level rather than assigning whole schools or sites.
      • "Methods: A prospective randomized controlled trial (RCT) design was adopted, enrolling 40 full-time medical undergraduates..." (p. 1)
      • Relevant Quotes: 1) "Methods: A prospective randomized controlled trial (RCT) design was adopted, enrolling 40 full-time medical undergraduates who were stratified by baseline academic performance and then randomly assigned via computer-generated block randomization (block size = 4) into an experimental group (n = 20, AI intervention) and a control group (n = 20, traditional instruction)." (p. 1) Detailed Analysis: Criterion S requires a school-level (or site-level) RCT where schools are randomized to treatment or control. The methods explicitly state that individual students ("40 full-time medical undergraduates") were randomized into two groups. There is no indication of multiple schools being assigned to conditions. Final sentence explaining if criterion S is met/not met because the unit of randomization is students, not schools or sites.
    • I

      Independent Conduct

      • The paper does not provide evidence that the evaluation was conducted by an independent team distinct from the intervention designer.
      • "The platform was built on the Coze open-source framework (v2.4.1)..." (p. 5)
      • Relevant Quotes: 1) "The platform was built on the Coze open-source framework (v2.4.1) and featured a three-layer architecture designed for medical education:" (p. 5) 2) "Author contributions: YC: Writing – original draft, Writing – review & editing." (p. 15) Detailed Analysis: Criterion I requires clear evidence that the study was conducted independently from the designers of the intervention. The paper describes the AI platform's construction in a way that indicates close involvement of the author with the intervention itself. The author list is a single person, and the paper does not state that data collection, implementation, or analysis was performed by an external evaluator or an independent evaluation organization. While the paper mentions that the allocation sequence was managed by an independent third-party researcher (randomization logistics), that is not sufficient to establish independent conduct of the overall intervention evaluation. Final sentence explaining if criterion I is met/not met because the paper provides no explicit evidence of an independent evaluation team distinct from the intervention designer.
    • Y

      Year Duration

      • Despite stating a one-year study period, outcomes are reported at Week 12 rather than after a full academic year of follow-up.
      • "The study period was from August 10, 2024, to August 10, 2025." (p. 3)
      • Relevant Quotes: 1) "The study period was from August 10, 2024, to August 10, 2025." (p. 3) 2) "The experimental group received a 12-week personalized learning intervention through the Coze platform..." (p. 1) 3) "Data was collected at: ... Intervention Period (Weeks 4, 8, 12)... Endpoint (Week 12): Post-test, satisfaction survey, motivation scales." (p. 5) Detailed Analysis: Criterion Y requires that outcomes are measured at least one full academic year after the intervention begins. Although the paper states a one-year "study period", the intervention and the explicit measurement schedule culminate at Week 12, with the endpoint post-test and surveys collected then. The paper does not report a later end-of-year outcome assessment aligned with a 9-10 month academic year follow-up. Final sentence explaining if criterion Y is met/not met because the reported outcome measurement schedule ends at Week 12 rather than after a full academic year of follow-up.
    • B

      Balanced Control Group

      • The intervention intentionally adds an AI platform as the treatment, so the control remains business-as-usual by design under ERCT's resource treatment exception.
      • "The experimental group used the Coze-based AI Personalized Learning Platform (AI-PLP) alongside 4 h/week of traditional instruction." (p. 4)
      • Relevant Quotes: 1) "The experimental group used the Coze-based AI Personalized Learning Platform (AI-PLP) alongside 4 h/week of traditional instruction." (p. 4) 2) "The control group followed the traditional lecture-based model: 4 h/week of teacher-centered instruction using standardized textbooks (e.g., Systematic Anatomy). ... No digital tools or personalized feedback were used." (p. 4) 3) "Daily average learning duration extended by 41.5% (49.25 ± 18.59 vs. 34.80 ± 18.32 min, p = 0.048, d = 0.49)..." (p. 2) Detailed Analysis: Criterion B requires comparable educational inputs across conditions unless the additional time or resources are themselves the treatment variable. This study's treatment is explicitly the provision and use of the AI-PLP, added on top of the same baseline instruction time (4 h/week) that the control group receives. The control group is explicitly business-as-usual without the platform. The paper also reports that learning time increased in the experimental group, which is plausibly a mechanism of the platform. Under ERCT's decision rule, when the extra resource is integral to the intervention being tested (here, access to the AI-PLP), the control may remain business-as-usual. The imbalance does not automatically violate criterion B because the resource difference is the intended treatment contrast. Final sentence explaining if criterion B is met/not met because the extra resource (AI-PLP access) is the treatment variable by design, so a business-as-usual control is acceptable under ERCT criterion B.
  • Level 3 Criteria

    • R

      Reproduced

      • No independent replication study of this specific intervention trial was found in available sources at the time of this ERCT check.
      • Relevant Quotes: (No quotes found in the paper regarding an independent replication of this specific RCT, and no external replication paper was located.)
      • Detailed Analysis: Criterion R requires independent reproduction by other authors. The paper is a recent (2025) single-site, student-level RCT. Searches across major public indexing and hosting sources (the Frontiers landing page, PubMed, and PubMed Central full text) did not identify a separate peer-reviewed replication study by an independent team that explicitly repeats this intervention and reports results. Criterion R is not met because no independent replication study could be identified at the time of checking.
    • A

      All-subject Exams

      • Outcomes are limited to a single course or domain rather than standardized exams across all main subjects.
      • "The control group followed the traditional lecture-based teaching model ... using standardized textbooks (e.g., Systematic Anatomy)." (p. 4)
      • Relevant Quotes: 1) "The control group followed the traditional lecture-based model: 4 h/week of teacher-centered instruction using standardized textbooks (e.g., Systematic Anatomy)." (p. 4) 2) "The following data were collected ... Academic performance: 3 standardized tests before and after the intervention..." (p. 1) Detailed Analysis: Criterion A requires standardized exam outcomes across all main subjects taught (or a clearly justified specialized exception). The context and materials indicate a specific course domain (e.g., "Systematic Anatomy"), and the reported academic outcome is based on a set of tests tied to that domain. The paper does not report standardized exams across the broader set of concurrent subjects in a medical curriculum. Final sentence explaining if criterion A is met/not met because the study measures performance in a limited domain rather than all main subjects.
    • G

      Graduation Tracking

      • The study does not track participants until graduation and, since Year Duration is not met, Graduation Tracking cannot be met under ERCT rules.
      • "Future directions include ... longitudinal tracking..." (p. 2)
      • Relevant Quotes: 1) "Future directions include multicenter large-sample studies, longitudinal tracking, and interdisciplinary applications to advance the intelligent transformation of educational models." (p. 2) 2) "Data was collected at: ... Endpoint (Week 12): Post-test, satisfaction survey, motivation scales." (p. 5) Detailed Analysis: Criterion G requires tracking through graduation, and ERCT specifies that if criterion Y is not met, criterion G is not met. The paper's measurement schedule ends at Week 12 and explicitly frames "longitudinal tracking" as a future direction rather than something done in this study. No graduation outcomes are reported. A web search for follow-up publications by the same author tracking the same cohort to graduation did not locate any such paper at the time of this check. Final sentence explaining if criterion G is met/not met because the study ends at Week 12 with no graduation follow-up, and Year Duration is not met.
    • P

      Pre-Registered

      • The paper does not report a pre-registered protocol or registry entry that can be verified as registered prior to data collection.
      • Relevant Quotes: (No quotes found in the paper providing a trial registry name, registry identifier, or a public pre-registration link.)
      • Detailed Analysis: Criterion P requires a publicly verifiable pre-registration made before data collection begins. The paper reports ethics approval and informed consent but does not provide a trial registration number, registry link, OSF link, or any pre-analysis plan identifier. Targeted web searches for registrations using the DOI, title, author, and ethics approval number did not identify a corresponding public pre-registration entry. Criterion P is not met because no verifiable pre-registration record is reported or discoverable from the available sources.
