Randomized Controlled Study on the Impact of Problem-Based Learning Combined With Large Language Models on Critical Thinking Skills in Nursing Students

Shi Kejingyun, Rao Mingjun

Published:
ERCT Check Date:
DOI: 10.1097/NNE.0000000000001879
  • science
  • higher education
  • China
  • Asia
  • project-based learning
  • EdTech app
  • EdTech platform
0
  • C

    The study randomized individual students within a single institution rather than randomizing at the class level.

    "Students were randomly assigned to either the PBL group or the LLM-integrated PBL group using a computer-generated randomization sequence..." (p. 217)

  • E

    The study utilized the California Critical Thinking Skills Test (CCTST), a widely recognized standardized assessment.

    "The California Critical Thinking Skills Test (CCTST), developed by Facione, was used to assess various dimensions of critical thinking among nursing students." (p. 217)

  • T

    The intervention lasted only 8 weeks, which is shorter than the required full academic term (typically 3-4 months).

    "It is also important to consider the relatively short duration of the educational intervention (an 8-week course with a total of 16 hours)..." (p. 219)

  • D

    The study documents the control group's size, demographics, and baseline performance scores in detail.

    "The demographic comparison revealed no significant differences in age... with an average age of 18.04 years for the PBL group..." (p. 217)

  • S

    The study was conducted within a single nursing college with randomization at the student level, not the school level.

    "The sample consisted of first-year nursing students from a nursing college affiliated to Guizhou University..." (p. 217)

  • I

    The study was conducted by the authors themselves; only the randomization process was handled by an independent researcher.

    "This randomization process was conducted by an independent researcher who was not involved in the teaching or assessment of the students." (p. 217)

  • Y

    The study duration was 8 weeks, which does not meet the requirement of one full academic year.

    "...an 8-week course with a total of 16 hours..." (p. 219)

  • B

    The study tests the integration of LLMs as the variable; both groups received equal course time (16 hours), satisfying the balance requirement.

    "...an 8-week course with a total of 16 hours..." (p. 219)

  • R

    No independent replication of this specific study was found in peer-reviewed journals.

  • A

    The study measured only critical thinking skills and the specific course test score, not all main subjects taught in the school.

    "We also analyzed the students' test scores at the end of the course... there was no statistically significant difference in grades..." (p. 218)

  • G

    Outcomes were measured immediately after the course ended, with no tracking until graduation.

    "After the entire course was completed, all students filled out questionnaires on critical thinking skills." (p. 217)

  • P

    The paper mentions following CONSORT but does not provide a registration number or evidence of a pre-registered protocol.

Abstract

Background: The integration of Large Language Models (LLMs) into nursing education presents a novel approach to enhancing critical thinking skills. This study evaluated the effectiveness of LLM-assisted Problem-Based Learning (PBL) compared to traditional PBL in improving critical thinking skills among nursing students. Methods: Participants were randomly assigned to either a traditional PBL group (50 nursing students) or an LLM-integrated PBL group (50 nursing students). The California Critical Thinking Skills Test was used to assess critical thinking skills. Results: The LLM-integrated PBL group showed a more pronounced increase (0.60 points) in critical thinking skills compared to the PBL group (0.50 points, p<.01). A notable difference was observed in inductive reasoning skills between the PBL group and the LLM-integrated PBL group (p<.01). Conclusion: This study provides empirical evidence supporting the use of LLM-assisted PBL as an effective educational strategy in nursing education.


ERCT Criteria Breakdown

  • Level 1 Criteria

    • C

      Class-level RCT

      • The study randomized individual students within a single institution rather than randomizing at the class level.
      • "Students were randomly assigned to either the PBL group or the LLM-integrated PBL group using a computer-generated randomization sequence..." (p. 217)
      • Relevant Quotes:
        1) "The sample consisted of first-year nursing students from a nursing college affiliated to Guizhou University... 50 nursing students in the PBL group and 50 in the LLM-integrated PBL group." (p. 217)
        2) "Students were randomly assigned to either the PBL group or the LLM-integrated PBL group using a computer-generated randomization sequence to ensure an unbiased distribution of participants." (p. 217)
      • Detailed Analysis: The ERCT standard requires randomization at the class or school level to prevent contamination and simulate real-world implementation, unless the intervention is specifically designed as 1-on-1 tutoring. This study involves Problem-Based Learning (PBL), which is a group-based activity ("engaging in real-world, open-ended problems in groups"). However, randomization was performed at the individual student level, assigning 100 students to two groups. This poses a risk of contamination if students interact outside the intervention hours within the same college. Therefore, criterion C is not met.
    • E

      Exam-based Assessment

      • The study utilized the California Critical Thinking Skills Test (CCTST), a widely recognized standardized assessment.
      • "The California Critical Thinking Skills Test (CCTST), developed by Facione, was used to assess various dimensions of critical thinking among nursing students." (p. 217)
      • Relevant Quotes:
        1) "The California Critical Thinking Skills Test (CCTST), developed by Facione, was used to assess various dimensions of critical thinking among nursing students." (p. 217)
        2) "The CCTST has demonstrated moderate reliability... and evidence of construct validity through experimental studies... supporting its use as an assessment tool for critical thinking skills." (p. 217)
      • Detailed Analysis: The standard requires the use of standardized exam-based assessments that are widely recognized, rather than custom-made tests designed solely for the study. The authors explicitly utilized the CCTST, citing its development by Facione and providing reliability and validity metrics. This is a standard, commercially available instrument for assessing the primary outcome (critical thinking). Therefore, criterion E is met.
    • T

      Term Duration

      • The intervention lasted only 8 weeks, which is shorter than the required full academic term (typically 3-4 months).
      • "It is also important to consider the relatively short duration of the educational intervention (an 8-week course with a total of 16 hours)..." (p. 219)
      • Relevant Quotes:
        1) "It is also important to consider the relatively short duration of the educational intervention (an 8-week course with a total of 16 hours)..." (p. 219)
      • Detailed Analysis: The ERCT standard requires that outcomes be measured at least one full academic term (approximately 3-4 months) after the intervention begins to ensure effects are not merely transitory. The authors explicitly state in the limitations section that the intervention was an "8-week course." This duration is significantly shorter than a standard academic semester or term. Therefore, criterion T is not met.
    • D

      Documented Control Group

      • The study documents the control group's size, demographics, and baseline performance scores in detail.
      • "The demographic comparison revealed no significant differences in age... with an average age of 18.04 years for the PBL group..." (p. 217)
      • Relevant Quotes:
        1) "Participants were randomly assigned to either a traditional PBL group (50 nursing students) or an LLM-integrated PBL group (50 nursing students)." (p. 216)
        2) "The demographic comparison revealed no significant differences in age... with an average age of 18.04 years for the PBL group (SD 0.76)... The majority of participants were female (93%), with 47 (94%) female participants in PBL..." (p. 217)
        3) "Table. Critical Thinking Skills and Subscales... PBL (n=50) Pretest Mean Score (SD)..." (p. 218)
      • Detailed Analysis: The standard requires detailed documentation of the control group including demographics, baseline performance, and treatment received. The text provides specific sample sizes (N=50), age statistics, and gender distribution for the control (PBL) group. Additionally, the Table on page 218 provides the pretest mean scores for the PBL group across all subscales. The control condition (traditional PBL without LLM) is also clearly defined. Therefore, criterion D is met.
  • Level 2 Criteria

    • S

      School-level RCT

      • The study was conducted within a single nursing college with randomization at the student level, not the school level.
      • "The sample consisted of first-year nursing students from a nursing college affiliated to Guizhou University..." (p. 217)
      • Relevant Quotes:
        1) "The sample consisted of first-year nursing students from a nursing college affiliated to Guizhou University in Guiyang, China." (p. 217)
        2) "Students were randomly assigned... using a computer-generated randomization sequence..." (p. 217)
      • Detailed Analysis: Criterion S requires randomization to occur among schools (units implementing the intervention) to test real-world relevance and avoid school-wide confounding factors. This study was conducted at a single site ("a nursing college") and randomized individual students within that single site. No school-level assignment occurred. Therefore, criterion S is not met.
    • I

      Independent Conduct

      • The study was conducted by the authors themselves; only the randomization process was handled by an independent researcher.
      • "This randomization process was conducted by an independent researcher who was not involved in the teaching or assessment of the students." (p. 217)
      • Relevant Quotes:
        1) "Author Affiliations: Guiyang Vocational and Technical College... Guizhou University... Department of Plastic Surgery, Guizhou Provincial People's Hospital..." (p. 216)
        2) "This randomization process was conducted by an independent researcher who was not involved in the teaching or assessment of the students." (p. 217)
      • Detailed Analysis: The standard requires the study (data collection and analysis) to be conducted independently from the intervention designers to remove bias. While the authors state that an "independent researcher" handled the randomization, there is no statement that the teaching, data collection, or analysis were conducted by an external third party. The authors appear to be the primary researchers and educators involved. Therefore, criterion I is not met.
    • Y

      Year Duration

      • The study duration was 8 weeks, which does not meet the requirement of one full academic year.
      • "...an 8-week course with a total of 16 hours..." (p. 219)
      • Relevant Quotes:
        1) "It is also important to consider the relatively short duration of the educational intervention (an 8-week course with a total of 16 hours)..." (p. 219)
      • Detailed Analysis: Criterion Y requires outcomes to be measured at least one full academic year (~9-10 months) after the intervention begins. Since Criterion T (Term Duration) was not met, Criterion Y cannot be met. The study explicitly identifies the duration as an "8-week course," which is significantly shorter than an academic year. Therefore, criterion Y is not met.
    • B

      Balanced Control Group

      • The study tests the integration of LLMs as the variable; both groups received equal course time (16 hours), satisfying the balance requirement.
      • "...an 8-week course with a total of 16 hours..." (p. 219)
      • Relevant Quotes:
        1) "Participants were randomly assigned to either a traditional PBL group... or an LLM-integrated PBL group." (p. 216)
        2) "It is also important to consider the relatively short duration of the educational intervention (an 8-week course with a total of 16 hours)..." (p. 219)
        3) "This study evaluated the effectiveness of LLM-assisted Problem-Based Learning (PBL) compared to traditional PBL..." (p. 216)
      • Detailed Analysis: The ERCT standard requires that control groups utilize balanced time and resources unless the extra resource is the specific variable being tested. In this study, the "extra resource" is access to and integration of Large Language Models (LLMs). This is explicitly the treatment variable defined in the study intent. Regarding time resources, both groups participated in the same "8-week course with a total of 16 hours." Therefore, the time-on-task was balanced, and the resource difference (LLM access) is the integral component being tested. Therefore, criterion B is met.
  • Level 3 Criteria

    • R

      Reproduced

      • No independent replication of this specific study was found in peer-reviewed journals.
      • Relevant Quotes:
        1) "To the best of our knowledge, this study is the first randomized controlled trial to examine the effectiveness of LLM-integrated PBL in enhancing nursing students' critical thinking skills..." (p. 218)
      • Detailed Analysis: The paper claims to be the first RCT of its kind. A search of recent literature reveals no subsequent independent replications of this specific protocol by different research teams in different contexts published in peer-reviewed scientific journals. Therefore, criterion R is not met.
    • A

      All-subject Exams

      • The study measured only critical thinking skills and the specific course test score, not all main subjects taught in the school.
      • "We also analyzed the students' test scores at the end of the course... there was no statistically significant difference in grades..." (p. 218)
      • Relevant Quotes:
        1) "The California Critical Thinking Skills Test (CCTST)... was used to assess various dimensions of critical thinking..." (p. 217)
        2) "We also analyzed the students' test scores at the end of the course... there was no statistically significant difference in grades..." (p. 218)
      • Detailed Analysis: Criterion A requires assessing effects across all core subjects taught in the school/program to detect potential negative spillovers (e.g., did focusing on LLM-PBL reduce performance in anatomy or pharmacology?). This study measured critical thinking and the specific course grade but did not assess standardized outcomes across the broader nursing curriculum. Therefore, criterion A is not met.
    • G

      Graduation Tracking

      • Outcomes were measured immediately after the course ended, with no tracking until graduation.
      • "After the entire course was completed, all students filled out questionnaires on critical thinking skills." (p. 217)
      • Relevant Quotes:
        1) "After the entire course was completed, all students filled out questionnaires on critical thinking skills." (p. 217)
        2) "The results indicated a significant improvement... postintervention (Table)." (p. 217)
      • Detailed Analysis: The standard requires tracking participants until graduation to assess long-term impacts. This study measured outcomes immediately upon completion of the 8-week course. A search for follow-up publications by the same authors yielded no results tracking this specific cohort to graduation. Therefore, criterion G is not met.
    • P

      Pre-Registered

      • The paper mentions following CONSORT but does not provide a registration number or evidence of a pre-registered protocol.
      • Relevant Quotes:
        1) "The study report followed the Consolidated Standards of Reporting Trials (CONSORT) criteria..." (p. 217)
        2) "Before the data collection, the protocol underwent a thorough review, and we got ethical approval..." (p. 217)
      • Detailed Analysis: The standard requires the full study protocol to be pre-registered (e.g., on a public registry like ClinicalTrials.gov) before data collection begins. While the authors mention ethical approval and following CONSORT guidelines, they do not cite a specific pre-registration ID number, nor does a search of public registries reveal a pre-registered protocol corresponding to this specific study prior to the stated data collection period. Therefore, criterion P is not met.
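
The twelve verdicts above can be combined into a single level. The sketch below is illustrative only. It assumes the ERCT level is cumulative, meaning a study reaches level n only when all four criteria at every level up to n are met. That rule is an assumption rather than something stated on this page, although it agrees with the score of 0 shown at the top, since C and T are unmet. The names `LEVELS`, `erct_level`, and `verdicts` are hypothetical.

```python
from typing import Dict

# Criteria grouped by level, in the order used in the breakdown above.
LEVELS = [
    ("C", "E", "T", "D"),  # Level 1
    ("S", "I", "Y", "B"),  # Level 2
    ("R", "A", "G", "P"),  # Level 3
]


def erct_level(met: Dict[str, bool]) -> int:
    """Return the highest level whose criteria, and all lower levels' criteria, are met.

    Assumption: levels are cumulative, so a failure at Level 1 caps the
    result at 0 regardless of any Level 2 or Level 3 criteria that are met.
    """
    level = 0
    for tier in LEVELS:
        if all(met.get(criterion, False) for criterion in tier):
            level += 1
        else:
            break
    return level


# Verdicts copied from the breakdown above (True = "met").
verdicts = {
    "C": False, "E": True, "T": False, "D": True,
    "S": False, "I": False, "Y": False, "B": True,
    "R": False, "A": False, "G": False, "P": False,
}

print(erct_level(verdicts))  # 0
```

Under this assumed rule, criterion B being met does not raise the level, because the Level 1 tier is already incomplete.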
