Impact of artificial intelligence on task performance and perceived task load: a pragmatic randomized experiment

Ghalia Y. Bhadila, Dania Bahdila, Nujud O. Saber, and Dana A. Alyafi

Published:
ERCT Check Date:
DOI: 10.3389/feduc.2026.1754136
  • science
  • higher education
  • Asia
  • EdTech app
  • C

    Randomization was at the individual participant level (dental interns), not at the class (or stronger) level, and no tutoring exception applies.

    "For multivariable analysis, the unit of analysis was the participant."

  • E

    The outcome was a researcher-developed 15-question MCQ quiz rather than a widely recognized standardized exam.

    "Standardized multiple-choice questions (MCQs) based on evidence-based guidelines were developed to assess task performance and clinical decision-making."

  • T

    Outcomes were measured immediately after a brief 15-minute quiz, not at least one academic term after the intervention began.

    "Immediately after task completion, both groups completed the NASA-TLX survey to assess their perceived task load..."

  • D

    The control group is clearly defined (baseline knowledge, no AI), its size is reported, and its conditions are described alongside the intervention group and outcome tables.

    "Participants were randomized into AI-assisted (n = 67, 50.8%) and baseline knowledge (n = 65, 49.2%) groups."

  • S

    Randomization occurred among individual interns within one institution, not among schools (or equivalent educational sites).

    "Simple randomization was performed using a 1:1 allocation ratio generated in Microsoft Excel."

  • I

    The paper does not document an independent third-party evaluation; instead, the study team designed the assessment and ran the study.

    "Standardized multiple-choice questions (MCQs) based on evidence-based guidelines were developed to assess task performance and clinical decision-making."

  • Y

    The study does not track outcomes for 75% of an academic year and does not provide term-length follow-up.

    "The maximum time allowed to solve the 15-question test was 15 min."

  • B

    Although the AI-assisted group received additional resources (internet and AI tools), these resources are the explicit treatment being tested against baseline practice, with time-on-task held constant.

    "Therefore, the control group completed a paper-based quiz reflecting the standard classroom practice at our institution, whereas the intervention group completed the same quiz on Google Forms with access to AI tools."

  • R

    No independent replication of this specific study was found, and the paper only cites other related AI education RCTs rather than replications of this experiment.

    "These findings align with those of a previous randomized clinical trial (RCT) conducted at Georgetown University School of Medicine (Kalam et al., 2025)..."

  • A

    This criterion is not met because exam-based assessment (E) is not met; additionally, outcomes focus on a single case-based quiz rather than standardized exams across all core subjects.

    "The study outcomes are: 1. Primary outcome: Task performance score, reported as the total number of correct answers out of 15 multiple-choice questions (score range: 0–15)..."

  • G

    The study does not track participants until graduation and, since year duration (Y) is not met, graduation tracking (G) is not met.

    "Future investigations should include longitudinal studies that assess the sustained influence of AI on perceived task load and clinical performance..."

  • P

    The paper provides ethics approval details but does not report a public pre-registration record, registry ID, or a registration date before data collection.

Abstract

Introduction: This study aimed to assess the impact of artificial intelligence (AI) assistance on immediate task performance and evaluate perceived task load and AI acceptance among dental interns in an educational setting. Methods: A pragmatic experiment was conducted among 132 dental interns during the 2024–2025 academic year. Participants were randomly allocated to either an AI-assisted group (n = 67) or a baseline knowledge control group (n = 65) to complete a 15-question quiz based on pediatric orthodontic cases. Perceived task load was measured using the National Aeronautics and Space Administration Task Load Index. AI acceptance was assessed using the Technology Acceptance Model (TAM). Task performance (quiz scores), task load, and AI acceptance were analyzed using Wilcoxon rank-sum tests and an adjusted generalized regression model. Results: The AI-assisted group achieved higher task performance scores (median, 13 vs. 11; p < 0.0001) and lower perceived task load scores (median, 21.7 vs. 41.7; p < 0.0001) than the control group. The AI-assisted group had 1.67 times higher odds of answering a question correctly compared to controls in the adjusted model. Responses to the TAM demonstrated high levels of perceived usefulness, perceived ease of use, and behavioral intention (Cronbach’s α = 0.92–0.95). Conclusion: The AI-assisted group demonstrated improved immediate task performance and reduced perceived task load compared to the control group. This study serves as a preliminary step toward understanding how AI tools can support clinical learning and decision-making processes in educational settings.
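The Methods compare groups with Wilcoxon rank-sum tests. As a hedged illustration only, a minimal pure-Python version of the Mann-Whitney U (Wilcoxon rank-sum) statistic can be sketched as follows; the scores below are hypothetical and are not the study's data:

```python
# Minimal sketch of the Mann-Whitney U (Wilcoxon rank-sum) statistic,
# the family of test the paper reports using for group comparisons.
# Data are hypothetical quiz scores (0-15), not the study's data.

def rank_sum_u(a, b):
    """Return the Mann-Whitney U statistic for sample `a` versus `b`."""
    combined = sorted(a + b)
    # Assign midranks: tied values share the average of their 1-based ranks.
    ranks = {}
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j] == combined[i]:
            j += 1
        ranks[combined[i]] = (i + 1 + j) / 2  # average of ranks i+1 .. j
        i = j
    r_a = sum(ranks[x] for x in a)  # rank sum of sample `a`
    n_a = len(a)
    return r_a - n_a * (n_a + 1) / 2

# Hypothetical scores for an "AI-assisted" and a "control" sample:
u = rank_sum_u([13, 14, 12, 15], [11, 10, 12, 9])
```

In practice one would use a library routine (e.g., a statistics package's rank-sum test) to obtain the p-value; the study additionally reports an adjusted generalized regression model for the odds-ratio estimate.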

Full Article

ERCT Criteria Breakdown

  • Level 1 Criteria

    • C

      Class-level RCT

      • Randomization was at the individual participant level (dental interns), not at the class (or stronger) level, and no tutoring exception applies.
      • "For multivariable analysis, the unit of analysis was the participant."
• Relevant Quotes: 1) "Simple randomization was performed using a 1:1 allocation ratio generated in Microsoft Excel. Equal numbers of printed slips labeled “Group I,” or “Group II” were folded and placed in an opaque container. The slips were mixed thoroughly, and at the time of data collection, the participants were asked to randomly draw one slip each, without replacement." (p. 3) 2) "For multivariable analysis, the unit of analysis was the participant." (p. 4)
      • Detailed Analysis: Criterion C requires randomization at the class level (or stronger school/site level) to reduce cross-group contamination, unless the intervention is explicitly one-to-one tutoring/personal teaching (in which case individual randomization can be acceptable). The paper describes simple randomization where individual "participants" (dental interns) draw slips assigning them to Group I vs. Group II, and it explicitly states that the "unit of analysis was the participant." This establishes an individually randomized experiment, not a class- or school-level cluster RCT. The intervention is AI assistance during a quiz, not tutoring or personal teaching, so the tutoring exception does not apply.
      • Final Summary: Criterion C is not met because allocation is at the participant level rather than the class (or school/site) level, and no tutoring exception applies.
    • E

      Exam-based Assessment

      • The outcome was a researcher-developed 15-question MCQ quiz rather than a widely recognized standardized exam.
      • "Standardized multiple-choice questions (MCQs) based on evidence-based guidelines were developed to assess task performance and clinical decision-making."
• Relevant Quotes: 1) "Participants were randomly allocated to either an AI-assisted group (n = 67) or a baseline knowledge control group (n = 65) to complete a 15-question quiz based on pediatric orthodontic cases." (p. 1) 2) "Standardized multiple-choice questions (MCQs) based on evidence-based guidelines were developed to assess task performance and clinical decision-making." (p. 4) 3) "The study outcomes are: 1. Primary outcome: Task performance score, reported as the total number of correct answers out of 15 multiple-choice questions (score range: 0–15)..." (p. 4)
      • Detailed Analysis: Criterion E requires standardized exam-based assessment that is widely recognized and externally standardized (e.g., national/state exams), rather than an assessment created for the purposes of the study. This study evaluates performance using a 15-question quiz derived from clinical cases, with MCQs explicitly described as being "developed" for the study. Although the authors use the term "Standardized" to describe MCQs, the process described is local test development with content validity review, not use of a known, widely adopted standardized exam.
      • Final Summary: Criterion E is not met because outcomes are measured using a study-developed quiz rather than a widely recognized standardized exam.
    • T

      Term Duration

      • Outcomes were measured immediately after a brief 15-minute quiz, not at least one academic term after the intervention began.
      • "Immediately after task completion, both groups completed the NASA-TLX survey to assess their perceived task load..."
• Relevant Quotes: 1) "Data were collected between May 2025 and June 2025." (p. 2) 2) "The maximum time allowed to solve the 15-question test was 15 min." (p. 3) 3) "Immediately after task completion, both groups completed the NASA-TLX survey to assess their perceived task load, which was completed digitally using Google Forms." (p. 4) 4) "This study aimed to assess the impact of artificial intelligence (AI) assistance on immediate task performance..." (p. 1)
      • Detailed Analysis: Criterion T requires that outcomes be measured at least one full academic term (typically ~3–4 months) after the intervention begins, even if the intervention itself is short. Here, the intervention exposure is limited to a short quiz session (maximum 15 minutes). The paper frames the main endpoint as "immediate task performance" and administers the task-load measure "immediately after task completion." There is no term-length follow-up measurement window.
      • Final Summary: Criterion T is not met because outcomes are measured immediately after a brief quiz rather than at least one academic term after the intervention begins.
    • D

      Documented Control Group

      • The control group is clearly defined (baseline knowledge, no AI), its size is reported, and its conditions are described alongside the intervention group and outcome tables.
      • "Participants were randomized into AI-assisted (n = 67, 50.8%) and baseline knowledge (n = 65, 49.2%) groups."
• Relevant Quotes: 1) "• Group I (AI-assisted): Dental interns were permitted to use AI-assisted tools for task completion. • Group II (Baseline Knowledge): Dental interns completing the task without AI assistance." (p. 2) 2) "Both groups completed the quiz simultaneously in the same, quiet room under identical supervision and time limits to minimize environmental bias." (p. 3) 3) "The final analyzed sample included 132 dental interns, of whom 53.8% were female (n = 71) and 46.2% were male (n = 61). Participants were randomized into AI-assisted (n = 67, 50.8%) and baseline knowledge (n = 65, 49.2%) groups." (p. 5)
      • Detailed Analysis: Criterion D requires that the control group be documented well enough to interpret the comparison, including a clear description of what the control group received and who it included. The paper explicitly defines the control condition (baseline knowledge, no AI assistance), provides group sizes, and describes key features of the testing conditions (same room, supervision, and time limits). The results section reports group counts and basic participant demographics. While the study does not provide a pre-intervention baseline test of the primary outcome (quiz performance), the control group condition itself is clearly described and reported.
      • Final Summary: Criterion D is met because the control group condition and sample size are explicitly defined and described with sufficient detail to interpret comparisons.
  • Level 2 Criteria

    • S

      School-level RCT

      • Randomization occurred among individual interns within one institution, not among schools (or equivalent educational sites).
      • "Simple randomization was performed using a 1:1 allocation ratio generated in Microsoft Excel."
• Relevant Quotes: 1) "Simple randomization was performed using a 1:1 allocation ratio generated in Microsoft Excel." (p. 3) 2) "For multivariable analysis, the unit of analysis was the participant." (p. 4)
      • Detailed Analysis: Criterion S requires school-level (or equivalent site-level) randomization, where whole schools/centers/sites are assigned to conditions. This experiment randomizes individual dental interns and treats the participant as the unit of analysis. The paper does not describe multiple schools/sites being randomized.
      • Final Summary: Criterion S is not met because assignment is at the individual participant level, not at the school/site level.
    • I

      Independent Conduct

      • The paper does not document an independent third-party evaluation; instead, the study team designed the assessment and ran the study.
      • "Standardized multiple-choice questions (MCQs) based on evidence-based guidelines were developed to assess task performance and clinical decision-making."
• Relevant Quotes: 1) "Standardized multiple-choice questions (MCQs) based on evidence-based guidelines were developed to assess task performance and clinical decision-making." (p. 4) 2) "Author contributions GB: ... Conceptualization, Methodology, ... Investigation, ... DB: ... Conceptualization, ... Methodology, Formal analysis. ... NS: ... Methodology, Investigation. ... DA: ... Conceptualization, Investigation, ..." (p. 8)
      • Detailed Analysis: Criterion I requires that the evaluation be conducted independently from the intervention designers/providers (or at least clearly document independent third-party conduct of implementation, measurement, or analysis). The paper indicates that the study materials (MCQs) were developed as part of the study and that the author team performed core study roles including conceptualization, methodology, investigation, and formal analysis. The article does not provide a statement that an external evaluation team independent of the authors conducted the evaluation or analysis.
      • Final Summary: Criterion I is not met because independent third-party conduct is not documented and the study appears to have been conducted and analyzed by the author team.
    • Y

      Year Duration

      • The study does not track outcomes for 75% of an academic year and does not provide term-length follow-up.
      • "The maximum time allowed to solve the 15-question test was 15 min."
• Relevant Quotes: 1) "The maximum time allowed to solve the 15-question test was 15 min." (p. 3) 2) "Immediately after task completion, both groups completed the NASA-TLX survey to assess their perceived task load..." (p. 4) 3) "This study aimed to assess the impact of artificial intelligence (AI) assistance on immediate task performance..." (p. 1)
      • Detailed Analysis: Criterion Y requires outcome measurement at least 75% of one academic year after the intervention begins. This study measures immediate quiz performance and immediate post-task perceived workload, with the quiz itself limited to a 15-minute session. There is no year-long follow-up period.
      • Final Summary: Criterion Y is not met because outcome measurement is immediate and does not approach 75% of an academic year of follow-up.
    • B

      Balanced Control Group

      • Although the AI-assisted group received additional resources (internet and AI tools), these resources are the explicit treatment being tested against baseline practice, with time-on-task held constant.
      • "Therefore, the control group completed a paper-based quiz reflecting the standard classroom practice at our institution, whereas the intervention group completed the same quiz on Google Forms with access to AI tools."
• Relevant Quotes: 1) "Both groups completed the quiz simultaneously in the same, quiet room under identical supervision and time limits to minimize environmental bias." (p. 3) 2) "The maximum time allowed to solve the 15-question test was 15 min." (p. 3) 3) "Therefore, the control group completed a paper-based quiz reflecting the standard classroom practice at our institution, whereas the intervention group completed the same quiz on Google Forms with access to AI tools." (p. 3) 4) "The intervention group had access to the Internet during the quiz but did not receive any new AI instructional content or preparatory material beyond that available to the control group." (p. 3) 5) "Third, although the experiment was designed to reflect real classroom practice, the difference in the intervention group completing an online quiz while the control group completed a paper-based quiz might have influenced the participants performance and perceived task load. These differences could be unrelated to AI assistance such as (ergonomics, navigation speed, or user interface) and are potential confounders." (p. 8)
      • Detailed Analysis: Criterion B checks whether time and resources are balanced between intervention and control, unless the additional resources are explicitly the treatment variable being tested. Extra resources are present: the intervention group has access to the internet and AI tools, and uses Google Forms, whereas the control group uses a paper-based quiz without internet/AI access. However, these added resources (AI/internet access and the means to use them) are the core treatment contrast of the study ("AI assistance" vs. "baseline knowledge" standard quiz practice). This matches the ERCT exception: when additional resources are the intended treatment, the control group can remain business-as-usual. The study also holds constant key inputs that would otherwise confound "time on education," including simultaneous testing, the same room, identical supervision, and the same time limit. The authors acknowledge as a limitation that the online vs. paper mode may introduce confounding differences (e.g., interface/ergonomics), but this does not negate that the additional resources are explicitly the intervention being tested and that time-on-task is held constant.
      • Final Summary: Criterion B is met because the additional resources (AI/internet access) are the explicit treatment being evaluated against business-as-usual, with time-on-task held constant.
  • Level 3 Criteria

    • R

      Reproduced

      • No independent replication of this specific study was found, and the paper only cites other related AI education RCTs rather than replications of this experiment.
      • "These findings align with those of a previous randomized clinical trial (RCT) conducted at Georgetown University School of Medicine (Kalam et al., 2025)..."
• Relevant Quotes: 1) "These findings align with those of a previous randomized clinical trial (RCT) conducted at Georgetown University School of Medicine (Kalam et al., 2025)..." (p. 7) 2) "Despite this, the heterogeneity in AI interventions used may have limited the reproducibility and interpretability of the study findings." (p. 8)
      • External Search Results (performed 2026-03-04):
        - Searched by DOI, full title, and author combination for independent replications of this exact dental-intern quiz RCT.
        - No peer-reviewed publication was found that explicitly reports an independent replication of this specific study (same study, same cohort, or a clearly stated replication of this exact protocol).
      • Detailed Analysis: Criterion R requires an independent replication of this specific study, conducted by a different research team in a different context, published in a peer-reviewed outlet. The article cites other AI-related RCTs (e.g., medical student and dental student contexts) as related evidence, but these are not described as replications of this exact experiment. Given the very recent publication date (January 2026), and based on external searching, there is no identified independent replication of this exact trial.
      • Final Summary: Criterion R is not met because no independent peer-reviewed replication of this specific study was found.
    • A

      All-subject Exams

      • This criterion is not met because exam-based assessment (E) is not met; additionally, outcomes focus on a single case-based quiz rather than standardized exams across all core subjects.
      • "The study outcomes are: 1. Primary outcome: Task performance score, reported as the total number of correct answers out of 15 multiple-choice questions (score range: 0–15)..."
• Relevant Quotes: 1) "The study outcomes are: 1. Primary outcome: Task performance score, reported as the total number of correct answers out of 15 multiple-choice questions (score range: 0–15)..." (p. 4) 2) "Participants were randomly allocated to either an AI-assisted group (n = 67) or a baseline knowledge control group (n = 65) to complete a 15-question quiz based on pediatric orthodontic cases." (p. 1)
      • Detailed Analysis: Criterion A requires standardized exam-based assessments across all main subjects, and ERCT rules specify that if criterion E is not met, criterion A is not met. This study uses a single study-developed quiz focused on pediatric orthodontic case decision-making, not standardized exams, and it does not assess multiple core subjects.
      • Final Summary: Criterion A is not met because criterion E is not met and the study does not assess all main subjects using standardized exams.
    • G

      Graduation Tracking

      • The study does not track participants until graduation and, since year duration (Y) is not met, graduation tracking (G) is not met.
      • "Future investigations should include longitudinal studies that assess the sustained influence of AI on perceived task load and clinical performance..."
• Relevant Quotes: 1) "This study aimed to assess the impact of artificial intelligence (AI) assistance on immediate task performance..." (p. 1) 2) "Future investigations should include longitudinal studies that assess the sustained influence of AI on perceived task load and clinical performance, its impact on critical thinking, and strategies to balance AI support with independent problem-solving skills over time." (p. 8) 3) "The maximum time allowed to solve the 15-question test was 15 min." (p. 3)
      • External Search Results (performed 2026-03-04):
        - Searched for follow-up publications by the same author team that track this cohort (dental interns) through internship completion and degree award (graduation-relevant endpoints).
        - No follow-up peer-reviewed publication was found that reports tracking this cohort until graduation.
      • Detailed Analysis: Criterion G requires tracking participants until graduation (for this context, a plausible graduation endpoint would be completion of the internship year leading to degree award). ERCT rules also state that if criterion Y is not met, criterion G is not met. The study is explicitly about "immediate task performance" during a brief quiz session and calls for future longitudinal work, indicating that graduation tracking is not part of this study. In addition, external searching did not identify follow-up publications tracking this cohort to graduation.
      • Final Summary: Criterion G is not met because the study measures only immediate outcomes and does not (and, given Y is not met, cannot) document tracking participants until graduation.
    • P

      Pre-Registered

      • The paper provides ethics approval details but does not report a public pre-registration record, registry ID, or a registration date before data collection.
• Relevant Quotes: 1) "The study was approved by the Ethical Research Committee of the Faculty of Dentistry at King Abdulaziz University Dental Hospital (KAUDH), Jeddah, Saudi Arabia (Protocol Code: 28–02-25)." (p. 2) 2) "Data were collected between May 2025 and June 2025." (p. 2)
      • External Search Results (performed 2026-03-04):
        - Searched for a public pre-registration record using the DOI, full title, and author names.
        - Also searched common registries (e.g., OSF and ClinicalTrials.gov via web search) for a matching registration entry.
        - No public pre-registration record (with an ID and a registration date prior to May 2025) was found.
      • Detailed Analysis: Criterion P requires a publicly accessible pre-registered protocol (with a registry ID and a registration date that precedes the start of data collection). The paper documents ethics approval and provides the data collection window (May–June 2025) but does not provide a registration platform, registry ID, or public registration date. External searching did not identify a matching public pre-registration record.
      • Final Summary: Criterion P is not met because no public pre-registration record (ID plus pre-data-collection registration date) is documented or findable for this study.
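As a side note, the 1:1 simple randomization described under criterion C (equal numbers of Excel-generated slips drawn without replacement) can be sketched in Python. This is an illustrative sketch only: the seed and the even split are assumptions, and the published arms were in fact n = 67 and n = 65, so the actual slip counts evidently differed slightly from an exact 66/66 split.

```python
# Hypothetical sketch of 1:1 simple randomization by drawing slips
# without replacement; a seeded shuffle stands in for the opaque-container
# draw described in the paper. Seed and even split are illustrative assumptions.
import random

def allocate(n_participants, seed=2025):
    """Return a shuffled list of group labels, one per participant."""
    half = n_participants // 2
    slips = ["Group I"] * half + ["Group II"] * (n_participants - half)
    rng = random.Random(seed)  # seeded only to make the sketch reproducible
    rng.shuffle(slips)         # shuffling then assigning in order is
    return slips               # equivalent to drawing without replacement

allocation = allocate(132)
```

Because the slips are fixed in number and drawn without replacement, the design guarantees (near-)equal arm sizes, unlike per-participant coin flips.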
