The role of working memory for learning with context-personalized tasks in elementary school

Ann-Kathrin Laufs, André Meyer, Maleika Krüger, and Sebastian Kempert

Published:
ERCT Check Date: 2026-03-13
DOI: 10.3389/fpsyg.2026.1671810
  • science
  • K12
  • EU
0
  • C

    Randomization was conducted at the small-group level rather than at the class (or higher) level, and by the authors' own account it was not fully rigorous.

    "we treated the group as the unit of randomization."

  • E

    The primary educational outcome (comprehension of the control-of-variables strategy, CVS) was measured using a research instrument rather than a widely recognized standardized exam.

    "we employed an instrument developed by Edelsbrunner et al. (2018)"

  • T

    The main outcome was measured roughly 10 weeks after the intervention began (an eight-week session period plus a post-test two weeks later), not at least one academic term after the start.

    "The study consisted of six sessions, each lasting 50 min, conducted over an eight-week period."

  • D

    The control condition is clearly described, and baseline/descriptive information by condition is reported.

    "In the control condition, magnetism served as the task context which may be regarded as a standard context in German early science classes..."

  • S

    The study took place in multiple schools, but randomization was not conducted at the school level.

    "we treated the group as the unit of randomization."

  • I

    The paper does not clearly document that an independent third-party evaluation team conducted the study and analyses.

    "Trained test administrators, who were teacher trainees or psychology students, facilitated the sessions during regular school hours."

  • Y

    Outcomes were measured about 10 weeks after the intervention began, far short of 75% of an academic year, and criterion T is not met.

    "We assessed CVS comprehension in a post-test 2 weeks later (Edelsbrunner et al., 2018)."

  • B

    The intervention and control conditions appear similarly structured in time and materials, differing mainly in the contextual embedding of otherwise identical tasks.

    "The tasks assigned to the participants in the personalized and control conditions differed only with respect to their thematic embedding; the underlying task and sentence structure were identical:"

  • R

    No independent replication of this specific study was found in the paper or via internet searching as of the ERCT check date.

  • A

    Criterion E is not met, and the study does not assess all core subjects using standardized exams.

    "At the end of both meetings, we surveyed the situational interest in the task context and the learning content CVS and measured reading comprehension (ELFE II, Lenhard et al., 2018)."

  • G

    The study does not track students until graduation, and criterion Y is not met.

  • P

    No pre-registration (registry, ID/link, and pre-data-collection timing) is reported in the paper, and no corresponding entry was found via internet searching.
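The timing judgments for criteria T and Y come down to simple week counting. Below is a minimal Python sketch of that check. It assumes an academic term of at least about 3 months, the lower end of the range given under criterion T, and a hypothetical 38-week academic year. Both constants, especially the year length, are illustrative assumptions and are not figures from the paper or from ERCT.

```python
# Rough timing check for ERCT criteria T and Y.
# Assumptions (illustrative, not from the paper): a term lasts at least ~3 months,
# and an academic year has ~38 instructional weeks (hypothetical).
WEEKS_PER_MONTH = 52 / 12          # average number of weeks in a calendar month
TERM_WEEKS = 3 * WEEKS_PER_MONTH   # lower bound of "~3 to 4 months"
YEAR_WEEKS = 38                    # hypothetical school-year length
Y_FRACTION = 0.75                  # criterion Y: at least 75% of an academic year

# Reported timing: eight weeks of sessions, then a post-test two weeks later.
intervention_weeks = 8
posttest_delay_weeks = 2
elapsed_weeks = intervention_weeks + posttest_delay_weeks  # start -> post-test

# Y can only be met if T is met (ERCT dependency), and it needs the longer window.
meets_t = elapsed_weeks >= TERM_WEEKS
meets_y = meets_t and elapsed_weeks >= Y_FRACTION * YEAR_WEEKS

print(f"elapsed: {elapsed_weeks} weeks")
print(f"term threshold: {TERM_WEEKS:.1f} weeks -> T met: {meets_t}")
print(f"year threshold: {Y_FRACTION * YEAR_WEEKS:.1f} weeks -> Y met: {meets_y}")
```

Changing `YEAR_WEEKS` anywhere between roughly 36 and 40 does not change the conclusion. A 10-week window stays well below 75% of any plausible school year.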

Abstract

Context personalization is an instructional approach aimed at enhancing students’ engagement and cognitive processing by embedding learning content in familiar contexts. Numerous studies explore the benefits of personalized tasks for learning, but few empirically examine cognitive mechanisms underlying the effects of context personalization. In a cluster-randomized control trial with N = 156 elementary school students, we investigated (1) whether context personalization leads to an increased interest in the learning content. Furthermore, we examined (2) the role of working memory for learning and (3) whether the assumed effect of working memory on students’ learning performance was moderated by the use of context-personalized tasks. The results indicate that context personalization elicits interest in the learning content. In addition, working memory was a significant predictor of student performance across conditions. However, the hypothesized moderating effect of context personalization on the relationship between working memory and student performance was not supported. These results contribute to a more nuanced understanding of the cognitive and motivational effects of context-personalized tasks in elementary science education.


ERCT Criteria Breakdown

  • Level 1 Criteria

    • C

      Class-level RCT

      • Randomization was conducted at the small-group level rather than at the class (or higher) level, and by the authors' own account it was not fully rigorous.
      • "we treated the group as the unit of randomization."
      • Relevant Quotes:
        1) "Based on their interest ratings, we then assigned students to small groups of four on average." (p. 6)
        2) "Each group was randomly assigned to one of the conditions during the intervention: Groups in the personalized condition received tasks with a context based on their individual interests (e.g., soccer)." (p. 6)
        3) "Although we tried to assign students to the conditions randomly, rigorous randomization was not possible for practical and organizational reasons (e.g., the existence of remedial school classes)." (p. 6)
        4) "Given the cluster-randomized design of the study, in which students were assigned to small groups based on their interest ratings and subsequently exposed to either a personalized learning intervention or a control condition, we treated the group as the unit of randomization." (p. 9)
      • Detailed Analysis: Criterion C requires random assignment at the classroom level (or higher) to reduce within-class contamination, except in one-to-one tutoring or personal-teaching designs. This study explicitly formed "small groups" and randomized at the group level, not at the class level. The authors also acknowledge that "rigorous randomization was not possible" for practical and organizational reasons. Under ERCT, group-level assignment within schools is weaker than class-level randomization and can permit spillovers within classes and schools. Criterion C is not met because assignment was at the small-group level (not the class or school level) and was not fully rigorous.
    • E

      Exam-based Assessment

      • The primary educational outcome (CVS comprehension) was measured using a research instrument rather than a widely recognized standardized exam.
      • "we employed an instrument developed by Edelsbrunner et al. (2018)"
      • Relevant Quotes:
        1) "We assessed CVS comprehension in a post-test 2 weeks later (Edelsbrunner et al., 2018)." (p. 6)
        2) "For the assessment of CVS comprehension at the two points of measurement (pre-test and post-test), we employed an instrument developed by Edelsbrunner et al. (2018) that consists of a set of established tasks specifically designed to evaluate understanding of the CVS." (p. 7)
        3) "At the end of both meetings, we surveyed the situational interest in the task context and the learning content CVS and measured reading comprehension (ELFE II, Lenhard et al., 2018)." (p. 6)
      • Detailed Analysis: Criterion E requires standardized exam-based assessment for the main educational outcome(s). This means widely recognized standardized exams (for example, state or national standardized achievement tests), not outcome measures that researchers selected or developed to align with the intervention content. The key outcome here is CVS comprehension. The paper reports that this outcome was measured using an instrument "developed by Edelsbrunner et al. (2018)" (with a shortened item set), which is not described as a national or state standardized exam. The study also used ELFE II, but it measured reading comprehension as a control variable, not the primary outcome targeted by the CVS intervention. Criterion E is not met because the main educational outcome is assessed with a research CVS instrument rather than a standardized exam.
    • T

      Term Duration

      • The main outcome was measured roughly 10 weeks after the intervention began (an eight-week session period plus a post-test two weeks later), not at least one academic term after the start.
      • "The study consisted of six sessions, each lasting 50 min, conducted over an eight-week period."
      • Relevant Quotes:
        1) "The study consisted of six sessions, each lasting 50 min, conducted over an eight-week period." (p. 5)
        2) "We assessed CVS comprehension in a post-test 2 weeks later (Edelsbrunner et al., 2018)." (p. 6)
      • Detailed Analysis: Criterion T requires outcomes to be measured at least one full academic term after the intervention begins (typically about 3 to 4 months). The paper describes an eight-week period for the sessions and indicates that CVS comprehension was assessed in a "post-test 2 weeks later." Even when timing is measured generously, from intervention start to post-test, this implies roughly 10 weeks, which is shorter than a typical academic term. The paper also does not present this period as a term-length follow-up. Criterion T is not met because the window from intervention start to outcome measurement is roughly 10 weeks (8 weeks of sessions plus 2 weeks to the post-test), which is shorter than one academic term.
    • D

      Documented Control Group

      • The control condition is clearly described, and baseline/descriptive information by condition is reported.
      • "In the control condition, magnetism served as the task context which may be regarded as a standard context in German early science classes..."
      • Relevant Quotes:
        1) "In the control condition, magnetism served as the task context which may be regarded as a standard context in German early science classes (see Berlin Senate Department for Education, Youth and Family and Ministry of Education, Youth and Sports of the State of Brandenburg, 2015)." (p. 6)
        2) "The result was a total of 28 groups in the personalized condition (task contexts: playing soccer, reading, swimming, riding horses, doing gymnastics, bicycling) and 13 groups in the control condition with magnetism as a task context." (p. 6)
        3) "This resulted in a final sample of n = 156 students for the present analysis (personalized condition: 100, control condition: 56)." (p. 6)
        4) "Table 4 shows the mean scores and standard deviations for the pre-test on CVS, the z-standardized measure for working memory components, the standardized t-scores for cognitive abilities, and reading comprehension by condition." (p. 9)
      • Detailed Analysis: Criterion D requires that the control group be well documented, including what it received and enough baseline/descriptive information to support comparisons. The paper clearly defines the control condition (magnetism context) and reports group counts and analyzed sample sizes per condition. It also reports baseline/descriptive statistics by condition (Table 4). This is sufficient documentation of the control group for ERCT purposes. Criterion D is met because the control condition is explicitly described and baseline/descriptive information is reported by condition.
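Below is a minimal Python sketch of why criterion C separates group-level from class-level assignment. It uses the reported counts (28 personalized and 13 control groups; 100 + 56 = 156 analyzed students). The mapping of groups onto classes is hypothetical, because the paper does not report it: `N_CLASSES` and the round-robin allocation are assumptions. When whole groups are randomized, a single class can contain groups from both conditions, which is the within-class spillover risk described under criterion C. Class-level randomization rules that out by construction.

```python
import random
from collections import defaultdict

# Reported counts: 28 personalized and 13 control groups (41 in total).
N_PERSONALIZED, N_CONTROL = 28, 13
N_GROUPS = N_PERSONALIZED + N_CONTROL

# Hypothetical class structure (not from the paper): groups spread round-robin over 10 classes.
N_CLASSES = 10
group_class = {g: g % N_CLASSES for g in range(N_GROUPS)}


def assign_groups(seed: int) -> dict[int, str]:
    """Group-level randomization: each small group is the unit of assignment."""
    rng = random.Random(seed)
    groups = list(range(N_GROUPS))
    rng.shuffle(groups)
    treated = set(groups[:N_PERSONALIZED])
    return {g: "personalized" if g in treated else "control" for g in range(N_GROUPS)}


def assign_classes(seed: int) -> dict[int, str]:
    """Class-level alternative: randomize classes; every group inherits its class's condition."""
    rng = random.Random(seed)
    classes = list(range(N_CLASSES))
    rng.shuffle(classes)
    treated = set(classes[: N_CLASSES // 2])
    return {g: "personalized" if group_class[g] in treated else "control" for g in range(N_GROUPS)}


def mixed_classes(assignment: dict[int, str]) -> int:
    """Count classes that contain groups from both conditions (possible contamination)."""
    conditions = defaultdict(set)
    for g, cond in assignment.items():
        conditions[group_class[g]].add(cond)
    return sum(1 for conds in conditions.values() if len(conds) == 2)


print("group-level: mixed classes =", mixed_classes(assign_groups(seed=1)), "of", N_CLASSES)
print("class-level: mixed classes =", mixed_classes(assign_classes(seed=1)), "of", N_CLASSES)

# Reported analytic sample spread over the 41 groups (a rough check of "groups of four on average").
print(f"mean analyzed students per group: {(100 + 56) / N_GROUPS:.2f}")
```

The per-group mean uses the final analytic sample of 156, so it may differ somewhat from group sizes at the time of assignment.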
  • Level 2 Criteria

    • S

      School-level RCT

      • The study took place in multiple schools, but randomization was not conducted at the school level.
      • "we treated the group as the unit of randomization."
      • Relevant Quotes:
        1) "We conducted cluster-randomized trial of a personalized learning intervention with a pre- and post-test at six elementary schools located in urban and suburban regions of Germany." (p. 5)
        2) "Given the cluster-randomized design of the study, in which students were assigned to small groups based on their interest ratings and subsequently exposed to either a personalized learning intervention or a control condition, we treated the group as the unit of randomization." (p. 9)
      • Detailed Analysis: Criterion S requires that schools (or equivalent sites) be the unit of random assignment. Although the study was conducted "at six elementary schools," the authors specify that they "treated the group as the unit of randomization," so randomization was not at the school level. Criterion S is not met because assignment was conducted at the small-group level, not the school level.
    • I

      Independent Conduct

      • The paper does not clearly document that an independent third-party evaluation team conducted the study and analyses.
      • "Trained test administrators, who were teacher trainees or psychology students, facilitated the sessions during regular school hours."
      • Relevant Quotes:
        1) "Trained test administrators, who were teacher trainees or psychology students, facilitated the sessions during regular school hours." (p. 5)
        2) "A-KL: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Writing – original draft, Writing – review & editing." (p. 13)
      • Detailed Analysis: Criterion I requires that the evaluation be conducted independently of the intervention's designers and authors, to reduce the risk of bias in implementation, measurement, analysis, and reporting. The paper states that trained test administrators facilitated the sessions but does not describe them as belonging to an evaluation organization independent of the research team. Moreover, the author contributions indicate that the authors (notably A-KL) performed core evaluation functions, including investigation, data curation, and formal analysis. Criterion I is not met because independent third-party conduct of the evaluation is not clearly documented and core evaluation work is attributed to the authors.
    • Y

      Year Duration

      • Outcomes were measured about 10 weeks after the intervention began, far short of 75% of an academic year, and criterion T is not met.
      • "We assessed CVS comprehension in a post-test 2 weeks later (Edelsbrunner et al., 2018)."
      • Relevant Quotes:
        1) "The study consisted of six sessions, each lasting 50 min, conducted over an eight-week period." (p. 5)
        2) "We assessed CVS comprehension in a post-test 2 weeks later (Edelsbrunner et al., 2018)." (p. 6)
      • Detailed Analysis: Criterion Y requires outcomes to be measured at least 75% of an academic year after the intervention begins. The paper describes an eight-week session period and a post-test two weeks later, far short of a follow-up on the scale of a school year. In addition, under the ERCT dependency rule, criterion Y cannot be met if criterion T is not met. Criterion Y is not met because follow-up spans weeks rather than most of an academic year, and criterion T is not met.
    • B

      Balanced Control Group

      • The intervention and control conditions appear similarly structured in time and materials, differing mainly in the contextual embedding of otherwise identical tasks.
      • "The tasks assigned to the participants in the personalized and control conditions differed only with respect to their thematic embedding; the underlying task and sentence structure were identical:"
      • Relevant Quotes:
        1) "The tasks assigned to the participants in the personalized and control conditions differed only with respect to their thematic embedding; the underlying task and sentence structure were identical:" (p. 6)
        2) "All sessions consisted of the following phases: (1) general introduction, (2) introduction to the task with a worksheet, (3) explanation of comparisons (in general), (4) explanation of “fair” comparisons, (5) application and practice of “fair” comparisons together on Example 1, (6) application and practice together on Example 2, (7) review of results." (p. 6)
        3) "Phase 1 started with a segment welcoming the students. In addition, for the personalized condition, there was a short conversation that addressed students’ interests from the interest questionnaire." (p. 6)
      • Detailed Analysis: Criterion B asks whether the control condition provides a comparable substitute for the intervention’s inputs (time, attention, materials), unless extra resources are explicitly the treatment variable. The paper states that tasks differed only in thematic embedding, with identical underlying structure. It also lists a shared session-phase structure, which supports equivalence of instructional time and materials. The personalized condition included an additional "short conversation" about interests. As described, this appears minor and sits within the same overall session structure rather than reflecting a major imbalance in time or budget. Criterion B is met because the paper describes the two conditions as structurally identical in tasks and session phases, differing only in contextual embedding (plus a brief interest conversation in the personalized condition).
  • Level 3 Criteria

    • R

      Reproduced

      • No independent replication of this specific study was found in the paper or via internet searching as of the ERCT check date.
      • Relevant Quotes:
        1) "Numerous studies explore the benefits of personalized tasks for learning, but few empirically examine cognitive mechanisms underlying the effects of context personalization." (p. 1)
      • Detailed Analysis: Criterion R requires that the study be independently replicated by a different research team, in a different context, and published in a peer-reviewed outlet. The paper does not describe itself as a replication of a prior trial. Nor does it cite a publication that independently replicates this specific design and its results. Internet searches by title, DOI, and author names did not identify a peer-reviewed independent replication of this study as of 2026-03-13. Criterion R is not met because no independent replication of this specific study was found or reported.
    • A

      All-subject Exams

      • Criterion E is not met, and the study does not assess all core subjects using standardized exams.
      • "At the end of both meetings, we surveyed the situational interest in the task context and the learning content CVS and measured reading comprehension (ELFE II, Lenhard et al., 2018)."
      • Relevant Quotes:
        1) "At the end of both meetings, we surveyed the situational interest in the task context and the learning content CVS and measured reading comprehension (ELFE II, Lenhard et al., 2018)." (p. 6)
        2) "For the assessment of CVS comprehension at the two points of measurement (pre-test and post-test), we employed an instrument developed by Edelsbrunner et al. (2018) that consists of a set of established tasks specifically designed to evaluate understanding of the CVS." (p. 7)
      • Detailed Analysis: Criterion A requires standardized exam-based assessment across all main subjects, and it depends on criterion E being met. Here, criterion E is not met because the main outcome (CVS comprehension) is assessed with a research instrument rather than a standardized exam. In addition, the paper does not report standardized exam outcomes across core subjects. It focuses on CVS comprehension and measures reading comprehension (ELFE II) only as a control variable, not as part of comprehensive all-subject achievement testing. Criterion A is not met because criterion E is not met and the study does not assess all core subjects with standardized exams.
    • G

      Graduation Tracking

      • The study does not track students until graduation, and criterion Y is not met.
      • Relevant Quotes:
        1) "We assessed CVS comprehension in a post-test 2 weeks later (Edelsbrunner et al., 2018)." (p. 6)
        2) "10 Follow-up-test CVS" (p. 10)
      • Detailed Analysis: Criterion G requires tracking participants until graduation from the relevant educational stage. Under the ERCT dependency rule, criterion G cannot be met if criterion Y is not met. The paper describes short-term measurement, including a post-test two weeks later, and does not describe tracking to any graduation milestone. The correlation table includes a "Follow-up-test CVS" variable, but the paper does not report follow-up timing approaching year-scale tracking. In any case, that follow-up is not described as tracking to graduation. Criterion G is not met because the study does not report graduation tracking and criterion Y is not met.
    • P

      Pre-Registered

      • No pre-registration (registry, ID/link, and pre-data-collection timing) is reported in the paper, and no corresponding entry was found via internet searching.
      • Relevant Quotes:
        1) "The Berlin Senate Department for Education, Youth and Family and the Ministry of Education Youth and Sports of the State of Brandenburg evaluated our study design and approved all instruments used in this study." (p. 5)
      • Detailed Analysis: Criterion P requires a publicly accessible, pre-registered protocol with evidence that it was registered before data collection began. The paper reports that the education authorities approved the study design and instruments. It does not report a trial registration, registry name, registration ID, or registration date. Searches using the article title, DOI, and author names did not locate a public pre-registration record for this study as of 2026-03-13. Criterion P is not met because no pre-registration record (ID/link/date before data collection) is reported or found.
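The breakdown applies three dependency rules stated in the criterion texts. Y cannot be met unless T is met, G cannot be met unless Y is met, and A cannot be met unless E is met. Below is a minimal Python sketch of how those rules propagate, using the per-criterion verdicts reported in this breakdown. The data layout and the function name `apply_dependencies` are illustrative only and are not part of any official ERCT tool. The sketch encodes only the dependencies mentioned on this page and makes no claim about how ERCT computes an overall level.

```python
# Level grouping as shown on this page.
LEVELS = {
    1: ["C", "E", "T", "D"],
    2: ["S", "I", "Y", "B"],
    3: ["R", "A", "G", "P"],
}

# Per-criterion verdicts from the breakdown (True = met).
own_verdict = {
    "C": False, "E": False, "T": False, "D": True,
    "S": False, "I": False, "Y": False, "B": True,
    "R": False, "A": False, "G": False, "P": False,
}

# Dependencies stated on this page: criterion -> prerequisite criterion.
DEPENDS_ON = {"Y": "T", "G": "Y", "A": "E"}


def apply_dependencies(verdicts: dict[str, bool]) -> dict[str, bool]:
    """A criterion counts as met only if it and its whole prerequisite chain are met."""
    def met(c: str) -> bool:
        prereq = DEPENDS_ON.get(c)
        return verdicts[c] and (prereq is None or met(prereq))
    return {c: met(c) for c in verdicts}


final = apply_dependencies(own_verdict)
for level, criteria in LEVELS.items():
    print(f"Level {level}:", ", ".join(f"{c}={'met' if final[c] else 'not met'}" for c in criteria))

# Hypothetical what-if: even if G were judged met on its own terms,
# the unmet chain T -> Y -> G would still block it.
print("G with own verdict forced to True:", apply_dependencies(dict(own_verdict, G=True))["G"])
```

Using recursion lets a chain such as T -> Y -> G propagate without listing transitive rules separately.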
