Abstract
Writing from multiple texts is a widespread task in higher education and requires advanced reading, writing and self-regulatory skills. Research shows that learners often struggle to accurately evaluate their performance in such tasks. This study examines whether rubrics can enhance self-assessment accuracy and improve regulatory decisions when university history students write essays from conflicting historical texts. A total of 115 students were randomly assigned to a guided rubric or unguided rubric condition. Participants in the guided rubric condition used assessment criteria, performance levels, and descriptors to assess their essays after reading three texts on a historical controversy, whereas the unguided rubric condition used only criteria and performance levels. Metacomprehension and regulation accuracy were measured. While guided rubrics did not significantly enhance overall metacomprehension accuracy or regulation, the impact of guided rubric use varied as a function of students' essay performance.
ERCT Criteria Breakdown
-
Level 1 Criteria
-
C
Class-level RCT
- Random assignment was at the individual student level (within one course), not at the class (or higher) level, and the intervention is not a one-to-one tutoring exception.
- "Participants were randomly assigned to one of the two experimental conditions." (Section 6.1)
Relevant Quotes:
1) "The study employed a one-factorial between-subjects experimental design with two conditions: guided rubric versus unguided rubric. Participants were randomly assigned to one of the two experimental conditions." (Section 6.1)
2) "Participation was part of an undergraduate history course at the university." (Section 6.1)
Detailed Analysis:
Criterion C requires randomization at the class level (or stronger) to reduce contamination between treatment and control. The paper describes a between-subjects experiment where individual students were "randomly assigned" to guided versus unguided rubric conditions, and participation occurred within an undergraduate history course context. This indicates student-level randomization rather than class- or school-level assignment.
The criterion C exception (student-level randomization is acceptable for one-to-one tutoring/personal teaching interventions) does not apply here because the intervention is a rubric-based self-assessment scaffold, not individualized tutoring.
Final summary sentence: Criterion C is not met because randomization occurred at the individual student level rather than by class (and no tutoring exception applies).
-
E
Exam-based Assessment
- Outcomes were measured with study-specific instruments (expert ratings via a rubric and a custom pre-test), not a widely recognized standardized exam.
- "The pre-test included 14 single-choice questions, each with four answer options, of which one was correct." (Section 6.2.2.1)
Relevant Quotes:
1) "The pre-test included 14 single-choice questions, each with four answer options, of which one was correct." (Section 6.2.2.1)
2) "To determine the accuracy of students' self-assessments regarding the six essay criteria, two experts used the rubric provided to the students in the guided rubric condition to rate the six essay criteria." (Section 6.2.2.2)
Detailed Analysis:
Criterion E requires a standardized exam-based assessment (e.g., state-wide/national standardized achievement tests or widely recognized standardized instruments). In this study, outcomes are assessed using (a) a study-specific pre-test created for the historical topic and (b) expert ratings of essays using the study's rubric. These are not standardized exams in the ERCT sense.
The paper does not name any external standardized achievement test used as the primary outcome measure. Therefore, the assessment approach does not satisfy ERCT criterion E.
Final summary sentence: Criterion E is not met because the study uses study-specific measures rather than a standardized exam-based assessment.
-
T
Term Duration
- The intervention and outcome measurement happened within a single short session (about 60 minutes), far shorter than one academic term.
- "The experiment lasted approximately 60 min (Fig. 2)." (Section 6.2.3)
Relevant Quotes:
1) "The reading time was fixed at 15 min." (Section 6.2.3)
2) "The experiment lasted approximately 60 min (Fig. 2)." (Section 6.2.3)
Detailed Analysis:
Criterion T requires that outcomes are measured at least one full academic term after the intervention begins (i.e., the time from intervention start to outcome measurement must be at least term-length).
The paper describes a single-session procedure with fixed 15-minute reading time and a total experiment duration of approximately 60 minutes, with no delayed follow-up measurement months later. This is far shorter than an academic term.
Final summary sentence: Criterion T is not met because outcomes were measured within an approximately 60-minute session rather than at least one academic term after intervention start.
-
D
Documented Control Group
- The unguided-rubric control condition is clearly described, and baseline and equivalence information for both groups is reported.
- "In contrast, participants in the unguided rubric condition received the same six-point rating scale for each assessment criterion, the same evaluation prompt, and the same opportunity to consult their essays." (Section 6.2.1.3)
Relevant Quotes:
1) "In contrast, participants in the unguided rubric condition received the same six-point rating scale for each assessment criterion, the same evaluation prompt, and the same opportunity to consult their essays. However, they did not receive any performance level descriptors that would help them interpret the meaning of the individual scale points." (Section 6.2.1.3)
2) "Mean scores and standard deviations for both groups on all measures are shown in Table 2." (Section 7.1)
3) "We did not find any statistically significant differences between the groups in terms of age, ... academic semester, ... gender, ... and prior knowledge..." (Section 7.1)
Detailed Analysis:
Criterion D requires a well-documented control group, including what the control condition received and baseline/descriptive information enabling comparison. The paper clearly specifies what the unguided rubric group received (same task, criteria, and performance levels, but without descriptors) and indicates the same prompt and opportunity to consult the essay were provided.
The paper also reports descriptive statistics (Table 2) and explicitly checks randomization equivalence across multiple participant and baseline variables. This constitutes clear control-group documentation.
Final summary sentence: Criterion D is met because the control condition is explicitly described and baseline comparability information is reported.
-
Level 2 Criteria
-
S
School-level RCT
- Randomization occurred among individual students, not among schools (or equivalent institutions/sites) implementing the intervention.
- "Participants were randomly assigned to one of the two experimental conditions." (Section 6.1)
Relevant Quotes:
1) "The study employed a one-factorial between-subjects experimental design with two conditions: guided rubric versus unguided rubric. Participants were randomly assigned to one of the two experimental conditions." (Section 6.1)
2) "Participation was part of an undergraduate history course at the university." (Section 6.1)
Detailed Analysis:
Criterion S requires school-level (or equivalent site-level) randomization. The paper describes a student-level between-subjects design and does not report random assignment of schools, course sections, or other intact institutional units as clusters. The described setting is a university course context with individual assignment to conditions.
Final summary sentence: Criterion S is not met because the unit of randomization is individual students rather than schools (or comparable implementation sites).
-
I
Independent Conduct
- The paper does not document that the trial was conducted and evaluated by an independent third-party team separate from the intervention designers/authors.
- "Corinna Schuster: ... Methodology, Formal analysis, Data curation, Conceptualization." (CRediT authorship contribution statement)
Relevant Quotes:
1) "Data was collected on campus in classrooms." (Section 6.2.3)
2) "Corinna Schuster: ... Methodology, Formal analysis, Data curation, Conceptualization." (CRediT authorship contribution statement)
3) "Marc Stadtler: ... Supervision, ... Methodology, Conceptualization." (CRediT authorship contribution statement)
Detailed Analysis:
Criterion I requires that study conduct and/or evaluation be independent from the intervention designers/authors (e.g., an external evaluation team collecting and analyzing data, or an explicit statement separating designers/providers from data collection and analysis).
The paper does not include a statement that data collection and analysis were conducted by an independent external evaluator. Instead, the author contribution statement indicates authors were responsible for key study design and analysis functions (conceptualization, methodology, and formal analysis). Although the paper mentions "two experts" who rated essays, it does not state these experts were independent from the research team.
Final summary sentence: Criterion I is not met because independent third-party conduct/evaluation is not documented.
-
Y
Year Duration
- No year-long tracking is reported, and because the study is far shorter than a term, it necessarily fails the one-academic-year duration requirement.
- "The experiment lasted approximately 60 min (Fig. 2)." (Section 6.2.3)
Relevant Quotes:
1) "The experiment lasted approximately 60 min (Fig. 2)." (Section 6.2.3)
Detailed Analysis:
Criterion Y requires outcomes to be measured at least 75% of an academic year after the intervention begins. The study is explicitly described as a short, single-session experiment lasting approximately 60 minutes, with no longer-term follow-up.
Additionally, per the ERCT dependency rule, if criterion T is not met then criterion Y is not met. Since this study clearly fails term duration, it cannot satisfy year duration.
Final summary sentence: Criterion Y is not met because the study is a single short session with no year-long tracking (and T is not met).
-
B
Balanced Control Group
- The control condition appears balanced: both groups had the same task and interface, and the only difference was the presence of performance level descriptors. The paper indicates no additional time, budget, or resources for either group.
- "The interface for the unguided rubric was identical, except that the performance level descriptors (shown here in bold) were not included." (Fig. 1 note)
Relevant Quotes:
1) "In contrast, participants in the unguided rubric condition received the same six-point rating scale for each assessment criterion, the same evaluation prompt, and the same opportunity to consult their essays. However, they did not receive any performance level descriptors..." (Section 6.2.1.3)
2) "Note. The interface for the unguided rubric was identical, except that the performance level descriptors (shown here in bold) were not included." (Fig. 1 note)
3) "The students had as much time as needed for their self-assessments." (Section 6.2.3)
Detailed Analysis:
Criterion B compares the nature, quantity, and quality of resources (time, materials, adult support, etc.) provided to intervention and control conditions, and asks whether the control offers a comparable substitute for the intervention's inputs unless the added resources are explicitly the treatment variable.
Here, the intervention is informational scaffolding (rubric descriptors). The paper indicates the unguided interface was otherwise identical, and both groups used the same rating scale, prompt, and opportunity to consult their essays. There is no indication of differential staff support, materials, or allocated instructional time across conditions (and time was not capped differently by group). The difference is the descriptor content, which is integral to the intervention contrast rather than a confounding add-on resource.
Final summary sentence: Criterion B is met because the two conditions are matched on time and materials and differ only in the rubric descriptor content that defines the treatment.
-
Level 3 Criteria
-
R
Reproduced
- No independent, peer-reviewed replication of this specific study was found, and the paper does not report that it has been replicated by an independent external team.
- "A second study (Krebs et al., 2024) replicated this effect..." (Introduction)
Relevant Quotes:
1) "A second study (Krebs et al., 2024) replicated this effect..." (Introduction)
Detailed Analysis:
Criterion R requires that the present study (or its core experimental claim in the same context and design) has been independently replicated by a different research team in a different context and published in a peer-reviewed journal.
The quoted statement concerns replication of a prior effect in the rubric literature (Krebs et al. replicating Krebs et al.), not replication of this specific 2026 study on writing from multiple conflicting historical texts. The paper itself does not claim that an independent external team has replicated this particular experiment.
An internet search for post-publication replications of this specific study did not identify an independent replication report.
Final summary sentence: Criterion R is not met because no independent replication of this specific study was identified or documented.
-
A
All-subject Exams
- Because the study does not use standardized exams (criterion E is not met), it cannot satisfy the all-subject standardized-exams requirement.
- "The pre-test included 14 single-choice questions, each with four answer options, of which one was correct." (Section 6.2.2.1)
Relevant Quotes:
1) "The pre-test included 14 single-choice questions, each with four answer options, of which one was correct." (Section 6.2.2.1)
2) "To determine the accuracy of students' self-assessments regarding the six essay criteria, two experts used the rubric provided to the students in the guided rubric condition to rate the six essay criteria." (Section 6.2.2.2)
Detailed Analysis:
Criterion A requires standardized exam-based assessments across all main subjects taught at that level (or a justified exception), and it depends on criterion E: if E is not met, A is not met.
This study uses a custom pre-test and rubric-based expert ratings focused on a single history essay task. It does not report standardized exams, and it does not assess multiple core subjects using standardized tests.
Final summary sentence: Criterion A is not met because standardized exams are not used (E is not met), so all-subject standardized exam coverage cannot be satisfied.
-
G
Graduation Tracking
- The study does not track participants through graduation, and because criterion Y is not met, criterion G cannot be met under the ERCT dependency rule.
- "Participation was part of an undergraduate history course at the university." (Section 6.1)
Relevant Quotes:
1) "Participation was part of an undergraduate history course at the university." (Section 6.1)
2) "The experiment lasted approximately 60 min (Fig. 2)." (Section 6.2.3)
Detailed Analysis:
Criterion G requires follow-up tracking of participants until graduation from the relevant educational stage. This study is a short, single-session experiment in a university course context and does not describe any longitudinal follow-up or administrative tracking to degree completion.
Additionally, per the ERCT dependency rule, if criterion Y (year duration) is not met then criterion G is not met. Since Y is clearly not met here, G cannot be met regardless of any other considerations.
An internet search for follow-up publications by the same authors that track this cohort to graduation did not identify any such graduation-tracking report.
Final summary sentence: Criterion G is not met because there is no graduation follow-up and Y is not met.
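The dependency rules cited in this breakdown (Y depends on T, G depends on Y, A depends on E) can be illustrated with a short sketch. This is only an illustration of the rules as stated in this review, not official ERCT tooling; the names `DEPENDS_ON` and `apply_dependencies` are hypothetical, and the "standalone" verdicts in the example are invented to show how a prerequisite failure propagates.

```python
# Dependency rules as stated in this review: the key criterion cannot be
# met unless the value criterion (its prerequisite) is also met.
DEPENDS_ON = {"Y": "T", "G": "Y", "A": "E"}


def apply_dependencies(assessed: dict[str, bool]) -> dict[str, bool]:
    """Return final verdicts after enforcing the prerequisite chain.

    `assessed` maps a criterion letter to its verdict when judged in
    isolation. A criterion stays met only if it and every prerequisite
    up the chain are met. A prerequisite missing from `assessed` is
    treated as not met (an assumption of this sketch).
    """

    def resolved(criterion: str) -> bool:
        if not assessed.get(criterion, False):
            return False
        parent = DEPENDS_ON.get(criterion)
        return True if parent is None else resolved(parent)

    return {criterion: resolved(criterion) for criterion in assessed}


# Hypothetical input: suppose G had been judged "met" in isolation.
standalone = {"E": False, "T": False, "Y": False, "A": False, "G": True}
final = apply_dependencies(standalone)
print(final["G"])  # G is overridden because Y (and T) are not met
```

Under these rules, a failure of T alone is enough to rule out both Y and G, which is why the analysis above treats G as not met regardless of any follow-up evidence.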
-
P
Pre-Registered
- The paper provides an OSF link for materials/data/analyses availability but does not state that the study was pre-registered with an ID and date before data collection began.
- "All materials, data, and analyses are available at https://osf.io/vh5jg/." (Footnote in Methods)
Relevant Quotes:
1) "All materials, data, and analyses are available at https://osf.io/vh5jg/." (Footnote in Methods)
2) "The experiment was approved by the Ethics Committee of the Faculty of Philosophy and Educational Science of the Ruhr University Bochum (No. EPE-2023-016)." (Section 6.1)
Detailed Analysis:
Criterion P requires an explicit statement of preregistration, including where it was registered (registry/platform), an identifier, and evidence that registration occurred before data collection began.
The paper provides an OSF link for sharing materials, data, and analyses, which supports openness, but it does not state that the study was preregistered, and it does not provide a preregistration identifier or a preregistration date. Ethics approval is not equivalent to preregistration.
Final summary sentence: Criterion P is not met because the paper does not report a preregistered protocol with an identifier and a pre-data-collection registration date.