Abstract
Introduction: This research paper presents an empirical investigation of the effectiveness of early coding instruction in improving problem-solving skills and computational thinking (CT) among primary school students. The primary research question was whether a structured six-month coding intervention yields greater cognitive gains in children aged 8–12 than instruction without coding. Methods: The study employed a quasi-experimental pre-test/post-test design with 200 students randomly assigned to an experimental group (n = 100) and a control group (n = 100). The experimental group participated in a 24-week curriculum using Scratch and Python, while the control group followed the standard school curriculum. Paired-sample t-tests, independent-sample t-tests, and Pearson correlation analyses were conducted to assess skill acquisition and practice intensity. Results: The findings indicate statistically significant effects for the experimental group, with problem-solving scores increasing from a mean of 17.8 to 23.5 (p < 0.001) and computational thinking scores increasing from a mean of 20.4 to 30.6 (p < 0.001), reflecting large effect sizes. In contrast, the control group exhibited no significant gains. Conclusion: From a theoretical perspective, the study contributes to the literature by demonstrating that the concrete operational stage of development (ages 7–11) represents a critical window for developing abstract algorithmic thinking through programming. From a practical standpoint, the results provide evidence-based support for integrating coding into primary education not merely as a professional competency, but as a cognitive development opportunity essential for 21st-century digital literacy.
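The abstract reports pre/post group means and "large effect sizes" but no standard deviations or raw data. As a quick illustration of the kind of computation involved, here is a minimal Python sketch of a paired-samples effect size (Cohen's d_z); every score below is hypothetical, and none of the numbers come from the paper.

```python
from statistics import mean, stdev

# Hypothetical pre/post problem-solving scores for ten students.
# Illustrative only: the paper reports group means (17.8 -> 23.5)
# but not the raw data or SDs needed for this calculation.
pre  = [15, 20, 17, 19, 14, 18, 21, 16, 19, 17]
post = [21, 24, 22, 26, 19, 25, 27, 20, 26, 23]

# Per-student gains.
diffs = [b - a for a, b in zip(pre, post)]

# Paired-samples effect size: mean gain divided by the SD of the gains.
cohens_dz = mean(diffs) / stdev(diffs)
print(round(cohens_dz, 2))
```

By the usual convention, values of d above 0.8 are read as large, which is the claim the abstract makes for both outcomes.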
ERCT Criteria Breakdown
-
Level 1 Criteria
-
C
Class-level RCT
- Randomization is described at the student/participant level rather than at the class (or school) level, and no tutoring exception is stated.
- "The randomly selected participants at two groups were allocated at random with the help of a computer-generated randomization sequence in order to maintain allocation concealment." (p. 3)
Relevant Quotes:
1) "with 200 students randomly assigned to an experimental group (n = 100) and a control group (n = 100)." (p. 1)
2) "The randomly selected participants at two groups were allocated at random with the help of a computer-generated randomization sequence in order to maintain allocation concealment." (p. 3)
Detailed Analysis:
Criterion C requires an RCT where the unit of randomization is the intact class (or a stronger unit such as the school), to reduce contamination. A student-level RCT can be acceptable only when the intervention is explicitly one-to-one tutoring/personal teaching.
The paper repeatedly describes random assignment of "students" or "participants" to conditions, without stating that whole classes were randomized (e.g., "classes were randomly assigned") and without stating any tutoring/personal teaching exception. Intervention and control students could therefore plausibly sit in the same classes or be taught by the same teachers, creating a meaningful contamination risk under the ERCT definition.
Final sentence: Criterion C is not met because randomization is described at the student/participant level rather than at the class (or school) level, with no tutoring exception documented.
-
E
Exam-based Assessment
- Outcomes are assessed using a researcher-designed CT test and "Raven Progressive Matrices-style" problem-solving tasks rather than a clearly identified standardized exam-based assessment.
- "Computational thinking skills were measured using a researcher-designed, validated test battery and rubric." (p. 4)
Relevant Quotes:
1) "Computational thinking skills were measured using a researcher-designed, validated test battery and rubric." (p. 4)
2) "General problem-solving skill was assessed with a set of logic puzzles of standardized problems, Raven Progressive Matrices-style pattern problems, and problems involving arithmetic reasoning scaled to the age group (Raven 2000)." (p. 4)
Detailed Analysis:
Criterion E requires standardized exam-based assessments that are not bespoke instruments created for the study.
For computational thinking, the paper explicitly states the main CT measure is "researcher-designed," which does not satisfy the requirement for an external standardized exam.
For problem-solving, the paper describes tasks as "Raven Progressive Matrices-style," which indicates similarity to Raven rather than administration of a clearly identified standardized Raven test under its standard protocol. The paper does not provide a named standardized exam (with edition/administration details) that would qualify as a standardized exam-based outcome.
Final sentence: Criterion E is not met because the paper’s key outcomes are measured using researcher-designed instruments and "style" tasks rather than clearly identified standardized exams.
-
T
Term Duration
- Outcomes are measured at least a term after the intervention begins, with a post-test in Week 25 after a 24-week curriculum.
- "The intervention took 24 weeks. In Week 25, a post-test, same in form and difficulty as the pretest but containing different specific items to avoid memory effects was given." (p. 4)
Relevant Quotes:
1) "The intervention was a structured 24-week curriculum that was conducted as 2 h of weekly sessions amounting to a total of 48 h of instruction." (p. 4)
2) "All the 200 participants were subjected to a pre-test in Week 0, before the intervention started." (p. 4)
3) "The intervention took 24 weeks. In Week 25, a post-test, same in form and difficulty as the pretest but containing different specific items to avoid memory effects was given." (p. 4)
Detailed Analysis:
Criterion T requires that outcomes be measured at least one full academic term after the intervention begins (typically about 3–4 months). The paper describes a 24-week intervention, with a pre-test at Week 0 and a post-test at Week 25.
The roughly 24-week interval from intervention start to outcome measurement (about six months) comfortably exceeds the minimum one-term requirement.
Final sentence: Criterion T is met because outcomes are measured in Week 25 following a 24-week intervention, exceeding one academic term.
-
D
Documented Control Group
- The control condition is described as the regular school curriculum without deliberate coding instruction, and baseline equivalence is documented.
- "Experimental Group (n = 100): In this group, the structured coding education intervention was applied and the Control Group (n = 100) maintained the regular school curriculum, including basic ICT literacy (e.g., typing, word processors) but not any deliberate instruction in coding or algorithms." (p. 3)
Relevant Quotes:
1) "Experimental Group (n = 100): In this group, the structured coding education intervention was applied and the Control Group (n = 100) maintained the regular school curriculum, including basic ICT literacy (e.g., typing, word processors) but not any deliberate instruction in coding or algorithms." (p. 3)
2) "Pre-test comparisons were taken to verify that there were no significant differences at the baseline of the groups." (p. 3)
3) "Table 2 describes the baseline results and pre-test scores of the two groups." (p. 5)
Detailed Analysis:
Criterion D requires a well-documented control group, including a clear description of what the control group received and baseline characteristics/performance for comparability.
The paper clearly specifies the control condition (regular school curriculum with basic ICT literacy, but no deliberate coding or algorithms instruction). It also documents baseline equivalence by stating that pre-test comparisons showed no significant baseline differences and by referring to a baseline table (Table 2).
Final sentence: Criterion D is met because the control condition is clearly described and baseline comparability is documented.
-
Level 2 Criteria
-
S
School-level RCT
- Although multiple schools are mentioned, the paper does not state that schools were randomized to conditions.
- "The population of the study was the students of three urban elementary schools that belong to the urban district and are based in primary school." (p. 3)
Relevant Quotes:
1) "The population of the study was the students of three urban elementary schools that belong to the urban district and are based in primary school." (p. 3)
2) "The randomly selected participants at two groups were allocated at random with the help of a computer-generated randomization sequence in order to maintain allocation concealment." (p. 3)
Detailed Analysis:
Criterion S requires school-level randomization (i.e., entire schools assigned to intervention vs. control).
While the study draws participants from three schools, the randomization described is at the level of "participants," not schools. The paper does not state that schools were allocated as units, does not provide a school-by-condition allocation, and does not describe a school-level randomization procedure.
Final sentence: Criterion S is not met because the paper does not document randomization at the school level.
-
I
Independent Conduct
- The paper does not document independent third-party conduct of the evaluation, and author roles suggest the study was run and analyzed by the same team.
- "XW: Supervision, Writing – review & editing, Software, Writing – original draft, Project administration, Visualization, Methodology." (p. 15)
Relevant Quotes:
1) "Author contributions" (p. 15)
2) "XW: Supervision, Writing – review & editing, Software, Writing – original draft, Project administration, Visualization, Methodology." (p. 15)
3) "FW: Data curation, Writing – review & editing, Writing – original draft, Methodology, Visualization, Project administration, Validation." (p. 15)
4) "JD: Methodology, Investigation, Conceptualization, Writing – review & editing, Writing – original draft." (p. 15)
Detailed Analysis:
Criterion I requires evidence that study conduct and/or evaluation (implementation, measurement, and analysis) was independent from the intervention designers/providers, to reduce bias.
The paper does not include a statement indicating the evaluation was conducted by an external organization or independent assessors. The author contribution statement shows the author team held core roles spanning investigation, methodology, validation, and project administration, which is consistent with the same team designing/implementing and evaluating the intervention.
Final sentence: Criterion I is not met because the paper provides no explicit documentation of independent third-party evaluation.
-
Y
Year Duration
- The study timeline includes a delayed follow-up assessment at Week 36, which is consistent with at least 75% of a typical academic year after the intervention begins.
- "Follow-up (Week 36): CT + Problem-Solving (delayed assessment)" (Figure 1, p. 4)
Relevant Quotes:
1) "Coding Intervention (Weeks 1–24): 2 hours/week; total 48 hours" (Figure 1, p. 4)
2) "Post-test (Week 25): CT + Problem-Solving (computer-based)" (Figure 1, p. 4)
3) "Follow-up (Week 36): CT + Problem-Solving (delayed assessment)" (Figure 1, p. 4)
Detailed Analysis:
Criterion Y requires outcome measurement at least 75% of one academic year after the intervention begins.
The paper’s timeline figure explicitly includes a delayed follow-up assessment at Week 36. From the start of the intervention (Weeks 1–24) to Week 36, the study’s outcome measurement window extends roughly 36 weeks from baseline and about 35 weeks from the start of the intervention, which is consistent with at least 75% of a typical academic year duration.
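The timing arithmetic can be made explicit with a small sketch. The 40-week academic-year length is an assumption (the paper does not state the local year length), chosen as a conservative upper bound.

```python
# Criterion Y timing check. The 40-week academic year is an assumed,
# conservative upper bound; the paper does not state the local year length.
followup_week = 36            # delayed assessment (Figure 1)
intervention_start_week = 1   # coding sessions run Weeks 1-24
weeks_elapsed = followup_week - intervention_start_week  # 35 weeks

academic_year_weeks = 40
fraction_of_year = weeks_elapsed / academic_year_weeks
print(fraction_of_year)       # 0.875, above the 0.75 threshold
```

Even under this longer 40-week year assumption, the Week 36 follow-up clears the 75% threshold with margin.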
Final sentence: Criterion Y is met because the study timeline includes outcome measurement at a delayed follow-up in Week 36.
-
B
Balanced Control Group
- The intervention adds substantial instructional time (48 hours of coding instruction) without a matched active control, and the paper explicitly notes that a passive business-as-usual control cannot separate coding from general enrichment effects.
- "Lastly, a passive control of a business-as-usual was the control group. As much as this makes the absolute benefit of coding, it does not separate the specific benefits of coding and the overall benefits of any serious extra-curricular enrichment (e.g., chess or robotics)." (p. 14)
Relevant Quotes:
1) "The intervention was a structured 24-week curriculum that was conducted as 2 h of weekly sessions amounting to a total of 48 h of instruction." (p. 4)
2) "The control group continued in their normal classes at such periods." (p. 4)
3) "Lastly, a passive control of a business-as-usual was the control group. As much as this makes the absolute benefit of coding, it does not separate the specific benefits of coding and the overall benefits of any serious extra-curricular enrichment (e.g., chess or robotics). Active control groups should be used in future research to isolate the distinct effects of programming better." (p. 14)
Detailed Analysis:
Criterion B compares the nature, quantity, and quality of resources provided to intervention and control conditions. If the intervention adds substantial time/resources, the control should receive comparable educational inputs unless the paper explicitly frames the additional resources themselves as the treatment variable.
Here, the intervention clearly adds a large time/resource input: 2 hours per week over 24 weeks (48 total hours). The control group is described as continuing normal classes during the same periods, i.e., a passive business-as-usual control without a matched alternative enrichment activity of comparable time and structure.
The paper itself explicitly recognizes this as a limitation, stating the passive control does not separate the specific benefit of coding from the general benefit of extra enrichment. This is strong evidence of unbalanced resources in the sense required by Criterion B.
Final sentence: Criterion B is not met because the intervention adds substantial instructional time without a matched active control, and the paper explicitly acknowledges this confound.
-
Level 3 Criteria
-
R
Reproduced
- The paper does not document an independent peer-reviewed replication of this specific study, and no such replication was identified via targeted searches by DOI/title at the time of this ERCT check.
- "There is need to replicate with various demographic populations." (p. 13)
Relevant Quotes:
1) "There is need to replicate with various demographic populations." (p. 13)
2) "There is need to conduct multi-site randomized controlled trials with active control groups (e.g., comparison of coding with chess or advanced mathematics) to isolate the specific cognitive mechanisms unique to programming." (p. 14)
Detailed Analysis:
Criterion R requires independent replication by other researchers (a different team, in a different context) in a peer-reviewed outlet.
The paper does not present itself as a replication of a prior study nor does it cite a peer-reviewed independent replication of its own design. Instead, it explicitly frames replication and multi-site RCTs as future needs.
Additionally, targeted searches by DOI and full title at the time of this ERCT check did not identify an independent peer-reviewed replication study focused on reproducing this specific trial.
Final sentence: Criterion R is not met because no independent replication is documented in the paper or identifiable at the time of review.
-
A
All-subject Exams
- Criterion E is not met, so the requirement for all-subject standardized exams is not met.
Relevant Quotes:
1) "Computational thinking skills were measured using a researcher-designed, validated test battery and rubric." (p. 4)
Detailed Analysis:
Criterion A requires standardized exam-based assessment across all main subjects and depends on Criterion E being met.
Because the paper’s outcomes rely on a researcher-designed CT instrument (and problem-solving tasks described as "Raven Progressive Matrices-style" rather than a clearly named standardized exam), Criterion E is not satisfied. Therefore, by the ERCT rule, Criterion A cannot be satisfied.
Final sentence: Criterion A is not met because Criterion E is not met.
-
G
Graduation Tracking
- The paper does not report tracking participants to graduation, and no follow-up paper by the same authors reporting graduation outcomes was identified at the time of this ERCT check.
- "Second, the 6-month period is long, although not as long as other studies, and it may not reflect the long-term retention of these skills." (p. 13)
Relevant Quotes:
1) "Second, the 6-month period is long, although not as long as other studies, and it may not reflect the long-term retention of these skills." (p. 13)
2) "Longitudinal studies done in future should evaluate whether such gains are sustained in 1–2 years without further teaching." (p. 13)
Detailed Analysis:
Criterion G requires that the study follow and track participants until graduation (from the relevant educational stage) and report graduation-related outcomes.
This paper reports short-to-medium-term assessments (including a delayed assessment in its timeline), but it does not describe collecting or analyzing graduation outcomes. The authors explicitly note the limitation that the study may not reflect long-term retention and call for future longitudinal work, which is consistent with the absence of graduation tracking in the present report.
An internet search was performed for follow-up publications by the same author team that might track the same cohort to graduation, but no such graduation-tracking follow-up paper was identified at the time of this ERCT check.
Final sentence: Criterion G is not met because graduation tracking is not reported in this paper and no graduation-tracking follow-up was identified at the time of review.
-
P
Pre-Registered
- The paper provides no registry name/ID or dated statement showing pre-registration prior to data collection.
Relevant Quotes:
(No relevant quote about pre-registration, a registry identifier, or pre-registration timing was found in the paper.)
Detailed Analysis:
Criterion P requires quoted evidence that the study protocol was publicly pre-registered before data collection began (including a registry/platform and an ID, with timing that precedes data collection).
The paper includes standard end-matter sections (data availability, funding, conflict of interest, correction note, etc.), but it does not include a registration number, a registry link, or a statement that the protocol/analysis plan was pre-registered before data collection.
Final sentence: Criterion P is not met because pre-registration is not documented with a registry identifier and date preceding data collection.