Abstract
Introduction: This research paper presents an empirical investigation of the effectiveness of early coding instruction in improving problem-solving skills and computational thinking (CT) among primary school students. The primary research question was to determine whether a structured six-month coding intervention yields greater cognitive gains in children aged 8–12 than instruction without coding. Methods: The study employed a quasi-experimental pre-test/post-test design with 200 students randomly assigned to an experimental group (n = 100) and a control group (n = 100). The experimental group participated in a 24-week curriculum using Scratch and Python, while the control group followed the standard school curriculum. To assess skill acquisition and practice intensity, paired-sample t-tests, independent-sample t-tests, and Pearson correlation analyses were conducted. Results: The findings indicate statistically significant effects for the experimental group, with problem-solving scores increasing to 23.5 from a mean of 17.8 (p < 0.001) and computational thinking scores increasing to 30.6 from a mean of 20.4 (p < 0.001), reflecting large effect sizes. In contrast, the control group exhibited no significant gains. Conclusion: From a theoretical perspective, the study contributes to the literature by demonstrating that the concrete operational stage of development (ages 7–11) represents a critical period for the development of abstract algorithmic thinking through programming. From a practical standpoint, the results provide evidence-based support for integrating coding into primary education not merely as a professional competency, but as a cognitive development opportunity essential for 21st-century digital literacy.
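The abstract's "large effect sizes" claim can be illustrated with a minimal Cohen's d sketch. Only the group means appear in the abstract, so the pooled standard deviations below are hypothetical placeholders, not values from the paper.

```python
def cohens_d(mean_pre: float, mean_post: float, sd_pooled: float) -> float:
    """Standardized mean difference (Cohen's d) for a pre/post gain."""
    return (mean_post - mean_pre) / sd_pooled

# Means as reported in the abstract; the pooled SDs are assumed values.
d_problem_solving = cohens_d(17.8, 23.5, sd_pooled=4.0)
d_computational = cohens_d(20.4, 30.6, sd_pooled=6.0)

# By the usual convention, d >= 0.8 counts as a large effect.
print(d_problem_solving > 0.8, d_computational > 0.8)
```

Under any plausible SD for age-scaled tests of this kind, gains of 5.7 and 10.2 points would indeed correspond to large standardized effects.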
ERCT Criteria Breakdown
-
Level 1 Criteria
-
C
Class-level RCT
- Randomization is described at the student level, not at the class (or school) level, and the intervention does not qualify for the one-to-one tutoring exception.
- "Methods: The study employed a quasi-experimental pre-test/post-test design with 200 students randomly assigned to an experimental group (n = 100) and a control group (n = 100)."
Relevant Quotes:
1) "Methods: The study employed a quasi-experimental pre-test/post-test design with 200 students randomly assigned to an experimental group (n = 100) and a control group (n = 100)." (p. 1)
2) "The randomly selected participants at two groups were allocated at random with the help of a computer-generated randomization sequence in order to maintain allocation concealment." (p. 3)
Detailed Analysis:
Criterion C requires that randomization be conducted at the class level (or stronger, e.g., school level) to reduce contamination between treatment and control conditions, unless the intervention is explicitly one-to-one tutoring/personal teaching.
The paper describes participants as "200 students" who were "randomly assigned" to experimental and control groups, and it further describes allocation using a computer-generated randomization sequence. However, there is no statement that intact classes (or whole schools) were the unit of randomization. The described unit of randomization is the individual student.
The intervention is a structured coding curriculum delivered over many weeks and does not describe one-to-one tutoring, so the tutoring exception does not apply.
Final summary sentence: Criterion C is not met because the paper describes student-level random assignment rather than class- (or school-) level randomization, with no tutoring exception.
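The unit-of-randomization distinction at issue here can be sketched in a few lines: class-level (cluster) randomization shuffles intact classes into arms, whereas the paper describes shuffling individual students. The class roster below is hypothetical.

```python
import random

def randomize_clusters(class_ids, seed=42):
    """Class-level (cluster) randomization: intact classes, not
    individual students, are the unit of assignment."""
    rng = random.Random(seed)
    ids = list(class_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    return {"treatment": ids[:half], "control": ids[half:]}

# Hypothetical roster: two classes in each of three schools.
arms = randomize_clusters(["A1", "A2", "B1", "B2", "C1", "C2"])
```

Because whole classes move together, students in the same classroom cannot end up in different arms, which is what limits contamination between conditions.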
-
E
Exam-based Assessment
- Outcomes are measured with researcher-designed instruments and puzzle sets (including Raven-style items), not a clearly identified widely used standardized exam.
- "Computational thinking skills were measured using a researcher-designed, validated test battery and rubric."
Relevant Quotes:
1) "Computational thinking skills were measured using a researcher-designed, validated test battery and rubric." (p. 4)
2) "General problem-solving skill was assessed with a set of logic puzzles of standardized problems, Raven Progressive Matrices-style pattern problems, and problems involving arithmetic reasoning scaled to the age group (Raven 2000)." (p. 4)
Detailed Analysis:
Criterion E requires standardized exam-based assessments that are widely recognized and not designed/assembled by the researchers specifically for this study.
The paper explicitly states the CT outcome is measured with a "researcher-designed" test battery and rubric. For problem-solving, the paper describes a set of logic puzzles and "Raven Progressive Matrices-style" items, but it does not indicate administration of an official, standardized achievement exam (or a named standardized test used per its standard protocol) as the primary outcome.
Therefore, the outcome measures do not meet the ERCT requirement for standardized, widely recognized exam-based assessment.
Final summary sentence: Criterion E is not met because the paper’s primary outcomes are based on researcher-designed instruments rather than a clearly identified standardized exam.
-
T
Term Duration
- Outcomes are measured at Week 25 after a 24-week intervention, which is longer than a typical academic term.
- "The intervention took 24 weeks. In Week 25, a post-test, same in form and difficulty as the pretest but containing different specific items to avoid memory effects was given."
Relevant Quotes:
1) "The intervention took 24 weeks. In Week 25, a post-test, same in form and difficulty as the pretest but containing different specific items to avoid memory effects was given." (p. 4)
2) "Figure 1 provides the overview of the allocation of 200 eligible students to a coding intervention or a standard-curriculum control condition in a random fashion. It explains when to be assessed (Week 0 pre-test and Week 25 post-test) and how long/vigorous the intervention should be (24 weeks; 48 h of the instruction)." (p. 4)
Detailed Analysis:
Criterion T requires outcome measurement at least one academic term after the intervention begins (typically ~3–4 months).
The paper provides a clear timeline: assessment at Week 0 and Week 25, and an intervention lasting 24 weeks. A 24-week period is approximately six months, which exceeds a typical single academic term, and the post-test is explicitly conducted after this period.
Final summary sentence: Criterion T is met because the primary outcome measurement occurs at Week 25 after a 24-week intervention, exceeding the minimum term-length tracking requirement.
-
D
Documented Control Group
- The control condition is described and baseline equivalence is reported, providing sufficient documentation of the control group.
- "Experimental Group (n = 100): In this group, the structured coding education intervention was applied and the Control Group (n = 100) maintained the regular school curriculum, including basic ICT literacy (e.g., typing, word processors) but not any deliberate instruction in coding or algorithms."
Relevant Quotes:
1) "Experimental Group (n = 100): In this group, the structured coding education intervention was applied and the Control Group (n = 100) maintained the regular school curriculum, including basic ICT literacy (e.g., typing, word processors) but not any deliberate instruction in coding or algorithms." (p. 3)
2) "Pre-test comparisons were taken to verify that there were no significant differences at the baseline of the groups." (p. 3)
3) "Table 2 describes the baseline results and pre-test scores of the two groups." (p. 5)
Detailed Analysis:
Criterion D requires that the control group be clearly documented, including what instruction they received, and that baseline characteristics (e.g., demographics and/or baseline performance) are provided so the reader can evaluate comparability.
The paper describes the control group as maintaining the regular school curriculum (including basic ICT literacy) and explicitly notes the absence of deliberate coding/algorithm instruction. It also indicates baseline equivalence checks and references the baseline table describing group results.
Final summary sentence: Criterion D is met because the paper describes the control condition and reports baseline equivalence information for the groups.
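Baseline-equivalence checks like the one the paper reports are typically independent-samples t-tests on pre-test scores. A minimal Welch t-statistic using only the standard library, with hypothetical pre-test scores, looks like:

```python
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for an independent-samples comparison,
    the usual machinery behind a baseline-equivalence check."""
    na, nb = len(a), len(b)
    se = (variance(a) / na + variance(b) / nb) ** 0.5
    return (mean(a) - mean(b)) / se

# Hypothetical pre-test scores; a |t| near zero suggests balanced groups.
t_stat = welch_t([17, 18, 19, 17, 18], [18, 17, 18, 19, 17])
```

A |t| well below the critical value at baseline is what lets readers attribute post-test differences to the intervention rather than to pre-existing group differences.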
-
Level 2 Criteria
-
S
School-level RCT
- The sample comes from three schools, but assignment is described for individual students rather than randomizing schools to conditions.
- "The population of the study was the students of three urban elementary schools that belong to the urban district and are based in primary school."
Relevant Quotes:
1) "The population of the study was the students of three urban elementary schools that belong to the urban district and are based in primary school." (p. 3)
2) "Methods: The study employed a quasi-experimental pre-test/post-test design with 200 students randomly assigned to an experimental group (n = 100) and a control group (n = 100)." (p. 1)
Detailed Analysis:
Criterion S requires that randomization occur at the school (site) level, meaning whole schools are randomly assigned to intervention vs control.
While the study draws students from three elementary schools, the paper’s description of assignment is that "200 students" were randomly assigned to experimental and control groups. It does not state that entire schools were randomized to conditions.
Final summary sentence: Criterion S is not met because randomization is described at the student level rather than at the school level.
-
I
Independent Conduct
- The paper does not document an independent third-party evaluator; author roles indicate the authors conducted the study’s key activities.
- "XW: Supervision, Writing – review & editing, Software, Writing – original draft, Project administration, Visualization, Methodology."
Relevant Quotes:
1) "XW: Supervision, Writing – review & editing, Software, Writing – original draft, Project administration, Visualization, Methodology." (p. 15)
2) "FW: Data curation, Writing – review & editing, Writing – original draft, Methodology, Visualization, Project administration, Validation." (p. 15)
3) "JD: Methodology, Investigation, Conceptualization, Writing – review & editing, Writing – original draft." (p. 15)
Detailed Analysis:
Criterion I requires quoted evidence that the evaluation was conducted independently from the intervention designers/providers (e.g., an external evaluation team responsible for data collection and/or analysis).
The paper provides author contribution statements showing the authors were responsible for methodology, investigation, data curation, validation, supervision, and project administration. The paper does not include a statement that an independent organization conducted implementation, outcome testing, randomization oversight, or data analysis.
Final summary sentence: Criterion I is not met because the paper does not document independent third-party conduct of implementation and/or evaluation separate from the authors.
-
Y
Year Duration
- The study’s pre/post window is Week 0 to Week 25 (about six months), which is below the ERCT year-duration threshold (≥75% of an academic year).
- "All the 200 participants were subjected to a pre-test in Week 0, before the intervention started... The intervention took 24 weeks. In Week 25, a post-test..."
Relevant Quotes:
1) "All the 200 participants were subjected to a pre-test in Week 0, before the intervention started." (p. 4)
2) "The intervention took 24 weeks. In Week 25, a post-test, same in form and difficulty as the pretest but containing different specific items to avoid memory effects was given." (p. 4)
Detailed Analysis:
Criterion Y requires outcome measurement at least 75% of an academic year after the intervention begins.
The paper’s timeline is Week 0 pre-test and Week 25 post-test, with an intervention lasting 24 weeks. This is substantial and satisfies term-long duration, but it is still only about six months and is not documented as reaching 75% of a full academic year (typically ~9–10 months). The paper does not define an alternative, shorter academic year that would make 24 weeks meet the 75% threshold.
Final summary sentence: Criterion Y is not met because the study follows students for about 24–25 weeks, which is below the ERCT year-duration requirement.
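The shortfall can be made concrete with the criterion's own arithmetic. The 36-week academic year used below is an assumed typical value, not a figure stated in the paper.

```python
def meets_year_duration(followup_weeks, academic_year_weeks=36):
    """ERCT Y check: follow-up must cover at least 75% of an academic year."""
    return followup_weeks >= 0.75 * academic_year_weeks

# Week 0 pre-test to Week 25 post-test falls short of the 27-week threshold.
print(meets_year_duration(25))
```

Even with a shorter 34-week school year, the threshold would be 25.5 weeks, so the Week 25 post-test still falls just short.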
-
B
Balanced Control Group
- The intervention provides substantial additional structured instruction (48 hours) while the control is explicitly described as passive business-as-usual, so time/engagement resources are not balanced.
- "Lastly, a passive control of a business-as-usual was the control group."
Relevant Quotes:
1) "The intervention was a structured 24-week curriculum that was conducted as 2 h of weekly sessions amounting to a total of 48 h of instruction." (p. 4)
2) "The control group continued in their normal classes at such periods." (p. 4)
3) "Lastly, a passive control of a business-as-usual was the control group." (p. 14)
Detailed Analysis:
Criterion B evaluates whether the nature, quantity, and quality of resources (time, instructional attention, enrichment, materials) are balanced across intervention and control, unless the study explicitly frames additional resources as the treatment variable being tested.
The intervention clearly includes a structured additional instructional program: 2 hours per week for 24 weeks (48 hours total). The control group is described as continuing normal classes, and the authors explicitly characterize the control as a "passive" business-as-usual comparison.
The paper frames the treatment as "coding education" (not as a formal test of "extra time" as the causal variable), and it does not describe an active control that matches the intervention’s added structured time and engagement (e.g., alternative enrichment of comparable dosage). The paper itself notes this limitation by stating that the passive control does not separate the specific benefits of coding from the benefits of serious enrichment generally.
Final summary sentence: Criterion B is not met because the intervention adds substantial structured instructional time/engagement while the control is a passive business-as-usual condition without a comparable substitute.
-
Level 3 Criteria
-
R
Reproduced
- No independent replications by other research teams were identified, and the paper itself states replication across settings is needed.
- "There is need to replicate with various demographic populations."
Relevant Quotes:
1) "There is need to replicate with various demographic populations." (p. 13)
Detailed Analysis:
Criterion R requires that the study be independently reproduced in a peer-reviewed publication by a different research team in a different context.
The paper does not report being a replication study, and it explicitly states that replication is needed. In addition, an internet search (using the DOI and full title) did not identify any peer-reviewed independent replication studies that explicitly reproduce this specific trial as of the ERCT check date.
Final summary sentence: Criterion R is not met because no independent published replication of this specific study was found, and the authors themselves state replication is still needed.
-
A
All-subject Exams
- Criterion E is not met (no standardized exam), so criterion A is also not met; additionally, the study does not assess achievement across all core subjects.
Relevant Quotes:
1) "Computational thinking skills were measured using a researcher-designed, validated test battery and rubric." (p. 4)
2) "General problem-solving skill was assessed with a set of logic puzzles of standardized problems, Raven Progressive Matrices-style pattern problems, and problems involving arithmetic reasoning scaled to the age group (Raven 2000)." (p. 4)
Detailed Analysis:
Criterion A requires standardized, exam-based assessment across all main subjects, and ERCT specifies that if criterion E is not met then criterion A is not met.
The paper’s outcomes are computational thinking and general problem-solving assessed via researcher-designed instruments and puzzle sets rather than standardized exams across core subjects (e.g., mathematics, reading, science). Since E is not met, A automatically fails, and the measured outcomes also do not cover all core school subjects.
Final summary sentence: Criterion A is not met because the study does not use standardized exams (so E fails) and it does not measure all main subject areas.
-
G
Graduation Tracking
- Criterion Y is not met (not year-long), so criterion G is automatically not met; no evidence of tracking to graduation (or equivalent completion) was found in the paper or in follow-up publications.
- "The intervention took 24 weeks. In Week 25, a post-test, same in form and difficulty as the pretest but containing different specific items to avoid memory effects was given."
Relevant Quotes:
1) "The intervention took 24 weeks. In Week 25, a post-test, same in form and difficulty as the pretest but containing different specific items to avoid memory effects was given." (p. 4)
2) "Second, the 6-month period is long, although not as long as other studies, and it may not reflect the long-term retention of these skills. Longitudinal studies done in future should evaluate whether such gains are sustained in 1–2 years without further teaching." (p. 13)
Detailed Analysis:
Criterion G requires follow-up tracking through graduation (or the end of the relevant educational stage). ERCT also specifies a dependency: if criterion Y (year duration) is not met, then criterion G is not met.
The paper’s measurement window is Week 0 to Week 25, and it explicitly discusses that longer-term retention is unknown and should be evaluated in future longitudinal studies. The paper contains no mention of tracking students to graduation, and an internet search for follow-up publications by the same author team tracking this cohort to graduation did not identify any such follow-up paper as of the ERCT check date.
Final summary sentence: Criterion G is not met because outcomes are only measured through Week 25 (and Y is not met), with no evidence of tracking through graduation in this paper or identified follow-up publications.
-
P
Pre-Registered
- No pre-registration registry, identifier, or registration date (prior to data collection) is reported, and no external registry entry was found.
Relevant Quotes:
1) "Participation was voluntary, and informed consent was taken by parents or legal guardians and student assent was taken in compliance with the ethical standards that were endorsed by institutional review board (IRB)." (p. 3)
2) "Data availability statement The original contributions presented in the study are included in the article/Supplementary material, further inquiries can be directed to the corresponding author/s." (p. 15)
Detailed Analysis:
Criterion P requires that the study protocol be pre-registered, with a registry/platform reference (e.g., OSF, ISRCTN, ClinicalTrials.gov), an identifier, and evidence the registration occurred before data collection began.
The paper includes ethics/IRB and data availability statements but does not provide any pre-registration registry name, ID, URL, or registration date. A keyword check within the PDF (e.g., "preregister", "registered", "OSF") yields no matches, and an internet search using the DOI and title did not identify a public pre-registration entry attributable to this study.
Final summary sentence: Criterion P is not met because no registry identifier or date is provided in the paper, and no external pre-registration record was found.