Enhancing computational thinking through coding education in primary school students: an experimental study on the impact of early programming exposure on problem-solving skills

Xin Wang, Feixue Wan, and JiaMin Dai

Published:
ERCT Check Date: 2026-03-14
DOI: 10.3389/fpsyg.2026.1734482
  • science
  • K12
  • project-based learning
ERCT Level: 0
  • C

    Students (not intact classes or schools) were randomized to groups, so the class-level RCT requirement is not satisfied.

    "Methods: The study employed a quasi-experimental pre-test/post-test design with 200 students randomly assigned to an experimental group (n = 100) and a control group (n = 100)." (p. 1)

  • E

    Outcomes were measured using researcher-designed instruments rather than a widely recognized standardized exam.

    "Computational thinking skills were measured using a researcher-designed, validated test battery and rubric." (p. 4)

  • T

    Outcomes were measured at Week 25 following a 24-week intervention, which exceeds one academic term.

    "The intervention took 24 weeks. In Week 25, a post-test, same in form and difficulty as the pretest but containing different specific items to avoid memory effects was given." (p. 4)

  • D

    The control condition and baseline comparability are described, including a baseline equivalence table with demographics and pre-test scores.

    "TABLE 2 Baseline equivalence: demographics and pre-test scores." (p. 6)

  • S

    The unit of randomization is students rather than schools, so this is not a school-level RCT.

    "Methods: The study employed a quasi-experimental pre-test/post-test design with 200 students randomly assigned to an experimental group (n = 100) and a control group (n = 100)." (p. 1)

  • I

    The paper does not provide evidence that an independent third party conducted the evaluation or analysis.

    "XW: Supervision, Writing – review & editing, Software, Writing – original draft, Project administration, Visualization, Methodology." (p. 15)

  • Y

    The study measures outcomes at Week 25 after a 24-week intervention, which is below the ERCT threshold for year-duration tracking.

    "The intervention took 24 weeks. In Week 25, a post-test, same in form and difficulty as the pretest but containing different specific items to avoid memory effects was given." (p. 4)

  • B

    The paper does not document extra instructional time for the experimental group beyond the school schedule, implying the intervention substitutes for regular class time rather than adding resources.

    "The control group continued in their normal classes at such periods." (p. 4)

  • R

    No independent, peer-reviewed replication of this specific study was found (in the paper or via an internet search as of 2026-03-14).

    "There is need to replicate with various demographic populations." (p. 13)

  • A

    Criterion E is not met (no standardized exams), and the study does not use standardized exams across core school subjects.

    "Computational thinking skills were measured using a researcher-designed, validated test battery and rubric." (p. 4)

  • G

    The study does not track students to graduation, and Criterion Y is not met, which prevents meeting the ERCT dependency requirement for G.

    "Longitudinal studies done in future should evaluate whether such gains are sustained in 1–2 years without further teaching." (p. 13)

  • P

    No pre-registration registry, ID, or registration date is reported, and no external registry entry was found as of 2026-03-14.

Abstract

Introduction: This research paper presents an empirical investigation of the effectiveness of early coding instruction in improving problem-solving skills and computational thinking (CT) among primary school students. The primary research question was to determine whether a structured six-month coding intervention yields greater cognitive gains in children aged 8–12 than instruction without coding.

Methods: The study employed a quasi-experimental pre-test/post-test design with 200 students randomly assigned to an experimental group (n = 100) and a control group (n = 100). The experimental group participated in a 24-week curriculum using Scratch and Python, while the control group followed the standard school curriculum. To assess skill acquisition and practice intensity, paired-sample t-tests, independent-sample t-tests, and Pearson correlation analyses were conducted.

Results: The findings indicate statistically significant effects for the experimental group, with problem-solving scores increasing to 23.5 from a mean of 17.8 (p < 0.001) and computational thinking scores increasing to 30.6 from a mean of 20.4 (p < 0.001), reflecting large effect sizes. In contrast, the control group exhibited no significant gains.

Conclusion: From a theoretical perspective, the study contributes to the literature by demonstrating that the concrete operational stage of development (ages 7–11) represents a critical period for the development of abstract algorithmic thinking through programming. From a practical standpoint, the results provide evidence-based support for integrating coding into primary education not merely as a professional competency, but as a cognitive development opportunity essential for 21st-century digital literacy.
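The abstract names paired-sample t-tests, independent-sample t-tests, and Pearson correlation analyses as the statistical methods; according to the paper, these were run in SPSS (Version 28). Purely as an illustration of that analysis family, the following Python sketch runs the same kinds of tests on hypothetical score arrays. The array values, the practice_hours variable, and the cohens_d helper are assumptions made for the example, not the study's data or code.

# Minimal sketch of the analysis family named in the abstract (illustrative only;
# the study itself used SPSS, and these arrays are hypothetical, not the study's data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical pre/post problem-solving scores for an experimental and a control group.
exp_pre = rng.normal(17.8, 3.0, 100)
exp_post = rng.normal(23.5, 3.0, 100)
ctrl_post = rng.normal(17.9, 3.0, 100)

# Paired-sample t-test: within-group change from pre-test to post-test.
t_paired, p_paired = stats.ttest_rel(exp_post, exp_pre)

# Independent-sample t-test: between-group comparison of post-test scores.
t_ind, p_ind = stats.ttest_ind(exp_post, ctrl_post)

# Pearson correlation, e.g., between practice intensity and post-test score
# (the practice-hours values are again hypothetical).
practice_hours = rng.normal(2.0, 0.5, 100)
r, p_r = stats.pearsonr(practice_hours, exp_post)

def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_sd = np.sqrt(((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2))
    return (a.mean() - b.mean()) / pooled_sd

print(f"paired t = {t_paired:.2f}, p = {p_paired:.3g}")
print(f"independent t = {t_ind:.2f}, p = {p_ind:.3g}")
print(f"Pearson r = {r:.2f}, p = {p_r:.3g}")
print(f"Cohen's d (post-test, experimental vs control) = {cohens_d(exp_post, ctrl_post):.2f}")

The t-test routines do not report effect sizes themselves, which is why the sketch computes the pooled-standard-deviation form of Cohen's d by hand.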

Full Article

ERCT Criteria Breakdown

  • Level 1 Criteria

    • C

      Class-level RCT

      • Students (not intact classes or schools) were randomized to groups, so the class-level RCT requirement is not satisfied.
      • "Methods: The study employed a quasi-experimental pre-test/post-test design with 200 students randomly assigned to an experimental group (n = 100) and a control group (n = 100)." (p. 1)
      • Relevant Quotes: 1) "Methods: The study employed a quasi-experimental pre-test/post-test design with 200 students randomly assigned to an experimental group (n = 100) and a control group (n = 100)." (p. 1) 2) "The population of the study was the students of three urban elementary schools that belong to the urban district and are based in primary school." (p. 3) 3) "The randomly selected participants at two groups were allocated at random with the help of a computer-generated randomization sequence in order to maintain allocation concealment." (p. 3) Detailed Analysis: Criterion C requires randomization at the class level (or stronger, e.g., school level) to reduce contamination, unless the intervention is clearly one-to-one tutoring/personal teaching. The paper explicitly states that "200 students" were "randomly assigned" to experimental and control groups, and it describes allocating "participants" using a "computer-generated randomization sequence." It also indicates the sample comes from "three urban elementary schools," which implies students were drawn from multiple schools but does not indicate intact classes (or entire schools) were randomized. Because the unit of randomization is students rather than intact classes (and there is no tutoring/personal teaching exception), contamination risks within schools/classes are not ruled out by design. Criterion C is not met because randomization is described at the student level rather than at the class (or school) level.
    • E

      Exam-based Assessment

      • Outcomes were measured using researcher-designed instruments rather than a widely recognized standardized exam.
      • "Computational thinking skills were measured using a researcher-designed, validated test battery and rubric." (p. 4)
      • Relevant Quotes: 1) "Computational thinking skills were measured using a researcher- designed, validated test battery and rubric." (p. 4) 2) "General problem-solving skill was assessed with a set of logic puzzles of standardized problems, Raven Progressive Matrices-style pattern problems, and problems involving arithmetic reasoning scaled to the age group (Raven 2000)." (p. 4) 3) "Fourth, the instruments of assessment, though they have been validated in terms of internal consistency, were developed by the researcher." (p. 13) Detailed Analysis: Criterion E requires outcomes to be measured with standardized, widely recognized exam-based assessments (e.g., national/state standardized achievement tests), not researcher-created tests closely aligned with the intervention. The paper states the CT outcome is measured using a "researcher-designed" test battery. It also explicitly acknowledges in limitations that the assessment instruments "were developed by the researcher." The problem-solving measure is described as a set of tasks including "Raven Progressive Matrices-style" items, but the text does not document an externally administered, standardized exam; it reads as a researcher- assembled battery. Criterion E is not met because at least the primary CT outcome measure is explicitly researcher-designed rather than a widely recognized standardized exam.
    • T

      Term Duration

      • Outcomes were measured at Week 25 following a 24-week intervention, which exceeds one academic term.
      • "The intervention took 24 weeks. In Week 25, a post-test, same in form and difficulty as the pretest but containing different specific items to avoid memory effects was given." (p. 4)
      • Relevant Quotes: 1) "Figure 1 provides the overview of the allocation of 200 eligible students to a coding intervention or a standard-curriculum control condition in a random fashion. It explains when to be assessed (Week 0 pre-test and Week 25 post-test) and how long/vigorous the intervention should be (24 weeks; 48 h of the instruction)." (p. 4) 2) "The intervention took 24 weeks. In Week 25, a post-test, same in form and difficulty as the pretest but containing different specific items to avoid memory effects was given." (p. 4) Detailed Analysis: Criterion T requires outcome measurement at least one academic term after the intervention begins (roughly 3–4 months). The paper specifies a timeline with pre-testing at Week 0 and post-testing at Week 25, after an intervention lasting 24 weeks. Twenty-four weeks is approximately six months, which is longer than a typical term and therefore satisfies the minimum duration requirement for term-length follow-up from intervention start to outcome measurement. Criterion T is met because outcomes are measured at Week 25 after a 24-week intervention, exceeding one academic term.
    • D

      Documented Control Group

      • The control condition and baseline comparability are described, including a baseline equivalence table with demographics and pre-test scores.
      • "TABLE 2 Baseline equivalence: demographics and pre-test scores." (p. 6)
      • Relevant Quotes: 1) "Experimental Group (n = 100): In this group, the structured coding education intervention was applied and the Control Group (n = 100) maintained the regular school curriculum, including basic ICT literacy (e.g., typing, word processors) but not any deliberate instruction in coding or algorithms." (p. 3) 2) "Table 2 describes the baseline results and pre-test scores of the two groups." (p. 5) 3) "TABLE 2 Baseline equivalence: demographics and pre-test scores." (p. 6) Detailed Analysis: Criterion D requires that the control group is well documented, including what it received and evidence of baseline characteristics/comparability. The paper defines the control group as maintaining the "regular school curriculum" (including basic ICT literacy but no deliberate coding instruction). It also states that Table 2 describes baseline results and provides a dedicated baseline equivalence table listing demographics and pre-test scores. Criterion D is met because the control condition is described and baseline demographics/pre-test outcomes are documented in Table 2.
  • Level 2 Criteria

    • S

      School-level RCT

      • The unit of randomization is students rather than schools, so this is not a school-level RCT.
      • "Methods: The study employed a quasi-experimental pre-test/post-test design with 200 students randomly assigned to an experimental group (n = 100) and a control group (n = 100)." (p. 1)
      • Relevant Quotes: 1) "The population of the study was the students of three urban elementary schools that belong to the urban district and are based in primary school." (p. 3) 2) "Methods: The study employed a quasi-experimental pre-test/post-test design with 200 students randomly assigned to an experimental group (n = 100) and a control group (n = 100)." (p. 1) Detailed Analysis: Criterion S requires school-level randomization (schools assigned to intervention vs. control). Although participants came from "three urban elementary schools," the random assignment described is explicitly for "200 students," not schools. The paper does not state that schools were randomized, nor does it provide a school-level allocation description (e.g., number of schools assigned to each arm). Criterion S is not met because the paper describes student-level rather than school-level randomization.
    • I

      Independent Conduct

      • The paper does not provide evidence that an independent third party conducted the evaluation or analysis.
      • "XW: Supervision, Writing – review & editing, Software, Writing – original draft, Project administration, Visualization, Methodology." (p. 15)
      • Relevant Quotes: 1) "The program was created with references to the principles of constructionist pedagogy and Project-Based Learning (PBL) (Carlana and Fort, 2022)." (p. 4) 2) "Data analysis was done with the SPSS software (Version 28)." (p. 4) 3) "Author contributions" (p. 15) 4) "XW: Supervision, Writing – review & editing, Software, Writing – original draft, Project administration, Visualization, Methodology." (p. 15) Detailed Analysis: Criterion I requires evidence that the study was conducted independently from the intervention designers/providers (e.g., independent evaluators collecting data and/or performing analysis). The paper describes an in-paper created program and states that data analysis was performed (with SPSS) without describing any external evaluation team. The author contributions indicate the author team handled core aspects of the work (methodology, investigation, project administration, and writing), and there is no explicit statement of third-party oversight or independence in implementation, measurement, or analysis. Criterion I is not met because the paper provides no quoted documentation of independent conduct by a third-party evaluator.
    • Y

      Year Duration

      • The study measures outcomes at Week 25 after a 24-week intervention, which is below the ERCT threshold for year-duration tracking.
      • "The intervention took 24 weeks. In Week 25, a post-test, same in form and difficulty as the pretest but containing different specific items to avoid memory effects was given." (p. 4)
      • Relevant Quotes: 1) "Figure 1 provides the overview of the allocation of 200 eligible students to a coding intervention or a standard-curriculum control condition in a random fashion. It explains when to be assessed (Week 0 pre-test and Week 25 post-test) and how long/vigorous the intervention should be (24 weeks; 48 h of the instruction)." (p. 4) 2) "The intervention took 24 weeks. In Week 25, a post-test, same in form and difficulty as the pretest but containing different specific items to avoid memory effects was given." (p. 4) Detailed Analysis: Criterion Y requires outcomes to be measured at least 75% of one academic year after the intervention begins (typically around 9–10 months unless a different academic year is defined). The paper documents a 24-week intervention with post-testing at Week 25. Twenty-four weeks is approximately six months, which is longer than a term but does not reach the ERCT threshold for year-duration tracking, and the paper does not define an academic year short enough for 24 weeks to qualify. Criterion Y is not met because the documented tracking period is about 24–25 weeks, below the year-duration threshold.
    • B

      Balanced Control Group

      • The paper does not document extra instructional time for the experimental group beyond the school schedule, implying the intervention substitutes for regular class time rather than adding resources.
      • "The control group continued in their normal classes at such periods." (p. 4)
      • Relevant Quotes: 1) "The intervention was a structured 24-week curriculum that was conducted as 2 h of weekly sessions amounting to a total of 48 h of instruction." (p. 4) 2) "The control group continued in their normal classes at such periods." (p. 4) 3) "Experimental Group (n = 100): In this group, the structured coding education intervention was applied and the Control Group (n = 100) maintained the regular school curriculum, including basic ICT literacy (e.g., typing, word processors) but not any deliberate instruction in coding or algorithms." (p. 3) Detailed Analysis: Criterion B asks whether the intervention group received additional time or resources (time, budget, materials, special staffing) that were not comparably provided to the control group, unless such additional resources are explicitly the treatment variable. The intervention is described as "2 h of weekly sessions" (48 total hours) across 24 weeks. Importantly, the paper states: "The control group continued in their normal classes at such periods," which implies the coding sessions occurred during time periods where the control group was also receiving regular instruction (i.e., the intervention appears to substitute for regular class time rather than add extra time on top of the school schedule). The control condition is also described as receiving the regular curriculum including basic ICT literacy, suggesting both groups plausibly had access to broadly similar baseline schooling resources. While the paper later labels the comparison a "passive control," that concerns the absence of an active alternative program, not clear evidence that the experimental group received more total instructional time or budget. Criterion B is met because the paper does not document a clear net increase in instructional time/resources for the experimental group beyond what the control group received during the same periods.
  • Level 3 Criteria

    • R

      Reproduced

      • No independent, peer-reviewed replication of this specific study was found (in the paper or via an internet search as of 2026-03-14).
      • "There is need to replicate with various demographic populations." (p. 13)
      • Relevant Quotes: 1) "There is need to replicate with various demographic populations." (p. 13) Detailed Analysis: Criterion R requires evidence that an independent research team replicated this specific study (or a clearly identified close replication) in a different context and published it in a peer-reviewed outlet. The paper itself calls for replication, indicating replication has not yet been established within the article. An internet search conducted on 2026-03-14 (using the DOI, full title, and author names) did not identify any independent peer-reviewed publications explicitly describing a replication of this specific trial and reporting results. Criterion R is not met because independent replication evidence for this specific study was not found as of 2026-03-14.
    • A

      All-subject Exams

      • Criterion E is not met (no standardized exams), and the study does not use standardized exams across core school subjects.
      • "Computational thinking skills were measured using a researcher-designed, validated test battery and rubric." (p. 4)
      • Relevant Quotes: 1) "Computational thinking skills were measured using a researcher- designed, validated test battery and rubric." (p. 4) 2) "General problem-solving skill was assessed with a set of logic puzzles of standardized problems, Raven Progressive Matrices-style pattern problems, and problems involving arithmetic reasoning scaled to the age group (Raven 2000)." (p. 4) Detailed Analysis: Criterion A requires standardized exam-based assessment across all main school subjects, and the ERCT rule specifies that if Criterion E is not met then Criterion A is not met. The study uses a researcher-designed CT test battery and a problem-solving task set rather than standardized school exams. It also does not report standardized exam outcomes across core subjects (e.g., language arts, mathematics, science). Criterion A is not met because standardized exams are not used (Criterion E is not met) and the outcomes are not measured across all core subjects via standardized exams.
    • G

      Graduation Tracking

      • The study does not track students to graduation, and Criterion Y is not met, which prevents meeting the ERCT dependency requirement for G.
      • "Longitudinal studies done in future should evaluate whether such gains are sustained in 1–2 years without further teaching." (p. 13)
      • Relevant Quotes: 1) "Longitudinal studies done in future should evaluate whether such gains are sustained in 1–2 years without further teaching." (p. 13) 2) "The intervention took 24 weeks. In Week 25, a post-test, same in form and difficulty as the pretest but containing different specific items to avoid memory effects was given." (p. 4) Detailed Analysis: Criterion G requires tracking participants until graduation from the relevant educational stage. The paper documents outcome measurement at Week 25 after a 24-week intervention and does not report any follow-up into later grades or graduation outcomes. The paper explicitly frames longer-term follow-up as future work, stating longitudinal studies should assess whether gains are sustained. In addition, per the ERCT dependency rule, Criterion G cannot be met if Criterion Y is not met, and this study does not meet year-duration tracking. Criterion G is not met because the study ends at Week 25 with no graduation tracking, and Criterion Y is not met.
    • P

      Pre-Registered

      • No pre-registration registry, ID, or registration date is reported, and no external registry entry was found as of 2026-03-14.
      • Relevant Quotes:
        1) None found: the paper contains no mention of pre-registration, a registry (e.g., OSF), a registration ID, or a registration date. (p. n/a)
      • Detailed Analysis: Criterion P requires quoted evidence that the study protocol was publicly pre-registered before data collection began, including a registry reference and timing information. A full-text check of the paper PDF did not identify any pre-registration statement, registry name, registration ID, or registration date. An internet search on 2026-03-14 using the DOI, title, and author names also did not identify a corresponding public pre-registration record. Criterion P is not met because there is no reported or verifiable pre-registration record for the study.
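Criteria A and G above invoke two ERCT dependency rules: A cannot be met unless E is met, and G cannot be met unless Y is met. The short Python sketch below encodes this report's per-criterion verdicts and applies those two stated rules; the dictionary layout, the DEPENDENCIES constant, and the apply_dependencies function are illustrative assumptions, not an official ERCT implementation.

# Minimal sketch of the two ERCT dependency rules quoted in this breakdown
# (A requires E; G requires Y). The data structure and names are illustrative,
# not an official ERCT implementation.

# Per-criterion verdicts for this study, as documented above.
verdicts = {
    "C": False, "E": False, "T": True,  "D": True,   # Level 1
    "S": False, "I": False, "Y": False, "B": True,   # Level 2
    "R": False, "A": False, "G": False, "P": False,  # Level 3
}

# Stated dependencies: the dependent criterion cannot be met if its prerequisite is not met.
DEPENDENCIES = {"A": "E", "G": "Y"}

def apply_dependencies(met: dict) -> dict:
    """Return a copy of the verdicts with the dependency rules enforced."""
    adjusted = dict(met)
    for dependent, prerequisite in DEPENDENCIES.items():
        if not adjusted.get(prerequisite, False):
            adjusted[dependent] = False
    return adjusted

print(apply_dependencies(verdicts))

For this study the dependency rules change nothing, since A and G already fail on their own evidence; they would only matter for a study whose direct evidence supported A or G while the prerequisite criterion failed.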
