Evaluating the flipped classroom: A randomized controlled trial

Nathan Wozny, Cary Balser & Drew Ives

Published:
ERCT Check Date:
DOI: 10.1080/00220485.2018.1438860
  • social studies
  • higher education
  • US
  • flipped classroom
  • blended learning
  • EdTech website
  • formative assessment
0
  • C

    The study randomizes intact sections (all students in a meeting) to flipped or lecture for each lesson, avoiding within‑session mixing and satisfying class‑level assignment.

    “We randomly assigned each combination of section and experimental lesson to a treatment condition (flipped classroom) or a control condition (traditional lecture), conditional on each section having exactly five flipped lessons and each experimental lesson having three or four flipped sections.” (p. 2)

  • E

    The study used instructor‑designed course exams instead of standardized external assessments.

    "Four announced, written graded exams administered throughout the semester measured medium‑term comprehension on content covered in approximately the eight lessons preceding the exam." (p. 3)

  • T

    Learning outcomes were measured at the end of the semester, satisfying the term duration requirement.

    "A comprehensive written final exam administered at the end of the semester measured long‑term comprehension." (p. 3)

  • D

    The control (lecture) condition lacks detailed documentation of participant characteristics and baseline outcomes.

    "Students in lecture lessons had access to the same exercises offered in the flipped classes, but the lecture group generally did not have available class time to complete the exercises." (p. 4)

  • S

    Randomization occurred at the section‑lesson level within a single course, not across schools.

    "After students and instructors were assigned to sections of the course, we randomly assigned each combination of section and experimental lesson to a treatment condition (flipped classroom) or a control condition (traditional lecture)..." (p. 3)

  • A

    Only econometrics outcomes were assessed, with no broad subject coverage.

    "A comprehensive written final exam administered at the end of the semester measured long‑term comprehension." (p. 3)

  • Y

    Outcomes were measured only through one semester, not a full academic year.

    "A comprehensive written final exam administered at the end of the semester measured long‑term comprehension." (p. 3)

  • B

    Flipped lessons included mandatory pre‑class videos and in‑class exercise time that the control condition did not match.

    "We assigned students a video lecture and graded comprehension questions in advance of each flipped lesson... Students in lecture lessons had access to the same exercises offered in the flipped classes, but the lecture group generally did not have available class time to complete the exercises." (p. 3–4)

  • G

    The study did not track participants until graduation.

  • R

    An independent replication of the flipped classroom experiment by another research team has been published, satisfying this criterion.

  • I

    Authors who designed the intervention also implemented and analyzed the study.

    "Each of the 137 students enrolled in the course was assigned to one of seven sections (each meeting at a different time)... The three authors each taught two to three sections of the course." (p. 2)

  • P

    No evidence of pre-registration of study protocols.

Abstract

Despite recent interest in flipped classrooms, rigorous research evaluating their effectiveness is sparse. In this study, the authors implement a randomized controlled trial to evaluate the effect of a flipped classroom technique relative to a traditional lecture in an introductory undergraduate econometrics course. Random assignment enables the analysis to eliminate other potential explanations of performance differences between the flipped and traditional classrooms, while assignment of experimental condition by section and lesson enables improved statistical precision. The authors find that the flipped classroom increases scores on medium-term, high‑stakes assessments by 0.16 standard deviation, with similar long‑term effects for high‑performing students. Estimated impacts are robust to alternative specifications accounting for possible spillover effects arising from the experimental design.
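
The constrained assignment quoted under criterion C (exactly five flipped lessons per section, three or four flipped sections per lesson) can be sketched with rejection sampling. This is a minimal illustration, not the authors' procedure: the count of 10 experimental lessons, the function name `assign_flipped`, and the sampling method itself are assumptions. The source states only the seven sections and the two constraints.

```python
import random


def assign_flipped(n_sections=7, n_lessons=10, flips_per_section=5,
                   allowed_per_lesson=(3, 4), seed=None, max_tries=100_000):
    """Return one set of flipped lesson indices per section.

    n_lessons=10 is an assumed value; the source does not state it.
    """
    rng = random.Random(seed)
    for _ in range(max_tries):
        # Each section independently draws which of its lessons are flipped.
        plan = [set(rng.sample(range(n_lessons), flips_per_section))
                for _ in range(n_sections)]
        # Count flipped sections per lesson; keep the draw only if every
        # lesson has an allowed number of flipped sections.
        counts = [sum(lesson in flipped for flipped in plan)
                  for lesson in range(n_lessons)]
        if all(c in allowed_per_lesson for c in counts):
            return plan
    raise RuntimeError("no valid assignment found; check the constraints")


plan = assign_flipped(seed=0)
for section, flipped in enumerate(plan, start=1):
    print(f"section {section}: flipped lessons {sorted(flipped)}")
```

Rejection sampling draws uniformly from all assignments that satisfy both constraints, at the cost of discarding most candidate draws.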


ERCT Criteria Breakdown

  • Level 1 Criteria

    • C

      Class-level RCT

      • The study randomizes intact sections (all students in a meeting) to flipped or lecture for each lesson, avoiding within‑session mixing and satisfying class‑level assignment.
      • “We randomly assigned each combination of section and experimental lesson to a treatment condition (flipped classroom) or a control condition (traditional lecture), conditional on each section having exactly five flipped lessons and each experimental lesson having three or four flipped sections.” (p. 2)
      • Relevant Quotes:
        1) “We implemented a randomized controlled trial to evaluate the effect of a flipped classroom technique relative to a traditional lecture in an introductory undergraduate econometrics course. Random assignment… by section and lesson…” (p. 1)
        2) “Each of the 137 students enrolled in the course was assigned to one of seven sections (each meeting at a different time)… We randomly assigned each combination of section and experimental lesson to a treatment condition (flipped classroom) or a control condition (traditional lecture).” (p. 2)
      • Detailed Analysis: The authors never split students within a single session into different conditions. Instead, for each experimental lesson, an entire section received either the flipped format or the traditional lecture, which prevents within‑class contamination. Criterion C is therefore met: sections were randomized as intact units for each lesson.
    • E

      Exam-based Assessment

      • The study used instructor‑designed course exams instead of standardized external assessments.
      • "Four announced, written graded exams administered throughout the semester measured medium‑term comprehension on content covered in approximately the eight lessons preceding the exam." (p. 3)
      • Relevant Quotes:
        1) "Four announced, written graded exams administered throughout the semester measured medium‑term comprehension on content covered in approximately the eight lessons preceding the exam." (p. 3)
        2) "A comprehensive written final exam administered at the end of the semester measured long‑term comprehension." (p. 3)
      • Detailed Analysis: The assessments are course‑specific exams designed by the instructors, not recognized standardized tests. Criterion E is therefore not met.
    • T

      Term Duration

      • Learning outcomes were measured at the end of the semester, satisfying the term duration requirement.
      • "A comprehensive written final exam administered at the end of the semester measured long‑term comprehension." (p. 3)
      • Relevant Quotes:
        1) "A comprehensive written final exam administered at the end of the semester measured long‑term comprehension." (p. 3)
      • Detailed Analysis: The final exam was administered at the end of the semester, so outcome measurement spans a full academic term. Criterion T is therefore met.
    • D

      Documented Control Group

      • The control (lecture) condition lacks detailed documentation of participant characteristics and baseline outcomes.
      • "Students in lecture lessons had access to the same exercises offered in the flipped classes, but the lecture group generally did not have available class time to complete the exercises." (p. 4)
      • Relevant Quotes:
        1) "Students in lecture lessons had access to the same exercises offered in the flipped classes, but the lecture group generally did not have available class time to complete the exercises." (p. 4)
      • Detailed Analysis: The quote describes what the control condition received but gives no demographic details or baseline performance for the lecture lessons, and no table or narrative reports participant characteristics for them separately. Criterion D is therefore not met.
  • Level 2 Criteria

    • S

      School-level RCT

      • Randomization occurred at the section‑lesson level within a single course, not across schools.
      • "After students and instructors were assigned to sections of the course, we randomly assigned each combination of section and experimental lesson to a treatment condition (flipped classroom) or a control condition (traditional lecture)..." (p. 3)
      • Relevant Quotes:
        1) "After students and instructors were assigned to sections of the course, we randomly assigned each combination of section and experimental lesson to a treatment condition (flipped classroom) or a control condition (traditional lecture)..." (p. 3)
      • Detailed Analysis: Randomization was performed at the section‑lesson level, not at the school level. The ERCT school‑level RCT criterion requires randomization among institutions (schools), so criterion S is not met.
    • Y

      Year Duration

      • Outcomes were measured only through one semester, not a full academic year.
      • "A comprehensive written final exam administered at the end of the semester measured long‑term comprehension." (p. 3)
      • Relevant Quotes:
        1) "A comprehensive written final exam administered at the end of the semester measured long‑term comprehension." (p. 3)
      • Detailed Analysis: There is no follow‑up beyond the semester final exam, so participants were not tracked over a full academic year. Criterion Y is therefore not met.
    • B

      Balanced Resources

      • Flipped lessons included mandatory pre‑class videos and in‑class exercise time that the control condition did not match.
      • "We assigned students a video lecture and graded comprehension questions in advance of each flipped lesson... Students in lecture lessons had access to the same exercises offered in the flipped classes, but the lecture group generally did not have available class time to complete the exercises." (p. 3–4)
      • Relevant Quotes:
        1) "We assigned students a video lecture and graded comprehension questions in advance of each flipped lesson." (p. 3)
        2) "Students in lecture lessons had access to the same exercises offered in the flipped classes, but the lecture group generally did not have available class time to complete the exercises." (p. 4)
      • Detailed Analysis: The flipped condition required additional preparatory tasks (video lectures and graded comprehension questions), while the control condition did not match these resources or the structured class time for exercises. This imbalance makes it hard to isolate the effect of flipping alone. Criterion B is therefore not met.
    • I

      Independent Conduct

      • Authors who designed the intervention also implemented and analyzed the study.
      • "Each of the 137 students enrolled in the course was assigned to one of seven sections (each meeting at a different time)... The three authors each taught two to three sections of the course." (p. 2)
      • Relevant Quotes:
        1) "Each of the 137 students enrolled in the course was assigned to one of seven sections (each meeting at a different time)... The three authors each taught two to three sections of the course." (p. 2)
      • Detailed Analysis: The same individuals who designed the course materials and videos also taught the lessons and evaluated the outcomes, and no third‑party evaluator participated. Criterion I is therefore not met.
  • Level 3 Criteria

    • A

      All Exams

      • Only econometrics outcomes were assessed, with no broad subject coverage.
      • "A comprehensive written final exam administered at the end of the semester measured long‑term comprehension." (p. 3)
      • Relevant Quotes:
        1) "A comprehensive written final exam administered at the end of the semester measured long‑term comprehension." (p. 3)
      • Detailed Analysis: The study focuses exclusively on an econometrics course and does not assess learning outcomes in other subjects. Criterion A is therefore not met.
    • G

      Graduation Tracking

      • The study did not track participants until graduation.
      • Relevant Quotes: None.
      • Detailed Analysis: The paper reports outcomes only up to the final exam at the end of the semester. No data collection continues through graduation, so criterion G is not met.
    • R

      Reproduced Results

      • An independent replication of the flipped classroom experiment by another research team has been published, satisfying this criterion.
      • Relevant Quotes:
        1) "We conduct a randomized controlled trial at West Point and find that the flipped classroom produced short term gains in Math and no effect in Economics..." (Setren et al., 2021)
      • Detailed Analysis: A separate research team (Setren et al., 2021) conducted an independent randomized trial of a flipped classroom model at a different institution. This shows that the approach can be reproduced in another context, although the outcomes varied by subject (positive in mathematics but null in economics). Criterion R is therefore met.
    • P

      Pre-Registered Protocol

      • No evidence of pre-registration of study protocols.
      • Relevant Quotes: None.
      • Detailed Analysis: The paper does not mention study registration or a protocol published before data collection. Criterion P is therefore not met.
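
The verdicts above can be tallied by level. The sketch below assumes that a study's ERCT level is the highest level whose criteria, together with all lower-level criteria, are met. That rule and the names `LEVELS`, `MET`, and `highest_level` are assumptions for illustration and are not stated on this page. The met/not-met values are transcribed from the breakdown above.

```python
# Criteria grouped by level, as listed in the breakdown above.
LEVELS = {1: "CETD", 2: "SYBI", 3: "AGRP"}

# Verdicts transcribed from the breakdown (True = criterion met).
MET = {
    "C": True, "E": False, "T": True, "D": False,
    "S": False, "Y": False, "B": False, "I": False,
    "A": False, "G": False, "R": True, "P": False,
}


def highest_level(met):
    """Assumed rule: a level counts only if it and every lower level are fully met."""
    level = 0
    for lvl in sorted(LEVELS):
        if all(met[c] for c in LEVELS[lvl]):
            level = lvl
        else:
            break
    return level


met_criteria = [c for c, ok in MET.items() if ok]
print("criteria met:", ", ".join(met_criteria))
print("level under the assumed rule:", highest_level(MET))
```

Under this assumed rule, meeting C and T without E and D leaves the study below Level 1, regardless of the Level 3 criterion R being met.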
