Air Reading: A Randomized Evaluation of a Virtual Tutoring Model in Louisiana and Texas Schools

Amanda J. Neitzel, PhD, Nathan Storey, PhD, and Xue Wang, MA

Published: Feb 1, 2026

ERCT Check Date: Apr 14, 2026


reading
K12
US
blended learning
EdTech platform
formative assessment


Abstract

Air Reading partnered with the Center for Research and Reform in Education (CRRE) to conduct an evaluation of Air Reading in a project funded by Accelerate and Arnold Ventures. The trial was conducted across the 2024-25 school year in a large suburban district in Louisiana as well as a district in Texas, and delivered tutoring for a full school year. Air Reading is an assessment-driven virtual tutoring program designed to improve students’ foundational reading skills in Kindergarten through 8th grade.


ERCT Criteria Breakdown

Level 1 Criteria
- C
  Class-level RCT
  - Although randomization was at the student level, the intervention is tutoring delivered to individuals or small groups, which the ERCT standard treats as an allowed exception for Criterion C.
  - Relevant quotes:
    1) "Small groups of up to four students are paired with the same highly-qualified tutor in 30-minute sessions four times each week." (p. 4)
    2) "Randomization occurred at the student level within schools and grade cohorts, ensuring that treatment and control students were directly comparable at baseline." (p. 4)
    3) "Students were grouped by instructional need, with group sizes ranging from 1:1 to 1:4." (p. 8)
  - Detailed analysis: Criterion C requires randomization at the class (or school) level to minimize contamination, but it allows an exception for interventions that are inherently individualized, such as tutoring or personal teaching. This study explicitly randomizes "at the student level within schools and grade cohorts," which would fail a strict class-level randomization requirement. However, the intervention is tutoring delivered by a tutor to groups of one to four students (1:1 to 1:4), which fits the ERCT exception: student-level assignment is acceptable when the intervention is delivered as tutoring rather than as whole-class instruction.
  - Conclusion: Criterion C is met because this is a tutoring intervention, for which student-level randomization is an allowed ERCT exception.
- E
  Exam-based Assessment
  - The outcomes are measured using widely recognized standardized assessments (TPRI, STAAR, DIBELS, LEAP), satisfying the exam-based assessment requirement.
  - Relevant quotes:
    1) "In Texas, the Texas Primary Reading Inventory (TPRI) was administered in grades 1–2, and the Texas STAAR assessment in grade 3." (p. 9)
    2) "In Louisiana, student performance was measured using DIBELS (grade 2) and the state’s LEAP assessments (grade 4)." (p. 9)
  - Detailed analysis: Criterion E requires that academic outcomes be measured with standardized, widely recognized exam-based assessments rather than custom researcher-made tests tailored to the intervention. The report explicitly names established assessments used in Texas and Louisiana (TPRI, STAAR, DIBELS, LEAP). The paper does not describe these instruments as being created for this study; they are standard measures used in the relevant education systems.
  - Conclusion: Criterion E is met because the study uses standardized assessments (TPRI, STAAR, DIBELS, LEAP) as outcome measures.
- T
  Term Duration
  - The evaluation reports outcomes over the full 2024-25 school year, which exceeds the minimum one-term follow-up requirement.
  - Relevant quotes:
    1) "The trial was conducted across the 2024-25 school year in a large suburban district in Louisiana as well as a district in Texas, and delivered tutoring for a full school year." (p. 4)
    2) "This section presents the estimated impacts of Air Reading on student literacy outcomes over the 24-25 school year." (p. 12)
  - Detailed analysis: Criterion T requires outcome measurement at least one academic term after the intervention begins. The paper repeatedly situates both implementation and impact estimation across the "2024-25 school year" and reports results "over the 24-25 school year." A full school year necessarily exceeds one term (typically about 3-4 months), so the minimum ERCT tracking duration for T is satisfied.
  - Conclusion: Criterion T is met because outcomes are measured over a full school year, which is longer than one academic term.
- D
  Documented Control Group
  - The control condition is defined as business-as-usual literacy instruction, and the report provides baseline characteristics and equivalence checks for treatment vs. control.
  - Relevant quotes:
    1) "Students assigned to treatment were offered Air Reading tutoring during the school day, while control students received business-as-usual literacy instruction." (p. 4)
    2) "The sample included students in grades 1–4 randomized to either the Air Reading program (n = 174) or the business-as-usual control condition (n = 203)." (p. 4)
    3) "Table 1 provides descriptive statistics on student demographics and baseline achievement, pooled across the full sample and disaggregated by cohort and treatment status." (p. 8)
    4) "Balance checks indicated no meaningful baseline differences between treatment and control groups overall or within cohorts." (p. 8)
  - Detailed analysis: Criterion D requires that the control group be clearly described, including what it received, and that baseline characteristics be documented so the reader can judge comparability. The report explicitly defines the control condition as "business-as-usual literacy instruction" and gives sample sizes for each arm. It also points to baseline descriptive statistics (Table 1) and states that baseline balance checks showed no meaningful differences.
  - Conclusion: Criterion D is met because the control condition and baseline comparability are clearly documented.
Level 2 Criteria
- S
  School-level RCT
  - Randomization occurred at the student level within schools, so the study does not meet the school-level RCT requirement.
  - Relevant quotes:
    1) "Randomization occurred at the student level within schools and grade cohorts, ensuring that treatment and control students were directly comparable at baseline." (p. 4)
    2) "Randomization was conducted at the student level within schools and grade cohorts, so models included school fixed effects and blocking variables from the random assignment process." (p. 9)
  - Detailed analysis: Criterion S requires randomization at the school level (i.e., whole schools are assigned to treatment or control). The report explicitly states that students were randomized within schools, which is incompatible with school-level assignment.
  - Conclusion: Criterion S is not met because the unit of randomization is students, not schools.
- I
  Independent Conduct
  - The report does not explicitly document evaluator independence from the intervention provider across key evaluation steps, and it notes provider involvement in preparing the de-identified analytic dataset.
  - Relevant quotes:
    1) "Air Reading partnered with the Center for Research and Reform in Education (CRRE) to conduct an evaluation of Air Reading in a project funded by Accelerate and Arnold Ventures." (p. 4)
    2) "At the conclusion of tutoring, data on student achievement, student demographics, and program dosage was merged and de-identified by Air Reading." (p. 11)
    3) "The de-identified dataset was shared with the research team for analysis." (p. 11)
  - Detailed analysis: Criterion I requires clear documentation that the evaluation was conducted independently of the intervention's provider and designers, so that provider involvement cannot plausibly bias measurement, data handling, analysis, or reporting. The report describes the evaluation as a partnership between Air Reading and CRRE and indicates that Air Reading (the provider) merged and de-identified the dataset before sharing it with the research team. The report does not include a clear independence statement (e.g., that the provider had no role in outcome measurement, data cleaning decisions, analysis decisions, or conclusions), nor does it describe third-party auditing or similar safeguards.
  - Conclusion: Criterion I is not met because independence from the provider is not explicitly documented and the provider prepared the de-identified analytic dataset.
- Y
  Year Duration
  - Outcomes are measured over the 2024-25 school year with a full-year tutoring implementation, meeting the year-duration threshold.
  - Relevant quotes:
    1) "The trial was conducted across the 2024-25 school year in a large suburban district in Louisiana as well as a district in Texas, and delivered tutoring for a full school year." (p. 4)
    2) "This section presents the estimated impacts of Air Reading on student literacy outcomes over the 24-25 school year." (p. 12)
  - Detailed analysis: Criterion Y requires outcome measurement at least 75% of an academic year after the intervention begins. The report explicitly states the trial was conducted across the full "2024-25 school year" and reports results over that year, indicating a year-long implementation and end-of-year outcomes rather than a short follow-up window.
  - Conclusion: Criterion Y is met because outcomes are measured over a full school year (2024-25), which meets the ERCT year-duration threshold.
- B
  Balanced Control Group
  - The treatment adds substantial tutoring time and resources, but this added tutoring is the intervention being tested against business-as-usual instruction, so the resource difference is integral to the intended treatment contrast.
  - Relevant quotes:
    1) "Students assigned to treatment were offered Air Reading tutoring during the school day, while control students received business-as-usual literacy instruction." (p. 4)
    2) "Small groups of up to four students are paired with the same highly-qualified tutor in 30-minute sessions four times each week." (p. 4)
    3) "On average, treatment students attended 55 sessions, with 56% reaching the high-dosage threshold of 56 or more sessions, translating to an average of 27.1 hours of tutoring per student." (p. 4)
  - Detailed analysis: Criterion B examines whether the nature, quantity, and quality of resources (time, personnel, materials, budget) are balanced between treatment and control, unless the additional resources are explicitly the treatment variable being tested. Here, the treatment clearly includes substantial additional resources: frequent live tutoring sessions delivered during the school day by paid, qualified tutors, with an average exposure of 27.1 hours per student. The control group receives "business-as-usual literacy instruction" and therefore receives no matched tutoring time. This imbalance is not an accidental confound; it is the central causal contrast of the study (tutoring versus business-as-usual). Under the ERCT Criterion B decision logic, when the additional time and personnel are integral to the intervention itself (and are the intended treatment contrast), the control group may remain business-as-usual.
  - Conclusion: Criterion B is met because the extra tutoring time and tutoring personnel are integral to the intervention being tested against business-as-usual instruction.
Level 3 Criteria
- R
  Reproduced
  - No independent replication by a different research team in a peer-reviewed journal was found; the report references only a prior study by the same authors.
  - Relevant quotes:
    1) "First, the stronger impacts observed in the present evaluation of a full year implementation compared with a prior study of a semester-long implementation (Neitzel & Storey, 2024) highlight the importance of program duration and exposure." (p. 16)
  - Detailed analysis: Criterion R requires an independent replication of the study (or its central experimental claim) by a different research team, in a different context, published in a peer-reviewed scientific journal. The only explicitly referenced prior study is "(Neitzel & Storey, 2024)," which does not count as an independent replication because it comes from the same author team. An internet search (ERIC and related public sources) did not identify a peer-reviewed journal replication of this specific Air Reading RCT by an independent research team.
  - Conclusion: Criterion R is not met because no independent peer-reviewed replication by a different author team was found.
- A
  All-subject Exams
  - The study uses standardized reading assessments but does not assess all core school subjects, focusing on literacy outcomes only.
  - Relevant quotes:
    1) "This study employed a randomized controlled trial (RCT) to estimate the causal impact of Air Reading, a structured virtual tutoring program, on elementary students’ literacy outcomes." (p. 4)
    2) "In Texas, the Texas Primary Reading Inventory (TPRI) was administered in grades 1–2, and the Texas STAAR assessment in grade 3. In Louisiana, student performance was measured using DIBELS (grade 2) and the state’s LEAP assessments (grade 4)." (p. 9)
  - Detailed analysis: Criterion A requires standardized exam-based outcome measurement across all main school subjects (not only the targeted subject), to detect possible cross-subject tradeoffs. The report frames the evaluation as estimating impacts on "literacy outcomes," and the listed assessments are all reading/literacy measures. There is no evidence that mathematics, science, or other core subjects were assessed as part of the impact evaluation.
  - Conclusion: Criterion A is not met because outcomes are limited to literacy rather than all core school subjects.
- G
  Graduation Tracking
  - The study measures outcomes within a single school year and explicitly states that longer-term persistence requires future research; no graduation-tracking follow-up papers by the same authors were found.
  - Relevant quotes:
    1) "Finally, like most randomized controlled trials, this study focused on short-term achievement outcomes." (p. 17)
    2) "Longitudinal analyses will be needed to assess the persistence of impacts over time, as well as potential spillover effects on broader measures of literacy development and academic engagement." (p. 17)
    3) "First, future work should investigate the persistence of impacts beyond a single school year, including long-term outcomes in later grades such as reading fluency, comprehension, and broader academic achievement." (p. 17)
  - Detailed analysis: Criterion G requires that participants be followed until graduation (i.e., long-term tracking through the end of the relevant schooling stage). The report explicitly describes the outcomes as "short-term" and calls for future work to study persistence "beyond a single school year," indicating that this evaluation does not itself follow students through graduation. Per the ERCT instructions, an internet search was conducted for follow-up papers by the same authors that track this cohort to graduation; no such graduation-tracking follow-up publication was found in the accessible sources checked (e.g., ERIC).
  - Conclusion: Criterion G is not met because the study does not track students to graduation and no graduation-tracking follow-up publication was found.
- P
  Pre-Registered
  - The report contains no public pre-registration link/ID or registration date demonstrating protocol registration before data collection.
  - Relevant quotes: The report contains no explicit pre-registration statement, registry ID or link, or registration date. A related statement on randomization procedures:
    1) "Randomization was conducted using a pre-specified algorithm and procedures, implemented via the IndependentRandomizer app developed by Amanda J. Neitzel." (p. 11)
  - Detailed analysis: Criterion P requires a publicly accessible pre-registered protocol (e.g., OSF, AEA RCT Registry) with evidence that registration occurred before data collection began. The report mentions a "pre-specified algorithm and procedures" for randomization, but it does not provide a public registry link/ID or a registration date. An internet search for a public pre-registration entry specific to this February 2026 Louisiana-and-Texas evaluation did not yield a registry record that could be verified as having been registered before the study started.
  - Conclusion: Criterion P is not met because no public pre-registration link/ID or evidence of registration timing is provided or verifiable.
