Yin Burgess

Date of Award

Summer 2020

Document Type

Open Access Dissertation



First Advisor

Christine DiStefano


This simulation study investigated the accuracy of the mean square and standardized values of the item INFIT and OUTFIT statistics (i.e., based on total item fit) in the dichotomous Rasch model under large-scale testing situations. It also examined the associated Type I error rates to determine how well the rule-of-thumb critical values perform in detecting item misfit. Furthermore, simulated systematic measurement disturbances were used to test the power (i.e., true positive rates, or hit rates) and the false positive rates (i.e., Type I error rates) of the values obtained through between-item fit indices in identifying poor-fitting items. Four sample sizes (i.e., 5,000, 10,000, 25,000, and 50,000 test-taking students) and three test lengths (i.e., 30, 50, and 70 multiple-choice items) were simulated to study how these statistics perform. Additionally, different percentages of items (i.e., 4%, 10%, 20%, and 40%) with moderate to large uniform differential item functioning (DIF; i.e., 0.35, 0.45, 0.55, and 0.65 logit units) were designed to test the power as well as the Type I error rates. The measurement disturbances were simulated between two balanced groups with “C” category DIF as defined by the ETS guidelines. Results suggested that critical values of ±2.0 for the standardized values may be recommended for large-scale testing situations. Furthermore, it was found that the DIF item detection procedure currently used by Winsteps® is based on logistic regression, which is sensitive to sample size and resulted in large numbers of items being incorrectly identified as exhibiting DIF.
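The total item fit statistics described above can be illustrated with a minimal sketch. It is not the dissertation's procedure: it assumes the generating person abilities and item difficulties are known (rather than estimated by Winsteps), draws abilities from a standard normal distribution, spaces difficulties evenly between -1.5 and 1.5 logits, and standardizes the mean squares with the commonly cited Wilson-Hilferty cube-root transformation. All function names (`rasch_prob`, `item_fit`, `run_null_condition`, `flag_rate`) are hypothetical.

```python
import math
import random


def rasch_prob(theta, b):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_fit(responses, thetas, b, item):
    """Return (infit MS, outfit MS, infit Z, outfit Z) for one item.

    Assumption: the true thetas and difficulty b are used, not estimates.
    """
    n = len(thetas)
    sq_resid = 0.0  # sum of squared raw residuals (x - P)^2
    info = 0.0      # sum of model variances W = P(1 - P)
    z2 = 0.0        # sum of squared standardized residuals
    out_var = 0.0   # accumulates C / W^2 for the outfit variance
    in_var = 0.0    # accumulates C - W^2 for the infit variance
    for person, theta in enumerate(thetas):
        p = rasch_prob(theta, b)
        w = p * (1.0 - p)
        c = w * (1.0 - 3.0 * w)  # fourth central moment of a 0/1 response
        r2 = (responses[person][item] - p) ** 2
        sq_resid += r2
        info += w
        z2 += r2 / w
        out_var += c / (w * w)
        in_var += c - w * w
    infit_ms = sq_resid / info   # information-weighted mean square
    outfit_ms = z2 / n           # unweighted mean square
    q_in = math.sqrt(in_var) / info
    q_out = math.sqrt(out_var / n ** 2 - 1.0 / n)

    def standardize(ms, q):
        # Wilson-Hilferty cube-root transformation to an approximate z-score.
        return (ms ** (1.0 / 3.0) - 1.0) * (3.0 / q) + q / 3.0

    return infit_ms, outfit_ms, standardize(infit_ms, q_in), standardize(outfit_ms, q_out)


def run_null_condition(n_persons, n_items, seed):
    """Simulate model-fitting data (no DIF) and return fit for every item.

    Requires n_items >= 2 because of the even spacing of difficulties.
    """
    rng = random.Random(seed)
    thetas = [rng.gauss(0.0, 1.0) for _ in range(n_persons)]
    difficulties = [-1.5 + 3.0 * i / (n_items - 1) for i in range(n_items)]
    responses = [[1 if rng.random() < rasch_prob(t, b) else 0 for b in difficulties]
                 for t in thetas]
    return [item_fit(responses, thetas, difficulties[i], i) for i in range(n_items)]


def flag_rate(fits, critical=2.0):
    """Share of items whose infit or outfit Z exceeds the critical value."""
    flagged = sum(1 for _, _, z_in, z_out in fits
                  if abs(z_in) > critical or abs(z_out) > critical)
    return flagged / len(fits)


if __name__ == "__main__":
    fits = run_null_condition(n_persons=5000, n_items=30, seed=2020)
    print("empirical false positive rate at |Z| > 2.0:", flag_rate(fits))
```

Because the data are generated without any disturbance, every flagged item in this sketch is a false positive, so `flag_rate` gives a rough empirical Type I error rate for the chosen critical value. Repeating the run over many seeds, sample sizes, and test lengths would be needed before drawing any conclusion comparable to the study's.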