Date of Award


Document Type

Campus Access Dissertation


Department

Educational Studies


Sub-Department

Educational Psychology / Research

First Advisor

Robert Johnson


Inter-rater reliability coefficients are often reported in performance assessment as a measure of rating quality. Rater effects, a form of systematic rating error, can change the decision made about an examinee when scores are given an absolute interpretation, such as a pass/fail decision. Little research, however, has examined the robustness of different inter-rater reliability estimators to rater effects. The purpose of the present research is to inform assessment practitioners and researchers who report inter-rater reliability of (a) the more appropriate estimator for achieving accurate estimates of inter-rater reliability in the presence of rater effects and (b) the possible biases of the estimators under various conditions, such as the number of scale points, the initial inter-rater reliability, and the number of papers. Results from this study indicate that leniency/severity effects and the number of scale points substantially affect the accuracy of different inter-rater reliability estimators. The phi coefficient (or index of dependability) is not always the most accurate estimator; it becomes the best choice only when the leniency/severity effect is large and the number of scale points is large. For the other estimators, however, increasing the number of scale points does not necessarily improve estimation accuracy.
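As a rough illustration of why the phi coefficient is sensitive to leniency/severity effects while a correlation-based estimator is not, the sketch below (not taken from the dissertation; the function name and data are hypothetical) estimates variance components for a fully crossed persons x raters design from ANOVA mean squares and computes the index of dependability. A rater who scores uniformly higher than another inflates the rater variance component, which phi counts as error, whereas a Pearson correlation between the two raters' scores is unaffected by such a constant shift.

```python
import numpy as np

def phi_coefficient(scores):
    """Index of dependability (phi) for a persons x raters design.

    `scores` is an (n_persons, n_raters) array of ratings. Variance
    components are estimated from the expected mean squares of a
    two-way random-effects ANOVA (illustrative sketch, not the
    dissertation's procedure).
    """
    n_p, n_r = scores.shape
    grand = scores.mean()
    p_means = scores.mean(axis=1)  # person (examinee) means
    r_means = scores.mean(axis=0)  # rater means

    # Sums of squares for persons, raters, and the residual interaction.
    ss_p = n_r * ((p_means - grand) ** 2).sum()
    ss_r = n_p * ((r_means - grand) ** 2).sum()
    ss_pr = ((scores - grand) ** 2).sum() - ss_p - ss_r

    ms_p = ss_p / (n_p - 1)
    ms_r = ss_r / (n_r - 1)
    ms_pr = ss_pr / ((n_p - 1) * (n_r - 1))

    # Variance components; negative estimates are truncated at zero.
    var_p = max((ms_p - ms_pr) / n_r, 0.0)   # true-score variance
    var_r = max((ms_r - ms_pr) / n_p, 0.0)   # leniency/severity variance
    var_pr = ms_pr                           # interaction + residual error

    # Phi treats rater main effects as error, so constant rater
    # leniency/severity lowers the coefficient.
    return var_p / (var_p + (var_r + var_pr) / n_r)
```

For example, if a second rater's scores equal the first rater's scores plus a constant, the Pearson correlation between the two columns is 1.0, but phi falls below 1.0 because the rater variance component enters the error term.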