TY - JOUR
TI - Correction for guessing in the framework of the 3PL item response theory
DO - https://doi.org/10.7282/T3668D8M
PY - 2010
AB - Guessing behavior is an important topic with regard to assessing proficiency on multiple choice tests, particularly for examinees at lower levels of proficiency, due to the greater potential for systematic error or bias that inflates observed test scores. Methods that incorporate a correction for guessing on high-stakes tests generally rely on a scoring model that aims to minimize the potential benefit of guessing. In some cases, a formula score based on classical test theory (CTT) is applied with the intention of eliminating the influence of guessing from the number-right score (e.g., Holzinger, 1924). However, since its inception, significant controversy has surrounded the use and consequences associated with classical methods of correcting for guessing. More recently, item response theory (IRT) has been used to conceptualize and describe the effects of guessing. Yet CTT remains a dominant aspect of many assessment programs, and IRT models are rarely used for estimating proficiency with MC items – where guessing is most likely to exert an influence. Although there has been tremendous growth in the research of formal modeling based on IRT with respect to guessing, none of these IRT approaches have had widespread application. This dissertation provides a conceptual analysis of how the "correction for guessing" works within the framework of a 3PL model, and two new guessing correction formulas based on IRT are derived for improving observed score estimates. To demonstrate the utility of the new formula scores, they are applied as conditioning variables in two different approaches to DIF: the Mantel-Haenszel and logistic regression procedures. Two IRT formula scores were developed using Taylor approximations.
Each of these formula scores requires the use of sample statistics in lieu of IRT parameters for estimating corrected true scores, and these statistics were obtained in two different ways, referred to as the pseudo-Bayes and conditional probability methods. It is shown that the IRT formula scores adjust the number-correct score based on both the proficiency of an examinee and the examinee's pattern of responses across items. In two different simulation studies, the classical formula score performed better in terms of bias statistics, but the IRT formula scores had notable improvement in bias and r2 statistics compared to the number-correct score. The advantage of the IRT formula scores accounted for about 10% more of the variance in corrected true scores in the first quartile. Results also suggested that not much information was lost due to the use of the Taylor approximation. The pseudo-Bayes and conditional probability methods also resulted in little information loss. When applied to DIF analyses, the IRT formula scores had lower bias in both the log-odds ratios and Type I error rates compared to the number-correct score. Overall, the IRT formula scores decreased bias in the log-odds ratio by about 6% and in the Type I error rate by about 10%.
KW - Education
KW - Item response theory
KW - Ability--Testing
LA - eng
ER -