Research Framework - Statistical Analysis
Reliability and Generalizability
Estimation of reliability might seem like an area not requiring further research, but the simple indices we have come to trust sometimes systematically over- or under-estimate their intended parameters.
Thus, the use of IRT-based, reliability-related indices is an important area of study, including research on information functions and conditional standard errors of measurement. New or modified indices are also sometimes needed for new settings. For example, when the objective of an assessment is to classify students into four levels of mastery in a broad subject area, the concept of reliability translates into classification accuracy and consistency, and the methods of estimation must differ from those usually applied to standardized tests. Methods for estimating inter-rater reliability also fall into this category.
Unlike research on reliability, research on generalizability must be revisited frequently for a given assessment program, because assessment settings often change during the lifetime of a program. These changes must be evaluated for their effects on the reliability of assessment instruments.
Research
Interjudge Inconsistency Index for Body of Work, Yes/No, and Bookmark Standard Setting Procedures, by Abdullah Ferdous and Barbara Plake (2007), Paper presented at the American Educational Research Association Annual Meeting, Chicago, IL
A Comparison of IRT Equating Methods on Recovering Parameters and Capturing Growth in Mixed-format Tests, by Su Baldwin, Peter Baldwin, and Michael Nering (2007), Paper presented at the American Educational Research Association Annual Meeting, Chicago, IL
An Adaptive Scoring Protocol to Enhance Accuracy of Performance Classification, by Mark Darby, Matt Finkelman, and Michael Nering (2007), Paper presented at the American Educational Research Association Annual Meeting, Chicago, IL
Missing Data Treatment Methods in Parameter Recovery for a Mixed-Format Test, by Thakur Karkee and Matt Finkelman (2007), Paper presented at the American Educational Research Association Annual Meeting, Chicago, IL
Standardized Conditional SEM: A Case for Conditional Reliability, by Michael Nering, T.C. Oshima, Larry Price, and Nambury Raju (2006), from Applied Psychological Measurement
Cognitive Diagnostic Attribute Level Discrimination Indices, by Robert Henson, William Stout, Jeff Douglas, Xuming He, and Louis Roussos (2006), Unpublished ETS Project Report, Princeton, NJ

