Open Response

Michael Nering is a senior psychometrician at Measured Progress, with particular expertise in the application of quantitative methodologies to state assessment contracts. His research has focused on item response theory, equating, and scaling.

Reliability and Validity

In this issue, we briefly define reliability and validity. These two concepts form the foundation of any assessment instrument, and together they demonstrate the quality of an assessment. Both validity and reliability are very broad in nature and can involve many formulas. In future issues, we will delve into the specifics; our purpose here is to focus on the general concepts. 

Reliability

Let’s start off by assuming that after a student takes an assessment, he or she remembers nothing about that assessment whatsoever. Thus, in theory, we could administer an assessment to a student again and again without any sort of practice effect taking place. A test is considered reliable if it produces the same test score for each student over these multiple administrations. That is, a test is reliable if its results are “repeatable.” Reliability can be calculated as a coefficient, and this number would reflect the extent to which scores “wobble” around. 

Clearly, we cannot assume that taking an assessment has no effect on the examinee. Thus, it is rarely feasible to administer a test to a student multiple times. As a result, there is no way to determine the exact reliability of an assessment, and instead we must estimate reliability. To avoid a bunch of formulas for now, just remember that reliability equals repeatability of results. 
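For readers who would like to see what estimating reliability can look like in practice, here is a minimal sketch in Python. It assumes one common estimate, Cronbach's alpha, which works from a single administration by asking how consistently the individual items rank students. The article itself does not name a method, so the choice of alpha, the function name cronbach_alpha, and the response data are all illustrative assumptions.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Estimate reliability (Cronbach's alpha) from one administration.

    item_scores: one row per student, one column per test item.
    """
    n_items = len(item_scores[0])
    # Variance of each item's scores across students.
    item_variances = [
        variance([row[i] for row in item_scores]) for i in range(n_items)
    ]
    # Variance of the students' total scores.
    total_variance = variance([sum(row) for row in item_scores])
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: 5 students, 4 items scored 1 (right) or 0 (wrong).
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

alpha = cronbach_alpha(responses)
print(round(alpha, 2))  # 0.8
```

A coefficient near 1 suggests scores that would "wobble" very little; a coefficient near 0 suggests scores that are mostly noise. Other estimates exist, and which one is appropriate depends on the assessment.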

Validity

Validity is the foundation of all assessment programs. Validity is the extent to which a test measures what it is supposed to measure. Think of it this way: a math test is valid if, and only if, the test questions assess a student’s math ability. Much of validity in an educational assessment program relates to content. This is why our test developers play such a critical role in the test construction process. Validity evidence can take on several forms, such as predictive and concurrent validity. While there are other forms of validity, let’s focus on these two: 

Predictive Validity

A test has predictive validity if its scores predict some future behavior that is related to what the test measures. For example, if students who score high on a math test go on to perform well in an advanced math class, the math test can be considered to have good predictive validity. 

Concurrent Validity

If we develop a new math test and its scores are highly correlated (see previous issue) with scores on some other math assessment known to be valid, then our new math assessment is considered to have good concurrent validity. 
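Both forms of validity evidence described above come down to a correlation between test scores and some other measure. The sketch below computes a Pearson correlation for a concurrent validity check. The function name pearson_r and both sets of scores are hypothetical, and a real validity study would involve far more students and more careful design than this.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from each mean.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each set of scores.
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)


# Hypothetical scores for the same five students on two math tests.
new_test = [55, 62, 70, 81, 90]
established_test = [58, 60, 74, 79, 93]

r = pearson_r(new_test, established_test)
print(round(r, 2))
```

For predictive validity, the second list would instead hold a later outcome, such as grades in an advanced math class. In either case, a correlation close to 1 counts as supporting evidence, not proof.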

Relationship

Reliability and validity are inextricably linked. Reliability addresses the consistency of test results; validity addresses what the test is measuring. Indeed, what good is a reliable assessment if it is measuring the wrong content? Thus, reliability is a necessary, but not sufficient, condition for validity. We will explore this delicate relationship further in future issues.

Copyright 2005 by Measured Progress. All rights reserved.