Open Response

Equating Test Scores 

Test score equating is a procedure that helps ensure that scores reported to students, parents, and schools are interpreted accurately. Before we can talk about equating, however, we need to provide some background about test scores in general. 

Test Scoring 

The initial step in scoring a test is simply determining the number of points that a student has received. A student’s total number of points, referred to as the raw score, is generally the sum of the number of multiple-choice items answered correctly (each of which is worth one point) plus the total number of points received on any open-response items (i.e., items that require a written response) that appear on the test. 
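To make the arithmetic concrete, here is a minimal sketch of a raw score calculation. The function name, the answer key, and the student's responses are all hypothetical and chosen only for illustration; an actual scoring system would be considerably more involved.

```python
def raw_score(mc_responses, answer_key, open_response_points):
    """Return a raw score: one point per correct multiple-choice item
    plus all points earned on open-response items."""
    mc_correct = sum(
        1 for given, correct in zip(mc_responses, answer_key) if given == correct
    )
    return mc_correct + sum(open_response_points)


# Hypothetical student: five multiple-choice items and two open-response items.
score = raw_score(
    mc_responses=["A", "C", "B", "D", "A"],
    answer_key=["A", "C", "D", "D", "B"],
    open_response_points=[3, 2],
)
print(score)
```

In this made-up case the student answers three multiple-choice items correctly and earns five open-response points, for a raw score of 8.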

Raw scores are seldom used for reporting students’ scores because they can be difficult to interpret without additional information. For example, a raw score of 40 is hard to interpret without knowing the total number of points on the test, as well as how difficult the items are. This problem becomes even more complicated when you want to compare scores on two different tests that measure the same content, or two different versions of the same test.  

Why is equating necessary? 

When different versions of the same test are administered, scores need to be interchangeable. For example, if a test is given yearly, some or all of the items may be replaced from one year to the next. However, it is important to be able to make accurate comparisons between scores on the two versions of the test, even though the new items on a test may be more or less difficult as a group than the items they replaced. Equating is the process by which we ensure that the scores awarded based on two different versions of a test are comparable. 

Say we administered two versions of a test—Version A in 2004 and Version B in 2005. If students who took Version B received higher raw scores on average than students who took Version A, that difference in average raw scores could be due to:

  • a difference in the difficulty of the two tests (i.e., Version B was easier),
  • the fact that the students who took Version B were more able (as a result, perhaps, of instructional improvements), or
  • a combination of these two factors. 

If Version B actually is easier than Version A, and we don’t equate the two test forms, students who take Version B will have an unfair advantage, and the change in average test scores from 2004 to 2005 may be misinterpreted. 

How is equating accomplished? 

There are a number of methods that could be used to equate the two versions, but one of the most commonly used methods involves including a common set of items, called equating items, on both versions of the test.  

Let’s return to our example, in which students who took Version B performed better than students who took Version A. If the two groups of students performed equally well on the set of equating items, we can infer that the groups were of similar ability; in other words, there was no measurable improvement in the students’ understanding of the test content from 2004 to 2005. We could then infer that scores on Version B were higher because its items were easier, and we would adjust the reported scores so that students who took Version B did not benefit from an unfair advantage. If, on the other hand, the students who took Version B performed better on the equating items, then it is reasonable to conclude that those students truly are higher performing and that their reported scores should reflect that higher level of achievement. 
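The reasoning in this example can be sketched in a few lines of code. This is a deliberately simplified illustration, not the statistical method an operational testing program would use. It assumes, crudely, that a gap on the equating items can be converted to the total-score scale by multiplying by the ratio of total points to equating-item points. The function name and all of the scores are hypothetical.

```python
from statistics import mean


def difficulty_adjustment(totals_a, anchors_a, totals_b, anchors_b,
                          total_points, anchor_points):
    """Estimate how many raw-score points Version B is easier than Version A.

    Crude assumption (ours, for illustration only): the gap between groups
    on the equating items scales up to the full test in proportion to the
    number of points available.
    """
    total_gap = mean(totals_b) - mean(totals_a)      # observed raw-score gap
    anchor_gap = mean(anchors_b) - mean(anchors_a)   # gap on equating items
    ability_gap = anchor_gap * (total_points / anchor_points)
    # Whatever the ability gap does not explain is attributed to difficulty.
    return total_gap - ability_gap


# Hypothetical data: both groups do equally well on the equating items,
# but Version B raw scores run three points higher.
totals_a, anchors_a = [30, 34, 38], [6, 7, 8]
totals_b, anchors_b = [33, 37, 41], [6, 7, 8]

adjustment = difficulty_adjustment(
    totals_a, anchors_a, totals_b, anchors_b,
    total_points=50, anchor_points=10,
)
adjusted_b = [score - adjustment for score in totals_b]
print(adjustment, adjusted_b)
```

Because the two groups match on the equating items in this made-up data, the entire three-point gap is attributed to Version B being easier, and the adjusted Version B scores line up with the Version A scores. Had the Version B group also scored higher on the equating items, part or all of the gap would be treated as a real difference in achievement and left in the reported scores.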

Although the process used in equating involves complex statistical methods, the end result is fairly simple: scores from one version of a test are adjusted so that they are interchangeable with scores from another version measuring the same content.

Liz Burton

Copyright 2005 by Measured Progress. All rights reserved.