Open Response

We will answer as many questions from our readers as time and space permit. If you have an assessment-related question, please send us an E-mail.

What is the status of computerized scoring of student responses to constructed-response questions?

First, let’s make sure we are talking about the same thing. By computerized scoring, we mean an automated system of assigning scores to students’ written responses whereby the computer, not a human being, determines those scores. This is not to be confused with image scoring, where digitized images of students’ written responses are displayed to human scorers on their computer screens, and the scorers then enter scores for the responses as they evaluate them.

While image scoring is widely used in the testing industry, computerized scoring of constructed responses is not as widely used at this time. One significant application is the Graduate Management Admission Test (GMAT), for which it is used for practice testing and “second reads.” The two most widely known computerized scoring systems are e-rater by Educational Testing Service and IntelliMetric by Vantage Technologies.

The primary concern many educators have about computers assigning scores is that the uniqueness and variability of responses require a degree of flexibility and judgment that only humans can bring to the task of scoring. It is interesting that the critics of constructed-response testing involving human scoring believe that the scoring is too subjective and unreliable. Actually, in any other field of endeavor, the process used by humans in assigning scores would be considered extremely objective. Scorers must have appropriate subject matter expertise; they go through scoring exercises to qualify for a scoring project; they are trained extensively on the use of scoring rubrics, usually developed uniquely for each question; they do not know the individuals whose work they are scoring; and their scoring is monitored for accuracy throughout a scoring project. Reliability is rarely a problem and is usually documented at the completion of a project.

Advocates of computerized scoring argue that if human scoring can be made objective and reliable, then it is a process that lends itself well to computerization. Nevertheless, the doubts of many educators have, in recent years, been an obstacle to the application of computerized scoring in many instances. Where it is used, it is often for the scoring of writing samples, as opposed to responses to constructed-response questions in content areas such as science or social studies. The “training” of the computer involves the entry of large numbers of sample responses to a writing prompt or question. People tend to have more faith in the capacity of the computer to be trained to score writing samples, because knowledge of specific content is generally not a factor, whereas various mechanical aspects of writing are. Interestingly, the research evidence suggests that the computer can generally score writing samples and responses to questions in various subject areas as accurately as humans. For this reason, some people are willing to use computers for “second reads” in circumstances where double scoring is required. Then, when two scores for the same response disagree by more than the accepted amount, the usual practice of arbitration by a third reader, a human, is followed. 
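
For readers who want to see the double-scoring rule spelled out, here is a minimal sketch in Python. It is an illustration only, not a description of any vendor's system. The function name resolve_score, the default tolerance of one score point, the averaging of two scores that agree, and the treatment of the third reader's score as final are all our assumptions; actual scoring projects set their own rules.

```python
from typing import Optional


def resolve_score(first: int, second: int, max_gap: int = 1,
                  arbiter: Optional[int] = None) -> float:
    """Combine two independent scores for a single response.

    first, second: the two reads (for example, one human and one computer).
    max_gap: largest difference still counted as agreement (assumed value).
    arbiter: score from a third, human reader, needed only on disagreement.
    """
    # Scores within the accepted amount are treated as agreeing.
    # Averaging them is an assumption made for this sketch.
    if abs(first - second) <= max_gap:
        return (first + second) / 2

    # Otherwise the response goes to arbitration by a third reader.
    if arbiter is None:
        raise ValueError("scores disagree by more than max_gap; "
                         "a third read is required")

    # Assumption: the arbiter's score stands as the final score.
    return float(arbiter)
```

Under these assumptions, calling resolve_score(3, 4) treats the two reads as agreeing, while resolve_score(2, 5) raises an error until a third reader's score is supplied through the arbiter argument.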

Because of the demand for quick turnaround of test results to satisfy requirements of the No Child Left Behind Act of 2001 (NCLB), we can expect a significant increase in the use of computerized scoring. In the 1980s, when statewide accountability testing was just coming into its own, educators argued vehemently against the states coming up with detailed content objectives to be addressed by the state tests because of issues of local control. As the stakes associated with the results increased, however, educators realized they wanted to know as much as possible about the material covered by the tests. They now appreciate good state content standards and grade-level expectations. The same will probably happen with computerized scoring. The demand for earlier results will win out. Even officials in the U.S. Department of Education have told state officials that they are going to have to employ this technology if they are going to use constructed-response questions and meet NCLB-related timelines.

Copyright 2004 by Measured Progress. All rights reserved.