QUESTION (summarized): A state has a multi-grade, multi-subject, criterion-referenced testing program. The state has decided to require third-grade students to pass the third-grade reading test before being promoted to the fourth grade. The state says the test has content validity but has not researched its predictive validity, so it cannot be definitively stated that a student who fails the test is unprepared to go on to the fourth grade. In the past, some students who failed the third-grade test were nonetheless promoted and did fine in the fourth grade. What can I, as a parent and fellow educator, do to enlighten lawmakers about this inappropriate use of test scores? Most legislators are not educators and do not understand the validity evidence that should be required of a test before such high stakes are attached to it. How will the test hold up under scrutiny if the state has not collected the data needed to determine that it is a fair and equitable assessment?
ANSWER: Unfortunately, there is no simple, short answer to this question. In our view, predictive validity may not be the strongest basis for an argument against the high-stakes use of the test described. Standards 13.6 and 13.7 in the “Standards for Educational and Psychological Testing,” produced by the American Educational Research Association, the American Psychological Association, and the National Council on Measurement in Education, may provide stronger arguments. Those arguments are presented in “Measurement Error, Human Error, and Decisions Based on a Test,” a Measured Progress issue paper.
The state test in question is aligned with the state’s content standards and specific grade-level expectations. That is, the test questions have been developed and selected specifically to cover the concepts and skills identified in the state content standards, which were prepared by teachers of the relevant grades and by curriculum specialists. Content validity is an inherent strength of standards-based tests, such as the state tests required by the No Child Left Behind Act of 2001 (NCLB).
While we agree that the state should document the predictive validity of the grade-three test in question, it is important to recognize that such validity is fairly easy to demonstrate. Predictive validity is often established by computing a correlation coefficient that reflects the statistical relationship between a predictor variable (here, scores on the grade-three reading test) and a criterion variable (for example, fourth-grade grades or test scores). Quite honestly, a reliable reading test will correlate reasonably well with other reliable academic measures, including tests in other subjects, school grades, and especially other reading measures, so the state would likely have little difficulty producing an acceptable coefficient.
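To make the idea concrete, here is a minimal Python sketch of the Pearson correlation coefficient, the statistic typically reported as a predictive validity coefficient. The function name `pearson_r` and the score lists are hypothetical, invented only for illustration; they are not data from any state program. (Python 3.10 and later also offer `statistics.correlation`, which computes the same Pearson coefficient.)

```python
import math


def pearson_r(predictor, criterion):
    """Pearson correlation between paired predictor and criterion values."""
    n = len(predictor)
    if n != len(criterion) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 pairs")
    mean_x = sum(predictor) / n
    mean_y = sum(criterion) / n
    # Sum of cross-products of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predictor, criterion))
    # Sums of squared deviations for each variable
    sxx = sum((x - mean_x) ** 2 for x in predictor)
    syy = sum((y - mean_y) ** 2 for y in criterion)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has no variance")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores invented for illustration; not real student data.
grade3_reading = [212, 225, 231, 240, 248, 255, 262, 270]  # predictor
grade4_grades = [1.8, 2.5, 2.2, 2.9, 3.1, 2.8, 3.5, 3.7]   # criterion, 4-point scale
print(f"validity coefficient r = {pearson_r(grade3_reading, grade4_grades):.2f}")
```

A value near +1 means students who score high on the grade-three test tend to earn high fourth-grade grades; it says nothing about whether every individual prediction comes true.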
Even with acceptable predictive validity, expressed as a correlation coefficient or an expectancy table, predictions are not always borne out. As the question notes, some students who failed the third-grade reading test still “did fine” in the fourth grade. Although it is not clear exactly what “did fine” means, we have no doubt that some students who failed the third-grade test passed both the fourth grade and the fourth-grade tests. This would be the case no matter how strong the predictive validity coefficient was, short of a perfect correlation, which real test scores never achieve. Some of these students may have improved their academic skills significantly, while others may have just squeaked by.
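The claim that predictions fail for some students regardless of the coefficient's size can be checked with a small simulation. This sketch assumes standardized (z-score) predictor and criterion scores drawn from a bivariate normal distribution with correlation `r`, and hypothetical cut points one standard deviation below the mean on each measure; none of these values comes from the state program, and the function name is hypothetical. It reports, among students who fail the predictor, the share who still reach the criterion cut.

```python
import math
import random


def share_failed_but_passed(r, n=20_000, predictor_cut=-1.0, criterion_cut=-1.0, seed=1):
    """Among simulated students who fail the predictor, return the share who
    still reach the criterion cut. Scores are standardized (mean 0, SD 1) and
    drawn from a bivariate normal distribution with correlation r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie in [-1, 1]")
    rng = random.Random(seed)  # fixed seed so results are reproducible
    failed = failed_but_passed = 0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)  # grade-three test score (z-score)
        # Criterion = r * predictor + independent noise, giving corr(x, y) = r
        y = r * x + math.sqrt(1.0 - r * r) * rng.gauss(0.0, 1.0)
        if x < predictor_cut:
            failed += 1
            if y >= criterion_cut:
                failed_but_passed += 1
    return failed_but_passed / failed if failed else 0.0


for r in (0.5, 0.7, 0.9):
    print(f"r = {r}: {share_failed_but_passed(r):.0%} of those who failed still passed")
```

Under these assumptions the share shrinks as `r` rises but stays above zero for any correlation below 1, which is why a strong coefficient alone cannot guarantee that every student who fails the test would have struggled.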
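Take a look at the scatterplot below.
[Figure: hypothetical scatterplot of grade-three reading test scores (horizontal axis) against fourth-grade teacher grades (vertical axis), with a vertical dashed line at the test's passing score and a horizontal dashed line at the passing grade.]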
This scatterplot shows a reasonable correlation between scores on a grade-three reading test and grades assigned by a fourth-grade teacher. The vertical dashed line represents the passing score on the grade-three test, while the horizontal dashed line represents the passing grade assigned by the fourth-grade teacher. Students represented by dots to the left of the vertical line and above the horizontal line failed the predictor measure but “did fine” in the fourth grade. Although this diagram is not based on real data, real data would show the same pattern. A state can establish acceptable predictive validity and apply the consequences of a high-stakes decision based on test scores, all with the understanding that some students who failed the predictor test could have passed the criterion measure. That is the reality whenever a passing score is applied to a predictor that is less than perfectly correlated with the criterion.
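The four regions of such a scatterplot can be tallied directly, giving a simple two-by-two version of an expectancy table. This sketch assumes that a score or grade at or above its cut counts as passing; the function name `tally_quadrants`, the cut values, and the sample points are hypothetical.

```python
from collections import Counter


def tally_quadrants(points, passing_score, passing_grade):
    """Count students in each region of a predictor/criterion scatterplot.

    points: iterable of (grade3_score, grade4_grade) pairs.
    A value at or above its cut is treated as passing (an assumption).
    """
    counts = Counter()
    for score, grade in points:
        test = "passed test" if score >= passing_score else "failed test"
        fourth = "passed grade 4" if grade >= passing_grade else "failed grade 4"
        counts[(test, fourth)] += 1
    return counts


# Hypothetical (score, grade) pairs and cut values, for illustration only.
students = [(230, 3.0), (210, 2.5), (205, 1.0), (250, 1.5), (215, 2.2)]
table = tally_quadrants(students, passing_score=220, passing_grade=2.0)
for (test, fourth), count in sorted(table.items()):
    print(f"{test:12} | {fourth:15} | {count}")
```

The ("failed test", "passed grade 4") cell corresponds to the dots to the left of the vertical line and above the horizontal line, the students the question describes.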
Even assuming the state can readily establish predictive validity, the problem remains that test scores are estimates of some underlying proficiency and, as such, contain a degree of error. For that reason, the “Standards for Educational and Psychological Testing” makes several recommendations for using test scores to make important decisions about individual students. First, students should be given multiple testing opportunities. That is, retesting should be accommodated, preferably with time for remediation between test administrations. (A proficient student is unlikely to fail a test repeatedly just because of measurement error.) Second, alternative forms of the test (that is, alternative testing approaches) should be available to students who may be better able to demonstrate their skills through a different testing mode. Third, other relevant information should be allowed to inform the high-stakes decision. For example, a student who failed the state test but consistently outperformed a number of students who passed it throughout the school year might be allowed to move on to the next grade on the basis of that additional information, presented through an appeals process.
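The parenthetical point about repeated failure can be illustrated with a classical test theory sketch. It assumes an observed score equals the student's true score plus normally distributed error whose standard deviation is the standard error of measurement (SEM), that errors on separate administrations are independent, and that the true score does not change between them. The function names, cut score, true score, and SEM below are hypothetical.

```python
import math


def prob_fail(true_score, cut_score, sem):
    """Probability that one observed score falls below the cut, assuming
    observed = true score + normal error with standard deviation = SEM."""
    if sem <= 0:
        raise ValueError("SEM must be positive")
    z = (cut_score - true_score) / sem
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF at z


def prob_fail_every_time(true_score, cut_score, sem, attempts):
    """Probability of failing all attempts, assuming independent errors and
    an unchanged true score across administrations (a simplification)."""
    return prob_fail(true_score, cut_score, sem) ** attempts


# Hypothetical values: true score 5 points above the cut, SEM of 5 points.
for attempts in (1, 2, 3):
    p = prob_fail_every_time(true_score=205, cut_score=200, sem=5, attempts=attempts)
    print(f"fails {attempts} time(s) in a row: {p:.1%}")
```

With these assumed values, a student whose true proficiency is above the cut fails a single administration about 16 percent of the time, two in a row about 2.5 percent of the time, and three in a row about 0.4 percent of the time. The figures depend entirely on the assumed values and the independence assumption.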
As for what you can do as a parent, you can contact the student assessment staff at the state department of education and ask whether there are
- opportunities for retesting before the consequences of failing the test are applied,
- alternative modes of testing that would be appropriate for your child or others, and
- appeals processes that allow other information to be used in the decision about promotion to the fourth grade.