What is a p-value?
First, it is important to keep in mind that a p-value is calculated differently for multiple-choice (M-C) items than for constructed-response (C-R) items. For M-C items, a p-value is simply the proportion of students answering an item correctly. If you test 100 students with an item and 54 of them respond correctly, then the p-value is 54/100 = 0.54.
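The M-C calculation above can be sketched in a few lines of Python. This is a minimal illustration, not a standard routine: the function name `mc_p_value` and the 1/0 coding of correct and incorrect responses are assumptions made for the example.

```python
def mc_p_value(responses):
    """Proportion of students answering a multiple-choice item correctly.

    `responses` holds one entry per student: 1 (correct) or 0 (incorrect).
    This 1/0 coding is an assumption made for the sketch.
    """
    if not responses:
        raise ValueError("need at least one response")
    return sum(responses) / len(responses)


# 100 students, 54 of whom answer correctly (the example from the text).
responses = [1] * 54 + [0] * 46
print(mc_p_value(responses))  # 0.54
```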
For C-R items, a p-value represents the proportion of points obtained, averaged across students. For example, let’s assume we have three students taking a 4-point item, with Student A scoring a 2, Student B scoring a 3, and Student C scoring a 3. On a 4-point item, a score of 2 is half the points and a score of 3 is three quarters of the points. Proportionally, half is equal to 0.50 and three quarters is equal to 0.75. Because we are adjusting the student scores to proportional values, we commonly refer to C-R p-values as “adjusted p-values.” Thus, the adjusted p-value for this item is equal to:
(0.50 + 0.75 + 0.75) / 3 = 2.00 / 3 ≈ 0.67
We divide by three because there are three students.
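The adjusted p-value for the three-student example can be sketched the same way. The function name `adjusted_p_value` and its parameters are hypothetical, chosen only for this illustration; the scores and the 4-point maximum come from the example above.

```python
def adjusted_p_value(scores, max_points):
    """Average proportion of available points earned on a C-R item.

    `scores` holds one raw score per student; `max_points` is the
    maximum score on the item. Both names are assumptions for this sketch.
    """
    if not scores:
        raise ValueError("need at least one score")
    if max_points <= 0:
        raise ValueError("max_points must be positive")
    # Convert each raw score to a proportion of the points available.
    proportions = [score / max_points for score in scores]
    # Average the proportions across students.
    return sum(proportions) / len(proportions)


# Students A, B, and C score 2, 3, and 3 on a 4-point item.
scores = [2, 3, 3]
print(round(adjusted_p_value(scores, 4), 2))  # 0.67
```

Note that an M-C item is the special case of a 1-point item: with scores of 0 or 1 and `max_points=1`, the result is the plain proportion correct.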
What does p-value really mean?
We commonly refer to a p-value as item difficulty. After all, it reflects something about how students responded to the item and how many of them earned points. Strictly speaking, though, it is more accurate to think of a p-value as item easiness because, in the case of M-C items, the larger the p-value, the more students responded correctly. A high p-value indicates an easy item, and a low p-value indicates a difficult one.
Are there limitations?
There are limitations to using any statistic. For example, one major drawback in using a p-value is that it is sample-dependent; administering an item to different groups of students will result in different p-values.
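Sample dependence can be illustrated with a small sketch. The two response sets below are invented purely for illustration (they are not real data), and the helper name `p_value` is hypothetical. The same M-C item yields a different p-value depending on which group of students takes it.

```python
def p_value(responses):
    """Proportion correct for an M-C item (1 = correct, 0 = incorrect)."""
    if not responses:
        raise ValueError("need at least one response")
    return sum(responses) / len(responses)


# Hypothetical responses to the SAME item from two different groups.
group_a = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]  # 8 of 10 correct
group_b = [0, 1, 0, 0, 1, 0, 1, 0, 0, 0]  # 3 of 10 correct

print(p_value(group_a))  # 0.8
print(p_value(group_b))  # 0.3
```

The item itself has not changed between the two calculations; only the sample has. That is why a p-value should be read as a description of an item for a particular group of students rather than as a fixed property of the item.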
Copyright 2004 by Measured Progress. All rights reserved.