What is Generalizability Theory?


Generalizability (G) theory was largely conceptualized by Cronbach, Gleser, Nanda, and Rajaratnam (1972). G theory is primarily a methodology for characterizing and quantifying the specific sources of error that contaminate an observed measurement, so that future measurements can be made with less error.

For example, the intent of an assessment may be to measure a student's ability to respond appropriately to oral instructions delivered in English. Obtaining student scores may require raters, however, and the judgments of those raters may contain some bias. Additionally, the tasks selected for the assessment are usually a sample of all possible tasks that could be used, so students may perform differently on any given task for reasons unrelated to the trait being measured. The goal of G theory is to quantify the amount of error associated with those raters and tasks so that future test development and implementation can be improved.

All measurements contain some amount of error, and this measurement error may be attributed to several sources. For the example presented, sources of measurement variability could include the person(s) rating the student's performance, the test items, and the interaction between the raters and the test items. These sources of variability are specified in the research design. Then, by way of analysis of variance (ANOVA), variance components for these sources of error are estimated. Thus, G theory is a conceptual framework for identifying and quantifying the error associated with measurements.
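
The ANOVA step described above can be sketched in a few lines of Python. This is a minimal illustration rather than a replacement for dedicated software: it assumes a fully crossed persons x tasks x raters design with one score per cell and all facets treated as random, and it uses the usual expected-mean-square equations for that design. The function name, the dictionary keys, and the example scores are hypothetical.

```python
from itertools import product


def avg(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def variance_components(scores):
    """Estimate variance components for a fully crossed p x t x r design.

    scores[p][t][r] is the score given to person p on task t by rater r
    (one observation per cell; every facet needs at least two levels).
    """
    n_p, n_t, n_r = len(scores), len(scores[0]), len(scores[0][0])
    P, T, R = range(n_p), range(n_t), range(n_r)

    # Grand mean, marginal means, and two-way cell means.
    grand = avg(scores[p][t][r] for p, t, r in product(P, T, R))
    m_p = [avg(scores[p][t][r] for t, r in product(T, R)) for p in P]
    m_t = [avg(scores[p][t][r] for p, r in product(P, R)) for t in T]
    m_r = [avg(scores[p][t][r] for p, t in product(P, T)) for r in R]
    m_pt = [[avg(scores[p][t][r] for r in R) for t in T] for p in P]
    m_pr = [[avg(scores[p][t][r] for t in T) for r in R] for p in P]
    m_tr = [[avg(scores[p][t][r] for p in P) for r in R] for t in T]

    # Sums of squares for main effects and two-way interactions.
    ss_p = n_t * n_r * sum((m - grand) ** 2 for m in m_p)
    ss_t = n_p * n_r * sum((m - grand) ** 2 for m in m_t)
    ss_r = n_p * n_t * sum((m - grand) ** 2 for m in m_r)
    ss_pt = n_r * sum((m_pt[p][t] - m_p[p] - m_t[t] + grand) ** 2
                      for p, t in product(P, T))
    ss_pr = n_t * sum((m_pr[p][r] - m_p[p] - m_r[r] + grand) ** 2
                      for p, r in product(P, R))
    ss_tr = n_p * sum((m_tr[t][r] - m_t[t] - m_r[r] + grand) ** 2
                      for t, r in product(T, R))
    ss_total = sum((scores[p][t][r] - grand) ** 2
                   for p, t, r in product(P, T, R))
    # The three-way interaction is confounded with residual error.
    ss_ptr = ss_total - (ss_p + ss_t + ss_r + ss_pt + ss_pr + ss_tr)

    # Mean squares: each sum of squares divided by its degrees of freedom.
    ms_p = ss_p / (n_p - 1)
    ms_t = ss_t / (n_t - 1)
    ms_r = ss_r / (n_r - 1)
    ms_pt = ss_pt / ((n_p - 1) * (n_t - 1))
    ms_pr = ss_pr / ((n_p - 1) * (n_r - 1))
    ms_tr = ss_tr / ((n_t - 1) * (n_r - 1))
    ms_ptr = ss_ptr / ((n_p - 1) * (n_t - 1) * (n_r - 1))

    # Solve the expected-mean-square equations for each component.
    # Estimates are returned unclipped, so they can come out negative.
    return {
        "p": (ms_p - ms_pt - ms_pr + ms_ptr) / (n_t * n_r),
        "t": (ms_t - ms_pt - ms_tr + ms_ptr) / (n_p * n_r),
        "r": (ms_r - ms_pr - ms_tr + ms_ptr) / (n_p * n_t),
        "pt": (ms_pt - ms_ptr) / n_r,
        "pr": (ms_pr - ms_ptr) / n_t,
        "tr": (ms_tr - ms_ptr) / n_p,
        "ptr,e": ms_ptr,
    }


# Hypothetical scores: 3 students x 2 tasks x 2 raters.
scores = [
    [[3, 4], [2, 3]],
    [[4, 4], [3, 4]],
    [[1, 2], [2, 2]],
]
for source, estimate in variance_components(scores).items():
    print(f"{source:>5}: {estimate:.3f}")
```

Small samples like this one can yield negative estimates for some components; how to handle them (for example, by setting them to zero) is a separate analytic choice that the sketch leaves open.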

How can G Theory be applied in the field of educational measurement?

Often, in educational measurement, the goal is to quantify a student's ability. Ability is not directly observable and is therefore considered a latent trait. Continuing with the example presented above, consider administering the speaking portion of an English language development assessment. In such an assessment, tasks are administered to students and the responses are then scored by raters. When tasks or items require human scorers, additional sources of error are introduced, including error associated with the tasks, the scorers, and the interaction of the two. If a response is scored by multiple raters, the scores will be aggregated in some fashion to produce a final score estimating the student's English speaking ability as demonstrated on the assessment.

Every measurement occasion carries some amount of uncertainty, so a student's true score is never really known. However, G theory can be used to refine the scores obtained from an assessment by locating the sources of uncertainty that contribute the most variance to the estimate of ability and then adjusting the assessment and scoring process so that some of that error is reduced or eliminated.
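
To make the idea of adjusting the assessment concrete, the sketch below projects how score dependability might change as more tasks or raters are averaged into each student's score. It assumes a fully crossed persons x tasks x raters design and uses the standard relative error variance and generalizability coefficient from the G theory literature (e.g., Brennan, 2001); neither quantity is defined in this text. The variance components are hypothetical values chosen only for illustration, and the function names are likewise assumptions.

```python
def relative_error_variance(vc, n_tasks, n_raters):
    """Relative error variance when scores are averaged over tasks and raters."""
    return (vc["pt"] / n_tasks
            + vc["pr"] / n_raters
            + vc["ptr,e"] / (n_tasks * n_raters))


def generalizability_coefficient(vc, n_tasks, n_raters):
    """Person variance divided by person variance plus relative error variance."""
    person = vc["p"]
    return person / (person + relative_error_variance(vc, n_tasks, n_raters))


# Hypothetical variance components, not estimates from real data.
vc = {"p": 1.0, "pt": 0.5, "pr": 0.2, "ptr,e": 0.4}

# Compare a few candidate designs for future administrations.
for n_tasks, n_raters in [(1, 1), (2, 2), (4, 2)]:
    coef = generalizability_coefficient(vc, n_tasks, n_raters)
    print(f"tasks={n_tasks} raters={n_raters} coefficient={coef:.3f}")
```

With these made-up components the person-by-task term is the largest error source, so adding tasks lowers the error more than adding raters does. That is the kind of comparison that informs how an assessment and its scoring process might be revised.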

At this point, some terminology and concepts associated with G theory need to be explained and developed further. The example of administering an English language development assessment will be used to help solidify each of the definitions and concepts found in the Appendix.

How is G Theory different from and similar to CTT and ANOVA?

CTT and ANOVA can be thought of as parents of G theory (Brennan, 2000). CTT sets up the basic conceptual model X = T + E, where a person's observed score, X, is made up of that person's true score, T, and error, E. G theory extends CTT by identifying and modeling the different sources of error that contribute to the error term, E. ANOVA provides the computational tools to quantify the variance associated with these multiple sources of error. However, it is important to note that G theory is not simply an application of ANOVA (Brennan, 2000). Unlike ANOVA, G theory does not revolve around hypothesis testing. Instead, G theory emphasizes estimating variance components in order to reduce error in future measurements. For interested readers, Brennan (2000) provides further distinctions between ANOVA and G theory.
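
As a sketch of that extension, assume the fully crossed persons (p) by tasks (t) by raters (r) design from the running example. The observed score can then be written as a sum of effects, and its variance as a sum of variance components. The symbols below follow common G theory notation (e.g., Brennan, 2001) and are assumptions for illustration rather than notation defined in this text.

```latex
% Observed score for person p on task t as scored by rater r
X_{ptr} = \mu + \nu_p + \nu_t + \nu_r + \nu_{pt} + \nu_{pr} + \nu_{tr} + \nu_{ptr,e}

% Total observed-score variance as a sum of variance components
\sigma^2(X_{ptr}) = \sigma^2_p + \sigma^2_t + \sigma^2_r
                  + \sigma^2_{pt} + \sigma^2_{pr} + \sigma^2_{tr} + \sigma^2_{ptr,e}
```

Read against X = T + E, the person term (the grand mean plus the person effect) plays roughly the role of T, while the remaining terms split E into separately estimable pieces. Which of those pieces count as error depends on the kind of decision the scores are used for.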

Available Software Packages

The Center for Advanced Studies in Measurement and Assessment (CASMA) at the University of Iowa has developed a suite of software packages for conducting generalizability analyses. The packages can be downloaded free of charge. The following packages are available:

GENOVA: Developed for use with complete and balanced univariate designs
mGENOVA: Developed for use with multivariate designs
urGENOVA: Developed for use with unbalanced random effects designs


References

Brennan, R. L. (2000). (Mis)conceptions about generalizability theory. Educational Measurement: Issues and Practice, 19(1), 5-10.

Brennan, R. L. (2001). Generalizability theory. New York: Springer-Verlag.

Cronbach, L. J., Gleser, G. C., Nanda, H., & Rajaratnam, N. (1972). The dependability of behavioral measurements: Theory of generalizability for scores and profiles. New York: Wiley.