Generalizability theory

Generalizability theory (G theory) is a statistical framework for conceptualizing, investigating, and designing reliable observations. It was originally introduced by Lee Cronbach and his colleagues.

G theory is often contrasted with classical test theory (CTT), where the focus is on estimating the error of measurement. Perhaps the best-known model of CTT is the equation X = T + e, where X is the observed score, T is the true score, and e is the error involved in the measurement. Although e could represent many different types of error (e.g., rater error, instrument error), CTT only allows us to estimate one type of error at a time.
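
To make this contrast concrete, the following Python sketch (an illustration added here, not part of the original formulation; all numbers are arbitrary) simulates observed scores according to X = T + e and estimates reliability as the ratio of true-score variance to observed-score variance, with every source of error lumped into the single term e.

    # Illustrative sketch of the CTT model X = T + e (arbitrary numbers).
    import numpy as np

    rng = np.random.default_rng(0)
    n_persons = 1000

    true_scores = rng.normal(loc=50, scale=10, size=n_persons)  # T
    error = rng.normal(loc=0, scale=5, size=n_persons)          # e: one undifferentiated error term
    observed = true_scores + error                              # X = T + e

    # Reliability under CTT is the proportion of observed-score variance that is
    # true-score variance; rater, item, and occasion error cannot be separated here.
    reliability = true_scores.var() / observed.var()
    print(round(reliability, 2))  # roughly 10**2 / (10**2 + 5**2) = 0.80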

Although CTT is suitable in the context of highly controlled laboratory conditions, variance is a part of everyday life. In field research, for example, it is unrealistic to expect that the conditions of measurement will remain constant. There exists an alternative to CTT, introduced by Cronbach and colleagues (1963, 1972), which both acknowledges and allows for variability in assessment conditions. Generalizability, or G, theory extends beyond CTT by recognizing that many different sources of error may affect our measurement (and that it may benefit us to examine them at the same time). The advantage of G theory, therefore, lies in the fact that researchers can estimate what proportion of the total variance in the results is due to the individual factors that often vary in assessment, such as setting, time, items, and raters.

In G theory, sources of variation are referred to as "facets". Facets are similar to the “factors” used in analysis of variance, and may include persons, raters, items/forms, time, and settings among other possibilities. The usefulness of data gained from a G study is crucially dependent on the design of the study. Therefore, the researcher must carefully consider the ways in which he/she hopes to generalize any specific results. Is it important to generalize from one setting to a larger number of settings? From one rater to a larger number of raters? From one set of items to a larger set of items? The answers to these questions will vary from one researcher to the next, and will drive the design of a G study in different ways.

In addition to deciding which facets the researcher generally wishes to examine, it is necessary to determine which facet will serve as the object of measurement (i.e., the systematic source of variance) for the purpose of analysis. The remaining facets of interest are then considered to be sources of measurement error. In most cases, the object of measurement will be the person to whom a number/score is assigned. Ideally, nearly all of the measured variance will be attributed to the object of measurement (i.e., individual differences), with only a negligible amount of variance attributed to the remaining facets (e.g., rater, time, setting).
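
To make the partitioning of variance among facets concrete, the following Python sketch (an illustration with made-up scores, not data from the article) estimates variance components for a fully crossed persons × raters design with one score per cell, using the standard expected-mean-square equations of a two-way random-effects ANOVA; persons are treated as the object of measurement and raters as a facet of error.

    # Illustrative G study: persons x raters, one score per cell (made-up data).
    import numpy as np

    # scores[i, j] = score assigned to person i by rater j
    scores = np.array([
        [7.0, 8.0, 6.0],
        [5.0, 6.0, 5.0],
        [9.0, 9.0, 8.0],
        [4.0, 5.0, 3.0],
        [6.0, 7.0, 6.0],
    ])
    n_p, n_r = scores.shape

    grand = scores.mean()
    person_means = scores.mean(axis=1)
    rater_means = scores.mean(axis=0)

    # Mean squares for persons, raters, and the residual (interaction + error).
    ms_p = n_r * np.sum((person_means - grand) ** 2) / (n_p - 1)
    ms_r = n_p * np.sum((rater_means - grand) ** 2) / (n_r - 1)
    resid = scores - person_means[:, None] - rater_means[None, :] + grand
    ms_res = np.sum(resid ** 2) / ((n_p - 1) * (n_r - 1))

    # Solve the expected-mean-square equations for the variance components.
    var_pr_e = ms_res              # person x rater interaction, confounded with error
    var_p = (ms_p - ms_res) / n_r  # persons: the object of measurement
    var_r = (ms_r - ms_res) / n_p  # raters: a facet of measurement error

    total = var_p + var_r + var_pr_e
    print({"persons": round(var_p / total, 2),
           "raters": round(var_r / total, 2),
           "residual": round(var_pr_e / total, 2)})

In this toy example most of the variance is attributable to persons, which is the situation described above as ideal.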

The results from a G study can also be used to inform a decision, or D, study. In a D study, we can ask the hypothetical question of “what would happen if different aspects of this study were altered?” For example, a soft drink company might be interested in assessing the quality of a new product through use of a consumer rating scale. By employing a D study, it would be possible to estimate how the consistency of quality ratings would change if consumers were asked 10 questions instead of 2, or if 1,000 consumers rated the soft drink instead of 100. By employing simulated D studies, it is therefore possible to examine how the generalizability coefficients (similar to reliability coefficients in CTT) would change under different circumstances, and consequently determine the ideal conditions under which our measurements would be the most reliable.
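
A D study can then be sketched as follows (again an illustration, with hypothetical variance components chosen to make the effect visible): the error variance attached to a person's average score shrinks as more raters are averaged over, so the projected generalizability coefficient rises as raters are added, paralleling the soft drink example above.

    # Illustrative D study: project the generalizability coefficient for
    # different numbers of raters, using hypothetical variance components.
    def g_coefficient(var_p, var_pr_e, n_raters):
        """Relative G coefficient when scores are averaged over n_raters raters."""
        relative_error = var_pr_e / n_raters
        return var_p / (var_p + relative_error)

    var_p, var_pr_e = 2.0, 1.0  # hypothetical components for persons and residual

    for n_raters in (1, 2, 5, 10):
        print(n_raters, round(g_coefficient(var_p, var_pr_e, n_raters), 3))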

Another important difference between CTT and G theory is that the latter approach takes into account how the consistency of outcomes may change when a measure is used to make absolute versus relative decisions. An example of an absolute, or criterion-referenced, decision is when an individual’s test score is compared to a cut-off score to determine eligibility or diagnosis (e.g., a child’s score on an achievement test is used to determine eligibility for a gifted program). In contrast, an example of a relative, or norm-referenced, decision is when the individual’s test score is used either (a) to determine relative standing as compared to his/her peers (e.g., a child’s score on a reading subtest is used to determine which reading group he/she is placed in), or (b) to make intra-individual comparisons (e.g., comparing an individual’s previous and current performance). The type of decision the researcher is interested in determines which formula should be used to calculate the generalizability coefficient (similar to a reliability coefficient in CTT).
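
The following sketch (illustrative, using the same hypothetical variance components as above) shows how the two formulas differ for a persons × raters design: the rater main effect counts as error only for absolute decisions, because a uniformly harsh or lenient rater shifts everyone’s distance from a fixed cut score without changing their relative standing.

    # Illustrative comparison of relative and absolute G coefficients for a
    # persons x raters design (hypothetical variance components).
    var_p, var_r, var_pr_e = 2.0, 0.5, 1.0  # persons, raters, residual
    n_raters = 2                            # raters averaged over in the D study

    relative_error = var_pr_e / n_raters                     # rater main effect excluded
    absolute_error = var_r / n_raters + var_pr_e / n_raters  # rater main effect included

    g_relative = var_p / (var_p + relative_error)    # generalizability coefficient
    phi_absolute = var_p / (var_p + absolute_error)  # index of dependability (phi)

    print(round(g_relative, 2), round(phi_absolute, 2))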

Readers interested in learning more about G theory are encouraged to seek out publications by Brennan (2001), Chiu (2001), and/or Shavelson and Webb (1991).

"References"

Brennan, R. L. (2001). "Generalizability Theory". New York: Springer-Verlag.

Chiu, C.W.C. (2001). "Large Scale Performance Assessments Based on Human Judgments: Generalizability Theory". New York: Kluwer.

Crocker, L., & Algina, J. (1986). "Introduction to Classical and Modern Test Theory". New York: Harcourt Brace.

Cronbach, L.J., Gleser, G.C., Nanda, H., & Rajaratnam, N. (1972). "The dependability of behavioral measurements: Theory of generalizability for scores and profiles". New York: John Wiley.

Cronbach, L.J., Rajaratnam, N., & Gleser, G.C. (1963). Theory of generalizability: A liberation of reliability theory. "The British Journal of Statistical Psychology, 16", 137-163.

Shavelson, R. J., & Webb, N. M. (1991). "Generalizability Theory: A Primer". Newbury Park, CA: SAGE.

External links

* [http://www.psychology.sdsu.edu/faculty/matt/Pubs/GThtml/GTheory_GEMatt.html Georg E. Matt, Generalizability Theory]
* [http://www.rasch.org/rmt/rmt71h.htm Rasch-based Generalizability Theory]

