Classical test theory

Classical test theory is a body of related psychometric theory that predicts outcomes of psychological testing, such as the difficulty of items or the ability of test-takers. Generally speaking, the aim of classical test theory is to understand and improve the reliability of psychological tests.

Classical test theory may be regarded as roughly synonymous with true score theory. The term "classical" not only reflects the chronology of these models but also contrasts them with the more recent psychometric theories, collectively referred to as item response theory, which sometimes bear the appellation "modern", as in "modern latent trait theory".

Classical test theory as we know it today was codified by Novick (1966) and described in classic texts such as Lord & Novick (1968) and Allen & Yen (1979/2002). The description of classical test theory below follows these seminal publications.



Classical test theory assumes that each person has a true score, T, that would be obtained if there were no errors in measurement. A person's true score is defined as the expected number-correct score over an infinite number of independent administrations of the test. Unfortunately, test users never observe a person's true score, only an observed score, X. It is assumed that the observed score equals the true score plus some error:

                X         =       T      +    E
          observed score     true score     error

Classical test theory is concerned with the relations between the three variables X, T, and E in the population. These relations are used to say something about the quality of test scores. In this regard, the most important concept is that of reliability. The reliability of the observed test scores X, which is denoted as {\rho^2_{XT}}, is defined as the ratio of true score variance {\sigma^2_T} to the observed score variance {\sigma^2_X}:

{\rho^2_{XT}} = \frac{{\sigma^2_T}}{{\sigma^2_X}}

Because the variance of the observed scores can be shown to equal the sum of the variance of true scores and the variance of error scores, this is equivalent to

{\rho^2_{XT}} = \frac{{\sigma^2_T}}{{\sigma^2_X}} = \frac{{\sigma^2_T}}{{\sigma^2_T}+{\sigma^2_E}}
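The variance decomposition used here can be written out in one line. The sketch below assumes the standard classical test theory result that true scores and errors are uncorrelated in the population, so that their covariance {\sigma_{TE}} is zero; that assumption is not stated explicitly in the text above.

```latex
\sigma^2_X = \sigma^2_{T+E}
           = \sigma^2_T + \sigma^2_E + 2\sigma_{TE}
           = \sigma^2_T + \sigma^2_E
\qquad \text{(because } \sigma_{TE} = 0 \text{)}
```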

This equation, which is closely related to a signal-to-noise ratio, has intuitive appeal: the reliability of test scores becomes higher as the proportion of error variance in the test scores becomes lower, and vice versa. The reliability is equal to the proportion of the variance in the test scores that we could explain if we knew the true scores. The square root of the reliability is the correlation between true and observed scores.
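A small simulation can make these relations concrete. The following is a minimal sketch, not a procedure from the cited texts: it assumes normally distributed true scores and errors, and the function name, the mean of 50, and the standard deviations are all hypothetical values chosen for illustration. In real testing the true scores are never observed, so this check is only possible with simulated data.

```python
import numpy as np


def simulate_scores(n_persons=100_000, sd_true=3.0, sd_error=2.0, seed=0):
    """Draw hypothetical true scores T and errors E, and form X = T + E."""
    rng = np.random.default_rng(seed)
    true_scores = rng.normal(50.0, sd_true, n_persons)  # T (assumed normal)
    errors = rng.normal(0.0, sd_error, n_persons)       # E, independent of T
    observed = true_scores + errors                     # X = T + E
    return true_scores, errors, observed


true_scores, errors, observed = simulate_scores()

# Reliability as the ratio of true score variance to observed score variance.
reliability = true_scores.var() / observed.var()

# Correlation between true and observed scores; its square should be
# close to the reliability.
corr_tx = np.corrcoef(true_scores, observed)[0, 1]

# Value implied by the assumed standard deviations: 9 / (9 + 4).
theoretical = 3.0**2 / (3.0**2 + 2.0**2)

print(f"reliability from variances: {reliability:.3f}")
print(f"squared corr(T, X):         {corr_tx**2:.3f}")
print(f"theoretical value:          {theoretical:.3f}")
```

The three printed values should agree up to sampling error, which shrinks as n_persons grows.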


Reliability cannot be estimated directly since that would require one to know the true scores, which according to classical test theory is impossible. However, estimates of reliability can be obtained by various means. One way of estimating reliability is by constructing a so-called parallel test. The fundamental property of a parallel test is that, for every individual, it yields the same true score and the same error variance (and hence the same observed score variance) as the original test. If we have parallel tests X and X', then this means that, for every individual i,

\mathcal{E}(X_i) = \mathcal{E}(X'_i)

\sigma^2_{E_i} = \sigma^2_{E'_i}

where \mathcal{E} denotes the expected value.



Under these assumptions, it follows that the correlation between parallel test scores is equal to reliability (see Lord & Novick, 1968, Ch. 2, for a proof).

{\rho_{XX'}} = \frac{{\sigma_{XX'}}}{{\sigma_X}{\sigma_{X'}}} = \frac{{\sigma^2_T}}{{\sigma^2_X}} = {\rho^2_{XT}}
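The claim that the correlation between parallel test scores equals the reliability can be checked numerically. This is a sketch under stated assumptions: two simulated forms share the same true scores and have independent, normally distributed errors with equal variance; the variable names and numeric values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_persons = 100_000
sd_true, sd_error = 3.0, 2.0  # hypothetical population values

# Both forms measure the same true score for each person.
true_scores = rng.normal(50.0, sd_true, n_persons)

# Parallel forms: same true score, independent errors with equal variance.
form_a = true_scores + rng.normal(0.0, sd_error, n_persons)
form_b = true_scores + rng.normal(0.0, sd_error, n_persons)

# Correlation between the two parallel forms.
parallel_corr = np.corrcoef(form_a, form_b)[0, 1]

# Reliability implied by the assumed variances.
expected_reliability = sd_true**2 / (sd_true**2 + sd_error**2)

print(f"correlation between forms: {parallel_corr:.3f}")
print(f"implied reliability:       {expected_reliability:.3f}")
```

Unlike the ratio of true score variance to observed score variance, the correlation between the two forms uses only observed scores, which is why parallel tests offer a route to estimating reliability.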

Using parallel tests to estimate reliability is cumbersome because parallel tests are very hard to come by. In practice the method is rarely used. Instead, researchers use a measure of internal consistency known as Cronbach's α. Consider a test consisting of k items U_j, j=1,\ldots,k. The total test score is defined as the sum of the individual item scores, so that for individual i

X_i = \sum_{j=1}^{k} U_{ij}

Then Cronbach's alpha equals

 \alpha =\frac{k}{k-1}\left(1-\frac{\sum_{j=1}^{k}{\sigma^{2}_{U_{j}}}}{\sigma^2_{X}}\right)

Cronbach's α can be shown to provide a lower bound for reliability under rather mild assumptions. Thus, the reliability of test scores in a population is at least as high as the value of Cronbach's α in that population. Because α can be computed from a single administration of a test, the method is empirically feasible and, as a result, very popular among researchers. Calculation of Cronbach's α is included in many standard statistical packages such as SPSS and SAS.[1]
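The formula for Cronbach's α translates directly into a few lines of code. The sketch below is illustrative rather than a reference implementation: the function name cronbach_alpha and the small data set are hypothetical, rows are taken to be persons and columns to be items, and sample variances (ddof=1) stand in for the population variances in the formula.

```python
import numpy as np


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a persons-by-items matrix of item scores."""
    scores = np.asarray(item_scores, dtype=float)
    if scores.ndim != 2 or scores.shape[1] < 2:
        raise ValueError("expected a 2-D array with at least two items")
    k = scores.shape[1]

    # Variance of each item across persons (the sigma^2_{U_j} terms).
    item_variances = scores.var(axis=0, ddof=1)

    # Variance of the total score X, the row sum over the k items.
    total_variance = scores.sum(axis=1).var(ddof=1)

    return (k / (k - 1)) * (1.0 - item_variances.sum() / total_variance)


# Hypothetical data: five persons answering three items.
example = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 5, 4],
    [3, 3, 4],
]
alpha = cronbach_alpha(example)
print(f"Cronbach's alpha: {alpha:.3f}")
```

Using the same ddof for the item variances and the total variance matters more than which value is chosen, because α depends only on the ratio of the two.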

As has been noted above, the entire exercise of classical test theory is done to arrive at a suitable definition of reliability. Reliability is supposed to say something about the general quality of the test scores in question. The general idea is that the higher the reliability, the better. Classical test theory does not say how high reliability is supposed to be. Too high a value for α, say over .9, may indicate redundancy of items. A value around .8 is recommended for personality research, while .9 or higher is desirable for individual high-stakes testing.[2] It must be noted that these 'criteria' are not based on formal arguments, but rather are the result of convention and professional practice. The extent to which they can be mapped to formal principles of statistical inference is unclear.


Classical test theory is an influential theory of test scores in the social sciences. In psychometrics, the theory has been superseded by the more sophisticated models of item response theory (IRT) and generalizability theory (G-theory). However, IRT is not included in standard statistical packages like SPSS and SAS and requires specialized software, whereas these packages routinely provide estimates of Cronbach's α.


  1. Pui-Wa Lei and Qiong Wu (2007). "CTTITEM: SAS macro and SPSS syntax for classical item analysis". Behavior Research Methods 39 (3): 527–530. doi:10.3758/BF03193021. PMID 17958163.
  2. Streiner, D. L. (2003). "Starting at the Beginning: An Introduction to Coefficient Alpha and Internal Consistency". Journal of Personality Assessment 80 (1): 99–103. doi:10.1207/S15327752JPA8001_18. PMID 12584072.


  • Allen, M. J., & Yen, W. M. (2002). Introduction to Measurement Theory. Long Grove, IL: Waveland Press.
  • Novick, M. R. (1966). "The axioms and principal results of classical test theory". Journal of Mathematical Psychology 3 (1): 1–18.
  • Lord, F. M., & Novick, M. R. (1968). Statistical Theories of Mental Test Scores. Reading, MA: Addison-Wesley.

Further reading

  • Gregory, Robert J. (2011). Psychological Testing: History, Principles, and Applications (Sixth ed.). Boston: Allyn & Bacon. ISBN 978-0-205-78214-7.
  • Hogan, Thomas P.; Brooke Cannon (2007). Psychological Testing: A Practical Introduction (Second ed.). Hoboken (NJ): John Wiley & Sons. ISBN 978-0-471-73807-7.

Wikimedia Foundation. 2010.
