Overdispersion

In statistics, overdispersion is the presence of greater variability (statistical dispersion) in a data set than would be expected based on a given simple statistical model.

A common task in applied statistics is choosing a parametric model to fit a given set of empirical observations. This necessitates an assessment of the fit of the chosen model. It is usually possible to choose the model parameters in such a way that the theoretical population mean of the model is approximately equal to the sample mean. However, especially for simple models with few parameters, theoretical predictions may not match empirical observations for higher moments. When the observed variance is higher than the variance of the theoretical model, overdispersion has occurred. Conversely, underdispersion means that there is less variation in the data than the model predicts. Overdispersion is very common in applied data analysis because, in practice, populations are frequently heterogeneous, contrary to the assumptions implicit in widely used simple parametric models.
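As a rough illustration of comparing observed variance with what a model predicts, the sketch below computes the variance-to-mean ratio of a set of counts. It assumes a Poisson reference model, under which this ratio is expected to be close to 1. The function name `dispersion_index` and both data sets are invented for illustration and are not taken from any study.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Return the sample variance divided by the sample mean."""
    m = mean(counts)
    if m == 0:
        raise ValueError("dispersion index is undefined when the mean is zero")
    return variance(counts) / m


# Hypothetical counts, invented for illustration only. Both have mean 4.
even_counts = [3, 4, 5, 4, 3, 5, 4, 4]
clumped_counts = [0, 0, 1, 0, 12, 0, 15, 4]

# A ratio well above 1 suggests overdispersion relative to a Poisson model;
# a ratio well below 1 suggests underdispersion.
print("even:", dispersion_index(even_counts))
print("clumped:", dispersion_index(clumped_counts))
```

A ratio computed from a small sample is noisy, so this is a screening heuristic rather than a formal test.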

Examples

Poisson

Overdispersion is often encountered when fitting very simple parametric models, such as those based on the Poisson distribution. The Poisson distribution has one free parameter, and its variance equals its mean, so the variance cannot be adjusted independently of the mean. The choice of a distribution from the Poisson family is often dictated by the nature of the empirical data. For example, Poisson regression analysis is commonly used to model count data. If overdispersion is present, an alternative model with additional free parameters may provide a better fit. For count data, a Poisson mixture model such as the negative binomial distribution can be used instead: the mean of the Poisson distribution is itself treated as a random variable, drawn in this case from a gamma distribution, which introduces an additional free parameter (the resulting negative binomial distribution has two parameters).
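The effect of the extra parameter can be checked numerically. The sketch below assumes a parameterization of the negative binomial by its mean and the shape of the gamma mixing distribution; under that assumption the variance is mean + mean²/shape, which always exceeds the Poisson variance (equal to the mean). The function names and the values mean = 4 and shape = 2 are illustrative choices only.

```python
import math


def neg_binomial_pmf(k, mean, shape):
    """P(K = k) for a gamma-Poisson mixture with the given mean and gamma shape."""
    p = shape / (shape + mean)
    log_pmf = (
        math.lgamma(k + shape) - math.lgamma(k + 1) - math.lgamma(shape)
        + shape * math.log(p) + k * math.log1p(-p)
    )
    return math.exp(log_pmf)


def mean_and_variance(pmf, upper=2000):
    """Approximate the mean and variance by summing the pmf over 0..upper-1."""
    probs = [pmf(k) for k in range(upper)]
    m = sum(k * q for k, q in enumerate(probs))
    v = sum((k - m) ** 2 * q for k, q in enumerate(probs))
    return m, v


MEAN, SHAPE = 4.0, 2.0  # illustrative values, not estimates from data
nb_mean, nb_var = mean_and_variance(lambda k: neg_binomial_pmf(k, MEAN, SHAPE))

print("negative binomial mean:", nb_mean)
print("negative binomial variance:", nb_var)
print("closed form mean + mean^2/shape:", MEAN + MEAN ** 2 / SHAPE)
print("Poisson variance at the same mean:", MEAN)
```

As the shape parameter grows, the mean²/shape term shrinks and the distribution approaches the Poisson.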

Binomial

As a more concrete example, it has been observed that the number of boys born to each family does not conform faithfully to a binomial distribution, as might be expected. Instead, the sex ratio within each family appears to be skewed in favor of either boys or girls (see, for example, the Trivers–Willard hypothesis for one possible explanation). That is, there are too many all-boy families, too many all-girl families, and not enough families close to the population mean boy-to-girl ratio of 51:49, so the estimated variance is larger than the binomial model predicts.

In this case, the beta-binomial model is a popular and analytically tractable alternative that captures the overdispersion absent from the binomial model and thereby fits the observed data better. To capture the heterogeneity of the families, the p parameter (proportion of boys) of the binomial model can itself be treated as a random variable (that is, a random effects model), drawn for each family from a beta distribution acting as the mixing distribution. The resulting compound distribution, the beta-binomial, has an additional free parameter.
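The mixing construction can be sketched as a small simulation. Everything numeric here is an assumption made for illustration: four children per family, and Beta(5.1, 4.9) as the mixing distribution, chosen only so that the mean proportion of boys is 0.51. The function name `simulate_boy_counts` is likewise hypothetical.

```python
import random
from statistics import mean, variance


def simulate_boy_counts(n_children, alpha, beta, n_families, seed=0):
    """Draw p from Beta(alpha, beta) per family, then count boys among n_children."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_families):
        p = rng.betavariate(alpha, beta)  # family-specific probability of a boy
        counts.append(sum(rng.random() < p for _ in range(n_children)))
    return counts


# Illustrative parameters only; they are not fitted to real birth data.
N_CHILDREN = 4
ALPHA, BETA = 5.1, 4.9

counts = simulate_boy_counts(N_CHILDREN, ALPHA, BETA, n_families=20000)

p_bar = ALPHA / (ALPHA + BETA)
binomial_var = N_CHILDREN * p_bar * (1 - p_bar)
# Beta-binomial variance: the binomial variance inflated by 1 + (n - 1) * rho,
# where rho = 1 / (alpha + beta + 1).
rho = 1 / (ALPHA + BETA + 1)
beta_binomial_var = binomial_var * (1 + (N_CHILDREN - 1) * rho)

print("simulated mean:", mean(counts))
print("simulated variance:", variance(counts))
print("binomial variance:", binomial_var)
print("beta-binomial variance:", beta_binomial_var)
```

The simulated variance should sit near the beta-binomial value and above the binomial one, which is the excess of all-boy and all-girl families described above.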

Another common model for overdispersion, applicable when some of the observations are not Bernoulli, arises from introducing a normal random variable into a logistic model. Software is widely available for fitting this type of multilevel model. If the variance of the normal variable is zero, the model reduces to classical (undispersed) logistic regression. This model has one additional free parameter, namely the variance of the normal variable.
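A minimal simulation sketch of this idea follows. It assumes the normal variable enters as a group-level random intercept on the logit scale, which is one common way to set up such a model but is not stated in the text above. The function name, the ten trials per group, and the intercept of zero are all illustrative choices.

```python
import math
import random
from statistics import variance


def simulate_group_counts(sigma, n_trials=10, intercept=0.0, n_groups=5000, seed=1):
    """Simulate binomial counts whose logit has a Normal(0, sigma^2) random intercept."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_groups):
        u = rng.gauss(0.0, sigma)  # group-level random effect (assumed form)
        p = 1.0 / (1.0 + math.exp(-(intercept + u)))
        counts.append(sum(rng.random() < p for _ in range(n_trials)))
    return counts


# With sigma = 0 every group has p = 0.5, so the counts are plain binomial
# with variance n * p * (1 - p) = 2.5.
var_no_effect = variance(simulate_group_counts(sigma=0.0))
var_with_effect = variance(simulate_group_counts(sigma=1.0))

print("variance with sigma = 0:", var_no_effect)
print("variance with sigma = 1:", var_with_effect)
```

Setting `sigma` to zero recovers the undispersed case, while a positive `sigma` inflates the variance of the group counts beyond the binomial value.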

For binomial random variables, the concept of overdispersion makes sense only if n > 1; overdispersion is not meaningful for Bernoulli random variables.
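One way to see why the Bernoulli case is excluded is the law of total variance. In the sketch below, the symbols Y (a single 0/1 observation), p (its success probability, allowed to vary randomly) and π (the mean of p) are notation introduced here for illustration.

```latex
\begin{align*}
Y \mid p &\sim \mathrm{Bernoulli}(p), \qquad \mathbb{E}[p] = \pi, \\
\operatorname{Var}(Y)
  &= \mathbb{E}\bigl[\operatorname{Var}(Y \mid p)\bigr]
   + \operatorname{Var}\bigl(\mathbb{E}[Y \mid p]\bigr) \\
  &= \mathbb{E}[p(1-p)] + \operatorname{Var}(p) \\
  &= \bigl(\pi - \mathbb{E}[p^2]\bigr) + \bigl(\mathbb{E}[p^2] - \pi^2\bigr) \\
  &= \pi(1-\pi).
\end{align*}
```

Whatever the distribution of p, the variance of a single 0/1 observation is fixed by its mean, so there is no room for extra variation until n > 1.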

Differences in terminology between disciplines

Over- and underdispersion are terms that have been adopted in branches of the biological sciences. In parasitology, the term 'overdispersion' is generally used as defined here, meaning a distribution with a higher than expected variance.

In some areas of ecology, however, the meanings have been transposed, so that overdispersion is taken to mean a more even distribution (lower variance) than expected. This confusion has led some ecologists to suggest that the terms 'aggregated' or 'contagious' would be better used in ecology for 'overdispersed'.[1] Such preferences are creeping into parasitology too.[2] Generally this suggestion has not been heeded, and confusion persists in the literature.

Furthermore, in demography, overdispersion is often evident in the analysis of death count data, but demographers prefer the term 'unobserved heterogeneity'.

References

  1. Greig-Smith, P. (1983). Quantitative Plant Ecology. University of California Press.
  2. Poulin, R. (2006). Evolutionary Ecology of Parasites. Princeton University Press.

Wikimedia Foundation. 2010.
