Poisson regression
In statistics, Poisson regression is a form of regression analysis used to model count data and contingency tables. Poisson regression assumes the response variable "Y" has a Poisson distribution, and assumes the logarithm of its expected value can be modelled by a linear combination of unknown parameters. A Poisson regression model is sometimes known as a log-linear model, especially when used to model contingency tables.
In the simplest case with a single independent variable "x", the model takes the form
$$\log(\operatorname{E}(Y)) = a + bx.$$
If $Y_i$ are independent observations with corresponding values $x_i$ of the predictor variable, then "a" and "b" can be estimated by maximum likelihood if the number of distinct "x" values is at least 2. The maximum-likelihood estimates lack a closed-form expression and must be found by numerical methods.
Poisson regression models are generalized linear models with the logarithm as the (canonical) link function, and the Poisson distribution as the assumed distribution of the response.
Poisson regression in practice
Poisson regression is appropriate when the dependent variable is a count, for instance of events such as the arrival of a telephone call at a call centre. The events must be independent in the sense that the arrival of one call will not make another more or less likely, but the probability per unit time of events is understood to be related to covariates such as time of day.
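As a rough illustration of the numerical fitting mentioned above, the following sketch estimates "a" and "b" in log(E(Y)) = a + bx by Newton-Raphson on the Poisson log-likelihood. The function name `fit_poisson`, the starting values, and the call-centre figures are assumptions made for this example only; statistical packages typically use a more robust variant such as iteratively reweighted least squares.

```python
import math

def fit_poisson(x, y, max_iter=50, tol=1e-10):
    """Fit log(E(Y)) = a + b*x by Newton-Raphson (illustrative sketch)."""
    n = len(y)
    # Start from the intercept-only fit: a = log(mean of y), b = 0.
    a, b = math.log(sum(y) / n), 0.0
    for _ in range(max_iter):
        mu = [math.exp(a + b * xi) for xi in x]
        # Score: gradient of the log-likelihood with respect to a and b.
        g_a = sum(yi - mi for yi, mi in zip(y, mu))
        g_b = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Entries of the 2x2 information matrix (negative Hessian).
        h_aa = sum(mu)
        h_ab = sum(xi * mi for xi, mi in zip(x, mu))
        h_bb = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step: solve the 2x2 linear system by Cramer's rule.
        da = (h_bb * g_a - h_ab * g_b) / det
        db = (h_aa * g_b - h_ab * g_a) / det
        a, b = a + da, b + db
        if max(abs(da), abs(db)) < tol:
            break
    return a, b

# Hypothetical data: hours since opening and calls received in that hour.
hours = [0, 1, 2, 3, 4, 5, 6, 7, 8]
calls = [2, 3, 6, 7, 8, 9, 10, 12, 15]

a, b = fit_poisson(hours, calls)
print(f"a = {a:.4f}, b = {b:.4f}")
print(f"expected calls at hour 4: {math.exp(a + 4 * b):.2f}")
```

The determinant `det` vanishes when every "x" value is the same, which matches the requirement of at least two distinct "x" values for the estimates to exist.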
"Exposure" and offset
Poisson regression is also appropriate for rate data, where the rate is a count of events occurring to a particular unit of observation, divided by some measure of that unit's "exposure". For example, biologists may count the number of tree species in a forest, and the rate would be the number of species per square kilometre. Demographers may model death rates in geographic areas as the count of deaths divided by person-years. More generally, event rates can be calculated as events per unit time, which allows the observation window to vary for each unit. In these examples, exposure is respectively unit area, person-years and unit time. In Poisson regression this is handled as an offset, where the logarithm of the exposure variable enters on the right-hand side of the equation, but with its coefficient constrained to 1 rather than estimated.
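With a single predictor "x" and parameters "a" and "b", the model becomes
$$\log(\operatorname{E}(Y)) = \log(\text{exposure}) + a + bx$$
which implies
$$\log\left(\frac{\operatorname{E}(Y)}{\text{exposure}}\right) = a + bx.$$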
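A minimal sketch of fitting with an offset follows, assuming the single-predictor model log(E(Y)) = log(exposure) + a + bx, so that each expected count is exposure × exp(a + bx). The function name `fit_poisson_offset` and the death counts, person-years and age-band index are hypothetical values chosen for illustration.

```python
import math

def fit_poisson_offset(x, y, exposure, max_iter=50, tol=1e-10):
    """Fit log(E(Y)) = log(exposure) + a + b*x by Newton-Raphson (sketch)."""
    # Start from a constant rate: a = log(total events / total exposure), b = 0.
    a, b = math.log(sum(y) / sum(exposure)), 0.0
    for _ in range(max_iter):
        # The offset enters with a fixed coefficient of 1 on the log scale,
        # which is the same as multiplying the rate by the exposure.
        mu = [e * math.exp(a + b * xi) for xi, e in zip(x, exposure)]
        g_a = sum(yi - mi for yi, mi in zip(y, mu))
        g_b = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        h_aa = sum(mu)
        h_ab = sum(xi * mi for xi, mi in zip(x, mu))
        h_bb = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h_aa * h_bb - h_ab * h_ab
        da = (h_bb * g_a - h_ab * g_b) / det
        db = (h_aa * g_b - h_ab * g_a) / det
        a, b = a + da, b + db
        if max(abs(da), abs(db)) < tol:
            break
    return a, b

# Hypothetical data: deaths and person-years in six areas, with an
# age-band index as the single covariate.
age_band = [0, 1, 2, 3, 4, 5]
person_years = [1000, 2500, 1800, 4000, 3200, 1500]
deaths = [3, 10, 11, 35, 40, 27]

a, b = fit_poisson_offset(age_band, deaths, person_years)
print(f"baseline rate per person-year: {math.exp(a):.5f}")
print(f"rate ratio per age band: {math.exp(b):.3f}")
```

Because the offset is not estimated, exp(a + bx) is read as a rate per unit of exposure rather than as an expected count.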
Overdispersion
A characteristic of the Poisson distribution is that its mean is equal to its variance. In certain circumstances, it will be found that the observed variance is greater than the mean; this is known as overdispersion and indicates that the model is not appropriate. A common reason is the omission of relevant explanatory variables.
Another common problem with Poisson regression is excess zeros: if there are two processes at work, one determining whether there are zero events or any events, and a Poisson process determining how many events there are, there will be more zeros than a Poisson regression would predict. An example would be the distribution of cigarettes smoked in an hour by members of a group where some individuals are non-smokers.
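Both symptoms can be checked informally before fitting anything more elaborate. The sketch below uses made-up cigarette counts and an intercept-only model, for which the Pearson dispersion statistic reduces to the sample variance divided by the mean. It compares that ratio with 1 and the observed share of zeros with the share a Poisson distribution of the same mean would predict.

```python
import math
import statistics

# Hypothetical cigarettes smoked in an hour by 20 people; the first
# eight entries stand for non-smokers.
counts = [0] * 8 + [1, 2, 2, 3, 3, 3, 4, 4, 5, 6, 2, 3]

mean = statistics.mean(counts)
variance = statistics.variance(counts)  # sample variance, n - 1 denominator

# Under a Poisson model the variance equals the mean, so this ratio
# should be close to 1; values well above 1 suggest overdispersion.
dispersion = variance / mean

# A Poisson distribution with this mean predicts a zero with
# probability exp(-mean); compare with the observed share of zeros.
observed_zero_fraction = counts.count(0) / len(counts)
poisson_zero_fraction = math.exp(-mean)

print(f"mean = {mean:.2f}, variance = {variance:.2f}")
print(f"dispersion ratio = {dispersion:.2f}")
print(f"zeros: observed {observed_zero_fraction:.2f}, "
      f"Poisson-predicted {poisson_zero_fraction:.2f}")
```

This is only a descriptive check; with covariates in the model, the comparison would be made on the fitted means rather than on a single overall mean.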
Other generalized linear models such as the negative binomial model may function better in these cases.
Use in survival analysis
Algorithms and software for Poisson regression are sometimes used as a computational shortcut in survival analysis: see proportional hazards models.
Implementations
Some statistics packages, such as gretl or EViews, include implementations of Poisson regression.
References
* Cameron, A.C. and P.K. Trivedi (1998). "Regression Analysis of Count Data", Cambridge University Press. ISBN 0-521-63201-3
* Hilbe, J.M. (2007). "Negative Binomial Regression", Cambridge University Press. ISBN 978-0-521-85772-7
Wikimedia Foundation. 2010.