# Bias of an estimator


In statistics, the difference between an estimator's expected value and the true value of the parameter being estimated is called the bias. An estimator or decision rule having nonzero bias is said to be biased.

Although the term "bias" sounds pejorative, it is not necessarily used in that way in statistics. Biased estimators may have desirable properties. Not only do they sometimes have a smaller mean squared error than any unbiased estimator, but in some cases the only unbiased estimators are not even within the convex hull of the parameter space, so their use is absurd.

Definition

Suppose we are trying to estimate the parameter $\theta$ using an estimator $\widehat{\theta}$ (that is, some function of the observed data). Then the bias of $\widehat{\theta}$ is defined to be

:$\operatorname{E}(\widehat{\theta})-\theta.$

In words, this would be "the expected value of the estimator $\widehat{\theta}$ minus the true value $\theta$." This may be rewritten as

:$\operatorname{E}(\widehat{\theta}-\theta),$

which would read "the expected value of the difference between the estimator and the true value" (the two forms agree because $\theta$ is a fixed constant, so its expected value is precisely $\theta$).

Examples

Estimating variance

Suppose $X_1, \ldots, X_n$ are independent and identically distributed normal random variables with expectation $\mu$ and variance $\sigma^2$. Let

:$\overline{X}=(X_1+\cdots+X_n)/n$

be the sample average, and let

:$S^2=\frac{1}{n}\sum_{i=1}^n(X_i-\overline{X}\,)^2$

be a sample variance. For comparison, the variance $\sigma^2$ of a finite population is defined by

:$\sigma^2 = \frac{1}{N} \sum_{i=1}^N (x_i - \overline{x})^2,$

where $N$ is the population size, $x_i$ denotes the $i$th member of the whole population, and $\overline{x}$ is the population mean.

Then $S^2$ is a biased estimator of $\sigma^2$ because

:$\operatorname{E}(S^2)=\frac{n-1}{n}\sigma^2\neq\sigma^2.$

In other words, the expected value of the uncorrected sample variance does not equal the population variance unless $S^2$ is multiplied by the normalization factor $n/(n-1)$.
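The expectation of the sample variance follows from a standard decomposition of the sum of squares. A sketch of the derivation, using only $\operatorname{E}(X_i-\mu)^2=\sigma^2$ and $\operatorname{Var}(\overline{X})=\sigma^2/n$ (both consequences of the i.i.d. assumption):

```latex
\begin{align*}
\sum_{i=1}^n (X_i-\overline{X})^2
  &= \sum_{i=1}^n (X_i-\mu)^2 - n(\overline{X}-\mu)^2, \\
\operatorname{E}\Bigl[\sum_{i=1}^n (X_i-\overline{X})^2\Bigr]
  &= n\sigma^2 - n\cdot\frac{\sigma^2}{n} = (n-1)\sigma^2, \\
\operatorname{E}(S^2)
  &= \frac{1}{n}\,(n-1)\sigma^2 = \frac{n-1}{n}\sigma^2.
\end{align*}
```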

Common sense might suggest applying the population formula to the sample as well. The resulting estimate is biased because the sample mean is generally somewhat closer to the observations in the sample than the population mean is. The sample mean lies, by definition, in the middle of the sample, while the population mean may even lie outside the range of the sample. The deviations from the sample mean are therefore often smaller than the deviations from the population mean, so applying the same formula to both gives a variance estimate that is, on average, somewhat smaller in the sample than in the population.
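The downward bias can be checked empirically. The following is a minimal Python sketch, not part of the original article: it draws many normal samples, averages the uncorrected sample variance, and compares the result with $(n-1)\sigma^2/n$. The function names, sample size, parameter values, and random seed are all illustrative assumptions.

```python
import random


def biased_variance(xs):
    """Uncorrected sample variance: sum of squared deviations divided by n."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n


def simulate_mean_s2(n, mu, sigma, trials, seed=0):
    """Average the uncorrected variance over many simulated normal samples."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        total += biased_variance(sample)
    return total / trials


# Illustrative values (assumptions, not taken from the article).
n, mu, sigma = 5, 10.0, 2.0
avg_s2 = simulate_mean_s2(n, mu, sigma, trials=50_000)
expected = (n - 1) / n * sigma ** 2   # theoretical E(S^2)
corrected = avg_s2 * n / (n - 1)      # rescaled by the normalization factor

print("simulated mean of S^2:", avg_s2)
print("theoretical E(S^2):   ", expected)
print("after rescaling:      ", corrected, "vs sigma^2 =", sigma ** 2)
```

With these settings the simulated average should land close to the theoretical value and visibly below $\sigma^2$; rescaling by $n/(n-1)$ should bring it back near $\sigma^2$.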

Note that when a transformation is applied to an unbiased estimator, the result is not necessarily an unbiased estimate of the corresponding population statistic. That is, for a non-linear function $f$ and an unbiased estimator $U$ of a parameter $p$, $f(U)$ is usually not an unbiased estimator of $f(p)$. For example, the square root of the unbiased estimator of the population variance is not an unbiased estimator of the population standard deviation.
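For normally distributed data the size of this bias is known in closed form: with $n$ observations, the expected value of the square root of the unbiased variance estimator is $c_4(n)\,\sigma$, where $c_4(n)=\sqrt{2/(n-1)}\,\Gamma(n/2)/\Gamma((n-1)/2)$. The Python sketch below evaluates this factor; the normality assumption and the name `c4` (borrowed from quality-control usage) are not part of the text above.

```python
import math


def c4(n):
    """Ratio E(sqrt(unbiased variance)) / sigma for n i.i.d. normal observations."""
    return math.sqrt(2.0 / (n - 1)) * math.gamma(n / 2) / math.gamma((n - 1) / 2)


# The factor is below 1 for every n >= 2, so the square root underestimates
# sigma on average; the shortfall shrinks as n grows.
for n in (2, 5, 10, 100):
    print(n, c4(n))
```

Dividing the sample standard deviation by `c4(n)` would remove the bias in the normal case, but that correction does not carry over to other distributions.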

Estimating a Poisson probability

A far more extreme case of a biased estimator being better than any unbiased estimator is well known: suppose $X$ has a Poisson distribution with expectation $\lambda$. It is desired to estimate

:$\operatorname{P}(X=0)^2=e^{-2\lambda}.$

(For example, when incoming calls at a telephone switchboard are modeled as a Poisson process, and $\lambda$ is the average number of calls per minute, then $e^{-2\lambda}$ is the probability that no calls arrive in the next two minutes.)

Since the expectation of an unbiased estimator $\delta(X)$ is equal to the estimand, i.e.

:$\operatorname{E}(\delta(X))=\sum_{x=0}^\infty \delta(x) \frac{\lambda^x e^{-\lambda}}{x!}=e^{-2\lambda},$

the only function of the data constituting an unbiased estimator is

:$\delta(x)=(-1)^x.$
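The uniqueness claim follows by comparing power series in $\lambda$. A sketch of the step, multiplying the unbiasedness condition by $e^{\lambda}$:

```latex
\begin{align*}
\sum_{x=0}^\infty \delta(x)\,\frac{\lambda^x}{x!}
  &= e^{\lambda}\,e^{-2\lambda} = e^{-\lambda}
   = \sum_{x=0}^\infty (-1)^x\,\frac{\lambda^x}{x!}.
\end{align*}
```

Since the identity must hold for every $\lambda > 0$, the coefficients of $\lambda^x$ on both sides must agree, which forces $\delta(x) = (-1)^x$.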

If the observed value of $X$ is 100, then the estimate is 1, although the true value of the quantity being estimated is obviously very likely to be near 0, which is the opposite extreme. And if $X$ is observed to be 101, then the estimate is even more absurd: it is −1, although the quantity being estimated obviously must be positive.

The (biased) maximum likelihood estimator

:$e^{-2X}$

is far better than this unbiased estimator. Not only is its value always positive, but it is also more accurate in the sense that its mean squared error (MSE)

:$e^{-4\lambda}-2e^{\lambda(1/e^2-3)}+e^{\lambda(1/e^4-1)}$

is smaller; compare the unbiased estimator's MSE of

:$1-e^{-4\lambda}.$

The MSEs are functions of the true value $\lambda$. The bias of the maximum-likelihood estimator, that is, its expected value minus the true value, is

:$e^{\lambda(1/e^2-1)}-e^{-2\lambda}.$
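These closed forms can be checked numerically by summing the Poisson series directly. The sketch below is illustrative only: `poisson_expect`, `compare`, the truncation at 200 terms, and the choice $\lambda = 1$ are assumptions made for the example. The bias is computed as the estimator's expected value minus the target, following the definition given earlier.

```python
import math


def poisson_expect(g, lam, terms=200):
    """Approximate E[g(X)] for X ~ Poisson(lam) using the first `terms` pmf terms."""
    total = 0.0
    pmf = math.exp(-lam)           # P(X = 0)
    for x in range(terms):
        total += g(x) * pmf
        pmf *= lam / (x + 1)       # P(X = x + 1) obtained from P(X = x)
    return total


def compare(lam):
    """Mean, bias and MSE of the unbiased estimator and of the MLE at a given lam."""
    target = math.exp(-2 * lam)

    def unbiased(x):
        return (-1) ** x

    def mle(x):
        return math.exp(-2 * x)

    return {
        "target": target,
        "mean_unbiased": poisson_expect(unbiased, lam),
        "bias_mle": poisson_expect(mle, lam) - target,
        "mse_unbiased": poisson_expect(lambda x: (unbiased(x) - target) ** 2, lam),
        "mse_mle": poisson_expect(lambda x: (mle(x) - target) ** 2, lam),
    }


result = compare(1.0)
for name, value in result.items():
    print(f"{name}: {value:.6f}")
```

Truncating the series is harmless for moderate $\lambda$ because the Poisson probabilities decay factorially; for large $\lambda$ the `terms` argument would need to grow.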

Maximum of a discrete uniform distribution

The bias of maximum-likelihood estimators can be substantial. Consider a case where $n$ tickets numbered from 1 through $n$ are placed in a box and one is selected at random, giving a value $X$. If $n$ is unknown, then the maximum-likelihood estimator of $n$ is $X$, even though the expectation of $X$ is only $(n+1)/2$; we can only be certain that $n$ is at least $X$ and is probably more. In this case, the natural unbiased estimator is $2X-1$.
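Both expectations can be verified exactly by enumerating the $n$ equally likely tickets. A small Python sketch using exact rational arithmetic; the helper names `expectation`, `mle`, and `unbiased` are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction


def expectation(estimator, n):
    """Exact E[estimator(X)] when X is uniform on {1, ..., n}."""
    return sum(Fraction(estimator(x)) for x in range(1, n + 1)) / n


def mle(x):
    """Maximum-likelihood estimate of n from a single draw."""
    return x


def unbiased(x):
    """The estimator 2X - 1."""
    return 2 * x - 1


for n in (1, 10, 100):
    print(n, expectation(mle, n), expectation(unbiased, n))
```

For each $n$ the first expectation equals $(n+1)/2$ and the second equals $n$, so the maximum-likelihood estimator falls short of $n$ by $(n-1)/2$ on average.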

See also

* Omitted-variable bias
* Consistency (statistics)

External links

* [http://www.math.umn.edu/~hardy/An_Illuminating_Counterexample.pdf An Illuminating Counterexample]

Wikimedia Foundation. 2010.
