Unbiased estimation of standard deviation

Unbiased estimation of standard deviation

In statistics, the standard deviation is often estimated from a random sample drawn from the population. The most common measure used is the "sample standard deviation", which is defined by:s = sqrt{frac{1}{n-1} sum_{i=1}^n (x_i - overline{x})^2},,where {x_1,x_2,ldots,x_n} is the sample (formally, realizations from a random variable "X") and overline{x} is the sample mean.

The reason for this definition is that "s"2 is an unbiased estimator for the variance σ2 of the underlying population, if that variance exists and the sample values are drawn independently with replacement. However, "s" estimates the population standard deviation σ with negative bias; that is, "s" tends to underestimate σ.

An explanation why the square root of the sample variance is a biased estimator of the standard deviation is that the square root is a nonlinear function, and only linear functions commute with taking the mean. Since the square root is a concave function, it follows from Jensen's inequality that the square root of the sample variance is an underestimate.

Bias correction

When the random variable is normally distributed, a minor correction exists to eliminate the bias. To derive the correction, note that for normally distributed "X", Cochran's theorem implies that sqrt{n{-}1},s/sigma has a chi distribution with n-1 degrees of freedom. Consequently,:operatorname{E} [s] = c_4sigmawhere c_4 is a constant that depends on the sample size "n" as follows::c_4=sqrt{frac{2}{n-1frac{Gammaleft(frac{n}{2} ight)}{Gammaleft(frac{n-1}{2} ight)} = 1 - frac{1}{4n} - frac{7}{32n^2} - O(n^{-3})and Gamma(cdot) is the gamma function.

Thus an unbiased estimator of σ is had by dividing "s" by c_4. Tables giving the value of c_4 for selected values of "n" may be found in most textbooks on statistical quality control. As "n" grows large it approaches 1, and even for smaller values the correction is minor. For example, for n=2,5,10 the values of c_4 are about 0.7979, 0.9400, 0.9727. It is important to keep in mind this correction only produces an unbiased estimator for normally distributed "X". When this condition is satisfied, another result about "s" involving c_4 is that the standard deviation of "s" is sigmasqrt{1-c_4^{2, while the standard deviation of the unbiased estimator is sigmasqrt{c_4^{-2}-1} .

ee also

*Estimation of covariance matrices
*Sample mean and sample covariance

References

* [http://www.itl.nist.gov/div898/handbook/pmc/section3/pmc32.htm What are Variables Control Charts?]
* Douglas C. Montgomery and George C. Runger, "Applied Statistics and Probability for Engineers", 3rd edition, Wiley and sons, 2003. (see Sections 7-2.2 and 16-5)


Wikimedia Foundation. 2010.

Игры ⚽ Нужно решить контрольную?

Look at other dictionaries:

  • Standard deviation — In probability and statistics, the standard deviation is a measure of the dispersion of a collection of values. It can apply to a probability distribution, a random variable, a population or a data set. The standard deviation is usually denoted… …   Wikipedia

  • Standard error (statistics) — For a value that is sampled with an unbiased normally distributed error, the above depicts the proportion of samples that would fall between 0, 1, 2, and 3 standard deviations above and below the actual value. The standard error is the standard… …   Wikipedia

  • Déviation standard — Écart type En mathématiques, plus précisément en statistiques et probabilités, l écart type mesure la dispersion d une série de valeurs autour de leur moyenne. Dans le domaine des probabilités, l écart type est une quantité réelle positive,… …   Wikipédia en Français

  • Minimum-variance unbiased estimator — In statistics a uniformly minimum variance unbiased estimator or minimum variance unbiased estimator (UMVUE or MVUE) is an unbiased estimator that has lower variance than any other unbiased estimator for all possible values of the parameter. The… …   Wikipedia

  • Absolute deviation — In statistics, the absolute deviation of an element of a data set is the absolute difference between that element and a given point. Typically the point from which the deviation is measured is a measure of central tendency, most often the median… …   Wikipedia

  • Median absolute deviation — In statistics, the median absolute deviation (MAD) is a robust measure of the variability of a univariate sample of quantitative data. It can also refer to the population parameter that is estimated by the MAD calculated from a sample. For a… …   Wikipedia

  • Maximum spacing estimation — The maximum spacing method tries to find a distribution function such that the spacings, D(i), are all approximately of the same length. This is done by maximizing their geometric mean. In statistics, maximum spacing estimation (MSE or MSP), or… …   Wikipedia

  • Minimum distance estimation — (MDE) is a statistical method for fitting a mathematical model to data, usually the empirical distribution. Contents 1 Definition 2 Statistics used in estimation 2.1 Chi square criterion …   Wikipedia

  • Maximum a posteriori estimation — In Bayesian statistics, a maximum a posteriori probability (MAP) estimate is a mode of the posterior distribution. The MAP can be used to obtain a point estimate of an unobserved quantity on the basis of empirical data. It is closely related to… …   Wikipedia

  • Gold standard (test) — For other uses, see Gold standard (disambiguation). In medicine and statistics, gold standard test refers to a diagnostic test or benchmark that is the best available under reasonable conditions. It does not have to be necessarily the best… …   Wikipedia

Share the article and excerpts

Direct link
Do a right-click on the link above
and select “Copy Link”