Stein's example
Stein's example, sometimes referred to as Stein's phenomenon or Stein's paradox, is a surprising effect observed in decision theory and estimation theory. Simply stated, the example demonstrates that when three or more parameters are estimated simultaneously, there exist combined estimators that are more accurate on average (that is, having lower expected mean squared error) than any method which handles the parameters separately. This is surprising since the parameters and the measurements might be totally unrelated. The phenomenon is named after its discoverer, Charles Stein.

Formal statement
Let $\boldsymbol\theta = (\theta_1, \ldots, \theta_n)$ be a vector consisting of $n \ge 3$ unknown parameters. To estimate these parameters, a single measurement $x_i$ is performed for each parameter $\theta_i$, resulting in a vector $\mathbf{x}$ of length $n$. Suppose the measurements are independent Gaussian random variables, each with variance 1 and with mean equal to the corresponding parameter, i.e.,

$$x_i \sim N(\theta_i, 1), \quad i = 1, \ldots, n.$$

Thus, each parameter is estimated using a single noisy measurement, and each measurement is equally inaccurate. Under such conditions, it is most intuitive (and most common) to use each measurement as an estimate of its corresponding parameter. This so-called "ordinary" decision rule can be written as

$$\hat{\boldsymbol\theta} = \mathbf{x}.$$
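As a concrete illustration, the setup and the ordinary decision rule can be sketched in a few lines of NumPy (the parameter values below are hypothetical, chosen only for illustration):

```python
import numpy as np

# A minimal sketch of the setting: n = 3 unknown parameters,
# one noisy N(theta_i, 1) measurement per parameter.
rng = np.random.default_rng(0)

theta = np.array([2.0, -1.0, 5.0])    # unknown parameters (illustrative values)
x = rng.normal(loc=theta, scale=1.0)  # one Gaussian measurement per parameter

# The "ordinary" decision rule: estimate each parameter by its own measurement.
theta_hat = x
print(theta_hat)
```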
The quality of such an estimator is measured by its risk function. A commonly used risk function is the mean squared error, defined as

$$R(\boldsymbol\theta, \hat{\boldsymbol\theta}) = E\left[\|\hat{\boldsymbol\theta} - \boldsymbol\theta\|^2\right].$$

Surprisingly, it turns out that the "ordinary" estimator proposed above is suboptimal in terms of mean squared error: in the setting discussed here, there exist alternative estimators which always achieve lower mean squared error, no matter what the value of $\boldsymbol\theta$ is.

More precisely, an estimator $\hat{\boldsymbol\theta}_1$ is said to dominate another estimator $\hat{\boldsymbol\theta}_2$ if, for all values of $\boldsymbol\theta$, the risk of $\hat{\boldsymbol\theta}_1$ is lower than, or equal to, the risk of $\hat{\boldsymbol\theta}_2$, and if the inequality is strict for some $\boldsymbol\theta$. An estimator is said to be admissible if no other estimator dominates it; otherwise it is inadmissible. Thus, Stein's example can be stated simply as follows: the ordinary decision rule for estimating the mean of a multivariate Gaussian distribution is inadmissible under mean squared error risk.

Many simple, practical estimators achieve better performance than the ordinary estimator. The best-known example is the James-Stein estimator. For a sketch of the proof of this result, see Proof of Stein's example.

Implications
Stein's example is surprising, since the "ordinary" decision rule is intuitive and commonly used. In fact, numerous methods for estimator construction, including maximum likelihood estimation, best linear unbiased estimation, least squares estimation and optimal equivariant estimation, all result in the "ordinary" estimator. Yet, as discussed above, this estimator is suboptimal.

To demonstrate the unintuitive nature of Stein's example, consider the following real-world example. Suppose we are to estimate three unrelated parameters, such as the US wheat yield for 1993, the number of spectators at the Wimbledon tennis tournament in 2001, and the weight of a randomly chosen candy bar from the supermarket. Suppose we have independent Gaussian measurements of each of these quantities. Stein's example now tells us that we will get a better estimate for the three parameters by simultaneously using the three unrelated measurements.
At first sight it appears that somehow we get a better estimate for US wheat yield by measuring some other unrelated statistics such as the number of spectators at Wimbledon and the weight of a candy bar. This is of course absurd; we have not obtained a better estimate for US wheat yield alone, but we have produced an estimate for the means of "all" of the random variables, which has a reduced "total" risk. So the cost of a bad estimate in one component can be compensated by a better estimate in another component.
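The reduction in total risk can be checked numerically. The following Monte Carlo sketch (an illustration under assumed parameter values, not taken from the article) compares the total mean squared error of the ordinary estimator with that of the James-Stein estimator, which shrinks the measurement vector toward the origin:

```python
import numpy as np

# Monte Carlo comparison of total risk for n = 3 independent N(theta_i, 1)
# measurements: ordinary estimator vs. James-Stein estimator.
rng = np.random.default_rng(1)
theta = np.array([1.0, 2.0, 3.0])  # arbitrary illustrative parameter vector
n = theta.size
trials = 100_000

x = rng.normal(loc=theta, scale=1.0, size=(trials, n))

# Ordinary estimator: theta_hat = x; its total risk equals n.
mse_ordinary = np.mean(np.sum((x - theta) ** 2, axis=1))

# James-Stein estimator: scale x by the factor (1 - (n - 2) / ||x||^2).
shrink = 1.0 - (n - 2) / np.sum(x ** 2, axis=1)
x_js = shrink[:, None] * x
mse_js = np.mean(np.sum((x_js - theta) ** 2, axis=1))

print(mse_ordinary, mse_js)  # the James-Stein risk is smaller in this simulation
```

Note that individual components of the James-Stein estimate can be worse than the corresponding raw measurements; only the total risk, summed over all components, is uniformly reduced.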
Resolution of the "paradox"
One may ask how the simultaneous measurement of several parameters reduces the total error of the parameters. This stems from the fact that some properties of a distribution can be estimated more accurately when multiple observations are present, even if those observations are statistically independent. For example, consider the squared norm of the parameter vector, $\|\boldsymbol\theta\|^2 = \sum_{i=1}^n \theta_i^2$. One might consider estimating this value using $\|\mathbf{x}\|^2$. However, the expectation of this estimate can be shown to be

$$E\left[\|\mathbf{x}\|^2\right] = \|\boldsymbol\theta\|^2 + n,$$

so that $\|\mathbf{x}\|^2$ tends to be an overestimate of $\|\boldsymbol\theta\|^2$. Furthermore, $\|\boldsymbol\theta\|^2$ can be estimated more accurately when more parameters are present.
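The systematic overestimation of the squared norm is easy to verify by simulation. A quick check, under an assumed illustrative parameter vector:

```python
import numpy as np

# Verify numerically that E[||x||^2] = ||theta||^2 + n, i.e. that ||x||^2
# overestimates the squared norm of the parameter vector by n on average.
rng = np.random.default_rng(2)
theta = np.array([1.0, -2.0, 0.5, 3.0])  # illustrative parameters, n = 4
n = theta.size

x = rng.normal(loc=theta, scale=1.0, size=(200_000, n))
mean_sq_norm = np.mean(np.sum(x ** 2, axis=1))

print(mean_sq_norm, np.sum(theta ** 2) + n)  # the two values agree closely
```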
It follows from the above equation that the "ordinary" estimate tends to overestimate the norm of the parameters. This can be corrected by shrinking the ordinary estimator, using, for example, the James-Stein estimator.

References
A good introduction to the phenomenon:
* Efron, B. and Morris, C. (1977). "Stein's paradox in statistics". Scientific American 236 (5): 119–127. http://www-stat.stanford.edu/~ckirby/brad/misc/Article1977.pdf

A textbook with an extensive discussion of Stein-type estimators:
* Lehmann, E. L. and Casella, G. (1998). Theory of Point Estimation, 2nd ed., ch. 5.

Stein's original paper:
* Stein, C. (1956). "Inadmissibility of the usual estimator for the mean of a multivariate distribution". Proc. Third Berkeley Symp. Math. Statist. Prob. 1, 197–206.

See also
* James-Stein estimator
* Decision theory