Stein's example
**Stein's example**, sometimes referred to as **Stein's phenomenon** or **Stein's paradox**, is a surprising effect observed in decision theory and estimation theory. Simply stated, the example demonstrates that when three or more parameters are estimated simultaneously, there exist combined estimators that are more accurate on average (that is, have lower expected mean squared error) than any method which handles the parameters separately. This is surprising since the parameters and the measurements might be totally unrelated. The phenomenon is named after its discoverer, Charles Stein.

**Formal statement**

Let ${\boldsymbol \theta}$ be a vector consisting of $n \ge 3$ unknown parameters. To estimate these parameters, a single measurement $X_i$ is performed for each parameter $\theta_i$, resulting in a vector ${\mathbf X}$ of length $n$. Suppose the measurements are independent, identically distributed Gaussian random variables, with mean ${\boldsymbol \theta}$ and variance 1, i.e.,

$${\mathbf X} \sim N({\boldsymbol \theta}, I).$$

Thus, each parameter is estimated using a single noisy measurement, and each measurement is equally inaccurate.

Under such conditions, it is most intuitive (and most common) to use each measurement as an estimate of its corresponding parameter. This so-called "ordinary" decision rule can be written as

$$\hat{\boldsymbol \theta} = {\mathbf X}.$$
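As a concrete illustration, the measurement model and the "ordinary" decision rule can be simulated in a few lines. This is only a sketch; the particular values of $n$ and ${\boldsymbol \theta}$ below are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed values for illustration: n = 5 unknown parameters.
theta = np.array([1.0, -2.0, 0.5, 3.0, 0.0])
n = theta.size

# One noisy, unit-variance measurement per parameter: X ~ N(theta, I).
X = theta + rng.standard_normal(n)

# The "ordinary" decision rule: each measurement estimates its own parameter.
theta_hat = X

# Averaged over many repetitions, the total squared error of this rule is n.
trials = 200_000
Xs = theta + rng.standard_normal((trials, n))
avg_sq_error = np.mean(np.sum((Xs - theta) ** 2, axis=1))
print(avg_sq_error)  # close to n = 5
```

The average total squared error of the ordinary rule equals $n$, since each coordinate contributes unit variance.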

The quality of such an estimator is measured by its risk function. A commonly used risk function is the mean squared error, defined as

$$E \left\{ | {\boldsymbol \theta} - \hat{\boldsymbol \theta} |^2 \right\}.$$

Surprisingly, it turns out that the "ordinary" estimator proposed above is suboptimal in terms of mean squared error: in the setting discussed here, there exist alternative estimators which "always" achieve lower mean squared error, no matter what the value of ${\boldsymbol \theta}$ is.

More precisely, an estimator $\hat{\boldsymbol \theta}_1$ is said to "dominate" another estimator $\hat{\boldsymbol \theta}_2$ if, for all values of ${\boldsymbol \theta}$, the risk of $\hat{\boldsymbol \theta}_1$ is lower than, or equal to, the risk of $\hat{\boldsymbol \theta}_2$, "and" if the inequality is strict for some ${\boldsymbol \theta}$. An estimator is said to be "admissible" if no other estimator dominates it; otherwise it is "inadmissible". Thus, Stein's example can be stated simply as follows: "The ordinary decision rule for estimating the mean of a multivariate Gaussian distribution is inadmissible under mean squared error risk."

Many simple, practical estimators achieve better performance than the ordinary estimator. The best-known example is the James-Stein estimator. For a sketch of the proof of this result, see Proof of Stein's example.

**Implications**

Stein's example is surprising, since the "ordinary" decision rule is intuitive and commonly used. In fact, numerous methods for estimator construction, including maximum likelihood estimation, best linear unbiased estimation, least squares estimation and optimal equivariant estimation, all result in the "ordinary" estimator. Yet, as discussed above, this estimator is suboptimal.

To demonstrate the unintuitive nature of Stein's example, consider the following real-world illustration. Suppose we are to estimate three unrelated parameters, such as the US wheat yield for 1993, the number of spectators at the Wimbledon tennis tournament in 2001, and the weight of a randomly chosen candy bar from the supermarket. Suppose we have independent Gaussian measurements of each of these quantities. Stein's example then tells us that we can get a better estimate (in total mean squared error) for the three parameters by using the three unrelated measurements simultaneously.
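For the story above this is only a thought experiment, but the dominance claim is easy to check numerically. The following Monte Carlo sketch uses three made-up, standardized parameter values (so each measurement has unit variance, matching the formal statement) and compares the ordinary rule against the shrinkage rule $(1 - (n-2)/|{\mathbf x}|^2)\,{\mathbf x}$ of the James-Stein estimator:

```python
import numpy as np

rng = np.random.default_rng(1)

# Three made-up, standardized "unrelated" parameters (assumed values).
theta = np.array([1.0, -0.5, 2.0])
n, trials = theta.size, 500_000

X = theta + rng.standard_normal((trials, n))  # X ~ N(theta, I)

# Ordinary rule vs. shrinkage rule (1 - (n-2)/||x||^2) x; for n = 3, n-2 = 1.
sq_norm = np.sum(X ** 2, axis=1, keepdims=True)
js = (1.0 - (n - 2) / sq_norm) * X

mse_ordinary = np.mean(np.sum((X - theta) ** 2, axis=1))
mse_js = np.mean(np.sum((js - theta) ** 2, axis=1))
print(mse_ordinary, mse_js)  # the shrinkage rule has lower total error
```

The improvement is in the "total" risk across all three components; no single component is guaranteed to be estimated better on any particular draw.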

At first sight it appears that somehow we get a better estimate for US wheat yield by measuring other unrelated statistics such as the number of spectators at Wimbledon and the weight of a candy bar. This is of course absurd; we have not obtained a better estimate for US wheat yield alone, but we have produced an estimate for the means of "all" of the random variables, which has a reduced "total" risk. So the cost of a bad estimate in one component can be compensated by a better estimate in another component.

**Resolution of the "paradox"**

One may ask how the simultaneous measurement of several parameters reduces the total error of the parameters. This stems from the fact that some properties of a distribution can be estimated more accurately when multiple observations are present, even if those observations are statistically independent. For example, consider the squared norm of the parameter vector, $|{\boldsymbol \theta}|^2$. One might consider estimating this value using $|{\mathbf X}|^2$. However, the expectation of this estimate can be shown to be

$$E \left\{ |{\mathbf X}|^2 \right\} = |{\boldsymbol \theta}|^2 + n,$$

so that $|{\mathbf X}|^2$ tends to be an overestimate of $|{\boldsymbol \theta}|^2$. Furthermore, $|{\boldsymbol \theta}|^2$ can be estimated more accurately when more parameters are present.
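This expectation identity is easy to verify by simulation. A minimal sketch, with assumed values for ${\boldsymbol \theta}$ and $n$:

```python
import numpy as np

rng = np.random.default_rng(2)

theta = np.array([2.0, -1.0, 0.0, 0.5])  # assumed values; n = 4
n, trials = theta.size, 500_000

# E{ ||X||^2 } = ||theta||^2 + n: each coordinate adds unit measurement variance.
X = theta + rng.standard_normal((trials, n))
mean_sq_norm = np.mean(np.sum(X ** 2, axis=1))
print(mean_sq_norm)  # close to ||theta||^2 + n = 5.25 + 4 = 9.25
```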

It follows from the above equation that the "ordinary" estimate tends to overestimate the norm of the parameters. This can be corrected by shrinking the ordinary estimator, using, for example, the James-Stein estimator.

**References**

A good introduction to the phenomenon:

* Efron, B.; Morris, C. (1977). "Stein's paradox in statistics". Scientific American, 236(5), 119–127. http://www-stat.stanford.edu/~ckirby/brad/misc/Article1977.pdf

A textbook with an extensive discussion of Stein-type estimators:

* Lehmann, E. L.; Casella, G. (1998). Theory of Point Estimation, 2nd ed., ch. 5.

Stein's original paper:

* Stein, C. (1956). "Inadmissibility of the usual estimator for the mean of a multivariate distribution". Proc. Third Berkeley Symp. Math. Statist. Prob., vol. 1, 197–206.

**See also**

* James-Stein estimator
* Decision theory
