James-Stein estimator

The James-Stein estimator is a nonlinear estimator which can be shown to dominate, or outperform, the "ordinary" (least squares) estimator in terms of mean squared error. As such, it is the best-known example of Stein's phenomenon.

An earlier version of the estimator was developed by Charles Stein, and is sometimes referred to as Stein's estimator (Stein, C., 1956, "Inadmissibility of the usual estimator for the mean of a multivariate distribution", Proc. Third Berkeley Symp. Math. Statist. Prob., vol. 1, pp. 197-206, http://projecteuclid.org/euclid.bsmsp/1200501656). The result was improved by James and Stein (James, W.; Stein, C., 1961, "Estimation with quadratic loss", Proc. Fourth Berkeley Symp. Math. Statist. Prob., vol. 1, pp. 361-379, http://projecteuclid.org/euclid.bsmsp/1200512173).


Suppose $\boldsymbol\theta$ is an unknown parameter vector of length $m$, and let $\mathbf{y}$ be a vector of observations of $\boldsymbol\theta$ (also of length $m$), such that

:$\mathbf{y} \sim N(\boldsymbol\theta, \sigma^2 I).$

We are interested in obtaining an estimate $\widehat{\boldsymbol\theta} = \widehat{\boldsymbol\theta}(\mathbf{y})$ of $\boldsymbol\theta$, based on the observations $\mathbf{y}$.

This is an everyday situation in which a set of parameters is measured, and the measurements are corrupted by independent Gaussian noise. Since the noise has zero mean, it is very reasonable to use the measurements themselves as an estimate of the parameters. This is the approach of the least squares estimator, which in this case is simply $\widehat{\boldsymbol\theta}_{LS} = \mathbf{y}$.

Because this approach seems so natural, there was considerable shock and disbelief when Stein demonstrated that, in terms of mean squared error $E\{\|\boldsymbol\theta - \widehat{\boldsymbol\theta}\|^2\}$, it is suboptimal when three or more parameters are estimated. The result became known as Stein's phenomenon.
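For reference, the mean squared error of the least squares estimator in this setting can be computed directly, because each component of the error is an independent zero-mean Gaussian variable with variance $\sigma^2$. The short derivation below is added for illustration and uses only the model stated above:

```latex
E\{\|\boldsymbol\theta - \widehat{\boldsymbol\theta}_{LS}\|^2\}
  = E\{\|\mathbf{y} - \boldsymbol\theta\|^2\}
  = \sum_{i=1}^{m} E\{(y_i - \theta_i)^2\}
  = m\sigma^2 .
```

This value does not depend on $\boldsymbol\theta$, so it is the baseline that any dominating estimator must stay at or below for every parameter value.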

The James-Stein estimator

The James-Stein estimator is given by

:$\widehat{\boldsymbol\theta}_{JS} = \left( 1 - \frac{(m-2)\sigma^2}{\|\mathbf{y}\|^2} \right) \mathbf{y}.$

James and Stein showed that the above estimator dominates $\widehat{\boldsymbol\theta}_{LS}$ for any $m \ge 3$, meaning that the James-Stein estimator always achieves lower MSE than the least squares estimator (Lehmann, E. L.; Casella, G., 1998, Theory of Point Estimation, 2nd ed., New York: Springer).
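The dominance claim can be checked numerically. The following is a minimal Monte Carlo sketch in Python with NumPy, not taken from the cited works; the function names `james_stein` and `total_mse`, the dimension, the noise variance, and the true parameter vector are all assumptions chosen for illustration.

```python
import numpy as np


def james_stein(y, sigma2):
    """Basic James-Stein estimate, shrinking y toward the origin.

    y may be one vector of length m or an array of shape (trials, m);
    the shrinkage factor is computed separately for each row.
    """
    y = np.asarray(y, dtype=float)
    m = y.shape[-1]
    norm2 = np.sum(y**2, axis=-1, keepdims=True)
    return (1.0 - (m - 2) * sigma2 / norm2) * y


def total_mse(estimates, theta):
    """Monte Carlo estimate of E ||theta - estimate||^2."""
    return float(np.mean(np.sum((estimates - theta) ** 2, axis=-1)))


rng = np.random.default_rng(0)
m, sigma2, trials = 10, 1.0, 100_000
theta = np.full(m, 0.5)  # hypothetical true parameter vector

# Each row is one independent draw of y ~ N(theta, sigma2 * I).
y = theta + np.sqrt(sigma2) * rng.standard_normal((trials, m))

mse_ls = total_mse(y, theta)                       # least squares: estimate is y
mse_js = total_mse(james_stein(y, sigma2), theta)  # James-Stein
print(f"LS: {mse_ls:.3f}  JS: {mse_js:.3f}")
```

With these settings the least squares MSE should come out close to m times sigma2, that is 10, and the James-Stein MSE noticeably lower; the exact figures depend on the seed and on the chosen parameter vector.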

Notice that if $(m-2)\sigma^2 < \|\mathbf{y}\|^2$ then this estimator simply takes the natural estimator $\mathbf{y}$ and shrinks it towards the origin $\mathbf{0}$. In fact this is not the only direction of shrinkage that works. Let $\boldsymbol\nu$ be an arbitrary fixed vector of length $m$. Then there exists a James-Stein estimator that shrinks toward $\boldsymbol\nu$, namely

:$\widehat{\boldsymbol\theta}_{JS} = \left( 1 - \frac{(m-2)\sigma^2}{\|\mathbf{y} - \boldsymbol\nu\|^2} \right) (\mathbf{y} - \boldsymbol\nu) + \boldsymbol\nu.$

It is interesting to note that the James-Stein estimator dominates the usual estimator for any $\boldsymbol\nu$. A natural question to ask is whether the improvement over the usual estimator is independent of the choice of $\boldsymbol\nu$. The answer is no. The improvement is small if $\|\boldsymbol\theta - \boldsymbol\nu\|$ is large. Thus to get a very great improvement some knowledge of the location of $\boldsymbol\theta$ is necessary. Of course this is the quantity we are trying to estimate, so we do not have this knowledge a priori. But we may have some guess as to what the mean vector is. This can be considered a disadvantage of the estimator because the choice is not objective: it depends on the beliefs of the researcher.
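The dependence of the improvement on the shrinkage target can be illustrated by simulation. The sketch below is an illustration only: the helper name `james_stein_toward`, the true parameter vector, and the three target distances are hypothetical choices, and the resulting numbers are Monte Carlo estimates rather than exact risks.

```python
import numpy as np


def james_stein_toward(y, nu, sigma2):
    """James-Stein estimate that shrinks y toward the fixed vector nu."""
    y = np.asarray(y, dtype=float)
    nu = np.asarray(nu, dtype=float)
    m = y.shape[-1]
    diff = y - nu
    norm2 = np.sum(diff**2, axis=-1, keepdims=True)
    return (1.0 - (m - 2) * sigma2 / norm2) * diff + nu


rng = np.random.default_rng(1)
m, sigma2, trials = 10, 1.0, 100_000
theta = np.ones(m)  # hypothetical true parameter vector
y = theta + np.sqrt(sigma2) * rng.standard_normal((trials, m))

# Estimated total MSE for targets at increasing distance from theta.
mse_by_distance = {}
for dist in (0.0, 2.0, 10.0):
    nu = theta.copy()
    nu[0] += dist  # now ||theta - nu|| equals dist
    est = james_stein_toward(y, nu, sigma2)
    mse_by_distance[dist] = float(np.mean(np.sum((est - theta) ** 2, axis=1)))

for dist, mse in mse_by_distance.items():
    print(f"||theta - nu|| = {dist:5.1f}  JS MSE = {mse:.3f}")
```

In this experiment the estimated MSE grows toward the least squares value of m times sigma2 as the target moves away from the true parameter, while remaining below it.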

Stein has shown that, for $m \le 2$, the least squares estimator is admissible, meaning that no estimator dominates it.


A consequence of the above discussion is the following counterintuitive result: when three or more unrelated parameters are measured, their total MSE can be reduced by using a combined estimator such as the James-Stein estimator, whereas when each parameter is estimated separately, the least squares (LS) estimator is admissible. This quirk has caused some to sarcastically ask whether, in order to estimate the speed of light, one should jointly estimate tea consumption in Taiwan and hog weight in Montana. The response is that the James-Stein estimator always improves upon the "total" MSE, i.e., the sum of the expected squared errors of each component. Therefore, the total MSE in measuring light speed, tea consumption and hog weight would improve by using the James-Stein estimator. However, any particular component (such as the speed of light) would improve for some parameter values, and deteriorate for others. Thus, although the James-Stein estimator dominates the LS estimator when three or more parameters are estimated, no single component of it dominates the respective component of the LS estimator.
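The difference between total and per-component MSE can be seen in a small simulation. This is a hedged sketch, assuming a hypothetical parameter vector with one large entry and nine zeros; the variable names and settings are illustrative and not drawn from the source.

```python
import numpy as np

rng = np.random.default_rng(2)
m, sigma2, trials = 10, 1.0, 200_000

# Hypothetical configuration: one large parameter, the others at the origin.
theta = np.zeros(m)
theta[0] = 5.0

y = theta + np.sqrt(sigma2) * rng.standard_normal((trials, m))

# Basic James-Stein estimate, shrinking each draw toward the origin.
shrink = 1.0 - (m - 2) * sigma2 / np.sum(y**2, axis=1, keepdims=True)
js = shrink * y

# MSE of each component separately, averaged over the draws.
mse_ls_per_component = np.mean((y - theta) ** 2, axis=0)
mse_js_per_component = np.mean((js - theta) ** 2, axis=0)

print("LS per component:", np.round(mse_ls_per_component, 2))
print("JS per component:", np.round(mse_js_per_component, 2))
print("LS total:", round(float(mse_ls_per_component.sum()), 2))
print("JS total:", round(float(mse_js_per_component.sum()), 2))
```

In this configuration the first component is estimated worse by James-Stein than by least squares, because it is pulled toward the origin, while the remaining components and the total improve.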

The conclusion from this hypothetical example is that measurements should be combined if one is interested in minimizing their total MSE. For example, in a telecommunication setting, it is reasonable to combine channel tap measurements in a channel estimation scenario, as the goal is to minimize the total channel estimation error. Conversely, it is probably not reasonable to combine channel estimates of different users, since no user would want their channel estimate to deteriorate in order to improve the average network performance.


The basic James-Stein estimator has the peculiar property that for small values of $\|\mathbf{y} - \boldsymbol\nu\|$, the multiplier on $\mathbf{y} - \boldsymbol\nu$ is actually negative. This can be easily remedied by replacing this multiplier by zero when it is negative. The resulting estimator is called the "positive-part James-Stein estimator" and is given by

:$\widehat{\boldsymbol\theta}_{JS+} = \left( 1 - \frac{(m-2)\sigma^2}{\|\mathbf{y} - \boldsymbol\nu\|^2} \right)_+ (\mathbf{y} - \boldsymbol\nu) + \boldsymbol\nu.$

This estimator has a smaller risk than the basic James-Stein estimator. It follows that the basic James-Stein estimator is itself inadmissible (Anderson, T. W., 1984, An Introduction to Multivariate Statistical Analysis, 2nd ed., New York: John Wiley & Sons).
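The positive-part rule amounts to clipping the multiplier at zero. The following sketch assumes a hypothetical helper `positive_part_james_stein` and compares the two estimators by simulation in the case where the true parameter coincides with the shrinkage target, which is where negative multipliers occur most often; all settings are illustrative.

```python
import numpy as np


def positive_part_james_stein(y, nu, sigma2):
    """Positive-part James-Stein estimate shrinking y toward nu."""
    y = np.asarray(y, dtype=float)
    nu = np.asarray(nu, dtype=float)
    m = y.shape[-1]
    diff = y - nu
    factor = 1.0 - (m - 2) * sigma2 / np.sum(diff**2, axis=-1, keepdims=True)
    # Replace a negative multiplier by zero, so the estimate never
    # overshoots past the target nu.
    return np.maximum(factor, 0.0) * diff + nu


rng = np.random.default_rng(3)
m, sigma2, trials = 6, 1.0, 100_000
theta = np.zeros(m)  # hypothetical true parameter
nu = np.zeros(m)     # shrinkage target, here equal to theta

y = theta + np.sqrt(sigma2) * rng.standard_normal((trials, m))

# Basic estimator on the same draws, for comparison.
diff = y - nu
factor = 1.0 - (m - 2) * sigma2 / np.sum(diff**2, axis=1, keepdims=True)
basic = factor * diff + nu
positive = positive_part_james_stein(y, nu, sigma2)

mse_basic = float(np.mean(np.sum((basic - theta) ** 2, axis=1)))
mse_positive = float(np.mean(np.sum((positive - theta) ** 2, axis=1)))
print(f"basic JS: {mse_basic:.3f}  positive-part JS: {mse_positive:.3f}")
```

Both estimators are evaluated on the same simulated draws, so the comparison is not blurred by independent sampling noise.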

It turns out, however, that the positive-part estimator is also inadmissible. This follows from a general result which requires admissible estimators to be smooth.


The James-Stein estimator may seem at first sight to be a result of some peculiarity of the problem setting. In fact, the estimator exemplifies a very wide-ranging effect, namely, the fact that the "ordinary" or least squares estimator is often inadmissible for simultaneous estimation of several parameters. This effect has been called Stein's phenomenon, and has been demonstrated for several different problem settings, some of which are briefly outlined below.
* James and Stein demonstrated that the estimator presented above can still be used when the variance $\sigma^2$ is unknown, by replacing it with the standard estimator of the variance, $\widehat{\sigma}^2 = \frac{1}{n}\sum ( y_i-\overline{y} )^2$. The dominance result still holds under the same condition, namely, $m > 2$.
* Bock extended the work of James and Stein to the case of a general measurement covariance matrix, i.e., where measurements may be statistically dependent and may have differing variances. A similar dominating estimator can be constructed, with a suitably generalized dominance condition. This can be used to construct a linear regression technique which outperforms the standard application of the LS estimator (Bock, M. E., 1975, "Minimax estimators of the mean of a multivariate distribution", Ann. Statist., 3 (1), pp. 209-218).
* Stein's result was substantially extended by Brown to a wide class of distributions and loss functions. However, his theorem is an existence result only, in that explicit dominating estimators were not actually exhibited (Brown, L. D., 1966, "On the admissibility of invariant estimators of one or more location parameters", Ann. Math. Statist., 37, pp. 1087-1136). It is quite difficult to obtain explicit estimators improving upon the usual estimator without specific restrictions on the underlying distributions.

See also

* Stein's phenomenon
* Admissible decision rule
* Decision theory
* Estimation theory



Wikimedia Foundation. 2010.
