Stein's method

Stein's method is a general method in probability theory for obtaining bounds on the distance between two probability distributions with respect to a probability metric. It was introduced by Charles Stein, who first published it in 1972,[1] to obtain a bound between the distribution of a sum of an m-dependent sequence of random variables and a standard normal distribution in the Kolmogorov (uniform) metric, and hence to prove not only a central limit theorem but also bounds on the rate of convergence in that metric.

History

At the end of the 1960s, dissatisfied with the then-known proofs of a specific central limit theorem, Charles Stein developed a new way of proving the theorem for his statistics lecture.[2] His seminal paper[1] was presented in 1970 at the Sixth Berkeley Symposium and published in the corresponding proceedings.

Later, his Ph.D. student Louis Chen Hsiao Yun modified the method so as to obtain approximation results for the Poisson distribution;[3] the method is therefore often referred to as the Stein–Chen method. The new method received only moderate attention in the 1970s, but it underwent major development in the 1980s, when many important contributions were made on which today's view of the method is largely based. Probably the most important contributions are the monograph by Stein (1986), in which he presents his view of the method together with the concept of auxiliary randomisation, in particular using exchangeable pairs, and the articles by Barbour (1988) and Götze (1991), who introduced the so-called generator interpretation, which made it possible to adapt the method easily to many other probability distributions. An important contribution was also an article by Bolthausen (1984) on a long-standing open problem around the so-called combinatorial central limit theorem, which surely helped the method to become widely known.[citation needed]

In the 1990s the method was adapted to a variety of distributions, such as Gaussian processes by Barbour (1990), the binomial distribution by Ehm (1991), Poisson processes by Barbour and Brown (1992), the Gamma distribution by Luk (1994), and many others.

The basic approach

Probability metrics

Stein's method is a way to bound the distance between two probability distributions with respect to a specific probability metric. For the method to be tractable, the metric must be given in the form


   (1.1)\quad 
   d(P,Q) 
   = \sup_{h\in\mathcal{H}}\left|\int h dP - \int h dQ \right|
   = \sup_{h\in\mathcal{H}}\left|E h(W) - E h(Y) \right|

Here, P and Q are probability measures on a measurable space \mathcal{X}, W and Y are random variables with distribution P and Q respectively, E is the usual expectation operator and \mathcal{H} is a set of functions from \mathcal{X} to the real numbers. This set has to be large enough, so that the above definition indeed yields a metric. Important examples are the total variation metric, where we let \mathcal{H} consist of all the indicator functions of measurable sets, the Kolmogorov (uniform) metric for probability measures on the real numbers, where we consider all the half-line indicator functions, and the Lipschitz (first order Wasserstein; Kantorovich) metric, where the underlying space is itself a metric space and we take the set \mathcal{H} to be all Lipschitz-continuous functions with Lipschitz-constant 1. However, note that not every metric can be represented in the form (1.1).
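As a concrete illustration of a metric of the form (1.1), the following minimal sketch (Python with NumPy/SciPy; the choice of summand distribution, of n and of the sample size are purely illustrative and not part of the original text) estimates the Kolmogorov distance between the law of a standardized sum and the standard normal distribution, replacing P by the empirical distribution of simulated copies of W.

    # Sketch: estimating the Kolmogorov (uniform) distance in (1.1),
    #   d(P,Q) = sup_x |P(W <= x) - Q(Y <= x)|,
    # for W a standardized sum of centered exponentials and Q = N(0,1).
    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(0)
    n, m = 5, 200_000                             # summands per W, simulated copies of W
    X = rng.exponential(1.0, size=(m, n)) - 1.0   # E X = 0, Var X = 1
    W = X.sum(axis=1) / np.sqrt(n)                # standardized sum

    # The sup over half-line indicators h = 1_{(-inf, x]} is attained at sample points.
    W_sorted = np.sort(W)
    Phi = norm.cdf(W_sorted)
    ecdf_hi = np.arange(1, m + 1) / m             # empirical CDF just after each point
    ecdf_lo = np.arange(0, m) / m                 # ... and just before
    d_K = max(np.max(np.abs(ecdf_hi - Phi)), np.max(np.abs(ecdf_lo - Phi)))
    print(f"estimated Kolmogorov distance for n = {n}: {d_K:.4f}")

The estimate carries Monte Carlo error of order m^{-1/2}, so it should only be read as an approximation to d(P,Q).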

In what follows we think of P as a complicated distribution (e.g. a sum of dependent random variables), which we want to approximate by a much simpler and tractable distribution Q (e.g. the standard normal distribution to obtain a central limit theorem).

The Stein operator

We assume now that the distribution Q is a fixed distribution; in what follows we shall in particular consider the case where Q is the standard normal distribution, which serves as a classical example of the application of Stein's method.

First of all, we need an operator \mathcal{A} which acts on functions f from \mathcal{X} to the real numbers, and which 'characterizes' the distribution Q in the sense that the following equivalence holds:


 (2.1)\quad
 E (\mathcal{A}f)(Y) = 0\text{ for all } f  \quad \iff \quad Y \text{ has distribution } Q.

We call such an operator the Stein operator. For the standard normal distribution, Stein's lemma exactly yields such an operator:


 (2.2)\quad
 E\left(f'(Y)-Yf(Y)\right) = 0\text{ for all } f\in C_b^1  \quad \iff \quad Y \text{ has standard normal distribution.}

Thus we can take


 (2.3)\quad
(\mathcal{A}f)(x) = f'(x) - x f(x)

We note that, in general, there are infinitely many such operators, and the question of which one to choose remains open. However, it seems that for many distributions there is a particularly good one, like (2.3) for the normal distribution.
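To get a feel for the characterization (2.2), the following sketch (Python with NumPy; the test function f and the sample size are illustrative choices) checks by simulation that E[f'(Y) - Y f(Y)] is close to zero when Y is standard normal, and visibly different from zero when Y has another distribution with the same mean and variance.

    # Sketch: Monte Carlo check of the Stein characterization (2.2).
    # For Y ~ N(0,1) we expect E[f'(Y) - Y f(Y)] ~ 0 for smooth bounded f;
    # for a non-normal Y (here a centered exponential) it is typically not.
    import numpy as np

    rng = np.random.default_rng(1)
    f = np.tanh                                   # a smooth, bounded test function
    f_prime = lambda y: 1.0 / np.cosh(y) ** 2     # its derivative

    m = 1_000_000
    cases = {
        "standard normal": rng.standard_normal(m),
        "centered exponential": rng.exponential(1.0, m) - 1.0,   # mean 0, variance 1
    }
    for name, Y in cases.items():
        print(f"E[f'(Y) - Y f(Y)] for {name}: {np.mean(f_prime(Y) - Y * f(Y)):+.4f}")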

There are different ways to find Stein operators, but by far the most important one is via generators. This approach was, as already mentioned, introduced by Barbour and Götze. Assume that Z=(Z_t)_{t\geq0} is a (homogeneous) continuous-time Markov process taking values in \mathcal{X}. If Z has stationary distribution Q, it is easy to see that, if \mathcal{A} is the generator of Z, then E (\mathcal{A}f)(Y) = 0 for a large class of functions f. Thus, generators are natural candidates for Stein operators, and this approach will also help us with later computations. For the standard normal distribution, the relevant process is the Ornstein–Uhlenbeck process, whose generator is (\mathcal{A}g)(x) = g''(x) - x g'(x); substituting f = g' recovers the operator (2.3).

Setting up the Stein equation

Observe now that saying that P is close to Q with respect to d is equivalent to saying that the difference of expectations in (1.1) is close to 0, and indeed if P = Q it is equal to 0. We hope now that the operator \mathcal{A} exhibits the same behavior: clearly if P = Q we have E (\mathcal{A}f)(W)=0 and hopefully if P\approx Q we have E (\mathcal{A}f)(W) \approx 0.

To make this statement rigorous we could find a function f, such that, for a given function h,


(3.1)\quad
E(\mathcal{A}f)(W)=E h(W) - Eh(Y),

so that the behavior of the right hand side is reproduced by the operator \mathcal{A} and f. However, this equation is too general. We solve instead the more specific equation


(3.2)\quad 
(\mathcal{A}f)(x)= h(x) - Eh(Y), \qquad\text{for all }x,

which is called the Stein equation. Replacing x by W and taking expectation with respect to W, we are back to (3.1), which is what we effectively want. Now all this effort is worthwhile only if the left-hand side of (3.1) is easier to bound than the right-hand side. This is, surprisingly, often the case.

If Q is the standard normal distribution and we use (2.3), the corresponding Stein equation is


(3.3)\quad
f'(x) - x f(x) = h(x) - E h(Y), \qquad\text{for all }x,

which is just an ordinary differential equation.

Solving the Stein equation

Now, in general, we cannot say much about how the equation (3.2) is to be solved. However, there are important cases, where we can.

Analytic methods. We see from (3.3) that equation (3.2) can in particular be a differential equation (if Q is concentrated on the integers, it will often turn out to be a difference equation). As there are many methods available to treat such equations, we can use them to solve the equation. For example, (3.3) can be easily solved explicitly:


(4.1)\quad
f(x) = e^{x^2/2}\int_{-\infty}^x [h(s)-E h(Y)]e^{-s^2/2}ds.
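As a quick sanity check, the following sketch (Python with NumPy/SciPy; the test function h, the evaluation points and the finite-difference step are illustrative) evaluates (4.1) by numerical quadrature and verifies that it satisfies the Stein equation (3.3) at a few values of x.

    # Sketch: evaluate the solution (4.1) numerically and check that it satisfies
    # f'(x) - x f(x) = h(x) - E h(Y) with Y ~ N(0,1).
    import numpy as np
    from scipy.integrate import quad
    from scipy.stats import norm

    h = np.tanh                                                   # illustrative 1-Lipschitz h
    Eh = quad(lambda s: h(s) * norm.pdf(s), -np.inf, np.inf)[0]   # E h(Y)

    def f(x):
        # f(x) = exp(x^2/2) * int_{-inf}^{x} [h(s) - E h(Y)] exp(-s^2/2) ds
        integral = quad(lambda s: (h(s) - Eh) * np.exp(-s**2 / 2), -np.inf, x)[0]
        return np.exp(x**2 / 2) * integral

    for x in [-2.0, -0.5, 0.0, 1.0, 2.5]:
        eps = 1e-3
        f_prime = (f(x + eps) - f(x - eps)) / (2 * eps)           # numerical derivative
        print(f"x = {x:+.1f}:  f'(x) - x f(x) = {f_prime - x * f(x):+.5f},"
              f"  h(x) - E h(Y) = {h(x) - Eh:+.5f}")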

Generator method. If \mathcal{A} is the generator of a Markov process (Z_t)_{t\geq 0}, as explained before, we can give a general solution to (3.2):


(4.2)\quad
f(x) = -\int_0^\infty [E^x h(Z_t)-E h(Y)] dt,

where E^x denotes expectation with respect to the process Z started at x. However, one still has to prove that the solution (4.2) exists for all desired functions h\in\mathcal{H}.
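For the standard normal distribution, the Markov process in (4.2) can be taken to be the Ornstein–Uhlenbeck process, whose generator is (\mathcal{A}g)(x) = g''(x) - x g'(x) (so that f = g' corresponds to (2.3)) and whose transition law started at x is explicitly N(x e^{-t}, 1-e^{-2t}). The following sketch (Python with NumPy/SciPy; the test function, quadrature orders and finite-difference step are illustrative) evaluates (4.2) for this process and checks that its derivative agrees with the explicit solution (4.1).

    # Sketch: the generator solution (4.2) in the normal case, using the
    # Ornstein-Uhlenbeck process with transition law Z_t | Z_0 = x ~ N(x e^{-t}, 1 - e^{-2t}).
    # Its generator is (Ag)(x) = g''(x) - x g'(x), so g' should coincide with (4.1).
    import numpy as np
    from scipy.integrate import quad
    from scipy.stats import norm

    h = np.tanh                                                   # illustrative test function
    Eh = quad(lambda s: h(s) * norm.pdf(s), -np.inf, np.inf)[0]
    nodes, weights = np.polynomial.hermite.hermgauss(80)          # for Gaussian expectations

    def Pt_h(x, t):
        # E^x h(Z_t) for the OU process, via Gauss-Hermite quadrature
        m, s = x * np.exp(-t), np.sqrt(1.0 - np.exp(-2.0 * t))
        return np.sum(weights * h(m + s * np.sqrt(2.0) * nodes)) / np.sqrt(np.pi)

    def g(x):
        # the generator solution (4.2)
        return -quad(lambda t: Pt_h(x, t) - Eh, 0.0, np.inf)[0]

    def f_explicit(x):
        # the classical solution (4.1)
        integral = quad(lambda s: (h(s) - Eh) * np.exp(-s**2 / 2), -np.inf, x)[0]
        return np.exp(x**2 / 2) * integral

    for x in [-1.5, 0.0, 0.7, 2.0]:
        eps = 1e-3
        g_prime = (g(x + eps) - g(x - eps)) / (2 * eps)
        print(f"x = {x:+.1f}:  g'(x) = {g_prime:+.5f},  f(x) from (4.1) = {f_explicit(x):+.5f}")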

Properties of the solution to the Stein equation

After showing the existence of a solution to (3.2) we can now try to analyze its properties. Usually, one tries to give bounds on f and its derivatives (which have to be defined carefully if \mathcal{X} is a more complicated space) or differences in terms of h and its derivatives or differences, that is, inequalities of the form


(5.1)\quad
||D^k f|| \leq C_{k,l} ||D^l h||,

for some specific k,l=0,1,2,\dots (typically k\geq l or k\geq l-1, depending on the form of the Stein operator), where often ||\cdot|| is taken to be the supremum norm. Here, D^k denotes the k-th derivative, but in discrete settings it usually refers to a difference operator. The constants C_{k,l} may contain the parameters of the distribution Q. If there are any, they are often referred to as Stein factors or magic factors.

In the case of (4.1) we can prove for the supremum norm that


(5.2)\quad
||f||_\infty\leq \min\{\sqrt{\pi/2}||h||_\infty,2||h'||_\infty\},\quad
||f'||_\infty\leq \min\{2||h||_\infty,4||h'||_\infty\},\quad
||f''||_\infty\leq 2 ||h'||_\infty,

where the last bound is of course only applicable if h is differentiable (or at least Lipschitz-continuous, which, for example, is not the case for the total variation metric or the Kolmogorov metric!). As the standard normal distribution has no extra parameters, in this specific case the constants are free of additional parameters.
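The bounds (5.2) can also be checked numerically. The following sketch (Python with NumPy/SciPy; the test function h and the grid on which the suprema are approximated are illustrative choices) computes the solution (4.1) for a 1-Lipschitz h and obtains f' and f'' directly from the Stein equation (3.3) and its derivative, avoiding numerical differentiation.

    # Sketch: numerical check of the bounds (5.2) for h = tanh (||h||_inf = ||h'||_inf = 1).
    # f is the solution (4.1); from the Stein equation (3.3),
    #   f'(x) = x f(x) + h(x) - E h(Y)   and   f''(x) = f(x) + x f'(x) + h'(x).
    import numpy as np
    from scipy.integrate import quad
    from scipy.stats import norm

    h = np.tanh
    h_prime = lambda x: 1.0 / np.cosh(x) ** 2
    Eh = quad(lambda s: h(s) * norm.pdf(s), -np.inf, np.inf)[0]

    def f(x):
        # (4.1); since the full integral of [h - Eh] e^{-s^2/2} over R is zero, the
        # equivalent tail form is used for x > 0 to avoid numerical cancellation
        if x <= 0:
            I = quad(lambda s: (h(s) - Eh) * np.exp(-s**2 / 2), -np.inf, x,
                     epsabs=1e-12, epsrel=1e-9)[0]
        else:
            I = -quad(lambda s: (h(s) - Eh) * np.exp(-s**2 / 2), x, np.inf,
                      epsabs=1e-12, epsrel=1e-9)[0]
        return np.exp(x**2 / 2) * I

    xs = np.linspace(-6, 6, 1201)                 # grid on which the suprema are approximated
    fx = np.array([f(x) for x in xs])
    f1 = xs * fx + h(xs) - Eh                     # f'  via the Stein equation
    f2 = fx + xs * f1 + h_prime(xs)               # f'' via its derivative

    print("sup|f|   =", round(float(np.max(np.abs(fx))), 4), "  bound:", round(float(np.sqrt(np.pi / 2)), 4))
    print("sup|f'|  =", round(float(np.max(np.abs(f1))), 4), "  bound:", 2.0)
    print("sup|f''| =", round(float(np.max(np.abs(f2))), 4), "  bound:", 2.0)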

Note that, up to this point, we did not make use of the random variable W. So, the steps up to here in general have to be carried out only once for a specific combination of distribution Q, metric d and Stein operator \mathcal{A}. However, if we have bounds in the general form (5.1), we usually are able to treat many probability metrics together. Furthermore, as there is often a particularly 'good' Stein operator for a distribution (e.g., no operator other than (2.3) has been used for the standard normal distribution up to now), one can often just start with the next step below, if bounds of the form (5.1) are already available (which is the case for many distributions).

An abstract approximation theorem

We are now in a position to bound the left hand side of (3.1). As this step heavily depends on the form of the Stein operator, we directly regard the case of the standard normal distribution.

Now, at this point we could directly plug in our random variable W which we want to approximate and try to find upper bounds. However, it is often fruitful to formulate a more general theorem using only abstract properties of W. Let us consider here the case of local dependence.

To this end, assume that W=\sum_{i=1}^n X_i is a sum of random variables such that E W = 0 and \operatorname{Var} W = 1. Assume that, for every i=1,\dots,n, there is a set A_i\subset\{1,2,\dots,n\} such that X_i is independent of all the random variables X_j with j\not\in A_i. We call this set the 'neighborhood' of X_i. Likewise, let B_i\subset\{1,2,\dots,n\} be a set such that all X_j with j\in A_i are independent of all X_k, k\not\in B_i. We can think of B_i as the neighbors in the neighborhood of X_i, a second-order neighborhood, so to speak. For a set A\subset\{1,2,\dots,n\} define now the sum X_A := \sum_{j\in A} X_j.

Using basically only Taylor expansion, it is possible to prove that


(6.1)\quad
\left|E(f'(W)-Wf(W))\right| 
\leq ||f''||_\infty\sum_{i=1}^n \left(
 \frac{1}{2}E|X_i X_{A_i}^2|
+ E|X_i X_{A_i}X_{B_i \setminus A_i}|
+ E|X_i X_{A_i}| E|X_{B_i}|
\right)

Note that, if we follow this line of argument, we can bound (1.1) only for functions h for which ||h'||_{\infty} is bounded, because of the third inequality of (5.2) (and in fact, if h has discontinuities, so will f''). To obtain a bound similar to (6.1) that contains only the expressions ||f||_{\infty} and ||f'||_{\infty}, the argument is much more involved and the result is not as simple as (6.1); however, it can be done.

Theorem A. If W is as described above, we have for the Lipschitz metric d_W that


(6.2)\quad
d_W(\mathcal{L}(W),N(0,1)) \leq 2\sum_{i=1}^n \left(
 \frac{1}{2}E|X_i X_{A_i}^2|
+ E|X_i X_{A_i}X_{B_i \setminus A_i}|
+ E|X_i X_{A_i}| E|X_{B_i}|
\right).

Proof. Recall that the Lipschitz metric is of the form (1.1) where the functions h are Lipschitz-continuous with Lipschitz-constant 1, thus ||h'||\leq 1. Combining this with (6.1) and the last bound in (5.2) proves the theorem.

Thus, roughly speaking, we have proved that, to bound the Lipschitz distance between a W with local dependence structure and a standard normal distribution, we only need to know the third moments of the X_i and the sizes of the neighborhoods A_i and B_i.
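To illustrate how Theorem A is used, the following sketch (Python with NumPy; the particular 1-dependent construction and the sample sizes are illustrative choices, not taken from the text) builds a locally dependent W from overlapping pairs of i.i.d. variables, chooses A_i and B_i accordingly, and estimates the three expectations in (6.2) by Monte Carlo.

    # Sketch: evaluating the bound of Theorem A for a 1-dependent sum.
    # X_i = (U_i + U_{i+1}) / sqrt(4n - 2) with U_j i.i.d., E U = 0, E U^2 = 1,
    # so that E W = 0, Var W = 1, and X_i is independent of X_j whenever |i - j| >= 2.
    # Hence A_i = {i-1, i, i+1} and B_i = {i-2, ..., i+2} (intersected with {1, ..., n}).
    import numpy as np

    rng = np.random.default_rng(2)
    n, reps = 20, 200_000
    U = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(reps, n + 1))   # E U = 0, E U^2 = 1
    X = (U[:, :-1] + U[:, 1:]) / np.sqrt(4 * n - 2)                # shape (reps, n)

    total = 0.0
    for i in range(n):
        A = [j for j in (i - 1, i, i + 1) if 0 <= j < n]
        B = [j for j in range(i - 2, i + 3) if 0 <= j < n]
        X_A = X[:, A].sum(axis=1)
        X_B = X[:, B].sum(axis=1)
        X_BmA = X[:, [j for j in B if j not in A]].sum(axis=1)
        total += (0.5 * np.mean(np.abs(X[:, i] * X_A**2))
                  + np.mean(np.abs(X[:, i] * X_A * X_BmA))
                  + np.mean(np.abs(X[:, i] * X_A)) * np.mean(np.abs(X_B)))

    print(f"Theorem A bound on d_W(L(W), N(0,1)) for n = {n}: {2 * total:.4f}")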

Application of the theorem

We can treat the case of sums of independent and identically distributed random variables with Theorem A. So assume now that E X_i = 0, \operatorname{Var} X_i = 1 and W=n^{-1/2}\sum X_i. We can take A_i = B_i = \{i\} and we obtain from Theorem A that


(7.1)\quad
d_W(\mathcal{L}(W),N(0,1)) \leq \frac{5 E|X_1|^3}{n^{1/2}}.
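As a numerical illustration (Python with NumPy/SciPy; the summand distribution, n and the sample sizes are illustrative, and the Wasserstein distance is only estimated from finite samples), one can compare the bound (7.1) with a direct sample-based estimate of d_W.

    # Sketch: comparing the bound (7.1) with an empirical estimate of the
    # Wasserstein (Lipschitz) distance between W = n^{-1/2} sum X_i and N(0,1),
    # for X_i i.i.d. centered exponential (E X_i = 0, Var X_i = 1).
    import numpy as np
    from scipy.stats import wasserstein_distance

    rng = np.random.default_rng(3)
    n, m = 30, 200_000
    X = rng.exponential(1.0, size=(m, n)) - 1.0        # E X = 0, Var X = 1
    W = X.sum(axis=1) / np.sqrt(n)

    bound = 5 * np.mean(np.abs(X) ** 3) / np.sqrt(n)   # the bound (7.1); E|X_1|^3 by Monte Carlo
    d_W_est = wasserstein_distance(W, rng.standard_normal(m))   # empirical 1-Wasserstein distance
    print(f"n = {n}:  estimated d_W = {d_W_est:.4f},  bound (7.1) = {bound:.4f}")

For a fixed n the bound is typically far from sharp; its value lies in the explicit n^{-1/2} rate of convergence.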

Connections to other methods

  • Lindeberg's method. Lindeberg (1922) introduced, in a seminal article, a method in which the difference in (1.1) is bounded directly. This method usually also relies heavily on Taylor expansion and thus shows some similarities with Stein's method.
  • Tikhomirov's method. Clearly the approach via (1.1) and (3.1) does not involve characteristic functions. However, Tikhomirov (1980) presented a proof of a central limit theorem based on characteristic functions and a differential operator similar to (2.3). The basic observation is that the characteristic function ψ(t) of the standard normal distribution satisfies the differential equation ψ'(t) + tψ(t) = 0 for all t. Thus, if the characteristic function ψ_W(t) of W is such that \psi'_W(t)+t\psi_W(t)\approx 0 we expect that \psi_W(t)\approx \psi(t) and hence that W is close to the normal distribution (see the sketch after this list). Tikhomirov states in his paper that he was inspired by Stein's seminal paper.
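The following sketch (Python with NumPy; the choice of distributions, of t and of the sample size are illustrative) estimates ψ_W and ψ'_W from samples and shows that ψ'_W(t) + tψ_W(t) is small when W is already close to normal and larger when it is not.

    # Sketch: Tikhomirov's observation. The N(0,1) characteristic function
    # psi(t) = exp(-t^2/2) satisfies psi'(t) + t psi(t) = 0. For a random variable W,
    # psi_W(t) = E exp(itW) and psi_W'(t) = E[iW exp(itW)] can be estimated from samples.
    import numpy as np

    rng = np.random.default_rng(4)
    m, t = 500_000, 1.0
    cases = {
        "standard normal": rng.standard_normal(m),
        "standardized sum of 30 exponentials":
            (rng.exponential(1.0, (m, 30)) - 1.0).sum(axis=1) / np.sqrt(30),
        "single centered exponential": rng.exponential(1.0, m) - 1.0,
    }
    for name, W in cases.items():
        psi = np.mean(np.exp(1j * t * W))
        psi_prime = np.mean(1j * W * np.exp(1j * t * W))
        print(f"{name}:  |psi_W'(t) + t psi_W(t)| at t = {t}:  {abs(psi_prime + t * psi):.4f}")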

Literature

The following text is advanced, and gives a comprehensive overview of the normal case

  • Chen, L.H.Y., Goldstein, L., and Shao, Q.M. (2011). Normal Approximation by Stein's Method. Springer. ISBN 978-3-642-15006-7. 

Another advanced book, but having some introductory character, is

  • Barbour, A.D. and Chen, L.H.Y., eds. (2005). An Introduction to Stein's Method. Lecture Notes Series, Institute for Mathematical Sciences, National University of Singapore, 4. Singapore University Press. ISBN 981256280X. 

A standard reference is the book by Stein,

  • Stein, C. (1986). Approximate computation of expectations. Institute of Mathematical Statistics Lecture Notes, Monograph Series, 7. Hayward, Calif.: Institute of Mathematical Statistics. ISBN 0940600080. 

which contains a lot of interesting material, but may be a little hard to understand at first reading.

Although the method is now several decades old, few standard introductory books about Stein's method are available. The following recent textbook has a chapter (Chapter 2) devoted to introducing Stein's method:

  • Ross, Sheldon and Peköz, Erol (2007). A second course in probability. www.ProbabilityBookstore.com. ISBN 978-0979570407. 

Although the book

  • Barbour, A. D. and Holst, L. and Janson, S. (1992). Poisson approximation. Oxford Studies in Probability. 2. The Clarendon Press Oxford University Press. ISBN 0198522355. 

is largely about Poisson approximation, it nevertheless contains a great deal of information about the generator approach, in particular in the context of Poisson process approximation.

References

  1. ^ a b Stein, C. (1972). "A bound for the error in the normal approximation to the distribution of a sum of dependent random variables". Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability: 583–602. MR402873. Zbl 0278.60026. http://projecteuclid.org/euclid.bsmsp/1200514239. 
  2. ^ Charles Stein: The Invariant, the Direct and the "Pretentious". Interview given in 2003 in Singapore
  3. ^ Chen, L.H.Y. (1975). "Poisson approximation for dependent trials". Annals of Probability 3 (3): 534–545. doi:10.1214/aop/1176996359. JSTOR 2959474. MR428387. Zbl 0335.60016. 

Bibliography

  • Barbour, A. D. (1988). "Stein's method and Poisson process convergence". J. Appl. Probab. (Applied Probability Trust) 25A: 175–184. doi:10.2307/3214155. JSTOR 3214155. 
  • Barbour, A. D. (1990). "Stein's method for diffusion approximations". Probab. Theory Related Fields 84 (3): 297–322. doi:10.1007/BF01197887. 
  • Barbour, A. D. and Brown, T. C. (1992). "Stein's method and point process approximation". Stochastic Process. Appl. 43 (1): 9–31. doi:10.1016/0304-4149(92)90073-Y. 
  • Bolthausen, E. (1984). "An estimate of the remainder in a combinatorial central limit theorem". Z. Wahrsch. Verw. Gebiete 66 (3): 379–386. doi:10.1007/BF00533704. 
  • Ehm, W. (1991). "Binomial approximation to the Poisson binomial distribution". Statist. Probab. Lett. 11 (1): 7–16. doi:10.1016/0167-7152(91)90170-V. 
  • Götze, F. (1991). "On the rate of convergence in the multivariate CLT". Ann. Probab. 19 (2): 724–739. doi:10.1214/aop/1176990448. 
  • Lindeberg, J. W. (1922). "Eine neue Herleitung des Exponentialgesetzes in der Wahrscheinlichkeitsrechnung". Math. Z. 15 (1): 211–225. doi:10.1007/BF01494395. 
  • Luk, H. M. (1994). Stein's method for the gamma distribution and related statistical applications. Dissertation. 
  • Stein, C. (1986). Approximate computation of expectations. Institute of Mathematical Statistics Lecture Notes, Monograph Series, 7. ISBN 0940600080. 
  • Tikhomirov, A. N. (1980). "Convergence rate in the central limit theorem for weakly dependent random variables". Teor. Veroyatnost. I Primenen. 25: 800–818. English translation in Theory Probab. Appl. 25 (1980–81): 790–809. 
