Empirical Bayes method
In statistics, empirical Bayes methods are a class of methods which use empirical data to evaluate or approximate the conditional probability distributions that arise from Bayes' theorem. These methods allow one to estimate quantities (probabilities, averages, etc.) about an individual member of a population by combining information from empirical measurements on the individual and on the entire population.

Empirical Bayes methods involve:
* An "underlying" probability distribution of some unobservable quantity assigned to each member of a statistical population. This quantity is a random variable if a member of the population is chosen at random. The probability distribution of this random variable is not known, and is thought of as a property of the population.
* An observable quantity assigned to each member of the population. When a random sample is taken from the population, it is desired first to estimate the "underlying" probability distribution, and then to estimate the value of the unobservable quantity assigned to each member of the sample.
Introduction
In the Bayesian approach to statistics, we consider the problem of estimating some probability (such as a future outcome or a noisy measurement) based on measurements of our data, a model for these measurements, and some model for our prior beliefs about the system. Let us consider a standard two-stage model, where we write our data measurements as a vector $y = \{y_1, y_2, \dots, y_n\}$, and our prior beliefs as some vector of random unknowns $\theta$. We assume we can model our measurements with a conditional probability distribution (or likelihood) $\rho(y\mid\theta)$, and also the prior as $\rho(\theta\mid\eta)$, where $\eta$ is some hyperparameter. For example, we might choose $\rho(y\mid\theta)$ to be a binomial distribution, and $\rho(\theta\mid\eta)$ to be a Beta distribution (the conjugate prior). Empirical Bayes then employs the complete set of empirical data to make inferences about the prior $\theta$, and then plugs this into the likelihood $\rho(y\mid\theta)$ to make estimates for future outcomes of individual measurements.

To see this in action, use Bayes' theorem to write an expression for the posterior probability of $\theta$:

:$\rho(\theta\mid y) = \frac{\rho(y\mid\theta)\,\rho(\theta\mid\eta)}{\int \rho(y\mid\theta)\,\rho(\theta\mid\eta)\,d\theta}$
and let us define the denominator of the fraction, known as the marginal (or normalizing constant), as:

:$m(y\mid\eta) = \int \rho(y\mid\theta)\,\rho(\theta\mid\eta)\,d\theta$
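As an illustration, the short sketch below evaluates $m(y\mid\eta)$ by numerical integration for the binomial-likelihood, Beta-prior pair mentioned above, and checks the result against the closed-form beta-binomial marginal. The trial count, hyperparameters, and observed count are illustrative values, not quantities from the text.

```python
# A minimal numerical sketch of the marginal m(y | eta) for a binomial
# likelihood with a Beta conjugate prior; all numerical values are
# illustrative assumptions, not quantities from the text.
from scipy import stats, integrate

n, a, b = 10, 2.0, 3.0   # binomial trial count; Beta hyperparameters eta = (a, b)
y = 4                    # an observed count

# m(y | eta) = integral over theta of rho(y | theta) * rho(theta | eta)
marginal, _ = integrate.quad(
    lambda theta: stats.binom.pmf(y, n, theta) * stats.beta.pdf(theta, a, b),
    0.0, 1.0,
)

print(marginal)                          # ~0.1399
print(stats.betabinom.pmf(y, n, a, b))   # closed-form marginal; same value
```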
Empirical Bayes (EB) combines Bayesian and frequentist approaches to estimation. EB approximates the marginal (and/or full posterior) distribution with point estimation (maximum likelihood estimation (MLE), squared error loss (SEL), Monte Carlo/numerical integration, etc.) and then estimates the approximate marginal using the empirical data (frequency probability). EB takes several forms, including non-parametric and parametric forms. We describe a few common examples below.

Point estimation
Robbins' method (1956): non-parametric empirical Bayes (NPEB)
We consider a case of compound sampling, where the probability of each observation, $\rho(y_i\mid\theta_i)$, is specified by a Poisson distribution:

:$\rho(y_i\mid\theta_i) = \frac{\theta_i^{y_i}\, e^{-\theta_i}}{y_i!}$
while the prior is unspecified except that it is also i.i.d. (call it $G(\theta)$). Compound sampling arises in a variety of statistical estimation problems, such as accident rates and clinical trials. We simply seek a point estimate of $\theta_i$. Because the prior is unspecified, we seek a non-parametric estimate of the posterior.

We may write a point Bayes estimate for the prior (assuming squared error loss (SEL)) as (see Carlin and Louis, Sec. 3.2 and Appendix B):

:$E(\theta_i \mid Y = y_i) = \frac{(y_i+1)\, m_G(Y = y_i+1)}{m_G(Y = y_i)}.$

Proof: First, we show that the point estimate for the prior is the estimated mean of the posterior. Write the posterior risk for a point estimate $a$ of $\theta$ under squared error loss as:

:$\rho(G,a) = \int (\theta - a)^2\, g(\theta\mid y)\,d\theta.$
To find the minimum error, set the derivative with respect to $a$ equal to zero:

:$\frac{\partial}{\partial a}\left[\rho(G,a)\right] = \int -2(\theta - a)\, g(\theta\mid y)\,d\theta = 0.$
Solving for $a$ yields

:$a = \int \theta\, g(\theta\mid y)\,d\theta = E(\theta\mid y).$
A quick check shows that the second derivative is greater than zero, indicating a true minimum:

:$\frac{\partial^2}{\partial a^2}\left[\rho(G,a)\right] = 2\int g(\theta\mid y)\,d\theta = 2.$
Note that had we chosen absolute error loss, the point estimate would instead be the median, and with 0-1 error loss, it would be the mode.

Second, we wish to show that the point estimate is a simple ratio of the estimated marginals for the specific case of the Poisson distribution. Write the point estimate for the prior as:

:$E(\theta_i\mid y_i) = \frac{\int (\theta^{y_i+1}\, e^{-\theta} / y_i!)\,dG(\theta)}{\int (\theta^{y_i}\, e^{-\theta} / y_i!)\,dG(\theta)}.$
Multiply the expression by $(y_i+1)/(y_i+1)$ and use the definition of the marginal given above to obtain

:$E(\theta_i\mid y_i) = \frac{(y_i + 1)\, m_G(y_i + 1)}{m_G(y_i)}.$
To take advantage of this, Robbins (1956) suggested estimating the marginals with their empirical frequencies, yielding the fully non-parametric estimate:

:$E(\theta_i\mid y_i) = (y_i + 1)\,\frac{\#\{Y_j = y_i + 1\}}{\#\{Y_j = y_i\}}$
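A minimal sketch of this estimator in Python follows; the function name and list-based interface are illustrative choices rather than anything prescribed by Robbins (1956).

```python
# Robbins' non-parametric empirical Bayes estimate for Poisson count data:
# E(theta_i | y_i) = (y_i + 1) * #{Y_j = y_i + 1} / #{Y_j = y_i}.
from collections import Counter

def robbins_estimate(counts):
    """Return the Robbins point estimate for each observed count."""
    freq = Counter(counts)  # empirical frequency of each count value
    # freq[y] >= 1 for every y in counts, so the denominator is never zero;
    # freq[y + 1] may be zero, in which case the raw estimate is zero.
    return [(y + 1) * freq[y + 1] / freq[y] for y in counts]
```

Note that the raw estimate collapses to zero for any count $y_i$ such that no member of the sample exhibits $y_i + 1$; smoothing the empirical frequencies is one remedy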
(see also Good-Turing frequency estimation).

Example: Accident rates
Suppose each customer of an insurance company has an "accident rate" $\theta$ and is insured against "accidents"; the probability distribution of $\theta$ is the "underlying" distribution, and it is unknown. The number of "accidents" suffered by each customer in a specified baseline time period has a Poisson distribution whose expected value is the particular customer's "accident rate". That number of "accidents" is the observable quantity. A crude way to estimate the underlying probability distribution of the "accident rate" $\theta$ is to estimate the proportion of members of the whole population suffering 0, 1, 2, 3, ... accidents during the specified time period to be equal to the corresponding proportion in the observed random sample. Having done so, it is then desired to estimate the "accident rate" of each customer in the sample. One may use the conditional expected value of the "accident rate" $\theta$ given the observed number $X$ of "accidents" during the baseline period. Thus, if a customer suffers six "accidents" during the baseline period, that customer's estimated "accident rate" is 7 × [the proportion of the sample who suffered 7 "accidents"] / [the proportion of the sample who suffered 6 "accidents"].
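Continuing the sketch above, one can apply robbins_estimate to hypothetical accident data (illustrative numbers only, not from any real sample):

```python
# Hypothetical accident counts for 20 customers (illustrative data only).
accidents = [0]*11 + [1]*5 + [2]*2 + [3]*1 + [6]*1

estimates = robbins_estimate(accidents)

# A customer with 0 accidents is estimated at (0+1) * #{Y=1}/#{Y=0} = 5/11 ~ 0.45;
# the customer with 6 accidents gets 7 * #{Y=7}/#{Y=6} = 0, since nobody in this
# sample suffered 7 accidents -- the sparsity weakness noted above.
print(estimates[0], estimates[-1])
```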
Parametric empirical Bayes
If the likelihood and its prior take on simple parametric forms (such as 1- or 2-dimensional likelihood functions with simple conjugate priors), then the empirical Bayes problem is only to estimate the marginal $m(y\mid\eta)$ and the hyperparameters $\eta$ using the complete set of empirical measurements. For example, one common approach, called parametric empirical Bayes point estimation, is to approximate the marginal using the maximum likelihood estimate (MLE), or a moments expansion, which allows one to express the hyperparameters $\eta$ in terms of the empirical mean and variance. This simplified marginal allows one to plug the empirical averages into a point estimate for the prior $\theta$. The resulting equation for the prior $\theta$ is greatly simplified, as shown below.

There are several common parametric empirical Bayes models, including the Poisson-Gamma model (below), the Beta-binomial model, the Gaussian-Gaussian model, the multinomial-Dirichlet model, as well as specific models for Bayesian linear regression (see below) and Bayesian multivariate linear regression. More advanced approaches include hierarchical Bayesian models and Bayesian mixture models.

Poisson-Gamma model
Continuing the example above, let the likelihood be a Poisson distribution, and let the prior now be specified by the conjugate prior, which is a Gamma distribution $G(\alpha,\beta)$, where $\eta = (\alpha,\beta)$:

:$\rho(\theta\mid\alpha,\beta) = \frac{\theta^{\alpha-1}\, e^{-\theta/\beta}}{\beta^{\alpha}\,\Gamma(\alpha)} \quad \text{for } \theta > 0,\ \alpha > 0,\ \beta > 0$
It is straightforward to show that the posterior is also a Gamma distribution. Write

:$\rho(\theta\mid y) \propto \rho(y\mid\theta)\,\rho(\theta\mid\alpha,\beta)$
where we have omitted the marginal since it does not depend explicitly on $\theta$. Expanding the terms which do depend on $\theta$ gives the posterior as:

:$\rho(\theta\mid y) \propto (\theta^{y}\, e^{-\theta})\,(\theta^{\alpha-1}\, e^{-\theta/\beta}) = \theta^{y+\alpha-1}\, e^{-\theta(1+1/\beta)}$
So we see that the posterior density is also a Gamma distribution $G(\alpha',\beta')$, where $\alpha' = y + \alpha$ and $\beta' = (1 + 1/\beta)^{-1}$. Also notice that the marginal is simply the integral of the likelihood times the prior over all $\theta$, which turns out to be a negative binomial distribution.
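As a numerical check (with illustrative values of $\alpha$, $\beta$, and $y$, which are assumptions rather than values from the text), the sketch below integrates the Poisson likelihood against the Gamma prior and compares the result to the negative binomial probability mass function:

```python
# Numerical check that the Poisson-Gamma marginal is negative binomial,
# using illustrative values for alpha, beta, and y.
from scipy import stats, integrate

alpha, beta, y = 2.0, 3.0, 4

m, _ = integrate.quad(
    lambda t: stats.poisson.pmf(y, t) * stats.gamma.pdf(t, alpha, scale=beta),
    0.0, 60.0,  # an upper limit far into the Gamma tail for these values
)

# scipy's nbinom takes (r, p) with r = alpha and p = 1/(1 + beta) here.
print(m)                                           # ~0.0989
print(stats.nbinom.pmf(y, alpha, 1.0/(1.0+beta)))  # same value
```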
To apply empirical Bayes, we will approximate the marginal using the maximum likelihood estimate (MLE). But since the posterior is a Gamma distribution, the MLE of the marginal turns out to be just the mean of the posterior, which is the point estimate $E(\theta\mid y)$ we need. Recalling that the mean $\mu$ of a Gamma distribution $G(\alpha',\beta')$ is simply $\alpha'\beta'$, we have

:$E(\theta\mid y) = \alpha'\beta' = \frac{\bar{y}+\alpha}{1+1/\beta} = \frac{\beta}{1+\beta}\,\bar{y} + \frac{1}{1+\beta}\,(\alpha\beta)$
To obtain the values of $\alpha$ and $\beta$, empirical Bayes prescribes estimating the mean $\alpha\beta$ and variance $\alpha\beta^2$ using the complete set of empirical data.

The resulting point estimate $E(\theta\mid y)$ is therefore like a weighted average of the sample mean $\bar{y}$ and the prior mean $\mu = \alpha\beta$. This turns out to be a general feature of empirical Bayes: the point estimates for the prior (i.e., the mean) will look like weighted averages of the sample estimate and the prior estimate (and likewise for estimates of the variance).
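Putting the pieces together, the sketch below sets $\alpha$ and $\beta$ by matching the sample mean and variance to $\alpha\beta$ and $\alpha\beta^2$ as prescribed above, then shrinks each observation toward the prior mean. Treating each individual count as the $y$ in the point-estimate formula is an illustrative simplification, and the data are hypothetical.

```python
# Parametric empirical Bayes for the Poisson-Gamma model via moment matching.
import numpy as np

def poisson_gamma_estimates(y):
    """Shrinkage point estimates E(theta_i | y_i) under a moment-matched Gamma prior."""
    y = np.asarray(y, dtype=float)
    mean, var = y.mean(), y.var()  # assumes a nonzero sample mean and variance
    beta = var / mean              # from alpha*beta = mean and alpha*beta**2 = var
    alpha = mean / beta
    weight = beta / (1.0 + beta)
    # E(theta | y_i) = weight * y_i + (1 - weight) * alpha * beta
    return weight * y + (1.0 - weight) * alpha * beta

counts = [0, 1, 1, 2, 4, 0, 3, 1, 0, 2]  # hypothetical count data
print(poisson_gamma_estimates(counts))    # each count shrunk toward the mean 1.4
```

As expected, large observed counts are pulled down toward the prior mean $\alpha\beta$ and small counts are pulled up, with the weight $\beta/(1+\beta)$ determined by how overdispersed the data are.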
See also
* Bayes estimator
* Bayes' theorem
* Bayesian probability
* Best linear unbiased prediction
* Conditional probability
* Monty Hall problem
* Posterior probability
* Bayesian coding hypothesis

References
* Herbert Robbins, "An Empirical Bayes Approach to Statistics", Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, volume 1, pages 157-163, University of California Press, Berkeley, 1956.
* Bradley P. Carlin and Thomas A. Louis, "Bayes and Empirical Bayes Methods for Data Analysis", Chapman & Hall/CRC, second edition, 2000.
* Peter E. Rossi, Greg M. Allenby, and Robert McCulloch, "Bayesian Statistics and Marketing", John Wiley & Sons, Ltd, 2006.
* George Casella, "An Introduction to Empirical Bayes Data Analysis", American Statistician, Vol. 39, No. 2 (May 1985), pp. 83-87.
External links
* [http://ca.geocities.com/hauer@rogers.com/Pubs/TRBpaper.pdf Use of Empirical Bayes Method in estimating road safety (North America)]
* [http://www.math.uu.se/research/pub/Brandel.pdf Empirical Bayes Methods for missing data analysis]
* [http://it.stlawu.edu/~msch/biometrics/papers.htm Using the Beta-Binomial distribution to assess performance of a biometric identification device]
* [http://www.biomedcentral.com/1471-2105/7/514/abstract/ Hierarchical Naive Bayes Classifiers] (for continuous and [http://labmedinfo.org/download/lmi339.pdf discrete] variables).