Metropolis–Hastings algorithm

[Figure: The proposal distribution Q proposes the next point to which the random walk might move.]

In mathematics and physics, the Metropolis–Hastings algorithm is a Markov chain Monte Carlo method for obtaining a sequence of random samples from a probability distribution for which direct sampling is difficult. This sequence can be used to approximate the distribution (i.e., to generate a histogram), or to compute an integral (such as an expected value).

History

The algorithm is named after Nicholas Metropolis, who was an author, along with Arianna W. Rosenbluth, Marshall N. Rosenbluth, Augusta H. Teller, and Edward Teller, of the 1953 paper "Equation of State Calculations by Fast Computing Machines", which first proposed the algorithm for the specific case of the Boltzmann distribution,[1] and after W. Keith Hastings,[2] who extended it to the more general case in 1970.[3] There is some controversy over credit for the discovery of the algorithm. Edward Teller states in his memoirs that the five authors of the 1953 paper worked together for "days (and nights)".[4] M. Rosenbluth, in an oral history recorded shortly before his death,[5] credits E. Teller with posing the original problem, himself with solving it, and A. W. Rosenbluth (his wife) with programming the computer. According to M. Rosenbluth, neither Metropolis nor A. H. Teller participated in any way. Rosenbluth's account is supported by other contemporary recollections.[6]

The Gibbs sampling algorithm is a special case of the Metropolis–Hastings algorithm that is usually faster and easier to use but less generally applicable.

Overview

The Metropolis–Hastings algorithm can draw samples from any probability distribution P(x), requiring only that a function proportional to its density can be computed. In Bayesian applications the normalization factor is often extremely difficult to compute, so the ability to generate samples without knowing this constant of proportionality is a major virtue of the algorithm.
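
For instance, in Bayesian inference the posterior density of a parameter \theta given data D is known pointwise only up to its normalizing constant,

P(\theta \mid D) \;\propto\; P(D \mid \theta)\, P(\theta),

since the evidence \int P(D \mid \theta)\, P(\theta)\, d\theta is typically intractable; Metropolis–Hastings needs only the right-hand side.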

The general idea of the algorithm is to construct a Markov chain that, after sufficiently many steps, generates states distributed according to P(x). To this end, the Markov chain must fulfill two conditions: ergodicity, and the balance (stationarity) condition, which is usually obtained by imposing the stronger condition of detailed balance. Detailed balance ensures that P(x) is a stationary distribution of the chain, so at least one asymptotic distribution exists; ergodicity ensures that at most one exists, so together they guarantee convergence to P(x).
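
Explicitly, writing T(x \to x') for the chain's transition density (notation introduced here only for illustration), detailed balance reads

P(x)\, T(x \to x') = P(x')\, T(x' \to x) \quad \text{for all } x, x'.

Summing both sides over x shows that P is then a stationary distribution of the chain.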

A Markov chain generates each new state x_{t+1} based only on the current state x_t. The algorithm uses a proposal density Q(x'; x_t), which depends on the current state x_t, to generate a new proposed sample x'. The proposal is "accepted" as the next value (x_{t+1} = x') if a value \alpha drawn from the uniform distribution U(0,1) satisfies

\alpha < \frac{P(x')\, Q(x_t; x')}{P(x_t)\, Q(x'; x_t)}

If the proposal is not accepted, the current value of x is retained: x_{t+1} = x_t.
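A single transition of this scheme can be written compactly in Python; the following is an illustrative sketch, in which p, q, and sample_q are hypothetical user-supplied functions, not part of any fixed API:

    import random

    def mh_step(x_t, p, q, sample_q):
        # p(x): function proportional to the target density P(x)
        # q(a, b): proposal density Q(a; b) for proposing a from state b
        # sample_q(b): draws a proposal from Q(.; b)
        x_prop = sample_q(x_t)
        # The unknown normalization of P cancels in this ratio.
        ratio = (p(x_prop) * q(x_t, x_prop)) / (p(x_t) * q(x_prop, x_t))
        if random.random() < ratio:
            return x_prop  # accept: move to the proposed state
        return x_t         # reject: keep the current state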

For example, the proposal density could be a Gaussian centered on the current state x_t:

Q(x'; x_t) \sim N(x_t, \sigma^2 I)

reading Q(x'; x_t) as the probability density for proposing x' given the previous value x_t. This proposal density generates samples centered on the current state with covariance \sigma^2 I. The original Metropolis algorithm requires the proposal density to be symmetric, Q(x; y) = Q(y; x); the generalization by Hastings lifts this restriction. It is also permissible for Q(x'; x_t) not to depend on x_t at all, in which case the algorithm is called "independence chain Metropolis–Hastings" (as opposed to "random walk Metropolis–Hastings"). An independence chain with a suitable proposal density can reduce serial correlation and achieve higher efficiency than the random walk version, but it requires some a priori knowledge of the target distribution.
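Returning to the Gaussian example, such a random walk proposal can be generated as in the following sketch (assuming NumPy; sigma is a tuning parameter discussed later):

    import numpy as np

    rng = np.random.default_rng()

    def gaussian_proposal(x_t, sigma=1.0):
        # Propose x' ~ N(x_t, sigma^2 I): a Gaussian step around the current state.
        x_t = np.asarray(x_t, dtype=float)
        return x_t + sigma * rng.standard_normal(size=x_t.shape)

Because this proposal is symmetric, Q(x; y) = Q(y; x), the proposal-density ratio in the acceptance test equals 1 and can be dropped.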

Step-by-step instructions

Suppose that the most recent value sampled is x_t. To follow the Metropolis–Hastings algorithm, we next draw a new proposal state x' with probability density Q(x'; x_t) and calculate a value

a = a_1 a_2

where


a_1 = \frac{P(x')}{P(x_t)}

is the likelihood ratio between the proposed sample x' and the previous sample x_t, and

a_2 = \frac{Q(x_t; x')}{Q(x'; x_t)}

is the ratio of the proposal density in the two directions (from x_t to x' and conversely). This ratio equals 1 if the proposal density is symmetric. The new state x_{t+1} is then chosen according to the following rules.


x_{t+1} =
\begin{cases}
x' & \text{if } a \geq 1, \\
x' & \text{with probability } a, \text{ if } a < 1, \\
x_t & \text{with probability } 1 - a, \text{ if } a < 1.
\end{cases}

The Markov chain is started from an arbitrary initial value x_0, and the algorithm is run for many iterations until this initial state is "forgotten". The samples from this initial period, known as the burn-in, are discarded. The remaining accepted values of x represent a sample from the distribution P(x).
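
Putting these steps together, a complete random walk Metropolis–Hastings sampler might look like the following Python sketch. It works with the logarithm of the unnormalized target density to avoid numerical underflow; log_p, sigma, n_samples, and burn_in are illustrative names and defaults, not prescribed by the algorithm:

    import numpy as np

    def metropolis_hastings(log_p, x0, sigma=1.0, n_samples=10000, burn_in=1000):
        # log_p(x): log of a function proportional to the target density P(x)
        # x0: arbitrary starting state; sigma: Gaussian proposal step size
        rng = np.random.default_rng()
        x = np.asarray(x0, dtype=float)
        samples = []
        for t in range(burn_in + n_samples):
            # Symmetric Gaussian proposal, so a_2 = 1 and a = a_1.
            x_prop = x + sigma * rng.standard_normal(size=x.shape)
            log_a = log_p(x_prop) - log_p(x)  # log of a_1 = P(x')/P(x_t)
            # Capping the exponent at 0 accepts unconditionally when a >= 1.
            if rng.uniform() < np.exp(min(0.0, log_a)):
                x = x_prop            # accept the proposal
            if t >= burn_in:          # discard burn-in iterations
                samples.append(x.copy())
        return np.array(samples)

    # Example: sample from an unnormalized standard normal density.
    chain = metropolis_hastings(lambda x: -0.5 * np.sum(x**2), x0=[0.0])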

[Figure: The result of three Markov chains running on the 3D Rosenbrock function using the Metropolis–Hastings algorithm. The algorithm samples from regions where the posterior probability is high, and the chains begin to mix in these regions. The approximate position of the maximum has been illuminated; the red points are those that remain after the burn-in process, the earlier ones having been discarded.]

The algorithm works best if the proposal density matches the shape of the target distribution P(x), from which direct sampling is difficult, that is, Q(x'; x_t) \approx P(x'). If a Gaussian proposal density Q is used, the variance parameter \sigma^2 has to be tuned during the burn-in period. This is usually done by calculating the acceptance rate, the fraction of proposed samples accepted within a window of recent samples. The desired acceptance rate depends on the target distribution; however, it has been shown theoretically that the ideal acceptance rate for a one-dimensional Gaussian target distribution is approximately 44%, decreasing to approximately 23% for an N-dimensional Gaussian target distribution.[7] If \sigma^2 is too small, the chain will mix slowly: the acceptance rate will be high, but successive samples will move around the space slowly, and the chain will converge to P(x) only slowly. On the other hand, if \sigma^2 is too large, the acceptance rate will be very low because the proposals are likely to land in regions of much lower probability density; a_1 will then be very small, and again the chain will converge very slowly.
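
For example, a simple adaptation heuristic for the burn-in period (a sketch, not taken from the cited reference) multiplicatively adjusts sigma toward a target acceptance rate:

    def tune_sigma(sigma, n_accepted, window, target=0.234):
        # Adjust the proposal scale from the acceptance rate observed
        # over the last `window` proposals. Used during burn-in only.
        rate = n_accepted / window
        if rate > target:
            return sigma * 1.1  # accepting too often: take bolder steps
        return sigma * 0.9      # accepting too rarely: take smaller steps

Such adaptation should be frozen once burn-in ends; continuing to modify the proposal from the chain's own history would, in general, break the Markov property on which the convergence guarantees rest.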

Notes

  1. ^ Metropolis, N.; Rosenbluth, A.W.; Rosenbluth, M.N.; Teller, A.H.; Teller, E. (1953). "Equation of State Calculations by Fast Computing Machines". Journal of Chemical Physics 21 (6): 1087–1092. Bibcode 1953JChPh..21.1087M. doi:10.1063/1.1699114.
  2. ^ Rosenthal, Jeffrey (March 2004). "W.K. Hastings, Statistician and Developer of the Metropolis-Hastings Algorithm". http://probability.ca/hastings/. Retrieved 2009-06-02. 
  3. ^ Hastings, W.K. (1970). "Monte Carlo Sampling Methods Using Markov Chains and Their Applications". Biometrika 57 (1): 97–109. doi:10.1093/biomet/57.1.97. JSTOR 2334940. Zbl 0219.65008. 
  4. ^ Teller, Edward (2001). Memoirs: A Twentieth-Century Journey in Science and Politics. Perseus Publishing. p. 328.
  5. ^ Rosenbluth, Marshall. "Oral History Transcript". American Institute of Physics.
  6. ^ J.E. Gubernatis (2005). "Marshall Rosenbluth and the Metropolis Algorithm". Physics of Plasmas 12: 057303. Bibcode 2005PhPl...12e7303G. doi:10.1063/1.1887186. 
  7. ^ Roberts, G.O.; Gelman, A.; Gilks, W.R. (1997). "Weak convergence and optimal scaling of random walk Metropolis algorithms". Ann. Appl. Probab. 7 (1): 110–120. doi:10.1214/aoap/1034625254. 

