- Forward-backward algorithm
In
computer science , the forward-backward algorithm, adynamic programming algorithm for computing theprobability of a particular observation sequence, given the parameters of the model, operates in the context ofhidden Markov model s.Overview
The algorithm comprises three steps:
# computing forward probabilities
# computing backward probabilities
# computing smoothed valuesThe forward and backward steps are often called "forward message pass" and "backward message pass". The wording originates from the way the algorithm processes the given observation sequence. First the algorithm moves forward starting with the first observation in the sequence and going to the last, and then returning back to the first. At each single observation in the sequence probabilities to be used for calculations at the next observation are computed. During the backward pass the algorithm simultaneously performs the smoothing step. This step allows the algorithm to take into account any past observations of output for computing more accurate results.
Formal description
The following description takes as its base matrices of probability values rather than probability distributions. We transform the probability distributions related to a given
hidden Markov model into matrix notation as follows.The transition probabilities of a given random variable representing all possible states in thehidden Markov model will be represented by the matrix where the row index, i, will represent the start state and the column index, j, represents the target state. For example, in the example below would be defined as:Similarly, we will represent the probabilities of a new state, given some observed states as evidence, in an observation matrix where each diagonal entry contains the probability of a new state given the particular observed state at t as evidence. Note that t indicates a particular observation in the observation sequence. All other entries in the matrix will be zero. For example, in the example below the first observed evidence (t=1) is "umbrella". Therefore would be defined as:
This allows a very elegant description of how to compute the next forward probability. Let the set of forward probabilities be stored in yet another matrix . Where 1:t+1 indicates that the computed probabilities are depending on all forward probabilities from 1 to t+1 including the current one which we will describe with . Hence, is equal to the multiplication of the transposed transformation matrix with the current forward probabilities and the observation matrix for the next evidence observation. Finally the resulting matrix requires normalization, i.e. the resulting values in the matrix are divided by the sum of all values in the matrix. The normalization factor is described with . Therefore the resulting forward computation is described by the following general formula:
We can describe the backward computation that starts at the end of the sequence in a similar manner. Let the end of the sequence be described by the index k, starting at 0. Therefore running from k down to t=0 and calculating each backward probability can be described by the following formula:
Note that we use the non-transposed matrix of and that the order of the terms has changed. Also note that as a final product we have not a usual matrix multiplication, but a point product. This operation multiplies each value in one matrix with the corresponding value of the other. Finally note that the description in Russel & Norvig 2003 pp. 550 excludes the point product, thought the procedure is required earlier.
The third and final step involves the computation of smoothed probabilities . These are the point product of the backward and forward probabilities for each t. Therefore the formular is defined as:
The example in the following section will show numerically the calculation of these matrices.
Example
This example takes as its basis the umbrella world in Russel & Norvig 2003 pp. 540. The example uses a longer observation sequence {umbrella, umbrella, no umbrella, umbrella, umbrella} than the example described in the book. Assume the probabilities for rain given an umbrella observation are 90% if an umbrella is observed and 20% if no umbrella is observed. Further more assume that the probability of rain in t+1 given that it is raining in now (i.e. t) is 70% and the probability that it does not rain is 30%. The following matrices can be extracted from the umbrella world given the above observations:
Note that differs from the others because of the "no umbrella" observation. Also note that and are equal because they are symmetrical. Before we can start to compute forward messages we need to describe two special values, the first forward probability and the k+2 backward probability. The first forward probability at t=0 is defined by the prior of the random variable. The k+2 backward probability is defined by the "true" matrix. Therefore it follows:
Now we will first iterate through all t and compute the forward probabilities. The following matrices appear in the same order as described in the forward formula above. Some calculations may be less accurate due to the limited numbers of decimals used in this example.
Now that we have defined the forward probabilities, we continue to compute the backward probabilities. Again the matrices appear in the order as in the backward formula above.
Finally we will compute the smoothed probability values. The ordering of the matrices follows the smoothed values formula above.
Performance
The brute-force procedure for the solution of this problem is the generation of all possible sequences of observed events and hidden states with their probabilities using the two transition matrices. The joint probability of two sequences, given the model, is calculated by multiplying the corresponding probabilities. The brute force procedure has
time complexity , where is the length of sequences and is the number of symbols in the state alphabet. This is intractable for realistic problems, as the number of possible hidden node sequences typically is extremely high. However, the forward-backward algorithm has time complexity .Several enhancements are known to the general forward-backward algorithm which allow for computations to take place in constant space. In addition, as t grows, algorithms have been developed to compute efficiently through online smoothing such as the fixed-lag smoothing (FLS) algorithm Russel & Norvig 2003 pp. 552.Pseudocode
ForwardBackward(guessState, sequenceIndex): if sequenceIndex is past the end of the sequence, return 1 if (guessState, sequenceIndex) has been seen before, return saved result result = 0 for each neighboring state n: result = result + (transition probability from guessState to n given observation element at sequenceIndex)*ForwardBackward(n, sequenceIndex+1) save result for (guessState, sequenceIndex) return resultee also
*
Baum-Welch algorithm
*Hidden Markov model
*Viterbi algorithm References
* Lawrence R. Rabiner, [http://www.caip.rutgers.edu/~lrr/Reprints/tutorial%20on%20hmm%20and%20applications.pdf A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition] . "Proceedings of the
IEEE ", 77 (2), p. 257–286, February 1989.
*cite journal |author=Rabiner L. R., Juang B. H.|title=An introduction to hidden Markov models|journal=IEEE ASSP Magazine |month=January |pages=4–15 |year=1986
*cite book | author = Charniak Eugene|title = Statistical Language Learning|publisher = MIT Press|address = Cambridge, Massachusetts|year = 1993|id=ISDN 978-0-262-53141-2
*cite book | author = Russel S., Norvig P.|title = Artificial Intelligence A Modern Approach 2nd Edition|publisher = Pearson Education|address = Upper Saddle River, New Jersey|year = 2003|id=ISBN 0-13-080302-2External links
* [http://www.cs.jhu.edu/~jason/papers/#tnlp02 An Interactive Spreadsheet for Teaching the Forward-Backward Algorithm] (spreadsheet and article with step-by-step walk-through)
* [http://www.cs.brown.edu/research/ai/dynamics/tutorial/Documents/HiddenMarkovModels.html Tutorial of Hidden Markov Models including the forward-backward algorithm]
* [http://code.google.com/p/aima-java/ Collection of AI algorithms implemented in Java] (including HMM and the forward-backward algorithm)
Wikimedia Foundation. 2010.