RSS
热门关键字:  数据挖掘  数据仓库  商业智能  搜索引擎  人工智能

Forward Algorithm

来源: 作者:unkonwn 时间:2004-11-29 点击:

Finding the probability of an observed sequence

1. Exhaustive search for solution

We want to find the probability of an observed sequence given an HMM - that is, the parameters (P,A,B) are known. Consider the weather example; we have a HMM describing the weather and its relation to the state of the seaweed, and we also have a sequence of seaweed observations. Suppose the observations for 3 consecutive days are (dry,damp,soggy) - on each of these days, the weather may have been sunny, cloudy or rainy. We can picture the observations and the possible hidden states as a trellis.

 

Each column in the trellis shows the possible state of the weather and each state in one column is connected to each state in the adjacent columns. Each of these state transitions has a probability provided by the state transition matrix. Under each column is the observation at that time; the probability of this observation given any one of the above states is provided by the confusion matrix.

It can be seen that one method of calculating the probability of the observed sequence would be to find each possible sequence of the hidden states, and sum these probabilities. For the above example, there would be 3^3=27 possible different weather sequences, and so the probability is 数据挖掘研究院

Pr(dry,damp,soggy | HMM) = Pr(dry,damp,soggy | sunny,sunny,sunny) + Pr(dry,damp,soggy | sunny,sunny ,cloudy) + Pr(dry,damp,soggy | sunny,sunny ,rainy) + . . . . Pr(dry,damp,soggy | rainy,rainy ,rainy)

Calculating the probability in this manner is computationally expensive, particularly with large models or long sequences, and we find that we can use the time invariance of the probabilities to reduce the complexity of the problem.

2. Reduction of complexity using recursion

We will consider calculating the probability of observing a sequence recursively given a HMM. We will first define a partial probability, which is the probability of reaching an intermediate state in the trellis. We then show how these partial probabilities are calculated at times t=1 and t=n (> 1).

Suppose throughout that the T-long observed sequence is

数据挖掘研究院

  数据挖掘研究院

Y{k{1}}, Y{k{2}}, .... , Y{k{T}}

2a. Partial probabilities, (a′s)

Consider the trellis below showing the states and first-order transitions for the observation sequence dry,damp,soggy; 数据挖掘研究院

  数据挖掘研究院

We can calculate the probability of reaching an intermediate state in the trellis as the sum of all possible paths to that state.

For example, the probability of it being cloudy at t = 2 is calculated from the paths; 数据挖掘研究院

  数据挖掘实验室

We denote the partial probability of state j at time t as at ( j ) - this partial probability is calculated as;

at ( j )= Pr( observation | hidden state is j ) x Pr(all paths to state j at time t)

The partial probabilities for the final observation hold the probability of reaching those states going through all possible paths - e.g., for the above trellis, the final partial probabilities are calculated from the paths :

  数据挖掘研究院

It follows that the sum of these final partial probabilities is the sum of all possible paths through the trellis, and hence is the probability of observing the sequence given the HMM.

Section 3 introduces an animated example of the calculation of the probabilities.

2b. Calculating a′s at time t = 1
We calculate partial probabilities as :

a t ( j )= Pr( observation | hidden state is j ) x Pr(all paths to state j at time t) 数据挖掘实验室

In the special case where t = 1, there are no paths to the state. The probability of being in a state at t = 1 is therefore the initial probability, i.e. Pr( state | t = 1) = P(state), and we therefore calculate partial probabilities at t = 1 as this probability multiplied by the associated observation probability; 数据挖掘研究院

  数据挖掘研究院

formula
Thus the probability of being in state j at intialisation is dependent on that state′s probability together with the probability of observing what we see at that time.
2c. Calculating a′s at time, t (> 1)
We recall that a partial probability is calculated as :

at ( j )= Pr( observation | hidden state is j ) x Pr(all paths to state j at time t)

数据挖掘研究院

We can assume (recursively) that the first term of the product is available, and now consider the term Pr(all paths to state j at time t). 数据挖掘研究院

To calculate the probability of getting to a state through all paths, we can calculate the probability of each path to that state and sum them - for example,

数据挖掘研究院

The number of paths needed to calculate a increases exponentially as the length of the observation sequence increases but the a′s at time t-1 give the probability of reaching that state through all previous paths, and we can therefore define a′s at time t in terms of those at time t-1 -i.e.,

  数据挖掘研究院

[Formula]

Thus we calculate the probabilities as the product of the appropriate observation probability (that is, that state j provoked what is actually seen at time t+1) with the sum of probabilities of reaching that state at that time - this latter comes from the transition probabilities together with a from the preceding stage.

Notice that we have an expression to calculate a at time t+1 using only the partial probabilities at time t.

We can now calculate the probability of an observation sequence given a HMM recursively - i.e. we use a′s at t=1 to calculate a′s at t=2; a′s at t=2 to calculate a′s at t=3; and so on until t = T. The probability of the sequence given the HMM is then the sum of the partial probabilities at time t = T

数据挖掘研究院

2d. Reduction of computational complexity
We can compare the computational complexity of calculating the probability of an observation sequence by exhaustive evaluation and by the recursive forward algorithm.

We have a sequence of T observations, O. We also have a Hidden Markov Model, l=(P,A,B), with n hidden states. 数据挖掘研究院

An exhaustive evaluation would involve computing for all possible execution sequences

  数据挖掘研究院

[Formula]

the quantity 数据挖掘研究院

  数据挖掘研究院

[Formula]

which sums the probability of observing what we do - note that the load here is exponential in T. Conversely, using the forward algorithm we can exploit knowledge of the previous time step to compute information about a new one - accordingly, the load will only be linear in T.

数据挖掘研究院

3. Summary

Our aim is to find the probability of a sequence of observations given a HMM - (Pr (observations | l).

We reduce the complexity of calculating this probability by first calculating partial probabilities (a′s). These represent the probability of getting to a particular state, s, at time t.

数据挖掘研究院

We then see that at time t = 1, the partial probabilities are calculated using the initial probabilities (from the P vector) and Pr(observation | state) (from the confusion matrix); also, the partial probabilities at time t (> 1) can be calculated using the partial probabilities at time t-1.

数据挖掘实验室

This definition of the problem is recursive, and the probability of the observation sequence is found by calculating the partial probabilities at time t = 1, 2, ..., T, and adding all a′s at t = T. 数据挖掘研究院

Notice that computing the probability in this way is far less expensive than calculating the probabilities for all sequences and adding them. 数据挖掘研究院

最新评论共有 0 位网友发表了评论
发表评论
评论内容:不能超过250字,需审核,请自觉遵守互联网相关政策法规。
匿名?