Finding most probable sequence of hidden states
We often wish to take a particular HMM, and determine from an observation sequence the most likely sequence of underlying hidden states that might have generated it.
1. Exhaustive search for a solution
We can use a picture of the execution trellis to visualise the relationship between states and observations.
数据挖掘实验室
We can find the most probable sequence of hidden states by listing all possible sequences of hidden states and finding the probability of the observed sequence for each of the combinations. The most probable sequence of hidden states is that combination that maximises
Pr(observed sequence | hidden state combination). 数据挖掘实验室
For example, for the observation sequence in the trellis shown, the most probable sequence of hidden states is the sequence that maximises :
数据挖掘工具
Pr(dry,damp,soggy | sunny,sunny,sunny), Pr(dry,damp,soggy | sunny,sunny,cloudy), Pr(dry,damp,soggy | sunny,sunny,rainy), . . . . Pr(dry,damp,soggy | rainy,rainy,rainy) 数据挖掘论坛
This approach is viable, but to find the most probable sequence by exhaustively calculating each combination is computationally expensive. As with the forward algorithm, we can use the time invariance of the probabilities to reduce the complexity of the calculation. 数据挖掘研究院
2. Reducing complexity using recursion
We will consider recursively finding the most probable sequence of hidden states given an observation sequence and a HMM. We will first define the partial probability
, which is the probability of reaching a particular intermediate state in the trellis. We then show how these partial probabilities are calculated at t=1 and at t=n (> 1). 数据挖掘工具
These partial probabilities differ from those calculated in the forward algorithm since they represent the probability of the most probable path to a state at time t, and not a total.
数据挖掘论坛
2a. Partial probabilities (
′s) and partial best paths
Consider the trellis below showing the states and first order transitions for the observation sequence dry,damp,soggy; 数据挖掘实验室
For each intermediate and terminating state in the trellis there is a most probable path to that state. So, for example, each of the three states at t = 3 will have a most probable path to it, perhaps like this;
数据挖掘工具
We will call these paths partial best paths. Each of these partial best paths has an associated probability, the partial probability or
. Unlike the partial probabilities in the forward algorithm,
is the probablity of the one (most probable) path to the state.
Thus
(i,t) is the maximum probability of all sequences ending at state i at time t, and the partial best path is the sequence which achieves this maximal probability. Such a probability (and partial path) exists for each possible value of i and t. 数据挖掘研究院
In particular, each state at time t = T will have a partial probability and a partial best path. We find the overall best path by choosing the state with the maximum partial probability and choosing its partial best path.
2b. Calculating
′s at time t = 1
We calculate the
partial probabilities as the most probable route to our current position (given particular knowledge such as observation and probabilities of the previous state). When t = 1 the most probable path to a state does not sensibly exist; however we use the probability of being in that state given t = 1 and the observable state k1 ; i.e.
数据挖掘论坛
数据挖掘实验室
- as in the forward algorithm, this quantity is compounded by the appropriate observation probability.
2c. Calculating
′s at time t ( > 1 )
We now show that the partial probabilities
at time t can be calculated in terms of the
′s at time t-1.
Consider the trellis below :
数据挖掘工具
![[Picture]](http://www.comp.leeds.ac.uk/roger/HiddenMarkovModels/html_dev/viterbi_algorithm/graphics/abcxtrellis.gif)
We consider calculating the most probable path to the state X at time t; this path to X will have to pass through one of the states A, B or C at time (t-1). 数据挖掘实验室
Therefore the most probable path to X will be one of
| |
(sequence of states), . . ., A, X |
| |
(sequence of states), . . ., B, X |
| or |
(sequence of states), . . ., C, X |
数据挖掘论坛
We want to find the path ending AX, BX or CX which has the maximum probability. 数据挖掘工具
Recall that the Markov assumption says that the probability of a state occurring given a previous state sequence depends only on the previous n states. In particular, with a first order Markov assumption, the probability of X occurring after a sequence depends only on the previous state, i.e.
Pr (most probable path to A) . Pr (X | A) . Pr (observation | X) Following this, the most probable path ending AX will be the most probable path to A followed by X. Similarly, the probability of this path will be
Pr (most probable path to A) . Pr (X | A) . Pr (observation | X) So, the probability of the most probable path to X is :
where the first term is given by
at t-1, the second by the transition probabilities and the third by the observation probabilities.
数据挖掘交友
Generalising the above expression, the probability of the partial best path to a state i at time t when the observation kt is seen, is :
![[Formula]](http://www.comp.leeds.ac.uk/roger/HiddenMarkovModels/html_dev/viterbi_algorithm/graphics/6.1.2.3_b.gif)
Here, we are assuming knowledge of the previous state, using the transition probabilites and multiplying by the appropriate observation probability. We then select the maximum such. 数据挖掘研究院
2d. Back pointers,
′s
Consider the trellis
![[Trellis]](http://www.comp.leeds.ac.uk/roger/HiddenMarkovModels/html_dev/viterbi_algorithm/graphics/trellis.1.gif)
At each intermediate and end state we know the partial probability,
(i,t). However the aim is to find the most probable sequence of states through the trellis given an observation sequence - therefore we need some way of remembering the partial best paths through the trellis. 数据挖掘工具
Recall that to calculate the partial probability,
at time t we only need the
′s for time t-1. Having calculated this partial probability, it is thus possible to record which preceding state was the one to generate
(i,t) - that is, in what state the system must have been at time t-1 if it is to arrive optimally at state i at time t. This recording (remembering) is done by holding for each state a back pointer
which points to the predecessor that optimally provokes the current state.
Formally, we can write
数据挖掘研究院
Here, the argmax operator selects the index j which maximises the bracketed expression. 数据挖掘工具
Notice that this expression is calculated from the
′s of the preceding time step and the transition probabilites, and does not include the obervation probability (unlike the calculation of the
′s themselves). This is because we want these
′s to answer the question `If I am here, by what route is it most likely I arrived?′ - this question relates to the hidden states, and therefore confusing factors due to the observations can be overlooked.
2e. Advantages of the approach
Using the Viterbi algorithm to decode an observation sequence carries two important advantages:
数据挖掘论坛
- There is a reduction in computational complexity by using the recursion - this argument is exactly analogous to that used in justifying the forward algorithm.
- The Viterbi algorithm has the very useful property of providing the best interpretation given the entire context of the observations. An alternative to it might be, for example, to decide on the execution sequence
数据挖掘实验室
where 数据挖掘研究院
数据挖掘论坛
Here, decisions are taken about a likely interpretation in a `left-to-right′ manner, with an interpretaion being guessed given an interpretation of the preceding stage (with initialisation from the
vector).
数据挖掘论坛
- continued...
This approach, in the event of a noise garble half way through the sequence, will wander away from the correct answer.
Conversely, the Viterbi algorithm will look at the whole sequence before deciding on the most likely final state, and then `backtracking′ through the
pointers to indicate how it might have arisen. This is very useful in `reading through′ isolated noise garbles, which are very common in live data. 数据挖掘论坛
3. Section Summary
The Viterbi algorithm provides a computationally efficient way of analysing observations of HMMs to recapture the most likely underlying state sequence. It exploits recursion to reduce comuputational load, and uses the context of the entire sequence to make judgements, thereby allowing good analysis of noise. 数据挖掘研究院
In use, the algorithm proceeds through an execution trellis calculating a partial probability for each cell, together with a back-pointer indicating how that cell could most probably be reached. On completion, the most likely final state is taken as correct, and the path to it traced back to t=1 via the back pointers.
资料全文下载