Applications of Hidden Markov Chains

Inference

Several inference problems are associated with hidden Markov models, as outlined below.


Probability of an observed sequence

The state transition and output probabilities of an HMM are indicated by the line opacity in the upper part of the diagram. Given that we have observed the output sequence in the lower part of the diagram, we may be interested in the most likely sequence of states that could have produced it. Based on the arrows that are present in the diagram, the following state sequences are candidates:

5 3 2 5 3 2
4 3 2 5 3 2
3 1 2 5 3 2

We can find the most likely sequence by evaluating the joint probability of both the state sequence and the observations for each case (simply by multiplying the probability values, which here correspond to the opacities of the arrows involved). In general, this type of problem (i.e. finding the most likely explanation for an observation sequence) can be solved efficiently using the Viterbi algorithm.
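To make the "multiply the probabilities along the path" step concrete, here is a minimal sketch in Python of evaluating the joint probability of one candidate state sequence and the observations. The model parameters (pi, A, B) and the sequences are made-up placeholders for illustration, not the values from the diagram.

```python
import numpy as np

# Hypothetical 2-state, 2-symbol HMM; all numbers are made up for illustration.
pi = np.array([0.6, 0.4])            # initial state distribution
A = np.array([[0.7, 0.3],            # A[i, j] = P(state j at t+1 | state i at t)
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],            # B[i, k] = P(observation k | state i)
              [0.2, 0.8]])

def joint_probability(states, obs):
    """P(X, Y): multiply the probabilities along one candidate path."""
    p = pi[states[0]] * B[states[0], obs[0]]
    for t in range(1, len(obs)):
        p *= A[states[t - 1], states[t]] * B[states[t], obs[t]]
    return p

# Compare two candidate explanations of the same observation sequence.
obs = [0, 1, 0]
print(joint_probability([0, 0, 1], obs))   # 0.6*0.9 * 0.7*0.1 * 0.3*0.2
print(joint_probability([0, 1, 1], obs))
```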

The task is to compute, given the parameters of the model, the probability of a particular output sequence. This requires summation over all possible state sequences:

The probability of observing a sequence Y = y(0), y(1), \dots, y(L-1) of length L is given by

P(Y) = \sum_{X} P(Y \mid X) P(X),

where the sum runs over all possible hidden-node sequences X = x(0), x(1), \dots, x(L-1).

Applying the principle of dynamic programming, this problem, too, can be handled efficiently using the forward algorithm.
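A sketch of how that dynamic-programming recursion can look in practice, assuming the model is given as an initial distribution pi, a transition matrix A, and an emission matrix B (names chosen here for illustration; a real implementation would work in log space or rescale to avoid underflow on long sequences):

```python
import numpy as np

def forward_probability(pi, A, B, obs):
    """Forward algorithm: P(Y), summing over all hidden-state sequences
    in O(L * N^2) time rather than enumerating all N^L paths."""
    alpha = pi * B[:, obs[0]]             # alpha_0(i) = pi_i * P(y(0) | x(0) = i)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]     # propagate one step, weight by emission
    return alpha.sum()                    # P(Y) = sum_i alpha_{L-1}(i)
```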


Probability of the latent variables

A number of related tasks ask about the probability of one or more of the latent variables, given the model's parameters and a sequence of observations y(1),\dots ,y(t).


Filtering

The task is to compute, given the model's parameters and a sequence of observations, the distribution over hidden states of the last latent variable at the end of the sequence, i.e. to compute P(x(t) \mid y(1), \dots, y(t)). This task is used when the sequence of latent variables is thought of as the underlying states that a process moves through at a sequence of points in time, with corresponding observations at each point. Then, it is natural to ask about the state of the process at the end.

This problem can be handled efficiently using the forward algorithm. An example is when the algorithm is applied to a hidden Markov network to determine P(h_t \mid v_{1:t}).
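A sketch of that use of the forward algorithm, under the same assumed pi/A/B layout as above: the filtered distribution is simply the final forward message, renormalized.

```python
import numpy as np

def filtered_distribution(pi, A, B, obs):
    """Filtering: P(x(t) | y(1), ..., y(t)) is the last forward message,
    renormalized so that it sums to one."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return alpha / alpha.sum()
```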


Smoothing

This is similar to filtering but asks about the distribution of a latent variable somewhere in the middle of a sequence, i.e. to compute P(x(k) \mid y(1), \dots, y(t)) for some k < t. From the perspective described above, this can be thought of as the probability distribution over hidden states at a point in time k in the past, relative to time t.

The forward-backward algorithm is a good method for computing the smoothed values for all hidden state variables.
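A sketch of the forward-backward computation under the same assumed parameterization (again, a practical implementation would rescale or use logarithms to avoid numerical underflow on long sequences):

```python
import numpy as np

def smoothed_distributions(pi, A, B, obs):
    """Forward-backward sketch: returns an (L, N) array whose row k is
    P(x(k) | y(1), ..., y(t)) for every position k in the sequence."""
    N, L = len(pi), len(obs)
    alpha = np.zeros((L, N))
    beta = np.ones((L, N))
    alpha[0] = pi * B[:, obs[0]]
    for k in range(1, L):                      # forward pass
        alpha[k] = (alpha[k - 1] @ A) * B[:, obs[k]]
    for k in range(L - 2, -1, -1):             # backward pass
        beta[k] = A @ (B[:, obs[k + 1]] * beta[k + 1])
    gamma = alpha * beta                       # proportional to smoothed posteriors
    return gamma / gamma.sum(axis=1, keepdims=True)
```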


Most likely explanation

The task, unlike the previous two, asks about the joint probability of the entire sequence of hidden states that generated a particular sequence of observations (see the illustration described above). This task is generally applicable when HMMs are applied to different sorts of problems from those for which the tasks of filtering and smoothing are applicable. An example is part-of-speech tagging, where the hidden states represent the underlying parts of speech corresponding to an observed sequence of words. In this case, what is of interest is the entire sequence of parts of speech, rather than simply the part of speech for a single word, as filtering or smoothing would compute.

This task requires finding a maximum over all possible state sequences, and can be solved efficiently by the Viterbi algorithm.
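A sketch of the Viterbi recursion under the same assumed pi/A/B layout: it mirrors the forward algorithm but replaces the sum over predecessor states with a maximum and keeps back-pointers so the winning path can be recovered.

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Viterbi sketch: the single most probable hidden-state sequence for the
    observations, via a max-product recursion with back-pointers."""
    N, L = len(pi), len(obs)
    delta = np.zeros((L, N))                    # best path probability per state
    psi = np.zeros((L, N), dtype=int)           # back-pointers
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, L):
        scores = delta[t - 1][:, None] * A      # scores[i, j]: come from i, go to j
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, obs[t]]
    path = [int(delta[-1].argmax())]            # best final state, then backtrack
    for t in range(L - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1]
```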


Statistical significance

For some of the above problems, it may also be interesting to ask about statistical significance. What is the probability that a sequence drawn from some null distribution will have an HMM probability (in the case of the forward algorithm) or a maximum state sequence probability (in the case of the Viterbi algorithm) at least as large as that of a particular output sequence? When an HMM is used to evaluate the relevance of a hypothesis for a particular output sequence, the statistical significance indicates the false positive rate associated with failing to reject the hypothesis for the output sequence.
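One way to estimate such a p-value is by Monte Carlo simulation: draw sequences from the null distribution, score each with the forward algorithm, and count how often the score meets or exceeds that of the observed sequence. The sketch below assumes an i.i.d. null distribution over output symbols and the same hypothetical pi/A/B layout used above; for Viterbi-based significance one would substitute the maximum path probability for the forward probability.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_probability(pi, A, B, obs):
    """Forward algorithm, used here to score sequences under the HMM."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()

def empirical_p_value(pi, A, B, observed, null_probs, n_samples=10_000):
    """Monte Carlo estimate of the probability that a sequence drawn i.i.d.
    from a null distribution over symbols scores at least as high under the
    HMM (forward probability) as the observed sequence does."""
    target = forward_probability(pi, A, B, observed)
    L, M = len(observed), len(null_probs)
    hits = sum(
        forward_probability(pi, A, B, rng.choice(M, size=L, p=null_probs)) >= target
        for _ in range(n_samples)
    )
    return hits / n_samples
```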