Thanks to B. H. Juang of the Georgia Institute of Technology and to Wayne Ward of Carnegie Mellon University.
Introduction
Now we turn to the Hidden Markov Model (HMM). What is an HMM used for? Consider the following problem:
Given an unknown observation $O$, recognize it as one of $N$ classes with minimum probability of error.
How do we define the error and the error probability?
Conditional Error: Given $O$, the risk associated with deciding that it is a class $i$ event is
$$R(S_i \mid O) = \sum_{j=1}^{N} e_{ij} P(S_j \mid O)$$
where $P(S_j \mid O)$ is the probability that $O$ is a class $S_j$ event, and $e_{ij}$ is the cost of classifying a class $j$ event as a class $i$ event, with $e_{ij} > 0$ for $i \neq j$ and $e_{ii} = 0$.
Expected Error:
$$E = \int R(S(O) \mid O)\, p(O)\, dO$$
where $S(O)$ is the decision made on $O$ based on a policy. The question then becomes: how should $S(O)$ be chosen to achieve minimum error probability, i.e. to maximize $P(S(O) \mid O)$?
Bayes Decision Theory
If we institute the policy
$$S(O) = S_i = \arg\max_{S_j} P(S_j \mid O)$$
then $R(S(O) \mid O) = \min_{S_j} R(S_j \mid O)$. This is the so-called Maximum A Posteriori (MAP) decision. But how do we know $P(S_j \mid O)$, $j = 1, 2, \ldots, N$, for any $O$?
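A toy illustration of the MAP rule under zero-one costs ($e_{ij} = 1$ for $i \neq j$, $e_{ii} = 0$), in which the conditional risk of a decision reduces to one minus its posterior; the class names and posterior values below are made up for illustration.

```python
# Hypothetical posteriors P(S_j | O) for three classes (illustrative numbers).
posteriors = {"S1": 0.2, "S2": 0.7, "S3": 0.1}

# MAP decision: pick the class with the largest posterior.
decision = max(posteriors, key=posteriors.get)

# Under zero-one costs, R(S(O)|O) = 1 - max_j P(S_j | O).
risk = 1.0 - posteriors[decision]
```

With zero-one costs, minimizing the conditional risk and maximizing the posterior are the same decision, which is why the MAP rule is optimal here.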
Markov Model
States: $S = \{S_0, S_1, S_2, \ldots, S_N\}$
Transition probabilities: $P(q_t = S_i \mid q_{t-1} = S_j)$
Markov Assumption:
$$P(q_t = S_i \mid q_{t-1} = S_j, q_{t-2} = S_k, \ldots) = P(q_t = S_i \mid q_{t-1} = S_j) = a_{ji}, \qquad a_{ji} \ge 0, \quad \sum_{i=1}^{N} a_{ji} = 1, \ \forall j$$
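A small sketch of sampling a state sequence from a Markov chain, which makes the Markov assumption concrete: each next state is drawn using only the current state's row of the transition matrix. The two-state matrix and sequence length are made-up illustrations.

```python
import random

# Illustrative transition matrix: A[j][i] = P(q_t = S_i | q_{t-1} = S_j).
A = [[0.7, 0.3],
     [0.4, 0.6]]

random.seed(0)
q = 0                # start in state S_0
states = [q]
for _ in range(10):
    # The next state depends only on the current state q (Markov assumption).
    q = random.choices(range(len(A)), weights=A[q])[0]
    states.append(q)
```

Note that each row of $A$ sums to 1, matching the constraint $\sum_i a_{ji} = 1$ above.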
Hidden Markov Model
States: $S = \{S_0, S_1, S_2, \ldots, S_N\}$
Transition probabilities: $P(q_t = S_i \mid q_{t-1} = S_j) = a_{ji}$
Output probability distributions (at state $j$ for symbol $k$): $P(y_t = O_k \mid q_t = S_j) = b_j(k, \lambda_j)$, parameterized by $\lambda_j$.
HMM Problems and Solutions
- Evaluation: Given a model, compute the probability of an observation sequence.
- Decoding: Find the state sequence that maximizes the probability of the observation sequence.
- Training: Adjust the model parameters to maximize the probability of the observed sequences.
Evaluation
Compute the probability of an observation sequence $O = o_1 o_2 \ldots o_T$, given HMM parameters $\lambda$:
$$P(O \mid \lambda) = \sum_{\forall Q} P(O, Q \mid \lambda), \qquad Q = q_0 q_1 q_2 \ldots q_T$$
$$= \sum_{\forall Q} a_{q_0 q_1} b_{q_1}(o_1) \cdot a_{q_1 q_2} b_{q_2}(o_2) \cdots a_{q_{T-1} q_T} b_{q_T}(o_T)$$
This is impractical, since the number of paths is $O(N^T)$. How do we deal with it?
Forward Algorithm
$$\alpha_t(j) = P(o_1 o_2 \ldots o_t, q_t = S_j \mid \lambda)$$
Compute $\alpha$ recursively:
$$\alpha_0(j) = \begin{cases} 1, & \text{if } S_j \text{ is the start state} \\ 0, & \text{otherwise} \end{cases}$$
$$\alpha_t(j) = \Big[ \sum_{i=0}^{N} \alpha_{t-1}(i)\, a_{ij} \Big] b_j(o_t), \qquad t > 0$$
Then $P(O \mid \lambda) = \alpha_T(N)$. The computation is $O(N^2 T)$.
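A minimal sketch of the forward recursion. It uses the common formulation with an initial state distribution $\pi$ rather than the dedicated start/end states of the notes, so $P(O \mid \lambda)$ is the sum of the final $\alpha$ values; the two-state model and all numbers are made up for illustration.

```python
def forward_prob(A, B, pi, obs):
    """Return P(O | lambda) via the forward recursion, O(N^2 T) time."""
    N = len(A)
    # alpha[j] = P(o_1 ... o_t, q_t = S_j | lambda), initialized at t = 1.
    alpha = [pi[j] * B[j][obs[0]] for j in range(N)]
    for o in obs[1:]:
        # Each new alpha sums over all predecessors, then emits symbol o.
        alpha = [sum(alpha[i] * A[i][j] for i in range(N)) * B[j][o]
                 for j in range(N)]
    return sum(alpha)

A  = [[0.7, 0.3], [0.4, 0.6]]   # transition probabilities a_ij (illustrative)
B  = [[0.9, 0.1], [0.2, 0.8]]   # emission probabilities b_j(k) (illustrative)
pi = [0.5, 0.5]                 # initial state distribution
p = forward_prob(A, B, pi, [0, 1, 0])
```

Compared with enumerating all $N^T$ paths, the recursion reuses each partial sum once per time step, which is where the $O(N^2 T)$ cost comes from.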
Backward Algorithm
$$\beta_t(i) = P(o_{t+1} o_{t+2} \ldots o_T \mid q_t = S_i, \lambda)$$
Compute $\beta$ recursively:
$$\beta_T(i) = \begin{cases} 1, & \text{if } S_i \text{ is the end state} \\ 0, & \text{otherwise} \end{cases}$$
$$\beta_t(i) = \sum_{j=0}^{N} a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j), \qquad t < T$$
Then $P(O \mid \lambda) = \beta_0(0)$. The computation is $O(N^2 T)$.
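A matching sketch of the backward recursion, again in the $\pi$-based formulation (no dedicated end state, so $\beta_T(i) = 1$ for every state and the initial distribution closes the sum); the model parameters are the same illustrative numbers as in the forward sketch, and both recursions should give the same $P(O \mid \lambda)$.

```python
def backward_prob(A, B, pi, obs):
    """Return P(O | lambda) via the backward recursion."""
    N, T = len(A), len(obs)
    beta = [1.0] * N                      # beta_T(i) = 1 for all i
    for t in range(T - 2, -1, -1):
        # beta_t(i) sums over successor states j, emitting o_{t+1}.
        beta = [sum(A[i][j] * B[j][obs[t + 1]] * beta[j] for j in range(N))
                for i in range(N)]
    # Close the recursion with the initial distribution and first emission.
    return sum(pi[i] * B[i][obs[0]] * beta[i] for i in range(N))

A  = [[0.7, 0.3], [0.4, 0.6]]   # same illustrative parameters as before
B  = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.5, 0.5]
p_b = backward_prob(A, B, pi, [0, 1, 0])
```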
Decoding
Find the state sequence $Q$ which maximizes $P(O, Q \mid \lambda)$.
Viterbi Algorithm
$$VP_t(i) = \max_{q_0 q_1 \ldots q_{t-1}} P(o_1 o_2 \ldots o_t, q_t = S_i \mid \lambda)$$
Compute $VP$ recursively:
$$VP_t(j) = \max_{i = 0, 1, \ldots, N} VP_{t-1}(i)\, a_{ij}\, b_j(o_t), \qquad t > 0$$
Then $P(O, Q \mid \lambda) = VP_T(N)$. Save each maximizing predecessor for the backtrace at the end.
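A sketch of the Viterbi recursion with backtrace, in the same $\pi$-based formulation and with the same illustrative two-state model: it replaces the forward algorithm's sums with maxima and records, at each step, which predecessor achieved each maximum.

```python
def viterbi(A, B, pi, obs):
    """Return (best state path, its joint probability P(O, Q | lambda))."""
    N = len(A)
    vp = [pi[j] * B[j][obs[0]] for j in range(N)]   # VP_1(j)
    back = []                                        # saved argmax predecessors
    for o in obs[1:]:
        prev = [max(range(N), key=lambda i: vp[i] * A[i][j]) for j in range(N)]
        vp = [vp[prev[j]] * A[prev[j]][j] * B[j][o] for j in range(N)]
        back.append(prev)
    # Backtrace from the best final state.
    best = max(range(N), key=lambda j: vp[j])
    path = [best]
    for prev in reversed(back):
        path.insert(0, prev[path[0]])
    return path, vp[best]

A  = [[0.7, 0.3], [0.4, 0.6]]
B  = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.5, 0.5]
best_path, best_prob = viterbi(A, B, pi, [0, 1, 0])
```

Note that $P(O, Q \mid \lambda)$ for the best path is at most $P(O \mid \lambda)$, since the latter sums over all paths.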
Training
To tune $\lambda$ so as to maximize $P(O \mid \lambda)$, there is no efficient algorithm for finding the global optimum; however, an efficient iterative algorithm exists that finds a local optimum.
Baum-Welch Reestimation
Define the probability of transiting from $S_i$ to $S_j$ at time $t$, given $O$, as
$$\xi_t(i,j) = P(q_t = S_i, q_{t+1} = S_j \mid O, \lambda) = \frac{\alpha_t(i)\, a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j)}{P(O \mid \lambda)}$$
Let
$$\bar{a}_{ij} = \frac{\text{expected num. of trans. from } S_i \text{ to } S_j}{\text{expected num. of trans. from } S_i} = \frac{\sum_{t=0}^{T-1} \xi_t(i,j)}{\sum_{t=0}^{T-1} \sum_{j=0}^{N} \xi_t(i,j)}$$
$$\bar{b}_j(k) = \frac{\text{expected num. of times in } S_j \text{ with symbol } k}{\text{expected num. of times in } S_j} = \frac{\sum_{t:\, O_{t+1} = k} \sum_{i=0}^{N} \xi_t(i,j)}{\sum_{t=0}^{T-1} \sum_{i=0}^{N} \xi_t(i,j)}$$
Training Algorithm:
- Initialize $\lambda = (A, B)$.
- Compute $\alpha$, $\beta$, and $\xi$.
- Estimate $\bar{\lambda} = (\bar{A}, \bar{B})$ from $\xi$.
- Replace $\lambda$ with $\bar{\lambda}$.
- If not converged, go to step 2.
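The loop above can be sketched as a single reestimation step. This sketch again uses the common $\pi$-based formulation (so $\pi$ appears alongside the notes' $\lambda = (A, B)$ and is updated from the time-0 occupancies), works in plain unscaled probabilities (fine for short sequences, but real implementations scale or use log-space), and uses made-up parameters and observations. The key property to observe is that the likelihood never decreases.

```python
def forward(A, B, pi, obs):
    """Full alpha table: alpha[t][j] = P(o_1..o_{t+1}, q = S_j | lambda)."""
    N = len(A)
    alpha = [[pi[j] * B[j][obs[0]] for j in range(N)]]
    for t in range(1, len(obs)):
        alpha.append([sum(alpha[t - 1][i] * A[i][j] for i in range(N)) * B[j][obs[t]]
                      for j in range(N)])
    return alpha

def backward(A, B, obs):
    """Full beta table, beta[t][i], with beta_T(i) = 1."""
    N, T = len(A), len(obs)
    beta = [[1.0] * N]
    for t in range(T - 2, -1, -1):
        beta.insert(0, [sum(A[i][j] * B[j][obs[t + 1]] * beta[0][j] for j in range(N))
                        for i in range(N)])
    return beta

def reestimate(A, B, pi, obs):
    """One Baum-Welch step: return updated (A, B, pi)."""
    N, T, M = len(A), len(obs), len(B[0])
    alpha, beta = forward(A, B, pi, obs), backward(A, B, obs)
    p = sum(alpha[-1])                                   # P(O | lambda)
    # gamma[t][i]: probability of being in S_i at time t, given O.
    gamma = [[alpha[t][i] * beta[t][i] / p for i in range(N)] for t in range(T)]
    # xi[t][i][j]: probability of an S_i -> S_j transition at time t, given O.
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / p
            for j in range(N)] for i in range(N)] for t in range(T - 1)]
    A_new = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1)) for j in range(N)]
             for i in range(N)]
    B_new = [[sum(gamma[t][j] for t in range(T) if obs[t] == k) /
              sum(gamma[t][j] for t in range(T)) for k in range(M)]
             for j in range(N)]
    return A_new, B_new, gamma[0][:]

A  = [[0.7, 0.3], [0.4, 0.6]]   # illustrative initial parameters
B  = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.5, 0.5]
obs = [0, 1, 0, 0]

p_old = sum(forward(A, B, pi, obs)[-1])
A2, B2, pi2 = reestimate(A, B, pi, obs)
p_new = sum(forward(A2, B2, pi2, obs)[-1])
```

Since Baum-Welch is an EM algorithm, each iteration satisfies $P(O \mid \bar{\lambda}) \ge P(O \mid \lambda)$, converging to a local optimum as stated above.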
Reference
Need more details? Refer to:
- L. R. Rabiner and B. H. Juang, “An Introduction to Hidden Markov Models”.
- Kai-Fu Lee, “Hidden Markov Models: Continuous Speech Recognition”.