Gaussian Mixture Model (GMM)

The three-coin problem described in Note (1) is in fact a probability model with 2 sub-distributions. Far better known is the Gaussian mixture model with $K$ sub-distributions, and the overall treatment is essentially the same as for the three-coin problem.
[1] The derivation of the EM formulas for the Gaussian mixture is almost identical to that of the "three-coin problem": simply replace the three-coin likelihood

$$P(y|z_{k}=1,\theta)=\alpha_{k}^{y}(1-\alpha_{k})^{1-y},\qquad k\in\{1,2\}$$

with

$$P(\boldsymbol{x}|z_{k}=1,\theta)=\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k}),\qquad k\in\{1,\cdots,K\}$$

[2] The only difference is that the maximum likelihood estimation for the Gaussian mixture differs from that of the "three-coin problem".
The Gaussian mixture model:

$$p(\boldsymbol{x}|\boldsymbol{\pi},\boldsymbol{\mu},\boldsymbol{\Sigma})=\sum_{k=1}^{K}\pi_{k}\,\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k}),\qquad \sum_{k=1}^{K}\pi_{k}=1,\quad \boldsymbol{x}\in\mathbb{R}^{D}$$
where

$$\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu},\boldsymbol{\Sigma})=\dfrac{1}{(2\pi)^{\frac{D}{2}}\,|\boldsymbol{\Sigma}|^{\frac{1}{2}}}\exp\left[-\frac{1}{2}(\boldsymbol{x}-\boldsymbol{\mu})^{T}\boldsymbol{\Sigma}^{-1}(\boldsymbol{x}-\boldsymbol{\mu})\right]$$
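As a quick sanity check, the density above can be evaluated directly with NumPy and compared against `scipy.stats.multivariate_normal`; the test point and parameters below are arbitrary illustrative values, not anything from the text:

```python
import numpy as np
from scipy.stats import multivariate_normal

def gaussian_pdf(x, mu, Sigma):
    """Evaluate N(x | mu, Sigma) term by term from the formula above."""
    D = len(mu)
    diff = x - mu
    # (2*pi)^(-D/2) * |Sigma|^(-1/2)
    norm_const = (2 * np.pi) ** (-D / 2) * np.linalg.det(Sigma) ** (-0.5)
    # exp[-1/2 (x-mu)^T Sigma^{-1} (x-mu)]
    return norm_const * np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff)

x = np.array([1.0, 2.0])
mu = np.array([0.0, 0.0])
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])

p_manual = gaussian_pdf(x, mu, Sigma)
p_scipy = multivariate_normal.pdf(x, mean=mu, cov=Sigma)
```

The two values agree to machine precision, confirming the normalization constant.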
The figure below shows 500 points drawn from a mixture of $K=3$ Gaussians. The samples in panel (a) are drawn from the joint distribution $p(\boldsymbol x,\mathbf z)=p(\mathbf z)p(\boldsymbol x|\mathbf z)$ (they contain the information of the latent variable $\mathbf z$; the 3 states of $\mathbf z$ correspond to the red, green, and blue samples, i.e., 3 different Gaussian components). Panel (b) is drawn from the marginal distribution $p(\boldsymbol x)=\sum_{\mathbf z}p(\boldsymbol x,\mathbf z)$ (because the information of the latent variable $\mathbf z$ is missing, only the information about $\boldsymbol x$ is visible and there is no way to know which component each $\boldsymbol x$ came from).
From 《Pattern Recognition and Machine Learning》. Figure 9.5
$p(\boldsymbol x)=\sum_{\mathbf z}p(\mathbf z)p(\boldsymbol x|\mathbf z)$: an equivalent formulation of the Gaussian mixture involving an explicit latent variable. With it, ancestral sampling can be used to generate random samples that follow the mixture distribution [panel (a)]:
(1) Draw a value of $z\in\{1,2,3\}$ according to $p(z)$ (e.g., with probability $\pi_{3}$ produce $z=3$, i.e., $z_{3}=1$).
(2) Then draw a value of $\boldsymbol x$ from the conditional distribution $p(\boldsymbol x|z_{3}=1)=\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_{3},\boldsymbol{\Sigma}_{3})$.
If the information about $z$ in panel (a) is ignored, we obtain the situation of panel (b).
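The two-step ancestral sampling procedure can be sketched as follows; the mixing weights and component parameters are made-up toy values, not those of the book's figure:

```python
import numpy as np

rng = np.random.default_rng(0)

pi = np.array([0.3, 0.5, 0.2])                        # mixing weights p(z)
mus = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])  # component means
Sigmas = np.array([np.eye(2), 2 * np.eye(2), 0.5 * np.eye(2)])

def ancestral_sample(n):
    # (1) draw the component index z ~ Categorical(pi)
    z = rng.choice(len(pi), size=n, p=pi)
    # (2) draw x ~ N(mu_z, Sigma_z) for each chosen component
    x = np.stack([rng.multivariate_normal(mus[k], Sigmas[k]) for k in z])
    return x, z

x, z = ancestral_sample(500)
```

Keeping the pairs `(x, z)` reproduces the panel (a) view; discarding `z` leaves the panel (b) view.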
Panel (b) shows $N=500$ samples $\{\boldsymbol x_{1},\boldsymbol x_{2},\cdots,\boldsymbol x_{N}\}$ drawn from $p(\boldsymbol x|\boldsymbol{\pi},\boldsymbol{\mu},\boldsymbol{\Sigma})$. If a latent variable $z$ is introduced to indicate which component of the GMM each sample came from, these samples may actually be the pairs shown in panel (a), e.g. $\{(x_{1},z_{2}),(x_{2},z_{3}),(x_{3},z_{1}),\cdots,(x_{N},z_{1})\}$. Here the 1-of-K representation is used: $z_{k}$ means $z_{k}=1$, indicating that the sample was drawn from the $k$-th Gaussian component.
1. The 1-of-K representation of the latent variable

With the 1-of-K representation, the $K$ latent variables are organized into a vector $\mathbf z=[z_{1},z_{2},\cdots,z_{K}]^{T}$ in which exactly one component equals 1 and all the others are 0.
Taking the case $K=3$ from the figure above as an example:
1) Latent variable $z_{1}=1$, i.e., latent vector $\mathbf z=[1,0,0]^{T}$, denotes the event "component 1 (red) is selected"; its probability is $P(z_{1}=1|\theta)=\pi_{1}$.
2) Latent variable $z_{2}=1$, i.e., latent vector $\mathbf z=[0,1,0]^{T}$, denotes the event "component 2 (green) is selected"; its probability is $P(z_{2}=1|\theta)=\pi_{2}$.
3) Latent variable $z_{3}=1$, i.e., latent vector $\mathbf z=[0,0,1]^{T}$, denotes the event "component 3 (blue) is selected"; its probability is $P(z_{3}=1|\theta)=\pi_{3}$.
The (prior) probability of selecting the $k$-th component can be written uniformly as:

$$P(z_{k}=1|\theta)=\pi_{k},\qquad k\in\{1,2,3\}$$
Expressed with the latent vector $\mathbf z=[z_{1},z_{2},z_{3}]^{T}$, this becomes:

$$P(\mathbf z|\theta)=\prod_{k=1}^{3}\pi_{k}^{z_{k}}=\pi_{1}^{z_{1}}\cdot\pi_{2}^{z_{2}}\cdot\pi_{3}^{z_{3}},\qquad \mathbf z=\begin{bmatrix} z_{1}\\z_{2}\\z_{3}\end{bmatrix}\in\left\{\begin{bmatrix}1\\0\\0\end{bmatrix},\begin{bmatrix}0\\1\\0\end{bmatrix},\begin{bmatrix}0\\0\\1\end{bmatrix}\right\}$$
For a GMM with $K$ components and latent vector $\mathbf z=[z_{1},z_{2},\cdots,z_{K}]^{T}$, this generalizes to:

$$P(\mathbf z|\theta)=\prod_{k=1}^{K}\pi_{k}^{z_{k}}=\pi_{1}^{z_{1}}\cdot\pi_{2}^{z_{2}}\cdots\pi_{K}^{z_{K}},\qquad \mathbf z=\begin{bmatrix} z_{1}\\z_{2}\\ \vdots \\z_{K}\end{bmatrix}\in\left\{\begin{bmatrix}1\\0\\ \vdots \\0\end{bmatrix},\begin{bmatrix}0\\1\\ \vdots \\0\end{bmatrix},\cdots,\begin{bmatrix}0\\0\\ \vdots \\1\end{bmatrix}\right\}$$
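For a one-hot $\mathbf z$, the product $\prod_{k}\pi_{k}^{z_{k}}$ simply picks out the corresponding mixing weight, which is easy to confirm numerically (the weights below are arbitrary example values):

```python
import numpy as np

pi = np.array([0.2, 0.5, 0.3])  # example mixing weights, summing to 1

def prior_z(z, pi):
    # P(z | theta) = prod_k pi_k^{z_k}; for a one-hot z this selects one pi_k
    return np.prod(pi ** z)

one_hots = np.eye(3, dtype=int)            # the three admissible latent vectors
probs = [prior_z(z, pi) for z in one_hots]
```

Each `probs[k]` equals `pi[k]`, and together they sum to 1, as the prior over $\mathbf z$ must.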
2. Expressing the probability of an observation $\boldsymbol x$ with the latent vector $\mathbf z$

From panel (a) it is evident that:
1) The red observed samples $\boldsymbol x$ are all generated by component 1 (red), with probability $P(\boldsymbol x|z_{1}=1,\theta)=\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_{1},\boldsymbol{\Sigma}_{1})$.
2) The green observed samples $\boldsymbol x$ are all generated by component 2 (green), with probability $P(\boldsymbol x|z_{2}=1,\theta)=\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_{2},\boldsymbol{\Sigma}_{2})$.
3) The blue observed samples $\boldsymbol x$ are all generated by component 3 (blue), with probability $P(\boldsymbol x|z_{3}=1,\theta)=\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_{3},\boldsymbol{\Sigma}_{3})$.

Therefore the components described by the differently colored samples in panel (a) (here $K=3$) can be written uniformly as:

$$P(\boldsymbol x|z_{k}=1,\theta)=\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k}),\qquad k=1,\cdots,K$$
Expressed with the one-hot latent vector $\mathbf z=[z_{1},z_{2},\cdots,z_{K}]^{T}$ as above, this becomes:

$$\begin{aligned} P(\boldsymbol{x}|\mathbf z,\theta) &= \prod_{k=1}^{K} P(\boldsymbol{x}|z_{k}=1,\theta)^{z_{k}} \\ &=P(\boldsymbol{x}|z_{1}=1,\theta)^{z_{1}}\cdot P(\boldsymbol{x}|z_{2}=1,\theta)^{z_{2}} \cdots P(\boldsymbol{x}|z_{K}=1,\theta)^{z_{K}}\\ &=\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_{1},\boldsymbol{\Sigma}_{1})^{z_{1}}\cdot\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_{2},\boldsymbol{\Sigma}_{2})^{z_{2}} \cdots\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_{K},\boldsymbol{\Sigma}_{K})^{z_{K}} \\ &= \prod_{k=1}^{K} \left[\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})\right]^{z_{k}} \end{aligned}$$
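Likewise, for a one-hot $\mathbf z$ the product $\prod_{k}\mathcal{N}(\boldsymbol x|\boldsymbol\mu_{k},\boldsymbol\Sigma_{k})^{z_{k}}$ reduces to the single selected component density; a small numerical check with toy parameters:

```python
import numpy as np
from scipy.stats import multivariate_normal

mus = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]  # toy component means
Sigmas = [np.eye(2), 2 * np.eye(2)]                  # toy component covariances
x = np.array([1.0, 1.0])

def likelihood_given_z(x, z, mus, Sigmas):
    # P(x | z, theta) = prod_k N(x | mu_k, Sigma_k)^{z_k}
    dens = np.array([multivariate_normal.pdf(x, mus[k], Sigmas[k])
                     for k in range(len(mus))])
    return np.prod(dens ** z)

z = np.array([0, 1])  # one-hot: the second component is selected
p = likelihood_given_z(x, z, mus, Sigmas)
```

`p` equals the second component's density at `x`, since the other factor is raised to the power 0.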
Combining this with $P(\mathbf z|\theta)=\prod_{k=1}^{K}\pi_{k}^{z_{k}}$ yields:

$$p(\boldsymbol x)=\sum_{\mathbf z}p(\mathbf z)p(\boldsymbol x|\mathbf z)=\sum_{k=1}^{K}\pi_{k}\,\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})$$
For a full sample set $\mathbf X=\{\boldsymbol x_{1},\boldsymbol x_{2},\cdots,\boldsymbol x_{N}\}$ with corresponding latent vectors $\mathbf Z=\{\mathbf z_{1},\mathbf z_{2},\cdots,\mathbf z_{N}\}$, we have:

$$p(\mathbf X|\mathbf Z,\boldsymbol{\mu},\boldsymbol{\Sigma},\boldsymbol{\pi}) = \prod_{n=1}^{N}\prod_{k=1}^{K}\left\{\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})\right\}^{z_{nk}}$$

and therefore:

$$p(\mathbf X,\mathbf Z|\boldsymbol{\mu},\boldsymbol{\Sigma},\boldsymbol{\pi}) = \prod_{n=1}^{N}\prod_{k=1}^{K} \pi_{k}^{z_{nk}}\left\{\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})\right\}^{z_{nk}}$$

where $z_{nk}$ denotes the $k$-th component (0 or 1) of the latent vector $\mathbf z_{n}$ associated with sample $\boldsymbol x_{n}$.
3. The log-likelihood function

If the sample points come in the form of panel (a), i.e., assuming the latent variable set $\mathbf Z$ is known (equivalently, the $z_{nk}$ are known), the unknown parameters $(\boldsymbol{\mu},\boldsymbol{\Sigma},\boldsymbol{\pi})$ can be obtained by maximizing $\ln p(\mathbf X,\mathbf Z|\boldsymbol{\mu},\boldsymbol{\Sigma},\boldsymbol{\pi})$:

$$\ln p(\mathbf X,\mathbf Z|\boldsymbol{\mu},\boldsymbol{\Sigma},\boldsymbol{\pi}) = \sum_{n=1}^{N}\sum_{k=1}^{K} z_{nk}\left\{\ln\pi_{k}+ \ln\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})\right\}$$
If the $z_{nk}$ are known, the samples $\mathbf X=\{\boldsymbol x_{1},\boldsymbol x_{2},\cdots,\boldsymbol x_{N}\}$ can be partitioned into $K$ groups. All sample points $\boldsymbol x_{n}$ in the $k$-th group $C_{k}=\{n\ |\ z_{nk}=1,\ z_{nj}=0\ \forall j\neq k\}$ are drawn from $\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})$, so the log-likelihood can be written as:

$$\begin{aligned}\ln p(\mathbf X,\mathbf Z|\boldsymbol{\mu},\boldsymbol{\Sigma},\boldsymbol{\pi}) &= \sum_{n\in C_{1}}\left\{\ln\pi_{1}+ \ln\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{1},\boldsymbol{\Sigma}_{1})\right\} \\ &+\cdots \\ &+\sum_{n\in C_{k}}\left\{\ln\pi_{k}+ \ln\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})\right\}\\ &+\cdots \\ &+\sum_{n\in C_{K}}\left\{\ln\pi_{K}+ \ln\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{K},\boldsymbol{\Sigma}_{K})\right\} \end{aligned}$$
Since the $K$ groups are mutually independent, it suffices to maximize the log-likelihood contribution of each group separately. For the $k$-th group (assume it contains $N_{k}$ samples):

$$\begin{aligned}\ln p(\mathbf X_{k},\mathbf Z_{k}|\boldsymbol{\mu},\boldsymbol{\Sigma},\boldsymbol{\pi}) &= \sum_{n\in C_{k}}\left\{\ln\pi_{k}+ \ln\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})\right\} \\ &= N_{k}\ln\pi_{k}+ \sum_{n\in C_{k}}\ln\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k}) \\ &= N_{k}\ln\pi_{k}-\dfrac{N_{k}D}{2}\ln(2\pi)-\dfrac{N_{k}}{2}\ln\left|\boldsymbol{\Sigma}_{k}\right|-\dfrac{1}{2}\sum_{n\in C_{k}}(\boldsymbol{x}_{n}-\boldsymbol{\mu}_{k})^{T}\boldsymbol{\Sigma}_{k}^{-1}(\boldsymbol{x}_{n}-\boldsymbol{\mu}_{k}) \end{aligned}$$
Maximizing with respect to $\boldsymbol{\mu}_{k}$ and $\boldsymbol{\Sigma}_{k}$ gives:

$$\hat{\boldsymbol{\mu}}_{k}=\dfrac{1}{N_{k}}\sum_{n\in C_{k}}\boldsymbol{x}_{n}=\dfrac{1}{N_{k}}\sum_{n=1}^{N}z_{nk}\boldsymbol{x}_{n}$$

$$\hat{\boldsymbol{\Sigma}}_{k}=\dfrac{1}{N_{k}}\sum_{n\in C_{k}}(\boldsymbol{x}_{n}-\hat{\boldsymbol{\mu}}_{k})(\boldsymbol{x}_{n}-\hat{\boldsymbol{\mu}}_{k})^{T}=\dfrac{1}{N_{k}}\sum_{n=1}^{N}z_{nk}(\boldsymbol{x}_{n}-\hat{\boldsymbol{\mu}}_{k})(\boldsymbol{x}_{n}-\hat{\boldsymbol{\mu}}_{k})^{T}$$

where $N_{k}=\displaystyle\sum_{n=1}^{N}z_{nk}$.
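When the labels $z_{nk}$ are known, these closed-form estimates can be computed directly. The sketch below uses made-up parameters for two labeled Gaussian clusters and checks that the estimated means land near the true ones:

```python
import numpy as np

rng = np.random.default_rng(1)

# Labeled data: 300 points from component 0, 200 from component 1 (toy parameters)
X = np.vstack([rng.multivariate_normal([0.0, 0.0], np.eye(2), 300),
               rng.multivariate_normal([4.0, 4.0], 2 * np.eye(2), 200)])
Z = np.zeros((500, 2))
Z[:300, 0] = 1   # z_nk = 1 for the component that generated x_n
Z[300:, 1] = 1

Nk = Z.sum(axis=0)                  # N_k = sum_n z_nk
mu_hat = (Z.T @ X) / Nk[:, None]    # mu_k = (1/N_k) sum_n z_nk x_n
Sigma_hat = []
for k in range(2):
    d = X - mu_hat[k]
    # Sigma_k = (1/N_k) sum_n z_nk (x_n - mu_k)(x_n - mu_k)^T
    Sigma_hat.append((Z[:, k, None] * d).T @ d / Nk[k])
```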
4. Computing the expectation of the log-likelihood

Since all latent variables $\mathbf Z=\{\mathbf z_{1},\mathbf z_{2},\cdots,\mathbf z_{N}\}$ are unknown, the EM algorithm "guesses" them by taking the expectation of the log-likelihood $\ln p(\mathbf X,\mathbf Z|\boldsymbol{\mu},\boldsymbol{\Sigma},\boldsymbol{\pi})$.

The expectation of the log-likelihood:

$$\begin{aligned}Q(\theta,\theta^{(i)})&=E_{p(\mathbf Z|\mathbf X,\theta)}[\ln p(\mathbf X,\mathbf Z|\boldsymbol{\mu},\boldsymbol{\Sigma},\boldsymbol{\pi})] \\ &= E_{p(\mathbf Z|\mathbf X,\theta)}\left\{\sum_{n=1}^{N}\sum_{k=1}^{K} z_{nk}\left\{\ln\pi_{k}+ \ln\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})\right\}\right\} \\ &= \sum_{n=1}^{N}\sum_{k=1}^{K} E_{p(\mathbf Z|\mathbf X,\theta)}[z_{nk}] \left\{\ln\pi_{k}+\ln\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})\right\} \end{aligned}$$
Assume again that the observed data $\mathbf X=\{\boldsymbol x_{1},\boldsymbol x_{2},\cdots,\boldsymbol x_{N}\}$ are generated as follows:
1) First, with probability $P(z_{k}=1|\theta)=\pi_{k},\ k\in\{1,\cdots,K\}$, set $z_{k}=1,\ z_{j}=0\ (\forall j\neq k)$, i.e., select the $k$-th component $\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})$.
2) Then draw a sample $\boldsymbol x$ from the $k$-th component $\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})$.

If each observed sample $\boldsymbol x_{n}$ is generated independently, then the generation of $\boldsymbol x_{n}$ depends only on its latent vector $\mathbf z_{n}$. We therefore need to compute:

$$E_{P(\mathbf Z|\mathbf X,\theta)}[z_{nk}]=\sum_{z_{nk}\in\{0,1\}} z_{nk}\, p(\mathbf z_{n}|\boldsymbol x_{n},\theta^{(i)})$$
Furthermore, with $\mathbf z_{n}$ ranging over the $K$ admissible one-hot vectors,

$$\begin{aligned}p(\mathbf z_{n}|\boldsymbol x_{n},\theta) &=\frac{p(\mathbf z_{n},\boldsymbol x_{n}|\theta)}{p(\boldsymbol x_{n}|\theta)} \\ &=\frac{p(\boldsymbol x_{n}|\mathbf z_{n},\theta)p(\mathbf z_{n}|\theta)}{\sum\limits_{\mathbf z_{n}}\left\{p(\boldsymbol x_{n}|\mathbf z_{n},\theta)p(\mathbf z_{n}|\theta)\right\}} \\ &=\frac{p(\boldsymbol x_{n}|z_{nk}=1,\theta)p(z_{nk}=1|\theta)}{\sum\limits_{j=1}^{K}\left\{p(\boldsymbol x_{n}|z_{nj}=1,\theta)p(z_{nj}=1|\theta)\right\}} \\ &=\frac{\pi_{k}\,p(\boldsymbol x_{n}|z_{nk}=1,\theta)}{\sum\limits_{j=1}^{K}\left\{\pi_{j}\,p(\boldsymbol x_{n}|z_{nj}=1,\theta)\right\}} \end{aligned}$$
Therefore:

$$\begin{aligned}E_{P(\mathbf Z|\mathbf X,\theta)}[z_{nk}] &=\sum_{z_{nk}\in\{0,1\}} z_{nk}\, P(\mathbf z_{n}|\boldsymbol x_{n},\theta^{(i)}) \\ &=\frac{1\cdot p(\boldsymbol x_{n}|z_{nk}=1,\theta^{(i)})p(z_{nk}=1|\theta^{(i)})+0\cdot p(\boldsymbol x_{n}|z_{nk}=0,\theta^{(i)})p(z_{nk}=0|\theta^{(i)})}{\sum\limits_{j=1}^{K}\left\{p(\boldsymbol x_{n}|z_{nj}=1,\theta^{(i)})p(z_{nj}=1|\theta^{(i)})\right\}} \\ &=\frac{\pi_{k}\,p(\boldsymbol x_{n}|z_{nk}=1,\theta^{(i)})}{\sum\limits_{j=1}^{K}\left\{\pi_{j}\,p(\boldsymbol x_{n}|z_{nj}=1,\theta^{(i)})\right\}} \\ &=\frac{\pi_{k}\,\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})}{\sum\limits_{j=1}^{K}\left\{\pi_{j}\,\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{j},\boldsymbol{\Sigma}_{j})\right\}},\qquad k\in\{1,\cdots,K\} \end{aligned}$$
5. The E-step formula

Because the latent vectors $\mathbf z_{n}$ are unknown for the data drawn in panel (b), given the model parameters $(\boldsymbol{\mu},\boldsymbol{\Sigma},\boldsymbol{\pi})$ we computed the expectation of $z_{nk}$:

$$E_{P(\mathbf Z|\mathbf X,\theta)}[z_{nk}] =\frac{\pi_{k}\,\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})}{\sum\limits_{j=1}^{K}\left\{\pi_{j}\,\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{j},\boldsymbol{\Sigma}_{j})\right\}},\qquad k\in\{1,\cdots,K\}$$
This expectation is used in place of $z_{nk}$ when evaluating $Q(\theta,\theta^{(i)})$. Writing

$$\gamma(z_{nk})=\dfrac{\pi_{k}\,\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})}{\sum\limits_{j=1}^{K}\left\{\pi_{j}\,\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{j},\boldsymbol{\Sigma}_{j})\right\}},\qquad k\in\{1,\cdots,K\}$$

the expected log-likelihood becomes:

$$Q(\theta,\theta^{(i)})= \sum_{n=1}^{N}\sum_{k=1}^{K} \gamma(z_{nk}) \left\{\ln\pi_{k}+\ln\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})\right\}$$
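The responsibilities $\gamma(z_{nk})$ can be computed for all samples at once; by construction each row sums to 1, and the sum of all entries is $N$. A minimal sketch with toy parameters:

```python
import numpy as np
from scipy.stats import multivariate_normal

def responsibilities(X, pi, mus, Sigmas):
    # gamma[n, k] = pi_k N(x_n|mu_k,Sigma_k) / sum_j pi_j N(x_n|mu_j,Sigma_j)
    K = len(pi)
    weighted = np.column_stack(
        [pi[k] * multivariate_normal.pdf(X, mus[k], Sigmas[k]) for k in range(K)])
    return weighted / weighted.sum(axis=1, keepdims=True)

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 2))             # arbitrary test points
pi = np.array([0.4, 0.6])                 # toy mixing weights
mus = [np.array([-1.0, 0.0]), np.array([1.0, 0.0])]
Sigmas = [np.eye(2), np.eye(2)]
gamma = responsibilities(X, pi, mus, Sigmas)
```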
6. The M-step formula

Maximizing $Q(\theta,\theta^{(i)})$ with respect to $\boldsymbol{\mu}_{k}$ and $\boldsymbol{\Sigma}_{k}$ gives:

$$\hat{\boldsymbol{\mu}}_{k}=\dfrac{1}{N_{k}}\sum_{n=1}^{N}\gamma(z_{nk})\,\boldsymbol{x}_{n}$$

$$\hat{\boldsymbol{\Sigma}}_{k}=\dfrac{1}{N_{k}}\sum_{n=1}^{N}\gamma(z_{nk})(\boldsymbol{x}_{n}-\hat{\boldsymbol{\mu}}_{k})(\boldsymbol{x}_{n}-\hat{\boldsymbol{\mu}}_{k})^{T}$$

where $N_{k}=\displaystyle\sum_{n=1}^{N}\gamma(z_{nk})$.
Solving for $\pi_{k}$: to maximize $Q(\theta,\theta^{(i)})$ subject to the constraint $\sum_{k}\pi_{k}=1$, use a Lagrange multiplier:

$$\max\ \left\{ \sum_{n=1}^{N}\sum_{k=1}^{K} \gamma(z_{nk}) \left\{\ln\pi_{k}+\ln\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})\right\} +\lambda\left(\sum_{k}\pi_{k}-1\right) \right\}$$
Taking the partial derivative with respect to $\pi_{k}$ and setting it to zero:

$$\sum_{n=1}^{N}\frac{\gamma(z_{nk})}{\pi_{k}} +\lambda=0$$

Multiplying both sides by $\pi_{k}$:

$$\sum_{n=1}^{N}\gamma(z_{nk}) +\pi_{k}\lambda=0$$

Summing over $k$:

$$\sum_{k=1}^{K}\sum_{n=1}^{N}\gamma(z_{nk}) +\sum_{k=1}^{K}\pi_{k}\lambda=0$$
Since

$$\begin{aligned}\sum_{k=1}^{K}\sum_{n=1}^{N}\gamma(z_{nk}) &=\sum_{k=1}^{K}\sum_{n=1}^{N}\dfrac{\pi_{k}\,\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})}{\sum\limits_{j=1}^{K}\left\{\pi_{j}\,\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{j},\boldsymbol{\Sigma}_{j})\right\}} \\ &=\sum_{n=1}^{N}\dfrac{ \sum\limits_{k=1}^{K}\left\{\pi_{k}\,\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})\right\}}{\sum\limits_{j=1}^{K}\left\{\pi_{j}\,\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{j},\boldsymbol{\Sigma}_{j})\right\}} \\ &= N \end{aligned}$$
and $\sum_{k=1}^{K}\pi_{k}=1$, we obtain $\lambda=-N$, and therefore:

$$\pi_{k}=\dfrac{1}{N}\sum_{n=1}^{N}\gamma(z_{nk}) =\dfrac{N_{k}}{N}$$
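Putting the E-step and M-step together, one EM iteration can be written compactly. Running a few iterations on well-separated toy data (all parameters below are illustrative) recovers the cluster means:

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(3)
# Toy data: two well-separated Gaussian clusters
X = np.vstack([rng.multivariate_normal([0.0, 0.0], np.eye(2), 200),
               rng.multivariate_normal([5.0, 5.0], np.eye(2), 200)])

def em_step(X, pi, mus, Sigmas):
    K = len(pi)
    # E step: responsibilities gamma(z_nk)
    dens = np.column_stack(
        [pi[k] * multivariate_normal.pdf(X, mus[k], Sigmas[k]) for k in range(K)])
    gamma = dens / dens.sum(axis=1, keepdims=True)
    # M step: pi_k = N_k/N, with mu_k and Sigma_k as derived above
    Nk = gamma.sum(axis=0)
    pi_new = Nk / len(X)
    mus_new = (gamma.T @ X) / Nk[:, None]
    Sigmas_new = []
    for k in range(K):
        d = X - mus_new[k]
        Sigmas_new.append((gamma[:, k, None] * d).T @ d / Nk[k])
    return pi_new, mus_new, Sigmas_new

pi = np.array([0.5, 0.5])
mus = np.array([[1.0, 1.0], [4.0, 4.0]])   # rough initial guesses
Sigmas = [np.eye(2), np.eye(2)]
for _ in range(20):
    pi, mus, Sigmas = em_step(X, pi, mus, Sigmas)
```

After the updates, the mixing weights still sum to 1 and the means sit near the true cluster centers.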
Code implementation

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import multivariate_normal

def gen_gaussian(num):
    # Draw `num` points from each of three 2-D Gaussians and attach a label column
    mean1 = [1, 2]
    cov1 = [[5, 3], [3, 10]]
    data1 = np.random.multivariate_normal(mean1, cov1, num)
    label1 = np.zeros((1, num)).T
    data11 = np.concatenate([data1, label1], axis=1)

    mean2 = [9, 7]
    cov2 = [[10, -3], [-3, 5]]
    data2 = np.random.multivariate_normal(mean2, cov2, num)
    label2 = np.ones((1, num)).T
    data22 = np.concatenate([data2, label2], axis=1)

    mean3 = [8, 0]
    cov3 = [[9, 2], [2, 5]]
    data3 = np.random.multivariate_normal(mean3, cov3, num)
    label3 = 2 * np.ones((1, num)).T
    data33 = np.concatenate([data3, label3], axis=1)

    data = np.concatenate([data11, data22, data33], axis=0)
    return np.round(data, 4), data1, data2, data3, data11, data22, data33

def show_all_scatter(data):
    # All points in black: the marginal view p(x), as in panel (b)
    x, y = data[:, 0:2].T
    plt.scatter(x, y, c='k', s=3)
    plt.title("p(x)")
    plt.xlabel("x")
    plt.ylabel("y")

def show_scatter(data1, data2, data3):
    # Each component in its own color: the joint view p(x,z), as in panel (a)
    plt.scatter(*data1.T, c='b', s=3)
    plt.scatter(*data2.T, c='r', s=3)
    plt.scatter(*data3.T, c='g', s=3)
    plt.title("p(x,z)")
    plt.xlabel("x")
    plt.ylabel("y")

def gmm(indata, pi, mu, cov, K):
    pi0, mu0, cov0 = pi, mu, cov
    flag = 1
    n_iter = 0
    while flag:
        # E step: component densities, then responsibilities gamma(z_nk)
        t1 = np.zeros((indata.shape[0], K))
        t2 = np.tile(pi0, (indata.shape[0], 1))
        for k in range(K):
            t1[:, k] = multivariate_normal.pdf(indata, mu0[k], cov0[k])
        mu1 = mu0
        t3 = t1 * t2
        gamma = np.transpose(t3.T / np.sum(t3, axis=1))
        # M step: update pi, mu, cov from the responsibilities
        nk = np.sum(gamma, axis=0)
        pi0 = nk / indata.shape[0]
        mu0 = np.transpose(np.dot(indata.T, gamma) / nk)
        for k in range(K):
            t = np.zeros((2, 2))
            for n in range(indata.shape[0]):
                d = indata[n, :] - mu0[k, :]
                t = t + np.outer(d, d) * gamma[n, k]
            cov0[k] = t / nk[k]
        # Stop when the means no longer move. (The original signed-sum test
        # `np.sum(mu0 - mu1) < 1` could terminate prematurely when positive
        # and negative changes cancel; compare absolute changes instead.)
        if np.sum(np.abs(mu0 - mu1)) < 1e-4:
            flag = 0
        n_iter = n_iter + 1
    print('pi:\n', pi0)
    print('mu:\n', mu0)
    print('cov:\n', cov0)

if __name__ == '__main__':
    data, data1, data2, data3, data11, data22, data33 = gen_gaussian(1000)
    plt.figure(1)
    show_scatter(data1, data2, data3)
    plt.figure(2)
    show_all_scatter(data)
    plt.show()

    indata = data[:, 0:2]
    d_min = np.ceil(np.min(indata))
    d_max = np.floor(np.max(indata))
    K = 3
    # Rough initial guesses spread over the data range
    mean = np.zeros((3, 2))
    mean[0, :] = np.array([d_min + 1, (d_min + d_max) / 2.0])
    mean[1, :] = np.array([d_max - 1, d_min + 1])
    mean[2, :] = np.array([d_max - 1, d_max - 1])
    cov = np.zeros((3, 2, 2))
    cov[0, :, :] = np.eye(2)
    cov[1, :, :] = 3 * np.eye(2)
    cov[2, :, :] = 5 * np.eye(2)
    pi = np.random.rand(3)
    pi = pi / np.sum(pi)
    gmm(indata, pi, mean, cov, K)
```
The program assumes 3 normal distributions (1000 points each, i.e., $\pi_{k}=0.3333$):
$$\mu_{1} = \begin{bmatrix}1\\2\end{bmatrix},\ \Sigma_{1}=\begin{bmatrix}5&3\\3&10\end{bmatrix}\qquad \mu_{2} = \begin{bmatrix}9\\7\end{bmatrix},\ \Sigma_{2}=\begin{bmatrix}10&-3\\-3&5\end{bmatrix}\qquad \mu_{3} = \begin{bmatrix}8\\0\end{bmatrix},\ \Sigma_{3}=\begin{bmatrix}9&2\\2&5\end{bmatrix}$$
Output from one run of the program:

$\boldsymbol\pi$:
[0.25827959 0.39856687 0.34315353]

$\boldsymbol\mu$:
[[ 0.25804083  2.23741362]
 [ 7.63843604 -0.07660223]
 [ 8.21105998  7.22113956]]

$\boldsymbol\Sigma$:
[[[ 2.96638168  2.0588839 ]
  [ 2.0588839   7.71483041]]

 [[13.11392913  4.3224706 ]
  [ 4.3224706   5.31874245]]

 [[ 9.36387689 -0.85241427]
  [-0.85241427  4.14231678]]]