EM Algorithm Notes (2): Gaussian Mixture Models

Gaussian Mixture Model (GMM)

The three-coin problem described in Note (1) is really a probability model with 2 sub-distributions. A far better-known model is the Gaussian mixture with $K$ sub-distributions, and the whole derivation proceeds essentially as it did for the three-coin problem.

【1】 The derivation of the EM formulas for the Gaussian mixture is almost identical to that for the "three-coin problem":
   simply replace $P(y|z_{k}=1,\theta)=\alpha_{k}^{y}(1-\alpha_{k})^{1-y},\ k\in\{1,2\}$ from the three-coin problem with $P(\boldsymbol{x}|z_{k}=1,\theta)=\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k}),\ k\in\{1,\cdots,K\}$.
 
【2】 The only difference: the maximum-likelihood estimates for the Gaussian mixture differ from those of the "three-coin problem".

Gaussian mixture model:

$$p(\boldsymbol{x}|\boldsymbol{\pi},\boldsymbol{\mu},\boldsymbol{\Sigma})=\sum_{k=1}^{K}\pi_{k}\,\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k}),\qquad \sum_{k=1}^{K}\pi_{k}=1,\quad \boldsymbol{x}\in\mathbb{R}^{D}$$

where $\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu},\boldsymbol{\Sigma})=\dfrac{1}{(2\pi)^{D/2}\,|\boldsymbol\Sigma|^{1/2}}\exp\!\left[-\frac{1}{2}(\boldsymbol{x}-\boldsymbol{\mu})^{T}\boldsymbol\Sigma^{-1}(\boldsymbol{x}-\boldsymbol{\mu})\right]$
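As a quick numeric illustration of the two formulas above, the sketch below (parameter values are made up for the example) evaluates a mixture density with `scipy.stats.multivariate_normal`:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Illustrative parameters (not from the text): K = 2 components in D = 2 dimensions
pi = np.array([0.4, 0.6])
mu = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
sigma = [np.eye(2), 2.0 * np.eye(2)]

def gmm_pdf(x):
    # p(x) = sum_k pi_k N(x | mu_k, Sigma_k)
    return sum(p * multivariate_normal.pdf(x, m, s)
               for p, m, s in zip(pi, mu, sigma))

print(gmm_pdf(np.array([1.0, 1.0])))  # a point between the two component means
```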

The figure below shows 500 points drawn from a mixture of $K=3$ Gaussians. In panel (a) the samples are drawn from the joint distribution $p(\boldsymbol x,\mathbf z)=p(\mathbf z)p(\boldsymbol x|\mathbf z)$, so the latent variable $\mathbf z$ is available: its 3 states, shown in different colors, correspond to 3 different Gaussian components. Panel (b) shows samples from the marginal distribution $p(\boldsymbol x)=\sum_{\mathbf z}p(\boldsymbol x,\mathbf z)$: because the latent variable $\mathbf z$ is missing, we only observe $\boldsymbol x$ and cannot tell which component each $\boldsymbol x$ came from.

From 《Pattern Recognition and Machine Learning》. Figure 9.5
 
$p(\boldsymbol x)=\sum_{\mathbf z}p(\mathbf z)p(\boldsymbol x|\mathbf z)$: an equivalent formulation of the Gaussian mixture involving an explicit latent variable.
  
Ancestral sampling can be used to generate random samples from the Gaussian mixture (panel (a)):
(1) Draw a value of $z\in\{1,2,3\}$ from $p(z)$ (e.g. $z=3$, meaning $z_{3}=1$, is produced with probability $\pi_{3}$);
(2) Then draw the value of $\boldsymbol x$ from the conditional distribution $p(\boldsymbol x|z_{3}=1)=\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_{3},\boldsymbol{\Sigma}_{3})$.
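The two sampling steps above can be sketched as follows (component parameters are illustrative, not those of the figure):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative K = 3 mixture in 2-D
pi = np.array([0.3, 0.5, 0.2])
mu = np.array([[0.0, 0.0], [5.0, 5.0], [5.0, -5.0]])
cov = np.stack([np.eye(2)] * 3)

def ancestral_sample(n):
    # step (1): draw z ~ p(z), i.e. pick a component index with probabilities pi
    z = rng.choice(3, size=n, p=pi)
    # step (2): draw x ~ N(mu_z, Sigma_z) for each chosen component
    x = np.array([rng.multivariate_normal(mu[k], cov[k]) for k in z])
    return x, z  # keeping z gives panel (a); discarding it gives panel (b)

x, z = ancestral_sample(500)
print(x.shape, np.bincount(z, minlength=3))
```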
 
If the $z$ information in panel (a) is ignored, we obtain exactly the situation of panel (b).

Panel (b) shows $N=500$ samples $\{\boldsymbol x_{1},\boldsymbol x_{2},\cdots,\boldsymbol x_{N}\}$ drawn from $p(\boldsymbol{x}|\boldsymbol{\pi},\boldsymbol{\mu},\boldsymbol{\Sigma})$. If we introduce a latent variable $z$ to indicate which component of the GMM each sample came from, these samples might actually correspond to the pairs shown in panel (a), e.g. $\{(\boldsymbol x_{1},z_{2}),(\boldsymbol x_{2},z_{3}),(\boldsymbol x_{3},z_{1}),\cdots,(\boldsymbol x_{N},z_{1})\}$. Here the 1-of-K notation is used: writing $z_{k}$ means $z_{k}=1$, i.e. the sample was drawn from the $k$-th Gaussian component.

1. The 1-of-K representation of the latent variable

In the 1-of-K representation, the $K$ latent variables are organized into a vector $\mathbf z=[z_{1},z_{2},\cdots,z_{K}]^{T}$ in which exactly one component equals 1 and all other components are 0.

Take the case $K=3$ from the figure above as an example:

1) Latent variable $z_{1}=1$, i.e. latent vector $\mathbf z=[1,0,0]^{T}$, represents the event "component 1 (red) is selected", which has probability $P(z_{1}=1|\theta)=\pi_{1}$.

2) Latent variable $z_{2}=1$, i.e. latent vector $\mathbf z=[0,1,0]^{T}$, represents the event "component 2 (green) is selected", which has probability $P(z_{2}=1|\theta)=\pi_{2}$.

3) Latent variable $z_{3}=1$, i.e. latent vector $\mathbf z=[0,0,1]^{T}$, represents the event "component 3 (blue) is selected", which has probability $P(z_{3}=1|\theta)=\pi_{3}$.

The (prior) probability of selecting the $k$-th component can therefore be written uniformly as $P(z_{k}=1|\theta)=\pi_{k},\ k\in\{1,2,3\}$.
Expressed with the latent vector $\mathbf z=[z_{1},z_{2},z_{3}]^{T}$, this becomes:

$$P(\mathbf z|\theta)=\prod_{k=1}^{3}\pi_{k}^{z_{k}}=\pi_{1}^{z_{1}}\,\pi_{2}^{z_{2}}\,\pi_{3}^{z_{3}},\qquad \mathbf z=\begin{bmatrix}z_{1}\\z_{2}\\z_{3}\end{bmatrix}\in\left\{\begin{bmatrix}1\\0\\0\end{bmatrix},\begin{bmatrix}0\\1\\0\end{bmatrix},\begin{bmatrix}0\\0\\1\end{bmatrix}\right\}$$

For a GMM with $K$ components and latent vector $\mathbf z=[z_{1},z_{2},\cdots,z_{K}]^{T}$, this is:

$$P(\mathbf z|\theta)=\prod_{k=1}^{K}\pi_{k}^{z_{k}}=\pi_{1}^{z_{1}}\,\pi_{2}^{z_{2}}\cdots\pi_{K}^{z_{K}},\qquad \mathbf z=\begin{bmatrix}z_{1}\\z_{2}\\\vdots\\z_{K}\end{bmatrix}\in\left\{\begin{bmatrix}1\\0\\\vdots\\0\end{bmatrix},\begin{bmatrix}0\\1\\\vdots\\0\end{bmatrix},\cdots,\begin{bmatrix}0\\0\\\vdots\\1\end{bmatrix}\right\}$$
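The product form works because, for a one-hot $\mathbf z$, every factor with $z_{k}=0$ equals 1, so $\prod_k\pi_k^{z_k}$ simply picks out the $\pi_k$ of the selected component. A one-line check (values illustrative):

```python
import numpy as np

pi = np.array([0.2, 0.5, 0.3])  # mixing coefficients (illustrative)
z = np.array([0, 1, 0])         # 1-of-K latent vector with z_2 = 1
p = np.prod(pi ** z)            # prod_k pi_k^{z_k}
print(p)                        # equals pi[1]
```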


2. Expressing the probability of an observation $\boldsymbol x$ with the latent vector $\mathbf z$

Panel (a) makes the following evident:

1) The red observations $\boldsymbol x$ are all generated by component 1 (red), with probability $P(\boldsymbol x|z_{1}=1,\theta)=\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_{1},\boldsymbol{\Sigma}_{1})$.

2) The green observations $\boldsymbol x$ are all generated by component 2 (green), with probability $P(\boldsymbol x|z_{2}=1,\theta)=\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_{2},\boldsymbol{\Sigma}_{2})$.

3) The blue observations $\boldsymbol x$ are all generated by component 3 (blue), with probability $P(\boldsymbol x|z_{3}=1,\theta)=\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_{3},\boldsymbol{\Sigma}_{3})$.
Therefore the component distributions indicated by the differently colored samples in panel (a) (here $K=3$) can be written uniformly as:

$$P(\boldsymbol x|z_{k}=1,\theta)=\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k}),\qquad k=1,\cdots,K$$
Expressed with the 1-of-K latent vector $\mathbf z$, this is:

$$\begin{aligned} P(\boldsymbol{x}|\mathbf z,\theta) &= \prod_{k=1}^{K} P(\boldsymbol{x}|z_{k}=1,\theta)^{z_{k}} \\ &=P(\boldsymbol{x}|z_{1}=1,\theta)^{z_{1}}\, P(\boldsymbol{x}|z_{2}=1,\theta)^{z_{2}} \cdots P(\boldsymbol{x}|z_{K}=1,\theta)^{z_{K}}\\ &=\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_{1},\boldsymbol{\Sigma}_{1})^{z_{1}}\,\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_{2},\boldsymbol{\Sigma}_{2})^{z_{2}} \cdots\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_{K},\boldsymbol{\Sigma}_{K})^{z_{K}} \\ &= \prod_{k=1}^{K} \left[\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})\right]^{z_{k}} \end{aligned}$$
Combining this with $P(\mathbf z|\theta)=\prod_{k=1}^{K}\pi_{k}^{z_{k}}$ gives:

$$p(\boldsymbol x)=\sum_{\mathbf z}p(\mathbf z)p(\boldsymbol x|\mathbf z)=\sum_{k=1}^{K}\pi_{k}\,\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})$$

For the full sample set $\mathbf X=\{\boldsymbol x_{1},\boldsymbol x_{2},\cdots,\boldsymbol x_{N}\}$ with corresponding latent vectors $\mathbf Z=\{\mathbf z_{1},\mathbf z_{2},\cdots,\mathbf z_{N}\}$, we have:

$$p(\mathbf X|\mathbf Z,\boldsymbol{\mu},\boldsymbol{\Sigma},\boldsymbol{\pi}) = \prod_{n=1}^{N}\prod_{k=1}^{K}\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})^{z_{nk}}$$

and hence:

$$p(\mathbf X,\mathbf Z|\boldsymbol{\mu},\boldsymbol{\Sigma},\boldsymbol{\pi}) = \prod_{n=1}^{N}\prod_{k=1}^{K} \pi_{k}^{z_{nk}}\,\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})^{z_{nk}}$$

where $z_{nk}$ denotes the $k$-th component (0 or 1) of the latent vector $\mathbf z_{n}$ associated with sample $\boldsymbol x_{n}$.


3. The log-likelihood function

If the samples come in the form of panel (a), i.e. the latent variables $\mathbf Z$ (equivalently, all $z_{nk}$) are known, the unknown parameters $(\boldsymbol{\mu},\boldsymbol{\Sigma},\boldsymbol{\pi})$ can be obtained as the maximum-likelihood solution of $\ln p(\mathbf X,\mathbf Z|\boldsymbol{\mu},\boldsymbol{\Sigma},\boldsymbol{\pi})$:

$$\ln p(\mathbf X,\mathbf Z|\boldsymbol{\mu},\boldsymbol{\Sigma},\boldsymbol{\pi}) = \sum_{n=1}^{N}\sum_{k=1}^{K} z_{nk}\left\{\ln\pi_{k}+ \ln\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})\right\}$$

When the $z_{nk}$ are known, the samples $\mathbf X=\{\boldsymbol x_{1},\cdots,\boldsymbol x_{N}\}$ can be partitioned into $K$ groups: the $k$-th group $C_{k}=\{n\mid z_{nk}=1,\ z_{nj}=0,\ \forall j\neq k\}$ contains exactly the samples $\boldsymbol x_{n}$ drawn from $\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})$, so the log-likelihood can be written as:

$$\begin{aligned}\ln p(\mathbf X,\mathbf Z|\boldsymbol{\mu},\boldsymbol{\Sigma},\boldsymbol{\pi}) &= \sum_{n\in C_{1}}\left\{\ln\pi_{1}+ \ln\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{1},\boldsymbol{\Sigma}_{1})\right\} \\ &+\cdots \\ &+\sum_{n\in C_{k}}\left\{\ln\pi_{k}+ \ln\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})\right\}\\ &+\cdots \\ &+\sum_{n\in C_{K}}\left\{\ln\pi_{K}+ \ln\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{K},\boldsymbol{\Sigma}_{K})\right\} \end{aligned}$$

Since the $K$ groups are independent, each group's log-likelihood contribution can be maximized separately. For the $k$-th group (with $N_{k}$ samples):

$$\begin{aligned}\ln p(\mathbf X_{k},\mathbf Z_{k}|\boldsymbol{\mu},\boldsymbol{\Sigma},\boldsymbol{\pi}) &= \sum_{n\in C_{k}}\left\{\ln\pi_{k}+ \ln\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})\right\} \\ &= N_{k}\ln\pi_{k}+ \sum_{n\in C_{k}}\ln\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k}) \\ &= N_{k}\ln\pi_{k}-\frac{N_{k}D}{2}\ln(2\pi)-\frac{N_{k}}{2}\ln|\boldsymbol{\Sigma}_{k}|-\frac{1}{2}\sum_{n\in C_{k}}(\boldsymbol{x}_{n}-\boldsymbol{\mu}_{k})^{T}\boldsymbol{\Sigma}_{k}^{-1}(\boldsymbol{x}_{n}-\boldsymbol{\mu}_{k}) \end{aligned}$$

Maximizing with respect to $\boldsymbol{\mu}_{k}$ and $\boldsymbol{\Sigma}_{k}$ gives:

$$\hat{\boldsymbol{\mu}}_{k}=\frac{1}{N_{k}}\sum_{n\in C_{k}}\boldsymbol{x}_{n}=\frac{1}{N_{k}}\sum_{n=1}^{N}z_{nk}\boldsymbol{x}_{n}$$

$$\hat{\boldsymbol{\Sigma}}_{k}=\frac{1}{N_{k}}\sum_{n\in C_{k}}(\boldsymbol{x}_{n}-\hat{\boldsymbol{\mu}}_{k})(\boldsymbol{x}_{n}-\hat{\boldsymbol{\mu}}_{k})^{T}=\frac{1}{N_{k}}\sum_{n=1}^{N}z_{nk}(\boldsymbol{x}_{n}-\hat{\boldsymbol{\mu}}_{k})(\boldsymbol{x}_{n}-\hat{\boldsymbol{\mu}}_{k})^{T}$$

where $N_{k}=\sum_{n=1}^{N}z_{nk}$.
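With known labels, these closed-form estimates are just per-group sample means and covariances. A minimal sketch on synthetic labeled data (all names and parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
# synthetic labeled data: 200 points from each of two 2-D Gaussians (illustrative)
x = np.concatenate([rng.multivariate_normal([0, 0], np.eye(2), 200),
                    rng.multivariate_normal([4, 4], np.eye(2), 200)])
z = np.repeat([0, 1], 200)   # known component labels
Z = np.eye(2)[z]             # 1-of-K matrix; Z[n, k] is z_nk

Nk = Z.sum(axis=0)                           # N_k = sum_n z_nk
mu_hat = (Z.T @ x) / Nk[:, None]             # mu_k = (1/N_k) sum_n z_nk x_n
sigma_hat = [(Z[:, k, None] * (x - mu_hat[k])).T @ (x - mu_hat[k]) / Nk[k]
             for k in range(2)]              # Sigma_k per the formula above
print(mu_hat)
```

The estimates should land close to the true means $[0,0]$ and $[4,4]$ and identity covariances.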


4. The expectation of the log-likelihood

Since all the latent variables $\mathbf Z=\{\mathbf z_{1},\mathbf z_{2},\cdots,\mathbf z_{N}\}$ are unknown, the EM algorithm "guesses" them by taking the expectation of the log-likelihood $\ln p(\mathbf X,\mathbf Z|\boldsymbol{\mu},\boldsymbol{\Sigma},\boldsymbol{\pi})$.

The expected log-likelihood is:

$$\begin{aligned}Q(\theta,\theta^{(i)})&=E_{p(\mathbf Z|\mathbf X,\theta^{(i)})}[\ln p(\mathbf X,\mathbf Z|\boldsymbol{\mu},\boldsymbol{\Sigma},\boldsymbol{\pi})] \\ &= E_{p(\mathbf Z|\mathbf X,\theta^{(i)})}\left[\sum_{n=1}^{N}\sum_{k=1}^{K} z_{nk}\left\{\ln\pi_{k}+ \ln\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})\right\}\right] \\ &= \sum_{n=1}^{N}\sum_{k=1}^{K} E_{p(\mathbf Z|\mathbf X,\theta^{(i)})}[z_{nk}] \left\{\ln\pi_{k}+\ln\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})\right\} \end{aligned}$$

Assume again that the observations $\mathbf X=\{\boldsymbol x_{1},\boldsymbol x_{2},\cdots,\boldsymbol x_{N}\}$ are generated as follows:
1) First, with probability $P(z_{k}=1|\theta)=\pi_{k},\ k\in\{1,\cdots,K\}$, set $z_{k}=1,\ z_{j}=0\ (\forall j\neq k)$, i.e. select the $k$-th component $\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})$;
2) Then draw a sample $\boldsymbol x$ from the $k$-th component $\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})$.
If every observation $\boldsymbol x_{n}$ is generated independently, then $\boldsymbol x_{n}$ depends only on its own latent vector $\mathbf z_{n}$.

So the quantity to compute is:

$$E_{P(\mathbf Z|\mathbf X,\theta)}[z_{nk}]=\sum_{z_{nk}\in\{0,1\}} z_{nk}\, p(\mathbf z_{n}|\boldsymbol x_{n},\theta^{(i)})$$

Moreover, for the one-hot $\mathbf z_{n}$ with $z_{nk}=1$:

$$\begin{aligned}p(\mathbf z_{n}|\boldsymbol x_{n},\theta) &=\frac{p(\mathbf z_{n},\boldsymbol x_{n}|\theta)}{p(\boldsymbol x_{n}|\theta)} =\frac{p(\boldsymbol x_{n}|\mathbf z_{n},\theta)\,p(\mathbf z_{n}|\theta)}{\sum\limits_{\mathbf z_{n}} p(\boldsymbol x_{n}|\mathbf z_{n},\theta)\,p(\mathbf z_{n}|\theta)} \\ &=\frac{p(\boldsymbol x_{n}|z_{nk}=1,\theta)\,p(z_{nk}=1|\theta)}{\sum\limits_{j=1}^{K} p(\boldsymbol x_{n}|z_{nj}=1,\theta)\,p(z_{nj}=1|\theta)} \\ &=\frac{\pi_{k}\,p(\boldsymbol x_{n}|z_{nk}=1,\theta)}{\sum\limits_{j=1}^{K}\pi_{j}\,p(\boldsymbol x_{n}|z_{nj}=1,\theta)} \end{aligned}$$

Therefore (the $z_{nk}=0$ term contributes nothing to the sum):

$$\begin{aligned}E_{P(\mathbf Z|\mathbf X,\theta)}[z_{nk}] &=\sum_{z_{nk}\in\{0,1\}} z_{nk}\, P(\mathbf z_{n}|\boldsymbol x_{n},\theta^{(i)}) \\ &=\frac{1\cdot p(\boldsymbol x_{n}|z_{nk}=1,\theta^{(i)})\,p(z_{nk}=1|\theta^{(i)})}{\sum\limits_{j=1}^{K} p(\boldsymbol x_{n}|z_{nj}=1,\theta^{(i)})\,p(z_{nj}=1|\theta^{(i)})} \\ &=\frac{\pi_{k}\,p(\boldsymbol x_{n}|z_{nk}=1,\theta^{(i)})}{\sum\limits_{j=1}^{K}\pi_{j}\,p(\boldsymbol x_{n}|z_{nj}=1,\theta^{(i)})} \\ &=\frac{\pi_{k}\,\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})}{\sum\limits_{j=1}^{K}\pi_{j}\,\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{j},\boldsymbol{\Sigma}_{j})},\qquad k\in\{1,\cdots,K\} \end{aligned}$$

5. The E-step

Because the latent vectors $\mathbf z_{n}$ are unknown for the data drawn as in panel (b), we computed, for given parameters $(\boldsymbol{\mu},\boldsymbol{\Sigma},\boldsymbol{\pi})$, the expectation of $z_{nk}$:

$$E_{P(\mathbf Z|\mathbf X,\theta)}[z_{nk}] =\frac{\pi_{k}\,\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})}{\sum\limits_{j=1}^{K}\pi_{j}\,\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{j},\boldsymbol{\Sigma}_{j})},\qquad k\in\{1,\cdots,K\}$$

This expectation is substituted for $z_{nk}$ when evaluating the expected log-likelihood $Q(\theta,\theta^{(i)})$. Writing

$$\gamma(z_{nk})=\frac{\pi_{k}\,\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})}{\sum\limits_{j=1}^{K}\pi_{j}\,\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{j},\boldsymbol{\Sigma}_{j})},\qquad k\in\{1,\cdots,K\}$$

the expected log-likelihood becomes:

$$Q(\theta,\theta^{(i)})= \sum_{n=1}^{N}\sum_{k=1}^{K} \gamma(z_{nk}) \left\{\ln\pi_{k}+\ln\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})\right\}$$
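The responsibilities $\gamma(z_{nk})$ can be computed for all $n$ and $k$ at once; a vectorized sketch, assuming some current parameter values $\theta^{(i)}$ (the numbers are illustrative):

```python
import numpy as np
from scipy.stats import multivariate_normal

# illustrative current parameters theta^(i) for K = 2 components in 2-D
pi = np.array([0.5, 0.5])
mu = np.array([[0.0, 0.0], [4.0, 4.0]])
cov = np.stack([np.eye(2), np.eye(2)])

def responsibilities(x):
    # numerator: pi_k N(x_n | mu_k, Sigma_k) for every n, k
    num = np.column_stack([pi[k] * multivariate_normal.pdf(x, mu[k], cov[k])
                           for k in range(len(pi))])
    return num / num.sum(axis=1, keepdims=True)  # each row sums to 1

x = np.array([[0.1, -0.2], [3.9, 4.2], [2.0, 2.0]])
g = responsibilities(x)
print(g)
```

The first point is claimed almost entirely by component 1, the second by component 2, and the midpoint is split evenly.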


6. The M-step

Maximizing $Q(\theta,\theta^{(i)})$ with respect to $\boldsymbol{\mu}_{k}$ and $\boldsymbol{\Sigma}_{k}$ gives:

$$\hat{\boldsymbol{\mu}}_{k}=\frac{1}{N_{k}}\sum_{n=1}^{N}\gamma(z_{nk})\,\boldsymbol{x}_{n}$$

$$\hat{\boldsymbol{\Sigma}}_{k}=\frac{1}{N_{k}}\sum_{n=1}^{N}\gamma(z_{nk})(\boldsymbol{x}_{n}-\hat{\boldsymbol{\mu}}_{k})(\boldsymbol{x}_{n}-\hat{\boldsymbol{\mu}}_{k})^{T}$$

where $N_{k}=\sum_{n=1}^{N}\gamma(z_{nk})$.

Solving for $\pi_{k}$: to maximize $Q(\theta,\theta^{(i)})$ subject to the constraint $\sum_{k}\pi_{k}=1$, use a Lagrange multiplier:

$$\max\ \left\{ \sum_{n=1}^{N}\sum_{k=1}^{K} \gamma(z_{nk}) \left\{\ln\pi_{k}+\ln\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})\right\} +\lambda\Big(\sum_{k}\pi_{k}-1\Big) \right\}$$

\qquad\qquad π k \pi_{k} 求偏导:

\qquad\qquad\qquad   n = 1 N γ ( z n k ) π k + λ = 0 \displaystyle\sum_{n=1}^{N}\frac{\gamma(z_{nk})}{\pi_{k}} +\lambda=0

Multiplying both sides by $\pi_{k}$:

$$\sum_{n=1}^{N}\gamma(z_{nk}) +\pi_{k}\lambda=0$$

\qquad\qquad π k \pi_{k} 求和:

\qquad\qquad\qquad   k = 1 K n = 1 N γ ( z n k ) + k = 1 K π k λ = 0 \displaystyle\sum_{k=1}^{K}\displaystyle\sum_{n=1}^{N}\gamma(z_{nk}) +\displaystyle\sum_{k=1}^{K}\pi_{k}\lambda=0

Since

$$\begin{aligned}\sum_{k=1}^{K}\sum_{n=1}^{N}\gamma(z_{nk}) &=\sum_{k=1}^{K}\sum_{n=1}^{N}\frac{\pi_{k}\,\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})}{\sum\limits_{j=1}^{K}\pi_{j}\,\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{j},\boldsymbol{\Sigma}_{j})} \\ &=\sum_{n=1}^{N}\frac{\sum\limits_{k=1}^{K}\pi_{k}\,\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k})}{\sum\limits_{j=1}^{K}\pi_{j}\,\mathcal{N}(\boldsymbol{x}_{n}|\boldsymbol{\mu}_{j},\boldsymbol{\Sigma}_{j})} = N \end{aligned}$$

we obtain $\lambda=-N$,

and therefore: $\pi_{k}=\dfrac{1}{N}\displaystyle\sum_{n=1}^{N}\gamma(z_{nk}) =\dfrac{N_{k}}{N}$
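The three M-step formulas can be collected into one function; a sketch assuming the responsibilities `gamma` (shape $N\times K$) have already been computed in the E-step:

```python
import numpy as np

def m_step(x, gamma):
    # x: (N, D) data; gamma: (N, K) responsibilities from the E-step
    N, D = x.shape
    Nk = gamma.sum(axis=0)                 # N_k = sum_n gamma(z_nk)
    pi = Nk / N                            # pi_k = N_k / N
    mu = (gamma.T @ x) / Nk[:, None]       # mu_k = (1/N_k) sum_n gamma(z_nk) x_n
    cov = np.empty((gamma.shape[1], D, D))
    for k in range(gamma.shape[1]):
        d = x - mu[k]
        cov[k] = (gamma[:, k, None] * d).T @ d / Nk[k]  # Sigma_k
    return pi, mu, cov

# toy usage: hard (0/1) responsibilities reduce to per-group sample statistics
x = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0]])
gamma = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
pi, mu, cov = m_step(x, gamma)
print(pi, mu)
```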


Code implementation


```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import multivariate_normal

def gen_gaussian(num):
    """Draw `num` samples from each of three 2-D Gaussians and attach a component label."""
    mean1 = [1, 2]
    cov1 = [[5, 3], [3, 10]]
    data1 = np.random.multivariate_normal(mean1, cov1, num)
    data11 = np.concatenate([data1, np.zeros((num, 1))], axis=1)

    mean2 = [9, 7]
    cov2 = [[10, -3], [-3, 5]]
    data2 = np.random.multivariate_normal(mean2, cov2, num)
    data22 = np.concatenate([data2, np.ones((num, 1))], axis=1)

    mean3 = [8, 0]
    cov3 = [[9, 2], [2, 5]]
    data3 = np.random.multivariate_normal(mean3, cov3, num)
    data33 = np.concatenate([data3, 2 * np.ones((num, 1))], axis=1)

    data = np.concatenate([data11, data22, data33], axis=0)
    return np.round(data, 4), data1, data2, data3, data11, data22, data33

def show_all_scatter(data):
    # panel (b): samples without the latent information
    x, y = data[:, 0:2].T
    plt.scatter(x, y, c='k', s=3)
    plt.title("p(x)")
    plt.xlabel("x")
    plt.ylabel("y")

def show_scatter(data1, data2, data3):
    # panel (a): samples colored by their generating component
    x1, y1 = data1.T
    x2, y2 = data2.T
    x3, y3 = data3.T
    plt.scatter(x1, y1, c='b', s=3)
    plt.scatter(x2, y2, c='r', s=3)
    plt.scatter(x3, y3, c='g', s=3)
    plt.title("p(x,z)")
    plt.xlabel("x")
    plt.ylabel("y")

def gmm(indata, pi, mu, cov, K, tol=1e-4):
    """EM for a K-component Gaussian mixture; iterates until the means stop moving."""
    pi0, mu0, cov0 = pi, mu, cov
    n_iter = 0
    while True:
        # E-step: gamma[n, k] = pi_k N(x_n | mu_k, cov_k) / sum_j pi_j N(x_n | mu_j, cov_j)
        t1 = np.zeros((indata.shape[0], K))
        for k in range(K):
            t1[:, k] = multivariate_normal.pdf(indata, mu0[k], cov0[k])
        t3 = t1 * pi0
        gamma = t3 / np.sum(t3, axis=1, keepdims=True)

        mu1 = mu0  # previous means, kept for the convergence test

        # M-step: update pi, mu, cov with the current responsibilities
        nk = np.sum(gamma, axis=0)                        # N_k
        pi0 = nk / indata.shape[0]                        # pi_k = N_k / N
        mu0 = np.transpose(np.dot(indata.T, gamma) / nk)  # mu_k
        for k in range(K):
            d = indata - mu0[k, :]
            cov0[k] = np.dot(d.T * gamma[:, k], d) / nk[k]  # Sigma_k

        n_iter += 1
        if np.sum(np.abs(mu0 - mu1)) < tol:  # abs(), so per-coordinate errors cannot cancel
            break

    print('pi:\n', pi0)
    print('mu:\n', mu0)
    print('cov:\n', cov0)

if __name__ == '__main__':
    data, data1, data2, data3, data11, data22, data33 = gen_gaussian(1000)
    plt.figure(1)
    show_scatter(data1, data2, data3)
    plt.figure(2)
    show_all_scatter(data)
    plt.show()

    # crude data-driven initialization of the means, plus random mixing weights
    indata = data[:, 0:2]
    d_min = np.ceil(np.min(indata))
    d_max = np.floor(np.max(indata))
    K = 3
    mean = np.zeros((3, 2))
    mean[0, :] = [d_min + 1, (d_min + d_max) / 2.0]
    mean[1, :] = [d_max - 1, d_min + 1]
    mean[2, :] = [d_max - 1, d_max - 1]
    cov = np.zeros((3, 2, 2))
    cov[0] = np.eye(2)
    cov[1] = 3 * np.eye(2)
    cov[2] = 5 * np.eye(2)
    pi = np.random.rand(3)
    pi = pi / np.sum(pi)

    gmm(indata, pi, mean, cov, K)
```

The program assumes 3 Gaussian components (1000 points each, so $\pi_{k}\approx0.3333$):

$$\mu_{1}=\begin{bmatrix}1\\2\end{bmatrix},\quad \Sigma_{1}=\begin{bmatrix}5&3\\3&10\end{bmatrix}$$

$$\mu_{2}=\begin{bmatrix}9\\7\end{bmatrix},\quad \Sigma_{2}=\begin{bmatrix}10&-3\\-3&5\end{bmatrix}$$

$$\mu_{3}=\begin{bmatrix}8\\0\end{bmatrix},\quad \Sigma_{3}=\begin{bmatrix}9&2\\2&5\end{bmatrix}$$

One run of the program produced:
```
pi:
 [0.25827959 0.39856687 0.34315353]
mu:
 [[ 0.25804083  2.23741362]
  [ 7.63843604 -0.07660223]
  [ 8.21105998  7.22113956]]
cov:
 [[[ 2.96638168  2.0588839 ]
   [ 2.0588839   7.71483041]]

  [[13.11392913  4.3224706 ]
   [ 4.3224706   5.31874245]]

  [[ 9.36387689 -0.85241427]
   [-0.85241427  4.14231678]]]
```


Reposted from blog.csdn.net/xfijun/article/details/103544683