A Detailed Derivation of the Q Function in the EM Algorithm

The Q function

$\begin{aligned} Q\left(\theta, \theta^{(i)}\right) & =E_Z \left[\log P(Y, Z \mid \theta) \mid Y, \theta^{(i)}\right] \\ & =\sum_Z \log P(Y, Z \mid \theta) \cdot P\left(Z \mid Y, \theta^{(i)}\right) \end{aligned}$

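The Q function is the central quantity of the EM algorithm. It is the expected complete-data log-likelihood: the expectation of $\log P(Y, Z \mid \theta)$ over the latent variable $Z$, conditioned on the observed data $Y$ and the current parameter estimate $\theta^{(i)}$. The E-step computes this expectation, and the M-step maximizes it over $\theta$ to obtain the next parameter estimate.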
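To make the definition concrete, here is a minimal Python sketch that evaluates the Q function for the three-coin model. The helper names (`joint`, `posterior`, `q_function`), the parameter order `(pi, p, q)` and the sample observations are illustrative assumptions, not part of the original derivation. The tosses are assumed independent, so the sum over $Z$ reduces to a sum over each observation and its own latent value.

```python
import math

def joint(y, z, theta):
    """P(y, z | theta): z = 1 means coin B was tossed, z = 0 means coin C."""
    pi, p, q = theta
    if z == 1:
        return pi * (p if y == 1 else 1 - p)
    return (1 - pi) * (q if y == 1 else 1 - q)

def posterior(z, y, theta):
    """P(z | y, theta) by Bayes' rule."""
    return joint(y, z, theta) / (joint(y, 0, theta) + joint(y, 1, theta))

def q_function(theta, theta_i, ys):
    """Sum over observations and latent values of P(z | y, theta_i) * log P(y, z | theta)."""
    return sum(
        posterior(z, y, theta_i) * math.log(joint(y, z, theta))
        for y in ys
        for z in (0, 1)
    )

# Assumed sample data and current estimate, for illustration only.
ys = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]
theta_i = (0.4, 0.6, 0.7)
print(q_function(theta_i, theta_i, ys))
print(q_function((0.5, 0.5, 0.5), theta_i, ys))
```

The first argument is the candidate $\theta$ being scored, and the second is the fixed $\theta^{(i)}$ that determines the posterior weights.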

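An earlier article (EM算法求解三硬币模型参数推导, "Deriving the parameters of the three-coin model with the EM algorithm") walked through the parameter derivation for the three-coin model in Professor Li Hang's 《统计学习方法》 (Statistical Learning Methods). The EM algorithm there was expanded directly from the Q function, and for reasons of space that article did not show where the Q function comes from. This article supplements both that article and the derivation on page 179 of Chapter 9 of Statistical Learning Methods by deriving the Q function in detail.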

Deriving the Q function

We start from the log-likelihood of the parameter $\theta$, where $Y$ is the observed data and $Z$ is the latent variable:
$\begin{aligned} L(\theta) & =\log P(Y \mid \theta) \\ & =\log \frac{P(Y, \theta)}{P(\theta)} \\ & =\log \frac{\sum_Z P(Y, \theta, Z)}{P(\theta)} \\ & =\log \sum_Z \frac{P(Y, \theta, Z)}{P(\theta)}=\log \sum_Z P(Y, Z \mid \theta) \\ & =\log \sum_Z \frac{P(Y, Z, \theta)}{P(Z, \theta)} \cdot \frac{P(Z, \theta)}{P(\theta)} \end{aligned}$
that is,
$L(\theta)=\log \sum_Z P(Y \mid Z, \theta) \cdot P(Z \mid \theta)$
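The marginalization above can be checked numerically. The sketch below assumes the three-coin model with independent tosses; `prior`, `likelihood_given_z` and `log_likelihood` are hypothetical helper names, and the parameter values are arbitrary.

```python
import math

def prior(z, theta):
    """P(z | theta): coin A selects coin B (z = 1) with probability pi."""
    pi, _, _ = theta
    return pi if z == 1 else 1 - pi

def likelihood_given_z(y, z, theta):
    """P(y | z, theta): heads probability is p for coin B and q for coin C."""
    _, p, q = theta
    head = p if z == 1 else q
    return head if y == 1 else 1 - head

def log_likelihood(ys, theta):
    """Sum over observations of log sum_z P(y | z, theta) * P(z | theta)."""
    return sum(
        math.log(sum(likelihood_given_z(y, z, theta) * prior(z, theta) for z in (0, 1)))
        for y in ys
    )

# Assumed illustrative data and parameters.
ys = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]
theta = (0.4, 0.6, 0.7)

# For this model the marginal heads probability is pi * p + (1 - pi) * q.
head = 0.4 * 0.6 + 0.6 * 0.7
closed_form = sum(math.log(head if y == 1 else 1 - head) for y in ys)
print(log_likelihood(ys, theta), closed_form)
```

Both printed values should agree up to floating-point rounding, since the closed form is the same sum over $Z$ written out by hand.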
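Suppose the parameter value at iteration $i$ is $\theta^{(i)}$. We want the updated $\theta$ to increase the log-likelihood, $L(\theta)>L(\theta^{(i)})$, so consider the difference:
$L(\theta)-L\left(\theta^{(i)}\right)=\log \sum_Z P(Y \mid Z, \theta) \cdot P(Z \mid \theta)-\log P\left(Y \mid \theta^{(i)}\right)$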
Inside the sum of the first term, multiply and divide by $P\left(Z \mid Y, \theta^{(i)}\right)$:
$L(\theta)-L\left(\theta^{(i)}\right)=\log \left(\sum_Z P\left(Z \mid Y, \theta^{(i)}\right) \cdot \frac{P(Y \mid Z, \theta) \cdot P(Z \mid \theta)}{P\left(Z \mid Y, \theta^{(i)}\right)}\right)-\log P\left(Y \mid \theta^{(i)}\right)$
Because $\sum_Z P\left(Z \mid Y, \theta^{(i)}\right)=1$, the second term can be multiplied by this sum without changing its value:
$L(\theta)-L\left(\theta^{(i)}\right)=\log \left(\sum_Z P\left(Z \mid Y, \theta^{(i)}\right) \cdot \frac{P(Y \mid Z, \theta) \cdot P(Z \mid \theta)}{P\left(Z \mid Y, \theta^{(i)}\right)}\right)-\log P\left(Y \mid \theta^{(i)}\right) \cdot \sum_Z P\left(Z \mid Y, \theta^{(i)}\right)$
Now use Jensen's inequality for the concave logarithm:
$\log \sum_{j} \lambda_j \cdot y_j \geqslant \sum_j \lambda_j \cdot \log y_j$, where $\lambda_j \geqslant 0$, $\sum_j \lambda_j=1$ and $y_j>0$
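As a quick sanity check of the inequality, the sketch below evaluates both sides for a few randomly drawn weight vectors and positive values. The function name `jensen_gap`, the seed and the value ranges are arbitrary choices made for illustration.

```python
import math
import random

def jensen_gap(weights, values):
    """Return log(sum_j w_j * y_j) - sum_j w_j * log(y_j); Jensen says this is >= 0."""
    lhs = math.log(sum(w * v for w, v in zip(weights, values)))
    rhs = sum(w * math.log(v) for w, v in zip(weights, values))
    return lhs - rhs

random.seed(0)  # fixed seed so the run is repeatable
for _ in range(5):
    raw = [random.random() for _ in range(4)]
    weights = [r / sum(raw) for r in raw]  # normalize so the weights sum to 1
    values = [random.uniform(0.1, 10.0) for _ in range(4)]
    print(round(jensen_gap(weights, values), 6))
```

Every printed gap should be non-negative, and the gap shrinks to zero when all the values $y_j$ are equal.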

Applying the inequality to the first term, with the weights $\lambda_j$ given by $P\left(Z \mid Y, \theta^{(i)}\right)$, gives
$\begin{aligned} L(\theta)-L\left(\theta^{(i)}\right) & \geqslant \sum_Z P\left(Z \mid Y, \theta^{(i)}\right) \cdot \log \frac{P(Y \mid Z, \theta) \cdot P(Z \mid \theta)}{P\left(Z \mid Y, \theta^{(i)}\right)}-\log P\left(Y \mid \theta^{(i)}\right) \cdot \sum_Z P\left(Z \mid Y, \theta^{(i)}\right) \\ & =\sum_Z P\left(Z \mid Y, \theta^{(i)}\right) \cdot\left[\log \frac{P(Y \mid Z, \theta) \cdot P(Z \mid \theta)}{P\left(Z \mid Y, \theta^{(i)}\right)}-\log P\left(Y \mid \theta^{(i)}\right)\right] \\ & =\sum_Z P\left(Z \mid Y, \theta^{(i)}\right) \cdot \log \frac{P(Y \mid Z, \theta) \cdot P(Z \mid \theta)}{P\left(Z \mid Y, \theta^{(i)}\right) \cdot P\left(Y \mid \theta^{(i)}\right)} \end{aligned}$
So we have $L(\theta)-L\left(\theta^{(i)}\right) \geqslant \sum_Z P\left(Z \mid Y, \theta^{(i)}\right) \cdot \log \frac{P(Y \mid Z, \theta) \cdot P(Z \mid \theta)}{P\left(Z \mid Y, \theta^{(i)}\right) \cdot P\left(Y \mid \theta^{(i)}\right)}$
that is, $L(\theta) \geqslant L\left(\theta^{(i)}\right)+\sum_Z P\left(Z \mid Y, \theta^{(i)}\right) \cdot \log \frac{P(Y \mid Z, \theta) \cdot P(Z \mid \theta)}{P\left(Z \mid Y, \theta^{(i)}\right) \cdot P\left(Y \mid \theta^{(i)}\right)}$
Define $B\left(\theta, \theta^{(i)}\right)=L\left(\theta^{(i)}\right)+\sum_Z P\left(Z \mid Y, \theta^{(i)}\right) \cdot \log \frac{P(Y \mid Z, \theta) \cdot P(Z \mid \theta)}{P\left(Z \mid Y, \theta^{(i)}\right) \cdot P\left(Y \mid \theta^{(i)}\right)}$
Then $B\left(\theta, \theta^{(i)}\right)$ is a lower bound of $L(\theta)$, and the two coincide at $\theta=\theta^{(i)}$, where the ratio inside the logarithm equals 1, so $B\left(\theta^{(i)}, \theta^{(i)}\right)=L\left(\theta^{(i)}\right)$. Any $\theta$ that raises $B\left(\theta, \theta^{(i)}\right)$ above $B\left(\theta^{(i)}, \theta^{(i)}\right)$ therefore also raises $L(\theta)$ above $L\left(\theta^{(i)}\right)$.
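The lower-bound property can be illustrated numerically. This is a minimal sketch assuming the three-coin model with independent tosses; `joint`, `marginal`, `log_likelihood` and `lower_bound` are hypothetical helper names, and the data and candidate parameters are arbitrary.

```python
import math

def joint(y, z, theta):
    """P(y, z | theta): z = 1 means coin B was tossed, z = 0 means coin C."""
    pi, p, q = theta
    if z == 1:
        return pi * (p if y == 1 else 1 - p)
    return (1 - pi) * (q if y == 1 else 1 - q)

def marginal(y, theta):
    """P(y | theta), summing the joint over both latent values."""
    return joint(y, 0, theta) + joint(y, 1, theta)

def log_likelihood(ys, theta):
    return sum(math.log(marginal(y, theta)) for y in ys)

def lower_bound(theta, theta_i, ys):
    """B(theta, theta_i) = L(theta_i) + sum of posterior-weighted log ratios."""
    total = log_likelihood(ys, theta_i)
    for y in ys:
        for z in (0, 1):
            weight = joint(y, z, theta_i) / marginal(y, theta_i)  # P(z | y, theta_i)
            ratio = joint(y, z, theta) / (weight * marginal(y, theta_i))
            total += weight * math.log(ratio)
    return total

# Assumed illustrative data and current estimate.
ys = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]
theta_i = (0.4, 0.6, 0.7)
for theta in [(0.3, 0.5, 0.8), (0.5, 0.5, 0.5), theta_i]:
    print(theta, lower_bound(theta, theta_i, ys), log_likelihood(ys, theta))
```

In each printed row the bound should not exceed the log-likelihood, and in the last row (where $\theta=\theta^{(i)}$) the two values should match up to rounding.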

The update is therefore chosen as $\theta^{(i+1)}=\underset{\theta}{\operatorname{argmax}} B\left(\theta, \theta^{(i)}\right)$
that is,
$\theta^{(i+1)}=\underset{\theta}{\operatorname{argmax}} \left[L\left(\theta^{(i)}\right)+\sum_Z P\left(Z \mid Y, \theta^{(i)}\right) \cdot \log \frac{P(Y \mid Z, \theta) \cdot P(Z \mid \theta)}{P\left(Z \mid Y, \theta^{(i)}\right) \cdot P\left(Y \mid \theta^{(i)}\right)}\right]$

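With respect to $\theta$, the quantities $L\left(\theta^{(i)}\right)$, $P\left(Z \mid Y, \theta^{(i)}\right)$ and $P\left(Z \mid Y, \theta^{(i)}\right) \cdot P\left(Y \mid \theta^{(i)}\right)$ are constants.
Splitting the logarithm of the quotient into a difference, the denominator contributes $-\sum_Z P\left(Z \mid Y, \theta^{(i)}\right) \cdot \log \left[P\left(Z \mid Y, \theta^{(i)}\right) \cdot P\left(Y \mid \theta^{(i)}\right)\right]$, which does not depend on $\theta$ (and is well defined because $P\left(Z \mid Y, \theta^{(i)}\right) \cdot P\left(Y \mid \theta^{(i)}\right)>0$). Dropping it together with $L\left(\theta^{(i)}\right)$ does not change the maximizer. Note that the weight $P\left(Z \mid Y, \theta^{(i)}\right)$ in front of the logarithm cannot be dropped, because it is the coefficient of each term in the sum over $Z$.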

Hence
$\theta^{(i+1)}=\underset{\theta}{\operatorname{argmax}}\left[\sum_Z P\left(Z \mid Y, \theta^{(i)}\right) \cdot \log \left[P(Y \mid Z, \theta) \cdot P(Z \mid \theta)\right]\right]$
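One way to convince yourself that dropping the constant terms leaves the maximizer unchanged is to check that $B\left(\theta, \theta^{(i)}\right)$ minus the remaining weighted sum takes the same value for every $\theta$. The sketch below does this for the three-coin model with independent tosses; all function names, data and candidate parameters are illustrative assumptions.

```python
import math

def joint(y, z, theta):
    """P(y, z | theta) = P(y | z, theta) * P(z | theta) for the three-coin model."""
    pi, p, q = theta
    if z == 1:
        return pi * (p if y == 1 else 1 - p)
    return (1 - pi) * (q if y == 1 else 1 - q)

def marginal(y, theta):
    return joint(y, 0, theta) + joint(y, 1, theta)

def posterior(z, y, theta):
    return joint(y, z, theta) / marginal(y, theta)

def weighted_log_joint(theta, theta_i, ys):
    """The bracketed sum: posterior weights times log[P(y | z, theta) * P(z | theta)]."""
    return sum(
        posterior(z, y, theta_i) * math.log(joint(y, z, theta))
        for y in ys
        for z in (0, 1)
    )

def lower_bound(theta, theta_i, ys):
    """B(theta, theta_i), including the terms that are constant in theta."""
    total = sum(math.log(marginal(y, theta_i)) for y in ys)  # L(theta_i)
    for y in ys:
        for z in (0, 1):
            weight = posterior(z, y, theta_i)
            total += weight * math.log(joint(y, z, theta) / (weight * marginal(y, theta_i)))
    return total

# Assumed illustrative data, current estimate and candidate parameters.
ys = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]
theta_i = (0.4, 0.6, 0.7)
candidates = [(0.3, 0.5, 0.8), (0.5, 0.5, 0.5), (0.7, 0.2, 0.9)]
gaps = [lower_bound(t, theta_i, ys) - weighted_log_joint(t, theta_i, ys) for t in candidates]
print(gaps)
```

The printed gaps should be identical up to rounding, which means the two objectives differ only by a constant and share the same argmax.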

Define $Q\left(\theta, \theta^{(i)}\right)=\sum_Z P\left(Z \mid Y, \theta^{(i)}\right) \cdot \log \left[P(Y \mid Z, \theta) \cdot P(Z \mid \theta)\right]=\sum_Z P\left(Z \mid Y, \theta^{(i)}\right) \cdot \log P(Y, Z \mid \theta)$
so that
$\theta^{(i+1)}=\underset{\theta}{\operatorname{argmax}} Q\left(\theta, \theta^{(i)}\right)$

Here $Q\left(\theta, \theta^{(i)}\right)$ is exactly the Q function defined at the start of this article.
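Putting the pieces together, the following sketch runs EM on the three-coin model. It assumes the closed-form M-step updates given for this model in Statistical Learning Methods (the maximizer of the Q function in $\pi$, $p$ and $q$), and uses the ten-toss observation sequence from that example. The function names and the iteration count are hypothetical choices.

```python
import math

def e_step(ys, theta):
    """Posterior probability mu_j that observation y_j came from coin B."""
    pi, p, q = theta
    mus = []
    for y in ys:
        from_b = pi * (p if y == 1 else 1 - p)
        from_c = (1 - pi) * (q if y == 1 else 1 - q)
        mus.append(from_b / (from_b + from_c))
    return mus

def m_step(ys, mus):
    """Closed-form maximizer of Q for the three-coin model (assumed from the book)."""
    n = len(ys)
    sum_mu = sum(mus)
    pi = sum_mu / n
    p = sum(mu * y for mu, y in zip(mus, ys)) / sum_mu
    q = sum((1 - mu) * y for mu, y in zip(mus, ys)) / (n - sum_mu)
    return pi, p, q

def log_likelihood(ys, theta):
    pi, p, q = theta
    head = pi * p + (1 - pi) * q  # marginal probability of observing heads
    return sum(math.log(head if y == 1 else 1 - head) for y in ys)

def run_em(ys, theta, iterations=20):
    """Alternate E and M steps, recording the log-likelihood after each update."""
    history = [log_likelihood(ys, theta)]
    for _ in range(iterations):
        theta = m_step(ys, e_step(ys, theta))
        history.append(log_likelihood(ys, theta))
    return theta, history

ys = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]
theta, history = run_em(ys, (0.4, 0.6, 0.7))
print([round(v, 4) for v in theta])
print(history[0], history[-1])
```

Starting from $(0.4, 0.6, 0.7)$, the estimate should settle near $(0.4064, 0.5368, 0.6432)$, and the recorded log-likelihood should never decrease, which is the behavior the lower-bound argument predicts. A different starting point can lead to a different fixed point, so the result depends on the initial value.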

References

[1] EM算法求解三硬币模型参数推导 (Deriving the parameters of the three-coin model with the EM algorithm)