[Machine Learning Algorithms] Gaussian Discriminant Analysis (GDA)


Gaussian Discriminant Analysis

Gaussian discriminant analysis (GDA) is a fairly intuitive model. It is a generative model that takes a soft-classification approach: when deciding the class of a sample, we use a probability model rather than mapping the sample directly to a class with a function. A generative model obtains $P(y|x)$ by modeling the joint probability. It assumes
$$
\begin{aligned}
&y \sim \mathrm{Bernoulli}(\phi) \\
&x\mid y=1 \sim N(\mu_1,\Sigma) \\
&x\mid y=0 \sim N(\mu_2,\Sigma)
\end{aligned}
$$
Then
$$
\begin{aligned}
&P(y)=\phi^y(1-\phi)^{1-y} \\
&P(x\mid y)=N(\mu_1,\Sigma)^y\cdot N(\mu_2,\Sigma)^{1-y}
\end{aligned}
$$
The parameters of the model are
$$
\theta=(\mu_1,\mu_2,\Sigma,\phi)
$$
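To make these assumptions concrete, here is a minimal NumPy sketch that samples from the generative model; the parameter values below are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameter values, for illustration only
phi = 0.4                                   # P(y = 1)
mu1 = np.array([2.0, 2.0])                  # mean of x | y = 1
mu2 = np.array([-1.0, -1.0])                # mean of x | y = 0
Sigma = np.array([[1.0, 0.3],
                  [0.3, 1.0]])              # shared covariance

N = 500
y = rng.binomial(1, phi, size=N)                              # y ~ Bernoulli(phi)
means = np.where(y[:, None] == 1, mu1, mu2)                   # mu1 if y = 1, else mu2
x = means + rng.multivariate_normal(np.zeros(2), Sigma, N)    # x | y ~ N(mu_y, Sigma)
```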
For a generative model, the objective we want to solve is
$$
\hat y=\arg\max_{y\in\{0,1\}}p(y\mid x)=\arg\max_y\,p(y)p(x\mid y)
$$
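As a sketch of this decision rule (one straightforward way to implement it, using SciPy's multivariate normal log-density), we compare the two unnormalized log-posteriors $\log p(y)+\log p(x\mid y)$ and take the larger:

```python
import numpy as np
from scipy.stats import multivariate_normal

def predict(x, phi, mu1, mu2, Sigma):
    """Return argmax_y p(y) * p(x|y) for a single sample x."""
    log_p1 = np.log(phi) + multivariate_normal.logpdf(x, mean=mu1, cov=Sigma)      # y = 1
    log_p0 = np.log(1 - phi) + multivariate_normal.logpdf(x, mean=mu2, cov=Sigma)  # y = 0
    return int(log_p1 > log_p0)
```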
Defining the likelihood function, we have
$$
\begin{aligned}
\hat\theta&=\arg\max_\theta l(\theta) \\
&=\arg\max_\theta \log\prod_{i=1}^N p(x_i,y_i) \\
&=\arg\max_\theta \log\prod_{i=1}^N p(y_i)p(x_i\mid y_i) \\
&=\arg\max_\theta \sum_{i=1}^N\Big(\log N(\mu_1,\Sigma)^{y_i}+\log N(\mu_2,\Sigma)^{1-y_i}+\log \phi^{y_i}(1-\phi)^{1-y_i}\Big)
\end{aligned}
$$

• $\phi$:
$$
\begin{aligned}
&\frac{\partial l(\theta)}{\partial \phi}=\sum_{i=1}^N\Big(y_i\frac{1}{\phi}-(1-y_i)\frac{1}{1-\phi}\Big)=0 \\
&\iff \sum_{i=1}^N\big(y_i(1-\phi)-(1-y_i)\phi\big)=0 \\
&\iff \sum_{i=1}^N(y_i-\phi)=0 \\
&\iff \sum_{i=1}^N y_i-N\phi=0 \\
&\iff \hat\phi=\frac{1}{N}\sum_{i=1}^N y_i=\frac{N_1}{N}
\end{aligned}
$$

where $N_1$ denotes the number of samples with $y_i=1$ (and $N_2=N-N_1$).
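In code this estimate is just the fraction of positive labels; a one-line sketch reusing the simulated `y` from the sampling sketch above:

```python
N1 = y.sum()              # number of samples with y = 1
phi_hat = N1 / len(y)     # closed-form MLE: phi_hat = N1 / N
```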
• $\mu_1,\mu_2$:
  The two are solved in exactly the same way, so we only derive $\mu_1$. Since we solve for $\mu_1$ alone, the objective simplifies to
$$
\begin{aligned}
&\sum_{i=1}^N y_i\log \frac{1}{(2\pi)^{\frac{p}{2}}|\Sigma|^{\frac{1}{2}}}\exp\Big(-\frac{1}{2}(x_i-\mu_1)^T\Sigma^{-1}(x_i-\mu_1)\Big) \\
&=\sum_{i=1}^N y_i\log \frac{1}{(2\pi)^{\frac{p}{2}}|\Sigma|^{\frac{1}{2}}}\exp\Big(-\frac{1}{2}(x_i^T\Sigma^{-1}-\mu_1^T\Sigma^{-1})(x_i-\mu_1)\Big) \\
&=\sum_{i=1}^N y_i\log \frac{1}{(2\pi)^{\frac{p}{2}}|\Sigma|^{\frac{1}{2}}}\exp\Big(-\frac{1}{2}(x_i^T\Sigma^{-1}x_i-2\mu_1^T\Sigma^{-1}x_i+\mu_1^T\Sigma^{-1}\mu_1)\Big)
\end{aligned}
$$
  Taking the derivative of the above with respect to $\mu_1$ and setting it to zero gives
$$
\begin{aligned}
&-\frac{1}{2}\sum_{i=1}^N y_i(-2\Sigma^{-1}x_i+2\Sigma^{-1}\mu_1)=0 \\
&\iff \sum_{i=1}^N y_i(\Sigma^{-1}\mu_1-\Sigma^{-1}x_i)=0 \\
&\iff \sum_{i=1}^N y_i(\mu_1-x_i)=0 \\
&\iff \sum_{i=1}^N y_i\mu_1=\sum_{i=1}^N y_ix_i \\
&\iff \hat\mu_1=\frac{\sum\limits_{i=1}^N y_ix_i}{\sum\limits_{i=1}^N y_i}=\frac{\sum\limits_{i=1}^N y_ix_i}{N_1}
\end{aligned}
$$
  Similarly,
$$
\hat\mu_2=\frac{\sum\limits_{i=1}^N(1-y_i)x_i}{\sum\limits_{i=1}^N(1-y_i)}=\frac{\sum\limits_{i=1}^N(1-y_i)x_i}{N_2}
$$
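Both estimates are per-class averages; a minimal sketch reusing the simulated `x` and `y` from the sampling sketch above:

```python
N1 = y.sum()
N2 = len(y) - N1
mu1_hat = (y[:, None] * x).sum(axis=0) / N1          # average of samples with y = 1
mu2_hat = ((1 - y)[:, None] * x).sum(axis=0) / N2    # average of samples with y = 0
```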
• $\Sigma$:
  We first simplify the generic term $\sum_{i=1}^N\log N(\mu,\Sigma)$:
$$
\begin{aligned}
\sum_{i=1}^N\log N(\mu,\Sigma)
&=\sum_{i=1}^N \log \frac{1}{(2\pi)^{\frac{p}{2}}|\Sigma|^{\frac{1}{2}}}\exp\Big(-\frac{1}{2}(x_i-\mu)^T\Sigma^{-1}(x_i-\mu)\Big) \\
&=\sum_{i=1}^N\Big(\log \frac{1}{(2\pi)^{\frac{p}{2}}}+\log|\Sigma|^{-\frac{1}{2}}-\frac{1}{2}(x_i-\mu)^T\Sigma^{-1}(x_i-\mu)\Big) \\
&=\sum_{i=1}^N\Big(C-\frac{1}{2}\log|\Sigma|-\frac{1}{2}(x_i-\mu)^T\Sigma^{-1}(x_i-\mu)\Big) \\
&=C-\frac{1}{2}N\log|\Sigma|-\frac{1}{2}\,tr\Big(\sum_{i=1}^N(x_i-\mu)^T\Sigma^{-1}(x_i-\mu)\Big) \\
&=C-\frac{1}{2}N\log|\Sigma|-\frac{1}{2}\,tr\Big(\sum_{i=1}^N(x_i-\mu)(x_i-\mu)^T\Sigma^{-1}\Big) \\
&=-\frac{1}{2}N\log|\Sigma|-\frac{1}{2}N\,tr(S\Sigma^{-1})+C
\end{aligned}
$$

  where $C$ collects the terms that do not depend on $\Sigma$ and $S=\frac{1}{N}\sum_{i=1}^N(x_i-\mu)(x_i-\mu)^T$ is the sample covariance matrix.
  Since we only need to solve for $\Sigma$, the likelihood simplifies to
$$
\begin{aligned}
&\sum_{i=1}^N\Big(y_i\log N(\mu_1,\Sigma)+(1-y_i)\log N(\mu_2,\Sigma)\Big) \\
&=\sum_{x_i\in c_1}\log N(\mu_1,\Sigma)+\sum_{x_i\in c_2}\log N(\mu_2,\Sigma) \\
&=-\frac{1}{2}N_1\log|\Sigma|-\frac{1}{2}N_1\,tr(S_1\Sigma^{-1})-\frac{1}{2}N_2\log|\Sigma|-\frac{1}{2}N_2\,tr(S_2\Sigma^{-1})+C \\
&=-\frac{1}{2}\Big(N_1\log|\Sigma|+N_1\,tr(S_1\Sigma^{-1})+N_2\log|\Sigma|+N_2\,tr(S_2\Sigma^{-1})\Big)+C
\end{aligned}
$$

  where $c_1$ and $c_2$ denote the samples with $y_i=1$ and $y_i=0$, and $S_1$, $S_2$ are the corresponding within-class sample covariance matrices.
  Using the trace and determinant derivative identities
$$
\begin{aligned}
&\frac{\partial\,tr(AB)}{\partial A}=B^T \\
&\frac{\partial |A|}{\partial A}=|A|\cdot A^{-1}\quad(A\text{ symmetric}) \\
&tr(AB)=tr(BA)
\end{aligned}
$$
  we differentiate the simplified expression above with respect to $\Sigma$ and set the derivative to zero:
$$
\begin{aligned}
&-\frac{1}{2}\Big(N\frac{1}{|\Sigma|}|\Sigma|\Sigma^{-1}+N_1\frac{\partial\,tr(\Sigma^{-1}S_1)}{\partial \Sigma^{-1}}\frac{\partial\,tr(\Sigma^{-1})}{\partial \Sigma}+N_2\frac{\partial\,tr(\Sigma^{-1}S_2)}{\partial \Sigma^{-1}}\frac{\partial\,tr(\Sigma^{-1})}{\partial \Sigma}\Big)=0 \\
&\iff N\Sigma^{-1}-N_1S_1^T\Sigma^{-2}-N_2S_2^T\Sigma^{-2}=0 \\
&\iff N\Sigma^{-1}-N_1S_1\Sigma^{-2}-N_2S_2\Sigma^{-2}=0 \\
&\iff N\Sigma-N_1S_1-N_2S_2=0 \\
&\iff \hat\Sigma=\frac{1}{N}(N_1S_1+N_2S_2)
\end{aligned}
$$
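Collecting the four closed-form estimates ($\hat\phi$, $\hat\mu_1$, $\hat\mu_2$, $\hat\Sigma$), here is a minimal sketch of the full MLE fit; it assumes binary labels in a NumPy array `y` and a sample matrix `x` of shape (N, p), as in the earlier sketches.

```python
def fit_gda(x, y):
    """Closed-form MLE for GDA: phi, mu1, mu2, and the pooled covariance Sigma."""
    N = len(y)
    N1 = int(y.sum())
    N2 = N - N1
    phi = N1 / N                              # fraction of samples with y = 1
    mu1 = x[y == 1].mean(axis=0)              # class-conditional mean for y = 1
    mu2 = x[y == 0].mean(axis=0)              # class-conditional mean for y = 0
    d1 = x[y == 1] - mu1
    d2 = x[y == 0] - mu2
    S1 = d1.T @ d1 / N1                       # within-class sample covariance, y = 1
    S2 = d2.T @ d2 / N2                       # within-class sample covariance, y = 0
    Sigma = (N1 * S1 + N2 * S2) / N           # pooled estimate: (N1*S1 + N2*S2) / N
    return phi, mu1, mu2, Sigma

# Example usage with the simulated data from the sampling sketch above:
# phi_hat, mu1_hat, mu2_hat, Sigma_hat = fit_gda(x, y)
```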
