My understanding of parameter gradient updates for logistic regression

I read several blog posts on the parameter update for logistic regression and still felt my understanding was shaky, so I am writing it out myself in the hope of understanding it more deeply. Logistic regression first computes a linear function $\boldsymbol{W}\boldsymbol{x}+\boldsymbol{b}$ of the input. To keep things simple, consider a batch size of 1. The input $\boldsymbol{x}$ is then an $n\times 1$ vector, the label is one-hot encoded as an $m\times 1$ vector, so $\boldsymbol{b}$ is also an $m\times 1$ vector and the parameter $\boldsymbol{W}$ is an $m\times n$ matrix. With $n=4$ and $m=3$, logistic regression can be drawn as follows:
(Figure: the network graph of this logistic regression, with the four inputs $x_1,\dots,x_4$ connected to the three outputs $y^1,y^2,y^3$.)
The label is one-hot encoded with length 3; if the class index is 1, its encoding is $\{1,0,0\}^T$, which in the figure above corresponds to $y^1_*=1$, $y^2_*=0$, $y^3_*=0$. The loss $L$ is the cross-entropy term between $y^1$ and $y^1_*$, plus the term between $y^2$ and $y^2_*$, plus the term between $y^3$ and $y^3_*$:
$$
\begin{aligned}
L&=\sum_{i=1}^3 y^i_*\log y^i\\
&=y^1_*\log y^1+y^2_*\log y^2+y^3_*\log y^3
\end{aligned}
$$
(Strictly speaking, the cross-entropy loss carries a leading minus sign; I drop it here, so this $L$ is really the log-likelihood and the gradients derived below are ascent directions — negating them gives the usual descent gradients.)
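As a quick sanity check, here is a tiny numpy sketch of this loss (the numbers are made up for illustration); with a one-hot label only the true-class term survives:

```python
import numpy as np

y_star = np.array([1.0, 0.0, 0.0])   # one-hot label, class index 1
y_hat  = np.array([0.7, 0.2, 0.1])   # softmax outputs y^1, y^2, y^3 (made-up values)

# L = sum_i y^i_* * log(y^i); only the true-class term is non-zero
L = np.sum(y_star * np.log(y_hat))
print(L, np.log(0.7))                # both print log(0.7) ≈ -0.3567
```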
In the loss above, the outputs $y^i$ are softmax functions of the logits $z^i$:
$$
\begin{aligned}
y^1&=\frac{e^{z^1}}{e^{z^1}+e^{z^2}+e^{z^3}}\\
y^2&=\frac{e^{z^2}}{e^{z^1}+e^{z^2}+e^{z^3}}\\
y^3&=\frac{e^{z^3}}{e^{z^1}+e^{z^2}+e^{z^3}}
\end{aligned}
$$
and each logit $z^i$ is a linear function of the input:
$$
\begin{aligned}
z^1&=\boldsymbol{w_1}^T \boldsymbol{x}+b_1\\
z^2&=\boldsymbol{w_2}^T \boldsymbol{x}+b_2\\
z^3&=\boldsymbol{w_3}^T \boldsymbol{x}+b_3
\end{aligned}
$$
Here $\boldsymbol{w_1}=\{w_{11},w_{12},w_{13},w_{14}\}^T$ is the first row of $\boldsymbol{W}$ and $\boldsymbol{x}=\{x_{1},x_{2},x_{3},x_{4}\}^T$.
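Stacking the three rows gives $\boldsymbol{z}=\boldsymbol{W}\boldsymbol{x}+\boldsymbol{b}$. A minimal numpy sketch of this forward pass (the variable names and values are my own, just to make the shapes concrete):

```python
import numpy as np

n, m = 4, 3                              # input dimension, number of classes
rng = np.random.default_rng(0)

W = rng.normal(size=(m, n))              # rows are w_1^T, w_2^T, w_3^T
b = rng.normal(size=m)
x = rng.normal(size=n)                   # a single sample (batch size 1)

z = W @ x + b                            # z^i = w_i^T x + b_i
y = np.exp(z) / np.exp(z).sum()          # softmax: y^i = e^{z^i} / sum_j e^{z^j}
print(z.shape, y.shape, y.sum())         # (3,) (3,) 1.0 (up to rounding)
```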

With these definitions, differentiate the loss $L$ with respect to $\boldsymbol{w_1}$, using the softmax derivatives $\frac{\partial y^1}{\partial z^1}=y^1(1-y^1)$ and $\frac{\partial y^j}{\partial z^1}=-y^1 y^j$ for $j\neq 1$:
$$
\begin{aligned}
\frac{\partial L}{\partial \boldsymbol{w_1}}
&=\frac{\partial L}{\partial y^1}\frac{\partial y^1}{\partial z^1}\frac{\partial z^1}{\partial \boldsymbol{w_1}}
 +\frac{\partial L}{\partial y^2}\frac{\partial y^2}{\partial z^1}\frac{\partial z^1}{\partial \boldsymbol{w_1}}
 +\frac{\partial L}{\partial y^3}\frac{\partial y^3}{\partial z^1}\frac{\partial z^1}{\partial \boldsymbol{w_1}}\\
&=\frac{y^1_*}{y^1}\times y^1(1-y^1)\times \boldsymbol{x}
 -\frac{y^2_*}{y^2}\times y^1y^2\times \boldsymbol{x}
 -\frac{y^3_*}{y^3}\times y^1y^3\times \boldsymbol{x}\\
&=\bigl(y^1_*(1-y^1)-y^2_*y^1-y^3_*y^1\bigr)\boldsymbol{x}\\
&=\bigl(y^1_*-y^1(y^1_*+y^2_*+y^3_*)\bigr)\boldsymbol{x}\\
&=(y^1_*-y^1)\boldsymbol{x}
\end{aligned}
$$
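The result $(y^1_*-y^1)\boldsymbol{x}$ can be double-checked numerically with a finite-difference comparison; this sketch is my own and reuses the hypothetical setup from above:

```python
import numpy as np

def loss(W, b, x, y_star):
    """L = sum_i y^i_* log y^i, with the sign convention used in this post."""
    z = W @ x + b
    y = np.exp(z) / np.exp(z).sum()
    return np.sum(y_star * np.log(y))

rng = np.random.default_rng(0)
W, b = rng.normal(size=(3, 4)), rng.normal(size=3)
x, y_star = rng.normal(size=4), np.array([1.0, 0.0, 0.0])

z = W @ x + b
y = np.exp(z) / np.exp(z).sum()
analytic = (y_star[0] - y[0]) * x            # the derived gradient w.r.t. w_1

numeric, eps = np.zeros(4), 1e-6
for j in range(4):                           # central differences on each w_{1j}
    Wp, Wm = W.copy(), W.copy()
    Wp[0, j] += eps
    Wm[0, j] -= eps
    numeric[j] = (loss(Wp, b, x, y_star) - loss(Wm, b, x, y_star)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-6))   # True
```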
Note that $y^1_*$, $y^2_*$, $y^3_*$ are the three entries of the one-hot label, so $y^1_*+y^2_*+y^3_*=1$, which is what collapses the last line above. The remaining two derivatives follow in exactly the same way:
$$
\frac{\partial L}{\partial \boldsymbol{w_2}} = (y^2_*-y^2)\boldsymbol{x},\qquad
\frac{\partial L}{\partial \boldsymbol{w_3}} = (y^3_*-y^3)\boldsymbol{x}
$$
Collecting the three rows, the gradient of the loss $L$ with respect to $\boldsymbol{W}$ is:
$$
\frac{\partial L}{\partial \boldsymbol{W}}=
\begin{bmatrix}
(y^1_*-y^1)x_1 & (y^2_*-y^2)x_1 & (y^3_*-y^3)x_1\\
(y^1_*-y^1)x_2 & (y^2_*-y^2)x_2 & (y^3_*-y^3)x_2\\
(y^1_*-y^1)x_3 & (y^2_*-y^2)x_3 & (y^3_*-y^3)x_3\\
(y^1_*-y^1)x_4 & (y^2_*-y^2)x_4 & (y^3_*-y^3)x_4
\end{bmatrix}^T
$$
Written with numpy's outer product, this gradient is (note the transpose, so that the result has the same $3\times 4$ shape as $\boldsymbol{W}$):
$$
\frac{\partial L}{\partial \boldsymbol{W}}=\mathrm{numpy.outer}(\boldsymbol{x},\,\boldsymbol{y_*}-\boldsymbol{y})^T=\mathrm{numpy.outer}(\boldsymbol{y_*}-\boldsymbol{y},\,\boldsymbol{x})
$$
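A small numpy check of this identity (again a hedged sketch with made-up values, reusing the setup from the earlier snippets):

```python
import numpy as np

rng = np.random.default_rng(0)
W, b = rng.normal(size=(3, 4)), rng.normal(size=3)
x, y_star = rng.normal(size=4), np.array([1.0, 0.0, 0.0])
z = W @ x + b
y = np.exp(z) / np.exp(z).sum()

grad_W = np.outer(x, y_star - y).T       # (4, 3) outer product, transposed to (3, 4)
print(grad_W.shape == W.shape)                            # True
print(np.allclose(grad_W[0], (y_star[0] - y[0]) * x))     # row 1 is (y^1_* - y^1) x
```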
The same derivation applied to the biases gives:
$$
\frac{\partial L}{\partial \boldsymbol{b}}=\boldsymbol{y_*}-\boldsymbol{y}
$$
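Putting everything together, one parameter update for a single sample might look like the sketch below. This is my own illustration, not code from the original post: since $L$ here is the log-likelihood, the step is gradient ascent; for the usual cross-entropy loss ($-L$) you would negate the gradients and descend, which lands on exactly the same update.

```python
import numpy as np

def update(W, b, x, y_star, lr=0.1):
    """One gradient step for a single sample (batch size 1); lr is a hypothetical learning rate."""
    z = W @ x + b
    y = np.exp(z) / np.exp(z).sum()      # softmax output
    grad_W = np.outer(y_star - y, x)     # dL/dW, shape (m, n)
    grad_b = y_star - y                  # dL/db, shape (m,)
    # ascend the log-likelihood L (equivalently, descend the cross-entropy -L)
    return W + lr * grad_W, b + lr * grad_b

# usage with made-up values
rng = np.random.default_rng(0)
W, b = rng.normal(size=(3, 4)), rng.normal(size=3)
x, y_star = rng.normal(size=4), np.array([1.0, 0.0, 0.0])
W, b = update(W, b, x, y_star)
```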

Reposted from blog.csdn.net/zhuzheqing/article/details/129417845