Deep Learning (the "flower book") - 4.3 Gradient-Based Optimization


1. Vector calculus

Reference: 向量微积分基础 (Fundamentals of Vector Calculus), 文剑木然的专栏, CSDN blog

Common differentiation formulas. They are written in numerator layout, where the derivative of a scalar with respect to a column vector is a row vector; transposing each result gives the denominator-layout form.

$$\frac{\partial \mathbf{A} \mathbf{x}}{\partial \mathbf{x}}=\mathbf{A}\tag{1}$$

$$\frac{\partial \mathbf{x}^{\top} \mathbf{A}}{\partial \mathbf{x}}=\mathbf{A}^{\top}\tag{2}$$

$$\frac{\partial \mathbf{x}^{\top} \mathbf{x}}{\partial \mathbf{x}}=2 \mathbf{x}^{\top}\tag{3}$$

$$\frac{\partial \mathbf{x}^{\top} \mathbf{A} \mathbf{x}}{\partial \mathbf{x}}=\mathbf{x}^{\top}\left(\mathbf{A}+\mathbf{A}^{\top}\right)\tag{4}$$

$$\frac{\partial(\mathbf{u}+\mathbf{v})}{\partial \mathbf{x}}=\frac{\partial \mathbf{u}}{\partial \mathbf{x}}+\frac{\partial \mathbf{v}}{\partial \mathbf{x}}\tag{5}$$

$$\frac{\partial(\mathbf{u} \cdot \mathbf{v})}{\partial \mathbf{x}}=\frac{\partial \mathbf{u}^{\top} \mathbf{v}}{\partial \mathbf{x}}=\mathbf{u}^{\top} \frac{\partial \mathbf{v}}{\partial \mathbf{x}}+\mathbf{v}^{\top} \frac{\partial \mathbf{u}}{\partial \mathbf{x}}\tag{6}$$

$$\frac{\partial \mathbf{f}(\mathbf{u})}{\partial \mathbf{x}}=\frac{\partial \mathbf{f}(\mathbf{u})}{\partial \mathbf{u}} \frac{\partial \mathbf{u}}{\partial \mathbf{x}}\tag{7}$$
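A formula such as (4) is easy to sanity-check numerically. The sketch below compares the row vector $\mathbf{x}^{\top}(\mathbf{A}+\mathbf{A}^{\top})$ with central finite differences of $\mathbf{x}^{\top}\mathbf{A}\mathbf{x}$. The matrix `A`, the point `x`, and all function names are made up for illustration; none of them come from the original notes.

```python
def quad_form(A, x):
    """Return x^T A x for a square matrix A (list of rows) and a vector x."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))


def analytic_derivative(A, x):
    """Formula (4): the row vector x^T (A + A^T)."""
    n = len(x)
    return [sum(x[i] * (A[i][j] + A[j][i]) for i in range(n)) for j in range(n)]


def numeric_derivative(f, x, h=1e-6):
    """Central finite differences, perturbing one coordinate at a time."""
    result = []
    for j in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += h
        x_minus[j] -= h
        result.append((f(x_plus) - f(x_minus)) / (2 * h))
    return result


A = [[1.0, 2.0], [0.0, 3.0]]  # hypothetical, deliberately non-symmetric
x = [0.5, -1.5]               # hypothetical evaluation point

analytic = analytic_derivative(A, x)
numeric = numeric_derivative(lambda v: quad_form(A, v), x)
print(analytic)
print(numeric)
```

Choosing a non-symmetric `A` matters here: with a symmetric matrix the check could not tell $\mathbf{x}^{\top}(\mathbf{A}+\mathbf{A}^{\top})$ apart from $2\mathbf{x}^{\top}\mathbf{A}$.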

2. Directional derivatives

Reference: 数学篇-方向导数 on Zhihu (zhihu.com), a very accessible explanation of the directional derivative

If the function $f(x,y)$ is differentiable at the point $P_0(x_0,y_0)$, then its directional derivative at that point exists along every direction $l$, and
$$\left.\frac{\partial f}{\partial l}\right|_{\left(x_{0}, y_{0}\right)}=f_{x}\left(x_{0}, y_{0}\right) \cos \alpha+f_{y}\left(x_{0}, y_{0}\right) \cos \beta\tag{8}$$
where $\cos\alpha$ and $\cos\beta$ are the direction cosines of $l$.

Proof: since $f(x,y)$ is differentiable at $(x_0,y_0)$ by assumption, we have
$$\begin{array}{c} f\left(x_{0}+\Delta x, y_{0}+\Delta y\right)-f\left(x_{0}, y_{0}\right) \\ =f_{x}\left(x_{0}, y_{0}\right) \Delta x+f_{y}\left(x_{0}, y_{0}\right) \Delta y+o\left(\sqrt{(\Delta x)^{2}+(\Delta y)^{2}}\right) \end{array}\tag{9}$$
When the point $(x_0+\Delta x,y_0+\Delta y)$ lies on the ray $l$ starting at $(x_0,y_0)$, we must have

$$\begin{array}{c} \Delta x=t \cos \alpha, \quad \Delta y=t \cos \beta \\ \sqrt{(\Delta x)^{2}+(\Delta y)^{2}}=t \end{array}\tag{10}$$

Substituting (10) into (9), dividing by $t$, and letting $t \rightarrow 0^{+}$ (so that $o(t)/t \rightarrow 0$) gives
$$\begin{array}{c} \lim\limits_{t \rightarrow 0^{+}} \dfrac{f\left(x_{0}+t \cos \alpha, y_{0}+t \cos \beta\right)-f\left(x_{0}, y_{0}\right)}{t} \\ =f_{x}\left(x_{0}, y_{0}\right) \cos \alpha+f_{y}\left(x_{0}, y_{0}\right) \cos \beta \end{array}\tag{11}$$
This proves that the directional derivative exists and that its value is

$$\left.\frac{\partial f}{\partial l}\right|_{\left(x_{0}, y_{0}\right)}=f_{x}\left(x_{0}, y_{0}\right) \cos \alpha+f_{y}\left(x_{0}, y_{0}\right) \cos \beta\tag{12}$$
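Writing $x$ for a multidimensional vector, $u$ for the unit direction vector, and $\alpha$ in place of $t$, we obtain
$$\frac{\partial}{\partial \alpha}f(x+\alpha u) = u^T \nabla_xf(x) = f_x(x_0,y_0) \cos\alpha+f_y(x_0,y_0) \cos\beta\tag{13}$$
On the left, $\alpha$ is the step length along $u$ and the derivative is evaluated at $\alpha=0$. The last expression is the two-dimensional case with $u=(\cos\alpha,\cos\beta)^T$, where $\alpha$ and $\beta$ are again the direction angles from (8).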
The first equality in (13) is the one given in the Deep Learning book. I was still unsure about it, so here is my own derivation:
$$\text{let } t = x+\alpha u\tag{14}$$
$$\frac{\partial}{\partial \alpha} f(x+\alpha u)=\frac{\partial f(t)}{\partial \alpha}=\frac{\partial f(t)}{\partial t} \cdot \frac{\partial t}{\partial \alpha}\tag{15}$$
$$=\frac{\partial f(t)}{\partial t} \cdot \frac{\partial (x+\alpha u)}{\partial \alpha}\tag{16}$$
$$=\frac{\partial f(t)}{\partial t} \cdot u\tag{17}$$

$$=\nabla_{x} f(x)^T \cdot u \quad (\text{taking } \alpha=0)\tag{18}$$
I hope someone who understands this can point out my mistake.

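Addendum: the issue is resolved. My formulas treat the derivative of a scalar with respect to a vector as a row vector (numerator layout), so the step from (17) to (18) should give $\nabla_{x} f(x)^T \cdot u$. I have corrected this above to avoid misleading anyone; the earlier version lacked the transpose. The book instead works with the gradient as a column vector, so its result $u^T \nabla_x f(x)$ is the transpose of mine, and since the quantity is a scalar the two are equal. Note also that every vector in the book is a column vector, the gradient in particular.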
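The identity $\frac{\partial}{\partial \alpha}f(x+\alpha u) = u^T \nabla_x f(x)$ can also be checked numerically. The sketch below uses a made-up test function `f` with a hand-derived gradient, and compares the difference quotient from the proof with the direction-cosine formula. The function, the point, and the angle are all assumptions chosen for illustration.

```python
import math


def f(x, y):
    """Hypothetical smooth test function."""
    return x ** 2 * y + math.sin(y)


def grad_f(x, y):
    """Partial derivatives (f_x, f_y) of the test function, derived by hand."""
    return (2.0 * x * y, x ** 2 + math.cos(y))


x0, y0 = 1.0, 2.0
angle = math.pi / 3                      # angle between l and the x-axis
u = (math.cos(angle), math.sin(angle))   # (cos alpha, cos beta), a unit vector

# Right-hand side of (12): u^T grad f = f_x cos(alpha) + f_y cos(beta)
f_x, f_y = grad_f(x0, y0)
formula_value = f_x * u[0] + f_y * u[1]

# Left-hand side: the one-sided difference quotient from (11) with a small t
t = 1e-6
quotient = (f(x0 + t * u[0], y0 + t * u[1]) - f(x0, y0)) / t

print(formula_value, quotient)
```

The two printed numbers should agree to several digits; the remaining gap is the $o(t)/t$ term from (9), which shrinks as `t` gets smaller until rounding error takes over.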

3. Gradient descent

A first-order optimization method.
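As a minimal sketch of what "first-order" means here: gradient descent uses only the gradient and repeats the update $x \leftarrow x - \epsilon \nabla_x f(x)$. The objective, the starting point, the learning rate, and the step count below are assumptions picked for illustration, not values from the book or the original notes.

```python
def grad(p):
    """Gradient of the hypothetical objective f(x, y) = (x - 1)^2 + 2 (y + 2)^2."""
    x, y = p
    return (2.0 * (x - 1.0), 4.0 * (y + 2.0))


def gradient_descent(start, learning_rate=0.1, steps=200):
    """Repeat p <- p - learning_rate * grad(p) for a fixed number of steps."""
    p = start
    for _ in range(steps):
        g = grad(p)
        p = (p[0] - learning_rate * g[0], p[1] - learning_rate * g[1])
    return p


minimum = gradient_descent((5.0, 5.0))
print(minimum)  # should be close to the true minimiser (1, -2)
```

With this learning rate each coordinate's error shrinks by a constant factor per step (0.8 for `x`, 0.6 for `y`), which is why a fixed number of steps is enough in this toy case.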

4. Newton's method

A second-order optimization method.

Reference: 数值优化 (Numerical Optimization) (3) - 牛顿法 (Newton's method), Zhihu (zhihu.com)
Start from the second-order Taylor expansion of $f$ around $\boldsymbol{x_k}$, where $H$ is the Hessian of $f$ at $\boldsymbol{x_k}$:
$$f(\boldsymbol x) \approx f(\boldsymbol{x_{k}})+\left(\boldsymbol x-\boldsymbol{x_{k}}\right)^T\nabla f\left(\boldsymbol{x_{k}}\right)+\frac{1}{2}\left(\boldsymbol x-\boldsymbol{x_{k}}\right)^{T} H\left(\boldsymbol x-\boldsymbol{x_{k}}\right)\tag{19}$$
To find the minimum of $f(\boldsymbol x)$, differentiate this approximation with respect to $\boldsymbol x$. Since $H$ is symmetric, i.e. $H = H^T$, this gives
$$\begin{array}{c} f'(\boldsymbol x) =\nabla f(\boldsymbol{x_k})^T +\frac{1}{2}(\boldsymbol{x}-\boldsymbol{x_k})^T(H+H^T) \\ = \nabla f(\boldsymbol{x_k})^T +(\boldsymbol x-\boldsymbol{x_{k}})^TH \end{array}\tag{20}$$
Setting $f'(\boldsymbol x)=\boldsymbol 0$ gives $(\boldsymbol x-\boldsymbol{x_k})^T = -\nabla f(\boldsymbol{x_k})^T H^{-1}$; transposing, and using that $H^{-1}$ is symmetric because $H$ is,
$$\boldsymbol x-\boldsymbol{x_{k}} =- \left(\nabla f(\boldsymbol{x_k})^T H^{-1}\right)^T =- H^{-1}\nabla f(\boldsymbol{x_k})\tag{21}$$
$$\boldsymbol{x_{k+1}} = \boldsymbol{x_k}-H^{-1}\nabla f(\boldsymbol{x_k})\tag{22}$$

When $f$ is a positive definite quadratic function, a single application of (22) jumps directly to the minimum of the function.
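The one-step claim can be seen on a small example. The sketch below assumes the quadratic $f(\boldsymbol x)=\frac{1}{2}\boldsymbol x^T A \boldsymbol x-\boldsymbol b^T\boldsymbol x$ with a hypothetical positive definite $A$, so that $\nabla f(\boldsymbol x)=A\boldsymbol x-\boldsymbol b$ and $H=A$. The helper `solve_2x2` is an illustrative stand-in for computing $H^{-1}\nabla f$.

```python
def solve_2x2(H, g):
    """Solve H d = g for a 2x2 matrix H using Cramer's rule."""
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    return [(g[0] * H[1][1] - H[0][1] * g[1]) / det,
            (H[0][0] * g[1] - H[1][0] * g[0]) / det]


A = [[3.0, 1.0], [1.0, 2.0]]  # hypothetical symmetric positive definite Hessian
b = [1.0, 1.0]                # hypothetical linear term


def gradient(x):
    """Gradient of f(x) = 0.5 x^T A x - b^T x, namely A x - b."""
    return [A[0][0] * x[0] + A[0][1] * x[1] - b[0],
            A[1][0] * x[0] + A[1][1] * x[1] - b[1]]


x_k = [10.0, -7.0]                    # arbitrary starting point
step = solve_2x2(A, gradient(x_k))    # H^{-1} grad f(x_k)
x_next = [x_k[0] - step[0], x_k[1] - step[1]]  # update (22)

print(x_next)            # the minimiser, which solves A x = b
print(gradient(x_next))  # numerically zero after a single step
```

Changing `x_k` to any other starting point leaves `x_next` unchanged, because for an exactly quadratic function the Taylor model (19) is the function itself.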

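If $f$ is not truly quadratic but can be locally approximated by a positive definite quadratic, Newton's method has to be applied for several iterations.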
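A one-dimensional sketch of the non-quadratic case, assuming the made-up function $f(x)=e^x-2x$: its second derivative $e^x$ is positive everywhere and its minimum is at $x=\ln 2$. In one dimension the update (22) reduces to $x \leftarrow x - f'(x)/f''(x)$. The tolerance and iteration cap are arbitrary choices.

```python
import math


def newton_1d(x, tol=1e-12, max_iter=50):
    """Minimise f(x) = exp(x) - 2x by iterating x <- x - f'(x) / f''(x)."""
    iterations = 0
    while abs(math.exp(x) - 2.0) > tol and iterations < max_iter:
        first = math.exp(x) - 2.0   # f'(x)
        second = math.exp(x)        # f''(x), always positive here
        x = x - first / second
        iterations += 1
    return x, iterations


x_min, n_iter = newton_1d(0.0)
print(x_min, n_iter)  # x_min approaches ln 2 after a handful of iterations
```

Unlike the quadratic case, one step is not enough: each iteration re-fits a quadratic model at the current point and jumps to the minimum of that model.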

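Newton's method is only appropriate when the nearby critical point is a minimum. Near a saddle point it is harmful, because the update (22) jumps toward the nearby critical point whatever its type.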
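The saddle-point behaviour shows up on the textbook saddle $f(x,y)=x^2-y^2$, used here as an assumed example. Its gradient is $(2x,-2y)$ and its Hessian is the constant diagonal matrix $\mathrm{diag}(2,-2)$. The sketch compares one Newton step with a few gradient descent steps from the same hypothetical starting point.

```python
def grad(p):
    """Gradient of the saddle f(x, y) = x^2 - y^2."""
    x, y = p
    return (2.0 * x, -2.0 * y)


HESSIAN_DIAG = (2.0, -2.0)  # the Hessian is diagonal and constant for this f


def newton_step(p):
    """One application of update (22); inverting a diagonal Hessian is a division."""
    g = grad(p)
    return (p[0] - g[0] / HESSIAN_DIAG[0], p[1] - g[1] / HESSIAN_DIAG[1])


def gradient_descent(p, learning_rate=0.1, steps=20):
    """Plain gradient descent for comparison."""
    for _ in range(steps):
        g = grad(p)
        p = (p[0] - learning_rate * g[0], p[1] - learning_rate * g[1])
    return p


start = (1.0, 0.5)
newton_point = newton_step(start)
gd_point = gradient_descent(start)

print(newton_point)  # Newton lands on the saddle point at the origin
print(gd_point)      # gradient descent moves away from it along y
```

Newton's step goes straight to the origin, which is a critical point but not a minimum. Gradient descent shrinks `x` but grows `|y|`, escaping the saddle along the direction of negative curvature (this toy function is unbounded below, so it keeps going).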

I still need to think these last statements over carefully.


Reposted from blog.csdn.net/qq_41335232/article/details/120704310