Must-Know for Machine Learning Interviews: Quasi-Newton Methods (DFP and BFGS)

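Newton's method is known for its fast convergence. However, it requires second-order partial derivatives, and the Hessian matrix of the objective function may not be positive definite, in which case the Newton direction need not be a descent direction. Quasi-Newton methods were proposed to overcome these drawbacks. Their basic idea is to approximate the inverse of the Hessian in Newton's method with a matrix built without any second derivatives.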

Iteration formula of Newton's method
$$x^{(k+1)}=x^{(k)}+\lambda d^{(k)},\qquad d^{(k)}=-\nabla^{2}f(x^{(k)})^{-1}\nabla f(x^{(k)})$$
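As a rough illustration of the iteration above, here is a minimal NumPy sketch of Newton's method with a fixed step $\lambda$. The objective $f(x)=e^{x_1}+e^{-x_1}+(x_2-1)^2$ (minimizer $(0,1)$), the function name `newton`, and its parameters are assumptions chosen for illustration. This Hessian is positive definite everywhere, so the non-positive-definite case mentioned above does not arise here.

```python
import numpy as np

def newton(grad, hess, x0, lam=1.0, tol=1e-10, max_iter=50):
    """Newton iteration x <- x + lam * d with d = -[hess(x)]^{-1} grad(x)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:
            break
        # Solve hess(x) d = -g rather than forming the inverse Hessian explicitly
        d = np.linalg.solve(hess(x), -g)
        x = x + lam * d
    return x

# Hypothetical objective: f(x) = exp(x[0]) + exp(-x[0]) + (x[1] - 1)^2
grad = lambda x: np.array([np.exp(x[0]) - np.exp(-x[0]), 2.0 * (x[1] - 1.0)])
hess = lambda x: np.array([[np.exp(x[0]) + np.exp(-x[0]), 0.0], [0.0, 2.0]])

x_star = newton(grad, hess, [1.0, -3.0])
print(x_star)  # close to [0, 1]
```

Note that each step needs the full Hessian and a linear solve with it; this is exactly the cost that quasi-Newton methods try to avoid.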

To construct an approximation $H_{k}$ of $\nabla^{2}f(x^{(k)})^{-1}$, we first analyze how $\nabla^{2}f(x^{(k)})^{-1}$ relates to the first derivatives. Expanding $f(x)$ in a Taylor series around $x^{(k+1)}$ and keeping terms up to second order gives

$$f(x)\approx f(x^{(k+1)})+\nabla f(x^{(k+1)})^{T}(x-x^{(k+1)})+\frac{1}{2}(x-x^{(k+1)})^{T}\nabla^{2}f(x^{(k+1)})(x-x^{(k+1)})$$

Taking the gradient of both sides, near $x^{(k+1)}$ we have

$$\nabla f(x)\approx\nabla f(x^{(k+1)})+\nabla^{2}f(x^{(k+1)})(x-x^{(k+1)})$$

Setting $x=x^{(k)}$ gives

$$\nabla f(x^{(k)})\approx\nabla f(x^{(k+1)})+\nabla^{2}f(x^{(k+1)})(x^{(k)}-x^{(k+1)})$$

Define

$$p^{(k)}=x^{(k+1)}-x^{(k)},\qquad q^{(k)}=\nabla f(x^{(k+1)})-\nabla f(x^{(k)})$$

Then

$$q^{(k)}\approx\nabla^{2}f(x^{(k+1)})\,p^{(k)}$$

If the Hessian $\nabla^{2}f(x^{(k+1)})$ is invertible, then

$$p^{(k)}\approx\nabla^{2}f(x^{(k+1)})^{-1}q^{(k)}$$

So once $p^{(k)}$ and $q^{(k)}$ have been computed, this relation lets us estimate the inverse of the Hessian. We can therefore replace the inverse Hessian with a matrix $H_{k+1}$ that contains no second derivatives, by requiring

$$p^{(k)}=H_{k+1}q^{(k)}$$

This is the quasi-Newton condition at the heart of quasi-Newton methods; what remains is to determine the matrix $H_{k+1}$.
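A quick numerical check of the relation $p^{(k)}\approx\nabla^{2}f(x^{(k+1)})^{-1}q^{(k)}$ derived above, as a sketch under assumed test functions. For a quadratic $f(x)=\tfrac12 x^TAx-b^Tx$ the Hessian is the constant $A$, so the relation is exact; for the non-quadratic $f(x)=\sum_i e^{x_i}$ it only holds approximately, with an error that shrinks roughly like the square of the step. The matrices, points, and step sizes are hypothetical.

```python
import numpy as np

# Assumed quadratic f(x) = 0.5 x^T A x - b^T x with constant Hessian A,
# so the second-order Taylor expansion (and hence the relation) is exact.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
grad = lambda x: A @ x - b

x_k = np.array([0.5, -1.0])
x_k1 = np.array([2.0, 0.5])
p = x_k1 - x_k               # p^(k) = x^(k+1) - x^(k)
q = grad(x_k1) - grad(x_k)   # q^(k) = grad f(x^(k+1)) - grad f(x^(k))
print(np.allclose(p, np.linalg.solve(A, q)))  # True: p = A^{-1} q exactly

# Assumed non-quadratic f(x) = sum(exp(x_i)): gradient exp(x), Hessian diag(exp(x)).
# Here p ~= Hess(x^(k+1))^{-1} q holds only approximately.
x0 = np.array([0.1, -0.2])
errs = []
for step in (1e-1, 1e-2, 1e-3):
    x1 = x0 + step * np.ones(2)
    p2, q2 = x1 - x0, np.exp(x1) - np.exp(x0)
    errs.append(np.linalg.norm(p2 - np.linalg.solve(np.diag(np.exp(x1)), q2)))
print(errs)  # shrinks by roughly 100x per 10x smaller step
```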

DFP algorithm (also known as the variable metric method)

$$H_{k+1}=H_{k}+\frac{p^{(k)}p^{(k)T}}{p^{(k)T}q^{(k)}}-\frac{H_{k}q^{(k)}q^{(k)T}H_{k}}{q^{(k)T}H_{k}q^{(k)}}$$

This update satisfies the quasi-Newton condition $p^{(k)}=H_{k+1}q^{(k)}$: multiplying by $q^{(k)}$, the second term gives $p^{(k)}$, and the third term gives $H_{k}q^{(k)}$, which cancels the first.
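A minimal NumPy sketch of the DFP update above, assuming $H_k$ is symmetric so that $H_kq^{(k)}q^{(k)T}H_k=(H_kq^{(k)})(H_kq^{(k)})^T$. The helper name `dfp_update` and the example data (one step on an assumed quadratic with Hessian $A$, starting from $H_1=I$) are hypothetical. It checks the quasi-Newton condition, symmetry, and positive definiteness, which DFP preserves when $H_k$ is positive definite and $p^{(k)T}q^{(k)}>0$.

```python
import numpy as np

def dfp_update(H, p, q):
    """DFP update of the inverse-Hessian approximation H_k -> H_{k+1} (H assumed symmetric)."""
    Hq = H @ q  # H_k q^(k)
    return H + np.outer(p, p) / (p @ q) - np.outer(Hq, Hq) / (q @ Hq)

# Assumed example: quadratic with Hessian A, so q = A p exactly and p^T q > 0
A = np.array([[4.0, 1.0], [1.0, 3.0]])
p = np.array([1.5, 1.5])
q = A @ p
H1 = np.eye(2)
H2 = dfp_update(H1, p, q)

print(np.allclose(H2 @ q, p))                # True: p = H_{k+1} q
print(np.allclose(H2, H2.T))                 # True: symmetric
print(np.all(np.linalg.eigvalsh(H2) > 0))    # True: still positive definite
```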
The DFP method proceeds as follows:

  1. Choose an initial point $x^{(1)}$ and a tolerance $\epsilon>0$.
  2. Set $H_{1}=I_{n}$ (the identity matrix) and $k=1$, and compute the gradient at $x^{(1)}$: $g_{1}=\nabla f(x^{(1)})$.
  3. Set the search direction $d^{(k)}=-H_{k}g_{k}$.
  4. Starting from $x^{(k)}$, search along $d^{(k)}$ for a step size $\lambda_{k}$ satisfying $f(x^{(k)}+\lambda_{k}d^{(k)})=\min_{\lambda\geq 0}f(x^{(k)}+\lambda d^{(k)})$, then update $x^{(k+1)}=x^{(k)}+\lambda_{k}d^{(k)}$.
  5. Check the convergence criterion: if $\|\nabla f(x^{(k+1)})\|\leq\epsilon$, stop and return $\hat{x}=x^{(k+1)}$; otherwise go to step 6.
  6. If $k=n$, set $x^{(1)}=x^{(k+1)}$ and return to step 2 (restart); otherwise go to step 7.
  7. Compute $g_{k+1}=\nabla f(x^{(k+1)})$, $p^{(k)}=x^{(k+1)}-x^{(k)}$, $q^{(k)}=g_{k+1}-g_{k}$, and
     $$H_{k+1}=H_{k}+\frac{p^{(k)}p^{(k)T}}{p^{(k)T}q^{(k)}}-\frac{H_{k}q^{(k)}q^{(k)T}H_{k}}{q^{(k)T}H_{k}q^{(k)}}$$
     then set $k=k+1$ and return to step 3.
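The seven steps above can be sketched in NumPy as follows. This is an illustrative sketch, not the author's code: the function names `dfp` and `exact_ls` are hypothetical, and the exact line search of step 4 is supplied by the caller. For the assumed test problem, a quadratic $f(x)=\tfrac12x^TAx-b^Tx$ with symmetric positive definite $A$, the exact step has the closed form $\lambda_k=-g_k^Td^{(k)}/(d^{(k)T}Ad^{(k)})$.

```python
import numpy as np

def dfp(grad, line_search, x1, eps=1e-8, max_iter=1000):
    """DFP following steps 1-7 above.

    grad(x)           -> gradient of f at x
    line_search(x, d) -> step lambda >= 0 minimizing f(x + lambda * d) (assumed exact)
    """
    x = np.asarray(x1, dtype=float)                  # step 1
    n = x.size
    it = 0
    while it < max_iter:
        H = np.eye(n)                                # step 2: H_1 = I_n, (re)start at x
        g = grad(x)
        if np.linalg.norm(g) <= eps:                 # guard: already converged
            return x
        for k in range(1, n + 1):
            d = -H @ g                               # step 3: search direction
            lam = line_search(x, d)                  # step 4: step size
            x_new = x + lam * d
            g_new = grad(x_new)
            it += 1
            if np.linalg.norm(g_new) <= eps:         # step 5: convergence test
                return x_new
            if k == n:                               # step 6: restart from x^(k+1)
                x = x_new
                break
            p, q = x_new - x, g_new - g              # step 7: DFP update
            Hq = H @ q
            H = H + np.outer(p, p) / (p @ q) - np.outer(Hq, Hq) / (q @ Hq)
            x, g = x_new, g_new
    return x

# Assumed test problem: quadratic with SPD A; its minimizer solves A x = b
A = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])
grad = lambda x: A @ x - b
exact_ls = lambda x, d: -(grad(x) @ d) / (d @ A @ d)  # closed-form exact line search

x_hat = dfp(grad, exact_ls, np.zeros(3))
print(np.allclose(x_hat, np.linalg.solve(A, b)))  # True
```

On a quadratic with exact line search, DFP reaches the minimizer in at most $n$ steps in exact arithmetic, so the restart in step 6 rarely matters here; for a general $f$ a numerical 1-D minimizer would replace `exact_ls`.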

BFGS
$$H_{k+1}^{BFGS}=H_{k}+\left(1+\frac{q^{(k)T}H_{k}q^{(k)}}{p^{(k)T}q^{(k)}}\right)\frac{p^{(k)}p^{(k)T}}{p^{(k)T}q^{(k)}}-\frac{p^{(k)}q^{(k)T}H_{k}+H_{k}q^{(k)}p^{(k)T}}{p^{(k)T}q^{(k)}}$$
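A minimal NumPy sketch of the BFGS update above, term by term, assuming $H_k$ is symmetric and $p^{(k)T}q^{(k)}>0$. The helper name `bfgs_update` and the sample data are hypothetical. Besides checking the quasi-Newton condition, it compares the result with the equivalent product form $(I-\rho pq^T)H_k(I-\rho qp^T)+\rho pp^T$ with $\rho=1/(p^Tq)$, which is how BFGS is often written elsewhere.

```python
import numpy as np

def bfgs_update(H, p, q):
    """BFGS update H_k -> H_{k+1}, written term by term (H assumed symmetric)."""
    pq = p @ q          # p^(k)T q^(k), assumed > 0
    Hq = H @ q          # H_k q^(k)
    qHq = q @ Hq        # q^(k)T H_k q^(k)
    return (H
            + (1.0 + qHq / pq) * np.outer(p, p) / pq
            - (np.outer(p, q @ H) + np.outer(Hq, p)) / pq)

# Hypothetical example data with p^T q = 0.65 > 0
H0 = np.array([[2.0, 0.5], [0.5, 1.0]])
p = np.array([1.0, -0.5])
q = np.array([0.8, 0.3])
H1 = bfgs_update(H0, p, q)
print(np.allclose(H1 @ q, p))  # True: quasi-Newton condition p = H_{k+1} q

# Cross-check against the product form (I - rho p q^T) H (I - rho q p^T) + rho p p^T
rho = 1.0 / (p @ q)
V = np.eye(2) - rho * np.outer(q, p)
H_alt = V.T @ H0 @ V + rho * np.outer(p, p)
print(np.allclose(H1, H_alt))  # True
```

Using this function in place of the DFP update in step 7 turns the DFP procedure into BFGS; the other steps stay the same.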


Reposted from blog.csdn.net/Neekity/article/details/88394056