牛顿法与拟牛顿法摘记
admin
2024-02-18 07:46:40
0

牛顿法与拟牛顿法摘记

  • 1 牛顿法
  • 2 拟牛顿法
    • 2.1 拟牛顿条件
    • 2.2 秩1校正
    • 2.3 DFP算法(变尺度法)
    • 2.4 BFGS公式

1 牛顿法

\qquad除了可以采用最速下降法求解“无约束最优化问题”,另一种常用的方法就是牛顿法。设 f(x)f(\boldsymbol{x})f(x) 为二次可微的实函数,在第 kkk 次迭代点 x(k)\boldsymbol{x}^{(k)}x(k) 附近用二阶泰勒级数展开式 ϕ(x)\phi(\boldsymbol{x})ϕ(x) 来近似,也就是

f(x)≈ϕ(x)=f(x(k))+∇f(x(k))T(x−x(k))+12(x−x(k))T∇2f(x(k))(x−x(k))2\qquad\qquad f(\boldsymbol{x})\approx\phi(\boldsymbol{x})=f(\boldsymbol{x}^{(k)})+\nabla f(\boldsymbol{x}^{(k)})^T(\boldsymbol{x}-\boldsymbol{x}^{(k)})+\dfrac{1}{2}(\boldsymbol{x}-\boldsymbol{x}^{(k)})^T\nabla^2f(\boldsymbol{x}^{(k)})(\boldsymbol{x}-\boldsymbol{x}^{(k)})^2f(x)≈ϕ(x)=f(x(k))+∇f(x(k))T(x−x(k))+21​(x−x(k))T∇2f(x(k))(x−x(k))2

\qquad根据一阶必要条件,令 ϕ′(x)=0\phi^{\prime}(\boldsymbol{x})=0ϕ′(x)=0,可得到

ϕ′(x)=∇f(x(k))+∇2f(x(k))(x−x(k))=0\qquad\qquad\phi^{\prime}(\boldsymbol{x})=\nabla f(\boldsymbol{x}^{(k)})+\nabla^2f(\boldsymbol{x}^{(k)})(\boldsymbol{x}-\boldsymbol{x}^{(k)})=0ϕ′(x)=∇f(x(k))+∇2f(x(k))(x−x(k))=0

\qquad将 ϕ(x)\phi(\boldsymbol{x})ϕ(x) 的驻点作为下一个迭代点 x(k+1)\boldsymbol{x}^{(k+1)}x(k+1) 就得到牛顿法的迭代公式

x(k+1)=x(k)−∇2f(x(k))−1∇f(x(k))\qquad\qquad\boldsymbol{x}^{(k+1)}=\boldsymbol{x}^{(k)}-\nabla^2f(\boldsymbol{x}^{(k)})^{-1}\nabla f(\boldsymbol{x}^{(k)})x(k+1)=x(k)−∇2f(x(k))−1∇f(x(k))
\qquad
∙\bullet∙ 实现代码

import numpy as np
import matplotlib.pyplot as plt
import time
def diffMat(f,x,h=0.001):	dx = np.array([h,0])dy = np.array([0,h])gx = (f(x+dx)-f(x-dx))/(2*h)gy = (f(x+dy)-f(x-dy))/(2*h)grad = np.asmatrix(np.array([gx,gy])).T       g2x = (f(x+dx) + f(x-dx) - 2*f(x))/(h*h)g2y = (f(x+dy) + f(x-dy) - 2*f(x))/(h*h)    gx2 = (f(x+dy+dx)-f(x+dy-dx))/(2*h)gx1 = (f(x-dy+dx)-f(x-dy-dx))/(2*h)    gy2 = (f(x+dx+dy)-f(x+dx-dy))/(2*h)gy1 = (f(x-dx+dy)-f(x-dx-dy))/(2*h)   gxy1 = (gx2 - gx1)/(2*h)gxy2 = (gy2 - gy1)/(2*h)hesse = np.matrix([[g2x,gxy1],[gxy2,g2y]])return grad,hesse
def newton(f,x0):k=1xk = x0while True:print('#',k)k = k+1        grad, hesse = diffMat(f,xk)d = np.asarray(hesse.I*grad).flatten()print('grad:\n',np.round(grad,4),'\nhesse:\n',np.round(hesse,4))xk = xk - dprint('d:',d/np.linalg.norm(d))print('x[{}]:{}\n'.format(k,np.round(xk,4)))if np.linalg.norm(grad)<0.00001:breakreturn xk
if __name__ == "__main__":       f = lambda x: (x[0]-1)**4 + (x[1]-0)**2 x0 = np.array([0,1],dtype='float')time0 = time.process_time()minval = newton(f,x0)time1 = time.process_time()print('x:',np.round(minval,4),'minval:',np.round(f(minval),4))print('time: %fs'%(time1-time0))

运行结果:

# 1
grad:[[-4.][ 2.]] 
hesse:[[12.  0.][ 0.  2.]]
d: [-0.316228    0.94868322]
x[2]:[0.3333 0.    ]
# 2
grad:[[-1.1852][ 0.    ]] 
hesse:[[5.3333 0.    ][0.     2.    ]]
d: [-1.00000000e+00  6.28995937e-10]
x[3]:[0.5556 0.    ]
(略)
# 12
grad:[[-0.][ 0.]] 
hesse:[[1.6e-03 0.0e+00][0.0e+00 2.0e+00]]
d: [-1.  0.]
x[13]:[0.9923 0.    ]x: [0.9923 0.    ] minval: 0.0
time: 0.000000s

\qquad

2 拟牛顿法

\qquad牛顿法的优点是收敛很快,但是牛顿法的每次迭代过程中都需要计算二阶偏导数、求 Hesse\text{Hesse}Hesse 矩阵的逆矩阵,而目标函数的 Hesse\text{Hesse}Hesse 矩阵可能是非正定的。

\qquad拟牛顿法(Quasi-Newton Method)\text{(Quasi-Newton\ Method)}(Quasi-Newton Method)是为了克服牛顿法的缺点而提出了“拟牛顿条件”,其基本思想是用“不包含二阶导数的矩阵”来近似牛顿法中 Hesse\text{Hesse}Hesse 矩阵的逆矩阵。
\qquad

2.1 拟牛顿条件

\qquad拟牛顿法是构造近似矩阵 HkH_kHk​ 替代 “Hesse\text{Hesse}Hesse 矩阵的逆矩阵”件,即:Hk≈∇2f(x(k))−1H_k\approx\nabla^2f(\boldsymbol x^{(k)})^{-1}Hk​≈∇2f(x(k))−1。拟牛顿条件是构造 HkH_kHk​ 替代 ∇2f(x(k))−1\nabla^2f(\boldsymbol x^{(k)})^{-1}∇2f(x(k))−1 执行牛顿法迭代运算时所需要满足的条件。

\qquad使用一维搜索时,牛顿法的迭代公式为:

x(k+1)=x(k)−λk∇2f(x(k))−1∇f(x(k))\qquad\qquad\boldsymbol x^{(k+1)}=\boldsymbol x^{(k)}-\lambda_k\nabla^2f(\boldsymbol x^{(k)})^{-1}\nabla f(\boldsymbol x^{(k)})x(k+1)=x(k)−λk​∇2f(x(k))−1∇f(x(k))

\qquad在第 kkk 次迭代后得到了点 x(k+1)\boldsymbol x^{(k+1)}x(k+1),将目标函数在点 x(k+1)\boldsymbol x^{(k+1)}x(k+1) 进行二阶泰勒级数展开:

f(x)≈f(x(k+1))+∇f(x(k+1))T(x−x(k+1))+12(x−x(k+1))T∇2f(x(k+1))(x−x(k+1))\qquad\qquad\textcolor{brown}{f(\boldsymbol x)\approx f(\boldsymbol x^{(k+1)})+\nabla f(\boldsymbol x^{(k+1)})^T(\boldsymbol x-\boldsymbol x^{(k+1)})+\dfrac{1}{2}(\boldsymbol x-\boldsymbol x^{(k+1)})^T\nabla^2f(\boldsymbol x^{(k+1)})(\boldsymbol x-\boldsymbol x^{(k+1)})}f(x)≈f(x(k+1))+∇f(x(k+1))T(x−x(k+1))+21​(x−x(k+1))T∇2f(x(k+1))(x−x(k+1))

\qquad对上式两端取梯度,就得到:

∇f(x)≈∇f(x(k+1))+∇2f(x(k+1))(x−x(k+1))\qquad\qquad\textcolor{royalblue}{\nabla f(\boldsymbol x)\approx\nabla f(\boldsymbol x^{(k+1)})+\nabla^2f(\boldsymbol x^{(k+1)})(\boldsymbol x-\boldsymbol x^{(k+1)})}∇f(x)≈∇f(x(k+1))+∇2f(x(k+1))(x−x(k+1))

\qquad将 x=x(k)\boldsymbol x = \boldsymbol x^{(k)}x=x(k) 带入上式:

∇f(x(k))≈∇f(x(k+1))+∇2f(x(k+1))(x(k)−x(k+1))\qquad\qquad\textcolor{crimson}{\nabla f(\boldsymbol x^{(k)})}\approx\textcolor{royalblue}{\nabla f(\boldsymbol x^{(k+1)})}+\nabla^2f(\boldsymbol x^{(k+1)})(\textcolor{crimson}{\boldsymbol x^{(k)}}-\textcolor{royalblue}{\boldsymbol x^{(k+1)}})∇f(x(k))≈∇f(x(k+1))+∇2f(x(k+1))(x(k)−x(k+1))

\qquad
\qquad在上式中,令 p(k)=x(k+1)−x(k)\textcolor{royalblue}{\boldsymbol p^{(k)}=\boldsymbol x^{(k+1)}-\boldsymbol x^{(k)}}p(k)=x(k+1)−x(k),q(k)=∇f(x(k+1))−∇f(x(k))\textcolor{crimson}{\boldsymbol q^{(k)}=\nabla f(\boldsymbol x^{(k+1)})-\nabla f(\boldsymbol x^{(k)})}q(k)=∇f(x(k+1))−∇f(x(k)),那么:

q(k)≈∇2f(x(k+1))p(k)\qquad\qquad\textcolor{crimson}{\boldsymbol q^{(k)}}\approx\nabla^2f(\boldsymbol x^{(k+1)})\textcolor{royalblue}{\boldsymbol p^{(k)}}q(k)≈∇2f(x(k+1))p(k)

\qquad
\qquad若矩阵 ∇2f(x(k+1))\nabla^2f(\boldsymbol x^{(k+1)})∇2f(x(k+1)) 可逆,则:p(k)≈∇2f(x(k+1))−1q(k)\textcolor{royalblue}{\boldsymbol p^{(k)}}\approx\nabla^2f(\boldsymbol x^{(k+1)})^{-1}\textcolor{crimson}{\boldsymbol q^{(k)}}p(k)≈∇2f(x(k+1))−1q(k)

\qquad若记 Hk+1=∇2f(x(k+1))−1H_{k+1}=\nabla^2f(\boldsymbol x^{(k+1)})^{-1}Hk+1​=∇2f(x(k+1))−1,那么

p(k)=Hk+1q(k)\qquad\qquad\textcolor{royalblue}{\boldsymbol p^{(k)}}= H_{k+1}\textcolor{crimson}{\boldsymbol q^{(k)}}p(k)=Hk+1​q(k) (拟牛顿条件

\qquad拟牛顿法就是为了构造拟牛顿法中的矩阵 Hk+1H_{k+1}Hk+1​ 而提出的一类算法。
\qquad

2.2 秩1校正

\qquad当 ∇2f(x(k+1))−1\nabla^2f(\boldsymbol x^{(k+1)})^{-1}∇2f(x(k+1))−1 是 nnn 阶对称正定矩阵时,满足拟牛顿条件的近似矩阵 Hk+1H_{k+1}Hk+1​ 也是 nnn 阶对称正定矩阵。构造这样的近似矩阵,可以通过不断修正 HkH_{k}Hk​ 来构造下一个 Hk+1H_{k+1}Hk+1​,通常令 H1=IH_{1}=\bold IH1​=I,那么 Hk+1H_{k+1}Hk+1​ 的修正过程为

Hk+1=Hk+ΔHk\qquad\qquad H_{k+1}=H_k+\textcolor{crimson}{\Delta H_k}Hk+1​=Hk​+ΔHk​, 其中 ΔHk\textcolor{crimson}{\Delta H_k}ΔHk​ 为校正矩阵

\qquad
\qquad确定 ΔHk\textcolor{crimson}{\Delta H_k}ΔHk​ 的方法之一是,令 ΔHk=αkz(k)(z(k))T\textcolor{SeaGreen}{\Delta H_k= \alpha_k \boldsymbol z^{(k)} (\boldsymbol z^{(k)})^T}ΔHk​=αk​z(k)(z(k))T,其中 αk\alpha_kαk​ 是一个常数,z(k)∈Rn\boldsymbol z^{(k)}\in R^nz(k)∈Rn 是 nnn 维列向量。显然,矩阵 ΔHk\textcolor{crimson}{\Delta H_k}ΔHk​ 的秩为 111,且 z(k)\boldsymbol z^{(k)}z(k) 应该满足拟牛顿条件 p(k)=Hk+1q(k)\textcolor{royalblue}{\boldsymbol p^{(k)}}=H_{k+1}\textcolor{crimson}{\boldsymbol q^{(k)}}p(k)=Hk+1​q(k),那么

p(k)=Hk+1q(k)=Hkq(k)+ΔHkq(k)=Hkq(k)+αkz(k)(z(k))Tq(k)\qquad\qquad \begin{aligned}\textcolor{royalblue}{\boldsymbol p^{(k)}}&=H_{k+1}\textcolor{crimson}{\boldsymbol q^{(k)}}\\ &=H_k\boldsymbol q^{(k)}+\Delta H_k\textcolor{crimson}{\boldsymbol q^{(k)}}\\ &=H_k\boldsymbol q^{(k)}+\alpha_k \boldsymbol z^{(k)} \textcolor{SeaGreen}{(\boldsymbol z^{(k)})^T\boldsymbol q^{(k)}}\end{aligned}p(k)​=Hk+1​q(k)=Hk​q(k)+ΔHk​q(k)=Hk​q(k)+αk​z(k)(z(k))Tq(k)​

\qquad由于 (z(k))Tq(k)\textcolor{SeaGreen}{(\boldsymbol z^{(k)})^T\boldsymbol q^{(k)}}(z(k))Tq(k) 是两个 nnn 维列向量的内积,其值为常数,可得到: z(k)=p(k)−Hkq(k)αk(z(k))Tq(k)\boldsymbol z^{(k)} =\dfrac{\boldsymbol p^{(k)}-H_k\boldsymbol q^{(k)}}{\alpha_k \textcolor{crimson}{(\boldsymbol z^{(k)})^T\boldsymbol q^{(k)}}}z(k)=αk​(z(k))Tq(k)p(k)−Hk​q(k)​

\qquad两边都左乘 (q(k))T(\boldsymbol q^{(k)})^T(q(k))T,可得到:

(q(k))Tz(k)=(z(k))Tq(k)=(q(k))T(p(k)−Hkq(k))αk(z(k))Tq(k)\qquad\qquad (\boldsymbol q^{(k)})^T\boldsymbol z^{(k)}=\textcolor{SeaGreen}{(\boldsymbol z^{(k)})^T\boldsymbol q^{(k)}} =\dfrac{(\boldsymbol q^{(k)})^T(\boldsymbol p^{(k)}-H_k\boldsymbol q^{(k)})}{\alpha_k \textcolor{SeaGreen}{(\boldsymbol z^{(k)})^T\boldsymbol q^{(k)}}}(q(k))Tz(k)=(z(k))Tq(k)=αk​(z(k))Tq(k)(q(k))T(p(k)−Hk​q(k))​

⟹(q(k))T(p(k)−Hkq(k))=αk((z(k))Tq(k))2\qquad\qquad \Longrightarrow\quad\textcolor{blue}{(\boldsymbol q^{(k)})^T(\boldsymbol p^{(k)}-H_k\boldsymbol q^{(k)})}=\alpha_k (\textcolor{crimson}{(\boldsymbol z^{(k)})^T\boldsymbol q^{(k)}})^2⟹(q(k))T(p(k)−Hk​q(k))=αk​((z(k))Tq(k))2

xTy=yTx,∀x,y∈Rn⟹(z(k))Tq(k)=(q(k))Tz(k)\boldsymbol x^T\boldsymbol y=\boldsymbol y^T\boldsymbol x,\ \forall\boldsymbol x,\boldsymbol y\in R^n\quad\Longrightarrow\quad(\boldsymbol z^{(k)})^T\boldsymbol q^{(k)}=(\boldsymbol q^{(k)})^T\boldsymbol z^{(k)}xTy=yTx, ∀x,y∈Rn⟹(z(k))Tq(k)=(q(k))Tz(k)

\qquad可得到:

ΔHk=αkz(k)(z(k))T=αkp(k)−Hkq(k)αk(z(k))Tq(k)(p(k)−Hkq(k)αk(z(k))Tq(k))T,(z(k))Tq(k)为常量=αk(p(k)−Hkq(k))(p(k)−Hkq(k))Tαk((z(k))Tq(k))2=αk(p(k)−Hkq(k))(p(k)−Hkq(k))T(q(k))T(p(k)−Hkq(k))\qquad\qquad \begin{aligned}\textcolor{DeepPink}{\Delta H_k}&=\alpha_k \boldsymbol z^{(k)} (\boldsymbol z^{(k)})^T\\ &=\alpha_k\dfrac{\boldsymbol p^{(k)}-H_k\boldsymbol q^{(k)}}{\alpha_k \textcolor{crimson}{(\boldsymbol z^{(k)})^T\boldsymbol q^{(k)}}}\left(\dfrac{\boldsymbol p^{(k)}-H_k\boldsymbol q^{(k)}}{\alpha_k \textcolor{crimson}{(\boldsymbol z^{(k)})^T\boldsymbol q^{(k)}}}\right)^T,\qquad\textcolor{crimson}{(\boldsymbol z^{(k)})^T\boldsymbol q^{(k)}}为常量\\ &=\alpha_k\dfrac{\left(\boldsymbol p^{(k)}-H_k\boldsymbol q^{(k)}\right)\left(\boldsymbol p^{(k)}-H_k\boldsymbol q^{(k)}\right)^T}{\alpha_k\left( \textcolor{crimson}{(\boldsymbol z^{(k)})^T\boldsymbol q^{(k)}}\right)^2}\\ &=\alpha_k\dfrac{\left(\boldsymbol p^{(k)}-H_k\boldsymbol q^{(k)}\right)\left(\boldsymbol p^{(k)}-H_k\boldsymbol q^{(k)}\right)^T}{\textcolor{blue}{(\boldsymbol q^{(k)})^T(\boldsymbol p^{(k)}-H_k\boldsymbol q^{(k)})}}\end{aligned}ΔHk​​=αk​z(k)(z(k))T=αk​αk​(z(k))Tq(k)p(k)−Hk​q(k)​(αk​(z(k))Tq(k)p(k)−Hk​q(k)​)T,(z(k))Tq(k)为常量=αk​αk​((z(k))Tq(k))2(p(k)−Hk​q(k))(p(k)−Hk​q(k))T​=αk​(q(k))T(p(k)−Hk​q(k))(p(k)−Hk​q(k))(p(k)−Hk​q(k))T​​

\qquad
\qquad因此,Hk+1H_{k+1}Hk+1​ 的修正过程称为秩1校正公式,也就是:

Hk+1=Hk+ΔHk=Hk+αk(p(k)−Hkq(k))(p(k)−Hkq(k))T(q(k))T(p(k)−Hkq(k))\qquad\qquad H_{k+1}=H_k+\textcolor{DeepPink}{\Delta H_k}=H_k+\alpha_k\dfrac{\left(\boldsymbol p^{(k)}-H_k\boldsymbol q^{(k)}\right)\left(\boldsymbol p^{(k)}-H_k\boldsymbol q^{(k)}\right)^T}{\textcolor{blue}{(\boldsymbol q^{(k)})^T(\boldsymbol p^{(k)}-H_k\boldsymbol q^{(k)})}}Hk+1​=Hk​+ΔHk​=Hk​+αk​(q(k))T(p(k)−Hk​q(k))(p(k)−Hk​q(k))(p(k)−Hk​q(k))T​

\qquad
\qquad基于秩1校正公式拟牛顿法计算步骤为:

(1)\qquad(1)(1) 搜索方向为 d(k)=−Hk∇f(x(k)),H1=I\boldsymbol d^{(k)}=-H_k\nabla f(\boldsymbol x^{(k)}),\quad H_1=\bold Id(k)=−Hk​∇f(x(k)),H1​=I

(2)\qquad(2)(2) 沿 d(k)\boldsymbol d^{(k)}d(k) 进行一维搜索求出 x(k+1)\boldsymbol x^{(k+1)}x(k+1)

(3)\qquad(3)(3) 求出 p(k)=x(k+1)−x(k)\boldsymbol p^{(k)}=\boldsymbol x^{(k+1)}-\boldsymbol x^{(k)}p(k)=x(k+1)−x(k) 以及 q(k)=∇f(x(k+1))−∇f(x(k))\boldsymbol q^{(k)}=\nabla f(\boldsymbol x^{(k+1)})-\nabla f(\boldsymbol x^{(k)})q(k)=∇f(x(k+1))−∇f(x(k)),计算近似矩阵

Hk+1=Hk+αk(p(k)−Hkq(k))(p(k)−Hkq(k))T(q(k))T(p(k)−Hkq(k))\qquad\qquad H_{k+1}=H_k+\alpha_k\dfrac{\left(\boldsymbol p^{(k)}-H_k\boldsymbol q^{(k)}\right)\left(\boldsymbol p^{(k)}-H_k\boldsymbol q^{(k)}\right)^T}{\textcolor{blue}{(\boldsymbol q^{(k)})^T(\boldsymbol p^{(k)}-H_k\boldsymbol q^{(k)})}}Hk+1​=Hk​+αk​(q(k))T(p(k)−Hk​q(k))(p(k)−Hk​q(k))(p(k)−Hk​q(k))T​

\qquad\quad重新回到第 (1)(1)(1) 步开始下一轮的计算,直到 ∥∇f(x(k))∥<ε\Vert \nabla f(\boldsymbol x^{(k)})\Vert<\varepsilon∥∇f(x(k))∥<ε。
\qquad
∙\bullet∙ 实现代码

import numpy as np
import matplotlib.pyplot as plt
import time
def diff1(f,x,h=0.001):dx = np.array([h,0])dy = np.array([0,h])gx = (f(x+dx)-f(x-dx))/(2*h)gy = (f(x+dy)-f(x-dy))/(2*h)return np.array([gx,gy])
def newton1d(f,xk,d,h=0.001):dif1 = (f(xk+h*d) - f(xk-h*d))/(2*h)dif2 = (f(xk+h*d) + f(xk-h*d) - 2*f(xk))/(h*h)deltax = dif1/dif2xk1 = xk - deltax * dreturn xk1
def newton(f,x0):k=1xk = x0Hk = np.eye(x0.shape[0])while True:gk = diff1(f,xk)        dk = -np.dot(Hk,gk)xk1 = newton1d(f,xk,dk)	gk1 = diff1(f,xk1)if np.linalg.norm(gk)<0.0001:breakpk = xk1 - xkqk = gk1 - gkt1 = (pk-Hk.dot(qk)).reshape(x0.shape[0],-1)deltHk = t1.dot(t1.T)/qk.dot(pk-Hk.dot(qk))Hk = Hk + deltHkxk = xk1print('#{} min:{}'.format(k,np.round(xk,4)))k = k + 1return xk
if __name__ == "__main__":       f = lambda x: (x[0]-1)**4 + (x[1]-0)**2 x0 = np.array([0,1],dtype='float')time0 = time.process_time()minval = newton(f,x0)time1 = time.process_time()print('x:',np.round(minval,4),'minval:',np.round(f(minval),4))print('time: %fs'%(time1-time0))

运行结果:

#1 min:[0.4 0.8]
#2 min:[ 0.4084 -0.0043]
#3 min:[6.056e-01 1.000e-04]
#4 min:[ 0.7371 -0.    ]
#5 min:[0.8247 0.    ]
#6 min:[ 0.8831 -0.    ]
#7 min:[0.9221 0.    ]
#8 min:[ 0.9481 -0.    ]
#9 min:[0.9654 0.    ]
#10 min:[ 0.9769 -0.    ]
x: [ 0.9769 -0.    ] minval: 0.0
time: 0.000000s

\quad

2.3 DFP算法(变尺度法)

\qquad变尺度法,也就是DFP\text{DFP}DFP法,其校正矩阵为:

ΔHk=p(k)(p(k))T(p(k))Tq(k)−Hkq(k)(q(k))THk(q(k))THkq(k)\qquad\qquad\Delta H_k=\dfrac{\boldsymbol p^{(k)}(\boldsymbol p^{(k)})^T}{(\boldsymbol p^{(k)})^T\boldsymbol q^{(k)}}-\dfrac{H_k\boldsymbol q^{(k)}(\boldsymbol q^{(k)})^TH_k}{(\boldsymbol q^{(k)})^TH_k\boldsymbol q^{(k)}}ΔHk​=(p(k))Tq(k)p(k)(p(k))T​−(q(k))THk​q(k)Hk​q(k)(q(k))THk​​

\qquad近似矩阵 Hk+1H_{k+1}Hk+1​ 的修正过程为:

Hk+1=Hk+ΔHk=Hk+p(k)(p(k))T(p(k))Tq(k)−Hkq(k)(q(k))THk(q(k))THkq(k)\qquad\qquad H_{k+1}=H_k+\textcolor{DeepPink}{\Delta H_k}=H_k+\dfrac{\boldsymbol p^{(k)}(\boldsymbol p^{(k)})^T}{(\boldsymbol p^{(k)})^T\boldsymbol q^{(k)}}-\dfrac{H_k\boldsymbol q^{(k)}(\boldsymbol q^{(k)})^TH_k}{(\boldsymbol q^{(k)})^TH_k\boldsymbol q^{(k)}}Hk+1​=Hk​+ΔHk​=Hk​+(p(k))Tq(k)p(k)(p(k))T​−(q(k))THk​q(k)Hk​q(k)(q(k))THk​​

DFP\qquad\text{DFP}DFP法具有二次终止性,对于正定二次函数的目标函数,经过有限次搜索可以达到极小点。

\qquad
DFP\qquad\text{DFP}DFP法的计算步骤为:

(1)\qquad(1)(1) 搜索方向为 d(k)=−Hk∇f(x(k)),H1=I\boldsymbol d^{(k)}=-H_k\nabla f(\boldsymbol x^{(k)}),\quad H_1=\bold Id(k)=−Hk​∇f(x(k)),H1​=I

(2)\qquad(2)(2) 沿 d(k)\boldsymbol d^{(k)}d(k) 进行一维搜索求出 λk\lambda_kλk​ 满足 x(k+1)\boldsymbol x^{(k+1)}x(k+1)

(3)\qquad(3)(3) 求出 p(k)=x(k+1)−x(k)\boldsymbol p^{(k)}=\boldsymbol x^{(k+1)}-\boldsymbol x^{(k)}p(k)=x(k+1)−x(k) 以及 q(k)=∇f(x(k+1))−∇f(x(k))\boldsymbol q^{(k)}=\nabla f(\boldsymbol x^{(k+1)})-\nabla f(\boldsymbol x^{(k)})q(k)=∇f(x(k+1))−∇f(x(k)),计算近似矩阵

Hk+1=Hk+p(k)(p(k))T(p(k))Tq(k)−Hkq(k)(q(k))THk(q(k))THkq(k)\qquad\qquad H_{k+1}=H_k+\dfrac{\boldsymbol p^{(k)}(\boldsymbol p^{(k)})^T}{(\boldsymbol p^{(k)})^T\boldsymbol q^{(k)}}-\dfrac{H_k\boldsymbol q^{(k)}(\boldsymbol q^{(k)})^TH_k}{(\boldsymbol q^{(k)})^TH_k\boldsymbol q^{(k)}}Hk+1​=Hk​+(p(k))Tq(k)p(k)(p(k))T​−(q(k))THk​q(k)Hk​q(k)(q(k))THk​​

\qquad\qquad重新回到第 (1)(1)(1) 步开始下一轮的计算,直到 ∥∇f(x(k))∥<ε\Vert \nabla f(\boldsymbol x^{(k)})\Vert<\varepsilon∥∇f(x(k))∥<ε。
\qquad
∙\bullet∙ 实现代码

import numpy as np
import matplotlib.pyplot as plt
import time
def diff1(f,x,h=0.001):dx = np.array([h,0])dy = np.array([0,h])gx = (f(x+dx)-f(x-dx))/(2*h)gy = (f(x+dy)-f(x-dy))/(2*h)return np.array([gx,gy])
def newton1d(f,xk,d,h=0.001):dif1 = (f(xk+h*d) - f(xk-h*d))/(2*h)dif2 = (f(xk+h*d) + f(xk-h*d) - 2*f(xk))/(h*h)deltax = dif1/dif2xk1 = xk - deltax * dreturn xk1
def newton(f,x0):k=1xk = x0Hk = np.eye(x0.shape[0])while True:gk = diff1(f,xk)        dk = -np.dot(Hk,gk)xk1 = newton1d(f,xk,dk)	gk1 = diff1(f,xk1)if np.linalg.norm(gk)<0.0001:breakpk = xk1 - xkqk = gk1 - gkpkm = (xk1 - xk).reshape(x0.shape[0],-1)qkm = (gk1 - gk).reshape(x0.shape[0],-1)deltHk = pkm.dot(pkm.T)/pk.dot(qk) - Hk.dot(qkm.dot(qkm.T)).dot(Hk)/qk.dot(Hk.dot(qk))Hk = Hk + deltHkxk = xk1print('#{} min:{}'.format(k,np.round(xk,4)))k = k + 1return xk
if __name__ == "__main__":       f = lambda x: (x[0]-1)**4 + (x[1]-0)**2 # f = lambda x: 2*(x[0]-0)**2 + (x[1]-0)**2 - 4*x[0] + 2x0 = np.array([0,1],dtype='float')time0 = time.process_time()minval = newton(f,x0)time1 = time.process_time()print('x:',np.round(minval,4),'minval:',np.round(f(minval),4))print('time: %fs'%(time1-time0))

运行结果:

#1 min:[0.4 0.8]
#2 min:[ 0.4064 -0.0033]
#3 min:[0.6043 0.    ]
#4 min:[ 0.7362 -0.    ]
#5 min:[0.8241 0.    ]
#6 min:[ 0.8828 -0.    ]
#7 min:[0.9218 0.    ]
#8 min:[ 0.9479 -0.    ]
#9 min:[0.9653 0.    ]
#10 min:[ 0.9768 -0.    ]
x: [ 0.9768 -0.    ] minval: 0.0
time: 0.000000s

\qquad

2.4 BFGS公式

\qquad在讨论拟牛顿条件时,首先构造出 q(k)≈∇2f(x(k+1))p(k)\textcolor{crimson}{\boldsymbol q^{(k)}}\approx\nabla^2f(\boldsymbol x^{(k+1)})\textcolor{royalblue}{\boldsymbol p^{(k)}}q(k)≈∇2f(x(k+1))p(k) 的关系,然后记 Hk+1=∇2f(x(k+1))−1H_{k+1}=\nabla^2f(\boldsymbol x^{(k+1)})^{-1}Hk+1​=∇2f(x(k+1))−1,得到了 p(k)=Hk+1q(k)\textcolor{royalblue}{\boldsymbol p^{(k)}}= H_{k+1}\textcolor{crimson}{\boldsymbol q^{(k)}}p(k)=Hk+1​q(k) 的拟牛顿条件,前文中的秩111校正和DFP\text{DFP}DFP公式都是基于 p(k)=Hk+1q(k)\textcolor{royalblue}{\boldsymbol p^{(k)}}= H_{k+1}\textcolor{crimson}{\boldsymbol q^{(k)}}p(k)=Hk+1​q(k) 的拟牛顿条件
\qquad
\qquad而BFGS\text{BFGS}BFGS公式直接基于 q(k)≈∇2f(x(k+1))p(k)\textcolor{crimson}{\boldsymbol q^{(k)}}\approx\nabla^2f(\boldsymbol x^{(k+1)})\textcolor{royalblue}{\boldsymbol p^{(k)}}q(k)≈∇2f(x(k+1))p(k) 的关系,用不含二阶导数的矩阵 Bk+1B_{k+1}Bk+1​ 近似 ∇2f(x(k+1))\nabla^2f(\boldsymbol x^{(k+1)})∇2f(x(k+1)),从而有 q(k)=Bk+1p(k)\textcolor{crimson}{\boldsymbol q^{(k)}}=B_{k+1}\textcolor{royalblue}{\boldsymbol p^{(k)}}q(k)=Bk+1​p(k),关于 BkB_{k}Bk​ 的修正公式为:

Bk+1=Bk+q(k)(q(k))T(q(k))Tp(k)−Bkp(k)(p(k))TBk(p(k))TBkp(k)\qquad\qquad B_{k+1}=B_k+\dfrac{\boldsymbol q^{(k)}(\boldsymbol q^{(k)})^T}{(\boldsymbol q^{(k)})^T\boldsymbol p^{(k)}}-\dfrac{B_k\boldsymbol p^{(k)}(\boldsymbol p^{(k)})^TB_k}{(\boldsymbol p^{(k)})^TB_k\boldsymbol p^{(k)}}Bk+1​=Bk​+(q(k))Tp(k)q(k)(q(k))T​−(p(k))TBk​p(k)Bk​p(k)(p(k))TBk​​

\qquad上式即为关于矩阵 B\text{B}B 的 BFGS\text{BFGS}BFGS 修正公式。
 
\qquad由于 q(k)=Bk+1p(k)\textcolor{crimson}{\boldsymbol q^{(k)}}=B_{k+1}\textcolor{royalblue}{\boldsymbol p^{(k)}}q(k)=Bk+1​p(k),因此 p(k)=Bk+1−1q(k)\textcolor{royalblue}{\boldsymbol p^{(k)}}= B_{k+1}^{-1}\textcolor{crimson}{\boldsymbol q^{(k)}}p(k)=Bk+1−1​q(k),也就相当于令 Hk+1=Bk+1−1H_{k+1}=B_{k+1}^{-1}Hk+1​=Bk+1−1​ 时的拟牛顿法。
 
\qquad利用Sherman-Morrison\text{Sherman-Morrison}Sherman-Morrison公式:

(M+uvT)−1=M−1−M−1uvTM−11+vTM−1u\qquad\qquad(\boldsymbol{M}+\boldsymbol{u}\boldsymbol{v}^T)^{-1}=\boldsymbol{M}^{-1}-\dfrac{\boldsymbol{M}^{-1} \boldsymbol{u}\boldsymbol{v}^T \boldsymbol{M}^{-1}}{1+\boldsymbol{v}^T \boldsymbol{M}^{-1}\boldsymbol{u}}(M+uvT)−1=M−1−1+vTM−1uM−1uvTM−1​

\qquad可以求出 Bk+1−1B_{k+1}^{-1}Bk+1−1​,从而得到关于 H\boldsymbol{H}H 的BFGS\text{BFGS}BFGS公式:

Hk+1BFGS=Hk+(1+(q(k))THkq(k)(p(k))Tq(k))p(k)(p(k))T(p(k))Tq(k)−p(k)(q(k))THk+Hkq(k)(p(k))T(p(k))Tq(k)\qquad\qquad H_{k+1}^{\text{BFGS}}=H_k+\left(1+\dfrac{(\boldsymbol q^{(k)})^TH_k \boldsymbol q^{(k)} }{(\boldsymbol p^{(k)})^T\boldsymbol q^{(k)}} \right)\dfrac{\boldsymbol p^{(k)}(\boldsymbol p^{(k)})^T}{(\boldsymbol p^{(k)})^T\boldsymbol q^{(k)}}-\dfrac{\boldsymbol p^{(k)}(\boldsymbol q^{(k)})^TH_k + H_k\boldsymbol q^{(k)}(\boldsymbol p^{(k)})^T }{(\boldsymbol p^{(k)})^T\boldsymbol q^{(k)}}Hk+1BFGS​=Hk​+(1+(p(k))Tq(k)(q(k))THk​q(k)​)(p(k))Tq(k)p(k)(p(k))T​−(p(k))Tq(k)p(k)(q(k))THk​+Hk​q(k)(p(k))T​

\qquad
DFP\qquad\text{DFP}DFP公式和BFGS\text{BFGS}BFGS公式都是由 p(k)\boldsymbol p^{(k)}p(k) 和 Hkq(k)H_k\boldsymbol q^{(k)}Hk​q(k) 构成的对称秩222校正,因此两个公式的加权组合具有相同的形式,所有这类修正公式的集合可以表示为:

Hk+1ϕ=(1−ϕ)Hk+1DFP+ϕHk+1BFGS\qquad\qquad H_{k+1}^{\phi}=(1-\phi)H_{k+1}^{\text{DFP}}+\phi H_{k+1}^{\text{BFGS}}Hk+1ϕ​=(1−ϕ)Hk+1DFP​+ϕHk+1BFGS​

\qquad这类修正公式的全体,称为Broyden\text{Broyden}Broyden族。当 ϕ=0\phi=0ϕ=0 时为 DFP\text{DFP}DFP公式,当 ϕ=1\phi=1ϕ=1 时为 BFGS\text{BFGS}BFGS公式。

相关内容

热门资讯

法国兴业银行股价下跌3% 每经AI快讯,5月12日,法国兴业银行股价下跌3%。 每日经济新闻 【免责声明】本文仅代表作者本人观...
独家 | 低空经济,重磅收购发... 作者 | 铅笔道 惜文 编辑 | 铅笔道 邹蔚 王方 最近,低空经济赛道,发生一起重磅并购。低空经济...
购房收据挂失登报流程 购房收据挂失登报流程并不复杂,首先需要确认登报的具体要求和所需材料。登报是通过报纸等公开媒体发布声明...
【公告复盘】PCB+CPO+覆... 【A股收盘|沪指跌0.25% 半导体设备、特高压概念股活跃】四大股指今日收盘涨跌不一,沪指跌0.25...
原创 货... 导语:银行板块极致低估值隐含安全边际,且附带估值修复期权。 01 诸神的黄昏 货币基金和黄金,这些曾...
资本“救火”一年后,大润发的调... 出品 | 创业最前线 付艳翠 近期,随着CEO闪电失联与董事会主席“零元救火”的戏剧性一幕接连上演...
59岁浙江前首富直播间跳团舞,... 美特斯邦威曾是80、90后青春记忆里绕不开的符号,那句“不走寻常路”更是响彻街头巷尾。2008年上市...
周琦18+8杰曼三双 北京2-... 【搜狐体育战报】北京时间5月12日CBA季后赛,主场作战的北京北汽以88-73击败广东东阳光,北京首...
汽油价格持续攀升!美国4月CP... 受伊朗战争推动的汽油价格持续攀升,美国4月通胀继续加速。战争影响正在随着能源成本飙升而冲击美国经济。...
老铁流量见顶,快手要靠可灵20... 来源:市场资讯 (来源:野马财经) 可灵年入10亿,仅占快手的0.73%。 作者|刘钦文 编辑|高...
美国4月未季调CPI同比升3.... 美国4月未季调CPI同比升3.8%,前值升3.3%; (本文来自第一财经)
挪威财长:主权财富基金在道德撤... 来源:环球市场播报 当全球最大的挪威规模达2.2万亿美元的主权财富基金因伦理考量出售某公司股份时,...
1300亿,快手可灵酝酿“单飞... 来源:猎云精选,文/韩文静 AI视频生成赛道,从来不缺资本故事。 近日,快手旗下视频生成大模型“可灵...
光模块龙头股价一年涨超990%... 5月12日,光模块龙头中际旭创(300308.SZ)股价大涨,盘中突破1000元,成为继爱美客(30...
小红书或再回购期权,半年回购价... 小红书再传期权回购。近日,有消息称小红书开启了2026年第一轮期权回购,有离职员工爆料称最新的回购价...
重仓科技板块基金狂飙,超160... 打开基金账户,你的净值创新高了吗? 5月A股震荡攀高,市场情绪悄然回暖,公募基金的赚钱效应同步释放。...
突发!韩股跳水 韩股跳水。 5月12日盘中,韩国股市走低,截至发稿,韩国KOSPI指数转跌超3%,此前一度涨超2.4...
【IPO速递】东圣实业:布局磷... 5月12日,湖北东圣实业股份有限公司(下称“东圣实业”)首次向港股市场发起冲刺,计划登陆港交所主板,...
跨国巨头再现百亿美元级“天价”... 5月12日,恒瑞医药宣布,百时美施贵宝(BMS)与恒瑞医药达成授权协议,BMS将向恒瑞医药支付最高达...
焦点复盘沪指缩量调整跌0.25... 财联社5月12日讯,今日57股涨停,23股炸板,封板率71%。大唐发电5连板,通鼎互联、宝鼎科技4连...