Three ways to compute gradients: the numerical method, the analytical method, and backpropagation
A simple function:

$$f(x,y,z) = (x+y)z$$

Python:
```python
"""
function : f(x, y, z) = (x + y) * z
"""

# Method 1: analytical (partial derivatives derived by hand)
def grad1(x, y, z):
    dx = z
    dy = z
    dz = x + y
    return (dx, dy, dz)

# Method 2: numerical (central difference with step epi)
def grad2(x, y, z, epi):
    # df/dx
    fx1 = (x + epi + y) * z
    fx2 = (x - epi + y) * z
    dx = (fx1 - fx2) / (2 * epi)
    # df/dy
    fy1 = (x + y + epi) * z
    fy2 = (x + y - epi) * z
    dy = (fy1 - fy2) / (2 * epi)
    # df/dz
    fz1 = (x + y) * (z + epi)
    fz2 = (x + y) * (z - epi)
    dz = (fz1 - fz2) / (2 * epi)
    return (dx, dy, dz)

# Method 3: backpropagation (forward pass, then chain rule backward)
def grad3(x, y, z):
    # forward pass: p = x + y, f = p * z
    p = x + y
    f = p * z
    # backward pass
    dp = z          # df/dp
    dz = p          # df/dz
    dx = 1 * dp     # dp/dx = 1
    dy = 1 * dp     # dp/dy = 1
    return (dx, dy, dz)

print("<df/dx,df/dy,df/dz>: %.2f %.2f %.2f" % grad1(1, 2, 3))
print("<df/dx,df/dy,df/dz>: %.2f %.2f %.2f" % grad2(1, 2, 3, 1e-5))
print("<df/dx,df/dy,df/dz>: %.2f %.2f %.2f" % grad3(1, 2, 3))
```
Output:

```
<df/dx,df/dy,df/dz>: 3.00 3.00 3.00
<df/dx,df/dy,df/dz>: 3.00 3.00 3.00
<df/dx,df/dy,df/dz>: 3.00 3.00 3.00
```
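All three methods agree on this function. In practice the numerical method is mainly useful as a check on an analytical or backpropagated gradient. The following is a minimal sketch of such a gradient check that generalizes `grad2` to any scalar function; the helper names `numerical_grad`, `relative_error`, and `backprop_grad` and the relative-error formula $|a-n|/\max(|a|,|n|)$ are illustrative assumptions, not part of the original code.

```python
def numerical_grad(f, args, epi=1e-5):
    """Central-difference gradient of the scalar function f at the point args."""
    grads = []
    for i in range(len(args)):
        plus = list(args)
        minus = list(args)
        plus[i] += epi
        minus[i] -= epi
        grads.append((f(*plus) - f(*minus)) / (2 * epi))
    return tuple(grads)


def relative_error(a, n):
    """Relative error between an analytical value a and a numerical value n."""
    denom = max(abs(a), abs(n))
    return 0.0 if denom == 0 else abs(a - n) / denom


def f(x, y, z):
    return (x + y) * z


def backprop_grad(x, y, z):
    # same steps as grad3: forward p = x + y, then the chain rule backward
    p = x + y
    dp = z
    return (dp, dp, p)


point = (1.0, 2.0, 3.0)
num = numerical_grad(f, point)
ana = backprop_grad(*point)
errors = [relative_error(a, n) for a, n in zip(ana, num)]
print(errors)
```

For a correct gradient the relative errors should be tiny (only floating-point rounding here, since $f$ is linear in each variable separately); a large value points to a bug in the analytical or backpropagated version.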
A slightly more complex function
Take the sigmoid as an example:
$$f(w,x) = \frac{1}{1+e^{-(w_0x_0+w_1x_1+w_2)}}$$
The sigmoid above is the case of a two-dimensional input: $x = [x_0, x_1]^T$, $w = [w_0, w_1]^T$, and $w_2 = b$ is the bias.
This is clearly a composite function, built from the simple functions $f(x) = \frac{1}{x}$, $f(x) = e^x$, $f(x) = ax$, and $f(x) = c + x$.
We can therefore write it as an expression tree (in Polish-notation form).
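Writing the function as an expression tree is what makes backpropagation mechanical: each node only needs the derivative of its own simple operation, and the chain rule multiplies these local derivatives on the way back from the output. A minimal scalar reverse-mode sketch of that idea follows. The helpers `Node`, `exp`, `reciprocal`, and `backward` are hypothetical names written for illustration, not any library's API, and the sigmoid is built only from the simple functions listed above.

```python
import math


class Node:
    """Scalar node in an expression tree (hypothetical helper for illustration)."""

    def __init__(self, value, parents=()):
        self.value = value
        self.grad = 0.0
        # parents: pairs (input_node, local derivative of this node w.r.t. that input)
        self.parents = parents

    def __add__(self, other):
        other = other if isinstance(other, Node) else Node(other)
        return Node(self.value + other.value, ((self, 1.0), (other, 1.0)))

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Node) else Node(other)
        return Node(self.value * other.value,
                    ((self, other.value), (other, self.value)))

    __rmul__ = __mul__


def exp(n):
    v = math.exp(n.value)
    return Node(v, ((n, v),))                                 # d(e^p)/dp = e^p


def reciprocal(n):
    return Node(1.0 / n.value, ((n, -1.0 / n.value ** 2),))   # d(1/p)/dp = -1/p^2


def backward(out):
    """Reverse-mode pass: propagate gradients from the output back to the inputs."""
    order, seen = [], set()

    def visit(n):
        if id(n) not in seen:
            seen.add(id(n))
            for parent, _ in n.parents:
                visit(parent)
            order.append(n)        # post-order: inputs come before the nodes using them

    visit(out)
    out.grad = 1.0
    for n in reversed(order):      # every consumer is processed before its inputs
        for parent, local in n.parents:
            parent.grad += local * n.grad   # chain rule, summed over all paths


# sigmoid(w0*x0 + w1*x1 + w2) at the same point as the MATLAB example
w0, w1, w2 = Node(2.0), Node(-3.0), Node(-3.0)
x0, x1 = -1.0, -2.0
f = reciprocal(1 + exp(-1 * (w0 * x0 + w1 * x1 + w2)))
backward(f)
print("%.8f,%.8f,%.8f" % (w0.grad, w1.grad, w2.grad))
```

At $w = [2, -3, -3]$, $x = [-1, -2]$ this should reproduce the gradient printed by the MATLAB code. Accumulating with `+=` in `backward` matters whenever a node feeds into more than one later node.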
Here we only care about the gradient with respect to $w$, so we write the function as:
$$f(w) = \frac{1}{1+e^{-(w_0x_0+w_1x_1+w_2)}}$$
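Applying the chain rule along this tree shows why the step-by-step backward pass and the sigmoid shortcut $\sigma'(s) = (1-\sigma(s))\,\sigma(s)$ used in the MATLAB code give the same result. Here $s = w_0x_0 + w_1x_1 + w_2$ is introduced for brevity, the intermediate values follow the same staging as `grad1` ($p_0 = -s$, $p_1 = e^{p_0}$, $p_2 = 1 + p_1$, $f = 1/p_2$), and $x_2 = 1$ stands for the bias input:

$$
\frac{\partial f}{\partial w_i}
= \underbrace{\left(-\frac{1}{p_2^{2}}\right)}_{\partial f/\partial p_2}
\cdot \underbrace{1}_{\partial p_2/\partial p_1}
\cdot \underbrace{e^{p_0}}_{\partial p_1/\partial p_0}
\cdot \underbrace{(-x_i)}_{\partial p_0/\partial w_i}
= \frac{e^{-s}}{(1+e^{-s})^{2}}\,x_i
= \frac{1}{1+e^{-s}}\cdot\frac{e^{-s}}{1+e^{-s}}\,x_i
= \big(1-\sigma(s)\big)\,\sigma(s)\,x_i
$$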
Matlab:
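```matlab
clc;
% Note: vector shapes below are not kept uniform (e.g. not everything is a column vector).
w = [2, -3, -3];   % [w0, w1, w2], with w2 = b
x = [-1, -2];      % [x0, x1]
% Backpropagation, step-by-step form
[dw0, dw1, dw2] = grad1(w(1), w(2), w(3), x(1), x(2));
fprintf('%.8f,%.8f,%.8f \n', dw0, dw1, dw2);
% Numerical method
[dw0, dw1, dw2] = grad2(w(1), w(2), w(3), x(1), x(2), 1e-5);
fprintf('%.8f,%.8f,%.8f \n', dw0, dw1, dw2);
% Backpropagation using the sigmoid-derivative shortcut
dw = grad3(w, x);
fprintf('%.8f,%.8f,%.8f \n', dw(1), dw(2), dw(3));
% Analytical method
dw = grad4(w, x);
fprintf('%.8f,%.8f,%.8f \n', dw(1), dw(2), dw(3));

% Backpropagation, step-by-step form
function [dw0, dw1, dw2] = grad1(w0, w1, w2, x0, x1)
    % forward pass
    p0 = -1 * (w0*x0 + w1*x1 + w2);
    p1 = exp(p0);
    p2 = 1 + p1;
    p3 = 1 / p2;              % = f (not needed for the gradient)
    % backward pass (df/dp3 = 1)
    dp2 = (-1) * (p2^(-2));   % d(1/p2)/dp2
    dp1 = 1 * dp2;            % d(1+p1)/dp1 = 1
    dp0 = dp1 * exp(p0);      % d(exp(p0))/dp0 = exp(p0)
    dw0 = dp0 * (-x0);        % dp0/dw0 = -x0
    dw1 = dp0 * (-x1);        % dp0/dw1 = -x1
    dw2 = dp0 * (-1);         % dp0/dw2 = -1
end

% Numerical method (central differences with step epi)
function [dw0, dw1, dw2] = grad2(w0, w1, w2, x0, x1, epi)
    % dw0
    f1w0 = 1.0 / (1 + exp(-1 * ((w0+epi)*x0 + w1*x1 + w2)));
    f2w0 = 1.0 / (1 + exp(-1 * ((w0-epi)*x0 + w1*x1 + w2)));
    dw0 = (f1w0 - f2w0) / (2*epi);
    % dw1
    f1w1 = 1.0 / (1 + exp(-1 * (w0*x0 + (w1+epi)*x1 + w2)));
    f2w1 = 1.0 / (1 + exp(-1 * (w0*x0 + (w1-epi)*x1 + w2)));
    dw1 = (f1w1 - f2w1) / (2*epi);
    % dw2
    f1w2 = 1.0 / (1 + exp(-1 * (w0*x0 + w1*x1 + (w2+epi))));
    f2w2 = 1.0 / (1 + exp(-1 * (w0*x0 + w1*x1 + (w2-epi))));
    dw2 = (f1w2 - f2w2) / (2*epi);
end

% Backpropagation using the sigmoid-derivative shortcut:
% sigma'(s) = (1 - sigma(s)) * sigma(s)
function dw = grad3(w, x)
    % forward pass
    s = w(1)*x(1) + w(2)*x(2) + w(3);   % weighted sum w*x + b
    f = 1.0 / (1 + exp(-s));
    % backward pass
    ds = (1 - f) * f;                    % local gradient of the sigmoid gate
    dx = [w(1)*ds, w(2)*ds];             % gradient w.r.t. x (computed, not returned)
    dw = [x(1)*ds, x(2)*ds, 1.0*ds];
end

% Analytical method:
% df/dw_i = -(1 + e^(-w*x'))^(-2) * e^(-w*x') * (-x_i), with x_2 = 1 for the bias
function dw = grad4(w, x)
    x = [x 1];   % append 1 so that w(3) multiplies the bias input
    dw = (-1) * (1 + exp(-w*x'))^(-2) * exp(-w*x') .* (-x);
end
```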
Output:

```
-0.19661193,-0.39322387,0.19661193
-0.19661193,-0.39322387,0.19661193
-0.19661193,-0.39322387,0.19661193
-0.19661193,-0.39322387,0.19661193
```
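For readers without MATLAB, the shortcut version (`grad3`) can be sketched in Python with NumPy as below. This is an assumed translation rather than the author's code; the function name `sigmoid_grad_w` is hypothetical, and appending a constant 1 to $x$ so that the last weight acts as the bias mirrors what `grad4` does.

```python
import numpy as np


def sigmoid_grad_w(w, x):
    """Gradient of sigmoid(w . [x, 1]) w.r.t. w, using sigma' = (1 - sigma) * sigma."""
    x_aug = np.append(x, 1.0)                       # last entry multiplies the bias w[-1]
    f = 1.0 / (1.0 + np.exp(-np.dot(w, x_aug)))     # forward pass
    ds = (1.0 - f) * f                              # local gradient of the sigmoid gate
    return ds * x_aug                               # chain rule: d(w . x_aug)/dw = x_aug


w = np.array([2.0, -3.0, -3.0])
x = np.array([-1.0, -2.0])
dw = sigmoid_grad_w(w, x)
print(", ".join("%.8f" % v for v in dw))
```

It should print the same three values as the MATLAB runs above.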
An even more complex function
Consider the following function:
$$f(x,y) = \frac{x + \sigma(y)}{\sigma(x) + (x+y)^2}$$
where

$$\sigma(x) = \frac{1}{1+e^{-x}}$$
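Writing out an analytical expression for the gradient of this function is laborious, so the derivation is omitted here; see the references.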
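As a hedged sketch of how backpropagation handles this function without a closed-form derivative, here is the staged forward/backward approach described in the CS231n backpropagation notes listed in the references, written in Python. The forward pass is broken into simple intermediate variables, the backward pass visits them in reverse order, and gradients flowing into the same variable are summed with `+=`. The input point $(x, y) = (3, -4)$ and all helper names are illustrative choices, and the result is checked against the central-difference method from the first example.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def f(x, y):
    return (x + sigmoid(y)) / (sigmoid(x) + (x + y) ** 2)


def grad_backprop(x, y):
    # forward pass, staged into simple pieces
    sigy = sigmoid(y)
    num = x + sigy                 # numerator
    sigx = sigmoid(x)
    xpy = x + y
    xpysqr = xpy ** 2
    den = sigx + xpysqr            # denominator
    invden = 1.0 / den
    out = num * invden
    # backward pass: reverse order, local derivative times upstream gradient
    dnum = invden                  # d(num * invden)/dnum
    dinvden = num                  # d(num * invden)/dinvden
    dden = (-1.0 / den ** 2) * dinvden
    dsigx = 1.0 * dden
    dxpysqr = 1.0 * dden
    dxpy = 2 * xpy * dxpysqr
    dx = 1.0 * dxpy                # from xpy = x + y
    dy = 1.0 * dxpy
    dx += (1 - sigx) * sigx * dsigx    # from sigx = sigmoid(x)
    dx += 1.0 * dnum                   # from num = x + sigy
    dsigy = 1.0 * dnum
    dy += (1 - sigy) * sigy * dsigy    # from sigy = sigmoid(y)
    return out, dx, dy


def grad_numerical(x, y, epi=1e-5):
    dx = (f(x + epi, y) - f(x - epi, y)) / (2 * epi)
    dy = (f(x, y + epi) - f(x, y - epi)) / (2 * epi)
    return dx, dy


x, y = 3.0, -4.0
out, dx, dy = grad_backprop(x, y)
ndx, ndy = grad_numerical(x, y)
print("backprop:  %.8f %.8f" % (dx, dy))
print("numerical: %.8f %.8f" % (ndx, ndy))
```

The two printed lines should agree to many decimal places; note that `dx` and `dy` each collect contributions from more than one path through the expression.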
References:
- CS231n course notes translation: Backpropagation notes (in Chinese, Zhihu). https://zhuanlan.zhihu.com/p/21407711?refer=intelligentunit
- CS231n: Backpropagation. http://cs231n.github.io/optimization-2/