A Python Implementation of the BP Neural Network
- The error back-propagation (errorBackPropagation) algorithm, BP for short
Example: solving the XOR problem with a BP neural network
The result of XOR is either 0 or 1, so this is a classification problem.
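As a quick check of the truth table, XOR can be computed directly in Python with the bitwise operator `^` (a small illustrative snippet, not part of the original code):

```python
# the four XOR input pairs and their expected labels
inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [a ^ b for a, b in inputs]  # ^ is bitwise XOR; on 0/1 it is logical XOR
print(labels)  # [0, 1, 1, 0]
```

These four points are not linearly separable, which is why a network with a hidden layer is needed rather than a single-layer perceptron.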
The input data are $(0,0),(0,1),(1,0),(1,1)$ and the corresponding outputs are $0,1,1,0$.
With the bias set to $x_0 = 1$, the input neurons are $x_0, x_1, x_2$, and the hidden layer has 10 neurons. Let $V$ be the input-to-hidden weight matrix and $W$ the hidden-to-output weight matrix; training the BP network yields the values of $V$ and $W$.
Training objective: minimize the cost function.
Cost / objective / loss function: $E = \dfrac{1}{2}(\hat y - y)^2$
Input: $X = (x_0, x_1, x_2)$
Output / label: $Y = [0, 1, 1, 0]^T$
X = np.array([[1,0,0],
              [1,0,1],
              [1,1,0],
              [1,1,1]])
Y = np.array([[0],
              [1],
              [1],
              [0]])
$W$ and $V$ are both initialized with random numbers, and a learning rate is set:
# generate random numbers in [-1, 1)
V = np.random.random([3,10]) * 2 - 1
W = np.random.random([10,1]) * 2 - 1
# learning rate
lr = 0.21
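A small sanity check on the initialization (a sketch; `np.random.random` samples uniformly from $[0,1)$, so `* 2 - 1` maps the values into $[-1,1)$):

```python
import numpy as np

# same initialization as above: weights uniform in [-1, 1)
V = np.random.random([3, 10]) * 2 - 1  # input layer (3) -> hidden layer (10)
W = np.random.random([10, 1]) * 2 - 1  # hidden layer (10) -> output layer (1)

assert V.shape == (3, 10) and W.shape == (10, 1)
assert V.min() >= -1 and V.max() < 1
```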
The $\delta$ learning rule:
Activation function: the sigmoid $f(x) = \dfrac{1}{1+e^{-x}}$; differentiating gives $f'(x) = f(1-f)$
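The derivative identity $f'(x) = f(1-f)$ can be checked numerically against a central difference (a quick sketch, separate from the main code):

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def dsigmoid(x):
    s = sigmoid(x)
    return s * (1 - s)  # f'(x) = f(x) * (1 - f(x))

x = np.linspace(-5, 5, 11)
h = 1e-5
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)  # central difference
assert np.allclose(numeric, dsigmoid(x), atol=1e-8)
```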
Learning signal of the last layer: $\delta = (y - \hat y)f'(L_1 W)$ (with this sign, gradient descent below becomes a plain addition of $\eta X^T \delta$)
Learning signal of the previous layer: $\delta^{l} = \bigl(\delta^{l+1} W^T\bigr)\, f'(XV)$, where the product with $f'$ is elementwise
$\Delta W^l = -\eta \dfrac{\partial E}{\partial W^l} = \eta X^T \delta^l$ (gradient descent; $\Delta W^l$ is the weight change of layer $l$, $\eta$ is the learning rate, $\delta^l$ is the learning signal of layer $l$, and $X$ here denotes the input to layer $l$)
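Where the output-layer rule comes from (a short sketch of the chain-rule derivation, writing $net = L_1 W$ so that $\hat y = f(net)$):

$\dfrac{\partial E}{\partial W} = \dfrac{\partial E}{\partial \hat y}\cdot\dfrac{\partial \hat y}{\partial net}\cdot\dfrac{\partial net}{\partial W} = (\hat y - y)\,f'(net)\,L_1^T$

so $\Delta W = -\eta\dfrac{\partial E}{\partial W} = \eta\, L_1^T\,(y - \hat y)\,f'(net)$, i.e. the learning rate times the layer input transposed times the learning signal $\delta = (y - \hat y)f'(L_1 W)$.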
The learning signal of the last layer depends on that layer's input signal and input weights, while the learning signal of any earlier layer depends on the learning signal and the input weights of the layer after it (to its right). The signals must therefore be computed starting from the last layer and working backwards (hence: error back-propagation).
The change of a layer's input weights is determined by its input signal and its learning signal: $\Delta V$, the weight change of the first layer, depends on the input $X$ and the first-layer learning signal $\delta^1$; $\Delta W$, the weight change of the second layer, depends on its input $L_1$ and the second-layer learning signal $\delta^2$.
# weight-update function
def update():
    global V,W
    # output of each layer
    L1 = sigmoid(np.dot(X,V))
    L2 = sigmoid(np.dot(L1,W))
    # learning signal of each layer
    L2_delta = (Y - L2)*dsigmoid(np.dot(L1,W))
    L1_delta = np.dot(L2_delta,W.T)*dsigmoid(np.dot(X,V))
    # weight change of each layer
    delta_W = lr*np.dot(L1.T,L2_delta)
    delta_V = lr*np.dot(X.T,L1_delta)
    W = W + delta_W
    V = V + delta_V
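To see that `update()` actually drives the cost down, one can fix the random seed and compare the mean squared error before and after a number of updates (a self-contained sketch repeating the definitions above; the seed value is arbitrary):

```python
import numpy as np

np.random.seed(0)  # fixed seed so the run is reproducible

X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]])
Y = np.array([[0], [1], [1], [0]])
V = np.random.random([3, 10]) * 2 - 1
W = np.random.random([10, 1]) * 2 - 1
lr = 0.21

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def dsigmoid(x):
    s = sigmoid(x)
    return s * (1 - s)

def loss():
    L2 = sigmoid(np.dot(sigmoid(np.dot(X, V)), W))
    return np.mean(np.square(Y - L2) / 2)

def update():
    global V, W
    L1 = sigmoid(np.dot(X, V))
    L2 = sigmoid(np.dot(L1, W))
    L2_delta = (Y - L2) * dsigmoid(np.dot(L1, W))
    L1_delta = np.dot(L2_delta, W.T) * dsigmoid(np.dot(X, V))
    W = W + lr * np.dot(L1.T, L2_delta)
    V = V + lr * np.dot(X.T, L1_delta)

before = loss()
for _ in range(5000):
    update()
after = loss()
assert after < before  # training reduced the cost
```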
The inputs and outputs are matrices; to make the code easier to follow, some of the matrix relations are written out here.
First-layer input: $X = (x_0, x_1, x_2)$
First-layer output / second-layer input: $L_1 = (l_1, l_2, \dots, l_{10})$
Second-layer output: $L_2 = (\hat y_1, \hat y_2, \hat y_3, \hat y_4)^T$
Weights $V = (v_0, v_1, v_2)^T$
Weights $W = (w_1, w_2, \dots, w_{10})^T$
$L_1 = (l_1, l_2, \dots, l_{10}) = f(x_0 v_0 + x_1 v_1 + x_2 v_2) = f(XV)$
$L_2 = (\hat y_1, \hat y_2, \hat y_3, \hat y_4)^T = f(l_1 w_1 + l_2 w_2 + \dots + l_{10} w_{10}) = f(L_1 W)$
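Since all four samples are stacked into one matrix, the shapes work out as $X\,(4\times 3)$, $V\,(3\times 10)$, $L_1\,(4\times 10)$, $W\,(10\times 1)$, $L_2\,(4\times 1)$. A quick shape check (a sketch with dummy values):

```python
import numpy as np

X = np.ones((4, 3))    # 4 samples, 3 input neurons (bias included)
V = np.zeros((3, 10))  # input -> hidden weights
W = np.zeros((10, 1))  # hidden -> output weights

L1 = 1 / (1 + np.exp(-np.dot(X, V)))   # (4,3) @ (3,10) -> (4,10)
L2 = 1 / (1 + np.exp(-np.dot(L1, W)))  # (4,10) @ (10,1) -> (4,1)
assert L1.shape == (4, 10) and L2.shape == (4, 1)
```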
Training and evaluation
The squared errors over the four samples form a vector; loss here takes their mean.
for i in range(10001):
    update()
    if i%500 == 0:
        L1 = sigmoid(np.dot(X,V))
        L2 = sigmoid(np.dot(L1,W))
        loss = np.mean(np.square(Y-L2)/2)
        print("loss:",loss)
print(L2)
def judge(x):
    if x >= 0.5:
        return 1
    else:
        return 0
for i in map(judge,L2):
    print(i)
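The `map(judge, L2)` step can also be written as a single vectorized comparison (an equivalent sketch; the sample outputs here are made-up values close to the printed ones):

```python
import numpy as np

L2 = np.array([[0.017], [0.974], [0.978], [0.027]])  # example network outputs
pred = (L2 >= 0.5).astype(int)  # elementwise threshold at 0.5
print(pred.ravel().tolist())  # [0, 1, 1, 0]
```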
Training output:
loss: 0.1503049849879402
loss: 0.11292215088746196
loss: 0.055260106483890375
loss: 0.012689599356839564
loss: 0.005019800627645192
loss: 0.002838688933267325
loss: 0.0019049200410914762
loss: 0.0014063388085483982
loss: 0.0011023445653717398
loss: 0.000900027309043023
loss: 0.0007567885904853457
loss: 0.000650615357323275
loss: 0.0005690855354596988
loss: 0.0005047001580278678
loss: 0.0004526842864463839
loss: 0.00040986323784031537
loss: 0.0003740495228501279
loss: 0.0003436899378899049
loss: 0.0003176530747398405
loss: 0.00029509639853180273
loss: 0.0002753803240158825
[[0.01758859]
[0.97396267]
[0.97818043]
[0.02719647]]
0
1
1
0
The classification is correct.
The complete code:
import numpy as np
X = np.array([[1,0,0],
              [1,0,1],
              [1,1,0],
              [1,1,1]])
Y = np.array([[0],
              [1],
              [1],
              [0]])
# network structure: 3-10-1
# generate random numbers in [-1, 1)
V = np.random.random([3,10]) * 2 - 1
W = np.random.random([10,1]) * 2 - 1
# learning rate
lr = 0.21
def sigmoid(x):
    return 1/(1+np.exp(-x))
def dsigmoid(x):
    s = 1/(1+np.exp(-x))
    return s*(1-s)
# weight-update function
def update():
    global V,W
    # output of each layer
    L1 = sigmoid(np.dot(X,V))
    L2 = sigmoid(np.dot(L1,W))
    # learning signal of each layer
    L2_delta = (Y - L2)*dsigmoid(np.dot(L1,W))
    L1_delta = np.dot(L2_delta,W.T)*dsigmoid(np.dot(X,V))
    # weight change of each layer
    delta_W = lr*np.dot(L1.T,L2_delta)
    delta_V = lr*np.dot(X.T,L1_delta)
    W = W + delta_W
    V = V + delta_V
for i in range(10001):
    update()
    if i%500 == 0:
        L1 = sigmoid(np.dot(X,V))
        L2 = sigmoid(np.dot(L1,W))
        loss = np.mean(np.square(Y-L2)/2)
        print("loss:",loss)
print(L2)
def judge(x):
    if x >= 0.5:
        return 1
    else:
        return 0
# map applies judge to each row of L2
for i in map(judge,L2):
    print(i)
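A standard way to validate back-propagation code like the above is numerical gradient checking: perturb one weight, measure the change in the cost, and compare it with the analytic gradient. A minimal sketch (the helper names are my own, not from the original post; the summed cost is used so the gradient has no averaging factor):

```python
import numpy as np

np.random.seed(1)
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]])
Y = np.array([[0], [1], [1], [0]])
V = np.random.random([3, 10]) * 2 - 1
W = np.random.random([10, 1]) * 2 - 1

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def cost(V, W):
    L2 = sigmoid(np.dot(sigmoid(np.dot(X, V)), W))
    return np.sum(np.square(Y - L2) / 2)

# analytic gradient of the summed cost with respect to W
L1 = sigmoid(np.dot(X, V))
L2 = sigmoid(np.dot(L1, W))
grad_W = np.dot(L1.T, (L2 - Y) * L2 * (1 - L2))

# numerical gradient for one entry of W, by central difference
eps = 1e-6
Wp, Wm = W.copy(), W.copy()
Wp[0, 0] += eps
Wm[0, 0] -= eps
numeric = (cost(V, Wp) - cost(V, Wm)) / (2 * eps)
assert abs(numeric - grad_W[0, 0]) < 1e-6  # analytic and numeric gradients agree
```

If the two values disagree by more than the tolerance, the back-propagation formulas (or their implementation) contain an error.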