Notation
- Superscript $[l]$ denotes a quantity associated with the $l$-th layer.
  - Example: $a^{[L]}$ is the $L$-th layer activation. $W^{[L]}$ and $b^{[L]}$ are the $L$-th layer parameters.
- Superscript $(i)$ denotes a quantity associated with the $i$-th example.
  - Example: $x^{(i)}$ is the $i$-th training example.
- Subscript $i$ denotes the $i$-th entry of a vector.
  - Example: $a^{[l]}_i$ denotes the $i$-th entry of the $l$-th layer's activations.
- $\frac{\partial J}{\partial a}$ is abbreviated as $da$ for any variable $a$.
1. One-Layer Neural Network
Logistic regression is in fact a one-layer neural network.

$$x = (x_1, x_2)^T,\quad w = (w_1, w_2)^T$$

$$z = w^T x + b = w_1 x_1 + w_2 x_2 + b$$

then

$$\frac{\partial z}{\partial w} = x = (x_1, x_2)^T,\quad \frac{\partial z}{\partial b} = 1$$

$$a = \sigma(z) = \frac{1}{1 + e^{-z}}$$

then

$$\frac{\partial a}{\partial z} = a(1 - a)$$

$$\mathcal{L}(a, y) = -\left[ y \log a + (1 - y) \log(1 - a) \right]$$

then

$$\frac{\partial \mathcal{L}}{\partial a} = -\frac{y}{a} + \frac{1 - y}{1 - a} = \frac{a - y}{a(1 - a)}$$

By the chain rule:

$$\frac{\partial \mathcal{L}}{\partial w} = \frac{\partial \mathcal{L}}{\partial a} \frac{\partial a}{\partial z} \frac{\partial z}{\partial w} = (a - y)\,x$$

$$\frac{\partial \mathcal{L}}{\partial b} = \frac{\partial \mathcal{L}}{\partial a} \frac{\partial a}{\partial z} \frac{\partial z}{\partial b} = a - y$$
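To sanity-check the $(a - y)\,x$ result, here is a minimal NumPy sketch (the input values are made up for illustration) comparing the analytic gradient against a centered finite-difference estimate:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w, b, x, y):
    # L(a, y) = -[y log a + (1 - y) log(1 - a)]
    a = sigmoid(w @ x + b)
    return -(y * np.log(a) + (1 - y) * np.log(1 - a))

# A single example with hypothetical values.
x = np.array([0.5, -1.2])
w = np.array([0.3, 0.8])
b, y = 0.1, 1.0

a = sigmoid(w @ x + b)
analytic_dw = (a - y) * x   # dL/dw = (a - y) x
analytic_db = a - y         # dL/db = (a - y)

# Centered finite-difference estimates.
eps = 1e-6
numeric_dw = np.array([
    (loss(w + eps * np.eye(2)[j], b, x, y) - loss(w - eps * np.eye(2)[j], b, x, y)) / (2 * eps)
    for j in range(2)
])
numeric_db = (loss(w, b + eps, x, y) - loss(w, b - eps, x, y)) / (2 * eps)
print(np.allclose(analytic_dw, numeric_dw), np.isclose(analytic_db, numeric_db))  # True True
```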
For the $i$-th training example $x^{(i)}\ (i = 1, \dots, m)$:

$$z^{(i)} = w^T x^{(i)} + b$$

$$\hat{y}^{(i)} = a^{(i)} = \sigma(z^{(i)})$$

$$\mathcal{L}(a^{(i)}, y^{(i)}) = -\left[ y^{(i)} \log a^{(i)} + (1 - y^{(i)}) \log(1 - a^{(i)}) \right]$$

For $X = (x^{(1)}, x^{(2)}, \dots, x^{(m)})$:

$$Z = w^T X + b,\quad Z = (z^{(1)}, z^{(2)}, \dots, z^{(m)})$$

$$\hat{Y} = A = \sigma(Z),\quad A = (a^{(1)}, a^{(2)}, \dots, a^{(m)})$$

$$J = \frac{1}{m} \sum_{i=1}^{m} \mathcal{L}(a^{(i)}, y^{(i)}) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log a^{(i)} + (1 - y^{(i)}) \log(1 - a^{(i)}) \right]$$

then

$$\frac{\partial J}{\partial w} = \frac{1}{m} X (A - Y)^T$$

$$\frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} (a^{(i)} - y^{(i)})$$
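A minimal vectorized sketch of these formulas in NumPy, assuming the column convention above ($X$ of shape $(n_x, m)$, $Y$ of shape $(1, m)$); the random data is purely illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_x, m = 2, 100
rng = np.random.default_rng(0)
X = rng.normal(size=(n_x, m))                      # columns are examples x^(i)
Y = (rng.uniform(size=(1, m)) > 0.5).astype(float) # labels in {0, 1}
w = np.zeros((n_x, 1))
b = 0.0

Z = w.T @ X + b                 # Z = w^T X + b, shape (1, m)
A = sigmoid(Z)                  # Y_hat = A = sigma(Z)
J = -np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))

dw = (1 / m) * X @ (A - Y).T    # dJ/dw = (1/m) X (A - Y)^T, shape (n_x, 1)
db = (1 / m) * np.sum(A - Y)    # dJ/db = (1/m) sum(a^(i) - y^(i))

w = w - 0.1 * dw                # one gradient-descent step (learning rate 0.1, illustrative)
b = b - 0.1 * db
```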
2. Two-Layer Neural Network
(1) Forward Propagation
Input Layer:

$$x = (x_1, x_2)^T$$

Hidden Layer:

$$z^{[1]}_1 = w^{[1]T}_1 x + b^{[1]}_1,\quad a^{[1]}_1 = g(z^{[1]}_1)$$

$$z^{[1]}_2 = w^{[1]T}_2 x + b^{[1]}_2,\quad a^{[1]}_2 = g(z^{[1]}_2)$$

$$z^{[1]}_3 = w^{[1]T}_3 x + b^{[1]}_3,\quad a^{[1]}_3 = g(z^{[1]}_3)$$

$$z^{[1]}_4 = w^{[1]T}_4 x + b^{[1]}_4,\quad a^{[1]}_4 = g(z^{[1]}_4)$$

Let

$$z^{[1]} = (z^{[1]}_1, z^{[1]}_2, z^{[1]}_3, z^{[1]}_4)^T,\quad W^{[1]} = (w^{[1]T}_1, w^{[1]T}_2, w^{[1]T}_3, w^{[1]T}_4)^T,\quad b^{[1]} = (b^{[1]}_1, b^{[1]}_2, b^{[1]}_3, b^{[1]}_4)^T$$

then

$$z^{[1]} = W^{[1]} x + b^{[1]},\quad a^{[1]} = g(z^{[1]})$$

Output Layer:

$$z^{[2]} = W^{[2]} a^{[1]} + b^{[2]},\quad a^{[2]} = \sigma(z^{[2]})$$
For $x^{(i)}\ (i = 1, \dots, m)$:

$$z^{[1](i)} = W^{[1]} x^{(i)} + b^{[1]},\quad a^{[1](i)} = g(z^{[1](i)})$$

$$z^{[2](i)} = W^{[2]} a^{[1](i)} + b^{[2]},\quad \hat{y}^{(i)} = a^{[2](i)} = \sigma(z^{[2](i)})$$

$$y^{(i)}_{\text{prediction}} = \begin{cases} 1 & \text{if } \hat{y}^{(i)} > 0.5 \\ 0 & \text{otherwise} \end{cases}$$

For $X = (x^{(1)}, x^{(2)}, \dots, x^{(m)})$:

$$Z^{[1]} = W^{[1]} X + b^{[1]},\quad A^{[1]} = g(Z^{[1]})$$

$$Z^{[2]} = W^{[2]} A^{[1]} + b^{[2]},\quad \hat{Y} = A^{[2]} = \sigma(Z^{[2]})$$

where

$$Z^{[l]} = (z^{[l](1)}, z^{[l](2)}, \dots, z^{[l](m)}),\quad A^{[l]} = (a^{[l](1)}, a^{[l](2)}, \dots, a^{[l](m)})$$

$$J = -\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} \log a^{[2](i)} + (1 - y^{(i)}) \log(1 - a^{[2](i)}) \right)$$
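A sketch of this vectorized forward pass in NumPy; the notes leave the hidden activation $g$ unspecified, so $g = \tanh$ is assumed here:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Shapes assumed: X (n_x, m), W1 (4, n_x), b1 (4, 1), W2 (1, 4), b2 (1, 1).
def two_layer_forward(X, W1, b1, W2, b2):
    Z1 = W1 @ X + b1     # Z[1] = W[1] X + b[1]
    A1 = np.tanh(Z1)     # A[1] = g(Z[1]), with g = tanh assumed
    Z2 = W2 @ A1 + b2    # Z[2] = W[2] A[1] + b[2]
    A2 = sigmoid(Z2)     # Y_hat = A[2] = sigma(Z[2])
    return Z1, A1, Z2, A2

def cost(A2, Y):
    # J = -(1/m) sum[ y log a[2] + (1 - y) log(1 - a[2]) ]
    m = Y.shape[1]
    return -(1.0 / m) * np.sum(Y * np.log(A2) + (1 - Y) * np.log(1 - A2))

def predict(A2):
    # y_prediction = 1 if y_hat > 0.5, else 0
    return (A2 > 0.5).astype(int)
```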
(2) Backward Propagation

The gradients follow the same layer-by-layer pattern as the three-layer case in Section 3, taking $L = 2$; a sketch is given below.
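A minimal sketch of those gradients in NumPy, again assuming $g = \tanh$ for the hidden layer (so $g'(Z^{[1]}) = 1 - A^{[1]2}$); the cached values come from the forward pass above:

```python
import numpy as np

def two_layer_backward(X, Y, A1, A2, W2):
    m = X.shape[1]
    dZ2 = A2 - Y                                       # dZ[2] = A[2] - Y
    dW2 = (1 / m) * dZ2 @ A1.T                         # dW[2] = (1/m) dZ[2] A[1]^T
    db2 = (1 / m) * np.sum(dZ2, axis=1, keepdims=True)
    dA1 = W2.T @ dZ2                                   # dA[1] = W[2]^T dZ[2]
    dZ1 = dA1 * (1 - A1 ** 2)                          # dZ[1] = dA[1] * g'(Z[1]), g = tanh
    dW1 = (1 / m) * dZ1 @ X.T                          # A[0] = X
    db1 = (1 / m) * np.sum(dZ1, axis=1, keepdims=True)
    return dW1, db1, dW2, db2
```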
3. Three-Layer Neural Network
3_Layers_NN : LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SIGMOID

$$m = 5,\quad L = 3,\quad n^{[0]} = n_x = 3,\quad n^{[1]} = 2,\quad n^{[2]} = 3,\quad n^{[3]} = 1$$
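These layer sizes fix the parameter shapes: $W^{[l]}$ is $(n^{[l]}, n^{[l-1]})$ and $b^{[l]}$ is $(n^{[l]}, 1)$. A small initialization sketch (the dictionary layout and 0.01 scale are illustrative choices, not prescribed by these notes):

```python
import numpy as np

# Layer sizes from above: n[0] = n_x = 3, n[1] = 2, n[2] = 3, n[3] = 1.
layer_dims = [3, 2, 3, 1]
rng = np.random.default_rng(1)

params = {}
for l in range(1, len(layer_dims)):
    # W[l] has shape (n[l], n[l-1]); b[l] has shape (n[l], 1).
    params[f"W{l}"] = rng.normal(size=(layer_dims[l], layer_dims[l - 1])) * 0.01
    params[f"b{l}"] = np.zeros((layer_dims[l], 1))

for name, value in params.items():
    print(name, value.shape)   # W1 (2, 3), b1 (2, 1), W2 (3, 2), ...
```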
(1) Forward Propagation

$$J = -\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} \log a^{[3](i)} + (1 - y^{(i)}) \log(1 - a^{[3](i)}) \right)$$
(2) Backward Propagation

Layer 3:

$$dZ^{[3]} = A^{[3]} - Y$$

$$dW^{[3]} = \frac{1}{m}\, dZ^{[3]} A^{[2]T}$$

$$db^{[3]} = \frac{1}{m} \sum_{i=1}^{m} dz^{[3](i)} = \frac{1}{m}\,\text{np.sum}(dZ^{[3]}, \text{axis}=1, \text{keepdims}=\text{True})$$

Layer 2:

$$dA^{[2]} = W^{[3]T} dZ^{[3]}$$

$$dZ^{[2]} = dA^{[2]} * g'(Z^{[2]})$$

$$dW^{[2]} = \frac{1}{m}\, dZ^{[2]} A^{[1]T}$$

$$db^{[2]} = \frac{1}{m} \sum_{i=1}^{m} dz^{[2](i)} = \frac{1}{m}\,\text{np.sum}(dZ^{[2]}, \text{axis}=1, \text{keepdims}=\text{True})$$

Layer 1:

$$dA^{[1]} = W^{[2]T} dZ^{[2]}$$

$$dZ^{[1]} = dA^{[1]} * g'(Z^{[1]})$$

$$dW^{[1]} = \frac{1}{m}\, dZ^{[1]} A^{[0]T}$$

$$db^{[1]} = \frac{1}{m} \sum_{i=1}^{m} dz^{[1](i)} = \frac{1}{m}\,\text{np.sum}(dZ^{[1]}, \text{axis}=1, \text{keepdims}=\text{True})$$
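Since the hidden activations in this architecture are ReLU, $g'(Z) = 1$ where $Z > 0$ and $0$ elsewhere. A sketch of the three backward layers in NumPy (the function name and cache layout are illustrative):

```python
import numpy as np

# Caches Z1, A1, Z2, A2, A3 come from the forward pass; A[0] = X.
def three_layer_backward(X, Y, Z1, A1, Z2, A2, A3, W2, W3):
    m = X.shape[1]
    dZ3 = A3 - Y                                       # dZ[3] = A[3] - Y
    dW3 = (1 / m) * dZ3 @ A2.T
    db3 = (1 / m) * np.sum(dZ3, axis=1, keepdims=True)

    dA2 = W3.T @ dZ3                                   # dA[2] = W[3]^T dZ[3]
    dZ2 = dA2 * (Z2 > 0)                               # dZ[2] = dA[2] * g'(Z[2]), g = ReLU
    dW2 = (1 / m) * dZ2 @ A1.T
    db2 = (1 / m) * np.sum(dZ2, axis=1, keepdims=True)

    dA1 = W2.T @ dZ2                                   # dA[1] = W[2]^T dZ[2]
    dZ1 = dA1 * (Z1 > 0)                               # dZ[1] = dA[1] * g'(Z[1])
    dW1 = (1 / m) * dZ1 @ X.T                          # A[0] = X
    db1 = (1 / m) * np.sum(dZ1, axis=1, keepdims=True)
    return dW1, db1, dW2, db2, dW3, db3
```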
4. L_Layers_NN

L_Layers_NN : [LINEAR -> RELU] × (L-1) -> LINEAR -> SIGMOID

For layer $l\ (l = 1, 2, \dots, L)$, the forward and backward propagation formulas generalize the three-layer case above.

Forward Propagation:

$$Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]},\quad A^{[l]} = g^{[l]}(Z^{[l]}),\quad A^{[0]} = X$$

Backward Propagation:

$$dZ^{[l]} = dA^{[l]} * g^{[l]\prime}(Z^{[l]})$$

$$dW^{[l]} = \frac{1}{m}\, dZ^{[l]} A^{[l-1]T}$$

$$db^{[l]} = \frac{1}{m}\,\text{np.sum}(dZ^{[l]}, \text{axis}=1, \text{keepdims}=\text{True})$$

$$dA^{[l-1]} = W^{[l]T} dZ^{[l]}$$
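A compact sketch of the full $L$-layer loop in NumPy, following the [LINEAR -> RELU] × (L-1) -> LINEAR -> SIGMOID pattern; the `params`/`caches` layout is an illustrative choice (matching the shape sketch in Section 3):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# params holds W1..WL and b1..bL, as in the initialization sketch above.
def L_model_forward(X, params):
    L = len(params) // 2
    A, caches = X, []
    for l in range(1, L + 1):
        Z = params[f"W{l}"] @ A + params[f"b{l}"]       # Z[l] = W[l] A[l-1] + b[l]
        A_prev = A
        A = sigmoid(Z) if l == L else np.maximum(0, Z)  # ReLU hidden layers, sigmoid output
        caches.append((A_prev, Z))                      # keep A[l-1] and Z[l] for backprop
    return A, caches

def L_model_backward(AL, Y, params, caches):
    L, m = len(caches), Y.shape[1]
    grads = {}
    dZ = AL - Y                                          # dZ[L] = A[L] - Y
    for l in range(L, 0, -1):
        A_prev, Z = caches[l - 1]
        grads[f"dW{l}"] = (1 / m) * dZ @ A_prev.T        # dW[l] = (1/m) dZ[l] A[l-1]^T
        grads[f"db{l}"] = (1 / m) * np.sum(dZ, axis=1, keepdims=True)
        if l > 1:
            dA_prev = params[f"W{l}"].T @ dZ             # dA[l-1] = W[l]^T dZ[l]
            dZ = dA_prev * (caches[l - 2][1] > 0)        # dZ[l-1] = dA[l-1] * g'(Z[l-1]), g = ReLU
    return grads
```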