
1. CNN components

(1) Types of layers in a CNN

  • Convolution(CONV)
    - Zero Padding
    - Convolve window
    - Convolution forward
    - Convolution backward
  • Pooling(POOL)
    - Pooling forward
    - Create mask
    - Distribute value
    - Pooling backward
  • Fully connected (FC)

  • CONV + POOL = feature extraction
  • FC = classifier

(2) Why CNN

  • Parameter sharing
    • A feature detector (such as a vertical edge detector) that’s useful in one part of the image is probably useful in another part of the image.
  • Sparsity of connections (local connectivity)
    • In each layer, each output value depends only on a small number of inputs.
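
To make the two points above concrete, here is a rough sketch comparing parameter counts. The sizes (a 32×32×3 input, six 5×5 filters, and a fully connected layer producing the same number of outputs) are assumptions chosen for illustration; they do not come from these notes.

```python
# Illustrative sizes (assumed for this example only).
n_h, n_w, n_c = 32, 32, 3      # input height, width, channels
f, n_filters = 5, 6            # filter size and number of filters

# CONV layer: each filter has f*f*n_c weights plus one bias,
# and the same filter is reused at every position (parameter sharing).
conv_params = (f * f * n_c + 1) * n_filters

# Output volume of a stride-1 convolution without padding.
out_h, out_w = n_h - f + 1, n_w - f + 1
n_in = n_h * n_w * n_c
n_out = out_h * out_w * n_filters

# Fully connected layer mapping the same input to the same number of outputs:
# every output is connected to every input.
fc_params = n_in * n_out + n_out

print(conv_params)  # 456
print(fc_params)    # 14455392
```

Under these assumed sizes, the convolutional layer needs a few hundred parameters where the dense layer would need over fourteen million.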


2. Convolutional Layer

(1) CNN Unit

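By designing a specific filter (also called a kernel) and convolving it with an image, we can pick out certain features of the image, such as edges, which enables tasks like edge detection. With respect to the original input image, a filter acts as a feature detector. Its elements …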

Note that the $3 \times 3$ matrix in the figure above "sees" only one part of the input image at a time, i.e. its local receptive field.
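
As a minimal sketch of the idea, the snippet below slides a $3 \times 3$ vertical edge detector over a small image. The image values, the particular kernel, and the helper name `conv2d_valid` are assumptions made for illustration.

```python
import numpy as np

# 6x6 grayscale image: bright left half (10), dark right half (0).
image = np.array([[10, 10, 10, 0, 0, 0]] * 6, dtype=float)

# A common 3x3 vertical edge detector (assumed here for illustration).
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]], dtype=float)

def conv2d_valid(x, k):
    """Slide k over x with stride 1 and no padding (cross-correlation, as used in CNNs)."""
    f = k.shape[0]
    out_h, out_w = x.shape[0] - f + 1, x.shape[1] - f + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Each output value only "sees" one f x f patch of the input.
            out[i, j] = np.sum(x[i:i + f, j:j + f] * k)
    return out

edges = conv2d_valid(image, kernel)
print(edges.shape)  # (4, 4)
print(edges[0])     # [ 0. 30. 30.  0.]
```

The large values in the middle columns mark the bright-to-dark transition; the flat regions on either side give zero.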



(2) Padding

The main benefits of padding are the following:

  • It allows you to use a CONV layer without necessarily shrinking the height and width of the volumes. This is important for building deeper networks, since otherwise the height/width would shrink as you go to deeper layers. An important special case is the “same” convolution, in which the height/width is exactly preserved after one layer.

  • It helps us keep more of the information at the border of an image. Without padding, very few values at the next layer would be affected by pixels at the edges of an image.
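
Zero padding can be sketched with `np.pad`. The helper name `zero_pad` and the batch layout `(m, n_H, n_W, n_C)` are assumptions for this example; only the height and width axes are padded.

```python
import numpy as np

def zero_pad(X, pad):
    """Pad the height and width of a batch X of shape (m, n_H, n_W, n_C) with zeros."""
    return np.pad(X, ((0, 0), (pad, pad), (pad, pad), (0, 0)),
                  mode="constant", constant_values=0)

X = np.ones((2, 4, 4, 3))
X_pad = zero_pad(X, 1)
print(X.shape, X_pad.shape)  # (2, 4, 4, 3) (2, 6, 6, 3)
```

With a $3 \times 3$ filter and stride 1, a padding of 1 gives a "same" convolution: the padded 6×6 input produces a 4×4 output, matching the original height and width.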

(3) Forward pass

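The convolutional layer at layer $l$ is determined by the following four quantities:

  • $f^{[l]}$ : filter size (the size of the receptive field)
  • $p^{[l]}$ : padding size (zero padding)
  • $s^{[l]}$ : stride size
  • $n_C^{[l]}$ : number of filters in layer $l$, also called the depth. Convolving the input with one filter produces one two-dimensional feature map; convolving it with several filters produces several feature maps.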

Input: $m \times n_H^{[l-1]} \times n_W^{[l-1]} \times n_C^{[l-1]}$
Filter: $f^{[l]} \times f^{[l]} \times n_C^{[l-1]}$
Weights: $f^{[l]} \times f^{[l]} \times n_C^{[l-1]} \times n_C^{[l]}$
Bias: $n_C^{[l]}$
Activation: $m \times n_H^{[l]} \times n_W^{[l]} \times n_C^{[l]}$
Output: $m \times n_H^{[l]} \times n_W^{[l]} \times n_C^{[l]}$

$$n_H^{[l]} = \lfloor \frac{n_H^{[l-1]} - f^{[l]} + 2 p^{[l]}}{s^{[l]}} \rfloor + 1$$

$$n_W^{[l]} = \lfloor \frac{n_W^{[l-1]} - f^{[l]} + 2 p^{[l]}}{s^{[l]}} \rfloor + 1$$

$$n_C^{[l]} = \text{number of filters of layer } l$$
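
The output-size formula is easy to check numerically. The helper name `conv_output_size` below is hypothetical; it simply evaluates $\lfloor (n - f + 2p)/s \rfloor + 1$ for one spatial dimension.

```python
def conv_output_size(n_prev, f, p, s):
    """Return floor((n_prev - f + 2p) / s) + 1 for one spatial dimension."""
    return (n_prev - f + 2 * p) // s + 1

print(conv_output_size(8, f=3, p=0, s=1))  # 6
print(conv_output_size(8, f=3, p=1, s=1))  # 8, a "same" convolution
print(conv_output_size(7, f=3, p=0, s=2))  # 3
```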


  • Input image: $X(8,8,3)$
  • Convolve with 4 filters of shape $(3,3,3)$ (with $f=3, s=1, p=0$). The parameters of the first layer are $W1(3,3,3,4)$, and the output is $Z1(6,6,4)$
  • After the activation function, $Z1$ becomes $A1(6,6,4)$
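
The example above can be reproduced with a naive forward pass. This is only a sketch under stated assumptions: the function name `conv_forward`, the random inputs, the added batch dimension $m = 1$, and the choice of ReLU as the activation are not from the notes.

```python
import numpy as np

def conv_forward(A_prev, W, b, stride=1, pad=0):
    """Naive convolution forward pass.

    A_prev: (m, n_H_prev, n_W_prev, n_C_prev)
    W:      (f, f, n_C_prev, n_C)
    b:      (n_C,)
    Returns Z of shape (m, n_H, n_W, n_C).
    """
    m, n_H_prev, n_W_prev, _ = A_prev.shape
    f, _, _, n_C = W.shape
    n_H = (n_H_prev - f + 2 * pad) // stride + 1
    n_W = (n_W_prev - f + 2 * pad) // stride + 1

    # Zero-pad height and width only.
    A_pad = np.pad(A_prev, ((0, 0), (pad, pad), (pad, pad), (0, 0)), mode="constant")
    Z = np.zeros((m, n_H, n_W, n_C))
    for h in range(n_H):
        for w in range(n_W):
            hs, ws = h * stride, w * stride
            window = A_pad[:, hs:hs + f, ws:ws + f, :]   # (m, f, f, n_C_prev)
            # Contract the window with every filter at once, then add the bias.
            Z[:, h, w, :] = np.tensordot(window, W, axes=([1, 2, 3], [0, 1, 2])) + b
    return Z

rng = np.random.default_rng(0)
X = rng.standard_normal((1, 8, 8, 3))    # one 8x8x3 image (batch dimension m = 1)
W1 = rng.standard_normal((3, 3, 3, 4))   # 4 filters of shape (3, 3, 3)
b1 = np.zeros(4)

Z1 = conv_forward(X, W1, b1, stride=1, pad=0)
A1 = np.maximum(Z1, 0)                   # ReLU, assumed as the activation
print(Z1.shape, A1.shape)                # (1, 6, 6, 4) (1, 6, 6, 4)
```

The explicit loops favor readability over speed; the shapes match the $Z1(6,6,4)$ and $A1(6,6,4)$ of the example, plus the leading batch dimension.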

3. Pooling layer


(1) Why pooling

The pooling (POOL) layer reduces the height and width of the input.

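  • Reduces the feature-map size: the pooling layer downsamples local spatial regions, which reduces the number of parameters and the amount of computation needed by the next layer and lowers the risk of overfitting.
  • Increases translation invariance: it makes feature detectors more invariant to the position of a feature in the input.
  • Introduces nonlinearity (this holds for max pooling; average pooling is a linear operation). In recent years, global average pooling has been widely used.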
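
A small sketch of the translation-invariance point: if a feature moves by one pixel but stays inside the same pooling window, the max-pooled output does not change. The helper `max_pool_2x2` and the toy inputs are assumptions for illustration, and the invariance only holds for shifts that stay within a window.

```python
import numpy as np

def max_pool_2x2(x):
    """Non-overlapping 2x2 max pooling on a 2-D array with even side lengths."""
    h, w = x.shape
    # Split each axis into (block index, offset inside block), then reduce the offsets.
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

a = np.zeros((4, 4))
a[0, 0] = 1.0          # feature at the top-left corner
b = np.zeros((4, 4))
b[1, 1] = 1.0          # same feature shifted by one pixel

print(max_pool_2x2(a))
print(np.array_equal(max_pool_2x2(a), max_pool_2x2(b)))  # True
```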

(2) Two types of pooling layers

  • Max-pooling layer: slides an ($f, f$) window over the input and stores the max value of the window in the output.

  • Average-pooling layer: slides an ($f, f$) window over the input and stores the average value of the window in the output.

These pooling layers have no parameters for backpropagation to train.
However, they have hyperparameters such as the window size $f$. This specifies the height and width of the $f \times f$ window you would compute a max or average over.

(3) Forward Pooling

$$n_H = \lfloor \frac{n_{H_{prev}} - f}{\text{stride}} \rfloor + 1$$

$$n_W = \lfloor \frac{n_{W_{prev}} - f}{\text{stride}} \rfloor + 1$$

$$n_C = n_{C_{prev}}$$
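
A pooling forward pass consistent with these formulas might look like the sketch below. The function name `pool_forward`, its default arguments, and the `(m, n_H, n_W, n_C)` layout are assumptions for this example.

```python
import numpy as np

def pool_forward(A_prev, f=2, stride=2, mode="max"):
    """Pooling forward pass on A_prev of shape (m, n_H_prev, n_W_prev, n_C_prev)."""
    if mode not in ("max", "average"):
        raise ValueError("mode must be 'max' or 'average'")
    m, n_H_prev, n_W_prev, n_C = A_prev.shape
    n_H = (n_H_prev - f) // stride + 1
    n_W = (n_W_prev - f) // stride + 1
    reduce_fn = np.max if mode == "max" else np.mean

    A = np.zeros((m, n_H, n_W, n_C))   # the channel count is unchanged
    for h in range(n_H):
        for w in range(n_W):
            hs, ws = h * stride, w * stride
            window = A_prev[:, hs:hs + f, ws:ws + f, :]
            # Reduce over the window's height and width; channels are pooled independently.
            A[:, h, w, :] = reduce_fn(window, axis=(1, 2))
    return A

A_prev = np.arange(16, dtype=float).reshape(1, 4, 4, 1)
print(pool_forward(A_prev, mode="max")[0, :, :, 0].tolist())      # [[5.0, 7.0], [13.0, 15.0]]
print(pool_forward(A_prev, mode="average")[0, :, :, 0].tolist())  # [[2.5, 4.5], [10.5, 12.5]]
```

There is nothing to learn here: `f` and `stride` are hyperparameters, and the function holds no weights.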

4. FC Layer



