DeepLearning-L3-CNN Overview: Convolution, Pooling, Fully Connected

1. CNN Components

(1)Types of layers in a CNN

  • Convolution(CONV)
    - Zero Padding
    - Convolve window
    - Convolution forward
    - Convolution backward
  • Pooling(POOL)
    - Pooling forward
    - Create mask
    - Distribute value
    - Pooling backward
  • Fully connected (FC)

  • CONV + POOL = feature extraction
  • FC = classifier

(2) Why CNN

  • Parameter sharing
    • A feature detector (such as a vertical edge detector) that’s useful in one part of the image is probably useful in another part of the image.
  • Sparsity of connections
    • In each layer, each output value depends only on a small number of inputs.

Convolution captures local information in an image. As convolutional layers are stacked, the extracted features progress from low-level features such as edges, textures, and orientations to high-level features. Strictly speaking, the "convolution" performed in a convolutional layer is the cross-correlation of the input with the weights (the kernel is not flipped), rather than the convolution defined in mathematics.
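To make this concrete, here is a minimal NumPy sketch (the function name and toy arrays are illustrative, not from the original post): a CONV layer slides the kernel over the input without flipping it, which is cross-correlation; a mathematical convolution would flip the kernel first.

```python
import numpy as np

def cross_correlate2d(x, k):
    """Slide kernel k over x without flipping it (what a CONV layer actually computes)."""
    kh, kw = k.shape
    out_h, out_w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

x = np.arange(16, dtype=float).reshape(4, 4)
k = np.array([[1., 0.], [0., -1.]])

corr = cross_correlate2d(x, k)                  # what a CONV layer computes
conv = cross_correlate2d(x, k[::-1, ::-1])      # true mathematical convolution (kernel flipped)
print(corr)
print(conv)                                     # generally different from corr
```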

2. Convolutional Layer

(1)CNN Unit

By designing a specific filter (also called a kernel) and convolving it with an image, we can detect certain features in the image, such as boundaries, which enables edge detection and similar functions. To the original input image, a filter acts as a feature detector whose element values are learned through training.

Note that a $3 \times 3$ filter "sees" only a part of the input image at a time, i.e. a local receptive field.

A CNN extracts features through a succession of filters, going from local features to global features, and on that basis performs image recognition and related tasks. The numbers (parameters) in each filter are learned from large amounts of training data. A single convolution step is sketched below.
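A minimal sketch of one convolution step with a hand-crafted vertical edge detector (the filter values and the toy image are assumptions for illustration; in a real CNN the filter values are learned):

```python
import numpy as np

# A 3x3 vertical edge detector: responds strongly where brightness changes left-to-right.
vertical_edge_filter = np.array([[1., 0., -1.],
                                 [1., 0., -1.],
                                 [1., 0., -1.]])

image = np.zeros((6, 6))
image[:, :3] = 10.0          # bright left half, dark right half -> a vertical edge

def conv_single_step(patch, W, b=0.0):
    """One convolution step: elementwise product of a local patch with the filter, summed, plus bias."""
    return np.sum(patch * W) + b

# The filter only "sees" a 3x3 slice of the image at a time (local receptive field).
print(conv_single_step(image[0:3, 1:4], vertical_edge_filter))  # 30.0 -> edge detected
print(conv_single_step(image[0:3, 3:6], vertical_edge_filter))  # 0.0  -> flat region
```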

(2)Zero-Padding


The main benefits of padding are the following:

  • It allows you to use a CONV layer without necessarily shrinking the height and width of the volumes. This is important for building deeper networks, since otherwise the height/width would shrink as you go to deeper layers. An important special case is the “same” convolution, in which the height/width is exactly preserved after one layer.

  • It helps us keep more of the information at the border of an image. Without padding, very few values at the next layer would be affected by pixels at the edges of an image.
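A minimal zero-padding sketch, assuming the $(m, n_H, n_W, n_C)$ batch layout used in the next subsection; `zero_pad` is an illustrative helper, not a library function:

```python
import numpy as np

def zero_pad(X, pad):
    """Pad the height and width of a batch of images X with shape (m, n_H, n_W, n_C) with zeros."""
    return np.pad(X, ((0, 0), (pad, pad), (pad, pad), (0, 0)),
                  mode='constant', constant_values=0)

X = np.random.randn(4, 3, 3, 2)     # 4 images, 3x3 spatial size, 2 channels
X_pad = zero_pad(X, 2)
print(X.shape, X_pad.shape)          # (4, 3, 3, 2) (4, 7, 7, 2)
```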

(3)Forward pass

The CONV layer at layer $l$ is determined by the following four quantities:

  • $f^{[l]}$: filter size (receptive field)
  • $p^{[l]}$: padding size
  • $s^{[l]}$: stride size
  • $n_C^{[l]}$: number of filters in layer $l$, also called the depth. Convolving the input with one filter produces one two-dimensional feature map; convolving it with multiple filters produces multiple feature maps.

Input: $m \times n_H^{[l-1]} \times n_W^{[l-1]} \times n_C^{[l-1]}$
Filter: $f^{[l]} \times f^{[l]} \times n_C^{[l-1]}$
Weights: $f^{[l]} \times f^{[l]} \times n_C^{[l-1]} \times n_C^{[l]}$
Bias: $n_C^{[l]}$
Activation: $m \times n_H^{[l]} \times n_W^{[l]} \times n_C^{[l]}$
Output: $m \times n_H^{[l]} \times n_W^{[l]} \times n_C^{[l]}$

$n_H^{[l]} = \left\lfloor \dfrac{n_H^{[l-1]} - f^{[l]} + 2p^{[l]}}{s^{[l]}} \right\rfloor + 1$

$n_W^{[l]} = \left\lfloor \dfrac{n_W^{[l-1]} - f^{[l]} + 2p^{[l]}}{s^{[l]}} \right\rfloor + 1$

$n_C^{[l]} = \text{number of filters of layer } l$

e.g.

  • Input image: $X(8,8,3)$
  • Convolve it with 4 filters of size $(3,3,3)$ ($f=3$, $s=1$, $p=0$); the parameters of this first layer are $W1(3,3,3,4)$, and the output is $Z1(6,6,4)$ (see the sketch after this list)
  • After the activation function, $Z1$ becomes $A1(6,6,4)$
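A naive forward-pass sketch reproducing the shapes in this example; `conv_forward` is an illustrative implementation written for clarity rather than speed, and the ReLU activation is an assumption:

```python
import numpy as np

def conv_forward(A_prev, W, b, stride=1, pad=0):
    """Naive CONV forward pass.
    A_prev: (m, n_H_prev, n_W_prev, n_C_prev), W: (f, f, n_C_prev, n_C), b: (1, 1, 1, n_C)."""
    m, n_H_prev, n_W_prev, n_C_prev = A_prev.shape
    f, _, _, n_C = W.shape
    n_H = (n_H_prev - f + 2 * pad) // stride + 1
    n_W = (n_W_prev - f + 2 * pad) // stride + 1
    A_pad = np.pad(A_prev, ((0, 0), (pad, pad), (pad, pad), (0, 0)))
    Z = np.zeros((m, n_H, n_W, n_C))
    for i in range(m):                       # over examples
        for h in range(n_H):                 # over output height
            for w in range(n_W):             # over output width
                for c in range(n_C):         # over filters
                    v, u = h * stride, w * stride
                    patch = A_pad[i, v:v + f, u:u + f, :]
                    Z[i, h, w, c] = np.sum(patch * W[:, :, :, c]) + float(b[0, 0, 0, c])
    return Z

# The example from the text: X(8,8,3), 4 filters of (3,3,3), f=3, s=1, p=0
X = np.random.randn(1, 8, 8, 3)
W1 = np.random.randn(3, 3, 3, 4)
b1 = np.zeros((1, 1, 1, 4))
Z1 = conv_forward(X, W1, b1, stride=1, pad=0)
A1 = np.maximum(Z1, 0)                       # e.g. ReLU activation
print(Z1.shape, A1.shape)                    # (1, 6, 6, 4) (1, 6, 6, 4)
```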

3. Pooling layer

Pooling, also called subsampling or downsampling, reduces the dimensionality of each feature map while retaining the most important information.

(1)WHY Pooling

The pooling (POOL) layer reduces the height and width of the input.

  • Reduces the size of the feature maps: the pooling layer downsamples local spatial regions, so the following layer needs fewer parameters and less computation, which also lowers the risk of overfitting.
  • Increases translation invariance of features: it makes feature detectors more invariant to their position in the input.
  • Introduces non-linearity; in recent years global average pooling has been widely used.

(2) Two types of pooling layers

  • Max-pooling layer: slides an ($f, f$) window over the input and stores the max value of the window in the output.

  • Average-pooling layer: slides an ($f, f$) window over the input and stores the average value of the window in the output.
    These pooling layers have no parameters for backpropagation to train.
    However, they have hyperparameters such as the window size $f$, which specifies the height and width of the $f \times f$ window over which the max or average is computed.

(3)Forward Pooling

$n_H = \left\lfloor \dfrac{n_{H_{prev}} - f}{stride} \right\rfloor + 1$

$n_W = \left\lfloor \dfrac{n_{W_{prev}} - f}{stride} \right\rfloor + 1$

$n_C = n_{C_{prev}}$
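A minimal pooling forward sketch matching these formulas; `pool_forward`, its default hyperparameters, and the random input are illustrative assumptions:

```python
import numpy as np

def pool_forward(A_prev, f=2, stride=2, mode="max"):
    """Naive POOL forward pass over an input of shape (m, n_H_prev, n_W_prev, n_C)."""
    m, n_H_prev, n_W_prev, n_C = A_prev.shape
    n_H = (n_H_prev - f) // stride + 1
    n_W = (n_W_prev - f) // stride + 1
    A = np.zeros((m, n_H, n_W, n_C))
    for i in range(m):
        for h in range(n_H):
            for w in range(n_W):
                for c in range(n_C):
                    v, u = h * stride, w * stride
                    window = A_prev[i, v:v + f, u:u + f, c]
                    A[i, h, w, c] = window.max() if mode == "max" else window.mean()
    return A

A_prev = np.random.randn(2, 4, 4, 3)
print(pool_forward(A_prev, f=2, stride=2, mode="max").shape)   # (2, 2, 2, 3), n_C unchanged
```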

4. FC Layer

Fully connected means that every neuron in the previous layer is connected to every neuron in the next layer; the output layer uses a softmax activation function.

The output of the convolutional and pooling layers represents high-level features of the input image. The purpose of the fully connected layer is to use these features to classify the input into categories based on the training set.
Beyond classification, adding a fully connected layer is also an effective way to learn non-linear combinations of these features: the features extracted by the convolutional and pooling layers are useful on their own, but combinations of them can be even more discriminative.
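A minimal sketch of the FC + softmax step, assuming pooled feature maps of shape $(m, 6, 6, 4)$ and 10 output classes; the weight shapes and class count are illustrative assumptions:

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with a max-shift for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Feature maps from the last CONV/POOL layer, e.g. (m, 6, 6, 4)
A_pool = np.random.randn(2, 6, 6, 4)
A_flat = A_pool.reshape(A_pool.shape[0], -1)     # flatten to (m, 144)

# One fully connected layer mapping the 144 features to 10 class scores, then softmax.
W_fc = np.random.randn(144, 10) * 0.01
b_fc = np.zeros((1, 10))
probs = softmax(A_flat @ W_fc + b_fc)
print(probs.shape, probs.sum(axis=1))            # (2, 10); each row sums to 1
```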


Reprinted from blog.csdn.net/apr15/article/details/106310660