Keras Faster-RCNN code study (IoU, RPN) 1
Keras Faster-RCNN code study (Batch Normalization) 2
Keras Faster-RCNN code study (loss, XML parsing) 3
Keras Faster-RCNN code study (RoI pooling, ResNet/VGG) 4
Keras Faster-RCNN code study (measure_map, train/test) 5
Introduction to Batch Normalization
Reference: Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
See also Wei Xiushen's answer on Zhihu: Why does Batch Normalization work so well in deep learning?
As I understand it, Batch Normalization in a convolutional network works per convolution kernel: the values on the feature maps produced by that kernel, across all images in a batch, are normalized together before the activation is applied; hence the name "batch" normalization.
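The per-channel statistics described above can be sketched in a few lines of NumPy. The toy shapes and variable names below are illustrative, not taken from the Faster-RCNN code:

```python
import numpy as np

# A batch of feature maps with shape (batch, height, width, channels),
# i.e. channels_last layout. Values are deliberately off-center.
rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(8, 4, 4, 16))
eps = 1e-3

# Statistics are computed over every axis except the channel axis:
# all pixels of all images in the batch, separately per channel.
mean = x.mean(axis=(0, 1, 2))            # shape (16,)
var = x.var(axis=(0, 1, 2))              # shape (16,)
x_normed = (x - mean) / np.sqrt(var + eps)

# After normalization each channel has mean ~0 and std ~1.
print(x_normed.mean(axis=(0, 1, 2)).round(3))
print(x_normed.std(axis=(0, 1, 2)).round(3))
```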
[Tips] What a BN layer does
(1) speeds up convergence; (2) controls overfitting, so Dropout and weight regularization can be reduced or dropped; (3) makes the network less sensitive to weight initialization; (4) allows larger learning rates.
BN can be viewed as an adaptive reparameterization method, mainly aimed at the difficulty of training very deep models. It is not a cure-all, though: for RNNs, Batch Normalization does not bring much benefit.
The key change is to place the BN transform right before the network's activation function. Without BN, an activation layer computes:
Y = g(WX + b)
With BN, we want the input x of the activation function, e.g. a sigmoid s(x), to be the result of BN. The forward pass then becomes:
Y = g(BN(WX + b))
The bias b is actually useless once a BN layer follows, because the mean subtraction removes any constant offset; BN's own β parameter serves as the bias afterwards, so b can simply be dropped. The BN layer plus activation layer therefore ends up as:
Y = g(BN(WX))
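Why the bias b is redundant can be checked numerically. The following sketch uses assumed toy data (not the article's network): adding a constant bias to the pre-activations does not change the normalized output, so g(BN(Wx + b)) equals g(BN(Wx)).

```python
import numpy as np

rng = np.random.default_rng(1)
z = rng.normal(size=(32, 10))   # pre-activations Wx for a batch
b = 3.7                         # an arbitrary constant bias

def bn(a, eps=1e-3):
    # Plain batch normalization over the batch axis, no gamma/beta.
    return (a - a.mean(axis=0)) / np.sqrt(a.var(axis=0) + eps)

relu = lambda a: np.maximum(a, 0.0)

# The bias shifts the mean by exactly b, so mean subtraction cancels it.
print(np.allclose(relu(bn(z)), relu(bn(z + b))))  # → True
```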
Batch Normalization in Keras
keras.layers.normalization.BatchNormalization(axis=-1, momentum=0.99, epsilon=0.001, center=True, scale=True, beta_initializer='zeros', gamma_initializer='ones', moving_mean_initializer='zeros', moving_variance_initializer='ones', beta_regularizer=None, gamma_regularizer=None, beta_constraint=None, gamma_constraint=None)
This layer normalizes the activations of the previous layer within each batch, so that its output has a mean close to 0 and a standard deviation close to 1.
Parameters:
axis: integer, the axis that should be normalized, usually the feature axis. For example, after a 2D convolution with data_format="channels_first", set axis=1.
momentum: momentum for the moving mean and moving variance
epsilon: small positive float added to the variance to avoid division by zero
center: if True, add the offset beta to the normalized tensor; if False, beta is ignored
scale: if True, multiply by gamma; if False, gamma is not used. When the next layer is linear, this can be disabled, since the scaling will be done by the next layer.
beta_initializer: initializer for the beta weight
gamma_initializer: initializer for the gamma weight
moving_mean_initializer: initializer for the moving mean
moving_variance_initializer: initializer for the moving variance
beta_regularizer: optional regularizer for beta
gamma_regularizer: optional regularizer for gamma
beta_constraint: optional constraint for beta
gamma_constraint: optional constraint for gamma
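The roles of center/beta and scale/gamma can be illustrated without Keras. In this NumPy sketch (toy data, not Keras internals), beta shifts the normalized output and gamma rescales it:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=(100, 4))
eps = 1e-3

# Plain normalization: mean ~0, std ~1 per feature.
x_hat = (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

gamma = np.array([2.0, 2.0, 2.0, 2.0])   # scale=True multiplies by gamma
beta = np.array([1.0, 1.0, 1.0, 1.0])    # center=True adds beta
y = gamma * x_hat + beta

# The learned parameters set the output statistics directly.
print(y.mean(axis=0).round(2))   # ~beta
print(y.std(axis=0).round(2))    # ~gamma
```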
FixedBatchNormalization.py
from keras.engine import Layer, InputSpec
from keras import initializers, regularizers
from keras import backend as K


class FixedBatchNormalization(Layer):

    # Set up the parameters BN needs
    def __init__(self, epsilon=1e-3, axis=-1,
                 weights=None, beta_init='zero', gamma_init='one',
                 gamma_regularizer=None, beta_regularizer=None, **kwargs):
        self.supports_masking = True
        self.beta_init = initializers.get(beta_init)
        self.gamma_init = initializers.get(gamma_init)
        self.epsilon = epsilon
        self.axis = axis
        self.gamma_regularizer = regularizers.get(gamma_regularizer)
        self.beta_regularizer = regularizers.get(beta_regularizer)
        self.initial_weights = weights
        super(FixedBatchNormalization, self).__init__(**kwargs)

    # Create all BN weights as non-trainable, so the parameters stay fixed
    def build(self, input_shape):
        self.input_spec = [InputSpec(shape=input_shape)]
        shape = (input_shape[self.axis],)

        self.gamma = self.add_weight(shape,
                                     initializer=self.gamma_init,
                                     regularizer=self.gamma_regularizer,
                                     name='{}_gamma'.format(self.name),
                                     trainable=False)
        self.beta = self.add_weight(shape,
                                    initializer=self.beta_init,
                                    regularizer=self.beta_regularizer,
                                    name='{}_beta'.format(self.name),
                                    trainable=False)
        self.running_mean = self.add_weight(shape, initializer='zero',
                                            name='{}_running_mean'.format(self.name),
                                            trainable=False)
        self.running_std = self.add_weight(shape, initializer='one',
                                           name='{}_running_std'.format(self.name),
                                           trainable=False)

        if self.initial_weights is not None:
            self.set_weights(self.initial_weights)
            del self.initial_weights

        self.built = True

    # Apply the (frozen) BN transform
    def call(self, x, mask=None):
        assert self.built, 'Layer must be built before being called'
        input_shape = K.int_shape(x)

        reduction_axes = list(range(len(input_shape)))
        del reduction_axes[self.axis]
        broadcast_shape = [1] * len(input_shape)
        broadcast_shape[self.axis] = input_shape[self.axis]

        # Check whether axis is the last one (channels_last), in which case the
        # backend op can be applied directly; the explicit list(...) keeps the
        # comparison correct in Python 3, where range() is not a list
        if sorted(reduction_axes) == list(range(K.ndim(x)))[:-1]:
            x_normed = K.batch_normalization(
                x, self.running_mean, self.running_std,
                self.beta, self.gamma,
                epsilon=self.epsilon)
        else:
            # need broadcasting: reshape the per-channel statistics so they
            # broadcast against the input along self.axis
            broadcast_running_mean = K.reshape(self.running_mean, broadcast_shape)
            broadcast_running_std = K.reshape(self.running_std, broadcast_shape)
            broadcast_beta = K.reshape(self.beta, broadcast_shape)
            broadcast_gamma = K.reshape(self.gamma, broadcast_shape)
            x_normed = K.batch_normalization(
                x, broadcast_running_mean, broadcast_running_std,
                broadcast_beta, broadcast_gamma,
                epsilon=self.epsilon)

        return x_normed

    def get_config(self):
        config = {'epsilon': self.epsilon,
                  'axis': self.axis,
                  'gamma_regularizer': self.gamma_regularizer.get_config() if self.gamma_regularizer else None,
                  'beta_regularizer': self.beta_regularizer.get_config() if self.beta_regularizer else None}
        base_config = super(FixedBatchNormalization, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))
This defines a layer with parameters similar to Keras's own Batch Normalization; the later ResNet definition uses it, for example:
x = Convolution2D(nb_filter1, (1, 1), name=conv_name_base + '2a', trainable=trainable)(input_tensor)
x = FixedBatchNormalization(axis=bn_axis, name=bn_name_base + '2a')(x)
x = Activation('relu')(x)
That is: convolve without an activation, then batch-normalize each kernel's output (with the statistics frozen, so in my view this amounts to a fixed linear transformation per channel), and finally apply the ReLU activation.
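The claim that a frozen BN layer is only a linear transformation can be verified directly. The NumPy sketch below mirrors what FixedBatchNormalization computes at inference time; the frozen statistics are assumed toy values, not the real ResNet weights:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=(2, 5, 5, 3))       # conv output, channels_last
mean = np.array([0.5, -1.0, 2.0])       # frozen running mean (assumed)
var = np.array([1.5, 0.3, 4.0])         # frozen running variance (assumed)
gamma = np.array([1.2, 0.8, 1.0])
beta = np.array([0.1, -0.2, 0.0])
eps = 1e-3

# The BN formula with frozen statistics...
y_bn = gamma * (x - mean) / np.sqrt(var + eps) + beta

# ...collapses into scale * x + shift with precomputed per-channel
# constants, i.e. a fixed affine (linear) transform of the conv output.
scale = gamma / np.sqrt(var + eps)
shift = beta - mean * scale
y_affine = scale * x + shift

print(np.allclose(y_bn, y_affine))  # → True
```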