Initialization Algorithms for Neural Network Parameters

The experiments in this article use the UCI Abalone (age prediction) dataset and an 8-layer fully connected neural network.

1. Experimental procedure:

a. Initialize the weights of every layer with each of the different initialization schemes;

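b. Plot, as a histogram, the distribution of each layer's input to the activation function (i.e., the output of the linear operation).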

2. Normal-distribution weight initialization

a. Weight initialization

weight = np.random.randn(in_node, out_node)

This uses NumPy's default parameters: mean 0 and variance 1 (a standard normal distribution).

b. Experimental results

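As the figure shows, the deeper the layer, the larger the fraction of (tanh) activation values that sit close to 1 or -1. In that saturated region the gradient of tanh is nearly zero, so the gradients become too small to update the parameters. These results come from my experiments on the Abalone dataset; you can repeat the test on other datasets, and although the plots will differ, the conclusion is the same.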

c. Analysis

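Suppose the inputs and the weights are independent and both have mean 0 and variance 1. Then each product x*w also has mean 0 and variance 1 (strictly speaking, the product of two normal variables is not itself normally distributed, but only its mean and variance matter for this argument). Its density is concentrated around 0: for a standard normal, about 68% of the values fall in [-1, 1].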

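For a sum of two such independent terms, x1*w1 + x2*w2, the mean is still 0 but the variance becomes 2, so the density curve widens and the values spread over a wider range such as [-2, 2]. Each linear operation in a network accumulates variance in exactly this way: a neuron adds one such term per neuron in the previous layer, so with n inputs the variance of its linear output is n. The density of the linear output therefore becomes very wide, and the values easily fall into the saturation region of the activation function.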
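A quick numerical check of this accumulation, as a sketch under stated assumptions: linear_output_variance is a hypothetical helper (not part of the original experiment) that draws synthetic standard normal inputs and standard normal weights, then measures the variance of one linear layer's output. If the argument above holds, the result should be close to n_in.

```python
import numpy as np

rng = np.random.default_rng(0)


def linear_output_variance(n_in, n_out=2048, n_samples=2000):
    """Empirical variance of z = x @ W when x and W are both drawn from N(0, 1)."""
    x = rng.standard_normal((n_samples, n_in))   # synthetic inputs, not the Abalone data
    w = rng.standard_normal((n_in, n_out))       # section 2 style weights
    return float((x @ w).var())


for n_in in (1, 2, 16, 64):
    print(n_in, round(linear_output_variance(n_in), 2))
```

The printed variances should grow roughly in proportion to n_in, which is why wider layers push the pre-activations further into the saturation region.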

4. Xavier weight initialization

a. Weight initialization

weight = np.random.randn(in_node, out_node)/np.sqrt(in_node)

TensorFlow 1.x API (tf.contrib was removed in TensorFlow 2.x):

tf.contrib.layers.xavier_initializer_conv2d

b. Experimental results

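As the figure shows, even in the deeper layers the values keep a roughly normal distribution instead of piling up at the saturated ends, so with the tanh activation the gradients stay usable and the parameters update quickly.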
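As a minimal sketch of the same effect on synthetic data (an assumption: eight 256-unit tanh layers fed with standard normal inputs, not the 8-16-...-1 Abalone network), the hypothetical helper tanh_saturation_by_layer below counts, for each layer, the fraction of tanh outputs with |a| > 0.99, comparing section 2's randn weights with section 4's randn / sqrt(n_in) weights.

```python
import numpy as np


def tanh_saturation_by_layer(weight_std, n_layers=8, width=256, n_samples=1000, seed=0):
    """Fraction of tanh outputs with |a| > 0.99 in each layer.

    weight_std(n_in) returns the standard deviation of the N(0, std^2) weights.
    """
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n_samples, width))  # synthetic input, not the Abalone data
    fractions = []
    for _ in range(n_layers):
        w = rng.standard_normal((width, width)) * weight_std(width)
        a = np.tanh(a @ w)
        fractions.append(float(np.mean(np.abs(a) > 0.99)))
    return fractions


standard = tanh_saturation_by_layer(lambda n: 1.0)              # section 2: randn
xavier = tanh_saturation_by_layer(lambda n: 1.0 / np.sqrt(n))  # section 4: randn / sqrt(n_in)
print([round(f, 2) for f in standard])
print([round(f, 3) for f in xavier])
```

With standard normal weights most tanh outputs should be saturated in every layer, while with the 1/sqrt(n_in) scaling the saturated fraction should stay close to 0.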

c. Analysis

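To fix the problem of plain normal initialization, this method keeps the variance of each layer's output equal to the variance of its input: dividing standard normal weights by sqrt(in_node) gives Var(w) = 1/in_node, which cancels the factor of in_node accumulated by the sum.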

Derivation:
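A sketch of the standard variance argument, assuming the inputs x_i and weights w_i are independent, zero-mean and identically distributed, with z the linear output of one neuron that has n_in inputs:

```latex
\begin{aligned}
z &= \sum_{i=1}^{n_{\text{in}}} w_i x_i \\
\operatorname{Var}(w_i x_i) &= \mathbb{E}[w_i^2]\,\mathbb{E}[x_i^2] - \left(\mathbb{E}[w_i]\,\mathbb{E}[x_i]\right)^2 = \operatorname{Var}(w)\,\operatorname{Var}(x) \\
\operatorname{Var}(z) &= n_{\text{in}}\,\operatorname{Var}(w)\,\operatorname{Var}(x) \\
\operatorname{Var}(z) = \operatorname{Var}(x) &\;\Longrightarrow\; \operatorname{Var}(w) = \frac{1}{n_{\text{in}}}
\end{aligned}
```

Setting Var(w) = 1 recovers section 2, where Var(z) = n_in Var(x). Glorot and Bengio also require the gradients' variance to be preserved in the backward pass, which gives Var(w) = 1/n_out, and they compromise on Var(w) = 2/(n_in + n_out); the code in this article uses only the forward condition Var(w) = 1/n_in.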

5. MSRA (He) weight initialization: rescaling a standard normal sample

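The Xavier paper analyzes the tanh activation, whereas ReLU is the more widely used activation in neural networks; this method was proposed to handle ReLU.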

a. Weight initialization

weight = np.random.randn(in_node, out_node)/np.sqrt(in_node/2)

b. Experimental results

Figure 1 uses Xavier initialization with the ReLU activation; Figure 2 uses He initialization with ReLU.

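As the figures show, with Xavier initialization and ReLU, part of the values fall into ReLU's dead zone (the negative side, where both the output and the gradient are 0) and can no longer be updated; because ReLU zeroes roughly half of its inputs, the spread of the values also shrinks from layer to layer. He initialization does not have this problem.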
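A minimal sketch of this comparison on synthetic data, assuming eight 256-unit ReLU layers and standard normal inputs (not the Abalone network): the hypothetical helper relu_preact_std_by_layer records the standard deviation of each layer's pre-activation values under Xavier (randn / sqrt(n_in)) and He (randn * sqrt(2 / n_in)) scaling.

```python
import numpy as np


def relu_preact_std_by_layer(weight_std, n_layers=8, width=256, n_samples=1000, seed=0):
    """Standard deviation of each layer's pre-activation values in a ReLU network.

    weight_std(n_in) returns the standard deviation of the N(0, std^2) weights.
    """
    rng = np.random.default_rng(seed)
    h = rng.standard_normal((n_samples, width))  # synthetic input, not the Abalone data
    stds = []
    for _ in range(n_layers):
        w = rng.standard_normal((width, width)) * weight_std(width)
        z = h @ w                      # linear operation (pre-activation)
        stds.append(float(z.std()))
        h = np.maximum(0.0, z)         # ReLU
    return stds


xavier_stds = relu_preact_std_by_layer(lambda n: 1.0 / np.sqrt(n))
he_stds = relu_preact_std_by_layer(lambda n: np.sqrt(2.0 / n))
print([round(s, 3) for s in xavier_stds])
print([round(s, 3) for s in he_stds])
```

Under these assumptions the Xavier column should shrink by about a factor of sqrt(2) per layer, while the He column should stay roughly constant.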

c. Analysis

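He initialization assumes that half of the previous layer's neurons are not activated (ReLU outputs 0 for negative inputs), so only half of the variance is passed on. To keep the input and output at the same variance after the terms are summed, the number of input nodes is additionally divided by 2: Var(w) = 1/(in_node/2) = 2/in_node.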
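The "half" assumption can be checked numerically. This sketch assumes zero-mean, symmetric pre-activations (here N(0, 9), an arbitrary choice) and measures both the fraction of active ReLU units and the ratio E[relu(z)^2] / Var(z), which is the quantity that enters the next layer's variance sum.

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(loc=0.0, scale=3.0, size=1_000_000)  # zero-mean, symmetric pre-activations
relu_z = np.maximum(0.0, z)

fraction_active = float(np.mean(relu_z > 0))                    # share of units that are "on"
second_moment_ratio = float(np.mean(relu_z ** 2) / np.var(z))   # E[relu(z)^2] / Var(z)
print(fraction_active, second_moment_ratio)  # both close to 0.5
```

Both numbers come out near 1/2, which is exactly the factor that dividing in_node by 2 compensates for.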

6. MSRA weight initialization: sampling directly from N(0, 2/in_node)

a. Weight initialization

weight = np.random.normal(loc=0.0, scale=np.sqrt(2/in_node), size=[in_node, out_node])

TensorFlow 1.x API (tf.contrib was removed in TensorFlow 2.x):

tf.contrib.layers.variance_scaling_initializer

b. Experimental results

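As the figure shows, both MSRA forms lead to the same conclusion, and they describe the same distribution: this form sets the standard deviation of the normal distribution directly to sqrt(2/in_node), whereas the form in section 5 samples from a standard normal and then rescales the sampled values by 1/sqrt(in_node/2).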
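A small sketch confirming that the two forms produce the same variance, 2/n_in. It uses NumPy's Generator API (np.random.default_rng) rather than the legacy np.random functions of the article, and a wide hypothetical 512 x 512 layer so the sample variance is stable.

```python
import numpy as np

n_in, n_out = 512, 512  # a wide hypothetical layer, so the sample variance is stable
rng = np.random.default_rng(0)

# Form 1 (section 5): standard normal sample rescaled by 1 / sqrt(n_in / 2)
w_scaled = rng.standard_normal((n_in, n_out)) / np.sqrt(n_in / 2)
# Form 2 (section 6): sample N(0, 2 / n_in) directly
w_direct = rng.normal(loc=0.0, scale=np.sqrt(2 / n_in), size=(n_in, n_out))

target_var = 2 / n_in
print(target_var, w_scaled.var(), w_direct.var())
```

Since 1/sqrt(n_in/2) equals sqrt(2/n_in), the choice between the two forms is purely a matter of style.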

c. Derivation
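A sketch of the standard argument for ReLU layers, assuming zero-mean weights independent of the layer input, and zero-mean, symmetric pre-activations z_{l-1}. The step that differs from the Xavier case is that ReLU outputs are not zero-mean, so E[x_l^2] appears instead of Var(x_l):

```latex
\begin{aligned}
z_l &= W_l\, x_l, \qquad x_l = \max(0,\, z_{l-1}) \\
\operatorname{Var}(z_l) &= n_{\text{in}}\,\operatorname{Var}(w_l)\,\mathbb{E}[x_l^2] \\
\mathbb{E}[x_l^2] &= \tfrac{1}{2}\operatorname{Var}(z_{l-1}) \\
\operatorname{Var}(z_l) &= \tfrac{1}{2}\, n_{\text{in}}\,\operatorname{Var}(w_l)\,\operatorname{Var}(z_{l-1}) \\
\operatorname{Var}(z_l) = \operatorname{Var}(z_{l-1}) &\;\Longrightarrow\; \operatorname{Var}(w_l) = \frac{2}{n_{\text{in}}}, \quad \operatorname{std}(w_l) = \sqrt{\frac{2}{n_{\text{in}}}}
\end{aligned}
```

The standard deviation sqrt(2/n_in) is the scale passed to np.random.normal in section 6, and it equals the 1/sqrt(in_node/2) factor used in section 5.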

7. Uniform-distribution initialization

a. Weight initialization

weight = np.random.uniform(0,1,in_node*out_node).reshape(in_node, out_node)
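A sketch of why U(0, 1) behaves like section 2: its weights are all positive, with mean 0.5 and E[w^2] = 1/3, so the variance of the linear output grows roughly like n_in / 3. The zero-centred U(-a, a) with a = sqrt(3 / n_in) is a swapped-in alternative, not the article's method; it has variance a^2 / 3 = 1 / n_in, the same as section 4 (Glorot and Bengio's uniform variant instead uses a = sqrt(6 / (n_in + n_out))). The 256 x 256 layer and synthetic inputs are assumptions.

```python
import numpy as np

n_in, n_out = 256, 256  # a wide hypothetical layer, so the statistics are stable
rng = np.random.default_rng(0)
x = rng.standard_normal((10000, n_in))  # synthetic zero-mean, unit-variance inputs

# Section 7 as written: U(0, 1) is all-positive, mean 0.5, E[w^2] = 1/3
w_01 = rng.uniform(0.0, 1.0, size=(n_in, n_out))
var_01 = float((x @ w_01).var())     # grows like n_in / 3

# Swapped-in alternative: zero-centred U(-a, a) has variance a^2 / 3,
# so a = sqrt(3 / n_in) gives Var(w) = 1 / n_in, the same as section 4
a = np.sqrt(3.0 / n_in)
w_sym = rng.uniform(-a, a, size=(n_in, n_out))
var_sym = float((x @ w_sym).var())   # stays close to 1

print(w_01.mean(), var_01, n_in / 3, var_sym)
```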

b. Experimental results

8. BilinearFiller initialization (bilinear-interpolation initialization)

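This initialization is used for deconvolution (transposed convolution) layers, typically so that an upsampling layer starts out as bilinear interpolation.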
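A sketch of the bilinear kernel. The formula follows the upsample_filt helper in the FCN reference code; for kernel size 4, the common choice for 2x upsampling with stride 2, it gives the same values as Caffe's BilinearFiller. The function names bilinear_kernel and bilinear_deconv_weights are hypothetical, and the (out_channels, in_channels, k, k) weight layout is an assumption.

```python
import numpy as np


def bilinear_kernel(size):
    """2-D bilinear interpolation kernel of shape (size, size)."""
    factor = (size + 1) // 2
    center = factor - 1 if size % 2 == 1 else factor - 0.5
    og = np.ogrid[:size, :size]  # row indices (size, 1) and column indices (1, size)
    return (1 - np.abs(og[0] - center) / factor) * (1 - np.abs(og[1] - center) / factor)


def bilinear_deconv_weights(channels, size):
    """Weights of shape (channels, channels, size, size) for a transposed convolution
    that upsamples each channel independently (channel c -> channel c)."""
    weights = np.zeros((channels, channels, size, size), dtype=np.float32)
    kernel = bilinear_kernel(size)
    for c in range(channels):
        weights[c, c] = kernel  # off-diagonal channel pairs stay 0
    return weights


k = bilinear_kernel(4)
print(k)
```

Putting the kernel only on the diagonal keeps the channels separate, so the initial layer simply interpolates each feature map.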

9. Fine-tuning

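Also called supervised pre-training. The idea is to transfer the weights of a model trained on a source domain to the target domain and use them as the initial weights for training the target-domain model. The prerequisite is that the source and target datasets are independent and identically distributed, or at least that the two domains are highly similar.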
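A minimal sketch of the weight transfer, assuming weights are stored in a dict keyed 'w1', 'w2', ... as in init_layers_weight in section 10. The helper fine_tune_init is hypothetical; it copies every source layer whose shape matches and re-initializes the rest (here with MSRA, an assumption), which is how a new output layer for a different label space is usually handled.

```python
import numpy as np


def fine_tune_init(source_weights, target_shapes, reinit_layers, seed=0):
    """Build target-domain initial weights from a source-domain model.

    source_weights: dict like {'w1': array, ...}
    target_shapes:  dict of the target network's weight shapes
    reinit_layers:  names of layers to re-initialize instead of copying
    """
    rng = np.random.default_rng(seed)
    target = {}
    for name, shape in target_shapes.items():
        src = source_weights.get(name)
        if name not in reinit_layers and src is not None and src.shape == tuple(shape):
            target[name] = src.copy()  # transferred weight
        else:
            n_in = shape[0]
            target[name] = rng.normal(0.0, np.sqrt(2.0 / n_in), size=shape).astype(np.float32)
    return target


# Usage: reuse the hidden layers, replace the output layer (1 output -> 3 outputs)
src_rng = np.random.default_rng(1)
shapes = {'w1': (8, 16), 'w2': (16, 16), 'w3': (16, 1)}
source = {name: src_rng.standard_normal(s).astype(np.float32) for name, s in shapes.items()}
target_shapes = {'w1': (8, 16), 'w2': (16, 16), 'w3': (16, 3)}
target = fine_tune_init(source, target_shapes, reinit_layers={'w3'})
```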

10. Experiment code

import numpy as np
import matplotlib.pyplot as plt

# Load the dataset: each line holds tab-separated numeric values; the last column is the target
def load_dataset(file_path):
    x = []
    y = []
    with open(file_path, 'r') as file:
        for line in file:
            fields = line.strip().split('\t')
            if fields == ['']:  # skip blank lines
                continue
            values = [float(v) for v in fields]
            x.append(values[:-1])
            y.append(values[-1])
    return np.array(x), np.array(y)

# Select the weight-initialization method
def select_init_weight_method(in_node, out_node, init_method):
    # Standard normal initialization: mean 0, variance 1
    if init_method == "random_normal":
        weight = np.random.randn(in_node, out_node)
    # Xavier initialization
    elif init_method == "Xavier":
        weight = np.random.randn(in_node, out_node) / np.sqrt(in_node)
    # He initialization
    elif init_method == "he_init":
        weight = np.random.randn(in_node, out_node) / np.sqrt(in_node / 2)
    # Uniform initialization on [0, 1)
    elif init_method == "uniform_normal":
        weight = np.random.uniform(0, 1, in_node * out_node).reshape(in_node, out_node)
    # MSRA initialization
    elif init_method == "MSRA_init":
        weight = np.random.normal(loc=0.0, scale=np.sqrt(2 / in_node), size=[in_node, out_node])
    else:
        raise ValueError("unknown init_method: " + init_method)
    return np.array(weight, dtype=np.float32)

# Initialize the weights of every layer and store them in a dict
def init_layers_weight(init_method):
    # Seed once here rather than inside select_init_weight_method; reseeding on
    # every call would give all the 16x16 hidden layers identical weights
    np.random.seed(1)
    layer_sizes = [8, 16, 16, 16, 16, 16, 16, 16, 1]
    weights = {}
    for l in range(1, len(layer_sizes)):
        weights['w' + str(l)] = select_init_weight_method(layer_sizes[l - 1], layer_sizes[l], init_method)
    return weights

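def inference(x, init_method, activation_fun):
    weights = init_layers_weight(init_method)
    act_opt = x
    for l in range(1, 9):
        # The input of the current layer is the output of the previous layer
        layer_input = act_opt
        w = weights['w' + str(l)]
        # Linear operation (the pre-activation values)
        linear_opt = np.dot(layer_input, w)
        # Activation function
        act_opt = activation_fun(linear_opt)
        # Plot the histogram of the pre-activation values (the input to the activation function);
        # only the range [-1, 1] is shown
        plt.subplot(2, 4, l)
        plt.hist(linear_opt.flatten(), facecolor='r')
        plt.xlim([-1, 1])
        plt.yticks([])
    plt.show()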

def relu(linear_opt):
    return np.maximum(0, linear_opt)


if __name__ == "__main__":
    file_path = "abalone.txt"
    x, y = load_dataset(file_path)
    num_sample = x.shape[0]
    print("Dataset shape:", x.shape, y.shape)

    # init_method = "uniform_normal"
    # init_method = "he_init"
    # init_method = "random_normal"
    # init_method = "Xavier"
    init_method = "MSRA_init"

    # Pass np.tanh instead of relu to reproduce the tanh experiments
    inference(x, init_method, relu)

11. Summary

With the tanh activation, initialize the weights with Xavier initialization;

With the ReLU activation, initialize the weights with MSRA (He) initialization.

References:

1. https://blog.csdn.net/u012328159/article/details/80025785

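2. Xavier paper: X. Glorot and Y. Bengio, "Understanding the difficulty of training deep feedforward neural networks", AISTATS 2010.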

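3. MSRA paper: K. He, X. Zhang, S. Ren and J. Sun, "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification", ICCV 2015.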
