[ MOOC Course Notes ] AI in Practice: TensorFlow Notes, CH4_4 Regularization

Regularization

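Overfitting: a neural network model achieves high accuracy on the training set but low accuracy when predicting or classifying new data, which indicates poor generalization.
Regularization: add a weighted penalty on the parameters w to the loss function. This introduces a measure of model complexity into the loss, which discourages the model from fitting noise in the training data and reduces overfitting.
With regularization, the loss function becomes the sum of two terms:
loss = loss(y, y_) + REGULARIZER * loss(w)
The first term measures the gap between the predictions and the ground-truth labels, for example cross-entropy or mean squared error. The second term is the regularization penalty, scaled by the coefficient REGULARIZER.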
Computing the regularization term:
- L1 regularization:
  - Formula:
    $R(w) = \|w\|_1 = \sum_i |w_i|$
  - As a TensorFlow function:
```
RL_1 = tf.contrib.layers.l1_regularizer(REGULARIZER)(w)
```
- L2 regularization:
  - Formula:
    $R(w) = \|w\|_2^2 = \sum_i w_i^2$
  - As a TensorFlow function:
```
RL_2 = tf.contrib.layers.l2_regularizer(REGULARIZER)(w)
```
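
To make the two formulas above concrete, here is a minimal NumPy sketch of both penalties. The helper names `l1_penalty` and `l2_penalty` and the `halve` flag are illustrative assumptions, not TensorFlow APIs. One detail worth knowing: `tf.contrib.layers.l2_regularizer` is built on `tf.nn.l2_loss`, which computes `sum(w ** 2) / 2`, so the value it returns is half of `REGULARIZER * ||w||_2^2`; `halve=True` mimics that behavior.

```python
import numpy as np

def l1_penalty(scale, w):
    """scale * sum_i |w_i| (hypothetical helper, mirrors the L1 formula)."""
    return scale * np.sum(np.abs(w))

def l2_penalty(scale, w, halve=False):
    """scale * sum_i w_i^2; with halve=True, scale * sum_i w_i^2 / 2."""
    total = np.sum(np.square(w))
    return scale * (total / 2.0 if halve else total)

w = np.array([[1.0, -2.0], [3.0, -4.0]])   # made-up weights for illustration
REGULARIZER = 0.01

print(l1_penalty(REGULARIZER, w))               # 0.01 * (1 + 2 + 3 + 4), about 0.1
print(l2_penalty(REGULARIZER, w))               # 0.01 * (1 + 4 + 9 + 16), about 0.3
print(l2_penalty(REGULARIZER, w, halve=True))   # half of the previous value
```

The L1 penalty grows linearly with each weight's magnitude, while the L2 penalty grows quadratically, so L2 punishes a few large weights much more heavily than many small ones.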

Implementing regularization with TensorFlow functions:

tf.add_to_collection('losses', RL_2)
loss = loss_cem + tf.add_n(tf.get_collection('losses'))
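
The collection pattern above can be pictured with a plain-Python analogue. This is only a sketch of the bookkeeping, not TensorFlow itself: the `_collections` dict and the two helpers are hypothetical stand-ins that borrow the names of the TensorFlow functions, and all numbers are made up for illustration.

```python
from collections import defaultdict

# Hypothetical stand-in for TensorFlow's graph collections: name -> list of values.
_collections = defaultdict(list)

def add_to_collection(name, value):
    """Append a value to the named collection."""
    _collections[name].append(value)

def get_collection(name):
    """Return a copy of everything stored under the given name."""
    return list(_collections[name])

# Each weight matrix registers its own regularization term (illustrative numbers).
add_to_collection('losses', 0.30)   # penalty for a first weight matrix
add_to_collection('losses', 0.05)   # penalty for a second weight matrix

loss_cem = 1.20                     # data loss, e.g. a mean cross-entropy
# sum() plays the role of tf.add_n: it adds up every collected penalty.
loss = loss_cem + sum(get_collection('losses'))
print(loss)
```

The point of the pattern is that each layer can register its penalty where the weight is created, and the total loss is assembled in one place without having to track the weights by hand.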

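Example:
Use 300 normally distributed points $X[x_0,x_1]$ as the dataset, compute a label $Y\_$ for each point from its coordinates, and mark the points red or blue.
Labeling rule: when $x_0^2 + x_1^2 < 2$, $y\_=1$ and the point is red; when $x_0^2 + x_1^2 \geq 2$, $y\_=0$ and the point is blue.
We fit a curve that separates the red points from the blue points twice, once without and once with regularization. At classification time, the closer the forward-pass output $y$ is to 1, the more likely the point is red; the closer it is to 0, the more likely the point is blue. The output $y = 0.5$ is the boundary between the two classes.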
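
The labeling rule and the 0.5 decision threshold can be sketched in vectorized NumPy as follows. The `classify` helper is a hypothetical name, the seed is arbitrary, and assigning an output of exactly 0.5 to the red class is an assumption (the notes only say that 0.5 is the boundary).

```python
import numpy as np

rdm = np.random.RandomState(2)          # arbitrary fixed seed, for reproducibility
X = rdm.randn(300, 2)                   # 300 points drawn from a standard normal

# Label 1 (red) inside the circle x0^2 + x1^2 < 2, otherwise 0 (blue).
Y_ = (np.sum(X ** 2, axis=1) < 2).astype(int).reshape(-1, 1)
colors = np.where(Y_.ravel() == 1, 'red', 'blue')

def classify(y_pred, threshold=0.5):
    """Map network outputs to labels: >= threshold -> 1 (red), else 0 (blue)."""
    return (np.asarray(y_pred) >= threshold).astype(int)

print(Y_.shape)                              # (300, 1)
print(classify([0.9, 0.51, 0.49, 0.1]))      # [1 1 0 0]
```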

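# Written for TensorFlow 1.x: tf.contrib was removed in 2.x, and
# tf.placeholder / tf.Session now live under tf.compat.v1.
import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np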

BATCH_SIZE = 30
SEED = 2

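# Generate 300 points from a standard normal distribution (fixed seed for reproducibility)
rdm = np.random.RandomState(SEED)
X = rdm.randn(300, 2)
# Label 1 (red) if the point satisfies x0^2 + x1^2 < 2, otherwise 0 (blue)
Y_ = [int(xi[0]*xi[0] + xi[1]*xi[1] < 2) for xi in X]
Y_c = [['red' if y else 'blue'] for y in Y_]
# Shape the data as (N, 2) inputs and (N, 1) labels
X = np.vstack(X).reshape(-1, 2)
Y_ = np.vstack(Y_).reshape(-1, 1)
# Plot the raw dataset
plt.scatter(X[:,0], X[:,1], c=np.squeeze(Y_c))
plt.show()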

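def get_weight(shape, regularizer):
    # Create a weight variable and register its L2 penalty in the 'losses' collection
    w = tf.Variable(tf.random_normal(shape), dtype=tf.float32)
    tf.add_to_collection('losses', tf.contrib.layers.l2_regularizer(regularizer)(w))
    return w

def get_bias(shape):
    # Biases start at a small constant and are not regularized
    b = tf.Variable(tf.constant(0.01, shape=shape))
    return b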

x = tf.placeholder(tf.float32, shape=(None, 2))
y_ = tf.placeholder(tf.float32, shape=(None, 1))

w1 = get_weight([2, 11], 0.01)
b1 = get_bias([11])
y1 = tf.nn.relu(tf.matmul(x, w1) + b1)

w2 = get_weight([11, 1], 0.01)
b2 = get_bias([1])
y = tf.matmul(y1, w2) + b2

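# Data loss only (mean squared error)
loss_mse = tf.reduce_mean(tf.square(y-y_))
# Data loss plus every L2 penalty collected in 'losses'
loss_total = loss_mse + tf.add_n(tf.get_collection('losses'))

# First run: minimize loss_mse only, i.e. train without regularization
train_step = tf.train.AdamOptimizer(0.0001).minimize(loss_mse)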

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    STEPS = 40000
    for i in range(STEPS):
        start = (i*BATCH_SIZE) % 300
        end = min(start+BATCH_SIZE, 300)
        sess.run(train_step, feed_dict={x: X[start:end], y_:Y_[start:end]})
        if i % 2000 == 0:
            loss_mse_v = sess.run(loss_mse, feed_dict={x:X, y_:Y_})
            print('After %d steps, loss_mse is: %f' % (i, loss_mse_v))
    xx, yy = np.mgrid[-3:3:0.01, -3:3:0.01]
    grid = np.c_[xx.ravel(), yy.ravel()]
    probs = sess.run(y, feed_dict={x:grid})
    probs = probs.reshape(xx.shape)

    print('w1:', sess.run(w1))
    print('b1:', sess.run(b1))
    print('w2:', sess.run(w2))
    print('b2:', sess.run(b2))

plt.scatter(X[:,0], X[:,1], c=np.squeeze(Y_c))
plt.contour(xx, yy, probs, levels=[.5])
plt.title('loss_mse')
plt.show()

# Second run: minimize loss_total, i.e. train with L2 regularization
train_step = tf.train.AdamOptimizer(0.0001).minimize(loss_total)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    STEPS = 40000
    for i in range(STEPS):
        start = (i*BATCH_SIZE) % 300
        end = min(start+BATCH_SIZE, 300)
        sess.run(train_step, feed_dict={x: X[start:end], y_:Y_[start:end]})
        if i % 2000 == 0:
            loss_total_v = sess.run(loss_total, feed_dict={x:X, y_:Y_})
            print('After %d steps, loss_total is: %f' % (i, loss_total_v))
    xx, yy = np.mgrid[-3:3:0.01, -3:3:0.01]
    grid = np.c_[xx.ravel(), yy.ravel()]
    probs = sess.run(y, feed_dict={x:grid})
    probs = probs.reshape(xx.shape)

    print('w1:', sess.run(w1))
    print('b1:', sess.run(b1))
    print('w2:', sess.run(w2))
    print('b2:', sess.run(b2))

plt.scatter(X[:,0], X[:,1], c=np.squeeze(Y_c))
plt.contour(xx, yy, probs, levels=[.5])
plt.title('loss_total')
plt.show()

Dataset visualization:

Without regularization:

With regularization:

np.vstack()
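
Before the reference docstring, a small sketch of how `np.vstack` behaves on a plain Python list of scalar labels, which is how the example uses it to build a column vector. The `labels` values are made up; the `reshape(-1, 1)` alternative is shown only for comparison.

```python
import numpy as np

labels = [1, 0, 1, 1]                        # a plain Python list of scalar labels
col = np.vstack(labels)                      # each scalar becomes a 1x1 row, stacked vertically
print(col.shape)                             # (4, 1)

col2 = np.asarray(labels).reshape(-1, 1)     # an equivalent, more direct way to get a column
print(np.array_equal(col, col2))             # True
```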

def vstack(tup):
    """
    Stack arrays in sequence vertically (row wise).

    This is equivalent to concatenation along the first axis after 1-D arrays
    of shape `(N,)` have been reshaped to `(1,N)`. Rebuilds arrays divided by
    `vsplit`.

    This function makes most sense for arrays with up to 3 dimensions. For
    instance, for pixel-data with a height (first axis), width (second axis),
    and r/g/b channels (third axis). The functions `concatenate`, `stack` and
    `block` provide more general stacking and concatenation operations.

    Parameters
    ----------
    tup : sequence of ndarrays
        The arrays must have the same shape along all but the first axis.
        1-D arrays must have the same length.

    Returns
    -------
    stacked : ndarray
        The array formed by stacking the given arrays, will be at least 2-D.

    See Also
    --------
    stack : Join a sequence of arrays along a new axis.
    hstack : Stack arrays in sequence horizontally (column wise).
    dstack : Stack arrays in sequence depth wise (along third dimension).
    concatenate : Join a sequence of arrays along an existing axis.
    vsplit : Split array into a list of multiple sub-arrays vertically.
    block : Assemble arrays from blocks.

    Examples
    --------
    >>> a = np.array([1, 2, 3])
    >>> b = np.array([2, 3, 4])
    >>> np.vstack((a,b))
    array([[1, 2, 3],
           [2, 3, 4]])

    >>> a = np.array([[1], [2], [3]])
    >>> b = np.array([[2], [3], [4]])
    >>> np.vstack((a,b))
    array([[1],
           [2],
           [3],
           [2],
           [3],
           [4]])

    """

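Drawing a scatter plot

plt.scatter(x coordinates, y coordinates, c="color")

Collecting all grid points in a given region:

xx, yy = np.mgrid[start:stop:step, start:stop:step]  # row/column grid coordinates over the region, at a resolution of step
grid = np.c_[xx.ravel(), yy.ravel()]  # pair the coordinates up, one (x, y) grid point per row

For example: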

xx, yy = np.mgrid[0:5, 0:5]
xx
array([[0, 0, 0, 0, 0],
       [1, 1, 1, 1, 1],
       [2, 2, 2, 2, 2],
       [3, 3, 3, 3, 3],
       [4, 4, 4, 4, 4]])
yy
array([[0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4]])
xx.ravel()
array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4,
       4, 4, 4])
yy.ravel()
array([0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1,
       2, 3, 4])
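
The integer example above uses the default step of 1. A short sketch of how `np.mgrid` treats a fractional step and an imaginary "step" may help when choosing the plotting region; the range and step values here are arbitrary.

```python
import numpy as np

# Real step: works like range(), so the stop value is excluded.
xx, yy = np.mgrid[-3:3:0.5, -3:3:0.5]
print(xx.shape)                   # (12, 12)
print(xx[0, 0], xx[-1, 0])        # -3.0 2.5

# Imaginary step: 13j means "13 points", and the stop value is included.
xi, yi = np.mgrid[-3:3:13j, -3:3:13j]
print(xi.shape)                   # (13, 13)
print(xi[0, 0], xi[-1, 0])        # -3.0 3.0
```

A smaller step gives a smoother decision boundary at the cost of more grid points to evaluate, since the point count grows with the square of the resolution.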


Examples
--------

>>> np.c_[np.array([1,2,3]), np.array([4,5,6])]
array([[1, 4],
       [2, 5],
       [3, 6]])
>>> np.c_[np.array([[1,2,3]]), 0, 0, np.array([[4,5,6]])]
array([[1, 2, 3, 0, 0, 4, 5, 6]])
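
Putting the pieces together, the grid workflow is: build the grid, flatten it into (x, y) pairs, evaluate a function on every pair, and reshape the results back to the grid shape. In the sketch below the squared distance from the origin is an assumed stand-in for the network's forward pass, and the small grid size is arbitrary.

```python
import numpy as np

xx, yy = np.mgrid[0:3, 0:4]                # both have shape (3, 4)
grid = np.c_[xx.ravel(), yy.ravel()]       # shape (12, 2): one (x, y) pair per row

# Stand-in for the model: any function mapping N points to N values.
values = np.sum(grid ** 2, axis=1)

# Reshape back so heights[i, j] belongs to the point (xx[i, j], yy[i, j]).
heights = values.reshape(xx.shape)
print(grid.shape, heights.shape)           # (12, 2) (3, 4)
print(heights[2, 3] == xx[2, 3] ** 2 + yy[2, 3] ** 2)   # True
```

Because `ravel()` and `reshape()` use the same row-major order, the reshaped heights stay aligned with `xx` and `yy`, which is what a contour plot needs.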

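plt.contour(): given the x and y coordinates and the height at each point, it draws contour lines through the points whose height equals the values listed in levels.
```
plt.contour(x coordinates, y coordinates, height at each point, levels=[contour heights])
```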
