Build a simple network, then train and test the model
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(784,), name='img')
h1 = layers.Dense(32, activation='relu')(inputs)
h2 = layers.Dense(32, activation='relu')(h1)
outputs = layers.Dense(10, activation='softmax')(h2)
model = tf.keras.Model(inputs=inputs, outputs=outputs, name='try')
# MNIST handwritten digit dataset
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(60000, 784).astype('float32') / 255
x_test = x_test.reshape(10000, 784).astype('float32') / 255
model.compile(optimizer=tf.keras.optimizers.RMSprop(),
              loss=tf.keras.losses.sparse_categorical_crossentropy,
              metrics=['accuracy'])
history = model.fit(x_train, y_train, batch_size=64, epochs=5, validation_split=0.2)
test_scores = model.evaluate(x_test, y_test, verbose=0)
Here, verbose defaults to 1, which displays training progress; verbose=0 silences the output.
Model serialization and deserialization
model.save('model_save.h5')
del model
model = tf.keras.models.load_model('model_save.h5')
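A quick way to confirm the round trip works is to compare predictions before and after reloading. The sketch below uses a tiny stand-in model (the file name roundtrip_demo.h5 is an arbitrary choice, not from the original):

```python
import numpy as np
import tensorflow as tf

# A tiny functional model, just to demonstrate the save/load round trip
inp = tf.keras.Input(shape=(4,))
out = tf.keras.layers.Dense(3, activation='softmax')(inp)
model = tf.keras.Model(inp, out)

x = np.random.random((2, 4)).astype('float32')
before = model.predict(x)

model.save('roundtrip_demo.h5')  # serializes architecture + weights + optimizer state
restored = tf.keras.models.load_model('roundtrip_demo.h5')
after = restored.predict(x)

# The reloaded model reproduces the original predictions
print(np.allclose(before, after))  # True
```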
Create multiple models from a shared network
Before covering this part, let's go over the relevant API functions.
tf.keras.layers.Conv2D() performs a 2D convolution; the convolution operation itself is not described here, as it is assumed background knowledge. When a convolutional layer is the first layer of a model, the key parameter input_shape must be specified.
The parameters of tf.keras.layers.Conv2D() are as follows:
- filters: the number of convolution kernels, i.e. the number of channels in the output space.
- kernel_size: the size of the convolution kernel. Usually a tuple or list; for the 2D case, (3, 4) means a 3x4 kernel, while a single integer such as 2 means the same value in every dimension, i.e. 2x2.
- strides: the stride of the convolution, matching the dimensionality of the kernel; a tuple or a single integer, default (1, 1).
- padding: the padding type, default 'valid'.
- data_format: a string, either channels_last (the default) or channels_first, giving the ordering of the input dimensions. channels_last corresponds to inputs of shape (batch, height, width, channels); channels_first corresponds to (batch, channels, height, width). The default comes from the image data format value in the Keras config file at ~/.keras/keras.json; if it has never been set, it is "channels_last".
- dilation_rate: used for dilated convolutions (e.g. IDCNN), giving the dilation rate; an integer tuple or a single integer. Note that dilation_rate != 1 and strides != 1 are two different things and must not be confused. Defaults to (1, 1).
- activation: the activation function.
- use_bias: whether to use a bias term, default True.
- kernel_initializer: initializer for the kernel weights, default "glorot_uniform".
- bias_initializer: initializer for the bias, default zeros.
- kernel_regularizer: regularizer for the kernel weights.
- bias_regularizer: regularizer for the bias.
- activity_regularizer: regularizer applied to the layer's output.
- kernel_constraint: constraint function for the kernel weight matrix.
- bias_constraint: constraint function for the bias.
The official docs describe input_shape and output_shape as follows:
- input_shape: 4D tensor with shape (samples, channels, rows, cols) if data_format='channels_first', or 4D tensor with shape (samples, rows, cols, channels) if data_format='channels_last'.
- output_shape: 4D tensor with shape (samples, filters, new_rows, new_cols) if data_format='channels_first', or 4D tensor with shape (samples, new_rows, new_cols, filters) if data_format='channels_last'. rows and cols values might have changed due to padding.
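The shape rule above is easy to check directly. A minimal sketch (channels_last, the default):

```python
import tensorflow as tf

x = tf.zeros((1, 28, 28, 1))  # (samples, rows, cols, channels)

# padding='valid' (the default): rows/cols shrink by kernel_size - 1
y = tf.keras.layers.Conv2D(filters=16, kernel_size=3)(x)
print(y.shape)  # (1, 26, 26, 16)

# padding='same': the spatial size is preserved
z = tf.keras.layers.Conv2D(16, 3, padding='same')(x)
print(z.shape)  # (1, 28, 28, 16)
```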
It is worth explaining the difference between tf.keras.layers.MaxPool2D and tf.keras.layers.GlobalMaxPool2D.
The initializer parameters of MaxPool2D are:
__init__(
    pool_size=(2, 2),
    strides=None,
    padding='valid',
    data_format=None,
    **kwargs
)
The initializer parameters of GlobalMaxPool2D are:
__init__(
    data_format=None,
    **kwargs
)
MaxPool2D is ordinary pooling: it does not change the number of dimensions of the input tensor. GlobalMaxPool2D is global pooling and does change it. For example, with data_format='channels_last', suppose the input is a 4D tensor (batch_size, rows, cols, channels) and compare the two outputs.
MaxPool2D produces:
- 4D tensor with shape (batch_size, pooled_rows, pooled_cols, channels).
GlobalMaxPool2D, by contrast, produces:
- 2D tensor with shape (batch_size, channels)
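The difference is easy to verify from the shapes alone; a minimal sketch:

```python
import tensorflow as tf

x = tf.zeros((2, 9, 9, 8))  # (batch_size, rows, cols, channels)

# MaxPool2D keeps a 4D tensor; the spatial dims shrink by the pool factor
pooled = tf.keras.layers.MaxPool2D(pool_size=3)(x)
print(pooled.shape)  # (2, 3, 3, 8)

# GlobalMaxPool2D collapses the spatial dims entirely, giving a 2D tensor
global_pooled = tf.keras.layers.GlobalMaxPool2D()(x)
print(global_pooled.shape)  # (2, 8)
```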
layers.Conv2DTranspose() is a transposed convolution, the reverse of a normal convolution. layers.UpSampling2D() performs upsampling, where the rows and columns are repeated according to their respective sampling factors. UpSampling2D can be seen as the inverse of pooling: by default it uses nearest-neighbor interpolation to enlarge the data and thus the size of the feature map.
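Likewise for these two "enlarging" layers, a minimal shape check:

```python
import tensorflow as tf

x = tf.zeros((1, 4, 4, 16))

# UpSampling2D repeats rows and columns by the given factor (nearest neighbor by default)
up = tf.keras.layers.UpSampling2D(size=3)(x)
print(up.shape)  # (1, 12, 12, 16)

# Conv2DTranspose with stride 1 and 'valid' padding grows rows/cols by kernel_size - 1
deconv = tf.keras.layers.Conv2DTranspose(filters=8, kernel_size=3)(x)
print(deconv.shape)  # (1, 6, 6, 8)
```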
That covers the relevant APIs; now let's build multiple models from a shared network.
# Encoder and decoder
encode_input = keras.Input(shape=(28,28,1), name='img')
h1 = layers.Conv2D(16, 3, activation='relu')(encode_input)
h1 = layers.Conv2D(32, 3, activation='relu')(h1)
h1 = layers.MaxPool2D(3)(h1)
h1 = layers.Conv2D(32, 3, activation='relu')(h1)
h1 = layers.Conv2D(16, 3, activation='relu')(h1)
encode_output = layers.GlobalMaxPool2D()(h1)
encode_model = keras.Model(inputs=encode_input, outputs=encode_output, name='encoder')
encode_model.summary()
h2 = layers.Reshape((4, 4, 1))(encode_output)
h2 = layers.Conv2DTranspose(16, 3, activation='relu')(h2)
h2 = layers.Conv2DTranspose(32, 3, activation='relu')(h2)
h2 = layers.UpSampling2D(3)(h2)
h2 = layers.Conv2DTranspose(16, 3, activation='relu')(h2)
decode_output = layers.Conv2DTranspose(1, 3, activation='relu')(h2)
autoencoder = keras.Model(inputs=encode_input, outputs=decode_output, name='autoencoder')
autoencoder.summary()
A model can also be used as if it were a single layer:
encode_input = keras.Input(shape=(28,28,1), name='src_img')
h1 = layers.Conv2D(16, 3, activation='relu')(encode_input)
h1 = layers.Conv2D(32, 3, activation='relu')(h1)
h1 = layers.MaxPool2D(3)(h1)
h1 = layers.Conv2D(32, 3, activation='relu')(h1)
h1 = layers.Conv2D(16, 3, activation='relu')(h1)
encode_output = layers.GlobalMaxPool2D()(h1)
encode_model = keras.Model(inputs=encode_input, outputs=encode_output, name='encoder')
encode_model.summary()
decode_input = keras.Input(shape=(16,), name='encoded_img')
h2 = layers.Reshape((4, 4, 1))(decode_input)
h2 = layers.Conv2DTranspose(16, 3, activation='relu')(h2)
h2 = layers.Conv2DTranspose(32, 3, activation='relu')(h2)
h2 = layers.UpSampling2D(3)(h2)
h2 = layers.Conv2DTranspose(16, 3, activation='relu')(h2)
decode_output = layers.Conv2DTranspose(1, 3, activation='relu')(h2)
decode_model = keras.Model(inputs=decode_input, outputs=decode_output, name='decoder')
decode_model.summary()
autoencoder_input = keras.Input(shape=(28,28,1), name='img')
h3 = encode_model(autoencoder_input)
autoencoder_output = decode_model(h3)
autoencoder = keras.Model(inputs=autoencoder_input, outputs=autoencoder_output,
name='autoencoder')
autoencoder.summary()
Build a model with multiple inputs and outputs
Before building the model, let's introduce the relevant APIs.
tf.keras.layers.Embedding() is mainly used to train word vectors. Its parameters are:
- input_dim: the size of the vocabulary
- output_dim: the dimension of the output word vectors
- input_length: the length of the input sequences
Given a 2D input tensor of shape (batch_size, input_length), the output is a 3D tensor of shape (batch_size, input_length, output_dim).
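A minimal sketch of this shape transformation:

```python
import tensorflow as tf

emb = tf.keras.layers.Embedding(input_dim=2000, output_dim=64)
tokens = tf.zeros((32, 10), dtype=tf.int32)  # (batch_size, input_length)
vectors = emb(tokens)
print(vectors.shape)  # (32, 10, 64): each token id becomes a 64-dim vector
```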
tf.keras.layers.LSTM is a long short-term memory network, one kind of recurrent cell used in RNNs. Its main parameters are:
- units: the dimension of the output space
- activation: the activation function for the output, default tanh
- recurrent_activation: the activation function for the recurrent step (input and forget gates), default sigmoid
- return_sequences: whether to return the output at every time step, or only the final output
- return_state: whether to also return the cell state of the last cell
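How return_sequences and return_state change the outputs is easiest to see from the shapes; a minimal sketch:

```python
import tensorflow as tf

x = tf.zeros((4, 10, 8))  # (batch, time steps, features)

# Default: only the output at the last time step
last = tf.keras.layers.LSTM(32)(x)
print(last.shape)  # (4, 32)

# return_sequences=True: the output at every time step
seq = tf.keras.layers.LSTM(32, return_sequences=True)(x)
print(seq.shape)  # (4, 10, 32)

# return_state=True: also the final hidden state and cell state
out, h, c = tf.keras.layers.LSTM(32, return_state=True)(x)
print(out.shape, h.shape, c.shape)  # each (4, 32)
```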
# Build a network that predicts a document's priority and handling department
# from its body text, tags, and title
# Hyperparameters
num_words = 2000
num_tags = 12
num_departments = 4
# Inputs
body_input = keras.Input(shape=(None,), name='body')
title_input = keras.Input(shape=(None,), name='title')
tag_input = keras.Input(shape=(num_tags,), name='tag')
# Embedding layers
body_feat = layers.Embedding(num_words, 64)(body_input)
title_feat = layers.Embedding(num_words, 64)(title_input)
# Feature extraction layers
body_feat = layers.LSTM(32)(body_feat)
title_feat = layers.LSTM(128)(title_feat)
features = layers.concatenate([title_feat,body_feat, tag_input])
# Classification layers
priority_pred = layers.Dense(1, activation='sigmoid', name='priority')(features)
department_pred = layers.Dense(num_departments, activation='softmax', name='department')(features)
# Build the model
model = keras.Model(inputs=[body_input, title_input, tag_input],
outputs=[priority_pred, department_pred])
model.summary()
Construct data and train the model
model.compile(optimizer=keras.optimizers.RMSprop(1e-3),
              loss={'priority': 'binary_crossentropy',
                    'department': 'categorical_crossentropy'},
              loss_weights=[1., 0.2])
import numpy as np
# Input data (randomly generated here)
title_data = np.random.randint(num_words, size=(1280, 10))
body_data = np.random.randint(num_words, size=(1280, 100))
tag_data = np.random.randint(2, size=(1280, num_tags)).astype('float32')
# Labels
priority_label = np.random.random(size=(1280, 1))
department_label = np.random.randint(2, size=(1280, num_departments))
# Train
history = model.fit(
    {'title': title_data, 'body': body_data, 'tag': tag_data},
    {'priority': priority_label, 'department': department_label},
    batch_size=32,
    epochs=5
)
Shared layers
share_embedding = layers.Embedding(1000, 64)
input1 = keras.Input(shape=(None,), dtype='int32')
input2 = keras.Input(shape=(None,), dtype='int32')
feat1 = share_embedding(input1)
feat2 = share_embedding(input2)
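Because both branches go through the same layer object, they share one embedding matrix; a small check (the token ids here are arbitrary):

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

share_embedding = layers.Embedding(1000, 64)

ids = tf.constant([[1, 2, 3]])
feat1 = share_embedding(ids)
feat2 = share_embedding(ids)

# A single weight matrix backs both calls, so identical ids give identical vectors
print(len(share_embedding.weights))  # 1: one shared embedding matrix
print(np.allclose(feat1.numpy(), feat2.numpy()))  # True
```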
Custom layers
# import tensorflow as tf
# import tensorflow.keras as keras
class MyDense(layers.Layer):
    def __init__(self, units=32):
        super(MyDense, self).__init__()
        self.units = units

    def build(self, input_shape):
        self.w = self.add_weight(shape=(input_shape[-1], self.units),
                                 initializer='random_normal',
                                 trainable=True)
        self.b = self.add_weight(shape=(self.units,),
                                 initializer='random_normal',
                                 trainable=True)

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b

    def get_config(self):
        return {'units': self.units}
inputs = keras.Input((4,))
outputs = MyDense(10)(inputs)
model = keras.Model(inputs, outputs)
config = model.get_config()
new_model = keras.Model.from_config(
    config, custom_objects={'MyDense': MyDense}
)
# Calling other layers inside a custom layer
# Hyperparameters
time_step = 10
batch_size = 32
hidden_dim = 32
inputs_dim = 5
# The network
class MyRnn(layers.Layer):
    def __init__(self):
        super(MyRnn, self).__init__()
        self.hidden_dim = hidden_dim
        self.projection1 = layers.Dense(units=hidden_dim, activation='relu')
        self.projection2 = layers.Dense(units=hidden_dim, activation='relu')
        self.classifier = layers.Dense(1, activation='sigmoid')

    def call(self, inputs):
        outs = []
        states = tf.zeros(shape=[inputs.shape[0], self.hidden_dim])
        for t in range(inputs.shape[1]):
            x = inputs[:, t, :]
            h = self.projection1(x)
            y = h + self.projection2(states)
            states = y
            outs.append(y)
        features = tf.stack(outs, axis=1)
        print(features.shape)
        return self.classifier(features)
# Build the network
inputs = keras.Input(batch_shape=(batch_size, time_step, inputs_dim))
x = layers.Conv1D(32, 3)(inputs)
print(x.shape)
outputs = MyRnn()(x)
model = keras.Model(inputs, outputs)
rnn_model = MyRnn()
_ = rnn_model(tf.zeros((1, 10, 5)))