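TensorFlow Keras Official Tutorial

TensorFlow version: 1.9

Keras Overview

Keras is a high-level API for building and training deep learning models. It is used for fast prototyping, advanced research, and production. Keras has three key advantages: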

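User friendly
The Keras API is simple, consistent, and easy to debug.
Highly modular
With the Keras API, deep learning systems are assembled from configurable building blocks, like stacking bricks.
Easy to extend
New research ideas are easy to implement, for example new layers, new loss functions, or improvements to state-of-the-art models.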

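1. Import `tf.keras`

tf.keras is TensorFlow's implementation of the Keras API. It is a high-level API for building and training models that also works with most TensorFlow-specific features, such as eager execution, the tf.data module, and Estimators. tf.keras makes TensorFlow easier to use without sacrificing flexibility or performance.

To use tf.keras, import it at the start of your program: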

import tensorflow as tf
from tensorflow import keras

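tf.keras can run Keras-compatible code, but keep the following in mind:

Compatibility is only guaranteed when the tf.keras and keras versions match; the tf.keras bundled with a given TensorFlow release may differ from the latest standalone keras release. Check the tf.keras version with tf.keras.__version__.
When saving a model's weights, tf.keras defaults to the TensorFlow checkpoint format. Pass save_format='h5' to save in HDF5 format instead.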

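2. Build a simple model

2.1 Build a model with the Sequential API

In Keras, you assemble layers to build models. A model is (usually) a graph of layers. The most common type of model is a stack of layers: the tf.keras.Sequential model.

To build a simple, fully-connected network (i.e. a multi-layer perceptron):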

model = keras.Sequential()
# Adds a densely-connected layer with 64 units to the model:
model.add(keras.layers.Dense(64, activation='relu'))
# Add another:
model.add(keras.layers.Dense(64, activation='relu'))
# Add a softmax layer with 10 output units:
model.add(keras.layers.Dense(10, activation='softmax'))
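As a quick check on the stack above: each `Dense` layer owns a kernel of shape (input_dim, units) plus a bias of length units, and each layer's output size becomes the next layer's input size. The pure-Python sketch below counts trainable parameters under the assumption of 32-dimensional inputs (the feature size used by the NumPy data in section 3.2); the helper names are illustrative, not part of Keras.

```python
def dense_param_count(input_dim, units):
    """Trainable parameters of a Dense layer: kernel (input_dim x units) + bias (units)."""
    return input_dim * units + units


def sequential_param_counts(input_dim, layer_units):
    """Per-layer parameter counts for a stack of Dense layers.

    Each layer's output size becomes the next layer's input size,
    which is how a Sequential stack chains its layers.
    """
    counts = []
    for units in layer_units:
        counts.append(dense_param_count(input_dim, units))
        input_dim = units
    return counts


# Dense(64) -> Dense(64) -> Dense(10), assuming 32 input features.
counts = sequential_param_counts(32, [64, 64, 10])
print(counts)       # [2112, 4160, 650]
print(sum(counts))  # 6922
```

Once the model has been built on 32-feature inputs, `model.summary()` should list the same per-layer counts.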

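2.2 Configure the layers

Many layers are available in tf.keras.layers. Most of them share some common constructor arguments:

activation: The activation function for the layer.
Specified by the name of a built-in function or as a callable object.
Default: no activation is applied (the layer is linear).
kernel_initializer and bias_initializer: The initialization schemes that create the layer's weights (kernel and bias).
Specified by name or as a callable object.
Default for Dense: the "Glorot uniform" initializer for the kernel and zeros for the bias.
kernel_regularizer and bias_regularizer: The regularization schemes applied to the layer's weights (kernel and bias), such as L1 or L2 regularization.
Default: no regularization is applied.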
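The "Glorot uniform" default mentioned above draws each kernel entry from a uniform distribution on [-limit, limit], with limit = sqrt(6 / (fan_in + fan_out)). Below is a minimal pure-Python sketch of that rule for a Dense kernel, where fan_in is the input size and fan_out the number of units; the function name and the 32 x 64 shape are illustrative assumptions, and Keras's own implementation operates on tensors.

```python
import math
import random


def glorot_uniform(fan_in, fan_out, rng=None):
    """Sample a fan_in x fan_out kernel from U(-limit, limit),
    where limit = sqrt(6 / (fan_in + fan_out))."""
    rng = rng or random.Random()
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]


# A kernel for a Dense(64) layer fed 32 input features.
kernel = glorot_uniform(32, 64, rng=random.Random(0))
limit = math.sqrt(6.0 / (32 + 64))
print(len(kernel), len(kernel[0]))  # 32 64
print(round(limit, 4))              # 0.25
```

The range shrinks as layers get wider, which keeps the scale of activations roughly stable from layer to layer.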

The following examples instantiate tf.keras.layers.Dense layers with these constructor arguments:

# Create a sigmoid layer:
keras.layers.Dense(64, activation='sigmoid')
# Or:
keras.layers.Dense(64, activation=tf.sigmoid)

# A linear layer with L1 regularization of factor 0.01 applied to the kernel matrix:
keras.layers.Dense(64, kernel_regularizer=keras.regularizers.l1(0.01))
# A linear layer with L2 regularization of factor 0.01 applied to the bias vector:
keras.layers.Dense(64, bias_regularizer=keras.regularizers.l2(0.01))

# A linear layer with a kernel initialized to a random orthogonal matrix:
keras.layers.Dense(64, kernel_initializer='orthogonal')
# A linear layer with a bias vector initialized to 2.0s:
keras.layers.Dense(64, bias_initializer=keras.initializers.constant(2.0))
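The two regularizers in the example above add a penalty term to the training loss: `l1(0.01)` contributes 0.01 times the sum of absolute weight values, and `l2(0.01)` contributes 0.01 times the sum of squared values. A pure-Python sketch of those formulas on a flat list of weights (illustrative helpers, not the library code):

```python
def l1_penalty(weights, factor):
    """Penalty added to the loss by an L1 regularizer: factor * sum(|w|)."""
    return factor * sum(abs(w) for w in weights)


def l2_penalty(weights, factor):
    """Penalty added to the loss by an L2 regularizer: factor * sum(w ** 2)."""
    return factor * sum(w * w for w in weights)


weights = [1.0, -2.0, 3.0]
print(l1_penalty(weights, 0.01))  # approximately 0.06
print(l2_penalty(weights, 0.01))  # approximately 0.14
```

Because the L2 term grows with the square of each weight, it punishes a few large weights more than many small ones, while the L1 term tends to push individual weights to exactly zero.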

3. Train and evaluate

3.1 Set up training

After the model is constructed, configure its learning process by calling the compile method:

model.compile(optimizer=tf.train.AdamOptimizer(0.001),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

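tf.keras.Model.compile takes three important arguments:

optimizer: The optimization algorithm used during training. Pass an optimizer instance from the tf.train module, such as tf.train.AdamOptimizer, tf.train.RMSPropOptimizer, or tf.train.GradientDescentOptimizer.
loss: The function to minimize during optimization. Common choices include mean squared error (mse), categorical_crossentropy, and binary_crossentropy.
Specified by name or by passing a callable object from the tf.keras.losses module.
metrics: Used to monitor training.
Specified by name or by passing a callable object from the tf.keras.metrics module.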
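To make the loss and metric named above concrete: for one sample with a one-hot label, categorical crossentropy is -sum(y_true * log(y_pred)) over the classes, and accuracy checks whether the highest-probability class matches the label. The following is a simplified pure-Python sketch with hypothetical helper names and no tensors; the small clipping epsilon is an assumption added to avoid log(0).

```python
import math


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """-sum(t * log(p)) over the classes of one sample.

    y_true is a one-hot vector, y_pred a probability vector (e.g. softmax output).
    Predictions are clipped to [eps, 1 - eps] so log(0) never occurs.
    """
    return -sum(t * math.log(min(max(p, eps), 1 - eps))
                for t, p in zip(y_true, y_pred))


def accuracy(y_true_batch, y_pred_batch):
    """Fraction of samples whose highest-probability class matches the label."""
    def argmax(v):
        return max(range(len(v)), key=v.__getitem__)
    hits = sum(argmax(t) == argmax(p) for t, p in zip(y_true_batch, y_pred_batch))
    return hits / len(y_true_batch)


y_true = [[0, 1, 0], [1, 0, 0]]
y_pred = [[0.1, 0.8, 0.1], [0.3, 0.6, 0.1]]
losses = [categorical_crossentropy(t, p) for t, p in zip(y_true, y_pred)]
print(losses[0])                 # -log(0.8), about 0.2231
print(accuracy(y_true, y_pred))  # 0.5
```

The second sample is misclassified, so it both lowers the accuracy and carries the larger loss (-log(0.3)).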

The following shows a few examples of configuring a model for training:

# Configure a model for mean-squared error regression.
model.compile(optimizer=tf.train.AdamOptimizer(0.01),
              loss='mse',       # mean squared error
              metrics=['mae'])  # mean absolute error

# Configure a model for categorical classification.
model.compile(optimizer=tf.train.RMSPropOptimizer(0.01),
              loss=keras.losses.categorical_crossentropy,
              metrics=[keras.metrics.categorical_accuracy])
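The regression configuration above uses `loss='mse'` and `metrics=['mae']`. For one sample with several outputs, Keras computes these as plain averages over the last axis and then averages over the batch; the sketch below reproduces the per-sample averages in pure Python with illustrative function names.

```python
def mean_squared_error(y_true, y_pred):
    """'mse': mean of squared differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """'mae': mean of absolute differences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 3.0, 5.0]
print(mean_squared_error(y_true, y_pred))   # (0 + 1 + 4) / 3, about 1.667
print(mean_absolute_error(y_true, y_pred))  # (0 + 1 + 2) / 3 = 1.0
```

Squaring makes MSE more sensitive to the single large error (2.0), which is why it suits optimization, while MAE stays in the units of the target and is easier to read as a monitoring metric.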

3.2 Input NumPy data

For small datasets, use in-memory NumPy arrays to train and evaluate a model. The model is trained on ("fit" to) the NumPy data with the fit method:

import numpy as np

data = np.random.random((1000, 32))
labels = np.random.random((1000, 10))

model.fit(data, labels, epochs=10, batch_size=32)

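tf.keras.Model.fit takes three important arguments:

epochs: The number of epochs to train for. An epoch is one pass over the entire input data, processed in smaller batches.
batch_size: When passed NumPy data, the model slices the data into smaller batches and iterates over these batches during training.
This integer specifies the size of each batch.
Note: if the number of samples is not divisible by the batch size, the last batch will be smaller.
validation_data: When prototyping a model, you want to easily monitor its performance on some validation data.
Passing this argument (a tuple of inputs and labels) lets the model display the loss and metrics in inference mode for the passed data at the end of each epoch.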
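For the NumPy example above (1000 samples, `batch_size=32`), the batch-size note works out as follows. This sketch slices the sample count in order; `fit` shuffles the training data by default, which changes which samples land in each batch but not the batch sizes. The helper is illustrative, not a Keras function.

```python
def batch_sizes(num_samples, batch_size):
    """Sizes of the batches produced when num_samples are sliced in order."""
    sizes = []
    for start in range(0, num_samples, batch_size):
        sizes.append(min(batch_size, num_samples - start))
    return sizes


sizes = batch_sizes(1000, 32)
print(len(sizes))  # 32 batches per epoch
print(sizes[-1])   # 8: the 1000 - 31 * 32 leftover samples
```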

Here is an example using validation_data:

import numpy as np

data = np.random.random((1000, 32))
labels = np.random.random((1000, 10))

val_data = np.random.random((100, 32))
val_labels = np.random.random((100, 10))

model.fit(data, labels, epochs=10, batch_size=32,
          validation_data=(val_data, val_labels))

3.3 Input `tf.data` datasets

Use the Datasets API to scale to large datasets or multi-device training. Pass a tf.data.Dataset instance to the fit method:

# Instantiates a toy dataset instance:
dataset = tf.data.Dataset.from_tensor_slices((data, labels))
dataset = dataset.batch(32)
dataset = dataset.repeat()

# Don't forget to specify `steps_per_epoch` when calling `fit` on a dataset.
model.fit(dataset, epochs=10, steps_per_epoch=30)

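Here, steps_per_epoch is the number of training steps (batches) the model runs before it moves on to the next epoch. Because the dataset repeats indefinitely, fit needs this value to know where an epoch ends; and because the dataset already yields batches, no batch_size is passed.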
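To make this concrete for the example above (1000 samples, `batch(32).repeat()`, `steps_per_epoch=30`), the pure-Python sketch below imitates the batching order with a generator. It shows that with an endlessly repeating dataset an "epoch" is just a fixed number of steps and need not line up with a full pass over the data. It models only the ordering (no shuffling), not tf.data itself, and the function names are hypothetical.

```python
import itertools


def batch_then_repeat(num_samples, batch_size):
    """Yield batches of sample indices forever, like dataset.batch(batch_size).repeat().

    Because batching happens before repeating, every pass over the data ends
    with a short batch when num_samples is not divisible by batch_size.
    """
    indices = list(range(num_samples))
    while True:
        for start in range(0, num_samples, batch_size):
            yield indices[start:start + batch_size]


def split_into_epochs(stream, epochs, steps_per_epoch):
    """Take steps_per_epoch batches from the endless stream for each epoch."""
    return [list(itertools.islice(stream, steps_per_epoch)) for _ in range(epochs)]


per_epoch = split_into_epochs(batch_then_repeat(1000, 32), epochs=10, steps_per_epoch=30)
second = per_epoch[1]
print(second[0][0])    # 960: epoch 2 resumes where epoch 1 stopped
print(len(second[1]))  # 8: the short final batch of the first pass
print(second[2][0])    # 0: the data then starts over from the beginning
```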

Datasets can also be used for validation:

dataset = tf.data.Dataset.from_tensor_slices((data, labels))
dataset = dataset.batch(32).repeat()

val_dataset = tf.data.Dataset.from_tensor_slices((val_data, val_labels))
val_dataset = val_dataset.batch(32).repeat()

model.fit(dataset, epochs=10, steps_per_epoch=30,
          validation_data=val_dataset,
          validation_steps=3)

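3.4 Evaluate and predict

The tf.keras.Model.evaluate and tf.keras.Model.predict methods accept both NumPy data and tf.data.Dataset data.

To evaluate the loss and metrics in inference mode on the provided data: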

model.evaluate(data, labels, batch_size=32)

model.evaluate(dataset, steps=30)

To predict the output of the last layer in inference mode on the provided data (returned as a NumPy array):

model.predict(data, batch_size=32)

model.predict(dataset, steps=30)

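4. Build advanced models

4.1 Build models with the functional API

The tf.keras.Sequential model is a simple stack of layers and cannot represent arbitrary models. Use the Keras functional API to build models with complex topologies, for example:

Multi-input models
Multi-output models
Models with shared layers (the same layer called several times)
Models with non-sequential data flows (e.g. residual connections)

Building a model with the functional API works like this:

A layer instance is callable and returns a tensor.
Input tensors and output tensors are used to define a tf.keras.Model instance.
The resulting model is trained just like a Sequential model.

The following example uses the functional API to build a simple, fully-connected network: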

inputs = keras.Input(shape=(32,))  # Returns a placeholder tensor

# A layer instance is callable on a tensor, and returns a tensor.
x = keras.layers.Dense(64, activation='relu')(inputs)
x = keras.layers.Dense(64, activation='relu')(x)
predictions = keras.layers.Dense(10, activation='softmax')(x)

# Instantiate the model given inputs and outputs.
model = keras.Model(inputs=inputs, outputs=predictions)

# The compile step specifies the training configuration.
model.compile(optimizer=tf.train.RMSPropOptimizer(0.001),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Trains for 5 epochs
model.fit(data, labels, batch_size=32, epochs=5)
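The fully-connected example above is still a straight chain. The framework-free analogy below illustrates the other two cases from the list, a shared layer and a residual connection, using plain Python callables. It is only an analogy: in the real functional API the calls operate on symbolic tensors, the merge would be a layer such as keras.layers.Add, and the result is wrapped with keras.Model(inputs=..., outputs=...). `ToyScale` and `toy_model` are hypothetical names.

```python
class ToyScale:
    """Hypothetical toy layer: multiplies each input value by one learned factor."""

    def __init__(self, factor):
        self.factor = factor  # the layer's single "weight"

    def __call__(self, inputs):
        # Like a Keras layer instance, the object is callable on its inputs.
        return [self.factor * v for v in inputs]


def add(a, b):
    """Element-wise sum, playing the role of a merge layer."""
    return [u + v for u, v in zip(a, b)]


shared = ToyScale(2.0)


def toy_model(inputs):
    x = shared(inputs)     # first call of the shared layer
    y = shared(x)          # same instance again: its weight is shared
    return add(inputs, y)  # residual connection skips both calls


print(toy_model([1.0, 2.0]))  # [5.0, 10.0]
```

A Sequential stack has no way to express the `add(inputs, y)` step, because it needs a tensor from two layers back.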

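4.2 Build models by subclassing Model (model subclassing)

Build a fully-customizable model by subclassing tf.keras.Model and defining your own forward pass. Create layers in the __init__ method and set them as attributes of the class instance. Define the forward pass in the call method.

Model subclassing is particularly useful when eager execution is enabled, because the forward pass can be written imperatively (this is also the approach PyTorch takes).

Tip: use the right API for the job. While model subclassing offers flexibility, it comes at the cost of greater complexity and more opportunities for user error. If possible, prefer the functional API.

The following example shows a subclassed tf.keras.Model: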

class MyModel(keras.Model):

  def __init__(self, num_classes=10):
    super(MyModel, self).__init__(name='my_model')
    self.num_classes = num_classes
    # Define your layers here.
    self.dense_1 = keras.layers.Dense(32, activation='relu')
    self.dense_2 = keras.layers.Dense(num_classes, activation='sigmoid')

  def call(self, inputs):
    # Define your forward pass here,
    # using layers you previously defined (in `__init__`).
    x = self.dense_1(inputs)
    return self.dense_2(x)

  def compute_output_shape(self, input_shape):
    # You need to override this function if you want to use the subclassed model
    # as part of a functional-style model.
    # Otherwise, this method is optional.
    shape = tf.TensorShape(input_shape).as_list()
    shape[-1] = self.num_classes
    return tf.TensorShape(shape)


# Instantiates the subclassed model.
model = MyModel(num_classes=10)

# The compile step specifies the training configuration.
model.compile(optimizer=tf.train.RMSPropOptimizer(0.001),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Trains for 5 epochs.
model.fit(data, labels, batch_size=32, epochs=5)
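Stripped of TensorFlow, the subclassing pattern is ordinary Python object composition: layers are created once in `__init__` and wired together in `call`. The sketch below reproduces MyModel's structure with hypothetical `ToyDense` layers and fixed, hand-picked weights so the output can be checked by hand; unlike a real keras.Model it has no training, batching, or weight tracking.

```python
import math


class ToyDense:
    """Hypothetical stand-in for keras.layers.Dense on a single sample:
    outputs activation(x @ kernel + bias)."""

    def __init__(self, kernel, bias, activation=None):
        self.kernel = kernel          # shape: (input_dim, units)
        self.bias = bias              # shape: (units,)
        self.activation = activation  # None means linear

    def __call__(self, x):
        out = []
        for j, b in enumerate(self.bias):
            z = b + sum(x[i] * self.kernel[i][j] for i in range(len(x)))
            out.append(self.activation(z) if self.activation else z)
        return out


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class ToyModel:
    """Mirrors the MyModel pattern: layers in __init__, forward pass in call."""

    def __init__(self):
        self.dense_1 = ToyDense([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], relu)
        self.dense_2 = ToyDense([[1.0], [1.0]], [0.0], sigmoid)

    def call(self, inputs):
        x = self.dense_1(inputs)
        return self.dense_2(x)


model = ToyModel()
print(model.call([1.0, -1.0]))  # [sigmoid(1.0)], about [0.7311]
```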

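4.3 Custom layers

Create a custom layer by subclassing tf.keras.layers.Layer and implementing the following methods:

build: Create the weights of the layer. Add weights with the add_weight method.
call: Define the forward pass.
compute_output_shape: Specify how to compute the output shape of the layer from the input shape.
Optionally, a layer can be serialized by implementing the get_config method and the from_config class method.

Here is an example of a custom layer that multiplies its input by a kernel matrix (a matmul):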

class MyLayer(keras.layers.Layer):

  def __init__(self, output_dim, **kwargs):
    self.output_dim = output_dim
    super(MyLayer, self).__init__(**kwargs)

  def build(self, input_shape):
    shape = tf.TensorShape((input_shape[1], self.output_dim))
    # Create a trainable weight variable for this layer.
    self.kernel = self.add_weight(name='kernel',
                                  shape=shape,
                                  initializer='uniform',
                                  trainable=True)
    # Be sure to call this at the end
    super(MyLayer, self).build(input_shape)

  def call(self, inputs):
    return tf.matmul(inputs, self.kernel)

  def compute_output_shape(self, input_shape):
    shape = tf.TensorShape(input_shape).as_list()
    shape[-1] = self.output_dim
    return tf.TensorShape(shape)

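  def get_config(self):
    base_config = super(MyLayer, self).get_config()
    base_config['output_dim'] = self.output_dim
    # get_config must return the dict; without this line it returns None
    # and the layer's configuration is lost during serialization.
    return base_config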

  @classmethod
  def from_config(cls, config):
    return cls(**config)


# Create a model using the custom layer
model = keras.Sequential([MyLayer(10),
                          keras.layers.Activation('softmax')])

# The compile step specifies the training configuration
model.compile(optimizer=tf.train.RMSPropOptimizer(0.001),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Trains for 5 epochs.
model.fit(data, labels, batch_size=32, epochs=5)
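The key idea behind build is that a layer's weights are created only once the input shape is known, which Keras does on the layer's first call. The hypothetical pure-Python class below mimics MyLayer's build/call/compute_output_shape split and builds its kernel on the first call; the fixed initializer (i + j) is an arbitrary choice so the result can be verified by hand.

```python
class ToyMatMulLayer:
    """Hypothetical framework-free analogue of MyLayer: output = inputs @ kernel."""

    def __init__(self, output_dim, initializer):
        self.output_dim = output_dim
        self.initializer = initializer  # callable (row, col) -> initial value
        self.kernel = None
        self.built = False

    def build(self, input_shape):
        # Kernel shape is (input_dim, output_dim), as in MyLayer.build.
        input_dim = input_shape[1]
        self.kernel = [[self.initializer(i, j) for j in range(self.output_dim)]
                       for i in range(input_dim)]
        self.built = True

    def call(self, inputs):
        # Plain matrix multiply of a (batch, input_dim) list of rows with the kernel.
        return [[sum(row[k] * self.kernel[k][j] for k in range(len(row)))
                 for j in range(self.output_dim)]
                for row in inputs]

    def compute_output_shape(self, input_shape):
        shape = list(input_shape)
        shape[-1] = self.output_dim
        return tuple(shape)

    def __call__(self, inputs):
        # Weights are created lazily, once the input shape is known.
        if not self.built:
            self.build((len(inputs), len(inputs[0])))
        return self.call(inputs)


layer = ToyMatMulLayer(3, initializer=lambda i, j: float(i + j))
print(layer([[1, 2]]))                        # [[2.0, 5.0, 8.0]]
print(layer.compute_output_shape((None, 2)))  # (None, 3)
```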

5. Callbacks

A callback is an object passed to a model to customize and extend its behavior during training. You can write your own custom callback or use the built-in ones in tf.keras.callbacks.

The built-in callbacks in tf.keras.callbacks include:

tf.keras.callbacks.ModelCheckpoint: Save checkpoints of your model at regular intervals.
tf.keras.callbacks.LearningRateScheduler: Dynamically change the learning rate.
tf.keras.callbacks.EarlyStopping: Interrupt training when validation performance has stopped improving.
tf.keras.callbacks.TensorBoard: Monitor the model's behavior using TensorBoard.

To use a tf.keras.callbacks.Callback, pass it to the model's fit method:

callbacks = [
  # Interrupt training if `val_loss` stops improving for over 2 epochs
  keras.callbacks.EarlyStopping(patience=2, monitor='val_loss'),
  # Write TensorBoard logs to `./logs` directory
  keras.callbacks.TensorBoard(log_dir='./logs')
]
model.fit(data, labels, batch_size=32, epochs=5, callbacks=callbacks,
          validation_data=(val_data, val_labels))
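With `patience=2` and `monitor='val_loss'`, training stops once the validation loss has failed to improve for two consecutive epochs. A simplified pure-Python version of that counting rule (it ignores the callback's other options, and the function name is hypothetical) makes the behavior explicit:

```python
def early_stopping_epoch(val_losses, patience, min_delta=0.0):
    """Return the 0-based epoch at which training would stop, or None.

    Simplified patience rule: an epoch counts as an improvement if its loss is
    more than min_delta below the best loss so far; after `patience`
    consecutive epochs without improvement, training stops.
    """
    best = float('inf')
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


print(early_stopping_epoch([1.0, 0.8, 0.85, 0.9, 0.7], patience=2))  # 3
print(early_stopping_epoch([1.0, 0.9, 0.8], patience=2))             # None
```

In the first run the better loss at epoch 4 is never seen, which is the usual trade-off when choosing a patience value. Monitoring `val_loss` also requires validation data, which is why the `fit` call above passes `validation_data`.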

6. Save and restore models

6.1 Weights only

Save and load the weights of a model using tf.keras.Model.save_weights and tf.keras.Model.load_weights:

# Save weights to a TensorFlow Checkpoint file
model.save_weights('./my_model')

# Restore the model's state,
# this requires a model with the same architecture.
model.load_weights('my_model')

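By default, this saves the model's weights in the TensorFlow checkpoint file format. Weights can also be saved in the Keras HDF5 format (the default for the standalone, multi-backend implementation of Keras):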

# Save weights to a HDF5 file
model.save_weights('my_model.h5', save_format='h5')

# Restore the model's state
model.load_weights('my_model.h5')

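6.2 Configuration only

A model's configuration can be saved on its own; this serializes the model architecture without any weights. A saved configuration can recreate and initialize the same model, even without the code that defined the original model. Keras supports the JSON and YAML serialization formats:
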
# Serialize a model to JSON format
json_string = model.to_json()

# Recreate the model (freshly initialized)
fresh_model = keras.models.model_from_json(json_string)

# Serialize a model to YAML format
yaml_string = model.to_yaml()

# Recreate the model
fresh_model = keras.models.model_from_yaml(yaml_string)
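The point of configuration-only serialization is that the architecture round-trips while the weights do not. The sketch below illustrates that with Python's json module and a hypothetical, much simpler schema than the one Keras writes; "rebuilding" draws new random weights from an arbitrary small range, much as a model recreated with model_from_json starts freshly initialized.

```python
import json
import random


def config_to_json(input_dim, layer_units):
    """Serialize only the architecture. Hypothetical simplified schema,
    not the layout that model.to_json() actually produces."""
    return json.dumps({"input_dim": input_dim, "units": layer_units})


def model_from_config_json(json_string, seed):
    """Rebuild the architecture and give every layer freshly drawn weights."""
    config = json.loads(json_string)
    rng = random.Random(seed)
    kernels, in_dim = [], config["input_dim"]
    for units in config["units"]:
        kernels.append([[rng.uniform(-0.05, 0.05) for _ in range(units)]
                        for _ in range(in_dim)])
        in_dim = units
    return config, kernels


json_string = config_to_json(32, [64, 64, 10])
config_a, kernels_a = model_from_config_json(json_string, seed=1)
config_b, kernels_b = model_from_config_json(json_string, seed=2)
print(config_a == config_b)    # True: same architecture
print(kernels_a == kernels_b)  # False: the weights were not saved
```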

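Note: subclassed models are not serializable, because their architecture is defined by the Python code in the body of the call method.

6.3 Entire model

The entire model can be saved to a single file that contains the weight values, the model's configuration, and even the optimizer's configuration. This allows you to checkpoint a model and resume training later from the exact same state, without access to the original code: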

# Create a trivial model
model = keras.Sequential([
  keras.layers.Dense(10, activation='softmax', input_shape=(32,)),
  keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(data, labels, batch_size=32, epochs=5)


# Save entire model to a HDF5 file
model.save('my_model.h5')

# Recreate the exact same model, including weights and optimizer.
model = keras.models.load_model('my_model.h5')

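7. Eager execution

Eager execution is an imperative programming environment that evaluates operations immediately (graph construction and execution happen together, so the computation can change from step to step). Eager execution is not required to use Keras, but it is supported by tf.keras and is very useful for inspecting and debugging your program.

All of the tf.keras model-building APIs are compatible with eager execution. Eager execution is especially beneficial when writing subclassed models and custom layers.

See the eager execution guide for examples of using Keras models with custom training loops and tf.GradientTape.

8. Distribution

8.1 Estimators

The Estimators API is used for training models in distributed environments. The `Estimator` API targets industry use cases such as distributed training on large datasets, and can export a model for production.

A tf.keras.Model can be trained with the tf.estimator API by converting it to a tf.estimator.Estimator object with tf.keras.estimator.model_to_estimator. See Creating Estimators from Keras models.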

model = keras.Sequential([keras.layers.Dense(10, activation='softmax'),
                          keras.layers.Dense(10, activation='softmax')])

model.compile(optimizer=tf.train.RMSPropOptimizer(0.001),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

estimator = keras.estimator.model_to_estimator(model)

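Tip: enable eager execution to debug Estimator input functions and inspect data.

8.2 Multiple GPUs

tf.keras models can run on multiple GPUs using tf.contrib.distribute.DistributionStrategy. This API provides distributed training on multiple GPUs with almost no changes to existing code.

Currently, tf.contrib.distribute.MirroredStrategy is the only supported distribution strategy. MirroredStrategy does in-graph replication with synchronous training, using all-reduce on a single machine. To use DistributionStrategy with Keras, first convert the tf.keras.Model to a tf.estimator.Estimator with tf.keras.estimator.model_to_estimator, then train the estimator.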
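Synchronous data-parallel training rests on a simple identity: when a batch is split into equal slices, the average of the per-slice gradients equals the full-batch gradient of a mean loss. The pure-Python sketch below checks that identity for a one-weight linear model with a mean squared error loss. It illustrates the idea behind MirroredStrategy's all-reduce step and is not how the strategy is implemented; the function names are hypothetical.

```python
def mse_gradient(w, xs, ys):
    """Gradient of mean((w * x - y) ** 2) with respect to the scalar weight w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def mirrored_gradient(w, xs, ys, num_replicas):
    """Split the batch evenly across replicas, compute each replica's gradient
    on its slice, then average them (the role all-reduce plays).
    Assumes len(xs) is divisible by num_replicas."""
    per_replica = len(xs) // num_replicas
    grads = []
    for r in range(num_replicas):
        lo, hi = r * per_replica, (r + 1) * per_replica
        grads.append(mse_gradient(w, xs[lo:hi], ys[lo:hi]))
    return sum(grads) / num_replicas


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # generated by y = 2 * x
w = 0.5
print(mse_gradient(w, xs, ys))          # full-batch gradient: -22.5
print(mirrored_gradient(w, xs, ys, 2))  # same value from two replicas
```

Because every replica then applies the same averaged gradient, all copies of the weights stay identical after each step.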

The following example distributes a `tf.keras.Model` across multiple GPUs on a single machine.

First, define a simple model:

model = keras.Sequential()
model.add(keras.layers.Dense(16, activation='relu', input_shape=(10,)))
model.add(keras.layers.Dense(1, activation='sigmoid'))

optimizer = tf.train.GradientDescentOptimizer(0.2)

model.compile(loss='binary_crossentropy', optimizer=optimizer)
model.summary()

Define an input pipeline. The `input_fn` returns a `tf.data.Dataset` object used to distribute the data across multiple devices, with each device processing a slice of the input batch.

def input_fn():
  x = np.random.random((1024, 10))
  y = np.random.randint(2, size=(1024, 1))
  x = tf.cast(x, tf.float32)
  dataset = tf.data.Dataset.from_tensor_slices((x, y))
  dataset = dataset.repeat(10)
  dataset = dataset.batch(32)
  return dataset

Next, create a tf.estimator.RunConfig and set the train_distribute argument to a tf.contrib.distribute.MirroredStrategy instance. When creating MirroredStrategy, you can specify a list of devices or set the num_gpus argument. By default, all available GPUs are used.

strategy = tf.contrib.distribute.MirroredStrategy()
config = tf.estimator.RunConfig(train_distribute=strategy)

Convert the Keras model to a tf.estimator.Estimator instance:

keras_estimator = keras.estimator.model_to_estimator(
  keras_model=model,
  config=config,
  model_dir='/tmp/model_dir')

Finally, train the Estimator instance by providing the input_fn and steps arguments:

keras_estimator.train(input_fn=input_fn, steps=10)