TensorFlow 2 Pruning API (tensorflow_model_optimization)

There are relatively few worked examples of TF pruning and quantization around, and since I happen to be working on exactly this, I'm porting over some applications from the official documentation.

The code below combines an official MNIST example with the comprehensive guide to show how the pruning step is done with the TF API.

tensorflow/model-optimization--comprehensive_guide

pruning_with_keras


The overall approach: build a baseline model → add pruning → compare the changes in model size, accuracy, and so on.

Along the way, pay particular attention to how to customize your own pruning case and the follow-up quantization.

Contents

1. Import dependencies (TensorBoard doesn't seem to be used later, so it's commented out for now)

2. Load the MNIST dataset and apply simple normalization

3. Build a baseline model and save its weights for later performance comparison

4. Apply prune_low_magnitude to the whole model, build the pruned model, and compare before/after

5. Apply prune_low_magnitude to a selected layer (here the Dense layer), build the pruned model, and inspect the changes

6. Custom pruning

7. TensorBoard visualization

8. Save the model; compare accuracy and model size

 

Tips for improving pruned-model accuracy:

Common mistakes:



import tempfile
import os
import zipfile
import tensorflow as tf
import numpy as np
import tensorflow_model_optimization as tfmot
from tensorflow import keras

#%load_ext tensorboard

1. Import dependencies (TensorBoard doesn't seem to be used later, so it's commented out for now)

# Load the MNIST dataset
mnist = keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
# Normalize pixel values to [0, 1]
train_images = train_images / 255.0
test_images = test_images / 255.0

2. Load the MNIST dataset and apply simple normalization



# Build the model
def setup_model():
    model = keras.Sequential([
        keras.layers.InputLayer(input_shape=(28, 28)),
        keras.layers.Reshape(target_shape=(28, 28, 1)),
        keras.layers.Conv2D(filters=12, kernel_size=(3, 3), activation='relu'),
        keras.layers.MaxPooling2D(pool_size=(2, 2)),
        keras.layers.Flatten(),
        keras.layers.Dense(10)
    ])
    return model

# Train the classifier and save its pretrained weights
def setup_pretrained_weights():
    model = setup_model()

    model.compile(optimizer='adam',
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                  metrics=['accuracy'])

    model.fit(train_images,
              train_labels,
              epochs=4,
              validation_split=0.1)

    _, pretrained_weights = tempfile.mkstemp('.tf')

    model.save_weights(pretrained_weights)
    return pretrained_weights

3. Build a baseline model and save its weights for later performance comparison

setup_model()

pretrained_weights = setup_pretrained_weights()

#
Train on 54000 samples, validate on 6000 samples
Epoch 1/4
54000/54000 [==============================] - 7s 133us/sample - loss: 0.2895 - accuracy: 0.9195 - val_loss: 0.1172 - val_accuracy: 0.9685
Epoch 2/4
54000/54000 [==============================] - 5s 99us/sample - loss: 0.1119 - accuracy: 0.9678 - val_loss: 0.0866 - val_accuracy: 0.9758
Epoch 3/4
54000/54000 [==============================] - 5s 100us/sample - loss: 0.0819 - accuracy: 0.9753 - val_loss: 0.0757 - val_accuracy: 0.9787
Epoch 4/4
54000/54000 [==============================] - 6s 103us/sample - loss: 0.0678 - accuracy: 0.9797 - val_loss: 0.0714 - val_accuracy: 0.9815
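
For later comparison, it also helps to record the baseline's test accuracy. A minimal sketch, reusing the helpers above:

# Sketch: evaluate the baseline on the test set and keep the number around.
baseline_model = setup_model()
baseline_model.load_weights(pretrained_weights)
baseline_model.compile(optimizer='adam',
                       loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                       metrics=['accuracy'])
_, baseline_accuracy = baseline_model.evaluate(test_images, test_labels, verbose=0)
print('Baseline test accuracy:', baseline_accuracy)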

4. Apply prune_low_magnitude to the whole model, build the pruned model, and compare before/after

# Compare the baseline and the pruned model
base_model = setup_model()
base_model.summary()

base_model.load_weights(pretrained_weights)

model_for_pruning = tfmot.sparsity.keras.prune_low_magnitude(base_model)
model_for_pruning.summary()

#
Model: "sequential_4"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
reshape_4 (Reshape)          (None, 28, 28, 1)         0         
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 26, 26, 12)        120       
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 13, 13, 12)        0         
_________________________________________________________________
flatten_4 (Flatten)          (None, 2028)              0         
_________________________________________________________________
dense_4 (Dense)              (None, 10)                20290     
=================================================================
Total params: 20,410
Trainable params: 20,410
Non-trainable params: 0
_________________________________________________________________
Model: "sequential_4"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
prune_low_magnitude_reshape_ (None, 28, 28, 1)         1         
_________________________________________________________________
prune_low_magnitude_conv2d_4 (None, 26, 26, 12)        230       
_________________________________________________________________
prune_low_magnitude_max_pool (None, 13, 13, 12)        1         
_________________________________________________________________
prune_low_magnitude_flatten_ (None, 2028)              1         
_________________________________________________________________
prune_low_magnitude_dense_4  (None, 10)                40572     
=================================================================
Total params: 40,805
Trainable params: 20,410
Non-trainable params: 20,395
_________________________________________________________________

Analysis: every layer's parameter count goes up, and all the extra parameters added for pruning are non-trainable. Each wrapped weight gains a mask of the same size plus a scalar threshold, and every wrapper keeps a scalar pruning-step counter: e.g. conv2d goes from 120 to 230 (120 + a 3×3×1×12 = 108-entry mask + threshold + step), while weight-free layers such as Reshape show just the 1-entry step counter.
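
To see where those extra variables live, you can list a wrapper's weights. A quick sketch (layers[-1] is the wrapped Dense layer in this model):

# Sketch: each pruning wrapper holds the layer's original kernel/bias plus
# a pruning mask, a threshold, and a step counter.
for w in model_for_pruning.layers[-1].weights:
    print(w.name, w.shape)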

5. Apply prune_low_magnitude to a selected layer (here the Dense layer), build the pruned model, and inspect the changes

To handle a given layer type in a modular way, first define a function:

# Prune the model's Dense layers
def apply_pruning_to_dense(layer):
    if isinstance(layer, tf.keras.layers.Dense):
        print("Apply pruning to Dense")
        return tfmot.sparsity.keras.prune_low_magnitude(layer)
    return layer

Here tf.keras.models.clone_model makes a copy of the model while letting clone_function modify each Keras-defined layer; see the official API for details.

model_for_pruning = tf.keras.models.clone_model(
    base_model, clone_function=apply_pruning_to_dense)
model_for_pruning.summary()

#
Apply pruning to Dense
Model: "sequential_4"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
reshape_4 (Reshape)          (None, 28, 28, 1)         0         
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 26, 26, 12)        120       
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 13, 13, 12)        0         
_________________________________________________________________
flatten_4 (Flatten)          (None, 2028)              0         
_________________________________________________________________
prune_low_magnitude_dense_4  (None, 10)                40572     
=================================================================
Total params: 40,692
Trainable params: 20,410
Non-trainable params: 20,282
_________________________________________________________________

Analysis: pruning parameters are added only to the Dense layer.

It may be more convenient to select layers for pruning inside clone_function by layer name rather than by layer type, as sketched below.

You can look up a layer's name as follows (though checking the summary, or naming layers explicitly when defining them, is probably quicker):

print(base_model.layers[0].name)

#reshape_4
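
A name-based variant could look like the following sketch (assuming the Dense layer is still named dense_4, as in the summary above):

# Sketch: select layers for pruning by name instead of by type.
def apply_pruning_by_name(layer):
    if layer.name == 'dense_4':  # name taken from base_model.summary()
        return tfmot.sparsity.keras.prune_low_magnitude(layer)
    return layer

model_for_pruning = tf.keras.models.clone_model(
    base_model, clone_function=apply_pruning_by_name)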

The guide warns about ① the Functional style and ② wrapping layers directly with prune_low_magnitude inside a Sequential: both read more clearly, but accuracy may fall short of the approaches above.

The reason is that loading pretrained weights after defining the model this way does not work (presumably you can no longer obtain weights with the pruning variables stripped out, i.e. you can't restore the base model).

Functional example
# Use `prune_low_magnitude` to make the `Dense` layer train with pruning.
i = tf.keras.Input(shape=(20,))
x = tfmot.sparsity.keras.prune_low_magnitude(tf.keras.layers.Dense(10))(i)
o = tf.keras.layers.Flatten()(x)
model_for_pruning = tf.keras.Model(inputs=i, outputs=o)

model_for_pruning.summary()

Sequential example
# Use `prune_low_magnitude` to make the `Dense` layer train with pruning.
# `input_shape` is not defined in the original snippet; any valid shape works.
input_shape = (28, 28)
model_for_pruning = tf.keras.Sequential([
  tfmot.sparsity.keras.prune_low_magnitude(tf.keras.layers.Dense(20, input_shape=input_shape)),
  tf.keras.layers.Flatten()
])

model_for_pruning.summary()

6. Custom pruning

Use tfmot.sparsity.keras.PrunableLayer to customize which parameters get pruned.

It serves two common use cases (note: pruning the bias usually hurts model accuracy too much and is not done by default; it is pruned here only as a demonstration):

  1. Prune a custom Keras layer
  2. Modify parts of a built-in Keras layer to prune

In this class, get_prunable_weights() returns the tensors that should be pruned during training. See the official API.

class MyDenseLayer(tf.keras.layers.Dense, tfmot.sparsity.keras.PrunableLayer):

  def get_prunable_weights(self):
    # Prune bias also, though that usually harms model accuracy too much.
    return [self.kernel, self.bias]

# Use `prune_low_magnitude` to make the `MyDenseLayer` layer train with pruning.
# (10 units and input_shape=(28, 28) here, to match the summary output below.)
input_shape = (28, 28)
model_for_pruning = tf.keras.Sequential([
  tfmot.sparsity.keras.prune_low_magnitude(MyDenseLayer(10, input_shape=input_shape)),
  tf.keras.layers.Flatten()
])

model_for_pruning.summary()

#
Model: "sequential_11"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
prune_low_magnitude_my_dense (None, 28, 10)            583       
_________________________________________________________________
flatten_13 (Flatten)         (None, 280)               0         
=================================================================
Total params: 583
Trainable params: 290
Non-trainable params: 293
_________________________________________________________________


# Use `prune_low_magnitude` to make the `Dense` layer train with pruning.
i = tf.keras.Input(shape=(28,28))
x = tfmot.sparsity.keras.prune_low_magnitude(tf.keras.layers.Dense(10))(i)
o = tf.keras.layers.Flatten()(x)
model_for_pruning = tf.keras.Model(inputs=i, outputs=o)

model_for_pruning.summary()

#
Model: "model_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_7 (InputLayer)         [(None, 28, 28)]          0         
_________________________________________________________________
prune_low_magnitude_dense_9  (None, 28, 10)            572       
_________________________________________________________________
flatten_12 (Flatten)         (None, 280)               0         
=================================================================
Total params: 572
Trainable params: 290
Non-trainable params: 282
_________________________________________________________________

Analysis: comparing the two models' parameter counts, the extra 11 non-trainable parameters in the first (583 vs. 572) are exactly the bias pruning machinery: a 10-entry bias mask plus its scalar threshold.

7. TensorBoard visualization

Add the tfmot.sparsity.keras.PruningSummaries callback during training to log pruning-related variables as the run progresses.

The tfmot.sparsity.keras.UpdatePruningStep() callback is mandatory; without it training errors out. See the official API.

base_model = setup_model()
base_model.load_weights(pretrained_weights) # optional but recommended for model accuracy
model_for_pruning = tfmot.sparsity.keras.prune_low_magnitude(base_model)

log_dir = tempfile.mkdtemp()
print(log_dir)  # show where the TensorBoard logs are written
callbacks = [
    tfmot.sparsity.keras.UpdatePruningStep(),
    # Log sparsity and other metrics in Tensorboard.
    tfmot.sparsity.keras.PruningSummaries(log_dir=log_dir)
]

model_for_pruning.compile(
      loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits = True),
      optimizer='adam',
      metrics=['accuracy']
)

model_for_pruning.fit(
    train_images,
    train_labels,
    callbacks=callbacks,
    epochs=2,
)

Here is this model's summary for reference, to see the layer names and parameter structure:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
prune_low_magnitude_reshape_ (None, 28, 28, 1)         1         
_________________________________________________________________
prune_low_magnitude_conv2d_2 (None, 26, 26, 12)        230       
_________________________________________________________________
prune_low_magnitude_max_pool (None, 13, 13, 12)        1         
_________________________________________________________________
prune_low_magnitude_flatten_ (None, 2028)              1         
_________________________________________________________________
prune_low_magnitude_dense_2  (None, 10)                40572     
=================================================================
Total params: 40,805
Trainable params: 20,410
Non-trainable params: 20,395
_________________________________________________________________

Finally, the visualization step!

tensorboard --logdir=<the log_dir printed above>

The Scalars tab shows epoch_accuracy and epoch_loss (two straightforward plots, figures omitted). Key point: accuracy is higher than before pruning (up from 0.97 to 0.98).

There are also per-layer sparsity and threshold curves for the two pruned layers; these are the ones worth a closer look.

prune_low_magnitude_conv2d_2/mask:0/sparsity

Analysis: we simply used model_for_pruning = tfmot.sparsity.keras.prune_low_magnitude(base_model), so the mask's sparsity climbs step by step over training until it reaches the default target of 0.5 (half the mask entries become 0).

prune_low_magnitude_conv2d_2/threshold:0/threshold

Analysis: the threshold grows gradually, filtering out the smallest-magnitude weights; the final point's value is 0.1952.

 prune_low_magnitude_dense_2/mask:0/sparsity

Analysis: same pattern as the conv2d layer.

prune_low_magnitude_dense_2/threshold:0/threshold

Analysis: a threshold of nearly 0 already pushes sparsity up to 0.5, confirming the prior that Dense layers carry a lot of redundancy, i.e. much of the Dense layer can be thrown away.

8. Save the model; compare accuracy and model size

Common mistake: both strip_pruning and applying a standard compression algorithm (e.g. via gzip) are necessary to see the compression benefits of pruning.

In plain terms: strip the pruning variables with strip_pruning, compress the zero-heavy weights with something like gzip, and read off the resulting file size to see the sparsity payoff.

First, a small helper that measures model size:

# Measure the size of a gzipped, saved model
def get_gzipped_model_size(model):
    _, keras_file = tempfile.mkstemp('.h5')
    model.save(keras_file, include_optimizer=False)
    
    _, zipped_file = tempfile.mkstemp('.zip')
    with zipfile.ZipFile(zipped_file, 'w', compression=zipfile.ZIP_DEFLATED) as f:
        f.write(keras_file)
    return os.path.getsize(zipped_file)

model_for_export = tfmot.sparsity.keras.strip_pruning(model_for_pruning)

print("final model")
model_for_export.summary()

print("\n")
print("Size of gzipped pruned model without stripping: %.2f bytes" % (get_gzipped_model_size(model_for_pruning)))
print("Size of gzipped pruned model with stripping: %.2f bytes" % (get_gzipped_model_size(model_for_export)))

#
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
reshape_3 (Reshape)          (None, 28, 28, 1)         0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 26, 26, 12)        120       
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 13, 13, 12)        0         
_________________________________________________________________
flatten_3 (Flatten)          (None, 2028)              0         
_________________________________________________________________
dense_3 (Dense)              (None, 10)                20290     
=================================================================
Total params: 20,410
Trainable params: 20,410
Non-trainable params: 0
_________________________________________________________________


Size of gzipped pruned model without stripping: 55570.00 bytes
Size of gzipped pruned model with stripping: 48518.00 bytes

We can see that strip_pruning removes all the pruning bookkeeping parameters, restoring the model to its baseline shape.

Stripping buys roughly a 1.15× size reduction after gzip; accuracy was measured above and improved slightly, so I won't repeat it. (For a comparison against the dense baseline, see the sketch below.)
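
For a fuller picture, the same helper can measure the unpruned baseline too. A sketch (numbers will vary from run to run):

# Sketch: gzip the unpruned baseline for a side-by-side size comparison.
baseline_model = setup_model()
baseline_model.load_weights(pretrained_weights)
print("Size of gzipped baseline model: %.2f bytes"
      % get_gzipped_model_size(baseline_model))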

One use of callbacks was skipped along the way; it works much like ordinary Keras callbacks, and hooks such as on_epoch_* and on_train_* can serve as debugging points.

Tips for improving pruned-model accuracy:

  1. Use a learning rate that is neither too high nor too low while pruning (admittedly generic advice), and treat the pruning schedule as a hyperparameter;
  2. As a quick test, try pruning to the final sparsity with begin_step=0; this sometimes gives good results (see the sketch after this list);
  3. Don't prune too frequently (the frequency parameter), so the model has time to recover;
  4. Build your own case on top of the "Define model" section above.
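
Tips 2 and 3 map onto the pruning_schedule argument of prune_low_magnitude. A minimal sketch using ConstantSparsity (the values here are illustrative, not tuned):

# Sketch: per tip 2, prune toward the target sparsity from step 0;
# `frequency` (tip 3) controls how often the mask is updated.
pruning_params = {
    'pruning_schedule': tfmot.sparsity.keras.ConstantSparsity(
        target_sparsity=0.5, begin_step=0, frequency=100)
}
model_for_pruning = tfmot.sparsity.keras.prune_low_magnitude(
    setup_model(), **pruning_params)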

Common mistakes:

  1. To keep the pruning wrappers intact, load the model from a .h5 model file rather than loading weights (see the sketch after this list);
  2. After pruning, strip the pruning variables with strip_pruning and then apply a standard compression method such as gzip; as noted above, both steps are needed to see the size benefit.
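
A sketch of the save/load round trip implied by mistake 1: save the whole pruned model to .h5 and deserialize it inside prune_scope():

# Sketch: save the pruned model as a full .h5 model file (not just weights)...
_, keras_model_file = tempfile.mkstemp('.h5')
model_for_pruning.save(keras_model_file, include_optimizer=True)

# ...and restore it; prune_scope() registers the pruning wrapper classes.
with tfmot.sparsity.keras.prune_scope():
    loaded_model = tf.keras.models.load_model(keras_model_file)
loaded_model.summary()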


Reposted from blog.csdn.net/Sayzan/article/details/106843328