Chinese Short-Text Classification Example 7: DPCNN (Deep Pyramid Convolutional Neural Networks for Text Categorization)

I. Overview

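DPCNN (Deep Pyramid Convolutional Neural Networks for Text Categorization) is a deep convolutional neural network proposed by Rie Johnson et al.; the name can be read literally as "deep pyramid convolutional neural network".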

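Before DPCNN, researchers generally held that word-level embeddings outperform char-level ones (strictly speaking, ELMo, GPT-2, BERT, XLNet and the like mix char-, word- and sentence-level information, so they are not counted here). The success of CNNs in computer vision, and especially the popularity of deep residual networks (ResNet), also showed that deep convolutional networks can extract more complex, higher-level features. So do deep convolutional networks have any advantage in natural language processing (NLP)?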

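The arrival of ELMo, GPT-2, BERT and XLNet shows that deep neural networks are well worth having in NLP too, although the star of the show may be attention rather than CNN.

Alas, the characters and words in NLP tasks are discrete data, which is fundamentally different from the continuous pixels and colors of images.

Enough digression; let's get back to CNNs and DPCNN.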

From TextCNN, DCNN and Bi-LSTM we already know that word order within a sentence matters a great deal for NLP tasks. Intuitively this makes sense too, especially for short texts.

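Before DPCNN, researchers had already shown that for char-level deep CNNs, deeper models perform better. At the word level, however, the large vocabulary and the complexity of deep networks often lead to oversized models, too many parameters, slow execution, and exploding or vanishing gradients.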

DPCNN implementation (project repository):

https://github.com/yongzhuo/Keras-TextClassification/tree/master/keras_textclassification/m07_TextDPCNN

DPCNN (word-level deep pyramid CNN) addresses exactly these problems: oversized models, slow execution, and exploding or vanishing gradients.

Key features of DPCNN:

1. Downsampling without increasing the number of feature maps

That is, the number of feature maps stays fixed while pooling halves the sequence length, which keeps the computation down (see 2.2.2).

2. Shortcut connections with pre-activation and identity mapping

3. Text region embedding enhanced with unsupervised embeddings

II. DPCNN Architecture

2.1 DPCNN compared with a shallow CNN and ResNet

2.2 The structure in detail

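DPCNN consists mainly of: A. a region embedding layer (text region embedding);

B. two convolution blocks (each block is made of two conv layers with a fixed kernel size of 3), where the layers built from the two blocks can be connected directly via pre-activation;

C. a repeated structure, very similar to B, except that a max-pooling layer is added after the pre-activation and before the conv.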

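2.2.1 Convolution (equal-length convolution)

Unlike the narrow convolution (VALID) in TextCNN and the wide convolution in DCNN, DPCNN uses equal-length convolution (SAME): as the name suggests, the output has the same length as the input. The stride is still 1; what differs is the zero padding, with pad_size = (m-1)/2 zeros added at each end, where m is the (odd) kernel size.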
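To make the three padding modes concrete, here is a minimal sketch that computes the output length of a stride-1 convolution in each case. The helper names `conv_output_length` and `same_pad_size` are made up for this illustration and are not part of the project.

```python
def conv_output_length(n, m, mode):
    """Output length of a stride-1 1D convolution over n positions with kernel size m."""
    if mode == "valid":   # narrow convolution: no padding
        return n - m + 1
    if mode == "same":    # equal-length convolution: (m - 1) / 2 zeros on each side
        return n
    if mode == "wide":    # wide convolution: m - 1 zeros on each side
        return n + m - 1
    raise ValueError("unknown mode: " + mode)


def same_pad_size(m):
    """Zeros added at each end for an equal-length convolution (odd kernel size only)."""
    if m % 2 == 0:
        raise ValueError("this formula assumes an odd kernel size")
    return (m - 1) // 2


n, m = 10, 3
pad = same_pad_size(m)
# Padding both ends by `pad` and then convolving without further padding
# gives back the original length: (n + 2 * pad) - m + 1 == n
same_via_padding = conv_output_length(n + 2 * pad, m, "valid")
print(conv_output_length(n, m, "valid"), same_via_padding, conv_output_length(n, m, "wide"))
```

With a sentence of 10 tokens and a kernel of size 3, the narrow convolution loses two positions, the wide one gains two, and only the equal-length one keeps the length at 10, which is what later allows the input and output of a block to be added together.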

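2.2.2 Pooling (downsampling)

After the equal-length convolutions, the number of feature maps is kept fixed (which limits the computation), and then pooling is applied. At the end of each convolution block the feature maps are max-pooled (pool_size=3, stride=2), which halves the sequence length of each feature map and produces the pyramid structure. Many text-matching and similarity models use the same trick: this compressive downsampling brings distant positions closer together, so long-range information in the text can be matched and aligned, and it works reasonably well in practice.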
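The halving can be traced with a few lines of arithmetic. The sketch below uses the usual output-length formulas for 1D pooling with 'valid' padding (the default of Keras `MaxPooling1D`) and with 'same' padding; the function names `pooled_length` and `pyramid` are hypothetical helpers for this post only.

```python
import math


def pooled_length(n, pool_size=3, stride=2, padding="valid"):
    """Sequence length after one 1D pooling step."""
    if padding == "valid":   # no padding: windows must fit entirely inside the sequence
        return (n - pool_size) // stride + 1
    if padding == "same":    # padded so that every position is covered
        return math.ceil(n / stride)
    raise ValueError("unknown padding: " + padding)


def pyramid(n, pool_size=3, stride=2, padding="valid"):
    """Successive sequence lengths until no further pooling step is possible."""
    lengths = [n]
    while lengths[-1] > 1 and (padding == "same" or lengths[-1] >= pool_size):
        lengths.append(pooled_length(lengths[-1], pool_size, stride, padding))
    return lengths


print(pyramid(64))                  # [64, 31, 15, 7, 3, 1]
print(pyramid(64, padding="same"))  # [64, 32, 16, 8, 4, 2, 1]
```

Because the number of feature maps is fixed, each level costs roughly half as much as the one before it, so the total work stays within about twice the cost of the first level no matter how deep the pyramid goes.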

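2.2.3 Shortcut connection

This is the ResNet idea of shortcut (residual) connections. The output of the block is added to its input, z + f(z); the equal-length convolutions make this possible because f(z) has the same shape as z. This greatly alleviates the vanishing-gradient problem. The convolutions also use a linear activation, which lowers the computational cost.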
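A toy, single-channel version of z + f(z) may help here. This is only a sketch in plain Python: it assumes ReLU as the pre-activation (the project code uses PReLU and a linear conv activation instead), and `same_conv1d`, `relu` and `residual_block` are names invented for the example.

```python
def same_conv1d(x, kernel):
    """Equal-length 1D convolution: zero-pad (m - 1) / 2 on each side, stride 1."""
    m = len(kernel)
    pad = (m - 1) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(k * padded[i + j] for j, k in enumerate(kernel))
            for i in range(len(x))]


def relu(x):
    return [max(0.0, v) for v in x]


def residual_block(z, kernel1, kernel2):
    """z + f(z), where f is two equal-length convs, each preceded by its activation."""
    h = same_conv1d(relu(z), kernel1)   # pre-activation: activate first, then convolve
    h = same_conv1d(relu(h), kernel2)
    return [a + b for a, b in zip(z, h)]  # identity shortcut: no projection needed


z = [1.0, -2.0, 3.0, 0.5]
zero = [0.0, 0.0, 0.0]
passthrough = [0.0, 1.0, 0.0]

# If f contributes nothing, the block is exactly the identity,
# so the input (and its gradient) passes through unchanged.
print(residual_block(z, zero, zero))                # [1.0, -2.0, 3.0, 0.5]
print(residual_block(z, passthrough, passthrough))  # [2.0, -2.0, 6.0, 1.0]
```

The first call shows why stacking such blocks is safe: in the worst case a block degenerates to the identity instead of blocking the signal.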

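III. DPCNN Implementation

3.1 The code is not complicated: the conv block is easy to implement, and the shortcut connection is slightly fiddly but manageable

The catch is that the depth of DPCNN depends on the maximum (padded) sentence length. For example, with len_max = 64 = 2^6, a sentence of length 64 can only be halved by pooling five or six times before it shrinks to length 1; that is the pyramid structure.

In this sense, DPCNN is better suited to long texts.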
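To see how len_max limits the depth in practice, the sketch below replays only the loop control of `create_model` from the core code in 3.2 (first block, intermediate repeats with the `// 2 < 8` guard, final block with global pooling), tracking sequence lengths instead of tensors. It assumes 'valid' pooling, the Keras default; `dpcnn_block_plan` and its `min_len` parameter are hypothetical names, not part of the project.

```python
def dpcnn_block_plan(len_max, layer_repeats=5, pool_size=3, stride=2, min_len=8):
    """Return (number of conv blocks built, lengths after each local max-pooling)."""
    def pooled(n):
        return (n - pool_size) // stride + 1   # 'valid' pooling output length

    length = len_max
    lengths = []
    blocks = 0
    stop_next = False                          # plays the role of layer_curr == 1
    for i in range(layer_repeats):
        if i == 0:                             # first block, then local pooling
            blocks += 1
            length = pooled(length)
            lengths.append(length)
        elif i == layer_repeats - 1 or stop_next:
            blocks += 1                        # final block, followed by global max-pooling
            break
        else:                                  # intermediate repeat
            if length // 2 < min_len:          # too short to halve again after this one
                stop_next = True
            blocks += 1
            length = pooled(length)
            lengths.append(length)
    return blocks, lengths


print(dpcnn_block_plan(32))    # (3, [15, 7])
print(dpcnn_block_plan(64))    # (4, [31, 15, 7])
print(dpcnn_block_plan(512))   # (5, [255, 127, 63, 31])
```

Under these assumptions a short text with len_max = 32 gets only three blocks even though layer_repeats is 5, while a length of 512 uses all five, which is one way to read the remark about long texts.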

DPCNN implementation (graph.py):

https://github.com/yongzhuo/Keras-TextClassification/blob/master/keras_textclassification/m07_TextDPCNN/graph.py

3.2 Core code:

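from keras import backend as K
from keras.layers import (Add, BatchNormalization, Conv1D, Dense, Dropout,
                          GlobalMaxPooling1D, MaxPooling1D, PReLU, SpatialDropout1D)
from keras.models import Model
from keras.regularizers import l2

# Base class from the linked Keras-TextClassification project (module path assumed)
from keras_textclassification.base.graph import graph


class DPCNNGraph(graph):
    def __init__(self, hyper_parameters):
        """
            Initialization
        :param hyper_parameters: json, hyper-parameters
        """
        self.l2 = hyper_parameters['model'].get('l2', 0.0000032)
        self.pooling_size_strides = hyper_parameters['model'].get('pooling_size_strides', [3, 2])
        # The lookup key is spelled 'droupout_spatial'; keep it in sync with the config
        self.dropout_spatial = hyper_parameters['model'].get('droupout_spatial', 0.2)
        self.activation_conv = hyper_parameters['model'].get('activation_conv', 'linear')
        self.layer_repeats = hyper_parameters['model'].get('layer_repeats', 5)
        self.full_connect_unit = hyper_parameters['model'].get('full_connect_unit', 256)
        super().__init__(hyper_parameters)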

    def create_model(self, hyper_parameters):
        """
            Build the neural network, see https://blog.csdn.net/dqcfkyqdxym3f8rb0/article/details/86662906
        :param hyper_parameters: json, hyper parameters of network
        :return: tensor, model
        """
        super().create_model(hyper_parameters)
        embedding_output = self.word_embedding.output
        embedding_output_spatial = SpatialDropout1D(self.dropout_spatial)(embedding_output)

        # Region embedding layer first
        # (kernel_size=1 here, whereas the description in 2.2 uses a kernel size of 3)
        conv_1 = Conv1D(self.filters_num,
                        kernel_size=1,
                        padding='same',
                        kernel_regularizer=l2(self.l2),
                        bias_regularizer=l2(self.l2),
                        activation=self.activation_conv,
                        )(embedding_output_spatial)
        conv_1_prelu = PReLU()(conv_1)
        block = None
        layer_curr = 0
        for i in range(self.layer_repeats):
            if i == 0:  # the first block takes the embedding output as its input
                block = self.ResCNN(embedding_output_spatial)
                block_add = Add()([block, conv_1_prelu])
                block = MaxPooling1D(pool_size=self.pooling_size_strides[0],
                                     strides=self.pooling_size_strides[1])(block_add)
            elif self.layer_repeats - 1 == i or layer_curr == 1:  # the last repeat uses GlobalMaxPooling1D
                block_last = self.ResCNN(block)
                # ResNet-style shortcut (a.k.a. skip or residual) connection: identity mapping, block + f(block)
                block_add = Add()([block_last, block])
                block = GlobalMaxPooling1D()(block_add)
                break
            else:  # intermediate repeats
                if K.int_shape(block)[1] // 2 < 8:  # guard: once halving would leave fewer than 8 steps, make the next repeat the last
                    layer_curr = 1
                block_mid = self.ResCNN(block)
                block_add = Add()([block_mid, block])
                block = MaxPooling1D(pool_size=self.pooling_size_strides[0],
                                     strides=self.pooling_size_strides[1])(block_add)

        # Fully connected layers
        output = Dense(self.full_connect_unit, activation='linear')(block)
        output = BatchNormalization()(output)
        #output = PReLU()(output)
        output = Dropout(self.dropout)(output)
        output = Dense(self.label, activation=self.activate_classify)(output)
        self.model = Model(inputs=self.word_embedding.input, outputs=output)
        self.model.summary(120)

    def ResCNN(self, x):
        """
            Two stacked conv layers, each followed by batch normalization
        :param x: tensor, input
        :return: tensor, output of the two conv layers of the residual block
        """
        # pre-activation (disabled here; PReLU is applied after the second conv instead)
        # x = PReLU()(x)
        # kernel_size=1 here, whereas the description in 2.2 uses a kernel size of 3
        x = Conv1D(self.filters_num,
                   kernel_size=1,
                   padding='same',
                   kernel_regularizer=l2(self.l2),
                   bias_regularizer=l2(self.l2),
                   activation=self.activation_conv,
                   )(x)
        x = BatchNormalization()(x)
        # x = PReLU()(x)
        x = Conv1D(self.filters_num,
                   kernel_size=1,
                   padding='same',
                   kernel_regularizer=l2(self.l2),
                   bias_regularizer=l2(self.l2),
                   activation=self.activation_conv,
                   )(x)
        x = BatchNormalization()(x)
        # x = Dropout(self.dropout)(x)
        x = PReLU()(x)
        return x

Hope this helps!

大漠帝国
