1. Overview
DeepMoji (Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm) is a hybrid neural network proposed by Bjarke Felbo et al. that combines Bi-LSTM and Attention. It performs well at emotion recognition based on emoji occurrences, and it also does well on ordinary text-classification tasks.
Emotion recognition, especially on social networks in the Internet era, is a colorful problem. All kinds of Internet language (emoji, kaomoji, "Martian" slang...) have given special characters new meanings online, and these special symbols and emoticons can often reveal a user's true feelings and intent. The DeepMoji paper starts from exactly this observation, which makes it an interesting read.
Having tried it out myself, and setting the various tricks aside, DeepMoji is not all that remarkable: the model can be traced back to RCNN, and can even be seen as RCNN with the CNN part swapped for Attention, minor differences aside. It shows how much an interesting topic choice matters; that is just my personal take.
GitHub project address:
2. DeepMoji Model Principles
2.1 DeepMoji model diagram
2.2 DeepMoji model details
An Embedding layer is followed by two Bi-LSTM layers; the outputs of these three layers are then concatenated, fed into an Attention layer, and finally through a softmax. All in all, a fairly simple model. The paper also spends a lot of ink on fine-tuning, proposing "the chain-thaw transfer learning approach", which roughly means freezing the model's layers and training them one at a time in sequence; I cannot see much novelty in it, at least not personally. A sketch of the idea follows below.
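To make the chain-thaw idea concrete, here is a minimal sketch of sequential freezing in Keras. This is my reconstruction, not the authors' code: the function name chain_thaw, the training arguments, and the learning rate are all assumptions. The freshly added classifier is trained first, then each remaining layer in turn while everything else stays frozen, and finally the whole model is unfrozen and trained.

from keras.optimizers import Adam

def chain_thaw(model, x_train, y_train, epochs_per_step=1):
    # hypothetical helper, not the authors' implementation
    # keep only layers that actually carry weights; the last one is assumed
    # to be the freshly added classifier, which gets fine-tuned first
    weighted = [l for l in model.layers if l.weights]
    schedule = [weighted[-1]] + weighted[:-1] + [None]   # None = unfreeze all
    for target in schedule:
        for layer in weighted:
            layer.trainable = (target is None) or (layer is target)
        # recompile so the changed trainable flags take effect
        model.compile(optimizer=Adam(lr=1e-4),
                      loss='categorical_crossentropy', metrics=['accuracy'])
        model.fit(x_train, y_train, epochs=epochs_per_step)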
The final comparison is only against FastText and a handful of other models; let us charitably assume the others simply took too long to train. I found the paper via PaperWeekly, and the GitHub repo already has a lot of stars, so clearly people are drawn to fun ideas; in any case, it counts as an interesting piece of work.
3. DeepMoji Code Implementation
3.1 A very simple model: two LSTM layers plus an Attention layer
3.2 Core code
from keras import regularizers
from keras.layers import Activation, SpatialDropout1D, Dropout, Dense
from keras.layers import LSTM, GRU, CuDNNLSTM, CuDNNGRU, Bidirectional, concatenate
from keras.models import Model
# AttentionWeightedAverage is a custom layer from the DeepMoji project (a sketch follows below)

def create_model(self, hyper_parameters):
    """
        Build the network; it is a bit like RCNN
    :param hyper_parameters: json, hyper parameters of network
    :return: tensor, model
    """
    super().create_model(hyper_parameters)
    x = self.word_embedding.output
    x = Activation('tanh')(x)
    # entire embedding channels are dropped out instead of the normal Keras
    # embedding dropout, which drops all channels for entire words;
    # many of the datasets contain so few words that losing one or more
    # words can alter the emotions completely
    x = SpatialDropout1D(self.dropout_spatial)(x)
    # self.rnn_type selects the cell type; self.rnn_units is the hidden size
    # (the two must not be confused: one is a string, the other an int)
    if self.rnn_type == "LSTM":
        layer_cell = LSTM
    elif self.rnn_type == "GRU":
        layer_cell = GRU
    elif self.rnn_type == "CuDNNLSTM":
        layer_cell = CuDNNLSTM
    elif self.rnn_type == "CuDNNGRU":
        layer_cell = CuDNNGRU
    else:
        layer_cell = GRU
    rnn_kwargs = dict(units=self.rnn_units,
                      return_sequences=True,
                      kernel_regularizer=regularizers.l2(self.l2),
                      recurrent_regularizer=regularizers.l2(self.l2))
    if layer_cell in (LSTM, GRU):
        # the CuDNN cells have a fixed tanh activation and accept no activation argument
        rnn_kwargs['activation'] = 'relu'
    # skip-connection from embedding to output eases gradient-flow and allows
    # access to lower-level features; the ordering of the merge is important
    # for consistency with the pretrained model
    lstm_0_output = Bidirectional(layer_cell(**rnn_kwargs), name="bi_lstm_0")(x)
    lstm_1_output = Bidirectional(layer_cell(**rnn_kwargs), name="bi_lstm_1")(lstm_0_output)
    x = concatenate([lstm_1_output, lstm_0_output, x])
    # if return_attention is True in AttentionWeightedAverage, an additional
    # tensor representing the weight at each timestep is returned
    weights = None
    x = AttentionWeightedAverage(name='attlayer', return_attention=self.return_attention)(x)
    if self.return_attention:
        x, weights = x
    x = Dropout(self.dropout)(x)
    # final dense layer over self.label classes; self.activate_classify is typically 'softmax'
    dense_layer = Dense(self.label, activation=self.activate_classify)(x)
    output = [dense_layer]
    self.model = Model(self.word_embedding.input, output)
    self.model.summary(120)
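The AttentionWeightedAverage used above is not a built-in Keras layer but a custom layer from the DeepMoji project. The following is a simplified sketch of what it computes, my reconstruction rather than the original implementation (mask handling is omitted for brevity): a single learned vector scores each timestep, the scores are softmax-normalized, and the hidden states are averaged with those weights.

from keras import backend as K
from keras.layers import Layer

class AttentionWeightedAverage(Layer):
    # simplified reconstruction of the DeepMoji layer, masking omitted
    def __init__(self, return_attention=False, **kwargs):
        self.return_attention = return_attention
        super(AttentionWeightedAverage, self).__init__(**kwargs)

    def build(self, input_shape):
        # one scoring weight per input channel
        self.W = self.add_weight(name='{}_W'.format(self.name),
                                 shape=(input_shape[2], 1),
                                 initializer='glorot_uniform')
        super(AttentionWeightedAverage, self).build(input_shape)

    def call(self, x, mask=None):
        # (batch, time, channels) . (channels, 1) -> (batch, time)
        logits = K.squeeze(K.dot(x, self.W), axis=-1)
        # subtract the max for numerical stability before the softmax
        ai = K.exp(logits - K.max(logits, axis=-1, keepdims=True))
        att_weights = ai / (K.sum(ai, axis=-1, keepdims=True) + K.epsilon())
        # weighted sum over the time axis
        weighted = K.sum(x * K.expand_dims(att_weights), axis=1)
        if self.return_attention:
            return [weighted, att_weights]
        return weighted

    def compute_output_shape(self, input_shape):
        out = (input_shape[0], input_shape[2])
        if self.return_attention:
            return [out, (input_shape[0], input_shape[1])]
        return out

Compared with plain max- or mean-pooling, this weighted average lets the model decide which timesteps matter for the emotion label, and returning the weights (return_attention=True) makes the decision inspectable.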
Hope this helps!