1. Import modules
The remaining imports for this section (the model code below also needs Sequential, Embedding, Dropout, and Dense):
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dropout, Dense
2. Preprocess the data
See here.
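The core of the preprocessing is padding or truncating every review to the fixed length maxword that the Embedding layer expects. A minimal pure-Python sketch of that step (pad_to_length is a hypothetical helper mirroring the default 'pre' padding/truncation behavior of Keras's pad_sequences, not part of Keras itself):

```python
def pad_to_length(seqs, maxlen, value=0):
    # Keras-style 'pre' padding/truncation: keep the last maxlen tokens
    # of each sequence and left-pad shorter ones with `value`.
    out = []
    for s in seqs:
        s = s[-maxlen:]                              # truncate from the front
        out.append([value] * (maxlen - len(s)) + s)  # left-pad with `value`
    return out

print(pad_to_length([[1, 2, 3], [4, 5, 6, 7, 8]], 4))
# → [[0, 1, 2, 3], [5, 6, 7, 8]]
```

After this step every review is an integer vector of length maxword, so the whole dataset becomes a rectangular (samples, maxword) array.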
3. Build the model
The output shape of an LSTM layer:
If return_sequences=True, it returns a 3D tensor of shape (samples, timesteps, output_dim).
Otherwise, it returns a 2D tensor of shape (samples, output_dim).
The Keras documentation states:
to stack recurrent layers, you must use return_sequences=True
Since nothing is stacked after the last LSTM before the fully connected layer, return_sequences=True is not used there.
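The shape rule can be captured as a tiny helper (a pure-Python sketch that just illustrates the bookkeeping, not a Keras API):

```python
def lstm_output_shape(samples, timesteps, units, return_sequences):
    # The rule quoted from the Keras docs: 3D tensor when sequences
    # are returned, 2D tensor (last timestep only) otherwise.
    return (samples, timesteps, units) if return_sequences else (samples, units)

print(lstm_output_shape(None, 400, 128, True))   # → (None, 400, 128)
print(lstm_output_shape(None, 400, 32, False))   # → (None, 32)
```

These two shapes match the lstm_4 and lstm_6 rows of the model summary below.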
model = Sequential()
model.add(Embedding(vocab_size, 64, input_length=maxword))
model.add(LSTM(128, return_sequences=True))   # 3D output so the next LSTM can stack on it
model.add(Dropout(0.2))
model.add(LSTM(64, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(32))                           # 2D output feeding the Dense layer
model.add(Dropout(0.2))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
print(model.summary())
Output:
Layer (type) Output Shape Param #
=================================================================
embedding_4 (Embedding) (None, 400, 64) 5669568
_________________________________________________________________
lstm_4 (LSTM) (None, 400, 128) 98816
_________________________________________________________________
dropout_6 (Dropout) (None, 400, 128) 0
_________________________________________________________________
lstm_5 (LSTM) (None, 400, 64) 49408
_________________________________________________________________
dropout_7 (Dropout) (None, 400, 64) 0
_________________________________________________________________
lstm_6 (LSTM) (None, 32) 12416
_________________________________________________________________
dropout_8 (Dropout) (None, 32) 0
_________________________________________________________________
dense_10 (Dense) (None, 1) 33
=================================================================
Total params: 5,830,241
Trainable params: 5,830,241
Non-trainable params: 0
_________________________________________________________________
Compared with the convolutional network, the parameter count barely changes.
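The LSTM rows in the summary can be checked by hand: an LSTM has four gates, and each gate holds input weights, recurrent weights, and a bias. A quick sanity check of that formula against the table above:

```python
def lstm_param_count(input_dim, units):
    # 4 gates × (input weights + recurrent weights + bias)
    return 4 * (input_dim * units + units * units + units)

print(lstm_param_count(64, 128))   # lstm_4 → 98816
print(lstm_param_count(128, 64))   # lstm_5 → 49408
print(lstm_param_count(64, 32))    # lstm_6 → 12416
```

All three match the summary, and the Dense layer's 32 weights + 1 bias give the remaining 33; the Embedding layer (vocab_size × 64 weights) dominates the total.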
4. Train the model
model.fit(X_train, Y_train, validation_data=(X_test, Y_test), epochs=5, batch_size=100)
Output:
Train on 10000 samples, validate on 1000 samples
Epoch 1/5
10000/10000 [==============================] - 265s 26ms/step - loss: 0.4772 - acc: 0.7861 - val_loss: 0.4103 - val_acc: 0.8300
Epoch 2/5
10000/10000 [==============================] - 275s 28ms/step - loss: 0.2904 - acc: 0.8900 - val_loss: 0.3819 - val_acc: 0.8480
Epoch 3/5
10000/10000 [==============================] - 270s 27ms/step - loss: 0.1899 - acc: 0.9345 - val_loss: 0.3689 - val_acc: 0.8480
Epoch 4/5
10000/10000 [==============================] - 270s 27ms/step - loss: 0.1305 - acc: 0.9580 - val_loss: 0.4750 - val_acc: 0.8570
Epoch 5/5
10000/10000 [==============================] - 265s 26ms/step - loss: 0.0855 - acc: 0.9718 - val_loss: 0.6072 - val_acc: 0.8110
Training accuracy reaches 97% while validation accuracy is only 81%, a clear sign of overfitting.
Each epoch is slow to run, but the loss drops quickly across epochs.
Try raising the dropout rate to 0.5:
model = Sequential()
model.add(Embedding(vocab_size, 64, input_length=maxword))
model.add(LSTM(128, return_sequences=True))
model.add(Dropout(0.5))
model.add(LSTM(64, return_sequences=True))
model.add(Dropout(0.5))
model.add(LSTM(32))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
print(model.summary())
Output:
Train on 10000 samples, validate on 1000 samples
Epoch 1/5
10000/10000 [==============================] - 270s 27ms/step - loss: 0.6315 - acc: 0.6381 - val_loss: 0.4503 - val_acc: 0.8000
Epoch 2/5
10000/10000 [==============================] - 267s 27ms/step - loss: 0.4039 - acc: 0.8414 - val_loss: 0.4800 - val_acc: 0.8010
Epoch 3/5
10000/10000 [==============================] - 265s 27ms/step - loss: 0.2697 - acc: 0.9068 - val_loss: 0.4014 - val_acc: 0.8330
Epoch 4/5
10000/10000 [==============================] - 264s 26ms/step - loss: 0.1843 - acc: 0.9403 - val_loss: 0.4198 - val_acc: 0.8550
Epoch 5/5
10000/10000 [==============================] - 266s 27ms/step - loss: 0.1322 - acc: 0.9581 - val_loss: 0.5904 - val_acc: 0.8300
The overfitting improves somewhat and accuracy rises by about 2%; adjusting the size of the test set should also have some effect.
Summary:
This chapter introduced several kinds of neural networks: the multilayer perceptron (MLP), the convolutional neural network (CNN), and long short-term memory (LSTM). What they have in common is a large number of parameters, updated via backpropagation. As different flavors of neural network, CNNs and LSTMs need relatively few parameters, reflecting a property they share: parameter sharing. This mirrors a principle from traditional machine learning: the more constraints placed on the parameters or the model, the less freedom the model has and the harder it is to overfit. Conversely, the more parameters a model has, the more flexible it is and the more easily it fits noise, which hurts prediction. In practice, we use cross-validation to choose the best hyperparameters (for example, the number of layers, the number of units per layer, and the dropout probability). Finally, note that sentiment analysis is fundamentally a classification problem, a form of supervised learning.
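The cross-validation mentioned above boils down to splitting the data into k folds and holding each one out in turn. A minimal pure-Python sketch of the index split (an illustration only, not a Keras or scikit-learn API):

```python
def kfold_indices(n_samples, k):
    # Split range(n_samples) into k interleaved folds and yield
    # (train_indices, validation_indices) pairs, one per fold.
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, val

for train, val in kfold_indices(10, 5):
    print(len(train), len(val))   # each fold: 8 training, 2 validation indices
```

Each candidate setting (layer count, units per layer, dropout probability) would be trained on every training split and scored on the matching validation split, and the setting with the best average score wins.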