Text Understanding with the Attention Sum Reader Network

Category: Other | Posted: 2019-03-06 09:51:02

Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/LaineGates/article/details/79240232

Keywords

Bi-GRU, Bi-LSTM, attention sum

Source

arXiv 2016.03.04 (published at ACL 2016)

Problem

Solving cloze-style (fill-in-the-blank) questions with an attention-based deep model.

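Technical details

The model is simpler than the Attentive Reader and consists of the following steps:

1. Use a bidirectional GRU/LSTM to compute, for every word in the document, the concatenated (forward and backward) vector doc_encoder.
2. Use a bidirectional GRU/LSTM to compute query_encoder, the concatenation of the forward state at the last query word and the backward state at the first query word.
3. Take the dot product of doc_encoder and query_encoder to obtain attention_res, then apply softmax (so that the values are positive and sum to 1).
4. For each candidate word, sum its attention values in attention_res over all of its occurrences (this is the key idea of the paper, and it became a standard component of later deep models for cloze tasks).
5. Compute the cross-entropy and update the parameters with its gradients.
As shown in the figure: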
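Steps 3 and 4 above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the author's Theano or TensorFlow code: random vectors stand in for the Bi-GRU outputs (written here as doc_encoder and query_encoder), and the function name `attention_sum`, the word ids, and the candidate list are hypothetical.

```python
import numpy as np

def attention_sum(doc_encoder, query_encoder, doc_token_ids, candidate_ids):
    """Steps 3 and 4: softmax attention over document positions, then sum
    the attention at every position where each candidate word occurs."""
    # Step 3: one dot-product score per document position, shape (T,)
    scores = doc_encoder @ query_encoder
    # Numerically stable softmax: subtract the max before exponentiating
    exp_scores = np.exp(scores - scores.max())
    attention_res = exp_scores / exp_scores.sum()
    # Step 4: accumulate attention for each candidate over all its occurrences
    doc_token_ids = np.asarray(doc_token_ids)
    outputs = np.array(
        [attention_res[doc_token_ids == c].sum() for c in candidate_ids]
    )
    return attention_res, outputs

# Toy data: random vectors stand in for the encoder outputs (assumption)
rng = np.random.default_rng(0)
doc_token_ids = [7, 3, 9, 3, 5, 7, 3]   # hypothetical word ids, 7-token document
candidate_ids = [3, 7, 9]               # hypothetical candidate answers
doc_encoder = rng.normal(size=(len(doc_token_ids), 8))
query_encoder = rng.normal(size=8)

attention_res, outputs = attention_sum(
    doc_encoder, query_encoder, doc_token_ids, candidate_ids
)
# The predicted answer is the candidate with the largest summed attention
predicted = candidate_ids[int(np.argmax(outputs))]
```

Because word 5 in the toy document is not a candidate, the summed values in `outputs` add up to less than 1, which is why the loss needs a normalization rather than treating `outputs` as a finished distribution.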

Key implementation points

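Documents are long, typically 600 to 700+ tokens with a few that are much longer. As a result, the document gradients during training become large and consume a lot of memory, and the author's GPU with 11 GB of memory frequently ran out. Limiting documents to 700 tokens is sufficient, and a batch_size of 32 is about the limit.
When computing accuracy, accumulate it over the whole epoch rather than per batch. Otherwise the accuracy keeps jumping around, which can make it look as if training is broken.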
When computing the cross-entropy in step 5, do not apply softmax a second time; normalize instead. That is, if the output of step 4 is $outputs$, then $y_{predict}=outputs/\sum(outputs)$ and $crossEntropy=-\sum(y*tf.log(y_{predict}))$.
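The reason is that attention_res from step 3 has already been through softmax, so all of its values lie in $(0,1)$. With a document length of about 700, each value is roughly a few thousandths to a few hundredths, and applying softmax to such numbers again yields an almost uniform distribution, since for example $e^{0.005}\approx1.005$.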
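The difference between normalizing and applying a second softmax can be checked numerically. The sketch below uses made-up candidate sums for `outputs` and a one-hot label `y`; the helper names and the small `eps` inside the logarithm are assumptions added for illustration and are not part of the formula above.

```python
import numpy as np

def normalize(outputs):
    # y_predict = outputs / sum(outputs)
    return outputs / outputs.sum()

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def cross_entropy(y, y_predict, eps=1e-12):
    # eps guards against log(0); it is an assumption, not in the original formula
    return -np.sum(y * np.log(y_predict + eps))

# Hypothetical summed attention for four candidates (output of step 4)
outputs = np.array([0.30, 0.05, 0.01, 0.004])
y = np.array([1.0, 0.0, 0.0, 0.0])   # one-hot label: first candidate is correct

y_norm = normalize(outputs)          # keeps the ratios between candidates
y_soft = softmax(outputs)            # squashes them toward a uniform distribution

loss_norm = cross_entropy(y, y_norm)
loss_soft = cross_entropy(y, y_soft)
```

With these numbers the normalized probability of the correct candidate is roughly 0.82, while the second softmax leaves it at roughly 0.31, close to the uniform value 0.25 for four candidates, so the loss stays high even when the attention is already concentrated on the right answer.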

Implementation code

Theano version
TensorFlow version
