Paper Walkthrough: Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for VQA

Posted 2018-12-09 14:27:34

Copyright notice: this is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/u014248127/article/details/84887304

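This is the fifth article in a series on VQA. It covers the paper's main idea, its model, and its main contributions. Interested readers can consult the original paper: Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering.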

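1. Main idea:

The paper uses a spatial (image-based) memory network. Memory networks come from NLP, where they are used for problems that require logical reasoning. The Spatial Memory Network treats image regions as the contents of its memory cells and uses the question to select the regions that are relevant to the answer. The paper also applies attention more than once, to mimic the reasoning process of searching for an answer.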

2. Model:

The structure of the model closely resembles that of a memory network: End to End Memory Network
[Figure: model architecture, panels (a) and (b); image not available]

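a. Question features:

Very little processing happens here: each word is simply embedded as a word vector, which gives the word-vector matrix of the sentence. Shape: (T, N), where T is the question length and N is the word-vector dimension.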
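As a rough illustration of the embedding step, the sketch below looks up one vector per word and stacks them into a T x N matrix. The vocabulary, the `<unk>` token, the dimension N = 4, and the random table are all hypothetical; a real model would learn the table during training. Plain Python lists are used so the snippet runs without any dependencies.

```python
import random


def embed_question(tokens, vocab, table):
    """Look up one embedding row per token; returns a T x N list of lists."""
    unk = vocab["<unk>"]
    return [table[vocab.get(tok, unk)] for tok in tokens]


random.seed(0)
N = 4  # hypothetical embedding dimension
vocab = {"<unk>": 0, "what": 1, "is": 2, "on": 3, "the": 4, "table": 5}
# Randomly initialised embedding table; in a real model this is learned.
table = [[random.uniform(-1, 1) for _ in range(N)] for _ in vocab]

V = embed_question("what is on the table".split(), vocab, table)
print(len(V), len(V[0]))  # T and N
```

Words missing from the vocabulary fall back to the `<unk>` row, which is one common convention and not something the post specifies.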

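b. Image features:

Little processing happens here either: a CNN, GoogLeNet (inception_5b/output), extracts a feature vector for each image region. Shape: (L, M), where L is the number of region features.
To bring the image features to the same dimension as the question features, two transformation matrices are then applied, W_a and W_e.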
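The projection can be sketched as two matrix products that map the (L, M) region features into the N-dimensional word space. The sizes L = 6, M = 8, N = 4 and the random values are placeholders, and bias terms are left out. The names `S_a` and `S_e` are introduced here for illustration; reading W_a as the projection used for computing attention and W_e as the one used for the attended evidence is an assumption based on the usual memory-network split between input and output embeddings.

```python
import random


def matmul(A, B):
    """Multiply an (n x k) matrix by a (k x m) matrix, both given as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


random.seed(0)
L, M, N = 6, 8, 4  # hypothetical: regions, CNN feature size, word-vector size

# Stand-in for CNN region features, shape (L, M).
S = [[random.random() for _ in range(M)] for _ in range(L)]
# Two projection matrices, each of shape (M, N); learned in a real model.
W_a = [[random.uniform(-1, 1) for _ in range(N)] for _ in range(M)]
W_e = [[random.uniform(-1, 1) for _ in range(N)] for _ in range(M)]

S_a = matmul(S, W_a)  # assumed role: features used to compute attention
S_e = matmul(S, W_e)  # assumed role: features that get averaged by attention
print(len(S_a), len(S_a[0]), len(S_e), len(S_e[0]))
```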

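c. Word Guided Spatial Attention in One-Hop Model (a single attention step):

Word-guided attention: in figure (b), the word vectors are used to compute correlations with the image regions. For each region, the strongest correlation over all words is kept, and the resulting scores are normalized with softmax. (The symbols in the formulas match those in the figure.)
Compute the result of the first attention step: see figure (a).
This attention result, combined with the question, can already be used to predict the answer: see figure (a).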
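The one-hop steps above can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the inputs are tiny hand-picked matrices, `S_a` and `S_e` stand for the region features after the two projections, bias terms are omitted, and summarising the question as the sum of its word vectors is an assumption made here for simplicity.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def word_guided_attention(V, S_a, S_e):
    """V: T x N word vectors; S_a, S_e: L x N projected region features."""
    # Correlation of every word with every region (T x L).
    C = [[dot(word, region) for region in S_a] for word in V]
    # For each region keep its strongest response over all words, then normalise.
    weights = softmax([max(col) for col in zip(*C)])
    # Attention-weighted sum of the evidence features (length N).
    n = len(S_e[0])
    s_att = [sum(w * region[k] for w, region in zip(weights, S_e)) for k in range(n)]
    return weights, s_att


V = [[1.0, 0.0], [0.0, 1.0]]                 # T = 2 words, N = 2
S_a = [[2.0, 0.0], [0.0, 0.5], [0.0, 0.0]]   # L = 3 regions
S_e = [[1.0, 1.0], [2.0, 0.0], [0.0, 3.0]]

weights, s_att = word_guided_attention(V, S_a, S_e)
# Assumed question summary: the sum of the word vectors (bag of words).
q = [sum(col) for col in zip(*V)]
# Attended visual evidence plus question, ready for an answer classifier.
o_hop1 = [a + b for a, b in zip(s_att, q)]
print(weights, o_hop1)
```

In this toy input the first region responds most strongly to some word, so it receives the largest attention weight.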

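d. Spatial Attention in Two-Hop Model (multiple attention steps, simulating reasoning)

Compute the result of the first attention step and add the question: see figure (a).
Compute the weights of the next attention step: see figure (a).
Compute the result of this attention step.
Predict the answer.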
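A second hop can be sketched along the same lines. Everything below is an assumption layered on the description above, since the formulas themselves are not reproduced in this post: the first-hop output `o_hop1` is taken as given, it is scored against the region features to get new attention weights, a separate hypothetical projection `S_e2` supplies the second-hop evidence, and a single linear layer `W_p` after a ReLU stands in for the answer classifier.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def second_hop(o_hop1, S_e, S_e2):
    """Re-attend over the regions, guided by the first-hop output."""
    weights2 = softmax([dot(region, o_hop1) for region in S_e])
    n = len(S_e2[0])
    s_att2 = [sum(w * r[k] for w, r in zip(weights2, S_e2)) for k in range(n)]
    return weights2, s_att2


def predict(o_hop1, s_att2, W_p):
    """Combine both hops, apply ReLU, and score each answer class."""
    hidden = [max(0.0, a + b) for a, b in zip(o_hop1, s_att2)]
    return softmax([dot(row, hidden) for row in W_p])


o_hop1 = [1.0, 2.0]                           # hypothetical first-hop output
S_e = [[1.0, 1.0], [2.0, 0.0], [0.0, 3.0]]    # L = 3 regions, N = 2
S_e2 = [[0.5, 0.0], [0.0, 0.5], [1.0, 1.0]]   # hypothetical second projection
W_p = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]    # 3 hypothetical answer classes

weights2, s_att2 = second_hop(o_hop1, S_e, S_e2)
probs = predict(o_hop1, s_att2, W_p)
print(weights2, probs)
```

The point of the sketch is the data flow: the second set of weights depends on what the first hop found, which is how repeated attention can mimic a step-by-step search for the answer.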

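3. Contributions:

The paper proposes the Spatial Memory Network, which simulates the reasoning process of attending to the image several times while searching for an answer.
In the first attention step, it proposes computing the correlation between every individual word and the image regions, which yields the first set of attention weights.
It implements multi-hop attention and combines the result of each hop when predicting the answer, which is how the reasoning process is simulated.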
