Linking Image and Text with 2-Way Nets

Category: Other | Posted: 2018-08-20 00:40:37

CVPR 2017

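This paper can be seen as an extension of the Corr-Cross-AE structure from Corr-AE. It also adds a number of techniques and constraints, each backed by theoretical justification. Before going through the paper, here is a brief review of the Corr-Cross-AE structure.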

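1. Text and image features are each mapped into a common space by an encoder; the L2 distance between the text and image codes measures their similarity and gives the correlation loss.
2. The feature decoded from the text's common-space code is compared with the original image feature using L2, and the feature decoded from the image's common-space code is compared with the original text feature using L2; together these give the representation loss.
3. The correlation loss and the representation loss are added using two hyperparameter weights that sum to 1.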
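The three steps above can be sketched in NumPy as follows. This is only a minimal illustration: the single-layer encoders and decoders, the `tanh` activation, the parameter names, and the choice of putting `alpha` on the correlation term (and `1 - alpha` on the representation term) are all assumptions made for the example, not details from Corr-AE.

```python
import numpy as np

def l2(a, b):
    """Mean squared Euclidean distance between matching rows of a and b."""
    return float(np.mean(np.sum((a - b) ** 2, axis=1)))

def corr_cross_ae_loss(img, txt, params, alpha=0.2):
    # Step 1: encode both modalities into the common space and compare the codes.
    h_img = np.tanh(img @ params["enc_img"])
    h_txt = np.tanh(txt @ params["enc_txt"])
    correlation = l2(h_img, h_txt)

    # Step 2: cross reconstruction. The text code is decoded into an image
    # feature and the image code is decoded into a text feature.
    img_from_txt = h_txt @ params["dec_to_img"]
    txt_from_img = h_img @ params["dec_to_txt"]
    representation = l2(img_from_txt, img) + l2(txt_from_img, txt)

    # Step 3: combine with two weights that sum to 1 (assumed assignment).
    total = (1.0 - alpha) * representation + alpha * correlation
    return total, correlation, representation

# Toy data: 5 paired samples, 8-dim "image" and 12-dim "text" features.
rng = np.random.default_rng(0)
img = rng.normal(size=(5, 8))
txt = rng.normal(size=(5, 12))
params = {
    "enc_img": rng.normal(scale=0.1, size=(8, 4)),
    "enc_txt": rng.normal(scale=0.1, size=(12, 4)),
    "dec_to_img": rng.normal(scale=0.1, size=(4, 8)),
    "dec_to_txt": rng.normal(scale=0.1, size=(4, 12)),
}
total, correlation, representation = corr_cross_ae_loss(img, txt, params, alpha=0.2)
```

In a real model the parameters would be trained by gradient descent on `total`; here they are random and only the way the loss is assembled is shown.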

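I. Introduction

The paper proposes a bi-directional neural network architecture for matching vectors from two data sources. Two tied network channels, trained with a Euclidean loss, project the vectors of the two modalities into a common, maximally correlated space.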

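Network techniques introduced:

Batch Normalization
Leaky ReLU
Locally Dense Layer (splits a high-dimensional vector into several lower-dimensional vectors)
Tied Dropout (multiplies by a random matrix whose elements each follow a Bernoulli distribution, and introduces a scale factor of √0.5)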
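To make three of the techniques listed above more concrete, here is a rough NumPy sketch. Several things in it are assumptions rather than details from these notes: the Leaky ReLU slope of 0.1, giving each chunk of a locally dense layer its own small weight matrix, reading "tied" as one Bernoulli mask shared by the matching layers of the two channels, the keep probability of 0.5, and applying the √0.5 factor by plain multiplication.

```python
import numpy as np

def leaky_relu(z, slope=0.1):
    """Identity for positive inputs, a small linear slope for negative ones."""
    return np.where(z > 0, z, slope * z)

def locally_dense(x, blocks):
    """Split the columns of x into len(blocks) equal chunks, apply one small
    dense layer per chunk, and concatenate the results."""
    chunks = np.split(x, len(blocks), axis=1)  # raises if the split is uneven
    return np.concatenate([c @ W for c, W in zip(chunks, blocks)], axis=1)

def tied_dropout(h_a, h_b, rng, keep_prob=0.5, scale=1.0):
    """Apply ONE shared Bernoulli mask to two activations of the same shape."""
    mask = rng.binomial(1, keep_prob, size=h_a.shape)
    return h_a * mask * scale, h_b * mask * scale, mask

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 12))
# Three chunks of 4 inputs each, every chunk mapped to 3 outputs.
blocks = [rng.normal(size=(4, 3)) for _ in range(3)]
y = locally_dense(x, blocks)                 # shape (2, 9)

h_a = leaky_relu(y)
h_b = leaky_relu(rng.normal(size=y.shape))   # stand-in for the other channel
d_a, d_b, mask = tied_dropout(h_a, h_b, rng, keep_prob=0.5, scale=np.sqrt(0.5))
```

In this toy setting the locally dense layer holds 3 × 4 × 3 = 36 weights, whereas a fully connected 12 → 9 layer would hold 108, which is the point of splitting a very wide vector.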

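II. Model

Two data sources: visual data X and an external data source Y

Image features: 4096-dimensional features from VGG

Text features: Fisher Vectors, 18000 (GMM) + 18000 (HGLMM) = 36000-dimensional features

Bi-directional network structure: each channel transforms one view into the other, and intermediate features are extracted so that their similarity is maximized.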

The two channels (shown in the figure in the original post) are:

Image → text: 4096→2000→3000→2000→36000

Text → image: 36000→2000→3000→2000→4096
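The layer sizes above translate into weight shapes as sketched below. The helper names, the scaled-down demo sizes, the Leaky ReLU slope, and in particular the way the channels are tied (reusing the transposed weights of one channel in reverse order for the other) are assumptions for illustration; the notes only say the channels are tied.

```python
import numpy as np

IMAGE_TO_TEXT = [4096, 2000, 3000, 2000, 36000]
TEXT_TO_IMAGE = IMAGE_TO_TEXT[::-1]

def layer_shapes(sizes):
    """Weight-matrix shape (inputs, outputs) for each consecutive pair of sizes."""
    return list(zip(sizes[:-1], sizes[1:]))

def run_channel(x, weights, slope=0.1):
    """Forward pass that returns the activation of every layer.
    Hidden layers use Leaky ReLU; the last layer is left linear."""
    activations = []
    h = x
    for i, W in enumerate(weights):
        z = h @ W
        h = z if i == len(weights) - 1 else np.where(z > 0, z, slope * z)
        activations.append(h)
    return activations

# Scaled-down stand-in for 4096 -> 2000 -> 3000 -> 2000 -> 36000,
# so the demo does not allocate hundreds of megabytes.
small = [8, 4, 6, 4, 12]
rng = np.random.default_rng(0)
w_img_to_txt = [rng.normal(scale=0.1, size=s) for s in layer_shapes(small)]
# Assumed tying scheme: transposed weights in reverse order.
w_txt_to_img = [W.T for W in reversed(w_img_to_txt)]

img = rng.normal(size=(3, 8))
txt = rng.normal(size=(3, 12))
acts_img = run_channel(img, w_img_to_txt)
acts_txt = run_channel(txt, w_txt_to_img)
```

Because the hidden sizes are symmetric (2000, 3000, 2000), the hidden activations at the same index in both channels have the same width, so they can be compared directly with an L2 distance.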

Choosing the intermediate layer:

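III. Loss Function

The paper's objective is the sum of 6 loss terms.

They are (item 1 counts twice, once per channel output):

1. L2 loss at both ends (the outputs of the two channels)

2. L2 loss at intermediate layer j

3. Decorrelation regularization on the intermediate layer

4. Weight decay on the parameters of the fully connected layers

5. Regularization of the scale parameters in Batch Normalization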
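The terms listed above could be assembled roughly as in the sketch below. It rests on several assumptions: the decorrelation term is written as the sum of squared off-diagonal entries of the batch covariance, weight decay is a plain squared norm, the weighting coefficients in `lam` are hypothetical, and the Batch Normalization scale penalty is passed in as a precomputed number because these notes do not state its form.

```python
import numpy as np

def l2(a, b):
    """Mean squared Euclidean distance between matching rows of a and b."""
    return float(np.mean(np.sum((a - b) ** 2, axis=1)))

def decorrelation_penalty(h):
    """Sum of squared off-diagonal entries of the batch covariance of h
    (an assumed form of the decorrelation regularizer)."""
    centered = h - h.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / h.shape[0]
    off_diag = cov - np.diag(np.diag(cov))
    return float(np.sum(off_diag ** 2))

def two_way_loss(x, y, x_hat, y_hat, h_x, h_y, weights, bn_scale_penalty, lam):
    terms = {
        "recon_x": l2(x_hat, x),   # text -> image output vs. image feature
        "recon_y": l2(y_hat, y),   # image -> text output vs. text feature
        "hidden": l2(h_x, h_y),    # matching intermediate layers
        "decorrelation": decorrelation_penalty(h_x) + decorrelation_penalty(h_y),
        "weight_decay": sum(float(np.sum(W ** 2)) for W in weights),
        "bn_scale": float(bn_scale_penalty),  # form not given in these notes
    }
    total = terms["recon_x"] + terms["recon_y"] + sum(lam[k] * terms[k] for k in lam)
    return total, terms

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 8))
y = rng.normal(size=(6, 12))
x_hat, y_hat = x + 0.1, y - 0.1        # pretend outputs, off by 0.1 everywhere
h_x = rng.normal(size=(6, 5))
h_y = h_x.copy()                        # identical hidden codes
weights = [np.ones((2, 2))]
lam = {"hidden": 1.0, "decorrelation": 0.1, "weight_decay": 0.01, "bn_scale": 0.01}
total, terms = two_way_loss(x, y, x_hat, y_hat, h_x, h_y, weights,
                            bn_scale_penalty=0.0, lam=lam)
```

Returning the individual terms next to the total makes it easy to see which part of the objective dominates during training.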

The final loss function is:
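Read together with the list of terms in this section, the objective can be sketched as a weighted sum. The symbol names and the weights λ below are notational assumptions made for this sketch, not a transcription of the paper's formula: L_x and L_y stand for the L2 losses at the two ends, L_H for the L2 loss at the intermediate layer, L_Dec for the decorrelation term, L_W for weight decay, and L_γ for the Batch Normalization scale regularizer.

```latex
L = L_x + L_y + \lambda_H L_H + \lambda_{Dec} L_{Dec} + \lambda_W L_W + \lambda_\gamma L_\gamma
```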

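IV. Summary

The method uses a bi-directional network structure and, unlike most approaches, a Euclidean loss.
It introduces a series of techniques and constraints.

See the paper for the exact formulas.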

Reposted from blog.csdn.net/qq_33373858/article/details/81509703
