High-level overview: 一文读懂注意力机制 - 知乎 (zhihu.com)
Self-attention revolves around keys, values, and queries; the self-attention mechanism has its own scheme for choosing them — all three are derived from the same input sequence.
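A minimal sketch of that selection, assuming one input sequence X of n tokens with model width d_model (the sizes and weight names here are made up for illustration):

import torch

n, d_model = 4, 8                      # hypothetical sizes: 4 tokens, width 8
X = torch.randn(n, d_model)            # the input sequence
W_q = torch.randn(d_model, d_model)    # learned projections (random here, trained in practice)
W_k = torch.randn(d_model, d_model)
W_v = torch.randn(d_model, d_model)
Q, K, V = X @ W_q, X @ W_k, X @ W_v    # queries, keys, and values all come from the same X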
Detailed understanding: see [1–5]
Why the Self-Attention Layer appeared
To solve the problem that architectures commonly used for sequential data, such as RNNs and LSTMs, cannot be parallelized and accelerated on a GPU.
Transformer — before covering this, we first need to cover multi-head attention. The Transformer architecture is actually the same as the Seq2Seq framework discussed earlier, except that the RNN inside Seq2Seq is replaced by Transformer blocks (just like the residual blocks in ResNet-50).
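As a rough sketch of such a block (the hyperparameters d_model=512, n_heads=8, d_ff=2048 follow the paper's base setting; the class name is mine, and it uses the paper's post-norm residual layout):

import torch.nn as nn

class TransformerBlock(nn.Module):
    # one encoder block: self-attention + FFN, each wrapped in residual + LayerNorm
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):                            # x: (batch, seq, d_model)
        x = self.norm1(x + self.attn(x, x, x)[0])    # residual connection around attention
        return self.norm2(x + self.ffn(x))           # residual connection around the FFN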
Transformer论文逐段精读【论文精读】 - 哔哩哔哩 // good notes
Attention Is All You Need - 夏末的初雪的博客 - CSDN博客
Attention Is All You Need
Multi-head attention
Self-Attention
Layer Norm
Batch Norm
Every column is a feature; batch norm normalizes each column (each feature) across the batch — the blue rectangle in the figure. Layer norm instead normalizes within each sample, across its features, which is why it copes better with varying sample lengths.
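A small sketch of the difference in normalization axes (the (2, 5, 8) shape is arbitrary):

import torch

x = torch.randn(2, 5, 8)                             # (batch, sequence length, features)
# BatchNorm: statistics per feature, taken across the batch (the "column" / blue rectangle)
bn = torch.nn.BatchNorm1d(8)(x.transpose(1, 2)).transpose(1, 2)  # BatchNorm1d wants (N, C, L)
# LayerNorm: statistics per token, taken across its 8 features — independent of other samples
ln = torch.nn.LayerNorm(8)(x)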
Decoder
LayerNorm
Masked attention: the decoder generates from left to right, so each position is only allowed to attend to the positions before it.
The remaining sublayers work the same way as in the encoder.
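A sketch of that left-to-right (causal) mask applied to the attention scores, with made-up numbers:

import torch

n = 5
mask = torch.triu(torch.ones(n, n, dtype=torch.bool), diagonal=1)  # True above the diagonal = future
scores = torch.randn(n, n)                           # stand-in for the raw scores Q Kᵀ / √d_k
scores = scores.masked_fill(mask, float('-inf'))     # future positions get -inf ...
weights = scores.softmax(dim=-1)                     # ... so softmax assigns them zero weight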
Scaled Dot-Product Attention
The query matrix has n rows (one per query), each of dimension d_k.
Keys have dimension d_k as well, so queries and keys can be dotted together.
Really impressive: parallel computation saves time and improves efficiency.
Masking is optional (only needed on the decoder side).
Concretely speaking:
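The formula from the paper is Attention(Q, K, V) = softmax(Q Kᵀ / √d_k) V; a direct sketch of it (the function name is mine):

import math
import torch

def scaled_dot_product_attention(Q, K, V, mask=None):
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)     # (n, m) score matrix, one row per query
    if mask is not None:
        scores = scores.masked_fill(mask, float('-inf'))  # the optional masking step
    return scores.softmax(dim=-1) @ V                     # weighted sum over the m value rows

The √d_k scaling keeps the dot products from growing with the dimension, which would otherwise push the softmax into regions with tiny gradients.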
FFN: the position-wise feed-forward network.
For semantic representation: it has weights to learn, giving better expressive power, and is applied identically at every position.
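In the paper this is FFN(x) = max(0, x·W₁ + b₁)·W₂ + b₂, with d_model=512 and inner width d_ff=2048; a sketch:

import torch.nn as nn

ffn = nn.Sequential(        # applied to each position independently and identically
    nn.Linear(512, 2048),   # expand: d_model -> d_ff
    nn.ReLU(),              # max(0, ·)
    nn.Linear(2048, 512),   # project back: d_ff -> d_model
)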
Positional embedding: attention itself has no notion of position, so we need positional encodings added to the input.
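A sketch of the paper's sinusoidal encoding, PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)) (the function name is mine; it assumes an even d_model):

import math
import torch

def positional_encoding(max_len, d_model):
    pos = torch.arange(max_len, dtype=torch.float).unsqueeze(1)   # (max_len, 1)
    div = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float)
                    * (-math.log(10000.0) / d_model))             # 1 / 10000^(2i/d_model)
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(pos * div)   # even dimensions
    pe[:, 1::2] = torch.cos(pos * div)   # odd dimensions
    return pe                            # added elementwise to the input embeddings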
Better details for the rest:
Transformer论文逐段精读【论文精读】 - 哔哩哔哩, at §3.3 Position-wise Feed-Forward Networks. Author: BeBraveBeCurious, https://www.bilibili.com/read/cv13759416 (source: bilibili)
Forgot to save the notes… will fill them in later.
References
[2] 64 注意力机制【动手学深度学习v2】_哔哩哔哩_bilibili
[3] 65 注意力分数【动手学深度学习v2】_哔哩哔哩_bilibili
[4] 66 使用注意力机制的seq2seq【动手学深度学习v2】_哔哩哔哩_bilibili
[5] 67 自注意力【动手学深度学习v2】_哔哩哔哩_bilibili
[1] 一文读懂注意力机制 - 知乎 (zhihu.com) // summary article
Pytorch 图像处理中注意力机制的代码详解与应用(Bubbliiiing 深度学习 教程)_哔哩哔哩_bilibili