Attention

General • 125 methods

Attention is a technique for selectively focusing on different parts of an input sequence in order to capture long-range dependencies. In NLP, traditional sequence-to-sequence models compress the entire input sequence into a single fixed-length context vector, which limits their ability to handle long inputs such as lengthy sentences. Attention instead creates shortcuts between the context vector and every position of the source input. Below you will find a continuously updated list of attention-based building blocks used in deep learning.
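To make the idea concrete, here is a minimal sketch of scaled dot-product attention in NumPy (illustrative only, not any specific method from the list below; all names are chosen for this example): each query position scores its similarity against every source position, and the output context is a weighted sum over the whole source rather than a single fixed-length summary.

```python
import numpy as np

def scaled_dot_product_attention(query, key, value):
    """Scaled dot-product attention over a source sequence.

    query: (n_queries, d_k), key: (n_source, d_k), value: (n_source, d_v).
    Returns the context matrix (n_queries, d_v) and the attention weights,
    so every output position can draw directly on every source position.
    """
    d_k = query.shape[-1]
    # Similarity of each query to each source position, scaled by sqrt(d_k).
    scores = query @ key.T / np.sqrt(d_k)
    # Softmax over source positions gives the attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Context vectors: weighted sums of the source value vectors.
    context = weights @ value
    return context, weights

# Toy example: 2 query positions attending over a 4-step source sequence.
rng = np.random.default_rng(0)
q = rng.normal(size=(2, 8))
k = rng.normal(size=(4, 8))
v = rng.normal(size=(4, 8))
ctx, attn = scaled_dot_product_attention(q, k, v)
print(ctx.shape, attn.shape)  # (2, 8) (2, 4); each row of attn sums to 1
```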

(Method table: Method, Year, Papers; each attention method is listed with its introduction year and the number of papers using it. Method names were not recoverable in this extract.)