Modeling 3 - Attention (9/21/2021)
Content:
- Attention (a minimal code sketch follows this list)
- What Do We Attend To?
- Improvements to Attention
- Specialized Attention Varieties
- A Case Study: "Attention is All You Need"
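As a companion to the topics above, here is a minimal sketch of the basic attention mechanism: dot-product attention over a sequence of encoder states, in the spirit of Luong et al. (2015). The function name, tensor shapes, and PyTorch usage are illustrative assumptions, not the lecture's own sample code.

```python
# A minimal sketch of dot-product attention over encoder states
# (assumed shapes and names; for illustration only).
import torch
import torch.nn.functional as F

def dot_product_attention(query, encoder_states):
    """query: (batch, hidden); encoder_states: (batch, src_len, hidden)."""
    # One score per source position: dot product of query with each state.
    scores = torch.bmm(encoder_states, query.unsqueeze(2)).squeeze(2)      # (batch, src_len)
    # Normalize the scores into a distribution over source positions.
    weights = F.softmax(scores, dim=-1)                                    # (batch, src_len)
    # Context vector: attention-weighted sum of the encoder states.
    context = torch.bmm(weights.unsqueeze(1), encoder_states).squeeze(1)   # (batch, hidden)
    return context, weights

# Toy usage with random tensors.
q = torch.randn(2, 8)
H = torch.randn(2, 5, 8)
context, weights = dot_product_attention(q, H)
```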
Reading Material
- Required Reading: Neural Machine Translation and Sequence-to-Sequence Models, Chapter 8
- Required Reading: The Annotated Transformer
- Reference: Attentional NMT (Bahdanau et al. 2015)
- Reference: Effective Approaches to Attention (Luong et al. 2015)
- Reference: Copying Mechanism (Gu et al. 2016)
- Reference: Attention-based Bias (Arthur et al. 2016)
- Reference: Attending to Images (Xu et al. 2015)
- Reference: Attending to Speech (Chan et al. 2015)
- Reference: Hierarchical Attention (Yang et al. 2016)
- Reference: Attending to Multiple Sources (Zoph and Knight 2015)
- Reference: Different Multi-source Strategies (Libovicky and Helcl 2017)
- Reference: Multi-modal Attention (Huang et al. 2016)
- Reference: Self Attention (Cheng et al. 2016)
- Reference: Attention is All You Need (Vaswani et al. 2017)
- Reference: Slow Transformer Decoding (Zhang et al. 2018)
- Reference: Transformer+RNN Hybrid Models (Chen et al. 2018)
- Reference: Training Transformers on Small Data (Nguyen and Salazar 2019)
- Reference: Structural Biases in Attention (Cohn et al. 2015)
- Reference: Coverage Embedding Models (Mi et al. 2016)
- Reference: Interpretability w/ Hard Attention (Lei et al. 2016)
- Reference: Supervised Attention (Mi et al. 2016)
- Reference: Attention vs. Alignment (Koehn and Knowles 2017)
- Reference: Attention is not Explanation (Jain and Wallace 2019)
- Reference: Learning to Deceive w/ Attention (Pruthi et al. 2020)
- Reference: Monotonic Attention (Yu et al. 2016)
- Reference: Convolutional Attention (Allamanis et al. 2016)
- Reference: Fine-grained Attention (Choi et al. 2018)
Slides: Attention Slides
Sample Code: Attention Code Examples
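For the "Attention is All You Need" case study, the sketch below shows scaled dot-product self-attention, the core operation of Vaswani et al. (2017), simplified to a single head. The class name, shapes, and single-head simplification are assumptions made here for brevity; The Annotated Transformer (required reading above) gives the full multi-head implementation.

```python
# A hedged, single-head sketch of scaled dot-product self-attention
# (illustrative only; see The Annotated Transformer for the full model).
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention(nn.Module):
    def __init__(self, d_model):
        super().__init__()
        self.d_model = d_model
        # Learned projections for queries, keys, and values.
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)

    def forward(self, x, mask=None):
        """x: (batch, seq_len, d_model); mask: optional bool (batch, seq_len, seq_len)."""
        q, k, v = self.w_q(x), self.w_k(x), self.w_v(x)
        # Scaled dot-product scores between every pair of positions.
        scores = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(self.d_model)
        if mask is not None:
            scores = scores.masked_fill(~mask, float("-inf"))
        weights = F.softmax(scores, dim=-1)
        # Each position's output is a weighted mix of all positions' values.
        return torch.matmul(weights, v)

# Toy usage: a batch of 2 sequences of length 5 with d_model=16.
attn = SelfAttention(16)
out = attn(torch.randn(2, 5, 16))
```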