Modeling 3 - Attention (9/21/2021)
Content:
- Attention (a minimal code sketch follows this list)
- What Do We Attend To?
- Improvements to Attention
- Specialized Attention Varieties
- A Case Study: "Attention is All You Need"
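As a companion to the topics above, here is a minimal sketch of the basic attention mechanism: dot-product attention over a sequence of encoder states, in the spirit of Luong et al. (2015). The function name, tensor shapes, and PyTorch usage are illustrative assumptions, not the lecture's own sample code.

```python
# A minimal sketch of dot-product attention over encoder states
# (assumed shapes and names; for illustration only).
import torch
import torch.nn.functional as F

def dot_product_attention(query, encoder_states):
    """query: (batch, hidden); encoder_states: (batch, src_len, hidden)."""
    # One score per source position: dot product of query with each state.
    scores = torch.bmm(encoder_states, query.unsqueeze(2)).squeeze(2)      # (batch, src_len)
    # Normalize the scores into a distribution over source positions.
    weights = F.softmax(scores, dim=-1)                                    # (batch, src_len)
    # Context vector: attention-weighted sum of the encoder states.
    context = torch.bmm(weights.unsqueeze(1), encoder_states).squeeze(1)   # (batch, hidden)
    return context, weights

# Toy usage with random tensors.
q = torch.randn(2, 8)
H = torch.randn(2, 5, 8)
context, weights = dot_product_attention(q, H)
```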
Reading Material
- Required Reading: Neural Machine Translation and Sequence-to-Sequence Models, Chapter 8
- Required Reading: The Annotated Transformer
- Reference: Attentional NMT (Bahdanau et al. 2015)
- Reference: Effective Approaches to Attention (Luong et al. 2015)
- Reference: Copying Mechanism (Gu et al. 2016)
- Reference: Attention-based Bias (Arthur et al. 2016)
- Reference: Attending to Images (Xu et al. 2015)
- Reference: Attending to Speech (Chan et al. 2015)
- Reference: Hierarchical Attention (Yang et al. 2016)
- Reference: Attending to Multiple Sources (Zoph and Knight 2015)
- Reference: Different Multi-source Strategies (Libovicky and Helcl 2017)
- Reference: Multi-modal Attention (Huang et al. 2016)
- Reference: Self Attention (Cheng et al. 2016)
- Reference: Attention is All You Need (Vaswani et al. 2017)
- Reference: Slow Transformer Decoding (Zhang et al. 2018)
- Reference: Transformer+RNN Hybrid Models (Chen et al. 2018)
- Reference: Training Transformers on Small Data (Nguyen and Salazar 2019)
- Reference: Structural Biases in Attention (Cohn et al. 2015)
- Reference: Coverage Embedding Models (Mi et al. 2016)
- Reference: Interpretability w/ Hard Attention (Lei et al. 2016)
- Reference: Supervised Attention (Mi et al. 2016)
- Reference: Attention vs. Alignment (Koehn and Knowles 2017)
- Reference: Attention is not Explanation (Jain and Wallace 2019)
- Reference: Learning to Deceive w/ Attention (Pruthi et al. 2020)
- Reference: Monotonic Attention (Yu et al. 2016)
- Reference: Convolutional Attention (Allamanis et al. 2016)
- Reference: Fine-grained Attention (Choi et al. 2018)
Slides: Attention Slides
Sample Code: Attention Code Examples
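For the "Attention is All You Need" case study, the sketch below shows scaled dot-product self-attention, the core operation of Vaswani et al. (2017), simplified to a single head. The class name, shapes, and single-head simplification are assumptions made here for brevity; The Annotated Transformer (required reading above) gives the full multi-head implementation.

```python
# A hedged, single-head sketch of scaled dot-product self-attention
# (illustrative only; see The Annotated Transformer for the full model).
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention(nn.Module):
    def __init__(self, d_model):
        super().__init__()
        self.d_model = d_model
        # Learned projections for queries, keys, and values.
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)

    def forward(self, x, mask=None):
        """x: (batch, seq_len, d_model); mask: optional bool (batch, seq_len, seq_len)."""
        q, k, v = self.w_q(x), self.w_k(x), self.w_v(x)
        # Scaled dot-product scores between every pair of positions.
        scores = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(self.d_model)
        if mask is not None:
            scores = scores.masked_fill(~mask, float("-inf"))
        weights = F.softmax(scores, dim=-1)
        # Each position's output is a weighted mix of all positions' values.
        return torch.matmul(weights, v)

# Toy usage: a batch of 2 sequences of length 5 with d_model=16.
attn = SelfAttention(16)
out = attn(torch.randn(2, 5, 16))
```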