Representation 1 - Pre-training Methods (9/20/2022)
Content:
- Brief overview of multi-task learning
- Sentence embeddings (see the code sketch after this list)
- BERT and variants
- Other language modeling objectives
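The sketch below, in the spirit of Sentence-BERT (Reimers and Gurevych 2019), shows one common way to turn a pre-trained BERT into sentence embeddings by mean-pooling its token representations. It is a minimal illustration only: the Hugging Face transformers library, the bert-base-uncased checkpoint, and mean pooling are assumptions made for the example, not part of the course's sample code.

```python
# Minimal sketch: sentence embeddings from a pre-trained BERT via mean pooling.
# Model choice and pooling strategy are illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

sentences = [
    "Pre-training learns general-purpose representations.",
    "Fine-tuning adapts them to a downstream task.",
]
batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    hidden = model(**batch).last_hidden_state             # (batch, seq_len, dim)

# Mean-pool over real tokens only; the attention mask zeroes out padding.
mask = batch["attention_mask"].unsqueeze(-1).float()       # (batch, seq_len, 1)
embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # (batch, dim)

# Cosine similarity between the two sentence embeddings.
sim = torch.nn.functional.cosine_similarity(embeddings[0], embeddings[1], dim=0)
print(sim.item())
```

Mean pooling over non-padding tokens is only one choice; Sentence-BERT also compares [CLS] and max pooling, and fine-tunes on sentence-pair data so that the resulting embeddings are useful for similarity tasks.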
Reading Material
- Highly Recommended Reading: Illustrated BERT (Alammar 2018)
- Reference: Language Model Transfer (Dai and Le 2015)
- Reference: ELMo: Deep Contextualized Word Representations (Peters et al. 2018)
- Reference: Sentence BERT (Reimers and Gurevych 2019)
- Reference: BERT: Bidirectional Transformers (Devlin et al. 2018)
- Reference: RoBERTa: Robustly Optimized BERT (Liu et al. 2019)
- Reference: XLNet: Autoregressive Training w/ Permutation Objectives (Yang et al. 2019)
- Reference: ELECTRA: Pre-training Text Encoders as Discriminators (Clark et al. 2020)
- Reference: GPT-3 (Brown et al. 2020)
- Reference: PaLM (Chowdhery et al. 2022)
- Reference: Should we be Pre-training? (Dery et al. 2021)
- Reference: Automating Auxiliary Learning (Dery et al. 2022)
Slides: Pre-training Slides
Sample Code: Pre-training Code Examples
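To complement the linked code examples, here is a minimal sketch of the masked language modeling objective behind BERT-style pre-training, which also sets up the contrast with the other objectives in the readings (XLNet's permutation objective, ELECTRA's replaced-token detection). The Hugging Face transformers collator, the bert-base-uncased checkpoint, and the 15% masking rate are assumptions chosen for illustration, not details of the linked course code.

```python
# Minimal sketch: one masked language modeling (MLM) training step.
# Library, checkpoint, and masking rate are illustrative assumptions.
from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                          DataCollatorForLanguageModeling)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.train()  # enable dropout, as in actual pre-training
collator = DataCollatorForLanguageModeling(tokenizer, mlm=True, mlm_probability=0.15)

texts = [
    "Pre-trained representations transfer across many tasks.",
    "Masked language modeling predicts held-out tokens from context.",
]
examples = [tokenizer(t) for t in texts]

# The collator pads the batch, randomly masks ~15% of tokens, and builds labels
# (-100 at unmasked positions so they are ignored by the cross-entropy loss).
batch = collator(examples)

loss = model(**batch).loss   # cross-entropy averaged over the masked positions
loss.backward()              # an optimizer step would follow in real pre-training
print(loss.item())
```

In actual pre-training this step runs over a large unlabeled corpus with an optimizer and learning-rate schedule; the variants in the readings differ mainly in how the training signal is constructed (permuted factorization orders in XLNet, a discriminator over corrupted tokens in ELECTRA).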