Representation 1 - Pre-training Methods (9/23/2021)
Content:
- Simple overview of multi-task learning
- Sentence embeddings
- BERT and variants (a minimal masked-LM sketch follows this list)
- Other language modeling objectives
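As a quick orientation before the readings, here is a minimal sketch of BERT's masked language modeling objective: mask a token and have the pre-trained model predict it. This is not the course's official sample code; it assumes the Hugging Face `transformers` library, and the checkpoint name `bert-base-uncased` and the example sentence are arbitrary illustrative choices.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Load a pre-trained BERT checkpoint (checkpoint choice is illustrative).
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

# Masked LM objective: corrupt the input by masking a token,
# then train/evaluate the model on recovering the original token.
text = "Pre-training learns representations from [MASK] text."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # (1, seq_len, vocab_size)

# Locate the [MASK] position and take the highest-scoring vocabulary item.
mask_positions = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
predicted_ids = logits[0, mask_positions].argmax(dim=-1)
print(tokenizer.decode(predicted_ids))  # prints the model's guess for the masked word
```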
Reading Material:
- Highly Recommended Reading: Illustrated BERT (Alammar 2019)
- Reference: Eye Gaze + Summarization (Klerke et al. 2016)
- Reference: Word Representations (Turian et al. 2010)
- Reference: Language Model Transfer (Dai and Le 2015)
- Reference: Paraphrase Detection (Dolan and Brockett 2005)
- Reference: Semantic Relatedness (Marelli et al. 2014)
- Reference: Recognizing Textual Entailment (Dagan et al. 2006)
- Reference: Paraphrastic Sentence Embeddings (Wieting et al. 2015)
- Reference: Inference -> Generalization (Conneau et al. 2017)
- Reference: context2vec (Melamud et al. 2016)
- Reference: ELMo: Deep Contextualized Word Representations (Peters et al. 2018)
- Reference: Sentence-BERT (Reimers and Gurevych 2019)
- Reference: BERT: Bidirectional Transformers (Devlin et al. 2018)
- Reference: RoBERTa: Robustly Optimized BERT (Liu et al. 2019)
- Reference: XLNet: Autoregressive Training w/ Permutation Objectives (Yang et al. 2019)
- Reference: ELECTRA: Pre-training Text Encoders as Discriminators (Clark et al. 2020)
- Reference: Paraphrase ID -> Generalization (Wieting and Gimpel 2018)
- Reference: Comparison of Training Objectives (Zhang and Bowman 2018)
Slides: Pre-training Slides
Sample Code: Pre-training Code Examples
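Complementing the linked code examples, the sketch below shows one common way to turn a pre-trained BERT encoder into fixed-length sentence embeddings: mean-pool the final-layer token vectors (ignoring padding) and compare sentences by cosine similarity. This is a simplified stand-in for approaches like Sentence-BERT (Reimers and Gurevych 2019), again assuming Hugging Face `transformers`; the checkpoint name and example sentences are arbitrary.

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Checkpoint choice is illustrative; any BERT-style encoder works the same way.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

sentences = ["A man is playing a guitar.", "Someone is playing an instrument."]
batch = tokenizer(sentences, padding=True, return_tensors="pt")

with torch.no_grad():
    token_vecs = model(**batch).last_hidden_state  # (batch, seq_len, hidden)

# Mean-pool over real tokens only, using the attention mask to exclude padding.
mask = batch["attention_mask"].unsqueeze(-1).float()
sent_embs = (token_vecs * mask).sum(dim=1) / mask.sum(dim=1)

# Cosine similarity between the two sentence embeddings.
sim = torch.nn.functional.cosine_similarity(sent_embs[0], sent_embs[1], dim=0)
print(f"cosine similarity: {sim.item():.3f}")
```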