Language Modeling, Efficiency/Training Tricks (1/16/2020)
Content: (Concept Progress)
- Language Modeling (task-lm)
- Feed-forward Neural Network Language Models (see the model sketch after this list)
- Methods to Prevent Overfitting (reg-dropout, reg-stopping, reg-patience)
- Mini-batching (optim-sgd) (see the training-loop sketch after this list)
- Automatic Optimization: Automatic Minibatching and Code-level Optimization
- Other Optimizers (optim-momentum, optim-adagrad, optim-adadelta, optim-rmsprop, optim-adam)
- Measuring Language Model Performance: Accuracy, Likelihood, and Perplexity (see the evaluation sketch after this list)
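To make the feed-forward language model and dropout bullets above concrete, here is a minimal sketch of a Bengio-style (Bengio et al. 2003) model. PyTorch is assumed here; the class name, hyperparameters, and tensor shapes are illustrative assumptions, not the course's sample code.

```python
# Minimal sketch of a feed-forward (Bengio-style) neural LM with dropout.
# Framework (PyTorch), names, and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

class FeedForwardLM(nn.Module):
    def __init__(self, vocab_size, emb_dim=64, hidden_dim=128, context_size=2, dropout=0.5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.hidden = nn.Linear(context_size * emb_dim, hidden_dim)
        self.dropout = nn.Dropout(dropout)       # regularization (reg-dropout)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, context):                  # context: (batch, context_size) word ids
        emb = self.embed(context)                # (batch, context_size, emb_dim)
        emb = emb.view(context.size(0), -1)      # concatenate the context embeddings
        h = torch.tanh(self.hidden(emb))
        h = self.dropout(h)                      # active only in training mode
        return self.out(h)                       # unnormalized scores over the next word
```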
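The mini-batching and optimizer bullets can likewise be sketched as a training loop that processes a shuffled batch of examples per update and takes the optimizer as an argument, so SGD can be swapped for Adagrad, RMSprop, Adam, etc. in one line. Again a sketch under the same PyTorch assumption, not the course's reference code.

```python
# Minimal sketch of a mini-batched training loop with a swappable optimizer.
import torch
import torch.nn as nn

def train_epoch(model, optimizer, contexts, targets, batch_size=32):
    # contexts: (N, context_size) LongTensor of word ids; targets: (N,) LongTensor
    loss_fn = nn.CrossEntropyLoss()
    model.train()                                         # enable dropout during training
    perm = torch.randperm(contexts.size(0))               # shuffle examples each epoch
    total_loss = 0.0
    for start in range(0, contexts.size(0), batch_size):
        idx = perm[start:start + batch_size]              # indices of one mini-batch
        optimizer.zero_grad()
        logits = model(contexts[idx])                     # one forward pass over the whole batch
        loss = loss_fn(logits, targets[idx])
        loss.backward()
        optimizer.step()
        total_loss += loss.item() * idx.size(0)
    return total_loss / contexts.size(0)                  # average per-example loss

# Swapping optimizers is a one-line change:
# optimizer = torch.optim.SGD(model.parameters(), lr=0.1)     # optim-sgd
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)   # optim-adam
```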
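Finally, a sketch of the evaluation bullet: perplexity is the exponentiated average per-token negative log-likelihood on held-out data, and it also serves as the criterion for early stopping with patience (reg-stopping, reg-patience). The function names, patience value, and data layout are assumptions, and `train_epoch` refers to the sketch above.

```python
# Minimal sketch of perplexity evaluation and early stopping with patience.
import copy
import math
import torch
import torch.nn as nn

@torch.no_grad()
def perplexity(model, contexts, targets):
    model.eval()                                     # turn off dropout for evaluation
    nll = nn.CrossEntropyLoss(reduction="sum")(model(contexts), targets).item()
    return math.exp(nll / targets.size(0))           # PPL = exp(mean negative log-likelihood)

def train_with_early_stopping(model, optimizer, train_data, dev_data, patience=3, max_epochs=50):
    best_ppl, best_state, bad_epochs = float("inf"), None, 0
    for epoch in range(max_epochs):
        train_epoch(model, optimizer, *train_data)   # mini-batched epoch from the sketch above
        dev_ppl = perplexity(model, *dev_data)
        if dev_ppl < best_ppl:                       # improvement: remember the best weights
            best_ppl, best_state, bad_epochs = dev_ppl, copy.deepcopy(model.state_dict()), 0
        else:                                        # no improvement this epoch
            bad_epochs += 1
            if bad_epochs >= patience:               # stop after `patience` bad epochs
                break
    model.load_state_dict(best_state)                # restore the best checkpoint
    return best_ppl
```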
Reading Material
- Highly Recommended: Goldberg Book Chapters 8-9
- Reference: Goldberg Book Chapters 6-7 (because CS11-711 is a prerequisite, I will assume you know most of this already, but it might be worth browsing for terminology, etc.)
- Reference: Maximum entropy (log-linear) language models. (Rosenfeld 1996)
- Reference: A Neural Probabilistic Language Model. (Bengio et al. 2003, JMLR)
- Reference: An Overview of Gradient Descent Optimization Algorithms. (Ruder 2016)
- Reference: The Marginal Value of Adaptive Gradient Methods. (Wilson et al. 2017)
- Reference: Stronger Baselines for Neural MT. (Denkowski and Neubig 2017)
- Reference: Dropout. (Srivastava et al. 2014)
- Reference: Dropconnect. (Wan et al. 2013)
- Reference: Using the Output Embedding. (Press and Wolf 2016)
- Reference: Regularizing and Optimizing LSTM Language Models. (Merity et al. 2017)
Slides: LM Slides
Sample Code: LM Code Examples