Debugging Neural Nets and Interpretable Evaluation (3/9/2021)
Content (joint lecture w/ Pengfei Liu)
- Identifying problems
- Debugging training-time problems (see the sanity-check sketch below)
- Debugging test-time problems
- Interpretable evaluation
- Reference: T5: Larger Models are Better (Raffel et al. 2020)
- Reference: Scaling Laws for Neural Language Models (Kaplan et al. 2020)
- Reference: Train Large, Then Compress (Li et al. 2020)
- Reference: Highway Networks (Srivastava et al. 2015)
- Reference: Residual Connections (He et al. 2015), sketched in code below
- Reference: Rethinking Generalization (Zhang et al. 2017)
- Reference: Marginal Value of Adaptive Gradient Methods (Wilson et al. 2017)
- Reference: Adam w/ Learning Rate Decay (Denkowski and Neubig 2017), see the scheduler sketch below
- Reference: Dropout (Srivastava et al. 2014)
- Reference: Recurrent Dropout (Gal and Ghahramani 2015), see the mask-sharing sketch below
- Reference: Minibatch Creation Strategies (Morishita et al. 2017)
- Reference: Decoding Problems (Koehn and Knowles 2017)
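The training-time debugging topic pairs naturally with one concrete sanity check: before tuning anything else, confirm the model can overfit a single tiny batch. This is a generic debugging trick rather than the lecture's own example; the model, layer sizes, and synthetic data below are hypothetical stand-ins.

```python
import torch
import torch.nn as nn

# Sanity check: a healthy model/loss/optimizer setup should drive the loss
# to near zero on one small, fixed batch. If it cannot, the bug is in the
# model or training loop, not in the amount of data.
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 4))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(8, 16)         # one fixed minibatch (hypothetical sizes)
y = torch.randint(0, 4, (8,))  # fixed labels

for step in range(500):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

print(f"final loss: {loss.item():.4f}")  # should be close to 0.0
```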
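Residual connections (He et al. 2015) add a block's input back to its output, y = x + F(x), so gradients can flow unchanged through the identity path even when F is hard to optimize. A minimal sketch, with illustrative layer sizes:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Computes y = x + F(x); the identity path lets gradients bypass F."""
    def __init__(self, dim: int):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.f(x)

block = ResidualBlock(32)
out = block(torch.randn(4, 32))  # same shape in and out: (4, 32)
```

Highway networks (Srivastava et al. 2015) generalize the same idea with a learned gate that interpolates between x and F(x) instead of summing them directly.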
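For the Adam with learning-rate decay reference: Denkowski and Neubig (2017) find that plain Adam becomes a much stronger baseline when the learning rate is halved whenever the dev metric stops improving. One way to express that schedule in PyTorch is ReduceLROnPlateau; the model and the dev-loss values below are placeholders for a real training loop.

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 4)  # placeholder model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# Halve the learning rate whenever the dev loss stops improving.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=1)

# Stand-in dev losses (a real loop would compute these on held-out data);
# the repeated 1.2 values simulate a plateau that triggers the halving.
for dev_loss in [2.0, 1.5, 1.2, 1.2, 1.2, 1.19]:
    scheduler.step(dev_loss)
    print(optimizer.param_groups[0]["lr"])
```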
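For the recurrent dropout reference: Gal and Ghahramani's variant samples one dropout mask per sequence and reuses it at every timestep, rather than resampling per step as naive dropout would. A sketch of the mask-sharing idea on a toy RNN cell loop; the dimensions and the dropout site are illustrative (the paper also drops the input connections with a second shared mask).

```python
import torch
import torch.nn as nn

class VariationalDropoutRNN(nn.Module):
    """Toy RNN loop that samples one dropout mask per sequence and
    reuses it at every timestep (Gal & Ghahramani style)."""
    def __init__(self, input_dim: int, hidden_dim: int, p: float = 0.3):
        super().__init__()
        self.cell = nn.RNNCell(input_dim, hidden_dim)
        self.p = p
        self.hidden_dim = hidden_dim

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (seq, batch, input_dim)
        seq_len, batch, _ = x.shape
        h = x.new_zeros(batch, self.hidden_dim)
        if self.training:
            # One mask per sequence, scaled like inverted dropout.
            mask = (x.new_empty(batch, self.hidden_dim)
                      .bernoulli_(1 - self.p) / (1 - self.p))
        else:
            mask = 1.0
        for t in range(seq_len):
            h = self.cell(x[t], h * mask)  # the same mask every timestep
        return h

rnn = VariationalDropoutRNN(8, 16)
h_final = rnn(torch.randn(5, 4, 8))  # (seq=5, batch=4, dim=8) -> (4, 16)
```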
Slides: Debugging Slides
Slides: Interpretable Evaluation Slides
Video: Debugging/Interpretable Evaluation Video