Efficiency Tricks for Neural Nets (1/28/2020)
Content:
- Softmax Approximations: Negative Sampling, Hierarchical Softmax
- Parallel Training
- Tips for Training on GPUs
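As a concrete illustration of the first topic, here is a minimal NumPy sketch of the negative sampling objective (in the spirit of Goldberg and Levy 2014): rather than normalizing over the full vocabulary, the true context word is scored against k sampled "negative" words. All sizes, names, and the toy data below are illustrative assumptions, not code from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim, k = 1000, 50, 5  # toy sizes (assumptions)

# Input (center) and output (context) embedding tables.
V_in = rng.normal(scale=0.1, size=(vocab_size, dim))
V_out = rng.normal(scale=0.1, size=(vocab_size, dim))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def negative_sampling_loss(center, context, noise_dist):
    """Binary logistic loss over 1 positive and k sampled negatives."""
    negatives = rng.choice(vocab_size, size=k, p=noise_dist)
    v_c = V_in[center]
    pos = -np.log(sigmoid(V_out[context] @ v_c))
    neg = -np.log(sigmoid(-(V_out[negatives] @ v_c))).sum()
    return pos + neg  # O(k * dim) work instead of O(vocab_size * dim)

# Unigram noise distribution raised to the 3/4 power, as in word2vec.
counts = rng.integers(1, 100, size=vocab_size).astype(float)
noise = counts ** 0.75
noise /= noise.sum()

print(negative_sampling_loss(center=3, context=7, noise_dist=noise))
```

The cost per training example depends only on k and the embedding dimension, which is the point of the approximation.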
Reading Material:
- Highly Recommended Reading: Notes on Noise Contrastive Estimation and Negative Sampling (Dyer 2014)
- Reference: Importance Sampling (Bengio and Senécal 2003)
- Reference: Noise Contrastive Estimation (Mnih and Teh 2012)
- Reference: Negative Sampling (Goldberg and Levy 2014)
- Reference: Mini-batching Sampling-based Softmax Approximations (Zoph et al. 2015)
- Reference: Class-based Softmax (Goodman 2001)
- Reference: Hierarchical Softmax (Morin and Bengio 2005)
- Reference: Error Correcting Codes (Dietterich and Bakiri 1995)
- Reference: Binary Code Prediction for Language (Oda et al. 2017)
- Reference: Seq2seq w/ Continuous Outputs (Kumar and Tsvetkov 2019)
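Hierarchical softmax (Morin and Bengio 2005), listed above, can also be sketched in a few lines: place the vocabulary at the leaves of a binary tree and compute a word's probability as a product of binary decisions along its root-to-leaf path, replacing an O(|V|) softmax with O(log |V|) sigmoid evaluations. The complete-tree layout, sizes, and data below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim = 8, 4                 # a complete binary tree with 7 inner nodes
depth = int(np.log2(vocab_size))

# One parameter vector per inner node of the tree.
inner = rng.normal(scale=0.1, size=(vocab_size - 1, dim))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def log_prob(word, h):
    """log P(word | h) via binary decisions in an implicit complete tree."""
    node, lp = 0, 0.0
    for bit in format(word, f"0{depth}b"):
        s = inner[node] @ h
        # Go left with prob sigmoid(s), right with prob 1 - sigmoid(s) = sigmoid(-s).
        lp += np.log(sigmoid(s) if bit == "0" else sigmoid(-s))
        node = 2 * node + 1 + int(bit)  # heap-style child index
    return lp

h = rng.normal(size=dim)
probs = np.exp([log_prob(w, h) for w in range(vocab_size)])
print(probs.sum())  # sums to 1: the tree defines a valid distribution
```

Because sigmoid(s) + sigmoid(-s) = 1 at every inner node, the leaf probabilities sum to one without ever normalizing over the whole vocabulary.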
Slides (from 2019): Efficiency Slides
Sample Code: Efficiency Code Examples