Efficiency Tricks for Neural Nets (2/1/2018)
Content:
- Softmax Approximations: Negative Sampling, Hierarchical Softmax
- Parallel Training
- Tips for Training on GPUs
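Of the softmax approximations listed above, negative sampling is the simplest to illustrate. The sketch below is a hypothetical toy implementation (not the lecture's sample code): it replaces the full-vocabulary softmax with one positive and K sampled negative binary classifications, so the per-example cost scales with K rather than the vocabulary size. All names and sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def negative_sampling_loss(target_vec, context_vec, negative_vecs):
    """Score the true (context, target) pair high and K sampled pairs low.

    Cost is O(K * dim) instead of O(vocab * dim) for a full softmax.
    """
    pos = np.log(sigmoid(context_vec @ target_vec))
    neg = np.sum(np.log(sigmoid(-negative_vecs @ target_vec)))
    return -(pos + neg)  # negative log-likelihood; always positive

# Toy example: 8-dimensional embeddings, K=5 negatives from a noise distribution.
dim, k = 8, 5
target = rng.normal(size=dim)
context = rng.normal(size=dim)
negatives = rng.normal(size=(k, dim))

print(negative_sampling_loss(target, context, negatives))
```

In practice the negatives are drawn from a unigram noise distribution (often raised to the 3/4 power), and the gradients update only the K+1 rows touched, which is where the speedup comes from.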
Reading Material
- Highly Recommended Reading: Notes on Noise Contrastive Estimation and Negative Sampling (Dyer, 2014)
- Reference: Importance Sampling (Bengio and Senécal, 2003)
- Reference: Noise Contrastive Estimation (Mnih and Teh, 2012)
- Reference: Negative Sampling (Goldberg and Levy, 2014)
- Reference: Mini-batching Sampling-based Softmax Approximations (Zoph et al., 2015)
- Reference: Class-based Softmax (Goodman, 2001)
- Reference: Hierarchical Softmax (Morin and Bengio, 2005)
- Reference: Error Correcting Codes (Dietterich and Bakiri, 1995)
- Reference: Binary Code Prediction for Language (Oda et al., 2017)
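The class-based softmax referenced above (Goodman, 2001) factors the word distribution as P(w|h) = P(class(w)|h) · P(w|class(w), h), so only the class logits plus the in-class word logits need to be computed. Below is a hypothetical toy sketch of this factorization (an assumed illustration, not the paper's or the course's code); the vocabulary size, class assignment, and weight names are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes: 12 words split evenly into 3 classes, 4-dim hidden state.
vocab, n_classes, hidden = 12, 3, 4
word2class = np.repeat(np.arange(n_classes), vocab // n_classes)

W_class = rng.normal(size=(n_classes, hidden))  # class-score weights
W_word = rng.normal(size=(vocab, hidden))       # in-class word-score weights

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def word_prob(h, w):
    """P(w|h) = P(class(w)|h) * P(w|class(w), h)."""
    c = word2class[w]
    p_class = softmax(W_class @ h)[c]
    members = np.flatnonzero(word2class == c)   # words belonging to class c
    p_word = softmax(W_word[members] @ h)[list(members).index(w)]
    return p_class * p_word

h = rng.normal(size=hidden)
total = sum(word_prob(h, w) for w in range(vocab))
print(abs(total - 1.0) < 1e-9)  # the factorization is a valid distribution
```

With balanced classes this scores roughly n_classes + vocab/n_classes logits per word instead of vocab; hierarchical softmax (Morin and Bengio, 2005) pushes the same idea to a full binary tree, giving O(log vocab) per word.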
Slides: Efficiency Slides
Sample Code: Efficiency Code Examples