A Simple (?) Exercise: Predicting the Next Word in a Sentence (1/18/2018)
Content:
- Computational Graphs
- Feed-forward Neural Network Language Models
- Measuring Model Performance: Likelihood and Perplexity
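The topics above can be sketched concretely in a few lines: a Bengio-style feed-forward language model is a small computational graph (embedding lookup, concatenation, a tanh hidden layer, a softmax over the vocabulary), and perplexity is the exponentiated average negative log-likelihood per predicted token. The sketch below is a minimal NumPy illustration under assumed toy sizes; all names, dimensions, and weights are invented for the example and are not taken from the course's LM Code Examples.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not from the course code):
# vocabulary V, embedding dim d, hidden dim h, context of 2 previous words.
V, d, h = 10, 4, 8

# Parameters of a feed-forward LM: embedding table, hidden layer,
# and output (softmax) layer.
E = rng.normal(0, 0.1, (V, d))        # word embeddings
W1 = rng.normal(0, 0.1, (h, 2 * d))   # context -> hidden
b1 = np.zeros(h)
W2 = rng.normal(0, 0.1, (V, h))       # hidden -> vocabulary scores
b2 = np.zeros(V)

def next_word_probs(context):
    """Forward pass: concatenate embeddings, tanh hidden layer, softmax."""
    x = np.concatenate([E[w] for w in context])
    hid = np.tanh(W1 @ x + b1)
    scores = W2 @ hid + b2
    exp = np.exp(scores - scores.max())  # numerically stable softmax
    return exp / exp.sum()

def perplexity(log_probs):
    """exp of the average negative log-likelihood per predicted token."""
    return float(np.exp(-np.mean(log_probs)))

# Score a toy word-ID sequence with a 2-word context window.
sent = [1, 2, 3, 4, 5]
lps = [np.log(next_word_probs(sent[i - 2:i])[sent[i]])
       for i in range(2, len(sent))]
print(perplexity(lps))  # with random weights, close to uniform (~V)
```

With untrained random weights the model is near-uniform, so its perplexity is close to the vocabulary size; training would push it lower on in-domain text.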
Reading Material:
- Highly Recommended: Goldberg Book Chapters 8-9
- Reference: Goldberg Book Chapters 6-7 (because CS11-711 is a prerequisite, I will assume you already know most of this, but it may be worth browsing for terminology, etc.)
- Reference: A Maximum Entropy Approach to Adaptive Statistical Language Modeling. (Rosenfeld 1996)
- Reference: A Neural Probabilistic Language Model. (Bengio et al. 2003, JMLR)
- Reference: An Overview of Gradient Descent Optimization Algorithms. (Ruder 2016)
- Reference: The Marginal Value of Adaptive Gradient Methods in Machine Learning. (Wilson et al. 2017)
- Reference: Stronger Baselines for Trustable Results in Neural Machine Translation. (Denkowski and Neubig 2017)
- Reference: Using the Output Embedding to Improve Language Models. (Press and Wolf 2016)
Slides: LM Slides
Sample Code: LM Code Examples