CS 11-711: Advanced NLP

Learning 2 - Structured Learning Algorithms (11/16/2021)

Content:

Reinforcement Learning
Minimum Risk Training
The Structured Perceptron
Structured Max-margin Objectives
Simple Remedies to Exposure Bias

Required Reading: Deep Reinforcement Learning Tutorial (Karpathy 2016)
Reference: Goldberg Book Chapter 19-19.3
Reference: Course in Machine Learning Chapter 17 (Daume)
Reference: Reinforcement Learning Textbook (Sutton and Barto 2016)
Reference: Minimum Risk Training for NMT (Shen et al. 2015)
Reference: REINFORCE (Williams 1992)
Reference: Co-training (Blum and Mitchell 1998)
Reference: Revisiting Self-training (He et al. 2020)
Reference: Adding Baselines (Dayan 1990)
Reference: Sequence-level Training for RNNs (Ranzato et al. 2016)
Reference: Experience Replay (Lin 1993)
Reference: Neural Q Learning (Tesauro 1995)
Reference: Intrinsic Reward (Schmidhuber 1991)
Reference: Intrinsic Reward for Atari (Bellemare et al. 2016)
Reference: Reinforcement Learning for Dialog (Young et al. 2013)
Reference: End-to-end Neural Task-based Dialog (Williams and Zweig 2016)
Reference: Neural Chat Dialog (Li et al. 2016)
Reference: User Simulation for Learning in Dialog (Schatzmann et al. 2007)
Reference: RL for Mapping Instructions to Actions (Branavan et al. 2009)
Reference: Deep RL for Mapping Instructions to Actions (Misra et al. 2017)
Reference: RL for Text-based Games (Narasimhan et al. 2015)
Reference: Incremental Prediction in MT (Grissom et al. 2014)
Reference: Incremental Neural MT (Gu et al. 2017)
Reference: RL for Information Retrieval (Narasimhan et al. 2016)
Reference: RL for Query Reformulation (Nogueira and Cho 2017)
Reference: RL for Coarse-to-fine Question Answering (Choi et al. 2017)
Reference: RL for Learning Neural Network Structure (Zoph and Le 2016)
Reference: Conditional Random Fields (Lafferty et al. 2001)
Reference: Structured Perceptron (Collins 2002)
Reference: Structured Hinge Loss (Taskar et al. 2005)
Reference: SEARN (Daume et al. 2006)
Reference: DAgger (Ross et al. 2011)
Reference: Dynamic Oracles (Goldberg and Nivre 2013)
Reference: Training Neural Parsers w/ Dynamic Oracles (Ballesteros et al. 2016)
Reference: Word Dropout (Gal and Ghahramani 2015)
Reference: RAML (Norouzi et al. 2016)

Slides: Structured Prediction Slides
Sample Code: Structured Prediction Code Examples
