Minimum Risk Training and Reinforcement Learning (3/3/2020)
- Error, Risk, and Minimum Risk Training
- What is Reinforcement Learning?
- Policy Gradient and REINFORCE
- Stabilizing Reinforcement Learning
- Value-based Reinforcement Learning
- Required Reading (for quiz): Deep Reinforcement Learning Tutorial (Karpathy 2016)
- Other Useful Reading: Reinforcement Learning Textbook (Sutton and Barto 2016)
- Reference: Minimum Risk Training for NMT (Shen et al. 2015)
- Reference: REINFORCE (Williams 1992)
- Reference: Co-training (Blum and Mitchell 1998)
- Reference: Revisiting Self-training (He et al. 2020)
- Reference: Adding Baselines (Dayan 1990)
- Reference: Sequence-level Training for RNNs (Ranzato et al. 2016)
- Reference: Experience Replay (Lin 1993)
- Reference: Neural Q Learning (Tesauro 1995)
- Reference: Intrinsic Reward (Schmidhuber 1991)
- Reference: Intrinsic Reward for Atari (Bellemare et al. 2016)
- Reference: Reinforcement Learning for Dialog (Young et al. 2013)
- Reference: End-to-end Neural Task-based Dialog (Williams and Zweig 2016)
- Reference: Neural Chat Dialog (Li et al. 2016)
- Reference: User Simulation for Learning in Dialog (Schatzmann et al. 2007)
- Reference: RL for Mapping Instructions to Actions (Branavan et al. 2009)
- Reference: Deep RL for Mapping Instructions to Actions (Misra et al. 2017)
- Reference: RL for Text-based Games (Narasimhan et al. 2015)
- Reference: Incremental Prediction in MT (Grissom et al. 2014)
- Reference: Incremental Neural MT (Gu et al. 2017)
- Reference: RL for Information Retrieval (Narasimhan et al. 2016)
- Reference: RL for Query Reformulation (Nogueira and Cho 2017)
- Reference: RL for Coarse-to-fine Question Answering (Choi et al. 2017)
- Reference: RL for Learning Neural Network Structure (Zoph and Le 2016)
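To make the first topic above concrete, here is a minimal sketch of the minimum risk objective in the style of Shen et al. (2015): the expected cost of sampled hypotheses under a distribution renormalized over the sample set, with a sharpness hyperparameter alpha. The function name, the toy scores, and the costs are illustrative assumptions, not the paper's implementation.

```python
import math

def mrt_risk(scores, costs, alpha=1.0):
    """Expected risk over a set of sampled hypotheses.

    scores: model log-probabilities of each sampled hypothesis
    costs:  per-hypothesis error, e.g. 1 - sentence BLEU (illustrative)
    alpha:  sharpness of the renormalized distribution
    """
    # Renormalize the (sharpened) model scores over the sample set only.
    weights = [math.exp(alpha * s) for s in scores]
    z = sum(weights)
    q = [w / z for w in weights]
    # Risk = sum_y q(y|x) * cost(y, y*); minimizing it by gradient descent
    # pushes probability mass toward low-cost hypotheses.
    return sum(qi * ci for qi, ci in zip(q, costs))

# Toy example: three sampled translations with log-probs and costs.
risk = mrt_risk(scores=[-1.0, -2.0, -3.0], costs=[0.2, 0.5, 0.9])
```

The risk always lies between the smallest and largest cost in the sample, and approaches the cost of the model's favorite hypothesis as alpha grows.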
Slides: Minimum Risk and Reinforcement Learning Slides
Sample Code: Reinforcement Learning Code Examples
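As a small companion to the code examples linked above, the following is a self-contained sketch of REINFORCE (Williams 1992) with a running-mean baseline (one of the stabilization tricks from the outline), applied to a hypothetical two-armed bandit rather than a sequence task. The task, hyperparameters, and function names are illustrative assumptions.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def reinforce(steps=2000, lr=0.1, seed=0):
    """Train a softmax policy on a toy 2-armed bandit with REINFORCE."""
    rng = random.Random(seed)
    logits = [0.0, 0.0]   # policy parameters
    baseline = 0.0        # running-mean baseline to reduce gradient variance
    for _ in range(steps):
        probs = softmax(logits)
        # Sample an action from the current policy.
        a = 0 if rng.random() < probs[0] else 1
        reward = 1.0 if a == 1 else 0.0   # arm 1 is always better
        # Update the baseline toward the mean reward, then form the advantage.
        baseline += 0.05 * (reward - baseline)
        advantage = reward - baseline
        # Gradient of log softmax: indicator(i == a) - probs[i].
        for i in range(2):
            grad = (1.0 if i == a else 0.0) - probs[i]
            logits[i] += lr * advantage * grad
    return softmax(logits)

final_probs = reinforce()
```

After training, the policy should put most of its probability on the higher-reward arm; without the baseline, the updates are noisier but the gradient estimator is still unbiased.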