Data-driven Strategies for NMT (2/8/2022)
Lecture: (by Graham Neubig)
- Data augmentation strategies
 
Language in 10: Bengali
Slides: MT Data Augmentation Slides 
Discussion: Read one of the cited papers on data augmentation
- Reference: Data Augmentation for Low-Resource Neural Machine Translation (Fadaee et al. 2017)
- Reference: Handling Syntactic Divergence in Low-resource Machine Translation (Zhou et al. 2019)
- Reference: Generalized Data Augmentation for Low-resource Translation (Xia et al. 2019)
 
References:
- Tool: GIZA++
- Tool: fast_align
- Tool: awesome-align
- Reference: Generalized Data Augmentation for Low-resource Translation (Xia et al. 2019)
- Reference: Improving Neural Machine Translation Models with Monolingual Data (Sennrich et al. 2016)
- Reference: Understanding Back-Translation at Scale (Edunov et al. 2018)
- Reference: Iterative Back-Translation for Neural Machine Translation (Hoang et al. 2018)
- Reference: Meta Back-Translation (Pham et al. 2021)
- Reference: Copied Monolingual Data Improves Low-Resource Neural Machine Translation (Currey et al. 2017)
- Reference: Data Augmentation for Low-Resource Neural Machine Translation (Fadaee et al. 2017)
- Reference: Unsupervised Machine Translation Using Monolingual Corpora Only (Lample et al. 2018)
- Reference: Handling Syntactic Divergence in Low-resource Machine Translation (Zhou et al. 2019)
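A minimal sketch of how one of the alignment tools above (fast_align) might be run from Python for augmentation methods that need word alignments, such as the rare-word substitution of Fadaee et al. (2017). It assumes the fast_align binary from the clab/fast_align repository is built and on PATH; all file names and example sentences are illustrative.

```python
# Sketch: obtain word alignments with fast_align for a data augmentation pipeline.
# Assumes the fast_align binary (github.com/clab/fast_align) is on PATH;
# file names and sentences below are illustrative only.
import subprocess

def write_fast_align_input(src_sents, tgt_sents, path):
    """fast_align expects one sentence pair per line: 'source ||| target'."""
    with open(path, "w", encoding="utf-8") as f:
        for s, t in zip(src_sents, tgt_sents):
            f.write(f"{s} ||| {t}\n")

def run_fast_align(pair_path, out_path):
    """Run fast_align with its commonly recommended flags (-d -o -v).
    The output is one line per sentence pair in Pharaoh format, e.g. '0-0 1-2 2-1'."""
    with open(out_path, "w", encoding="utf-8") as out:
        subprocess.run(
            ["fast_align", "-i", pair_path, "-d", "-o", "-v"],
            stdout=out,
            check=True,
        )

if __name__ == "__main__":
    src = ["das Haus ist klein"]
    tgt = ["the house is small"]
    write_fast_align_input(src, tgt, "train.de-en")
    run_fast_align("train.de-en", "forward.align")
```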
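And a minimal sketch of the back-translation recipe from the Sennrich et al. (2016) and Edunov et al. (2018) references above: translate target-side monolingual text into the source language with a reverse model, then mix the synthetic pairs into the real bitext. The `backward_translate` callable is a stand-in for a trained target-to-source model, not any particular toolkit's API.

```python
# Sketch of back-translation (Sennrich et al. 2016): target-side monolingual
# sentences are translated into the source language by a reverse (target->source)
# model, and the resulting synthetic pairs are added to the real parallel data.
from typing import Callable, List, Tuple

def back_translate(
    mono_tgt: List[str],
    backward_translate: Callable[[str], str],  # stand-in for a trained tgt->src model
) -> List[Tuple[str, str]]:
    """Return synthetic (source, target) pairs; the target side is genuine text."""
    return [(backward_translate(t), t) for t in mono_tgt]

def build_training_data(
    parallel: List[Tuple[str, str]],
    mono_tgt: List[str],
    backward_translate: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Concatenate real bitext with synthetic pairs. Upsampling or tagging the
    synthetic portion (as studied in Edunov et al. 2018) is omitted here."""
    return parallel + back_translate(mono_tgt, backward_translate)

if __name__ == "__main__":
    # Dummy reverse "model" only so the sketch runs end to end.
    dummy_reverse = lambda t: f"<synthetic source for: {t}>"
    real = [("ein kleines Haus", "a small house")]
    mono = ["the weather is nice today"]
    print(build_training_data(real, mono, dummy_reverse))
```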