Data-driven Strategies for NMT (2/8/2022)
Lecture (by Graham Neubig):
- Data augmentation strategies
Language in 10: Bengali
Slides: MT Data Augmentation Slides
Discussion: Read one of the cited papers on data augmentation
- Reference: Data Augmentation for Low-Resource Neural Machine Translation (Fadaee et al. 2017)
- Reference: Handling Syntactic Divergence in Low-resource Machine Translation (Zhou et al. 2019)
- Reference: Generalized Data Augmentation for Low-resource Translation (Xia et al. 2019)
References:
- Tool: GIZA++
- Tool: fast_align
- Tool: awesome-align
- Reference: Generalized Data Augmentation for Low-resource Translation (Xia et al. 2019)
- Reference: Improving Neural Machine Translation Models with Monolingual Data (Sennrich et al. 2016)
- Reference: Understanding Back-Translation at Scale (Edunov et al. 2018)
- Reference: Iterative Back-Translation for Neural Machine Translation (Hoang et al. 2018)
- Reference: Meta Back-translation (Pham et al. 2021)
- Reference: Copied Monolingual Data Improves Low-Resource Neural Machine Translation (Currey et al. 2017)
- Reference: Data Augmentation for Low-Resource Neural Machine Translation (Fadaee et al. 2017)
- Reference: Unsupervised Machine Translation Using Monolingual Corpora Only (Lample et al. 2018)
- Reference: Handling Syntactic Divergence in Low-resource Machine Translation (Zhou et al. 2019)
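Several of the references above build on back-translation (Sennrich et al. 2016; Edunov et al. 2018; Hoang et al. 2018): target-side monolingual text is translated into the source language by a reverse model, and the synthetic source sentences are paired with the real targets to augment the parallel training data. A minimal sketch of the pipeline, where `reverse_model` is a placeholder for a trained target-to-source translation system:

```python
def back_translate(monolingual_tgt, reverse_model):
    """Create synthetic (source, target) pairs from target-side monolingual text.

    reverse_model: a callable mapping a target sentence to a source-language
    hypothesis (in practice, a trained tgt->src NMT system; here a stub).
    """
    synthetic_pairs = []
    for tgt_sentence in monolingual_tgt:
        src_hypothesis = reverse_model(tgt_sentence)  # tgt -> src translation
        # Pair the (possibly noisy) synthetic source with the clean real target.
        synthetic_pairs.append((src_hypothesis, tgt_sentence))
    return synthetic_pairs


def augment(parallel_data, monolingual_tgt, reverse_model):
    """Mix real parallel pairs with synthetic back-translated pairs."""
    return parallel_data + back_translate(monolingual_tgt, reverse_model)
```

For illustration only, a toy "reverse model" that reverses word order stands in for a real translation system:

```python
toy_reverse = lambda s: " ".join(reversed(s.split()))
augment([("a house", "ein Haus")], ["das Buch"], toy_reverse)
# one real pair plus one synthetic pair whose target side is real
```

The key property, emphasized in Sennrich et al. (2016), is that the target side of every synthetic pair is genuine monolingual text, so the decoder always trains on clean output even when the synthetic source is noisy.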