Machine Translation and Sequence to Sequence Models

CS 11-731
Language Technologies Institute, School of Computer Science
Carnegie Mellon University
Tuesday/Thursday 1:30-2:50PM, GHC4102


Instructor: Graham Neubig
  Office hours: Monday 4:00-5:00PM (GHC5409)
TAs: Qinlan Shen and Dongyeop Kang
  Office hours: Tuesday 3:00-4:00PM (Dongyeop@GHC5713), Wednesday 11:00AM-12:00PM (Qinlan@GHC6405)
Questions and Discussion: Ideally in class or through Piazza, so that information is shared with the whole class, but email and office hours are also OK.

Course Description

Machine Translation and Sequence-to-Sequence Models is an introductory graduate-level course surveying the primary approaches and methods for developing systems to translate between human languages, or other sequential data. The main objective of the course is to obtain basic understanding and implementation skills for modern methods for MT and sequence transduction, including how to design models, how to learn the model parameters, how to search for the best output, and how to create training data. The course will focus on machine translation, but also briefly cover tasks such as dialog response generation, image caption generation, and others.

Pre-requisites: This course has no official pre-requisites, although 11-711 "Algorithms for NLP" or 10-701 "Machine Learning" would be helpful.

Course format: Classes will take the following format:

Discussion: In the second half of the class, we will have a paper discussion period on Thursday of every week. Before the class, everyone will be expected to read one of three papers and turn in answers to some brief open-ended questions in place of the quiz. During the discussion, everyone will be expected to make at least one critical comment about the paper: Was there anything that you think should have been done better, or examined more? Do you have any ideas for ways to expand the method? Please keep these questions in mind while reading the paper.

Grading: Grades will be based on the small quizzes, the assignments, and a course project. The in-class implementation exercises during the first half will be turned in as two small-ish assignments. During the second half, your implementation assignments will focus on completing a larger final class project, which will be expected to be novel and interesting (and potentially useful or publishable).

Things to Prepare: Please bring a computer to every class, and make sure you have an account on GitHub. If either of these poses a problem, please consult with the instructors on the first day of class.


Assignment Policies: Assignments and the final project may be done in groups of 1-3. If you work in a group of more than one person, please use a shared git repository and commit the code that you write, and note in your reports who did which part of the project. Assignments done in groups of 2-3 will be expected to be 2-3 times as significant as assignments done by one person.

Send the TAs your names, Andrew IDs, and the link to a GitHub repository containing code, output for the "test" and "blind" sets, and a report of 2-4 pages. The report should be named "report.pdf". The output should be tokenized and lowercased, and stored in the "output/" directory. Your primary results should be named "output/test.primary.en" and "output/blind.primary.en", and should use only the provided IWSLT data for training the system. You can also submit additional outputs that use other methods, or that use resources other than the IWSLT data listed below. In this case, replace "primary" with an arbitrary string, and we will calculate results for these as well.
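Before sending the link, it may help to check the repository layout yourself. The following is a minimal sketch of such a check, not an official course tool: the function names are hypothetical, and it only looks at the file names and the lowercasing requirement described above (it does not check tokenization or the report length).

```python
from pathlib import Path

# Required paths, taken from the submission instructions above.
REQUIRED = ["report.pdf", "output/test.primary.en", "output/blind.primary.en"]


def missing_files(repo_root):
    """Return the required paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in REQUIRED if not (root / rel).is_file()]


def non_lowercased_lines(path):
    """Return 1-based numbers of lines that contain uppercase characters."""
    with open(path, encoding="utf-8") as f:
        return [i for i, line in enumerate(f, start=1) if line != line.lower()]


def extra_systems(repo_root):
    """Return the names of additional (non-primary) systems in output/."""
    out_dir = Path(repo_root) / "output"
    names = set()
    for p in out_dir.glob("test.*.en"):
        # "test.NAME.en" -> "NAME"
        name = p.name[len("test."):-len(".en")]
        if name != "primary":
            names.add(name)
    return sorted(names)


if __name__ == "__main__":
    # Run from the top level of the repository to be submitted.
    print("missing:", missing_files("."))
    print("additional systems:", extra_systems("."))
```

An empty "missing" list means the three required files are in place; any line numbers reported by non_lowercased_lines point at output lines that still need lowercasing.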

All assignments are expected to be conducted under the CMU policy for academic integrity. All rules here apply and violations will be subject to zero credit on the assignment, or other disciplinary measures. In particular, while you may base your implementation on the pseudo-code provided by the TAs or instructor, copying code of other students in the class who are not part of your assignment group is (obviously) not allowed.

Training Data

Course Schedule

1/17 Introduction (Updated 1/18, 1:50PM), Statistical Machine Translation Overview  
1/19 Language Models 1: n-gram Language Models (Updated 1/18, 1:50PM)  
1/24 Language Models 2: Log-linear Language Models (Updated slightly 1/23, 11:00AM)  
1/26 Language Models 3: Feed-forward Neural LMs (Updated 1/26, 9:45AM)  
1/31 Guest Lecture by Robert Frederking: Rule-based Machine Translation  
2/2 Language Models 4: Recurrent Neural Network LMs  
2/7 Neural MT 1: Encoder-Decoder Models  
2/9 Neural MT 2: Attentional Models Assignment 1
2/14 Evaluation of Generated Output Example of Assessments and Automatic Scores, Data
2/16 Project Discussion Day Prepare to answer two questions:
  • (Required) What kind of things are you interested in pursuing for the project?
  • (Optional) If you have any specific ideas, please share them.
2/21 Symbolic MT 1: IBM Models  
2/23 Symbolic MT 2: Weighted Finite State Transducers (Updated 2/21, 8:30PM)  
2/28 Symbolic MT 3: Phrase-based Machine Translation (Updated algorithm slightly 2/28, 6:00AM)  
3/2 Symbolic MT 4: Tree-based Machine Translation Assignment 2: Create a Symbolic Model (planned due date: 3/20)
3/7,3/9 Other Sequence-to-Sequence Tasks Paper Reading Candidates:
3/14,3/16 Spring Break, No Class  
3/21,3/23 Advanced Topics 1: Parameter Optimization Paper Reading Candidates:
3/28,3/30 Advanced Topics 2: Hybrid Neural/Symbolic Models Paper Reading Candidates:
4/4,4/6 Advanced Topics 3: Subword Models  
4/11,4/13 Advanced Topics 4: Multilingual and Multi-task Learning  
4/18 Guest Lecture by LP Morency  
4/20 School Break, No Class  
4/25,4/27 Advanced Topics 5: Ensembling/System Combination  
5/2,5/4 Final Project Discussion