Course Description

CS 11-731
Language Technologies Institute, School of Computer Science
Carnegie Mellon University
Tuesday/Thursday 1:30-2:50PM, Posner Hall 151


Instructor: Graham Neubig
  Office hours: Monday 4:00-5:00PM (GHC5409)
TAs:
  Junjie Hu (Thursday 10:00-11:00AM, GHC5503)
  John Wieting (Tuesday 11:30AM-12:30PM, GHC5417)
  Pengcheng Yin (Wednesday 11:30AM-12:30PM, GHC5505)
Questions and Discussion: Ideally in class or through Piazza so that information can be shared with the whole class, but email and office hours are also OK.

Course Description

Machine Translation and Sequence-to-Sequence Models is an introductory graduate-level course surveying the primary approaches and methods for developing systems that translate between human languages or transduce other sequential data. The main objective of the course is to gain a basic understanding of, and implementation skills for, modern methods for MT and sequence transduction, including how to design models, how to learn the model parameters, how to search for the best output, and how to create training data. The course will focus on machine translation, but will also briefly cover tasks such as dialog response generation, image caption generation, and others.

Pre-requisites: This course has no official pre-requisites, although 11-711 "Algorithms for NLP" or 10-701 "Machine Learning" would be helpful.

Class format: As the class aims to provide the practical skills necessary to implement cutting-edge machine translation and sequence-to-sequence models, the classes and assignments will put a heavy focus on implementation and on following cutting-edge research. In general, classes will take the following format:

  • Reading: Before each class, you will be given a reading assignment to complete before coming to class that day.
  • Quiz: At the beginning of class, there will be a short quiz that tests your knowledge of the reading assignment. (These quizzes should be easy if the reading assignment has been completed and understood.)
  • Summary/Elaboration/Questions: The instructor will summarize the important points of the reading material, elaborate on details that were not included in the reading, and field any questions.
  • Code Walk: The instructor or TAs will walk through some demonstration code that implements a simple version of the main concepts presented in the reading material.
In the latter part of the course, there will also be several sessions based around paper discussions: reading a recent prominent paper in the field and discussing its content. These will take the following format:
  • Reading: Before the class, you will be asked to choose one of several papers on a particular topic to read.
  • Paper Discussion: The class will split into groups and discuss the reading material. During the discussion, every group member will be asked to contribute at least one critical question, comment, or suggestion about the paper, and will be graded on this contribution.

Grading: The assignments will be given a grade of A+ (100), A (96), A- (92), B+ (88), B (85), B- (82), or below. The final grades will be determined based on the weighted average of the quizzes, assignments, and project. Cutoffs for final grades will be approximately 97+ A+, 94+ A, 90+ A-, 87+ B+, 84+ B, 80+ B-, etc., although I reserve some flexibility to change these thresholds slightly.

  • Quizzes/Discussions: Worth 20% of the grade. Your lowest 2 quiz grades will be dropped. If you are sick or traveling on business (e.g., to a conference, for a job interview, or delayed in returning due to visa issues), send a doctor's note or other evidence of the reason for your absence to the TA list within a week of the absence, and you will be excused. We expect excused quizzes to be relatively rare; if you will be away for more than, e.g., 2 classes over the semester, please consult us in advance.
  • Assignments: There will be 2 assignments, each worth 20% of the grade.
  • Project: The final course project will be worth 40%.
The details of the assignments are elaborated on the assignments page.
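The grading arithmetic above can be sketched in a few lines of Python. This is a hypothetical illustration, not an official grade calculator: the function names are made up, it assumes quiz and discussion scores are pooled into a single list on the 0-100 scale, and it does not model excused absences or the instructor's discretion to shift the approximate cutoffs.

```python
# Hypothetical sketch of the grading scheme described in this syllabus.
# Weights and approximate cutoffs come from the text; pooling quiz and
# discussion scores into one list is an assumption.

WEIGHTS = {"quizzes": 0.20, "assignment1": 0.20, "assignment2": 0.20, "project": 0.40}

# Approximate cutoffs (may be adjusted slightly); anything lower is "below B-".
CUTOFFS = [(97, "A+"), (94, "A"), (90, "A-"), (87, "B+"), (84, "B"), (80, "B-")]


def quiz_average(quiz_scores: list[float], drop: int = 2) -> float:
    """Average the quiz/discussion scores after dropping the lowest `drop` grades."""
    kept = sorted(quiz_scores)[drop:]
    if not kept:
        raise ValueError("need more quiz scores than the number dropped")
    return sum(kept) / len(kept)


def final_score(quiz_scores: list[float], assignment1: float,
                assignment2: float, project: float) -> float:
    """Weighted average of quizzes (20%), two assignments (20% each), and project (40%)."""
    return (WEIGHTS["quizzes"] * quiz_average(quiz_scores)
            + WEIGHTS["assignment1"] * assignment1
            + WEIGHTS["assignment2"] * assignment2
            + WEIGHTS["project"] * project)


def letter_grade(score: float) -> str:
    """Map a 0-100 score to a letter using the approximate cutoffs."""
    for cutoff, letter in CUTOFFS:
        if score >= cutoff:
            return letter
    return "below B-"
```

For example, quiz scores of 100, 96, 50, 92, 0, and 88 average to 94 once the two lowest are dropped; combined with assignment grades of 96 (A) and 92 (A-) and a project grade of 96 (A), the weighted average is 94.8, which falls in the A range under these approximate cutoffs.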