Course Description

CS 11-747
Language Technologies Institute, School of Computer Science
Carnegie Mellon University
Tuesday/Thursday 2:30-3:40PM, Remote (see Piazza for the Zoom link)


(See Piazza for Office Hours)

Instructors:
  Graham Neubig
  Pengfei Liu
TAs:
  Ritam Dutt
  Divyansh Kaushik
  Zhengbao Jiang
  Zhisong Zhang
  Shuyan Zhou
Questions and Discussion: Ideally, ask in class or through Piazza so we can share information with the whole class, but emailing the TA mailing list and coming to office hours are also encouraged.

Course Description

Neural networks provide powerful new tools for modeling language, and have been used both to improve the state of the art in a number of tasks and to tackle new problems that were previously difficult to address. This class will start with a brief overview of neural networks, then spend the majority of the class demonstrating how to apply neural networks to natural language problems. Each section will introduce a particular problem or phenomenon in natural language, describe why it is difficult to model, and demonstrate several models that were designed to tackle this problem. In the process, the class will cover different techniques that are useful in creating neural network models, including handling variably sized and structured sentences, efficient handling of large data, semi-supervised and unsupervised learning, structured prediction, and multilingual modeling.

Prerequisites: There are no hard prerequisites for the course, but 11-411/611 "Natural Language Processing", 11-711 "Algorithms for NLP", or equivalent background in NLP is very helpful.

Class format: Given the COVID-19 situation, all classes will be remote and consist of:

  • Lecture Video: The lecture video will be pre-recorded so you can watch it at your leisure. After watching it, you can complete a quiz on the material on Canvas.
  • Reading: Most classes will also have associated reading material that you can read either before or after the video.
  • Discussion: Some classes will include a discussion of the material with the instructor and TAs, prompted by a set of questions.
  • Code/Data Walk: Some classes will include a walk-through of the code for a particular implementation, or of a dataset.

Grading: The assignments will be given a grade of A+ (100), A (96), A- (92), B+ (88), B (85), B- (82), or below. The final grades will be determined based on the weighted average of the quizzes, assignments, and project. Cutoffs for final grades will be approximately 97+ A+, 93+ A, 90+ A-, 87+ B+, 83+ B, 80+ B-, etc., although I reserve some flexibility to change these thresholds slightly.

  • Quizzes: Worth 20% of the grade. Your lowest 3 quiz grades will be dropped.
  • Assignments: There will be 4 assignments (the final one being the project), worth 15%, 15%, 20%, and 30% of the grade, respectively.
The details of the assignments are elaborated on the assignments page.
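To make the grading arithmetic concrete, here is a minimal, hypothetical sketch of how the quiz and assignment components might combine into a final letter grade. It is not an official calculator, and it rests on assumptions the syllabus does not state: every score is on a 0 to 100 scale, the quiz component is the plain mean of the quizzes left after dropping the lowest three, and the approximate cutoffs are applied exactly as listed (the instructor may adjust them). The names `quiz_average`, `final_score`, and `letter_grade` are illustrative only.

```python
# Hypothetical grade calculator illustrating the weighting scheme above.
# Assumptions (not stated in the syllabus): every score is on a 0-100 scale,
# the quiz component is the plain mean of the quizzes left after dropping the
# lowest three, and the cutoffs are applied exactly as listed.

# Numeric values of assignment letter grades, as listed in the syllabus.
ASSIGNMENT_POINTS = {"A+": 100, "A": 96, "A-": 92, "B+": 88, "B": 85, "B-": 82}

QUIZ_WEIGHT = 0.20
ASSIGNMENT_WEIGHTS = [0.15, 0.15, 0.20, 0.30]  # assignment 4 is the project

# Approximate final-grade cutoffs, highest first (subject to instructor adjustment).
CUTOFFS = [(97, "A+"), (93, "A"), (90, "A-"), (87, "B+"), (83, "B"), (80, "B-")]


def quiz_average(quiz_scores, num_dropped=3):
    """Mean of the quiz scores that remain after dropping the lowest `num_dropped`."""
    if len(quiz_scores) <= num_dropped:
        raise ValueError("need more quizzes than the number dropped")
    kept = sorted(quiz_scores)[num_dropped:]
    return sum(kept) / len(kept)


def final_score(quiz_scores, assignment_scores):
    """Weighted average of the quiz component and the four assignment scores."""
    if len(assignment_scores) != len(ASSIGNMENT_WEIGHTS):
        raise ValueError("expected exactly four assignment scores")
    total = QUIZ_WEIGHT * quiz_average(quiz_scores)
    for weight, score in zip(ASSIGNMENT_WEIGHTS, assignment_scores):
        total += weight * score
    return total


def letter_grade(score):
    """Map a final score to a letter using the approximate cutoffs."""
    for cutoff, letter in CUTOFFS:
        if score >= cutoff:
            return letter
    return "below B-"  # simplification: lower grades are not enumerated here


quizzes = [100, 90, 80, 70, 100, 100, 60, 100]
assignments = [ASSIGNMENT_POINTS[g] for g in ["A", "A-", "A+", "A"]]
score = final_score(quizzes, assignments)
print(round(score, 2), letter_grade(score))  # prints: 96.6 A
```

With these example numbers, the three lowest quizzes (60, 70, 80) are dropped, leaving a quiz average of 98. The weighted total is 0.20(98) + 0.15(96) + 0.15(92) + 0.20(100) + 0.30(96) = 96.6, which falls in the A range under the listed cutoffs.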