Course Details
Time/Location
CS 11-663/763
Language Technologies Institute, School of Computer Science
Carnegie Mellon University
Tuesday/Thursday 3:00-4:20PM, location TBD
Content
As massive, costly-to-train large language models have become commonplace, academic and industry interest has increasingly focused on inference: the methods used to generate outputs from these models. Inference-time algorithms are applied on top of an already-trained model to improve generation quality, lower latency, or add controllability. They can allow users to run models on a laptop, serve millions of outputs at scale, or dramatically increase the quality of a system's generations without additional training.
In this class, we survey the wide space of inference-time techniques with a particular focus on the implementation and practical use cases of such methods. Students will understand the different ways to implement and compare inference-time techniques, learn the theory behind different strategies for inference-time scaling of compute, and implement representative examples from several classes of inference-time algorithms. In the final project, students will apply inference-time strategies of their choice to two shared tasks: an open-ended generation task and a reasoning task.
Course Objectives
By the end of this course, students will be able to:
- Describe, implement, and modify a wide range of inference algorithms
- Categorize the performance of inference algorithms under multiple types of efficiency considerations
- Analyze and contrast the performance of inference algorithms on specific downstream tasks
Prerequisites
One of the following courses, or similar experience:
- 11-411/611/711
- 11-785
- 10-401/601/701
- 10-715
- 11-667
Target Audience
This course is designed for:
- Research masters or PhD students
- Students with basic knowledge of language models and NLP
- Students comfortable with Python programming
- Students interested in understanding language model inference from an algorithmic perspective
Class size is capped at 45 students.
Course Structure
Lectures
- Tuesday/Thursday, 3:30 to 4:50pm
Assignments
- 4 homework assignments (15% each, for 60% of final grade)
- Shared tasks (implementation and report) (30% of final grade)
- Lecture quizzes (10% of final grade; the lowest 6 scores are dropped)
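As a rough illustration of how the weights above combine, the following Python sketch computes a final grade on a 0-100 scale. The function name `final_grade`, the 0-100 scoring scale, and the choice to average the remaining quiz scores after dropping the lowest six are assumptions made for illustration; the syllabus does not specify how quiz scores are aggregated.

```python
def final_grade(homeworks, shared_tasks, quizzes, drop_lowest=6):
    """Illustrative weighted grade on a 0-100 scale (hypothetical helper).

    homeworks:    four homework scores, each 0-100 (15% each)
    shared_tasks: one combined shared-task score, 0-100 (30%)
    quizzes:      all lecture quiz scores, each 0-100 (10% total)
    """
    if len(homeworks) != 4:
        raise ValueError("expected exactly 4 homework scores")

    # Assumption: drop the lowest `drop_lowest` quizzes, then average the rest.
    kept = sorted(quizzes)[drop_lowest:]
    quiz_avg = sum(kept) / len(kept) if kept else 0.0

    homework_part = sum(0.15 * score for score in homeworks)
    return homework_part + 0.30 * shared_tasks + 0.10 * quiz_avg


if __name__ == "__main__":
    grade = final_grade(
        homeworks=[80, 90, 70, 100],
        shared_tasks=85,
        quizzes=[50, 60, 70, 80, 90, 100, 100, 100],
    )
    print(f"{grade:.1f}")
```

With eight quizzes and six dropped, only the top two quiz scores count toward the quiz average in this sketch.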
Office Hours
- To be announced
- Additional appointments available by request