11-663/763: Inference Algorithms for Language Modeling
Fall 2025 @ Carnegie Mellon University
Course Description
As the use of massive and costly-to-train large language models has become increasingly commonplace, much academic and industry interest has focused on the methods used to generate outputs from these models – the inference process. Inference-time algorithms can be applied on top of an already-trained model to improve generation quality, lower latency, or add controllability. Such algorithms allow users to run models on their laptops, serve millions of outputs at scale, or dramatically increase the quality of a system's generations without any additional training.
In this class, we survey the wide space of inference-time techniques, with a particular focus on their implementation and practical use cases. Students will understand how inference-time techniques are implemented and compared, learn the theory behind different strategies for scaling inference-time compute, and implement representative examples from several classes of inference-time algorithms. In the final project, students will apply inference-time strategies of their choice to two shared tasks: an open-ended generation task and a reasoning task.
Course Information
- Prerequisites: 11-411/611/711 OR 11-785 OR 10-401/601/701 OR 10-715 OR 11-667
- Target Audience: Research master's or PhD students who have basic knowledge of language models and NLP, are comfortable programming in Python, and are interested in understanding language model inference from an algorithmic perspective
- Class Size: Capped at 45 students