Course Details

Time/Location

CS 11-663/763
Language Technologies Institute, School of Computer Science
Carnegie Mellon University
Tuesday/Thursday 3:00-4:20PM, location TBD

Content

As massive, costly-to-train large language models have become commonplace, academic and industry interest has increasingly focused on the methods used to generate outputs from these models: the inference process. Inference-time algorithms can be applied on top of an already-trained model to improve generation quality, lower latency, or add controllability. They can allow users to run models on a laptop, serve millions of outputs at scale, or dramatically increase the quality of a system's generations without additional training.

In this class, we survey the wide space of inference-time techniques with a particular focus on the implementation and practical use cases of such methods. Students will understand the different ways to implement and compare inference-time techniques, learn the theory behind different strategies for inference-time scaling of compute, and implement representative examples from several classes of inference-time algorithms. In the final project, students will apply inference-time strategies of their choice to two shared tasks: an open-ended generation task and a reasoning task.

Course Objectives

By the end of this course, students will be able to:

  1. Describe, implement, and modify a wide range of inference algorithms
  2. Categorize the performance of inference algorithms under multiple types of efficiency considerations
  3. Analyze and contrast the performance of inference algorithms on specific downstream tasks

Prerequisites

One of the following courses, or similar experience:

  • 11-411/611/711
  • 11-785
  • 10-401/601/701
  • 10-715
  • 11-667

Target Audience

This course is designed for:

  • Research master's or PhD students
  • Students with basic knowledge of language models and NLP
  • Students comfortable with Python programming
  • Students interested in understanding language model inference from an algorithmic perspective

Class size is capped at 45 students.

Course Structure

Lectures

  • Tuesday/Thursday, 3:30 to 4:50pm

Assignments

  • 4 homework assignments (15% each, for 60% of final grade)
  • Shared tasks (implementation and report) (30% of final grade)
  • Lecture quizzes (10% of final grade, drop the lowest 6 scores)
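
As a rough illustration of how the weights above combine, the following sketch computes a final grade from the three components. It assumes every score is expressed as a percentage from 0 to 100, that quizzes are weighted equally, and that the quiz component is the mean of the scores remaining after the lowest six are dropped; the function name `final_grade` and these normalization choices are assumptions for illustration, not official course policy.

```python
def final_grade(homeworks, shared_tasks, quizzes, drop_lowest=6):
    """Return a weighted course grade on a 0-100 scale.

    All inputs are percentages in [0, 100]. This is a hypothetical helper;
    the normalization below is assumed, not taken from the syllabus.
    """
    if len(homeworks) != 4:
        raise ValueError("expected exactly 4 homework scores")
    if len(quizzes) <= drop_lowest:
        raise ValueError("need more quiz scores than the number dropped")

    # Drop the lowest `drop_lowest` quiz scores, then average the rest.
    kept = sorted(quizzes)[drop_lowest:]
    quiz_average = sum(kept) / len(kept)

    # 4 homeworks at 15% each, shared tasks at 30%, quizzes at 10%.
    return 0.15 * sum(homeworks) + 0.30 * shared_tasks + 0.10 * quiz_average


if __name__ == "__main__":
    # Six missed quizzes are dropped, so only the four scores of 90 count.
    grade = final_grade([80, 90, 70, 100], 85, [0] * 6 + [90] * 4)
    print(f"{grade:.1f}")
```

Under these assumptions, missing up to six quizzes does not lower the quiz component at all, since only the remaining scores are averaged.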

Office Hours

  • To be announced
  • Additional appointments available by request