Assignments
The assignments and shared tasks aim to build the basic understanding and advanced implementation skills needed to create cutting-edge inference systems for language models, culminating in a shared task that demonstrates these abilities.
Read all the instructions on this page carefully. You are responsible for following them; if you do not, you may be marked down as a result.
Assignment Policies
Working in Teams: The class has 3 homework assignments plus a shared task. All assignments should be submitted individually.
Submission Information: Submit each assignment via Canvas as a zip file containing:
- your code: This should be in a directory “code” in the top directory unless specified otherwise.
- your report: This should be a PDF file named “report.pdf” in the top directory.
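A quick way to sanity-check the archive layout before uploading is sketched below. The helper name `check_submission` is an illustrative assumption and not part of any course tooling; the sketch only tests for a top-level `code` directory and a top-level `report.pdf`, and does not cover assignment-specific exceptions.

```python
import zipfile


def check_submission(zip_path):
    """Return a list of layout problems found in the archive (empty if none)."""
    problems = []
    with zipfile.ZipFile(zip_path) as archive:
        names = archive.namelist()
    # A top-level "code" directory appears as "code/" or as a prefix of its files.
    if not any(name.startswith("code/") for name in names):
        problems.append('missing top-level "code" directory')
    # The report must sit at the top level, not inside a subdirectory.
    if "report.pdf" not in names:
        problems.append('missing top-level "report.pdf"')
    return problems
```

Calling `check_submission("submission.zip")` (a hypothetical file name) returns an empty list when both items are where the instructions expect them.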
Late Policy: Late submissions will be penalized at 5% per day late, up to a maximum of 5 days. Each student has 3 late days to use throughout the semester without penalty.
Shared Tasks
The assignments share a common set of tasks that you will implement:
- Task 1: Mathematical reasoning
- Task 2: Multiple-choice question answering
- Task 3: Open-ended generation
All tasks will use the same set of models, spanning several model scales.
We will provide details about the individual datasets and models shortly.
Quizzes
- One quiz or participatory activity per class
- Due before the next class starts for take-home quizzes, or during the class for in-class quizzes
- Lowest 3 scores dropped automatically (it is also possible to be excused for sickness or CMU business travel)
- No makeups (use the drop policy instead)
Homework Assignments
There will be three homework assignments throughout the semester.
Homework 1: Decoding Methods and Probability Review
- Released: Friday 9/5
- Due: Thursday 9/25
- Topics:
  - Math homework with probability review
  - Implementation of, and questions about, sampling methods (temperature, top-p, top-k, mirostat)
  - Debugging an implementation of diverse beam search
  - Comparison of results on shared tasks
- Evaluation Criteria:
  - Correctness of mathematical answers
  - Code must be runnable with the settings provided in the task description
  - Code must accurately implement the decoding methods
  - Results on the shared tasks must be reported and reproducible
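For orientation, three of the sampling methods named in the Homework 1 topics (temperature, top-k, and top-p) can be sketched over a plain list of logits. This is a minimal illustration using only the standard library, not the assignment's starter code or required interface; the function names are assumptions, and mirostat and diverse beam search are not covered.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def filter_top_k(probs, k):
    """Keep the k most probable tokens, zero the rest, and renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = set(order[:k])
    filtered = [p if i in kept else 0.0 for i, p in enumerate(probs)]
    total = sum(filtered)
    return [p / total for p in filtered]


def filter_top_p(probs, p):
    """Keep the smallest set of top tokens whose total mass reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = set(), 0.0
    for i in order:
        kept.add(i)
        mass += probs[i]
        if mass >= p:
            break
    filtered = [q if i in kept else 0.0 for i, q in enumerate(probs)]
    total = sum(filtered)
    return [q / total for q in filtered]


def sample_token(logits, temperature=1.0, top_k=None, top_p=None, rng=random):
    """Sample one token index after applying the requested filters in order."""
    probs = softmax(logits, temperature)
    if top_k is not None:
        probs = filter_top_k(probs, top_k)
    if top_p is not None:
        probs = filter_top_p(probs, top_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

Applying top-k before top-p is one common ordering, chosen here for simplicity; the task description may specify a different one.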
Homework 2: Meta-decoding Methods
Full details pending.
- Released: Monday 9/23
- Due: Monday 10/28
- Topics:
  - Implementation of best-of-n and minimum Bayes risk (MBR) decoding with a reward model
  - Implementation of self-refine
  - Implementation of LLMs with Python code execution
  - Result visualization and analysis
  - Comparison of results on shared tasks
- Evaluation Criteria:
  - Code must be runnable with the settings provided in the task description
  - Code must accurately implement the decoding methods
  - Results on the shared tasks must be reported and reproducible
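As a rough illustration of two of the Homework 2 topics, best-of-n and MBR selection over a pool of sampled candidates might look like the sketch below. The reward function is passed in by the caller, and `token_overlap` is a toy stand-in utility chosen only to keep the example self-contained; the actual reward model and utility for the assignment are not specified here, and all names are assumptions.

```python
def best_of_n(candidates, reward_fn):
    """Return the candidate that the reward function scores highest."""
    return max(candidates, key=reward_fn)


def token_overlap(a, b):
    """Toy utility: Jaccard overlap between whitespace-token sets."""
    set_a, set_b = set(a.split()), set(b.split())
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def mbr_select(candidates, utility_fn=token_overlap):
    """Return the candidate with the highest average utility against the others."""

    def expected_utility(index):
        others = [c for j, c in enumerate(candidates) if j != index]
        if not others:
            return 0.0
        scores = [utility_fn(candidates[index], other) for other in others]
        return sum(scores) / len(scores)

    best_index = max(range(len(candidates)), key=expected_utility)
    return candidates[best_index]
```

Best-of-n needs only one reward call per candidate, while this MBR sketch compares every pair of candidates, so its cost grows quadratically with the pool size.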
Homework 3: Efficiency
- Released: Monday 10/28
- Due: Monday 11/25
- Topics:
  - Batching of requests
  - Implementation of KV caching
  - Implementation of speculative decoding
  - Optimization for available hardware
- Restrictions:
  - No off-the-shelf inference servers may be used (e.g., vLLM or SGLang)
- Evaluation Criteria:
  - Code must be runnable with the settings provided in the task description
  - Code must accurately implement the server and basic optimizations
  - Graded on performance:
    - Throughput
    - Latency
    - Accuracy
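To get a feel for the latency and throughput criteria before the official harness is available, a simple sequential timing loop can help. The sketch below is an assumption about how one might measure these locally, not a description of how the course staff will grade; `generate_fn` is a hypothetical callable standing in for your own inference function.

```python
import time


def benchmark(generate_fn, prompts):
    """Time generate_fn on each prompt and summarize latency and throughput."""
    latencies = []
    start = time.perf_counter()
    for prompt in prompts:
        t0 = time.perf_counter()
        generate_fn(prompt)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    return {
        # Per-request latency, in seconds.
        "mean_latency_s": sum(latencies) / len(latencies),
        "max_latency_s": max(latencies),
        # Requests completed per second over the whole run.
        "throughput_rps": len(prompts) / elapsed if elapsed > 0 else float("inf"),
    }
```

Because requests are issued one at a time, this loop does not exercise batching; measuring batched throughput would require sending requests concurrently.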
Shared Task: Final Submission
The final submission is an API-based inference server designed by the student. The course staff will send the server queries drawn from all three shared tasks. The final grade for the shared tasks will be based on accuracy within a fixed latency budget (to be announced).
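The request and response format for the server is not specified on this page. Purely as a sketch of the overall shape, a minimal JSON-over-HTTP server built on the Python standard library could look like the following; the field names `prompt` and `completion`, the default host and port, and the `generate` placeholder are all hypothetical and should be replaced by whatever the official specification requires.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def generate(prompt):
    """Placeholder for the student's inference pipeline (hypothetical)."""
    return "echo: " + prompt


def handle_payload(payload):
    """Map a request body to a response body; field names are assumptions."""
    return {"completion": generate(payload.get("prompt", ""))}


class InferenceHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON request body using the declared content length.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(handle_payload(payload)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(host="127.0.0.1", port=8000):
    """Start a blocking server; host and port are illustrative defaults."""
    HTTPServer((host, port), InferenceHandler).serve_forever()
```

Keeping `handle_payload` separate from the HTTP handler makes the inference path testable without opening a socket.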
- Due: Wednesday 12/4
- Submission Requirements:
  - API-based inference server implementation
  - Corresponding technical report
- Report Format: 4-8 page paper in COLM format
- Report Requirements:
  - Introduction to tasks
  - Related work discussion
  - System description
  - Original diagram/visualization of system