Introduction - Overview of NLP (Jan 16)

Content:

  • What is natural language processing?
  • What are the features of natural language?
  • What do we want to do with NLP?
  • What makes it hard?
  • Building a rule-based classifier
  • Training a bag-of-words classifier

Slides: Intro Slides

Code: Simple Text Classifiers

Reading Material

Word Representation and Text Classification (Jan 18)

Content:

  • Subword models
  • Continuous word embeddings
  • Training more complex models
  • Neural network basics
  • Visualizing word embeddings

Recitation (OH): PyTorch and SentencePiece

Slides: Word Representation and Text Classification Slides

Code: Subword Models, Text Classification

Reading Material

Language Modeling (Jan 23)

Content:

  • Language Modeling Problem Definition
  • Count-based Language Models
  • Measuring Language Model Performance: Accuracy, Likelihood, and Perplexity
  • Log-linear Language Models
  • Neural Network Basics
  • Feed-forward Neural Network Language Models

Recitation (OH): N-Gram Language Model

Slides: Language Modeling Slides

Reading Material

Sequence Modeling (Jan 25)

Content:

  • Recurrent Networks
  • Convolutional Networks
  • Attention

Recitation (OH): Hugging Face Transformers

Slides: Sequence Modeling Slides

Reading Material

Transformers (Jan 30)

Content:

  • Transformer Architecture
  • Multi-Head Attention
  • Positional Encodings
  • Layer Normalization
  • Optimizers and Training
  • LLaMA Architecture

Recitation (OH): Annotated Transformer

Slides: Transformers Slides

Reading Material

Generation Algorithms (Feb 1)

Lecturer: Amanda Bertsch

Content:

  • Sampling from LMs, beam search and variants
  • Minimum Bayes Risk
  • Constrained decoding
  • Human-in-the-loop decoding
  • Fast inference

Recitation (OH): vLLM

Slides: Generation Slides

Reading Material

Prompting (Feb 6)

Content:

  • Prompting Methods
  • Sequence-to-sequence Pre-training
  • Prompt Engineering
  • Answer Engineering
  • Multi-prompt Learning
  • Prompt-aware Training Methods

Recitation (OH): OpenAI API, LiteLLM

Slides: Prompting Slides

Reading Material

Fine-tuning and Instruction Tuning (Feb 8)

Content:

  • Multi-tasking
  • Fine-tuning and Instruction Tuning
  • Parameter-Efficient Fine-tuning
  • Instruction Tuning Datasets
  • Synthetic Data Generation

Slides: Instruction Tuning Slides

Reading Material

Experimental Design and Human Annotation (Feb 13)

Content:

  • Experimental Design
  • Data Annotation

Slides: Experimental Design Slides

References

Retrieval and RAG (Feb 15)

Content:

  • Retrieval Methods
  • Retrieval-Augmented Generation
  • Long-context Transformers

Recitation (OH): LangChain or LlamaIndex

Slides: Retrieval Augmented Generation Slides

References

Distillation, Quantization, and Pruning (Feb 20)

Co-Lecturer: Vijay Viswanathan

Content:

  • Distillation
  • Quantization
  • Pruning

Slides: Distillation Slides

References

Reinforcement Learning (Feb 22)

Content:

  • Methods to Gather Feedback
  • Error and Risk
  • Reinforcement Learning
  • Stabilizing Reinforcement Learning

Slides: Reinforcement Learning Slides

Debugging and Interpretation (Feb 27)

Co-Lecturer: Nishant Subramani

Content:

  • Neural NLP model debugging methods
  • Model interpretability: probing, mechanistic interpretability, and steering vectors

Slides

Recitation (OH): ZenoML

References

Ensembling and Mixture of Experts (Feb 29)

Content:

  • Ensembling
  • Model Merging
  • Sparse Mixture of Experts
  • Pipeline Models

Slides: Multi-model Slides

References

Tour of Modern Large Language Models (Mar 12)

Content:

  • Factors influencing openness in LMs
  • Pythia
  • OLMo
  • Llama 2
  • Mistral/Mixtral
  • Qwen
  • Code/Math/Science models
  • Closed models

Slides: Modern LM Slides

References

Long Sequence Models - Albert Gu (Mar 14)

Content:

Recitation (OH): Unlimiformer, Mamba

Code Generation (Mar 19)

Content:

  • Code Generation

Slides: Code Generation Slides

References

Knowledge-Based QA (Mar 21)

Content:

  • Knowledge-Based QA

Slides: Knowledge Based QA Slides

References

Bias and Fairness - Guest Lecture by Maarten Sap (Mar 26)

Content:

  • Bias and Fairness

Slides: Safety, Ethics, and Biases Slides

Language Agents (Mar 28)

Co-Lecturers: Zhiruo Wang, Frank F. Xu

Content:

  • Tool Use
  • Language Agents

Slides: Tool Use Slides

Slides: Language Agents Slides

References

Complex Reasoning (Apr 2)

Content:

  • Types of Reasoning
  • Pre-LLM Approaches
  • Chain-of-thought and Variants
  • Supervised Training for Reasoning
  • Abductive Reasoning

Slides: Reasoning Slides

References

Linguistics and Computational Linguistics (Apr 4)

Multilingual NLP (Apr 9)

Content:

  • Multilingual NLP

Slides: Multilingual NLP Slides

References

State-of-the-art Chat Models and Evaluation - Hao Zhang (Apr 16)

Content:

State-of-the-art RAG Methods - Akari Asai (Apr 18)

Content:

Slides: Advancing the State-of-the-art in RAG