Introduction - Overview of NLP (Jan 16)
Content:
- What is natural language processing?
- What are the features of natural language?
- What do we want to do with NLP?
- What makes it hard?
- Building a rule-based classifier
- Training a bag-of-words classifier (sketch below)
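To make the bag-of-words bullet concrete, here is a minimal, self-contained sketch of a perceptron-style bag-of-words classifier; the toy training data and feature scheme are illustrative and not the classifiers used in the linked course code.

```python
from collections import defaultdict

def featurize(text):
    """Bag-of-words features: count of each lowercased token."""
    feats = defaultdict(float)
    for tok in text.lower().split():
        feats[tok] += 1.0
    return feats

def train_bow(data, epochs=5):
    """Train binary (+1/-1) classifier weights with a simple perceptron update."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for text, label in data:  # label is +1 or -1
            feats = featurize(text)
            score = sum(weights[f] * v for f, v in feats.items())
            if score * label <= 0:  # misclassified, so update toward the correct label
                for f, v in feats.items():
                    weights[f] += label * v
    return weights

def predict(weights, text):
    score = sum(weights[f] * v for f, v in featurize(text).items())
    return 1 if score > 0 else -1

# Toy data, purely for illustration.
train = [("I love this movie", 1), ("terrible and boring", -1)]
w = train_bow(train)
print(predict(w, "boring movie"))
```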
Slides: Intro Slides
Code: Simple Text Classifiers
Reading Material
- Reference: Examining Power and Agency in Film (Sap et al. 2017)
Word Representation and Text Classification (Jan 18)
Content:
- Subword models (SentencePiece sketch below)
- Continuous word embeddings
- Training more complex models
- Neural network basics
- Visualizing word embeddings
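A minimal sketch of subword segmentation with SentencePiece, assuming you supply a plain-text corpus; the path `corpus.txt`, the vocabulary size, and the model prefix are placeholders.

```python
import sentencepiece as spm

# Train a small unigram subword model on a plain-text corpus (one sentence per line).
# "corpus.txt" is a placeholder path; vocab_size is illustrative and should fit your corpus.
spm.SentencePieceTrainer.train(
    input="corpus.txt", model_prefix="subword", vocab_size=8000, model_type="unigram"
)

# Load the trained model and segment text into subword pieces and ids.
sp = spm.SentencePieceProcessor(model_file="subword.model")
print(sp.encode("unbelievable segmentation", out_type=str))  # e.g. ['▁un', 'believ', 'able', ...]
print(sp.encode("unbelievable segmentation", out_type=int))
```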
Recitation (OH): PyTorch and SentencePiece
Slides: Word Representation and Text Classification Slides
Code: Subword Models, Text Classification
Reading Material
- Reference: Neural Machine Translation of Rare Words with Subword Units (Sennrich et al. 2015)
- Reference: Unigram Models for Subword Segmentation (Kudo 2018)
- Software: SentencePiece
- Reference: Exploring BERT’s Vocabulary (Ács 2019)
Language Modeling (Jan 23)
Content:
- Language Modeling Problem Definition
- Count-based Language Models (sketch below)
- Measuring Language Model Performance: Accuracy, Likelihood, and Perplexity
- Log-linear Language Models
- Neural Network Basics
- Feed-forward Neural Network Language Models
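As a concrete companion to the count-based modeling and perplexity bullets, here is a minimal add-alpha-smoothed bigram language model on a toy corpus; the corpus, vocabulary, and smoothing constant are illustrative.

```python
import math
from collections import defaultdict

def train_bigram(sentences, alpha=0.1):
    """Estimate smoothed bigram probabilities P(cur | prev) from whitespace-tokenized sentences."""
    unigram, bigram, vocab = defaultdict(int), defaultdict(int), set()
    for sent in sentences:
        toks = ["<s>"] + sent.split() + ["</s>"]
        vocab.update(toks[1:])
        for prev, cur in zip(toks, toks[1:]):
            unigram[prev] += 1
            bigram[(prev, cur)] += 1
    V = len(vocab)
    def prob(prev, cur):
        # add-alpha smoothed conditional probability
        return (bigram[(prev, cur)] + alpha) / (unigram[prev] + alpha * V)
    return prob

def perplexity(prob, sentences):
    """Perplexity = exp(negative average log-likelihood per predicted token)."""
    total_logp, n = 0.0, 0
    for sent in sentences:
        toks = ["<s>"] + sent.split() + ["</s>"]
        for prev, cur in zip(toks, toks[1:]):
            total_logp += math.log(prob(prev, cur))
            n += 1
    return math.exp(-total_logp / n)

train = ["the cat sat", "the dog sat"]
p = train_bigram(train)
print(perplexity(p, ["the cat sat"]))  # evaluated on training data, for illustration only
```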
Recitation (OH): N-Gram Language Model
Slides: Language Modeling Slides
Reading Material
- Highly Recommended Reading: Goldberg Book Chapters 8-9
- Reference: An Empirical Study of Smoothing Techniques for Language Modeling (Chen and Goodman 1998)
- Software: kenlm
- Reference: Maximum entropy (log-linear) language models (Rosenfeld 1996)
- Reference: Lossless Data Compression with Arithmetic Coding (neptune.ai 2023)
- Reference: Using the Output Embedding to Improve Language Models (Press and Wolf 2016)
- Reference: A Neural Probabilistic Language Model. (Bengio et al. 2003)
- Reference: On the Calibration of Modern Neural Networks (Guo et al. 2017)
- Reference: How Can We Know When Language Models Know? (Jiang et al. 2020)
- Reference: Self-Consistency Improves Chain of Thought Reasoning in Language Models (Wang et al. 2022)
- Reference: Just Ask for Calibration (Tian et al. 2023)
- Reference: Can LLMs Express Their Uncertainty? (Xiong et al. 2023)
Sequence Modeling (Jan 25)
Content:
- Recurrent Networks
- Convolutional Networks
- Attention (sketch below)
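A minimal NumPy sketch of single-head scaled dot-product attention as discussed in this lecture; the shapes and random toy inputs are illustrative.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V, mask=None):
    """Q: (n_queries, d), K: (n_keys, d), V: (n_keys, d_v); returns (n_queries, d_v)."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                 # query-key similarity, scaled by sqrt(d)
    if mask is not None:                          # mask is True where attention is allowed
        scores = np.where(mask, scores, -1e9)     # blocked positions get a very low score
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the key dimension
    return weights @ V

rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(2, 4)), rng.normal(size=(3, 4)), rng.normal(size=(3, 4))
print(scaled_dot_product_attention(Q, K, V).shape)  # (2, 4)
```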
Recitation (OH): Hugging Face Transformers
Slides: Sequence Modeling Slides
Reading Material
- Recommended Reading: Neural Machine Translation and Sequence-to-Sequence Models Chapter 8
- Reference: RNNs (Elman 1990)
- Reference: LSTMs (Hochreiter and Schmidhuber 1997)
- Reference: Attentional NMT (Bahdanau et al. 2015)
- Reference: Effective Approaches to Attention (Luong et al. 2015)
- Reference: Self Attention (Cheng et al. 2016)
- Reference: Attention is All You Need (Vaswani et al. 2017)
Transformers (Jan 30)
Content:
- Transformer Architecture
- Multi-Head Attention
- Positional Encodings
- Layer Normalization
- Optimizers and Training
- LLaMA Architecture (RMSNorm sketch below)
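As one concrete piece of the LLaMA-style architecture above, here is a minimal PyTorch sketch of RMSNorm (Zhang and Sennrich 2019); the epsilon and toy input are illustrative.

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square layer norm: no mean subtraction and no bias, as used in LLaMA-style models."""
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))  # learned per-dimension gain
        self.eps = eps

    def forward(self, x):
        # Normalize by the root-mean-square of the features on the last dimension.
        rms = torch.sqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x / rms)

x = torch.randn(2, 5, 16)
print(RMSNorm(16)(x).shape)  # torch.Size([2, 5, 16])
```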
Recitation (OH): Annotated Transformer
Slides: Transformers Slides
Reading Material
- Highly Recommended Reading: The Annotated Transformer
- Reference: Attention is All You Need (Vaswani et al. 2017)
- Reference: Attentional NMT (Bahdanau et al. 2015)
- Reference: Effective Approaches to Attention (Luong et al. 2015)
- Reference: Relative Positional Encodings (Shaw et al. 2018)
- Reference: RoPE (Su et al. 2021)
- Reference: Layer Normalization (Ba et al. 2016)
- Reference: RMSNorm (Zhang and Sennrich 2019)
- Reference: Pre- and Post-LayerNorm (Xiong et al. 2020)
- Reference: SiLU (Hendrycks and Gimpel 2016)
- Reference: AdamW (Loshchilov and Hutter 2017)
- Reference: LLaMA (Touvron et al. 2023)
- Reference: Comparison of Architectures (Gu and Dao 2023)
Generation Algorithms (Feb 1)
Lecturer: Amanda Bertsch
Content:
- Sampling from LMs, beam search and variants (sampling sketch below)
- Minimum Bayes Risk
- Constrained decoding
- Human-in-the-loop decoding
- Fast inference
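A minimal sketch of temperature plus nucleus (top-p) sampling over a toy logit vector; the thresholds and logits are illustrative, and a real decoder would apply this step at every generated token.

```python
import numpy as np

def sample_next_token(logits, temperature=0.8, top_p=0.95, rng=None):
    """Sample one token id from logits with temperature scaling and nucleus (top-p) filtering."""
    rng = rng or np.random.default_rng()
    scaled = logits / temperature
    probs = np.exp(scaled - np.max(scaled))
    probs /= probs.sum()
    # Keep the smallest set of tokens whose cumulative probability reaches top_p.
    order = np.argsort(-probs)
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, top_p) + 1
    keep = order[:cutoff]
    kept_probs = probs[keep] / probs[keep].sum()
    return int(rng.choice(keep, p=kept_probs))

logits = np.array([2.0, 1.0, 0.5, -1.0])  # toy next-token logits
print(sample_next_token(logits))
```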
Recitation (OH): vLLM
Slides: Generation Slides
Reading Material
- Highly Recommended Reading: Overview of ACL ‘23 tutorial on decoding (Amini et al. 2023)
- Highly Recommended Reading: Modern Generation Techniques Through the Lens of Minimum Bayes Risk (Bertsch, Xie et al. 2023)
- Reference: Calibrated Language Models Must Hallucinate (Kalai & Vempala 2023)
- Reference: Contrastive Decoding (Li et al. 2023)
- Reference: Stochastic beam search (Kool et al. 2019)
- Reference: Diverse beam search (Vijayakumar et al. 2016)
- Reference: Self-consistency (Wang et al. 2022)
- Reference: FUDGE (Yang & Klein 2021)
- Reference: RL with KL penalties is better viewed as Bayesian inference (Korbak et al. 2022)
- Reference: Reward-augmented decoding (Deng & Raffel 2023)
- Reference: Wordcraft: Story Writing with Large Language Models (Yuan et al. 2022)
- Reference: Tree of Thoughts (Yao et al. 2023)
- Reference: Speculative Decoding (Leviathan et al. 2022)
- Reference: Attention Sinks (Xiao et al. 2023)
Prompting (Feb 6)
Content:
- Prompting Methods
- Sequence-to-sequence Pre-training
- Prompt Engineering (few-shot sketch below)
- Answer Engineering
- Multi-prompt Learning
- Prompt-aware Training Methods
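A minimal sketch of few-shot prompt construction with a simple answer-extraction step; the exemplars, template, and verbalizer are hypothetical choices, and the call to an actual LM API is omitted.

```python
# Hypothetical few-shot sentiment exemplars; the template format is one of many reasonable choices.
EXEMPLARS = [
    ("I loved every minute of it.", "positive"),
    ("A complete waste of time.", "negative"),
]

def build_prompt(query):
    """Assemble a few-shot prompt: instruction, labeled exemplars, then the unlabeled query."""
    lines = ["Classify the sentiment of each review as positive or negative.", ""]
    for text, label in EXEMPLARS:
        lines += [f"Review: {text}", f"Sentiment: {label}", ""]
    lines += [f"Review: {query}", "Sentiment:"]
    return "\n".join(lines)

def extract_answer(completion):
    """'Answer engineering': map the raw model completion back onto the label space."""
    completion = completion.strip().lower()
    return "positive" if completion.startswith("pos") else "negative"

print(build_prompt("The plot made no sense."))
print(extract_answer(" Positive, clearly."))
```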
Recitation (OH): OpenAI API, LiteLLM
Slides: Prompting Slides
Reading Material
- Recommended Reading: Prompting Survey
- Recommended Reading: Prompt Engineering Guide
- Reference: Unsupervised Prompting (Radford et al. 2019)
- Reference: Few-shot Prompting (Brown et al. 2020)
- Reference: LiteLLM Prompt Templates (LiteLLM 2024)
- Reference: How to Format Inputs to ChatGPT Models (OpenAI Cookbook 2024)
- Reference: Prompt Ordering (Lu et al. 2021)
- Reference: Label Balance and Label Coverage (Zhang et al. 2022)
- Reference: What Makes In-context Learning Work (Min et al. 2022)
- Reference: Chain of Thought (Wei et al. 2022)
- Reference: Let’s think Step by Step (Kojima et al. 2022)
- Reference: Structuring Outputs as Programs (Madaan et al. 2022)
- Reference: Program Aided Language Models (Gao et al. 2022)
- Reference: Prompt Paraphrasing (Jiang et al. 2019)
- Reference: Iterative Prompt Paraphrasing (Zhou et al. 2021)
- Reference: AutoPrompt (Shin et al. 2020)
- Reference: Language Model’s Sensitivity to Prompts (Sclar et al. 2023)
- Reference: A Unified View of Parameter-efficient Transfer Learning (He et al. 2021)
- Reference: Adapters (Houlsby et al. 2019)
- Reference: Combining Prompting with Fine-tuning (Schick and Schütze 2020)
Fine Tuning and Instruction Tuning (Feb 8)
Content:
- Multi-tasking
- Fine-tuning and Instruction Tuning
- Parameter Efficient Fine-tuning (LoRA sketch below)
- Instruction Tuning Datasets
- Synthetic Data Generation
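A minimal PyTorch sketch of a LoRA-style adapter wrapped around a frozen linear layer; the rank, scaling, and initialization follow the common recipe, but the exact values are illustrative.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update: W x + (alpha/r) * B A x."""
    def __init__(self, base: nn.Linear, r=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False              # freeze the pretrained weights
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(512, 512))
print(layer(torch.randn(4, 512)).shape)  # torch.Size([4, 512])
```

Only `A` and `B` receive gradients, so the number of trainable parameters is a small fraction of the frozen layer's.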
Slides: Instruction Tuning Slides
Reading Material
- Recommended Reading: Instruction Tuning Survey (Zhang et al. 2023)
- Recommended Reading: FLAN Collection (Longpre et al. 2023)
- Recommended Reading: Unified View of PEFT (He et al. 2021)
- Reference: ZeRO (Rajbhandari et al. 2019)
- Reference: Adapters (Houlsby et al. 2019)
- Reference: Adapter Fusion (Pfeiffer et al. 2020)
- Reference: LoRA (Hu et al. 2021)
- Reference: QLoRA (Dettmers et al. 2023)
- Reference: BitFit (Ben Zaken et al. 2021)
- Reference: MMLU (Hendrycks et al. 2020)
- Reference: Natural Questions (Kwiatkowski et al. 2019)
- Reference: HumanEval (Chen et al. 2021)
- Reference: WikiSum (Liu et al. 2018)
- Reference: FLORES (Goyal et al. 2021)
- Reference: OntoNotes (Weischedel et al. 2013)
- Reference: BIG-bench (Srivastava et al. 2022)
- Reference: Instruction Tuning (1) (Wei et al. 2021)
- Reference: Instruction Tuning (2) (Sanh et al. 2021)
- Reference: Learning to In-context Learn (Min et al. 2021)
- Reference: Self-instruct (Wang et al. 2022)
- Reference: ORCA (Mukherjee et al. 2023)
- Reference: Evol-Instruct (Xu et al. 2023)
Experimental Design and Human Annotation (Feb 13)
Content:
- Experimental Design
- Data Annotation
Slides: Experimental Design Slides
References:
- Recommended Reading: How to Avoid Machine Learning Pitfalls (Lones 2021)
- Recommended Reading: Best Practices for Data Annotation (Tseng et al. 2020)
- Recommended Viewing: How to Write a Great Research Paper (Peyton Jones 2006)
- Reference: Sentiment Analysis (Pang et al. 2002)
- Reference: Conversational Question Answering (Reddy et al. 2019)
- Reference: Bottom-up Abstractive Summarization (Gehrmann et al. 2018)
- Reference: Unsupervised Word Segmentation (Kudo and Richardson 2018)
- Reference: Analyzing Language of Bias (Rankin et al. 2017)
- Reference: Are All Languages Equally Hard to Language-Model? (Cotterell et al. 2018)
- Reference: Modeling Podcasts (Reddy et al. 2021)
- Reference: BERT Rediscovers the Classical NLP Pipeline (Tenney et al. 2019)
- Reference: When and Why are Word Embeddings Useful in NMT? (Qi et al. 2018)
- Reference: Kappa Statistic (Carletta 1996)
- Reference: Downside of Surveys (Varian 1994)
- Reference: Penn Treebank Annotation Guidelines (Santorini 1990)
- Reference: Data Statements for NLP (Bender and Friedman 2018)
- Reference: Power Analysis (Card et al. 2020)
- Reference: Active Learning (Settles 2009)
- Reference: Active Learning Curves (Settles and Craven 2008)
Retrieval and RAG (Feb 15)
Content:
- Retrieval Methods (toy retrieval sketch below)
- Retrieval Augmented Generation
- Long-context Transformers
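A minimal retrieve-then-prompt sketch; the bag-of-words `embed` function below is a toy stand-in for a trained dense encoder such as DPR or Contriever, and the documents and question are illustrative.

```python
import numpy as np
from collections import Counter

def embed(text, vocab):
    """Toy bag-of-words embedding; a real RAG system would use a trained dense encoder."""
    counts = Counter(text.lower().split())
    v = np.array([counts[w] for w in vocab], dtype=float)
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

def retrieve(query, docs, k=2):
    """Return the k documents with the highest cosine similarity to the query embedding."""
    vocab = sorted({w for d in docs + [query] for w in d.lower().split()})
    q = embed(query, vocab)
    scored = sorted(((float(q @ embed(d, vocab)), d) for d in docs), reverse=True)
    return [d for _, d in scored[:k]]

docs = ["Pittsburgh is in Pennsylvania.", "The Transformer uses self-attention.",
        "Perplexity measures language model fit."]
question = "What architecture uses attention?"
context = retrieve(question, docs, k=1)
# Retrieval-augmented generation: prepend the retrieved context to the generation prompt.
print("Answer using the context below.\n" + "\n".join(context) + f"\nQuestion: {question}")
```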
Recitation (OH): LangChain or LlamaIndex
Slides: Retrieval Augmented Generation Slides
References
- Recommended Reading: ACL 2023 RAG Tutorial (Asai et al. 2023)
- Reference: Retrieval-based QA (Chen et al. 2017)
- Reference: Dense Passage Retrieval (Karpukhin et al. 2020)
- Reference: Introduction to Information Retrieval (Manning et al. 2009)
- Software: Apache Lucene
- Reference: DPR (Karpukhin et al. 2020)
- Reference: Contriever (Izacard et al. 2022)
- Software: FAISS
- Software: ChromaDB
- Reference: Cross-encoder Reranking (Nogueira et al. 2019)
- Reference: Token-level Retrieval (Khattab and Zaharia 2020)
- Reference: Hypothetical Document Embeddings (Gao et al. 2022)
- Reference: End-to-end RAG Training (Lewis et al. 2020)
- Reference: Toolformer (Schick et al. 2023)
- Reference: FLARE (Jiang et al. 2023)
- Reference: kNN-LM (Khandelwal et al. 2019)
- Reference: Unlimiformer (Bertsch et al. 2023)
- Reference: Training Transformers with Context (Voita et al. 2018)
- Reference: Transformer-XL (Dai et al. 2019)
- Reference: Mistral (Jiang et al. 2023)
- Reference: Sparse Transformers (Child et al. 2019)
- Reference: Compressive Transformer (Rae et al. 2019)
- Reference: Linformer (Wang et al. 2020)
- Reference: Nystromformer (Xiong et al. 2021)
- Reference: Long Range Arena (Tay et al. 2020)
- Reference: SCROLLS (Shaham et al. 2022)
- Reference: Lost in the Middle (Liu et al. 2023)
- Reference: Deciding Whether to Use Passages (Asai et al. 2021)
- Reference: Learning to Filter Context (Wang et al. 2023)
Distillation, Quantization, and Pruning (Feb 20)
Co-Lecturer: Vijay Viswanathan
Content:
- Distillation (soft-target loss sketch below)
- Quantization
- Pruning
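A minimal sketch of the soft-target distillation objective: KL divergence between temperature-softened teacher and student distributions. The temperature and toy logits are illustrative.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions, scaled by T^2."""
    t = temperature
    student_logp = F.log_softmax(student_logits / t, dim=-1)
    teacher_p = F.softmax(teacher_logits / t, dim=-1)
    return F.kl_div(student_logp, teacher_p, reduction="batchmean") * (t ** 2)

student = torch.randn(8, 1000, requires_grad=True)  # toy student logits (batch x vocab)
teacher = torch.randn(8, 1000)                      # toy teacher logits
loss = distillation_loss(student, teacher)
loss.backward()
print(float(loss))
```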
Slides: Distillation Slides
References
- Recommended Reading: Theia Vogel’s blog on “How to make LLMs go fast”
- Recommended Reading: Lilian Weng’s blog on “Inference Optimization”
- Reference: Over-parametrization is provably useful in training neural nets (Du and Lee 2018)
- Reference: Model-Aware Quantization: GOBO (Zadeh et al. 2020)
- Reference: Model-Aware Quantization: LLM.int8 (Dettmers et al. 2022)
- Software: Binarized Neural Networks (Courbariaux et al. 2016)
- Reference: Layer-by-Layer Quantization-Aware Distillation (Yao et al. 2020)
- Reference: QLoRA (Dettmers et al. 2023)
- Reference: Magnitude pruning (in general) (Han et al. 2015)
- Reference: An analysis of magnitude pruning for machine translation (See et al. 2016)
- Reference: The Lottery Ticket Hypothesis (Frankle and Carbin 2018)
- Reference: Wanda (Pruning by Weights and Activations) (Sun et al. 2023)
- Reference: Coarse-to-fine Structured Pruning (Xia et al. 2022)
- Reference: Are Sixteen Heads Really Better than One? (Michel et al. 2019)
- Reference: Pruning with Forward Passes (Dery et al. 2024)
- Reference: Self-Training (Yarowsky 1995)
- Reference: Hard vs Soft Target Distillation (Hinton et al. 2015)
- Reference: Sequence-Level Distillation (Kim and Rush 2016)
- Reference: DistilBERT (Sanh et al. 2019)
- Reference: Deep Learning is Robust to Massive Label Noise (Rolnick et al. 2017)
- Reference: Born Again Neural Networks (Furlanello et al. 2018)
- Reference: Self-Instruct (Wang et al. 2022)
- Reference: Prompt2Model (Viswanathan et al. 2023)
- Reference: SynthIE (Exploiting Asymmetry for Synthetic Training Data Generation) (Josifoski et al. 2023)
- Reference: DataDreamer: A Toolkit for Synthetic Data Generation (Patel et al. 2024)
Reinforcement Learning (Feb 22)
Content:
- Methods to Gather Feedback
- Error and Risk
- Reinforcement Learning
- Stabilizing Reinforcement Learning (REINFORCE-with-baseline sketch below)
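A minimal sketch of a REINFORCE-style policy-gradient loss with a mean-reward baseline for variance reduction; the sequence log-probabilities and rewards below are toy placeholders rather than outputs of a real policy.

```python
import torch

def reinforce_loss(seq_log_probs, rewards):
    """REINFORCE with a mean-reward baseline: weight each sequence's log-probability
    by (reward - baseline); the baseline reduces gradient variance without adding bias."""
    baseline = rewards.mean()
    advantage = rewards - baseline
    return -(advantage.detach() * seq_log_probs).mean()

# Toy stand-ins: summed log-probs of 4 sampled sequences under the policy, and their rewards.
seq_log_probs = torch.randn(4, requires_grad=True)
rewards = torch.tensor([0.2, 0.9, 0.1, 0.7])
loss = reinforce_loss(seq_log_probs, rewards)
loss.backward()
print(float(loss))
```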
Slides: Reinforcement Learning Slides
References
- Recommended Reading: Deep Reinforcement Learning Tutorial (Karpathy 2016)
- Recommended Reading: Human Feedback Survey (Fernandes et al. 2023)
- Reference: Course in Machine Learning Chapter 17 (Daume)
- Reference: Reinforcement Learning Textbook (Sutton and Barto 2016)
- Reference: TrueSkill (Sakaguchi et al. 2014)
- Reference: Multi-dimensional Quality Metrics
- Reference: Large-scale MQM Annotation (Freitag et al. 2021)
- Reference: BERTScore (Zhang et al. 2019)
- Reference: COMET (Rei et al. 2020)
- Reference: GEMBA (Kocmi and Federmann 2023)
- Reference: AutoMQM (Fernandes et al. 2023)
- Reference: WMT Metrics Shared Task (Freitag et al. 2023)
- Reference: SummEval (Fabbri et al. 2020)
- Reference: Summarization Evaluation through QA (Eyal et al. 2019)
- Reference: Minimum Risk Training for NMT (Shen et al. 2015)
- Reference: REINFORCE (Williams 1992)
- Reference: Co-training (Blum and Mitchell 1998)
- Reference: Revisiting Self-training (He et al. 2020)
- Reference: Adding Baselines (Dayan 1990)
- Reference: Sequence-level Training for RNNs (Ranzato et al. 2016)
- Reference: PPO (Schulman et al. 2017)
- Reference: DPO (Rafailov et al. 2023)
Debugging and Interpretation (Feb 27)
Co-Lecturer: Nishant Subramani
Content:
- Neural NLP model debugging methods
- Model interpretability: probing, mechanistic interpretability, and steering vectors (probing sketch below)
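A minimal sketch of a probing classifier: a linear model fit on frozen representations to test whether some property is linearly decodable. The "hidden states" and labels below are synthetic stand-ins for, e.g., BERT-layer activations and linguistic annotations.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-ins: "hidden states" from a frozen model and a property to probe for.
rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(500, 64))                   # e.g. one vector per token
labels = (hidden_states[:, :3].sum(axis=1) > 0).astype(int)  # pretend the property is linearly encoded

X_train, X_test, y_train, y_test = train_test_split(hidden_states, labels, random_state=0)

# The probe itself: a simple linear classifier trained on the frozen representations.
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("probe accuracy:", probe.score(X_test, y_test))
```

High probe accuracy suggests the property is recoverable from the representations, though (as the control-task and MDL readings below discuss) it does not by itself show the model uses that information.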
Slides
Recitation (OH): ZenoML
References
- Debugging References
- Recommended Reference: Interpretable Machine Learning (Molnar 2022)
- Reference: T5: Larger Models are Better (Raffel et al. 2020)
- Reference: Scaling Laws for Neural Language Models (Kaplan et al. 2020)
- Reference: Train Large, then Distill (Li et al. 2020)
- Reference: compare-mt (Neubig et al. 2019)
- Reference: ExplainaBoard (Liu et al. 2021)
- Probing References
- Reference: Edge Probing (Tenney et al. 2019a)
- Reference: BERT Rediscovers the Classical NLP Pipeline (Tenney et al. 2019b)
- Reference: Control Tasks for Probing (Hewitt et al. 2019)
- Reference: Probing Classifiers: Promises, Shortcomings, and Advances (Belinkov 2022)
- Reference: Information Theoretic Probing with MDL (Voita et al. 2020)
- Reference: Amnesic Probing (Elazar et al. 2021)
- Reference: Low-Complexity Probing (Cao et al. 2021)
- Reference: Pareto Probing (Pimentel et al. 2020)
- Mechanistic Interp References
- Reference: Zoom In: An Introduction to Circuits (Olah et al. 2020)
- Reference: A Mathematical Framework for Transformer Circuits (Elhage et al. 2021)
- Reference: Induction Heads (Olsson et al. 2022)
- Reference: Toy Models of Superposition (Elhage et al. 2022)
- Model Interpretability References
- Reference: ROME (Meng et al. 2022)
- Reference: Steering Vectors in LSTMs (Subramani et al. 2019)
- Reference: Steering Vectors in Transformers v1 (Subramani and Suresh 2020)
- Reference: Steering Vectors in Transformers v2 (Subramani et al. 2022)
- Reference: Inference-time Interventions (Li et al. 2023)
- Reference: Activation Addition (Turner et al. 2023)
- Reference: Contrastive Activation Addition (Rimsky et al. 2023)
Ensembling and Mixture of Experts (Feb 29)
Content:
- Ensembling
- Model Merging (weight-averaging sketch below)
- Sparse Mixture of Experts
- Pipeline Models
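A minimal sketch of uniform weight averaging ("model soup"-style merging) over models that share an architecture; the toy models below stand in for fine-tuned checkpoints of the same base model.

```python
import torch
import torch.nn as nn

def uniform_soup(models):
    """Average the parameters of several models with identical architectures.
    Note: this overwrites the first model's weights with the merged values."""
    state_dicts = [m.state_dict() for m in models]
    merged = {
        key: torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
        for key in state_dicts[0]
    }
    models[0].load_state_dict(merged)
    return models[0]

# Toy example: three copies of the same small network with different weights.
models = [nn.Linear(16, 4) for _ in range(3)]
souped = uniform_soup(models)
print(souped.weight.shape)  # torch.Size([4, 16])
```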
Slides: Multi-model Slides
References:
- Reference: Domain Differential Adaptation (Dou et al. 2019)
- Reference: Dexperts (Liu et al. 2021)
- Reference: Knowledge Distillation (Hinton et al. 2015)
- Reference: cuSPARSE
- Reference: NVIDIA Block Sparsity
- Reference: Sparsely Gated MOE (Shazeer et al. 2017)
- Code Example: Mistral MOE Implementation
- Reference: Weight Averaging for Neural Networks (Utans 1996)
- Reference: Model Soups (Wortsman et al. 2022)
- Software: MergeKit
- Reference: Task Vectors (Ilharco et al. 2022)
- Reference: TIES (Yadav et al. 2023)
- Reference: Stacking (Niehues et al. 2017)
- Reference: Deliberation Networks (Xia et al. 2017)
- Reference: Diffuser (Reid et al. 2022)
- Reference: Self-refine (Madaan et al. 2023)
Tour of Modern Large Language Models (Mar 12)
Content:
- Factors influencing openness in LMs
- Pythia
- OLMo
- Llama 2
- Mistral/Mixtral
- Qwen
- Code/Math/Science models
- Closed models
Slides: Modern LM Slides
References
- Levels of Release in LMs (Liang et al. 2022)
- Pythia (Biderman et al. 2023)
- The Pile (Gao et al. 2021)
- OLMo (Groeneveld et al. 2024)
- Llama 2 (Touvron et al. 2023)
- Context Distillation (Askell et al. 2021)
- Mistral (Jiang et al. 2023)
- Mixtral (Jiang et al. 2023)
- Qwen (Bai et al. 2023)
- Starcoder (Li et al. 2023)
- Code Llama (Rozière et al. 2023)
- Llemma (Azerbayev et al. 2023)
- Galactica (Taylor et al. 2022)
- GPT-4 (OpenAI 2023)
- Gemini (Gemini Team 2023)
- Claude (Anthropic 2023)
Long Sequence Models - Albert Gu (March 14)
Code Generation (March 19)
Content:
- Code Generation
Slides: Code Generation Slides
References
- Github Copilot
- Github Copilot and Productivity
- HumanEval (Chen et al. 2021)
- CoNaLa (Yin et al. 2018)
- ODEX (Wang et al. 2022)
- CodeBLEU (Ren et al. 2020)
- Design2Code (Si et al. 2024)
- CodeBERTScore (Zhou et al. 2023)
- ARCADE (Yin et al. 2022)
- LiveCodeBench (Jain et al. 2024)
- SWE-bench (Jimenez et al. 2023)
- InCoder (Fried et al. 2022)
- Copilot Explorer (Thakkar 2023)
- Retrieval-based Code Generation (Hayati et al. 2018)
- DocPrompting (Zhou et al. 2022)
- Code Generation w/ Execution (Shi et al. 2022)
- InterCode (Yang et al. 2023)
- FlashFill (Gulwani 2011)
- TerpreT (Gaunt et al. 2016)
- Code Llama (Rozière et al. 2023)
- DeepSeek Coder (Guo et al. 2024)
- StarCoder 2 (Lozhkov et al. 2024)
Knowledge Based QA (March 21)
Content:
- Knowledge Based QA
Slides: Knowledge Based QA Slides
References
- Required Reading: Relation Extraction, Jurafsky and Martin Chapter 17.2
- Reference: Relation Extraction Survey (Nickel et al. 2016)
- Reference: WordNet (Miller 1995)
- Reference: Cyc (Lenat 1995)
- Reference: DBPedia (Auer et al. 2007)
- Reference: YAGO (Suchanek et al. 2007)
- Reference: BabelNet (Navigli and Ponzetto 2010)
- Reference: Freebase (Bollacker et al. 2008)
- Reference: Wikidata (Vrandečić and Krötzsch 2014)
- Reference: Relation Extraction by Translating Embeddings (Bordes et al. 2013)
- Reference: Relation Extraction with Neural Tensor Networks (Socher et al. 2013)
- Reference: Relation Extraction by Translating on Hyperplanes (Wang et al. 2014)
- Reference: Relation Extraction by Representing Entities and Relations (Lin et al. 2015)
- Reference: Relation Extraction w/ Decomposed Matrices (Xie et al. 2017)
- Reference: Distant Supervision for Relation Extraction (Mintz et al. 2009)
- Reference: Relation Classification w/ Recursive NNs (Socher et al. 2012)
- Reference: Relation Classification w/ CNNs (Zeng et al. 2014)
- Reference: Open IE from the Web (Banko et al. 2007)
- Reference: ReVerb Open IE (Fader et al. 2011)
- Reference: Supervised Open IE (Stanovsky et al. 2018)
- Reference: Universal Schema (Riedel et al. 2013)
- Reference: Joint Entity and Relation Embedding (Toutanova et al. 2015)
- Reference: Distant Supervision for Neural Models (Luo et al. 2017)
- Reference: Relation Extraction w/ Tensor Decomposition (Sutskever et al. 2009)
- Reference: Relation Extraction via KG Paths (Lao and Cohen 2010)
- Reference: Relation Extraction by Traversing Knowledge Graphs (Guu et al. 2015)
- Reference: Relation Extraction via Differentiable Logic Rules (Yang et al. 2017)
- Reference: Improving Embeddings w/ Semantic Knowledge (Yu et al. 2014)
- Reference: Retrofitting Word Vectors to Semantic Lexicons (Faruqui et al. 2015)
- Reference: Multi-sense Embedding with Semantic Lexicons (Jauhar et al. 2015)
- Reference: Antonymy and Synonym Constraints for Word Embedding (Mrksic et al. 2016)
- Reference: Language Models as Knowledge Bases? (Petroni et al. 2019)
- Reference: How Can We Know What Language Models Know? (Jiang et al. 2019)
- Reference: AUTOPROMPT: Eliciting Knowledge from Language Models with Automatically Generated Prompts (Shin et al. 2020)
- Reference: GPT Understands, Too (Liu et al. 2021)
- Reference: How Much Knowledge Can You Pack Into the Parameters of a Language Model? (Roberts et al. 2020)
- Reference: X-FACTR: Multilingual Factual Knowledge Retrieval from Pretrained Language Models (Jiang et al. 2020)
- Reference: REALM: Retrieval-Augmented Language Model Pre-Training (Guu et al. 2020)
- Reference: Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks (Lewis et al. 2020)
- Reference: Multi-hop Reasoning in LMs (Jiang et al. 2022)
Bias and Fairness - Guest Lecture by Maarten Sap (March 26)
Language Agents (March 28)
Co-Lecturers: Zhiruo Wang, Frank F. Xu
Content:
- Tool Use
- Language Agents
Slides: Tool Use Slides
Slides: Language Agents Slides
References
- Recommended Reading: What Are Tools Anyway? A Survey from the Language Model Perspective (Wang et al. 2024)
- Reference: Toolformer: Language Models Can Teach Themselves to Use Tools (Schick et al. 2023)
- Reference: ART: Automatic multi-step reasoning and tool-use for large language models (Paranjape et al. 2023)
- Reference: ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs (Qin et al. 2024)
- Reference: Gorilla: Large Language Model Connected with Massive APIs (Patil et al. 2023)
- Reference: HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face (Shen et al. 2023)
- Reference: VOYAGER: An Open-Ended Embodied Agent with Large Language Models (Wang et al. 2023)
- Reference: TroVE: Inducing Verifiable and Efficient Toolboxes for Solving Programmatic Tasks (Wang et al. 2024)
- Reference: Animal tool behavior: the use and manufacture of tools by animals (Shumaker et al. 2011)
- Reference: Artificial intelligence: a modern approach (Russell and Norvig 2016)
- Reference: A syntactic neural model for general-purpose code generation (Yin et al. 2017)
- Reference: Touchdown: Natural Language Navigation and Spatial Reasoning in Visual Street Environments (Chen et al. 2018)
- Reference: ALFWorld: Aligning Text and Embodied Environments for Interactive Learning (Shridhar et al. 2021)
- Reference: MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge (Fan et al. 2022)
- Reference: Scaling Instructable Agents Across Many Simulated Worlds (SIMA Team 2024)
- Reference: Devin (Devin Team 2024)
- Reference: Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (Wei et al. 2022)
- Reference: ReAct: Synergizing Reasoning and Acting in Language Models (Yao et al. 2023)
- Reference: PAL: Program-aided Language Models (Gao et al. 2022)
- Reference: Mind2Web: Towards a Generalist Agent for the Web (Deng et al. 2023)
- Reference: A Data-Driven Approach for Learning to Control Computers (Humphreys et al. 2022)
- Reference: WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents (Yao et al. 2022)
- Reference: WebArena: A Realistic Web Environment for Building Autonomous Agents (Zhou et al. 2023)
- Reference: Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks (Gururangan et al. 2020)
- Reference: Training language models to follow instructions with human feedback (Ouyang et al. 2022)
- Reference: Trial and Error: Exploration-Based Trajectory Optimization for LLM Agents (Song et al. 2024)
- Reference: Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization (Yao et al. 2023)
Complex Reasoning (April 02)
Content:
- Types of Reasoning
- Pre-LLM Approaches
- Chain-of-thought and Variants
- Supervised Training for Reasoning
- Abductive Reasoning
Slides: Reasoning Slides
References
- Recommended Reading: Towards Reasoning in Large Language Models: A Survey (Huang and Chang 2022)
- Reference: Chain-of-Thought Prompting Elicits Reasoning in Large Language Models Wei et al. 2022
- Reference: Let’s think Step by Step (Kojima et al. 2022)
- Reference: Self Ask (Press et al. 2022)
- Reference: Chain of Thought with Retrieval (He et al. 2023)
- Reference: Multilingual Chain of Thought (Shi et al. 2022)
- Reference: Complexity-based Prompting (Fu et al. 2022)
- Reference: Reliability of Explanations (Ye and Durrett 2022)
- Reference: Emergent Abilities are a Mirage (Schaeffer et al. 2023)
- Reference: Let’s Verify Step-by-step (Lightman et al. 2023)
- Reference: ORCA (Mukherjee et al. 2023)
- Reference: Rule Inference with LLMs (Qiu et al. 2023)
- Reference: LLMs Can Learn Rules (Zhu et al. 2023)
- Reference: Goal-driven Discovery of Distributional Differences (Zhong et al. 2023)
Linguistics and Computational Linguistics (April 04)
Co-Lecturer: Lindia Tjuatja
Content:
- Linguistics and Computational Linguistics
Slides: Linguistics and Computational Linguistics Slides
References
- Reference: Automated reconstruction of ancient languages using probabilistic models of sound change (Bouchard-Côté et al. 2013)
- Reference: Articulation GAN: Unsupervised modeling of articulatory learning (Beguš et al. 2023)
- Reference: What do phone embeddings learn about Phonology? (Kolachina and Magyar 2019)
- Reference: PWESuite: Phonetic Word Embeddings and Tasks They Facilitate (Zouhar et al. 2024)
- Reference: Bootstrapping a Neural Morphological Analyzer for St. Lawrence Island Yupik from a Finite-State Transducer (Schwartz et al. 2019)
- Reference: Prosodic Structure and Expletive Infixation (McCarthy 1982)
- Reference: UCxn: Typologically Informed Annotation of Constructions Atop Universal Dependencies (Weissweiler et al. 2024)
- Reference: Distributional Structure (Harris 1954)
- Reference: COGS: A Compositional Generalization Challenge Based on Semantic Interpretation (Kim and Linzen 2020)
- Reference: The Paradox of the Compositionality of Natural Language: A Neural Machine Translation Case Study (Dankers et al. 2022)
- Reference: Which Linguist Invented the Lightbulb? Presupposition Verification for Question-Answering (Kim et al. 2021)
- Reference: Predicting Pragmatic Reasoning in Language Games (Frank and Goodman 2012)
Multilingual NLP (April 09)
Content:
- Multilingual NLP
Slides: Multilingual NLP Slides
References
- Reference: Google’s Multilingual Translation System (Johnson et al. 2016)
- Reference: Beto, Bentz, Becas: The Surprising Cross-Lingual Effectiveness of BERT (Wu and Dredze 2019)
- Reference: Unsupervised Cross-lingual Representation Learning at Scale (Conneau et al. 2019)
- Reference: Massively Multilingual NMT (Aharoni et al. 2019)
- Reference: Massively Multilingual Neural Machine Translation in the Wild: Findings and Challenges (Arivazhagan et al. 2019)
- Reference: Balancing Training for Multilingual Neural Machine Translation (Wang et al. 2020)
- Reference: Multi-task Learning for Multiple Language Translation (Dong et al. 2015)
- Reference: Multi-Way, Multilingual Neural Machine Translation with a Shared Attention Mechanism (Firat et al. 2016)
- Reference: NLLB (NLLB Team 2022)
- Reference: Parameter Sharing Methods for Multilingual Self-Attentional Translation Models (Sachan and Neubig 2018)
- Reference: Contextual Parameter Generation for Universal Neural Machine Translation (Platanios et al. 2018)
- Reference: MAD-X: An Adapter-Based Framework for Multi-Task Cross-Lingual Transfer (Pfeiffer et al. 2020)
- Reference: Pre-training Multilingual Experts (Pfeiffer et al. 2022)
- Reference: Cross-lingual Language Model Pretraining (Lample and Conneau 2019)
- Reference: Unicoder: A Universal Language Encoder by Pre-training with Multiple Cross-lingual Tasks (Huang et al. 2019)
- Reference: Explicit Alignment Objectives (Hu et al. 2020)
- Reference: XTREME (Hu et al. 2020)
- Reference: XGLUE (Liang et al. 2020)
- Reference: XTREME-R (Ruder et al. 2021)
- Reference: Rapid Adaptation to New Languages (Neubig and Hu 2018)
- Reference: Meta-learning for Low-resource Translation (Gu et al. 2018)
- Reference: How multilingual is Multilingual BERT? (Pires et al. 2019)
- Reference: Inducing Multilingual Text Analysis Tools via Robust Projection across Aligned Corpora (Yarowsky et al. 2001)
- Reference: Choosing Transfer Languages for Cross-Lingual Learning (Lin et al. 2019)
- Reference: Phonological Transfer for Entity Linking (Rijhwani et al. 2019)
- Reference: Handling Syntactic Divergence (Zhou et al. 2019)
- Reference: Support Vector Machine Active Learning with Applications to Text Classification (Tong and Koller 2001)
- Reference: Reducing labeling effort for structured prediction tasks (Culotta and McCallum 2005)
- Reference: Active Learning for Convolutional Neural Networks: A Core-Set Approach (Sener and Savarese 2017)
- Reference: A Little Annotation does a Lot of Good: A Study in Bootstrapping Low-resource Named Entity Recognizers (Chaudhary et al. 2019)
State-of-the-art Chat Models and Evaluation - Hao Zhang (Apr 16)
Content:
- Guest lecture by Hao Zhang