Schedule
Introduction - Overview of NLP (Aug 27)
Content:
- What is natural language processing?
- What are the features of natural language?
- What do we want to do with NLP?
- What makes it hard?
- Building a rule-based classifier
- Training a bag-of-words classifier
Slides: Intro Slides
Code: Simple Text Classifiers
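To make the lecture topics concrete, here is a minimal sketch of the two approaches named above (a hand-written rule and a trained bag-of-words model); the toy reviews, keyword list, and hyperparameters are placeholders rather than the contents of the linked notebook.

```python
# Illustrative only: a tiny rule-based classifier and a bag-of-words
# logistic-regression classifier; the data and keyword list are made up.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

POSITIVE_WORDS = {"love", "great", "delightful"}   # hand-written rule

def rule_based_classify(text):
    return 1 if any(w in text.lower().split() for w in POSITIVE_WORDS) else 0

train_texts = ["I love this movie", "a delightful film",
               "terrible acting throughout", "I hated every minute"]
train_labels = [1, 1, 0, 0]                        # 1 = positive, 0 = negative

vectorizer = CountVectorizer()                     # text -> sparse word-count vector
X_train = vectorizer.fit_transform(train_texts)
clf = LogisticRegression().fit(X_train, train_labels)

test = ["a film I love", "terrible movie"]
print([rule_based_classify(t) for t in test])      # rule-based predictions
print(clf.predict(vectorizer.transform(test)))     # learned bag-of-words predictions
```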
Reading Material
- Reference: Examining Power and Agency in Film (Sap et al. 2017)
Representing Words (Aug 29)
Content:
- Subword models
- Continuous word embeddings
- Training more complex models
- Neural network basics
- Visualizing word embeddings
Recitation (OH): PyTorch and SentencePiece
Slides: Word Representation and Text Classification Slides
Code: Subword Models, Text Classification
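As a small companion to the subword-modeling material, the sketch below trains a unigram SentencePiece model and segments a sentence; the corpus file name, vocabulary size, and printed pieces are assumptions, not the course's settings.

```python
# Minimal SentencePiece sketch (unigram subword model), assuming a plain-text
# file `corpus.txt` with a few thousand lines exists.
import sentencepiece as spm

spm.SentencePieceTrainer.train(
    input="corpus.txt", model_prefix="demo_sp",
    vocab_size=1000, model_type="unigram",
)

sp = spm.SentencePieceProcessor(model_file="demo_sp.model")
print(sp.encode("unbelievably good", out_type=str))  # subword pieces, e.g. ['▁un', 'believ', 'ably', '▁good']
print(sp.encode("unbelievably good", out_type=int))  # corresponding ids
```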
Reading Material
- Reference: Neural Machine Translation of Rare Words with Subword Units (Sennrich et al. 2015)
- Reference: Unigram Models for Subword Segmentation (Kudo 2018)
- Software: SentencePiece
- Reference: Exploring BERT’s Vocabulary (Ács 2019)
Language and Sequence Modeling (Sep 03)
Content:
- Language Modeling Problem Definition
- Count-based Language Models
- Measuring Language Model Performance: Accuracy, Likelihood, and Perplexity
- Log-linear Language Models
- Neural Network Basics
- Feed-forward Neural Network Language Models
Recitation (OH): N-Gram Language Model
Slides: Language Modeling Slides
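The following sketch ties together the count-based modeling and perplexity topics above: a bigram model with add-alpha smoothing, evaluated on held-out text. The toy corpus and smoothing constant are placeholders.

```python
# Count-based bigram LM with add-alpha smoothing and perplexity.
import math
from collections import Counter

def train_bigram(sents):
    unigrams, bigrams = Counter(), Counter()
    for s in sents:
        toks = ["<s>"] + s.split() + ["</s>"]
        unigrams.update(toks)
        bigrams.update(zip(toks[:-1], toks[1:]))
    return unigrams, bigrams

def logprob(prev, word, unigrams, bigrams, alpha=0.1):
    vocab = len(unigrams)
    return math.log((bigrams[(prev, word)] + alpha) /
                    (unigrams[prev] + alpha * vocab))

def perplexity(sents, unigrams, bigrams):
    total_lp, n_tokens = 0.0, 0
    for s in sents:
        toks = ["<s>"] + s.split() + ["</s>"]
        for prev, word in zip(toks[:-1], toks[1:]):
            total_lp += logprob(prev, word, unigrams, bigrams)
            n_tokens += 1
    return math.exp(-total_lp / n_tokens)   # lower is better

train = ["the cat sat", "the dog sat", "a cat ran"]
uni, bi = train_bigram(train)
print(perplexity(["the cat ran"], uni, bi))
```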
Reading Material
- Highly Recommended Reading: Goldberg Book Chapter 8-9
- Reference: An Empirical Study of Smoothing Techniques for Language Modeling (Goodman 1998)
- Software: kenlm
- Reference: Maximum entropy (log-linear) language models (Rosenfeld 1996)
- Reference: Lossless Data Compression with Arithmetic Coding (neptune.ai 2023)
- Reference: Using the Output Embedding (Press and Wolf 2016)
- Reference: A Neural Probabilistic Language Model (Bengio et al. 2003)
- Reference: On the Calibration of Modern Neural Networks (Guo et al. 2017)
- Reference: How can we Know when Language Models Know (Jiang et al. 2020)
- Reference: Self-Consistency Improves Chain of Thought Reasoning in Language Models (Wang et al. 2022)
- Reference: Just Ask for Calibration (Tian et al. 2023)
- Reference: Can LLMs Express Their Uncertainty? (Xiong et al. 2023)
Attention and Transformers (Sep 05)
Content:
- Attention
- Transformer Architecture
- Multi-Head Attention
- Positional Encodings
- Layer Normalization
- Optimizers and Training
- LLaMa Architecture
Recitation (OH): Hugging Face Transformers, Annotated Transformer
Slides: Attention and Transformers Slides
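A compact sketch of the core computation covered above, scaled dot-product attention with an optional causal mask; tensor shapes are illustrative and this is not an optimized implementation.

```python
# Scaled dot-product attention as in Vaswani et al. (2017).
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v: (batch, heads, seq_len, d_head)
    d_head = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_head ** 0.5   # (batch, heads, q_len, k_len)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = F.softmax(scores, dim=-1)
    return weights @ v

# toy shapes: batch=2, heads=4, seq=5, d_head=16
q = k = v = torch.randn(2, 4, 5, 16)
causal = torch.tril(torch.ones(5, 5))          # lower-triangular causal mask
out = scaled_dot_product_attention(q, k, v, causal)
print(out.shape)                                # torch.Size([2, 4, 5, 16])
```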
Reading Material
- Recommended Reading: Neural Machine Translation and Sequence-to-Sequence Models Chapter 8
- Reference: RNNs (Elman 1990)
- Reference: LSTMs (Hochreiter and Schmidhuber 1997)
- Reference: Attentional NMT (Bahdanau et al. 2015)
- Reference: Effective Approaches to Attention (Luong et al. 2015)
- Reference: Self Attention (Cheng et al. 2016)
- Highly Recommended Reading: The Annotated Transformer
- Reference: Attention is All You Need (Vaswani et al. 2017)
- Reference: Relative Positional Encodings (Shaw et al. 2018)
- Reference: RoPE (Su et al. 2021)
- Reference: Layer Normalization (Ba et al. 2016)
- Reference: RMSNorm (Zhang and Sennrich 2019)
- Reference: Pre- and Post-LayerNorm (Xiong et al. 2020)
- Reference: SiLU (Hendrycks and Gimpel 2016)
- Reference: AdamW (Loshchilov and Hutter 2017)
- Reference: LLaMa (Touvron et al. 2023)
- Reference: Comparison of Architectures (Gu and Dao 2023)
Pre-training and Pre-trained LLMs (Sep 10)
Content:
- Overview of pre-training
- Pre-training objectives
- Pre-training data
- Open vs. closed models
- Representative pre-trained models
Slides: Pretraining Slides
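The next-token prediction objective that underlies the pre-training methods above can be written in a few lines; the sketch below uses an untrained embedding and linear head as a stand-in for a full transformer, and all sizes are placeholders.

```python
# Causal LM pre-training objective: shift targets by one position, apply cross-entropy.
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab, d_model, seq_len, batch = 100, 32, 8, 4
embed = nn.Embedding(vocab, d_model)
lm_head = nn.Linear(d_model, vocab)

tokens = torch.randint(0, vocab, (batch, seq_len))   # pretend these come from a corpus
hidden = embed(tokens)                                # stand-in for a transformer stack
logits = lm_head(hidden)                              # (batch, seq_len, vocab)

loss = F.cross_entropy(
    logits[:, :-1].reshape(-1, vocab),   # predictions for positions 0..L-2
    tokens[:, 1:].reshape(-1),           # targets are the next tokens
)
print(float(loss))   # roughly log(vocab) ≈ 4.6 for an untrained model
```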
References
- Levels of Release in LMs (Liang et al. 2022)
- Pythia (Biderman et al. 2023)
- The Pile (Gao et al. 2021)
- OLMo (Groeneveld et al. 2024)
- LLaMa 2 (Touvron et al. 2023)
- Context Distillation (Askell et al. 2021)
- Mistral (Jiang et al. 2023)
- Mixtral (Jiang et al. 2023)
- Qwen (Bai et al. 2023)
- Starcoder (Li et al. 2023)
- Code LLaMA (Rozière et al. 2023)
- Llemma (Azerbayev et al. 2023)
- Galactica (Taylor et al. 2022)
- GPT-4 (OpenAI 2023)
- Gemini (Gemini Team 2023)
- Claude (Anthropic 2023)
Instruction Tuning (Sep 12)
Co-Lecturer: Xiang Yue
Content:
- Multi-tasking
- Fine-tuning and Instruction Tuning
- Parameter Efficient Fine-tuning
- Instruction Tuning Datasets
- Synthetic Data Generation
Slides: Instruction Tuning Slides
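To illustrate the parameter-efficient fine-tuning topic, here is a minimal LoRA-style linear layer (frozen base weight plus a trainable low-rank update); ranks and dimensions are placeholders, and real projects would typically use a library such as peft rather than this sketch.

```python
# Minimal LoRA-style adapter (Hu et al. 2021): frozen W plus scaled low-rank B @ A.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_dim, out_dim, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_dim, out_dim)
        self.base.weight.requires_grad_(False)           # freeze the pre-trained weight
        self.base.bias.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(r, in_dim) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_dim, r))   # zero init: starts as a no-op update
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

layer = LoRALinear(64, 64)
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))  # only A and B train
```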
Reading Material
- Recommended Reading: Instruction Tuning Survey (Zhang et al. 2023)
- Recommended Reading: FLAN Collection (Longpre et al. 2023)
- Recommended Reading: Unified View of PEFT (He et al. 2021)
- Reference: ZeRO (Rajbhandari et al. 2019)
- Reference: Adapters (Houlsby et al. 2019)
- Reference: Adapter Fusion (Pfeiffer et al. 2020)
- Reference: LoRA (Hu et al. 2021)
- Reference: QLoRA (Dettmers et al. 2023)
- Reference: BitFit (Ben Zaken et al. 2023)
- Reference: MMLU (Hendrycks et al. 2020)
- Reference: Natural Questions (Kwiatkowski et al. 2019)
- Reference: HumanEval (Chen et al. 2021)
- Reference: WikiSum (Liu et al. 2018)
- Reference: FLORES (Goyal et al. 2021)
- Reference: OntoNotes (Weischedel et al. 2013)
- Reference: BIGBench (Srivastava et al. 2022)
- Reference: Instruction Tuning (1) (Wei et al. 2021)
- Reference: Instruction Tuning (2) (Sanh et al. 2021)
- Reference: Learning to In-context Learn (Min et al. 2021)
- Reference: Self-instruct (Wang et al. 2022)
- Reference: ORCA (Mukherjee et al. 2023)
- Reference: Evol-Instruct (Xu et al. 2023)
Prompting and Complex Reasoning (Sep 17)
Content:
- Prompting Methods
- Sequence-to-sequence Pre-training
- Prompt Engineering
- Answer Engineering
- Multi-prompt Learning
- Prompt-aware Training Methods
- Types of Reasoning
- Chain-of-thought and Variants
- Supervised Training for Reasoning
Recitation (OH): OpenAI API, LiteLLM
Slides: Prompting Slides
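A minimal few-shot prompting call through LiteLLM, as covered in the recitation; the model name is a placeholder and a provider API key (e.g. OPENAI_API_KEY) is assumed to be set in the environment.

```python
# Few-shot prompting sketch via LiteLLM.
from litellm import completion

few_shot = """Review: The plot was dull and predictable. Sentiment: negative
Review: A moving, beautifully shot film. Sentiment: positive
Review: I would watch this again in a heartbeat. Sentiment:"""

response = completion(
    model="gpt-4o-mini",                      # placeholder model name
    messages=[{"role": "user", "content": few_shot}],
    temperature=0.0,
)
print(response.choices[0].message.content)    # ideally "positive"
```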
Reading Material
- Recommended Reading: Prompting Survey
- Recommended Reading: Towards Reasoning in Large Language Models: A Survey
- Recommended Reading: Prompt Engineering Guide
- Reference: Unsupervised Prompting (Radford et al. 2019)
- Reference: Few-shot Prompting (Brown et al. 2020)
- Reference: LiteLLM Prompt Templates (LiteLLM 2024)
- Reference: How to Format Inputs to ChatGPT Models (OpenAI Cookbook 2024)
- Reference: Prompt Ordering (Lu et al. 2021)
- Reference: Label Balance and Label Coverage (Zhang et al. 2022)
- Reference: What Makes In-context Learning Work (Min et al. 2022)
- Reference: Chain of Thought (Wei et al. 2022)
- Reference: Let’s think Step by Step (Kojima et al. 2022)
- Reference: Structuring Outputs as Programs (Madaan et al. 2022)
- Reference: Program Aided Language Models (Gao et al. 2022)
- Reference: Prompt Paraphrasing (Jiang et al. 2019)
- Reference: Iterative Prompt Paraphrasing (Zhou et al. 2021)
- Reference: AutoPrompt (Shin et al. 2020)
- Reference: Language Model’s Sensitivity to Prompts (Sclar et al. 2023)
- Reference: A Unified View of Parameter-efficient Transfer Learning (He et al. 2021)
- Reference: Adapters (Houlsby et al. 2019)
- Reference: Combining Prompting with Fine-tuning (Schick and Schütze 2020)
- Reference: Self Ask (Press et al. 2022)
- Reference: Chain of Thought with Retrieval (He et al. 2023)
- Reference: Multilingual Chain of Thought (Shi et al. 2022)
- Reference: Complexity-based Prompting (Fu et al. 2022)
- Reference: Reliability of Explanations (Ye and Durrett 2022)
- Reference: Emergent Abilities are a Mirage (Schaeffer et al. 2023)
- Reference: Let’s Verify Step-by-step (Lightman et al. 2023)
- Reference: ORCA (Mukherjee et al. 2023)
- Reference: Rule Inference with LLMs (Qiu et al. 2023)
- Reference: LLMs can learn rules
- Reference: Goal-driven Discovery of Distributional Differences (Zhong et al. 2023)
Reinforcement Learning (Sep 19)
Content:
- Methods to Gather Feedback
- Error and Risk
- Reinforcement Learning
- Stabilizing Reinforcement Learning
Slides: Reinforcement Learning and Human Feedback Slides
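To ground the policy-gradient material, the sketch below runs REINFORCE with a running-average baseline on a synthetic one-step problem; the reward values and learning rate are made up.

```python
# REINFORCE with a baseline on a toy 3-armed bandit.
import torch
import torch.nn.functional as F

logits = torch.zeros(3, requires_grad=True)          # policy over 3 "actions"
optimizer = torch.optim.Adam([logits], lr=0.1)
true_rewards = torch.tensor([0.1, 0.9, 0.3])         # action 1 is best
baseline = 0.0

for step in range(200):
    probs = F.softmax(logits, dim=-1)
    action = torch.multinomial(probs, 1).item()
    reward = true_rewards[action].item()
    baseline = 0.9 * baseline + 0.1 * reward          # running average reduces variance
    loss = -(reward - baseline) * torch.log(probs[action])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(F.softmax(logits, dim=-1))   # most probability mass should end up on action 1
```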
Reading Material
- Recommended Reading: Deep Reinforcement Learning Tutorial (Karpathy 2016)
- Recommended Reading: Human Feedback Survey (Fernandes et al. 2023)
- Reference: Course in Machine Learning Chapter 17 (Daume)
- Reference: Reinforcement Learning Textbook (Sutton and Barto 2016)
- Reference: TrueSkill (Sakaguchi et al. 2014)
- Reference: Multi-dimensional Quality Metrics
- Reference: Large-scale MQM Annotation (Freitag et al. 2021)
- Reference: BERTScore (Zhang et al. 2019)
- Reference: COMET (Rei et al. 2020)
- Reference: Prometheus 2 (Kim et al. 2023)
- Reference: AutoMQM (Fernandes et al. 2023)
- Reference: WMT Metrics Shared Task (Freitag et al. 2023)
- Reference: SummEval (Fabbri et al. 2020)
- Reference: Summarization Evaluation through QA (Eyal et al. 2019)
- Reference: Minimum Risk Training for NMT (Shen et al. 2015)
- Reference: REINFORCE (Williams 1992)
- Reference: Co-training (Blum and Mitchell 1998)
- Reference: Revisiting Self-training (He et al. 2020)
- Reference: Adding Baselines (Dayan 1990)
- Reference: Sequence-level Training for RNNs (Ranzato et al. 2016)
- Reference: PPO (Schulman et al. 2017)
- Reference: DPO (Rafailov et al. 2023)
Experimental Design and Human Annotation (Sep 24)
Content:
- Experimental Design
- Data Annotation
Slides: Experimental Design Slides
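Since annotation quality comes up in this lecture, here is a small sketch of Cohen's kappa (observed agreement corrected for chance agreement); the toy annotations are placeholders.

```python
# Cohen's kappa for two annotators.
from collections import Counter

def cohens_kappa(ann_a, ann_b):
    assert len(ann_a) == len(ann_b)
    n = len(ann_a)
    p_o = sum(a == b for a, b in zip(ann_a, ann_b)) / n              # observed agreement
    count_a, count_b = Counter(ann_a), Counter(ann_b)
    labels = set(ann_a) | set(ann_b)
    p_e = sum((count_a[l] / n) * (count_b[l] / n) for l in labels)   # chance agreement
    return (p_o - p_e) / (1 - p_e)

a = ["pos", "pos", "neg", "neg", "pos", "neu"]
b = ["pos", "neg", "neg", "neg", "pos", "neu"]
print(round(cohens_kappa(a, b), 3))
```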
References:
- Recommended Reading: How to Avoid Machine Learning Pitfalls (Lones 2021)
- Recommended Reading: Best Practices for Data Annotation (Tseng et al. 2020)
- Recommended Viewing: How to Write a Great Research Paper (Peyton Jones 2006)
- Reference: Sentiment Analysis (Pang et al. 2002)
- Reference: Conversational Question Answering (Reddy et al. 2019)
- Reference: Bottom-up Abstractive Summarization (Gehrmann et al. 2018)
- Reference: Unsupervised Word Segmentation (Kudo and Richardson 2018)
- Reference: Analyzing Language of Bias (Rankin et al. 2017)
- Reference: Are All Languages Equally Hard to Language-Model? (Cotterell et al. 2018)
- Reference: Modeling Podcasts (Reddy et al. 2021)
- Reference: BERT Rediscovers the Classical NLP Pipeline (Tenney et al. 2019)
- Reference: When and Why are Word Embeddings Useful in NMT? (Qi et al. 2018)
- Reference: Kappa Statistic (Carletta 1996)
- Reference: Downside of Surveys (Varian 1994)
- Reference: Penn Treebank Annotation Guidelines (Santorini 1990)
- Reference: Data Statements for NLP (Bender and Friedman 2018)
- Reference: Power Analysis (Card et al. 2020)
- Reference: Active Learning (Settles 2009)
- Reference: Active Learning Curves (Settles and Craven 2008)
Retrieval and RAG (Sep 26)
Content:
- Retrieval Methods
- Retrieval Augmented Generation
- Long-context Transformers
Recitation (OH): LangChain or LlamaIndex
Slides: Retrieval Augmented Generation Slides
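A minimal dense-retrieval sketch with FAISS: random vectors stand in for a trained encoder (e.g. DPR or Contriever from the list below), and the index size is arbitrary.

```python
# Dense retrieval sketch: index document vectors, search for nearest neighbors.
import faiss
import numpy as np

d, n_docs, k = 128, 1000, 5
rng = np.random.default_rng(0)
doc_vecs = rng.standard_normal((n_docs, d)).astype("float32")
faiss.normalize_L2(doc_vecs)                  # cosine similarity via inner product

index = faiss.IndexFlatIP(d)
index.add(doc_vecs)

query = rng.standard_normal((1, d)).astype("float32")
faiss.normalize_L2(query)
scores, ids = index.search(query, k)
print(ids[0], scores[0])                      # indices and similarities of the top-5 docs
```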
References
- Recommended Reading: ACL 2023 RAG Tutorial (Asai et al. 2023)
- Reference: Retrieval-based QA (Chen et al. 2017)
- Reference: Dense Passage Retrieval (Karpukhin et al. 2020)
- Reference: Retro (Borgeaud et al. 2021)
- Reference: Introduction to Information Retrieval (Manning et al. 2009)
- Software: Apache Lucene
- Reference: LLM2Vec (BehnamGhader et al. 2024)
- Reference: Echo Embeddings (Springer et al. 2024)
- Reference: DPR (Karpukhin et al. 2020)
- Reference: Contriever (Izacard et al. 2022)
- Software: FAISS
- Software: ChromaDB
- Reference: Instructor Embeddings (Su et al. 2022)
- Reference: Cross-encoder Reranking (Nogueira et al. 2019)
- Reference: Token-level Retrieval (Khattab and Zaharia 2020)
- Reference: Hypothetical Document Embeddings (Gao et al. 2022)
- Reference: Understanding NDCG (Hegde 2022)
- Reference: Mean Average Precision (Tan 2024)
- Reference: End-to-end RAG Training (Lewis et al. 2020)
- Reference: Toolformer (Schick et al. 2023)
- Reference: FLARE (Jiang et al. 2023)
- Reference: kNN-LM (Khandelwal et al. 2019)
- Reference: Unlimiformer (Bertsch et al. 2023)
- Reference: Deciding Whether to Use Passages (Asai et al. 2021)
- Reference: Learning to Filter Context (Wang et al. 2023)
Distillation, Quantization, and Pruning (Oct 01)
Co-Lecturer: Vijay Viswanathan
Content:
- Distillation
- Quantization
- Pruning
Slides: Distillation Slides
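The soft-target distillation loss from Hinton et al. (2015), combined with the usual hard-label loss, can be sketched as follows; the logits, temperature, and mixing weight are placeholders.

```python
# Soft-target distillation: match the teacher's temperature-softened distribution.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                                   # rescale gradients by T^2
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

student = torch.randn(8, 10)
teacher = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
print(float(distillation_loss(student, teacher, labels)))
```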
References
- Recommended Reading: Theia Vogel’s blog on “How to make LLMs go fast”
- Recommended Reading: Lilian Weng’s blog on “Inference Optimization”
- Reference: Over-parametrization is provably useful in training neural nets (Du and Lee 2018)
- Reference: Model-Aware Quantization: GOBO (Zadeh et al. 2020)
- Software: Binarized Neural Networks (Courbariaux et al. 2016)
- Reference: Layer-by-Layer Quantization-Aware Distillation (Yao et al. 2020)
- Reference: QLoRA (Dettmers et al. 2023)
- Reference: Magnitude pruning (in general) (Han et al. 2015)
- Reference: An analysis of magnitude pruning for machine translation (See et al. 2016)
- Reference: The Lottery Ticket Hypothesis (Frankle et al. 2018)
- Reference: Wanda (Pruning by Weights and Activations) (Sun et al. 2023)
- Reference: Are Sixteen Heads Really Better than One? (Michel and Neubig 2019)
- Reference: Pruning with Forward Passes (Dery et al. 2024)
- Reference: Self-Training (Yarowsky 1995)
- Reference: Hard vs Soft Target Distillation (Hinton et al. 2015)
- Reference: Sequence-Level Distillation (Kim and Rush 2016)
- Reference: DistilBERT (Sanh et al. 2019)
- Reference: Deep Learning is Robust to Massive Label Noise (Rolnick et al. 2017)
- Reference: Born Again Neural Networks (Furlanello et al. 2018)
- Reference: Self-Instruct (Wang et al. 2022)
- Reference: Prompt2Model (Viswanathan et al. 2023)
- Reference: SynthIE (Exploiting Asymmetry for Synthetic Training Data Generation) (Josifoski et al. 2023)
- Reference: Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes (Hsieh et al. 2024)
- Reference: Let’s Verify Step by Step (Lightman et al. 2023)
- Reference: Retrieval-Augmented Distillation: DataTune (Gandhi et al. 2024)
- Reference: Retrieval-Augmented Distillation: ReBase (Ge et al. 2024)
- Reference: Retrieval-Augmented Distillation: SynthesizRR (Ge et al. 2024)
- Reference: AI models collapse when trained on recursively generated data (Shumailov et al. 2024)
Domain Specific Modeling: Code and Math (Oct 03)
Co-Lecturer: Xiang Yue
Content:
- Code Generation Models
- Math Models
Slides: Code and Math Slides
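As a small illustration of how math-focused models are typically scored, the sketch below extracts the final number from a generated solution and checks it against a gold answer; the regex and example strings are assumptions, not a specific benchmark's official scorer.

```python
# Extract the final number from a chain-of-thought answer and compare to gold.
import re

def extract_final_number(text):
    nums = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return nums[-1] if nums else None

prediction = "She buys 3 packs of 12 eggs, so 3 * 12 = 36 eggs. The answer is 36."
gold = "36"
print(extract_final_number(prediction) == gold)   # True
```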
References
Long Sequence Models (Oct 08)
Content:
- Transformer compute/memory complexity
- Extrapolation of trained models
- Alternative transformer architectures
- Non-attentional models
- Evaluation of long-context models
Slides: Long Context Slides
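One concrete piece of the efficiency discussion above: a causal attention mask restricted to a sliding window, which cuts the quadratic cost down to O(L x window). Sizes are placeholders.

```python
# Sliding-window causal mask (the pattern used by models such as Mistral).
import torch

def sliding_window_causal_mask(seq_len, window):
    idx = torch.arange(seq_len)
    causal = idx[None, :] <= idx[:, None]           # j <= i
    recent = idx[:, None] - idx[None, :] < window   # i - j < window
    return causal & recent                           # True where attention is allowed

print(sliding_window_causal_mask(6, 3).int())
```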
References
- Reference: Long Range Arena (Tay et al. 2020)
- Reference: SCROLLS (Shaham et al. 2022)
- Reference: Lost in the Middle (Liu et al. 2023)
- Reference: Needle in the Haystack (Kamradt 2023)
- Reference: RULER (Hsieh et al. 2024)
- Reference: Long-context In-context Learning (Bertsch et al. 2024)
- Reference: Long-term Conversational Memory (Maharana et al. 2024)
- Reference: Efficient Transformer Computation (Rabe and Staats 2021)
- Reference: Blockwise Parallel Attention (Liu and Abbeel 2023)
- Reference: Ring Attention (Liu et al. 2023)
- Reference: Data Engineering for Long Context (Fu et al. 2024)
- Reference: Controlled Study of Long-context Generalization (Lu et al. 2024)
- Reference: Transformer-XL (Dai et al. 2019)
- Reference: Mistral (Jiang et al. 2023)
- Reference: Sparse Transformers (Child et al. 2019)
- Reference: Compressive Transformer (Rae et al. 2019)
- Reference: Linformer (Wang et al. 2020)
- Reference: Nystromformer (Xiong et al. 2021)
- Reference: Structured State Space Models (Gu et al. 2021)
- Reference: Mamba (Gu and Dao 2023)
Ensembling and Mixture of Experts (Oct 10)
Content:
- Ensembling
- Model Merging
- Sparse Mixture of Experts
- Pipeline Models
Slides: Ensembling and MOE Slides
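A toy version of sparse top-2 routing in the style of Shazeer et al. (2017): a linear router scores experts per token and the outputs of the selected experts are mixed by renormalized gate weights. Dimensions and expert definitions are placeholders.

```python
# Sparse top-2 mixture-of-experts routing sketch.
import torch
import torch.nn as nn
import torch.nn.functional as F

d_model, n_experts, top_k = 32, 4, 2
router = nn.Linear(d_model, n_experts)
experts = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(n_experts))

x = torch.randn(10, d_model)                          # 10 tokens
gate_logits = router(x)                                # (10, n_experts)
weights, expert_ids = gate_logits.topk(top_k, dim=-1)
weights = F.softmax(weights, dim=-1)                   # renormalize over the chosen experts

out = torch.zeros_like(x)
for slot in range(top_k):
    for e in range(n_experts):
        sel = expert_ids[:, slot] == e                 # tokens routed to expert e in this slot
        if sel.any():
            out[sel] += weights[sel, slot:slot + 1] * experts[e](x[sel])
print(out.shape)
```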
References:
- Reference: Domain Differential Adaptation (Dou et al. 2019)
- Reference: Dexperts (Liu et al. 2021)
- Reference: Knowledge Distillation (Hinton et al. 2015)
- Reference: cuSPARSE
- Reference: NVIDIA Block Sparsity
- Reference: Sparsely Gated MOE (Shazeer et al. 2017)
- Code Example: Mistral MOE Implementation
- Reference: Weight Averaging for Neural Networks (Utans 1996)
- Reference: Model Soups (Wortsman et al. 2022)
- Software: MergeKit
- Reference: Task Vectors (Ilharco et al. 2022)
- Reference: TIES (Yadav et al. 2023)
- Reference: Model Cascades (Chen et al. 2023)
- Reference: Model Routing (Schnitzer et al. 2023)
- Reference: Stacking (Niehues et al. 2017)
- Reference: Deliberation Networks (Xia et al. 2017)
- Reference: Diffuser (Reid et al. 2022)
- Reference: Self-refine (Madaan et al. 2023)
Tool Use and LLM Agent Basics (Oct 22)
Content:
- Agent Basics
- Agent Use Cases/Environments
- Tool Use
- Environment Representation
- Environment Understanding
- Reasoning and Planning
- Multi-agent Systems
Slides: Tool Use and LLM Agent Basics
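To make the tool-use loop concrete, here is a toy parse-dispatch-observe cycle; the hard-coded "model response", action format, and tool set are assumptions standing in for a real LLM call.

```python
# Toy tool-use loop: parse an action string, dispatch to a tool, return an observation.
import re

def calculator(expression: str) -> str:
    # toy arithmetic evaluator; never eval untrusted strings in real systems
    return str(eval(expression, {"__builtins__": {}}, {}))

def lookup(key: str) -> str:
    facts = {"course": "Advanced NLP", "instructor": "TBD"}   # placeholder knowledge base
    return facts.get(key, "unknown")

TOOLS = {"calculator": calculator, "lookup": lookup}

model_response = "Action: calculator[12 * (3 + 4)]"           # pretend the LLM produced this
match = re.match(r"Action: (\w+)\[(.*)\]", model_response)
tool_name, tool_input = match.group(1), match.group(2)
observation = TOOLS[tool_name](tool_input)
print(f"Observation: {observation}")                          # fed back to the model next turn
```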
References
- Recommended Reading: What Are Tools Anyway? A Survey from the Language Model Perspective (Wang et al. 2024)
- Reference: Toolformer: Language Models Can Teach Themselves to Use Tools (Schick et al. 2023)
- Reference: ToolkenGPT (Hao et al. 2023)
- Reference: CodeAct (Wang et al. 2024)
- Reference: OpenAI Function Calling (OpenAI 2024)
- Reference: ART: Automatic multi-step reasoning and tool-use for large language models (Paranjape et al. 2023)
- Reference: ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs (Qin et al. 2024)
- Reference: Gorilla: Large Language Model Connected with Massive APIs (Patil et al. 2023)
- Reference: HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face (Shen et al. 2023)
- Reference: VOYAGER: An Open-Ended Embodied Agent with Large Language Models (Wang et al. 2023)
- Reference: TroVE: Inducing Verifiable and Efficient Toolboxes for Solving Programmatic Tasks (Wang et al. 2024)
- Reference: Animal tool behavior: the use and manufacture of tools by animals (Shumaker et al. 2011)
- Reference: Artificial Intelligence: A Modern Approach (Russell and Norvig 2016)
- Reference: ALFWorld: Aligning Text and Embodied Environments for Interactive Learning (Shridhar et al. 2021)
- Reference: WebArena: A Realistic Web Environment for Building Autonomous Agents (Zhou et al. 2023)
- Reference: Set of Marks (Yang et al. 2023)
- Reference: VisualWebBench (Liu et al. 2024)
- Reference: MultiUI (Liu et al. 2024)
- Reference: Touchdown (Chen et al. 2018)
- Reference: SteP (Sodhi et al. 2023)
- Reference: Curiosity-driven Exploration by Self-supervised Prediction (Pathak et al. 2017)
- Reference: Agent Workflow Memory (Wang et al. 2024)
- Reference: BAGEL (Murty et al. 2024)
- Reference: Plan and Solve Prompting (Wang et al. 2023)
- Reference: Reflexion (Shinn et al. 2023)
- Reference: Single-agent Systems (Neubig 2024)
Agents for Software Development and Web Browsing (Oct 24)
Content:
- Coding Agents
- Web Browsing Agents
Slides: Software Development and Web Browsing Slides
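Since several of the coding benchmarks below report pass@k, here is the unbiased estimator from the HumanEval paper (Chen et al. 2021); the sample counts in the usage example are arbitrary.

```python
# Unbiased pass@k estimator: given n samples, c of which pass the tests,
# estimate the probability that at least one of k samples would pass.
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    if n - c < k:
        return 1.0
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

print(pass_at_k(n=20, c=3, k=1))    # 0.15
print(pass_at_k(n=20, c=3, k=10))   # substantially higher
```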
References
- Reference: Why Software Is Eating the World (Andreessen 2011)
- Reference: Today was a Good Day: The Daily Life of Software Developers (Meyer et al. 2019)
- Reference: Levels of Autonomy in AI-Enhanced Software Engineering (Neubig 2024)
- Reference: Research Quantifying GitHub Copilot's Impact on Developer Productivity and Happiness (Kalliamvakou 2022)
- Reference: HumanEval (Chen et al. 2021)
- Reference: MBPP (Austin et al. 2021)
- Reference: CoNaLa (Yin et al. 2018)
- Reference: ODEX (Wang et al. 2022)
- Reference: ARCADE (Yin et al. 2022)
- Reference: SWE-bench (Jimenez et al. 2023)
- Reference: LiveCodeBench (Jain et al. 2024)
- Reference: Design2Code (Si et al. 2024)
- Reference: World of Bits (Shi et al. 2017)
- Reference: Mind2Web (Deng et al. 2023)
- Reference: WebArena (Zhou et al. 2023)
- Reference: OSWorld (Xie et al. 2023)
- Reference: CodeAct (Wang et al. 2024)
- Reference: SWE-Agent (Yang et al. 2024)
- Reference: OpenHands (Wang et al. 2024)
- Reference: RepoMap (Gauthier 2024)
- Reference: Agentless (Xia et al. 2024)
- Reference: CodeRAGBench (Wang et al. 2024)
- Reference: DocPrompting (Zhou et al. 2022)
- Reference: CodeR (Chen et al. 2024)
- Reference: InterCode (Yang et al. 2023)
- Reference: Capture the Flag (Yang et al. 2023)
Project Discussion (Oct 29)
Content:
- Project Discussion
Multimodal Models (Oct 31)
Co-Lecturer: Xiang Yue
Content:
- Evaluation of NLP tasks and LMs
- Multimodal
Slides: Evaluation and Multimodal Slides
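A sketch of the CLIP-style contrastive objective that underlies several of the models below; random embeddings stand in for real image and text encoders, and the temperature is a placeholder.

```python
# CLIP-style contrastive loss: matching pairs sit on the diagonal of the similarity matrix.
import torch
import torch.nn.functional as F

batch, dim = 8, 64
image_emb = F.normalize(torch.randn(batch, dim), dim=-1)
text_emb = F.normalize(torch.randn(batch, dim), dim=-1)
temperature = 0.07

logits = image_emb @ text_emb.T / temperature       # (batch, batch) similarity matrix
targets = torch.arange(batch)                       # i-th image matches i-th caption
loss = (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets)) / 2
print(float(loss))
```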
References
- Reference: SQuAD Dataset (Rajpurkar et al., 2016)
- Reference: TriviaQA (Joshi et al., 2017)
- Reference: GLUE Benchmark (Wang et al., 2018)
- Reference: WMT 2023 Terminology (Semenov et al., 2023)
- Reference: SuperGLUE (Wang et al., 2019)
- Reference: Massive Multitask (Hendrycks et al., 2020)
- Reference: Chatbot Arena (Chiang et al., 2024)
- Reference: MT-Bench (Zheng et al., 2023)
- Reference: HELM (Liang et al., 2022)
- Reference: MMMU (Yue et al., 2024)
- Reference: Self-Attention in Vision (Ramachandran et al., 2019)
- Reference: CLIP (Radford et al., 2021)
- Reference: Vision Transformers (ViT) (Dosovitskiy et al., 2021)
- Reference: BEiT (Bao et al., 2021)
- Reference: BLIP-2 (Li et al., 2023)
- Reference: LLaVA (Liu et al., 2023)
- Reference: Chameleon (Chameleon Team, 2024)
- Reference: Diffusion Models (Weng, 2021)
Linguistics and Computational Linguistics (Nov 07)
Co-Lecturer: Lindia Tjuatja
Content:
- Linguistics and Computational Linguistics
Slides: Linguistics and Computational Linguistics Slides
References
- Reference: Automated reconstruction of ancient languages using probabilistic models of sound change (Bouchard-Côté et al. 2013)
- Reference: Articulation GAN: Unsupervised modeling of articulatory learning (Beguš et al. 2023)
- Reference: What do phone embeddings learn about Phonology? (Kolachina and Magyar 2019)
- Reference: PWESuite: Phonetic Word Embeddings and Tasks They Facilitate (Zouhar et al. 2024)
- Reference: Bootstrapping a Neural Morphological Analyzer for St. Lawrence Island Yupik from a Finite-State Transducer (Schwartz et al. 2019)
- Reference: Prosodic Structure and Expletive Infixation (McCarthy 1982)
- Reference: UCxn: Typologically Informed Annotation of Constructions Atop Universal Dependencies (Weissweiler et al. 2024)
- Reference: Distributional Structure (Harris 1954)
- Reference: COGS: A Compositional Generalization Challenge Based on Semantic Interpretation (Kim and Linzen 2020)
- Reference: The Paradox of the Compositionality of Natural Language: A Neural Machine Translation Case Study (Dankers et al. 2022)
- Reference: Which Linguist Invented the Lightbulb? Presupposition Verification for Question-Answering (Kim et al. 2021)
- Reference: Predicting Pragmatic Reasoning in Language Games (Frank and Goodman 2012)
Knowledge Based QA (Nov 12)
Content:
- Knowledge Based QA
Slides: Knowledge Based QA Slides
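To illustrate the embedding-based relation extraction line of work below (e.g. translating embeddings), here is a TransE-style scoring function with a margin ranking loss on one toy positive/negative pair; entity and relation inventories are placeholders.

```python
# TransE-style scoring (Bordes et al. 2013): a triple (h, r, t) is plausible if h + r ≈ t.
import torch
import torch.nn as nn

n_entities, n_relations, dim = 100, 10, 32
ent = nn.Embedding(n_entities, dim)
rel = nn.Embedding(n_relations, dim)

def score(h, r, t):
    # higher (less negative) = more plausible triple
    return -(ent(h) + rel(r) - ent(t)).norm(p=2, dim=-1)

h = torch.tensor([3]); r = torch.tensor([1]); t = torch.tensor([7])
t_neg = torch.tensor([42])                          # corrupted tail as a negative sample
margin_loss = torch.relu(1.0 + score(h, r, t_neg) - score(h, r, t)).mean()
print(float(margin_loss))
```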
References
- Required Reading: Relation Extraction Jurafsky and Martin Chapter 17.2
- Reference: Relation Extraction Survey (Nickel et al. 2016)
- Reference: WordNet (Miller 1995)
- Reference: Cyc (Lenat 1995)
- Reference: DBPedia (Auer et al. 2007)
- Reference: YAGO (Suchanek et al. 2007)
- Reference: BabelNet (Navigli and Ponzetto 2010)
- Reference: Freebase (Bollacker et al. 2008)
- Reference: Wikidata (Vrandečić and Krötzsch 2014)
- Reference: WikiSP (Xu et al. 2023)
- Reference: Relation Extraction by Translating Embeddings (Bordes et al. 2013)
- Reference: Relation Extraction with Neural Tensor Networks (Socher et al. 2013)
- Reference: Relation Extraction by Translating on Hyperplanes (Wang et al. 2014)
- Reference: Relation Extraction by Representing Entities and Relations (Lin et al. 2015)
- Reference: Relation Extraction w/ Decomposed Matrices (Xie et al. 2017)
- Reference: Distant Supervision for Relation Extraction (Mintz et al. 2009)
- Reference: Relation Classification w/ Recursive NNs (Socher et al. 2012)
- Reference: Relation Classification w/ CNNs (Zeng et al. 2014)
- Reference: Open IE from the Web (Banko et al. 2007)
- Reference: ReVerb Open IE (Fader et al. 2011)
- Reference: Supervised Open IE (Stanovsky et al. 2018)
- Reference: Universal Schema (Riedel et al. 2013)
- Reference: Joint Entity and Relation Embedding (Toutanova et al. 2015)
- Reference: Distant Supervision for Neural Models (Luo et al. 2017)
- Reference: Relation Extraction w/ Tensor Decomposition (Sutskever et al. 2009)
- Reference: Relation Extraction via. KG Paths (Lao and Cohen 2010)
- Reference: Relation Extraction by Traversing Knowledge Graphs (Guu et al. 2015)
- Reference: Relation Extraction via Differentiable Logic Rules (Yang et al. 2017)
- Reference: Improving Embeddings w/ Semantic Knowledge (Yu et al. 2014)
- Reference: Retrofitting Word Vectors to Semantic Lexicons (Faruqui et al. 2015)
- Reference: Multi-sense Embedding with Semantic Lexicons (Jauhar et al. 2015)
- Reference: Antonymy and Synonym Constraints for Word Embedding (Mrksic et al. 2016)
- Reference: Language Models as Knowledge Bases? (Petroni et al. 2019)
- Reference: How Can We Know What Language Models Know? (Jiang et al. 2019)
- Reference: AUTOPROMPT: Eliciting Knowledge from Language Models with Automatically Generated Prompts (Shin et al. 2020)
- Reference: GPT Understands, Too (Liu et al. 2021)
- Reference: How Much Knowledge Can You Pack Into the Parameters of a Language Model? (Roberts et al. 2020)
- Reference: X-FACTR: Multilingual Factual Knowledge Retrieval from Pretrained Language Models (Jiang et al. 2020)
- Reference: REALM: Retrieval-Augmented Language Model Pre-Training (Guu et al. 2020)
- Reference: Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks (Lewis et al. 2020)
- Reference: Multi-hop Reasoning in LMs (Jiang et al. 2022)
Multilingual NLP (Nov 14)
Content:
- Multilingual NLP
Slides: Multilingual NLP Slides
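Since several MT metrics appear in the reading list, here is how corpus-level BLEU and chrF are commonly computed with sacrebleu; the hypothesis and reference strings are toy examples.

```python
# Corpus-level BLEU and chrF with sacrebleu.
import sacrebleu

hyps = ["the cat is on the mat", "he reads a book"]
refs = [["the cat sits on the mat", "he is reading a book"]]   # one reference stream

print(sacrebleu.corpus_bleu(hyps, refs).score)
print(sacrebleu.corpus_chrf(hyps, refs).score)
```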
References
- Reference: Google’s Multilingual Translation System (Johnson et al. 2016)
- Reference: Beto, Bentz, Becas: The Surprising Cross-Lingual Effectiveness of BERT (Wu and Dredze 2019)
- Reference: Unsupervised Cross-lingual Representation Learning at Scale (Conneau et al. 2019)
- Reference: Massively Multilingual NMT (Aharoni et al. 2019)
- Reference: Massively Multilingual Neural Machine Translation in the Wild: Findings and Challenges (Arivazhagan et al. 2019)
- Reference: Balancing Training for Multilingual Neural Machine Translation (Wang et al. 2020)
- Reference: Multi-task Learning for Multiple Language Translation (Dong et al. 2015)
- Reference: Multi-Way, Multilingual Neural Machine Translation with a Shared Attention Mechanism (Firat et al. 2016)
- Reference: BLEU Score (Papineni et al. 2002)
- Reference: chrF Score (Popović et al. 2015)
- Reference: COMET (Rei et al. 2020)
- Reference: GEMBA (Kocmi and Federmann 2023)
- Reference: NLLB (NLLB Team 2022)
- Reference: LASER3 Bitext Mining (Heffernan et al. 2022)
- Reference: ChatGPT MT (Robinson et al. 2023)
- Reference: Parameter Sharing Methods for Multilingual Self-Attentional Translation Models (Sachan and Neubig 2018)
- Reference: Contextual Parameter Generation for Universal Neural Machine Translation (Platanios et al. 2018)
- Reference: MAD-X: An Adapter-Based Framework for Multi-Task Cross-Lingual Transfer (Pfeiffer et al. 2020)
- Reference: Pre-training Multilingual Experts (Pfeiffer et al. 2022)
- Reference: Cross-lingual Language Model Pretraining (Lample and Conneau 2019)
- Reference: Unicoder: A Universal Language Encoder by Pre-training with Multiple Cross-lingual Tasks (Huang et al. 2019)
- Reference: Explicit Alignment Objectives (Hu et al. 2020)
- Reference: XTREME (Hu et al. 2020)
- Reference: MEGA (Ahuja et al. 2023)
- Reference: mT5 (Xue et al. 2020)
- Reference: Aya (Aryabumi et al. 2024)
- Reference: Tower (Alves et al. 2024)
- Reference: Rapid Adaptation to New Languages (Neubig and Hu 2018)
- Reference: Meta-learning for Low-resource Translation (Gu et al. 2018)
- Reference: How multilingual is Multilingual BERT? (Pires et al. 2019)
- Reference: Inducing Multilingual Text Analysis Tools via Robust Projection across Aligned Corpora (Yarowsky et al. 2001)
- Reference: Choosing Transfer Languages for Cross-Lingual Learning (Lin et al. 2019)
- Reference: Phonological Transfer for Entity Linking (Rijhwani et al. 2019)
- Reference: Handling Syntactic Divergence (Zhou et al. 2019)
- Reference: Support Vector Machine Active Learning with Applications to Text Classification (Tong and Koller 2001)
- Reference: Reducing labeling effort for structured prediction tasks (Culotta and McCallum 2005)
- Reference: Active Learning for Convolutional Neural Networks: A Core-Set Approach (Sener and Savarese 2017)
- Reference: A Little Annotation does a Lot of Good: A Study in Bootstrapping Low-resource Named Entity Recognizers (Chaudhary et al. 2019)
Safety and Security: Bias, Fairness and Privacy (Nov 19)
Content:
- Safety and Security: Bias, Fairness and Privacy
Slides: Safety and Security: Bias, Fairness and Privacy Slides
Inference Algorithms - Sean Welleck (Nov 21)
Guest Lecturer: Sean Welleck
Reading Material
- Highly Recommended Reading: Inference Algorithms Survey (Welleck et al. 2024)
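A small sketch of one basic inference algorithm covered by the survey, top-p (nucleus) sampling; the random logits stand in for a model's next-token distribution.

```python
# Top-p (nucleus) sampling: sample from the smallest token set whose mass exceeds p.
import torch
import torch.nn.functional as F

def top_p_sample(logits, p=0.9, temperature=1.0):
    probs = F.softmax(logits / temperature, dim=-1)
    sorted_probs, sorted_ids = probs.sort(descending=True)
    cumulative = sorted_probs.cumsum(dim=-1)
    keep = cumulative - sorted_probs < p          # keep tokens until mass p is reached
    filtered = sorted_probs * keep
    filtered = filtered / filtered.sum()
    return sorted_ids[torch.multinomial(filtered, 1)]

logits = torch.randn(50)                          # stand-in for a model's next-token logits
print(int(top_p_sample(logits)))
```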
Guest Lecture - Beidi Chen (Nov 26)
Guest Lecturer: Beidi Chen