Course Schedule
Introduction
8/29 Class Introduction
Content:
- Introduction to Neural Networks
- Example Tasks and Their Difficulties
- What Neural Nets can Do To Help
Reading Material
- Highly Recommended: Goldberg Book Chapters 1-5 (this is a lot to read, but it covers basic concepts in neural networks that many people in the class may have covered already. If you're already familiar with neural nets, skim it. If not, please read carefully and ask the TAs/instructor lots of questions.)
- Reference: Deep Unordered Composition. (Iyyer et al.)
Slides: Class Intro Slides
Sample Code: Class Intro Code Examples
Lecture Video: Class Intro Lecture Video
8/31 A Simple (?) Exercise: Predicting the Next Word in a Sentence
Content:
- Computational Graphs
- Feed-forward Neural Network Language Models
- Measuring Model Performance: Likelihood and Perplexity
Reading Material
- Highly Recommended: Goldberg Book Chapters 8-9
- Reference: Goldberg Book Chapters 6-7 (because CS11-711 is a pre-requisite, I will assume you know most of this already, but it might be worth browsing for terminology, etc.)
- Reference: Maximum entropy (log-linear) language models. (Rosenfeld 1996)
- Reference: A Neural Probabilistic Language Model. (Bengio et al. 2003, JMLR)
- Reference: An Overview of Gradient Descent Algorithms. (Ruder 2016)
- Reference: The Marginal Value of Adaptive Gradient Methods. (Wilson et al. 2017)
- Reference: Stronger Baselines for Neural MT. (Denkowski and Neubig 2017)
- Reference: Using the Output Embedding. (Press and Wolf 2016)
Slides: LM Slides
Sample Code: LM Code Examples
Lecture Video: LM Lecture Video
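For concreteness, a minimal sketch (separate from the linked LM Code Examples) of how perplexity relates to average negative log-likelihood; the unigram probabilities and toy corpus below are made up purely for illustration.

```python
import math

# Toy unigram "language model"; the probabilities are made up for illustration.
unigram_probs = {"the": 0.5, "cat": 0.2, "sat": 0.2, "<unk>": 0.1}

def perplexity(tokens, probs):
    """Perplexity = exp(average negative log-likelihood per token)."""
    nll = sum(-math.log(probs.get(tok, probs["<unk>"])) for tok in tokens)
    return math.exp(nll / len(tokens))

corpus = "the cat sat on the cat".split()
print(perplexity(corpus, unigram_probs))  # lower is better
```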
Section 1: Models of Words
9/5 Distributional Semantics and Word Vectors
Content:
- Describing a word by the company that it keeps
- Counting and predicting
- Skip-grams and CBOW
- Evaluating/Visualizing Word Vectors
- Advanced Methods for Word Vectors
Reading Material
- Required Reading (for quiz): Goldberg Book Chapters 10-11
- Reference: WordNet
- Reference: Linguistic Regularities in Continuous Representations (Mikolov et al. 2013)
- Reference: t-SNE (van der Maaten and Hinton 2008)
- Reference: Visualizing w/ PCA vs. t-SNE (Derksen 2016)
- Reference: How to use t-SNE effectively (Wattenberg et al. 2016)
- Reference: Evaluating Word Embeddings (Schnabel et al. 2015)
- Reference: Morphology-based Embeddings (Luong et al. 2013)
- Reference: Character-based Embeddings (Ling et al. 2015)
- Reference: Subword-based Embeddings (Bojanowski et al. 2017)
- Reference: Multi-prototype Embeddings (Reisinger and Mooney 2010)
- Reference: Non-parametric Multi-prototype Embeddings (Neelakantan et al. 2014)
- Reference: Cross-lingual Embeddings (Faruqui et al. 2014)
- Reference: Retrofitting to Lexicons (Faruqui et al. 2015)
- Reference: Sparse Word Embeddings (Murphy et al. 2012)
- Reference: De-biasing Word Embeddings (Bolukbasi et al. 2016)
- Softmax Approximations: Negative Sampling, Hierarchical Softmax
- Parallel Training
- Tips for Training on GPUs
- Highly Recommended Reading: Notes on Noise Contrastive Estimation and Negative Sampling (Dyer 2014)
- Reference: Importance Sampling (Bengio and Senécal 2003)
- Reference: Noise Contrastive Estimation (Mnih and Teh 2012)
- Reference: Negative Sampling (Goldberg and Levy 2014)
- Reference: Mini-batching Sampling-based Softmax Approximations (Zoph et al. 2015)
- Reference: Class-based Softmax (Goodman 2001)
- Reference: Hierarchical Softmax (Morin and Bengio 2005)
- Reference: Error Correcting Codes (Dietterich and Bakiri 1995)
- Reference: Binary Code Prediction for Language (Oda et al. 2017)
- Bag of Words, Bag of n-grams, and Convolution
- Applications of Convolution: Context Windows and Sentence Modeling
- Stacked and Dilated Convolutions
- Structured Convolution
- Convolutional Models of Sentence Pairs
- Visualization for CNNs
- Required Reading (for quiz): Goldberg Book Chapter 13
- Reference: Time Delay Neural Networks (Waibel et al. 1989)
- Reference: Convolutional Neural Networks (LeCun et al. 1998)
- Reference: CNNs for Text (Collobert and Weston 2011)
- Reference: CNN for Modeling Sentences (Kalchbrenner et al. 2014)
- Reference: CNNs for Sentence Classification (Kim 2014)
- Reference: Dilated CNNs for Language Modeling (Kalchbrenner et al. 2016)
- Reference: Tree Convolution (Ma et al. 2015)
- Reference: Graph Convolution for Text (Marcheggiani and Titov 2017)
- Reference: Siamese Networks (Bromley et al. 1993)
- Reference: Convolutional Matching Model (Hu et al. 2014)
- Reference: Convolution + Sentence Pair Pooling (Yin and Schutze 2015)
- Reference: Understanding ConvNets (Karpathy 2016)
- Recurrent Networks
- Vanishing Gradient and LSTMs
- Strengths and Weaknesses of Recurrence in Sentence Modeling
- Pre-training for RNNs
- Required Reading (for quiz): Goldberg Book Chapter 14-15
- Other Reading: Goldberg Book Chapter 16 (will be covered in class)
- Reference: RNNs (Elman 1990)
- Reference: LSTM (Hochreiter and Schmidhuber 1997)
- Reference: Variants of LSTM (Greff et al. 2015)
- Reference: GRU (Cho et al. 2014)
- Reference: Pre-training RNNs (Dai and Le 2015)
- Reference: Visualizing Recurrent Nets (Karpathy et al. 2015)
- Reference: Learning Syntax from Translation (Shi et al. 2016)
- Reference: Learning Sentiment from LMs (Radford et al. 2017)
- Sentence Similarity
- Textual Entailment
- Paraphrase Identification
- Retrieval
- Highly Recommended Reading: Skip-thought Vectors (Kiros et al. 2015), good reference for several tasks
- Reference: Sentiment Treebank (Socher et al. 2013)
- Reference: Paraphrase Detection (Dolan et al. 2005)
- Reference: Paraphrase Detection w/ Matrix Factorization (Ji and Eisenstein 2013)
- Reference: Semantic Relatedness (Marelli et al. 2014)
- Reference: Manhattan LSTM (Mueller and Thyagarajan 2016)
- Reference: Recognizing Textual Entailment (Dagan et al. 2006)
- Reference: Stanford Natural Language Inference Corpus (Bowman et al. 2015)
- Reference: Multi-perspective Matching for NLI (Wang et al. 2017)
- Reference: Inference -> Generalization (Conneau et al. 2017)
- Reference: Text to Text Retrieval (Huang et al. 2013)
- Reference: Text to Image Retrieval (Socher et al. 2014)
- Reference: Resources on Locality Sensitive Hashing (Stack Overflow)
- Reference: Flickr 8k (Hodosh et al. 2013)
- Encoder-Decoder Models
- Conditional Generation and Search
- Ensembling
- Evaluation
- Types of Data to Condition On
- Required Reading (for quiz): Neural Machine Translation and Sequence-to-Sequence Models Chapter 7
- Reference: Recurrent Neural Translation Models (Kalchbrenner and Blunsom 2013)
- Reference: LSTM Encoder-Decoders (Sutskever et al. 2014)
- Reference: BLEU (Papineni et al. 2002)
- Reference: METEOR (Banerjee and Lavie 2005)
- Reference: Knowledge Distillation (Kim et al. 2016)
- Reference: Generation from Structured Data (Wen et al. 2015)
- Reference: Generation from Input+Tags (Zhou and Neubig 2017)
- Reference: Generation from Images (Karpathy and Li 2015)
- Reference: Generation from Recipe (Kiddon et al. 2016)
- Reference: Generation from TED Talks (Hoang et al. 2016)
- Attention
- What do We Attend To?
- Improvements to Attention
- Specialized Attention Varieties
- A Case Study: "Attention is All You Need"
- Required Reading (for quiz): Neural Machine Translation and Sequence-to-Sequence Models Chapter 8
- Reference: Attentional NMT (Bahdanau et al. 2015)
- Reference: Effective Approaches to Attention (Luong et al. 2015)
- Reference: Copying Mechanism (Gu et al. 2016)
- Reference: Attention-based Bias (Arthur et al. 2016)
- Reference: Attending to Images (Xu et al. 2015)
- Reference: Attending to Speech (Chan et al. 2015)
- Reference: Hierarchical Attention (Yang et al. 2016)
- Reference: Attending to Multiple Sources (Zoph and Knight 2015)
- Reference: Different Multi-source Strategies (Libovicky and Helcl 2017)
- Reference: Multi-modal Attention (Huang et al. 2016)
- Reference: Self Attention (Cheng et al. 2016)
- Reference: Attention is All You Need (Vaswani et al. 2017)
- Reference: Structural Biases in Attention (Cohn et al. 2015)
- Reference: Coverage Embedding Models (Mi et al. 2016)
- Reference: Interpretability w/ Hard Attention (Lei et al. 2016)
- Reference: Supervised Attention (Mi et al. 2016)
- Reference: Attention vs. Alignment (Koehn and Knowles 2016)
- Reference: Monotonic Attention (Yu et al. 2016)
- Reference: Convolutional Attention (Allamanis et al. 2016)
- The Structured Perceptron
- Structured Max-margin Objectives
- Simple Remedies to Exposure Bias
- Required Reading (for quiz): Goldberg Book Chapter 19-19.3
- Recommended Reading: Course in Machine Learning Chapter 17 (Daume)
- Reference: Conditional Random Fields (Lafferty et al. 2001)
- Reference: Structured Perceptron (Collins 2002)
- Reference: Structured Hinge Loss (Taskar et al. 2005)
- Reference: SEARN (Daume et al. 2006)
- Reference: DAgger (Ross et al. 2011)
- Reference: Dynamic Oracles (Goldberg and Nivre 2013)
- Reference: Training Neural Parsers w/ Dynamic Oracles (Ballesteros et al. 2016)
- Reference: Word Dropout (Gal and Ghahramani 2015)
- Reference: RAML (Norouzi et al. 2016)
- Why Local Independence Assumptions?
- Conditional Random Fields
- Required Reading (for quiz): Bidirectional LSTM-CRF Models for Sequence Tagging (Huang et al. 2015)
- Reference: Conditional Random Fields (Lafferty et al. 2001)
- Reference: An Introduction to CRFs (Sutton and McCallum 2011)
- Reference: Minimum Risk Training for Neural MT (Shen et al. 2016)
- Reference: Globally Normalized Networks (Andor et al. 2016)
- Reference: Reward Augmented Maximum Likelihood (Norouzi et al. 2016)
- Reference: Softmax Q-Distribution Estimation (Ma et al. 2016)
- Reference: End-to-end Sequence Labeling with BiLSTM-CNN-CRF (Ma et al. 2016)
- What is Transition-based Parsing?
- Shift-reduce Parsing w/ Feed-forward Nets
- Stack LSTM
- Transition-based Models for Phrase Structure
- A Simple Alternative: Linearized Trees
- Required Reading (for quiz): Dependency Parsing Jurafsky and Martin Chapter 14.1-14.4
- Reference: Shift-reduce Parsing (Yamada and Matsumoto 2003)
- Reference: Shift-reduce Parsing (Nivre 2003)
- Reference: Feature Engineering for Parsing (Zhang and Nivre 2011)
- Reference: Feed-forward Dependency Parsing (Chen and Manning 2014)
- Reference: Recursive RNNs (Socher et al. 2011)
- Reference: Tree-structured LSTM (Tai et al. 2015)
- Reference: Stack LSTM Dependency Parsing (Dyer et al. 2015)
- Reference: Shift-reduce Phrase Structure Parsing (Watanabe et al. 2015)
- Reference: Recurrent Neural Network Grammars (Dyer et al. 2016)
- Reference: Linearized Trees (Vinyals et al. 2015)
- What is Graph-based Parsing?
- Minimum Spanning Tree Parsing
- Structured Training and Other Improvements
- Dynamic Programming Methods for Phrase Structure Parsing
- Required Reading (for quiz): Graph-based Dependency Parsing Jurafsky and Martin Chapter 14.5-14.6
- Reference: Eisner Algorithm (Eisner 1996)
- Reference: Large-margin Training of Parsers (McDonald et al. 2005)
- Reference: Spanning-tree Algorithms (McDonald et al. 2005)
- Reference: Higher-order Dependency Parsing (Zhang and McDonald 2012)
- Reference: Graph-based Parsing w/ Neural Nets (Pei et al. 2015)
- Reference: BiLSTM Features for Graph-based Parsing (Kiperwasser and Goldberg 2016)
- Reference: Deep Bi-affine Attention (Dozat and Manning 2017)
- Reference: Probabilistic Parsing w/ Matrix Tree Theorem (Koo et al. 2007)
- Reference: Neural Probabilistic Parser (Ma and Hovy 2017)
- Reference: Neural CRF Parsing (Durrett and Klein 2015)
- Reference: Span-based Constituency Parsing (Stern et al. 2017)
- Reference: Inside-outside Recurrent Networks (Le and Zuidema 2014)
- Reference: Parsing as Language Modeling (Choe and Charniak 2016)
- Reference: Disentangling Reranking Effects (Fried et al. 2017)
- Combinatory Categorial Grammar and Lambda Calculus
- Graph-based Models of Semantics
- Shallow Semantics: Semantic Role Labeling
- Recommended Reading (no quiz this class, but material may be useful anyway): Jurafsky and Martin Chapters 17 and 18
- Reference: Geoquery (Zelle and Mooney 1996)
- Reference: Free917 (Cai and Yates 2013)
- Reference: Robocup (Wong and Mooney 2006)
- Reference: If This Then That (Quirk et al. 2015)
- Reference: Hearthstone Dataset/Latent Predictor Networks (Ling et al. 2016)
- Reference: Django Dataset (Oda et al. 2015)
- Reference: Sequence-to-sequence Semantic Parsing (Jia and Liang 2016)
- Reference: Sequence-to-tree Parsing (Dong and Lapata 2016)
- Reference: Interfacing with FreeBase (Dong et al. 2015)
- Reference: Learning from Weak Supervision (Guu et al. 2017)
- Reference: Syntax for Code Generation (Yin and Neubig 2017)
- Reference: Abstract Meaning Representation (Banarescu et al. 2013)
- Reference: Minimal Recursion Semantics (Copestake et al. 2005)
- Reference: Universal Conceptual Cognitive Annotation (Abend and Rappoport 2013)
- Reference: Dependency->Semantics (Reddy et al. 2017)
- Reference: CCG->Semantics (Zettlemoyer and Collins 2005)
- Reference: Supertagging w/ LSTMs (Vaswani et al. 2016)
- Reference: Neural Parsing for AMR (Damonte et al. 2017)
- Reference: Neural Parsing for AMR (Peng et al. 2017)
- Reference: Graph Parsing w/ Linearized Trees (Buys and Blunsom 2017)
- Reference: Graph Parsing w/ "Remote" Transition (Hershcovich et al. 2017)
- Reference: Semantic Role Labeling (Gildea and Jurafsky 2002)
- Reference: Neural Semantic Role Labeling (He et al. 2017)
- Generative vs. Discriminative, Deterministic vs. Random Variables
- Variational Autoencoders
- Handling Discrete Latent Variables
- Examples of Variational Autoencoders in NLP
- Required Reading (for quiz): Tutorial on Variational Auto-encoders (Doersch 2016)
- Reference: Variational Auto-encoders (Kingma and Welling 2014)
- Reference: Generating Sentences from a Continuous Space (Bowman et al. 2016)
- Reference: Problems w/ Optimizing Latent Variables (Chen et al. 2017)
- Reference: Convolution Decoders for VAE (Yang et al. 2017)
- Reference: Concrete Distribution (Maddison et al. 2017)
- Reference: Gumbel-Softmax (Jang et al. 2017)
- Reference: Variational Inference for Text Processing (Miao et al. 2016)
- Reference: Controllable Text Generation w/ VAE (Hu et al. 2017)
- Reference: Multi-space Variational Encoder-Decoders (Zhou and Neubig 2017)
- Reference: Recurrent Latent Variable Models (Chung et al. 2015)
- Reference: Language as a Latent Variable (Miao and Blunsom 2016)
- Reference: Emergence of Language in Multi-agent Games (Havrylov and Titov 2017)
- Reference: Natural Language Does Not Emerge Naturally (Kottur et al. 2017)
- What is Reinforcement Learning?
- Policy Gradient and REINFORCE
- Stabilizing Reinforcement Learning
- Value-based Reinforcement Learning
- Required Reading (for quiz): Deep Reinforcement Learning Tutorial (Karpathy 2016)
- Other Useful Reading: Reinforcement Learning Textbook (Sutton and Barto 2016)
- Reference: REINFORCE (Williams 1992)
- Reference: Co-training (Blum and Mitchell 1998)
- Reference: Adding Baselines (Dayan 1990)
- Reference: Sequence-level Training for RNNs (Ranzato et al. 2016)
- Reference: Experience Replay (Lin 1993)
- Reference: Neural Q Learning (Tesauro 1995)
- Reference: Intrinsic Reward (Schmidhuber 1991)
- Reference: Intrinsic Reward for Atari (Bellemare et al. 2016)
- Reference: Reinforcement Learning for Dialog (Young et al. 2013)
- Reference: End-to-end Neural Task-based Dialog (Williams and Zweig 2016)
- Reference: Neural Chat Dialog (Li et al. 2016)
- Reference: User Simulation for Learning in Dialog (Schatzmann et al. 2007)
- Reference: RL for Mapping Instructions to actions (Branavan et al. 2009)
- Reference: Deep RL for Mapping Instructions to Actions (Misra et al. 2017)
- Reference: RL for Text-based Games (Narasimhan et al. 2015)
- Reference: Incremental Prediction in MT (Grissom et al. 2014)
- Reference: Incremental Neural MT (Gu et al. 2017)
- Reference: RL for Information Retrieval (Narasimhan et al. 2016)
- Reference: RL for Query Reformulation (Nogueira and Cho 2017)
- Reference: RL for Coarse-to-fine Question Answering (Choi et al. 2017)
- Reference: RL for Learning Neural Network Structure (Zoph and Le 2016)
- (Generative) Adversarial Networks
- Where to use the Adversary?: Features vs. Outputs
- GANs on Discrete Outputs
- Adversaries on Discrete Inputs
- Required Reading (for quiz): GAN Tutorial (Goodfellow 2017)
- Reference: Generative Adversarial Nets (Goodfellow et al. 2014)
- Reference: Example of Fuzzy Outputs (Lotter et al. 2015)
- Reference: Improved Techniques for Training GANs (Salimans et al. 2016)
- Reference: SeqGAN (Yu et al. 2016)
- Reference: MT w/ GAN (Yang et al. 2017)
- Reference: MT w/ GAN (Wu et al. 2017)
- Reference: MT w/ Gumbel-Greedy Decoding (Gu et al. 2017)
- Reference: Dialog w/ GAN (Li et al. 2017)
- Reference: Perturbing Embeddings (Miyato et al. 2016)
- Reference: Adversarial Feature Learning for Domain Adaptation (Ganin et al. 2016)
- Reference: Adversarial Feature Learning for Bilingual Classification (Chen et al. 2016)
- Reference: Adversarial Feature Learning for Multilingual MT (Xie et al. 2017)
- Reference: Adversarial Feature Learning for Multi-task Classification (Liu et al. 2017)
- Reference: Adversarial Adaptation using Synthetic Data (Kim et al. 2017)
- Reference: Adversarial Feature Learning for Implicit Relation Classification (Qin et al. 2017)
- Reference: Professor Forcing (Lamb et al. 2016)
- Reference: Unsupervised Style Transfer for Text (Shen et al. 2017)
- Learning Features vs. Learning Structure
- Semi-supervised Learning Methods
- Unsupervised Learning Methods
- Design Decisions for Unsupervised Models
- Examples of Unsupervised Learning
- Interesting Reading (not required, no quiz) Linguistic Structure Prediction Chapter 4
- Reference: Unsupervised POS Induction w/ Word Embeddings (Lin et al. 2015)
- Reference: Unsupervised Neural Hidden Markov Models (Tran et al. 2016)
- Reference: Extracting Automata from RNNs (Giles et al. 1992)
- Reference: CRF Autoencoders (Ammar et al. 2014)
- Reference: Semi-supervised Prediction w/ Neural CRF Autoencoders (Zhang et al. 2017)
- Reference: Gated Convolution (Cho et al. 2014)
- Reference: Learning Grammar with RL (Yogatama et al. 2016)
- Reference: Learning to Compose Task-specific Tree Structures (Choi et al. 2017)
- Reference: Parsing w/ a Semantic Objective (Williams et al. 2017)
- Reference: What do RNN Grammars Learn About Syntax? (Kuncoro et al. 2017)
- Reference: Dependency Model with Valence (Klein and Manning 2004)
- Reference: Unsupervised Neural Dependency Parsing (Jiang et al. 2016)
- Reference: CRF Autoencoders for Unsupervised Dependency Parsing (Cai et al. 2017)
- Reference: Learning Language-level Features (Malaviya et al. 2017)
- Reference: Embedded Segmental k-means Models (Kamper et al. 2017)
- Reference: Speech Segmentation (Elsner and Shain 2017)
- Reference: Word Discovery w/ Encoder-decoder Models (Boito et al. 2017)
- Models of Coreference
- Discourse Parsing
- Required Reading (for quiz): 15 Years in Co-reference (Ng 2010)
- Reference: End-to-end Neural Coreference Resolution (Lee et al. 2017)
- Reference: Deep Reinforcement Learning for Entity Ranking (Clark and Manning 2016)
- Reference: Entity-level Representations (Clark and Manning 2016)
- Reference: Global Features for Coreference (Wiseman et al. 2016)
- Reference: Anaphoricity and Antecedent Features (Wiseman et al. 2015)
- Reference: Coref, success and challenges (Ng 2016)
- Reference: Discourse-driven LMs (Peng and Roth 2016)
- Reference: Sentence-level LSTMs for Script Inference (Pichotta and Mooney 2016)
- Reference: Easy Victories and Uphill Battles (Durrett and Klein 2013)
- Reference: Solving Hard Coreference Problems (Peng et al. 2015)
- Reference: Entity-centric Coref (Clark and Manning 2015)
- Reference: Modular Entity-centric Model (Haghighi and Klein 2010)
- Reference: Discourse Structure for Text Categorization (Ji and Smith 2017)
- Reference: Adversarial Implicit Discourse Relation Classification (Qin et al. 2017)
- Reference: Recursive Deep Models for Discourse (Li et al. 2014)
- Reference: Attention-based Hierarchical Discourse (Li et al. 2016)
- Reference: Representation Learning for Text-level Discourse (Ji and Eisenstein 2014)
- Reference: Pay Attention to the Ending (Cai et al. 2017)
- Reference: Discourse Language Models (Chaturvedi et al. 2017)
- Chat-based Dialog
- Task-based Dialog
- Interesting Reading (no quiz): Dialog Systems and Chatbots Jurafsky and Martin Chapter 29
- Reference: Data-driven Dialog Response Generation (Ritter et al. 2011)
- Reference: Neural Dialog Response Generation (Sordoni et al. 2015)
- Reference: Neural Dialog Response Generation (Shang et al. 2015)
- Reference: Neural Dialog Response Generation (Vinyals and Le 2015)
- Reference: Context is Helpful for MT (Matsuzaki et al. 2015)
- Reference: Context is Not So Helpful for MT (Jean et al. 2017)
- Reference: Hierarchical Model for Dialog Generation (Serban et al. 2016)
- Reference: Discourse-level VAE (Zhao et al. 2017)
- Reference: Diversity Promoting Objective (Li et al. 2016)
- Reference: How Not to Evaluate your Dialog System (Liu et al. 2016)
- Reference: DeltaBLEU (Galley et al. 2015)
- Reference: Adversarial Evaluation (Li et al. 2017)
- Reference: Automatic Turing Test (Lowe et al. 2017)
- Reference: Personality Generation for Dialog (Mairesse et al. 2007)
- Reference: Persona-based Neural Dialog (Li et al. 2016)
- Reference: Dialog Response Retrieval (Lee et al. 2009)
- Reference: Neural Dialog Response Retrieval (Nio et al. 2014)
- Reference: Smart Reply (Kannan et al. 2016)
- Reference: Language Generation for Dialog (Wen et al. 2015)
- Reference: Neural Nets for Spoken Language Understanding (Mesnil et al. 2015)
- Reference: Dialog State Tracking (Williams et al. 2013)
- Reference: Neural Dialog State Tracking (Henderson et al. 2014)
- Reference: End-to-end Dialog Control (Williams et al. 2017)
- What are Knowledge Graphs/Ontologies?
- Relation Extraction from Embeddings
- Learning Embeddings from Relations
- Required Reading (for quiz): Relation Extraction Jurafsky and Martin Chapter 21.2
- Reference: Relation Extraction Survey (Nickel et al. 2016)
- Reference: WordNet (Miller 1995)
- Reference: Cyc (Lenat 1995)
- Reference: DBPedia (Auer et al. 2007)
- Reference: YAGO (Suchanek et al. 2007)
- Reference: Babelnet (Navigli and Ponzetto 2010)
- Reference: Freebase (Bollacker et al. 2008)
- Reference: Wikidata (Vrandečić and Krötzsch 2014)
- Reference: Relation Extraction by Translating Embeddings (Bordes et al. 2013)
- Reference: Relation Extraction with Neural Tensor Networks (Socher et al. 2013)
- Reference: Relation Extraction by Translating on Hyperplanes (Wang et al. 2014)
- Reference: Relation Extraction by Representing Entities and Relations (Lin et al. 2015)
- Reference: Relation Extraction w/ Decomposed Matrices (Xie et al. 2017)
- Reference: Distant Supervision for Relation Extraction (Mintz et al. 2009)
- Reference: Relation Classification w/ Recursive NNs (Socher et al. 2012)
- Reference: Relation Classification w/ CNNs (Zeng et al. 2014)
- Reference: Joint Entity and Relation Embedding (Toutanova et al. 2015)
- Reference: Distant Supervision for Neural Models (Luo et al. 2017)
- Reference: Relation Extraction w/ Tensor Decomposition (Sutskever et al. 2009)
- Reference: Relation Extraction via KG Paths (Lao and Cohen 2010)
- Reference: Relation Extraction by Traversing Knowledge Graphs (Guu et al. 2015)
- Reference: Relation Extraction via Differentiable Logic Rules (Yang et al. 2017)
- Reference: Improving Embeddings w/ Semantic Knowledge (Yu et al. 2014)
- Reference: Retrofitting Word Vectors to Semantic Lexicons (Faruqui et al. 2015)
- Reference: Multi-sense Embedding with Semantic Lexicons (Jauhar et al. 2015)
- Reference: Antonymy and Synonym Constraints for Word Embedding (Mrksic et al. 2016)
- TBD
- No quiz
- Reference: MCTest (Richardson et al. 2013)
- Reference: RACE (Lai et al. 2017)
- Reference: SQuAD (Rajpurkar et al. 2016)
- Reference: TriviaQA (Joshi et al. 2017)
- Reference: Teaching Machines to Read and Comprehend (Hermann et al. 2015)
- Reference: Attention Sum (Kadlec et al. 2016)
- Reference: Attention over Attention (Cui et al. 2017)
- Reference: Bidirectional Attention Flow (Seo et al. 2017)
- Reference: Dynamic Coattention Networks (Xiong et al. 2017)
- Reference: Gated Attention Readers (Dhingra et al. 2017)
- Reference: Memory Networks (Weston et al. 2015)
- Reference: End-to-end Memory Networks (Sukhbaatar et al. 2015)
- Reference: Dynamic Memory Networks (Kumar et al. 2016)
- Reference: Learning to Stop Reading (Shen et al. 2017)
- Reference: Coarse-to-fine Question Answering (Choi et al. 2017)
- Reference: bAbI Dataset (Weston et al. 2015)
- Reference: NLP in Prolog (Pereira and Shieber 2002), Example Code
- Reference: A Thorough Examination of the CNN/Daily Mail Task (Chen et al. 2016)
- Reference: Adversarial Examples in SQuAD (Jia and Liang 2017)
- Identifying problems
- Debugging training time problems
- Debugging test time problems
- Interesting Reading
- Reference: Highway Networks (Srivastava et al. 2015)
- Reference: Residual Connections (He et al. 2015)
- Reference: Rethinking Generalization (Zhang et al. 2017)
- Reference: Marginal Value of Adaptive Gradient Methods (Wilson et al. 2017)
- Reference: Adam w/ Learning Rate Decay (Denkowski and Neubig 2017)
- Reference: Dropout (Srivastava et al. 2014)
- Reference: Recurrent Dropout (Gal and Ghahramani 2015)
- Reference: Minibatch Creation Strategies (Morishita et al. 2017)
- Reference: Decoding Problems (Koehn and Knowles 2017)
- Beam Search
- A* Search
- Search w/ Future Costs
- No Quiz
- Reference: Google’s Neural Machine Translation System (Wu et al. 2016)
- Reference: Multinomial Length Normalization (Eriguchi et al. 2016)
- Reference: Average Length Normalization (Cho et al. 2014)
- Reference: Mutual Information and Diverse Decoding (Li et al. 2016)
- Reference: Generating High-Quality and Informative Conversation Responses (Shao et al. 2017)
- Reference: Effective Inference for Generative Neural Parsing (Stern et al. 2017)
- Reference: Beam-Search Optimization (Wiseman et al. 2016)
- Reference: Continuous Beam Search (Goyal et al. 2017)
- Reference: A* Parsing (Klein et al. 2003)
- Reference: LSTM CCG Parsing (Lewis et al. 2014)
- Reference: Global Neural CCG Parsing (Lee et al. 2016)
- Reference: Learning to Decode for Future Success (Li et al. 2017)
- Reference: Generative Transition-based Dependency Parsing (Buys et al. 2015)
- Reference: Recurrent Neural Network Grammars (Dyer et al. 2016)
- Reference: Monte Carlo Tree Search (Kumagai et al. 2017)
- What is Multi-task Learning?
- Methods for Multi-task Learning
- Multi-task Objectives for NLP
- Required Reading (for quiz): Multi-task Learning in Neural Networks and Multi-task Objectives for NLP (Ruder 2017)
- Reference: Natural Language Processing from Scratch (Collobert et al. 2011)
- Reference: Regularization Techniques (Barone et al. 2017)
- Reference: Word Representations (Turian et al. 2010)
- Reference: Semi-supervised Sequence Learning (Dai and Le 2015)
- Reference: Gaze Prediction + Summarization (Klerke et al. 2016)
- Reference: Selective Transfer (Zoph et al. 2016)
- Reference: Soft Parameter Tying (Duong et al. 2015)
- Reference: Translation-based Encoder Pretraining (McCann et al. 2017)
- Reference: Bidirectional Language Model Pretraining (Peters et al. 2017)
- Reference: Pre-training for MT (Luong et al. 2015)
- Reference: Domain Adaptation via Feature Augmentation (Kim et al. 2016)
- Reference: Feature Augmentation w/ Tags (Chu et al. 2017)
- Reference: Unsupervised Adaptation (Long et al. 2015)
- Reference: Multilingual MT (Johnson et al. 2017)
- Reference: Multilingual MT (Ha et al. 2016)
- Reference: Teacher-student Multilingual NMT (Chen et al. 2017)
- Reference: Multiple Annotation Standards for Semantic Parsing (Peng et al. 2017)
- Reference: Multiple Annotation Standards for Word Segmentation (Chen et al. 2017)
- Reference: Modeling Annotator Variance (Guan et al. 2017)
- Reference: Different Layers for Different Tasks (Hashimoto et al. 2017)
- Reference: Polyglot Language Models (Tsvetkov et al. 2016)
- Reference: Many Languages One Parser (Ammar et al. 2016)
- Reference: Multilingual Relation Extraction (Lin et al. 2017)
Slides: Word Embedding Slides
Sample Code: Word Embedding Code Examples
Lecture Video: Word Embedding Lecture Video
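As a companion to the skip-gram/CBOW and evaluation topics above, a small illustrative sketch (separate from the linked Word Embedding Code Examples): it extracts the (center, context) pairs that skip-gram training uses and compares two vectors with cosine similarity. It assumes numpy, and the embeddings are random stand-ins rather than trained vectors.

```python
import numpy as np

def skipgram_pairs(tokens, window=2):
    """Yield the (center, context) pairs used to train skip-gram models."""
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                yield center, tokens[j]

def cosine(u, v):
    """Cosine similarity, the usual way to compare word vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

sent = "you shall know a word by the company it keeps".split()
print(list(skipgram_pairs(sent, window=1))[:4])

# Random stand-in embeddings; real ones would come from training.
rng = np.random.default_rng(0)
emb = {w: rng.normal(size=16) for w in set(sent)}
print(cosine(emb["word"], emb["company"]))
```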
9/7 Why is word2vec So Fast?: Speed Tricks for Neural Nets
Content:
(Guest Lecture: Taylor Berg-Kirkpatrick)
Reading Material
Slides: Efficiency Slides
Sample Code: Efficiency Code Examples
Lecture Video: Efficiency Lecture Video
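A minimal sketch of the negative-sampling objective covered in this lecture, one of the tricks that makes word2vec fast. It assumes numpy, the vectors are random stand-ins for real parameters, and it is illustrative only (not the linked Efficiency Code Examples).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neg_sampling_loss(v_center, v_context, v_negatives):
    """Negative sampling for one (center, context) pair:
    -log sigma(c.w) - sum_k log sigma(-n_k.w).
    Replaces the full softmax over the vocabulary with k+1 binary decisions."""
    pos = -np.log(sigmoid(np.dot(v_context, v_center)))
    neg = -np.sum(np.log(sigmoid(-v_negatives @ v_center)))
    return pos + neg

rng = np.random.default_rng(0)
d, k = 32, 5  # embedding size, number of negative samples
print(neg_sampling_loss(rng.normal(size=d), rng.normal(size=d),
                        rng.normal(size=(k, d))))
```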
Section 2: Models of Sentences
9/12 Convolutional Networks for Text
Content:
Reading Material
Slides: CNN Slides
Sample Code: CNN Code Examples
Lecture Video: CNN Lecture Video
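To make convolution and pooling over text concrete, a small numpy sketch (separate from the linked CNN Code Examples) of max-over-time pooling across n-gram windows in the style of Kim (2014); the dimensions and weights are arbitrary stand-ins.

```python
import numpy as np

def text_cnn_features(embeddings, filters, bias):
    """Slide each filter over every n-gram window of the sentence and
    max-pool over positions ("max-over-time" pooling)."""
    n, d = embeddings.shape           # sentence length x embedding dim
    f, width, _ = filters.shape       # num filters x window size x embedding dim
    windows = np.stack([embeddings[i:i + width].ravel()
                        for i in range(n - width + 1)])          # one row per window
    scores = np.tanh(windows @ filters.reshape(f, -1).T + bias)  # window x filter
    return scores.max(axis=0)         # one feature per filter

rng = np.random.default_rng(0)
emb = rng.normal(size=(7, 10))        # 7 words, 10-dim embeddings
filt = rng.normal(size=(4, 3, 10))    # 4 filters over trigram windows
print(text_cnn_features(emb, filt, np.zeros(4)).shape)  # -> (4,)
```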
9/14 Recurrent Networks for Sentence or Language Modeling
Content:
Reading Material
Slides: RNN Slides
Sample Code: RNN Code Examples
Lecture Video: RNN Lecture Video
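A minimal numpy sketch of the recurrence this lecture builds on, using the simple (Elman) RNN update rather than an LSTM; the weights and inputs are random stand-ins, and it is separate from the linked RNN Code Examples.

```python
import numpy as np

def elman_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One Elman RNN step: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b)."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

rng = np.random.default_rng(0)
d_in, d_hid = 8, 16
W_xh = rng.normal(size=(d_hid, d_in))
W_hh = rng.normal(size=(d_hid, d_hid))
b_h = np.zeros(d_hid)

# Run the recurrence over a toy sequence; the final state summarizes it.
h = np.zeros(d_hid)
for x_t in rng.normal(size=(5, d_in)):
    h = elman_step(x_t, h, W_xh, W_hh, b_h)
print(h.shape)
```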
9/19 Using/Evaluating Sentence Representations
Content:
Reading Material
Slides: Sentence Representation Slides
Sample Code: Sentence Representation Code Examples
Lecture Video: Sentence Representation Video
Section 3: Sequence-to-sequence Models
9/21 Conditioned Generation
Content:
Reading Material
Slides: Conditional LM Slides
Sample Code: Conditional LM Code Examples
Lecture Video: Conditional LM Lecture Video
9/26 Attention
Content:
Reading Material
Slides: Attention Slides
Sample Code: Attention Code Examples
Lecture Video: Attention Lecture Video
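The core attention computation fits in a few lines; the numpy sketch below uses plain dot-product attention with random stand-ins for the encoder states and the decoder query, and is illustrative only (not the linked Attention Code Examples).

```python
import numpy as np

def attention(query, keys, values):
    """Dot-product attention: softmax(query . keys) gives a weight per source
    position; the output is the weighted sum of the values."""
    scores = keys @ query
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                    # softmax over source positions
    return weights @ values, weights

rng = np.random.default_rng(0)
src_len, d = 6, 8
keys = values = rng.normal(size=(src_len, d))   # stand-in encoder states
query = rng.normal(size=d)                      # stand-in decoder state
context, weights = attention(query, keys, values)
print(weights.round(2), context.shape)
```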
Section 4: Structured Prediction Models
9/28 Search-based Structured Prediction
Content:
Slides: Structured Prediction Slides
Sample Code: Structured Prediction Code Examples
Lecture Video: Structured Prediction Lecture Video
10/3 Structured Prediction with Local Independence Assumptions
Content:
Slides: CRF Slides
Lecture Video: CRF Video
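A linear-chain CRF needs the log partition function, computed with the forward algorithm; below is a minimal numpy sketch of that computation. The emission and transition scores are random stand-ins, and start/stop transitions are omitted for brevity.

```python
import numpy as np

def logsumexp(a, axis=None):
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)

def crf_log_partition(emissions, transitions):
    """Forward algorithm for a linear-chain CRF: log of the sum of exp(score)
    over all tag sequences. emissions is (T, num_tags); transitions[i, j]
    scores tag i followed by tag j."""
    alpha = emissions[0]                       # log-scores of length-1 prefixes
    for t in range(1, len(emissions)):
        alpha = logsumexp(alpha[:, None] + transitions, axis=0) + emissions[t]
    return logsumexp(alpha)

rng = np.random.default_rng(0)
T, K = 5, 3  # sequence length, tag set size
print(crf_log_partition(rng.normal(size=(T, K)), rng.normal(size=(K, K))))
```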
Section 5: Syntactic/Semantic Parsing Models
10/5 Transition-based Parsing Models
Content:
Slides: Transition-based Parsing Slides
Sample Code: Transition-based Parsing Code Examples
Lecture Video: Transition-based Parsing Video
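The arc-standard transition system behind shift-reduce parsing can be illustrated without any neural scoring; the sketch below simply replays a given action sequence (the sentence and actions are made up) and leaves out the classifier that would predict the actions.

```python
def arc_standard(words, actions):
    """Replay arc-standard transitions: SHIFT moves a word from the buffer to
    the stack; LEFT/RIGHT pop a dependent and record a (head, dependent) arc."""
    stack, buffer, arcs = [], list(range(len(words))), []
    for act in actions:
        if act == "SHIFT":
            stack.append(buffer.pop(0))
        elif act == "LEFT":               # top of stack heads the word below it
            dep = stack.pop(-2)
            arcs.append((stack[-1], dep))
        elif act == "RIGHT":              # word below the top heads the top
            dep = stack.pop()
            arcs.append((stack[-1], dep))
    return [(words[h], words[d]) for h, d in arcs]

# Toy parse of "the cat sat down": "cat" heads "the"; "sat" heads "cat" and "down".
print(arc_standard("the cat sat down".split(),
                   ["SHIFT", "SHIFT", "LEFT", "SHIFT", "LEFT", "SHIFT", "RIGHT"]))
```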
10/10 Parsing with Dynamic Programs
Content:
Slides: DP Parsing Slides
Sample Code: DP Parsing Code Examples
Lecture Video: DP Parsing Lecture Video
10/12 Neural Semantic Parsing
Content:
Slides: Semantic Parsing Slides
Sample Code: Semantic Parsing Code Examples
Lecture Video: Semantic Parsing Lecture Video
Section 6: Advanced Learning Techniques
10/17 Latent Random Variable Models
Content:
Slides: Latent Variable Slides
Sample Code: Latent Variable Code Examples
Lecture Video: Latent Variable Lecture Video
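Two small pieces of the variational autoencoder objective, the reparameterization trick and the KL regularizer toward a standard normal, can be written down directly; the numpy sketch below uses made-up Gaussian parameters and omits the encoder/decoder networks (it is not the linked Latent Variable Code Examples).

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Reparameterization trick: sample z ~ N(mu, sigma^2) as mu + sigma * eps,
    keeping the sample differentiable with respect to mu and log_var."""
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, 1)), the regularizer in the VAE objective."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)

rng = np.random.default_rng(0)
mu, log_var = rng.normal(size=4), rng.normal(size=4)
print(reparameterize(mu, log_var, rng))
print(kl_to_standard_normal(mu, log_var))
```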
10/19 Reinforcement Learning
Slides: Reinforcement Learning Slides
Sample Code: Reinforcement Learning Code Examples
Lecture Video: Reinforcement Learning Lecture Video
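A minimal sketch of the REINFORCE (policy gradient) update with a baseline, applied to a toy three-armed bandit; the reward values, baseline, and learning rate are arbitrary illustrative choices, and only numpy is assumed.

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max())
    return e / e.sum()

def reinforce_gradient(logits, action, reward, baseline):
    """REINFORCE (score-function) gradient for a categorical policy:
    (reward - baseline) * d log pi(action) / d logits."""
    grad_log_pi = -softmax(logits)
    grad_log_pi[action] += 1.0          # gradient of log-softmax at the chosen action
    return (reward - baseline) * grad_log_pi

rng = np.random.default_rng(0)
logits = np.zeros(3)                    # toy 3-action policy
for _ in range(200):                    # toy bandit where action 2 pays off most
    a = rng.choice(3, p=softmax(logits))
    r = [0.1, 0.2, 1.0][a] + rng.normal(scale=0.05)
    logits += 0.1 * reinforce_gradient(logits, a, r, baseline=0.4)
print(softmax(logits).round(2))         # typically ends up favoring action 2
```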
10/24 Adversarial Networks
Content:
Slides: Adversarial Slides
Lecture Video: Adversarial Lecture Video
10/26 Semi-supervised and Unsupervised Learning of Structure
Slides: Unsupervised/Semi-supervised Slides
Lecture Video: Unsupervised/Semi-supervised Lecture Video
Section 7: Models of Documents and Discourse
10/31 Coreference and Discourse Parsing
Slides: Document-level Model Slides
Lecture Video: Document-level Model Lecture Video
11/2 Models of Dialog
Slides: Dialog Slides
Lecture Video: Dialog Lecture Video
Section 8: Neural Networks and Knowledge
11/7 Learning from/for Knowledge Graphs
Slides: Knowledge Graph Slides
Lecture Video: Knowledge Graph Video
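Embedding-based knowledge graph models such as TransE (the "Translating Embeddings" reference above) score a triple by how close head + relation lands to tail; the numpy sketch below shows only that scoring function, with random stand-in embeddings and hypothetical entity/relation names.

```python
import numpy as np

def transe_score(head, relation, tail):
    """TransE-style score: a plausible triple (h, r, t) should have h + r
    close to t, so a smaller distance means a higher (less negative) score."""
    return -np.linalg.norm(head + relation - tail)

rng = np.random.default_rng(0)
ent = {e: rng.normal(size=8) for e in ["CMU", "Pittsburgh", "Tokyo"]}
rel = {"located_in": rng.normal(size=8)}

# With trained embeddings the true triple would outscore the corrupted one;
# here the vectors are random, so this only illustrates the scoring function.
print(transe_score(ent["CMU"], rel["located_in"], ent["Pittsburgh"]))
print(transe_score(ent["CMU"], rel["located_in"], ent["Tokyo"]))
```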
11/9 Machine Reading w/ Neural Nets
Slides: Machine Reading Slides
Lecture Video: Machine Reading Video
11/14 Debugging Neural Nets (for NLP)
Slides: Debugging Slides
Section 9: Search, Multi-lingual and Multi-task Learning
11/16 Advanced Search Algorithms
Slides: Search Slides
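Beam search, one of the topics of this lecture, keeps only the k highest-scoring partial hypotheses at each step; below is a small generic sketch in which a fixed toy distribution stands in for a real decoder (the vocabulary and probabilities are made up).

```python
import numpy as np

def beam_search(step_log_probs, vocab, beam_size=3, max_len=4, eos="</s>"):
    """Generic beam search: extend each hypothesis with every vocabulary item,
    then keep only the beam_size best by cumulative log-probability.
    step_log_probs(prefix) must return one log-probability per vocab item."""
    beams = [([], 0.0)]                       # (tokens, cumulative log-prob)
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            if tokens and tokens[-1] == eos:  # finished hypotheses carry over
                candidates.append((tokens, score))
                continue
            for w, lp in zip(vocab, step_log_probs(tokens)):
                candidates.append((tokens + [w], score + lp))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
    return beams

vocab = ["a", "b", "</s>"]
toy_model = lambda prefix: np.log([0.5, 0.3, 0.2])  # same distribution at every step
for tokens, score in beam_search(toy_model, vocab):
    print(tokens, round(score, 2))
```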
11/21 Multi-task, Multi-lingual Learning Models
Slides: Multitask Slides
11/23 Thanksgiving -- NO CLASS
11/28 Multi-modal Learning
Content: (Guest Lecture: LP Morency)
Slides: Multimodal Slides