Model Interpretation (4/28/2020)
This lecture will cover:
- Model interpretation methods
Reading Material:
- Required Reading (for quiz): Analysis Methods in Neural Language Processing: A Survey, to the end of Section 3 (Belinkov and Glass 2018)
- Reference: Understanding ConvNets (Karpathy 2016)
- Reference: Visualizing and understanding recurrent networks (Karpathy et al. 2015)
- Reference: LSTMVis: A tool for visual analysis of hidden state dynamics in recurrent neural networks (Strobelt et al. 2018)
- Reference: The mythos of model interpretability (Lipton 2016)
- Reference: Towards a rigorous science of interpretable machine learning (Doshi-Velez et al. 2017)
- Reference: Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission (Caruana et al. 2015)
- Reference: Fine-grained analysis of sentence embeddings using auxiliary prediction tasks (Adi et al. 2016)
- Reference: What you can cram into a single vector: Probing sentence embeddings for linguistic properties (Conneau et al. 2018) (a minimal probing sketch follows this reading list)
- Reference: Does string-based neural MT learn source syntax? (Shi et al. 2016)
- Reference: Why neural translations are the right length (Shi et al. 2016)
- Reference: SPINE: SParse Interpretable Neural Embeddings (Subramanian et al. 2018)
- Reference: Interpretable semantic vectors from a joint model of brain- and text-based meaning (Fyshe et al. 2014)
- Reference: Sparse Overcomplete Word Vector Representations (Faruqui et al. 2015)
- Reference: Rationalizing neural predictions (Lei et al. 2016)
- Reference: Extracting automata from recurrent neural networks using queries and counterexamples (Weiss et al. 2018)
- Reference: Measuring Compositionality in Representation Learning (Andreas 2019)
- Reference: Beyond word importance: Contextual decomposition to extract interactions from LSTMs (Murdoch et al. 2018)
- Reference: Colorless green recurrent networks dream hierarchically (Gulordava et al. 2018)
- Reference: Learning to generate reviews and discovering sentiment (Radford et al. 2017)
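For readers new to the probing literature cited above (Adi et al. 2016; Conneau et al. 2018), the following is a minimal, illustrative sketch of the auxiliary-prediction-task idea, not code from the lecture: freeze a sentence representation, then train a simple classifier to predict a linguistic property from it. The random "embeddings" and the length-bucket labels here are placeholders for a real encoder's outputs and a real probing dataset.

```python
# Hypothetical probing sketch: can a linear probe recover a property
# (here, a stand-in "sentence length bucket") from frozen embeddings?
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

n_sentences, dim = 1000, 128
# Placeholder for frozen sentence embeddings from a pretrained encoder.
embeddings = rng.normal(size=(n_sentences, dim))
# Placeholder for the probed linguistic property (5 classes).
labels = rng.integers(0, 5, size=n_sentences)

X_train, X_test, y_train, y_test = train_test_split(
    embeddings, labels, test_size=0.2, random_state=0
)

# The probe is deliberately simple (a linear classifier), so high accuracy
# suggests the property is linearly decodable from the representation.
probe = LogisticRegression(max_iter=1000)
probe.fit(X_train, y_train)
print(f"probing accuracy: {probe.score(X_test, y_test):.3f}")
```

With random placeholder embeddings the probe should perform near chance (about 0.2 for 5 classes); the papers above apply the same recipe to real encoder outputs and interpret above-chance accuracy as evidence that the representation encodes the property.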
Slides: Interpretation Slides