NLP Highlights

Allen Institute for Artificial Intelligence•soundcloud.com

**The podcast is currently on hiatus. For more active NLP content, check out the Holistic Intelligence Podcast linked below.** Welcome to the NLP highlights podcast, where we invite researchers to talk about their work in various areas in natural language processing. All views expressed belong to the hosts/guests, and do not represent their employers.

Last refreshed: January 3rd, 2026 at 4:41 AM ⓘ

Follow this podcast in the Metacast mobile app to refresh it and see new episodes.

Follow on

Apple Podcasts

Spotify

RSS

Podcasts are better in Metacast mobile app

Don't just listen to podcasts. Learn from them with transcripts, summaries, and chapters for every episode. Skim, search, and bookmark insights. Learn more

Episodes

54 - Simulating Action Dynamics with Neural Process Networks, with Antoine Bosselut

ICLR 2018 paper, by Antoine Bosselut, Omer Levy, Ari Holtzman, Corin Ennis, Dieter Fox, and Yejin Choi. This is not your standard NLP task. This work tries to predict which entities change state over the course of a recipe (e.g., ingredients get combined into a batter, so entities merge, and then the batter gets baked, changing location, temperature, and "cookedness"). We talk to Antoine about the work, getting into details about how the data was collected, how the model works, and what some pos...

Mar 26, 2018•36 min

53 - Classical Structured Prediction Losses for Sequence to Sequence Learning, with Sergey and Myle

NAACL 2018 paper, by Sergey Edunov, Myle Ott, Michael Auli, David Grangier, and Marc'Aurelio Ranzato, from Facebook AI Research In this episode we continue our theme from last episode on structured prediction, talking with Sergey and Myle about their paper. They did a comprehensive set of experiments comparing many prior structured learning losses, applied to neural seq2seq models. We talk about the motivation for their work, what turned out to work well, and some details about some of their los...

Mar 21, 2018•27 min

52 - Sequence-to-Sequence Learning as Beam-Search Optimization, with Sam Wiseman

EMNLP 2016 paper by Sam Wiseman and Sasha Rush. In this episode we talk with Sam about a paper from a couple of years ago on bringing back some ideas from structured prediction into neural seq2seq models. We talk about the classic problems in structured prediction of exposure bias, label bias, and locally normalized models, how people used to solve these problems, and how we can apply those solutions to modern neural seq2seq architectures using a technique that Sam and Sasha call Beam Search Opt...

Mar 15, 2018•23 min

51 - A Regularized Framework for Sparse and Structured Neural Attention, with Vlad Niculae

NIPS 2017 paper by Vlad Niculae and Mathieu Blondel. Vlad comes on to tell us about his paper. Attentions are often computed in neural networks using a softmax operator, which maps scalar outputs from a model into a probability space over latent variables. There are lots of cases where this is not optimal, however, such as when you really want to encourage a sparse attention over your inputs, or when you have additional structural biases that could inform the model. Vlad and Mathieu have develop...

Mar 12, 2018•17 min

50 - Cardinal Virtues: Extracting Relation Cardinalities from Text, with Paramita Mirza

ACL 2017 paper, by Paramita Mirza, Simon Razniewski, Fariz Darari, and Gerhard Weikum. There's not a whole lot of work on numbers in NLP, and getting good information out of numbers expressed in text can be challenging. In this episode, Paramita comes on to tell us about her efforts to use distant supervision to learn models that extract relation cardinalities from text. That is, given an entity and a relation in a knowledge base, like "Barack Obama" and "has child", the goal is to extract _how ...

Feb 14, 2018•27 min

49 - A Joint Sequential and Relational Model for Frame-Semantic Parsing, with Bishan Yang

EMNLP 2017 paper by Bishan Yang and Tom Mitchell. Bishan tells us about her experiments on frame-semantic parsing / semantic role labeling, which is trying to recover the predicate-argument structure from natural language sentences, as well as categorize those structures into a pre-defined event schema (in the case of frame-semantic parsing). Bishan had two interesting ideas here: (1) use a technique similar to model distillation to combine two different model structures (her "sequential" and "r...

Feb 05, 2018•27 min

48 - Incidental Supervision: Moving Beyond Supervised Learning, with Dan Roth

AAAI 2017 paper, by Dan Roth. In this episode we have a conversation with Dan about what he means by "incidental supervision", and how it's related to ideas in reinforcement learning and representation learning. For many tasks, there are signals you can get from seemingly unrelated data that will help you in making predictions. Leveraging the international news cycle to learn transliteration models for named entities is one example of this, as is the current trend in NLP of using language models...

Jan 29, 2018•28 min

47 - Dynamic integration of background knowledge in neural NLU systems, with Dirk Weißenborn

How should you incorporate background knowledge into a neural net? A lot of people have been thinking about this problem, and Dirk Weissenborn comes on to tell us about his work in this area. Paper is with Tomáš Kočiský and Chris Dyer. https://arxiv.org/abs/1706.02596

Jan 24, 2018•36 min

46 - Parsing with Traces, with Jonathan Kummerfeld

TACL 2017 paper by Jonathan K. Kummerfeld and Dan Klein. Jonathan tells us about his work on parsing algorithms that capture traces and null elements in sentence structure. We spend the first third of the conversation talking about what these are and why they are interesting - if you want to correctly handle wh-movement, or coordinating structures, or control structures, or many other phenomena that we commonly see in language, you really want to handle traces and null elements, but most current...

Jan 08, 2018•39 min

45 - Build It, Break It workshop, with Allyson Ettinger and Sudha Rao

How robust is your NLP system? High numbers on common datasets can be misleading, as most systems are easily fooled by small modifications that would not be hard for humans to understand. Allyson Ettinger, Sudha Rao, Hal Daumé III, and Emily Bender organized a workshop trying to characterize this issue, inviting participants to either build robust systems, or try to break them with targeted examples. Allyson and Sudha come on the podcast to talk about the workshop. We cover the motivation of the...

Jan 02, 2018•38 min

44 - Truly Low Resource NLP, with Anders Søgaard

Anders talks with us about his line of work on doing NLP in languages where you have no linguistic resources other than a Bible translation or other religious works. He and his students have developed methods for annotation projection for both part of speech tagging and dependency parsing, aggregating information from many languages to predict annotations for languages where you have no training data. We talk about low-resource NLP generally, then dive into the specifics of the annotation projec...

Dec 07, 2017•48 min

43 - Reinforced Video Captioning with Entailment Rewards, with Ramakanth and Mohit

EMNLP 2017 paper by Ramakanth Pasunuru and Mohit Bansal Ram and Mohit join us to talk about their work, which uses reinforcement learning to improve performance on a video captioning task. They directly optimize CIDEr, a popular image/video captioning metric, using policy gradient methods, then use a modified version of CIDEr that penalizes the model when it fails to produce a caption that is _entailed_ by the correct caption. In our discussion, we hit on what video captioning is, what typical m...

Dec 04, 2017•48 min

42 - Generating Sentences by Editing Prototypes, with Kelvin Guu

Paper is by Kelvin Guu, Tatsunori B. Hashimoto, Yonatan Oren, and Percy Liang In this episode, Kelvin tells us how to build a language model that starts from a prototype sentence instead of starting from scratch, enabling much more grammatical and diverse language modeling results. In the process, Kelvin gives us a really good intuitive explanation for how variational autoencoders work, we talk about some of the details of the model they used, and some of the implications of the work - can you u...

Nov 30, 2017•39 min

41 - Cross-Sentence N-ary Relation Extraction with Graph LSTMs, with Nanyun (Violet) Peng

TACL 2017 paper, by Nanyun Peng, Hoifung Poon, Chris Quirk, Kristina Toutanova, and Wen-tau Yih. Most relation extraction work focuses on binary relations, like (Seattle, located in, Washington), because extracting n-ary relations is difficult. Nanyun (Violet) and her colleagues came up with a model to extract n-ary relations, focusing on drug-mutation-gene interactions, using graph LSTMs (a construct pretty similar to graph CNNs, which was developed around the same time). Nanyun comes on the po...

Nov 10, 2017•35 min

40 - On the State of the Art of Evaluation in Neural Language Models, with Gábor Melis

Recent arxiv paper by Gábor Melis, Chris Dyer, and Phil Blunsom. Gábor comes on the podcast to tell us about his work. He performs a thorough comparison between vanilla LSTMs and recurrent highway networks on the language modeling task, showing that when both methods are given equal amounts of hyperparameter tuning, LSTMs perform better, in contrast to prior work claiming that recurrent highway networks perform better. We talk about parameter tuning, training variance, language model evaluation,...

Nov 07, 2017•30 min

39 - Organizing the SemEval task on scientific information extraction, with Isabelle Augenstein

Isabelle Augenstein was the lead organizer of SemEval 2017 task 10, on extracting keyphrases and relations from scientific publications. In this episode we talk about her experience organizing the task, how the task was set up, and what the result of the task was. We also talk about some related work Isabelle did on multi-task learning for keyphrase boundary detection. https://www.semanticscholar.org/paper/SemEval-2017-Task-10-ScienceIE-Extracting-Keyphras-Augenstein-Das/71007219617d0f5e2419c5c1...

Nov 01, 2017•32 min

38 - A Corpus of Natural Language for Visual Reasoning, with Alane Suhr

ACL 2017 best resource paper, by Alane Suhr, Mike Lewis, James Yeh, and Yoav Artzi Alane joins us on the podcast to tell us about the dataset, which contains images paired with natural language descriptions of the images, where the task is to decide whether the description is true or false. Alane tells us about the motivation for creating the new dataset, how it was constructed, the way they elicited complex language from crowd workers, and why the dataset is an interesting target for future res...

Oct 30, 2017•23 min

37 - On Statistical Significance, Training Variance, and Why Reporting Score Distributions Matters

In this episode we talk about a couple of recent papers that get at the issue of training variance, and why we should not just take the max from a training distribution when reporting results. Sadly, our current focus on performance in leaderboards only exacerbates these issues, and (in my opinion) encourages bad science. Papers: https://www.semanticscholar.org/paper/Reporting-Score-Distributions-Makes-a-Difference-P-Reimers-Gurevych/0eae432f7edacb262f3434ecdb2af707b5b06481 https://www.semantics...

Oct 24, 2017•13 min

36 - Attention Is All You Need, with Ashish Vaswani and Jakob Uszkoreit

NIPS 2017 paper. We dig into the details of the Transformer, from the "attention is all you need" paper. Ashish and Jakob give us some motivation for replacing RNNs and CNNs with a more parallelizable self-attention mechanism, they describe how this mechanism works, and then we spend the bulk of the episode trying to get their intuitions for _why_ it works. We discuss the positional encoding mechanism, multi-headed attention, trying to use these ideas to replace encoders in other models, and wha...

Oct 23, 2017•41 min

35 - Replicability Analysis for Natural Language Processing, with Roi Reichart

TACL 2017 paper by Rotem Dror, Gili Baumer, Marina Bogomolov, and Roi Reichart. Roi comes on to talk to us about how to make better statistical comparisons between two methods when there are multiple datasets in the comparison. This paper shows that there are more powerful methods available than the occasionally-used Bonferroni correction, and using the better methods can let you make stronger, statistically-valid conclusions. We talk a bit also about how the assumptions you make about your data...

Oct 19, 2017•31 min

34 - Translating Neuralese, with Jacob Andreas

ACL 2017 paper by Jacob Andreas, Anca D. Dragan, and Dan Klein. Jacob comes on to tell us about the paper. The paper focuses on multi-agent dialogue tasks, where two learning systems need to figure out a way to communicate with each other to solve some problem. These agents might be figuring out communication protocols that are very different from what humans would come up with in the same situation, and Jacob introduces some clever ways to figure out what the learned communication protocol look...

Oct 17, 2017•32 min

33 - Entity Linking via Joint Encoding of Types, Descriptions, and Context, with Nitish Gupta

EMNLP 2017 paper by Nitish Gupta, Sameer Singh, and Dan Roth. Nitish comes on to talk to us about his paper, which presents a new entity linking model that both unifies prior sources of information into a single neural model, and trains that model in a domain-agnostic way, so it can be transferred to new domains without much performance degradation. https://www.semanticscholar.org/paper/Entity-Linking-via-Joint-Encoding-of-Types-Descrip-Gupta-Singh/a66b6a3ac0aa9af6c178c1d1a4a97fd14a882353

Oct 16, 2017•24 min

32 - The Effect of Different Writing Tasks on Linguistic Style, with Roy Schwartz

CoNLL 2017 paper, by Roy Schwartz, Maarten Sap, Ioannis Konstas, Leila Zilles, Yejin Choi, and Noah A. Smith. Roy comes on to talk to us about the paper. They analyzed the ROCStories corpus, which was created with three separate tasks on Mechanical Turk. They found that there were enough stylistic differences between the text generated from each task that they could get very good performance on the ROCStories cloze task just by looking at the style, ignoring the information you're supposed to us...

Oct 10, 2017•24 min

31 - Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling

ICLR 2017 paper by Hakan Inan, Khashayar Khosravi, Richard Socher, presented by Waleed. The paper presents some tricks for training better language models. It introduces a modified loss function for language modeling, where producing a word that is similar to the target word is not penalized as much as producing a word that is very different to the target (I've seen this in other places, e.g., image classification, but not in language modeling). They also give theoretical and empirical justifica...

Oct 06, 2017•11 min

30 - Probabilistic Typology: Deep Generative Models of Vowel Inventories

Paper by Ryan Cotterell and Jason Eisner, presented by Matt. This paper won the best paper award at ACL 2017. It's also quite outside the typical focus areas that you see at NLP conferences, trying to build generative models of vowel vocabularies in languages. That means we give quite a bit of set up, to try to help someone not familiar with this area understand what's going on. That makes this episode quite a bit longer than a typical non-interview episode. https://www.semanticscholar.org/paper...

Oct 05, 2017•31 min

29 - Neural machine translation via binary code prediction, with Graham Neubig

ACL 2017 paper, by Yusuke Oda and others (including Graham Neubig) at Nara Institute of Science and Technology (Graham is now at Carnegie Mellon University). Graham comes on to talk to us about neural machine translation generally, and about this ACL paper in particular. We spend the first half of the episode talking about major milestones in neural machine translation and why it is so much more effective than previous methods (spoiler: stronger language models help a lot). We then talk about th...

Jul 14, 2017•38 min

28 - Data Programming: Creating Large Training Sets, Quickly

NIPS 2016 paper by Alexander Ratner and coauthors in Chris Ré's group at Stanford, presented by Waleed. The paper presents a method for generating labels for an unlabeled dataset by combining a number of weak labelers. This changes the annotation effort from looking at individual examples to constructing a large number of noisy labeling heuristics, a task the authors call "data programming". Then you learn a model that intelligently aggregates information from the weak labelers to create a weigh...

Jul 11, 2017•25 min

27 - What do Neural Machine Translation Models Learn about Morphology?, with Yonatan Belinkov

ACL 2017 paper by Yonatan Belinkov and others at MIT and QCRI. Yonatan comes on to tell us about their work. They trained a neural MT system, then learned models on top of the NMT representation layers to do morphology tasks, trying to probe how much morphological information is encoded by the MT system. We talk about the specifics of their model and experiments, insights they got from doing these experiments, and how this work relates to other work on representation learning in NLP. https://www...

Jul 05, 2017•29 min

26 - Structured Attention Networks, with Yoon Kim

ICLR 2017 paper, by Yoon Kim, Carl Denton, Luong Hoang, and Sasha Rush. Yoon comes on to talk with us about his paper. The paper shows how standard attentions can be seen as an expected feature count computation, and can be generalized to other kinds of expected feature counts, as long as we have efficient, differentiable algorithms for computing those marginals, like the forward-backward and inside-outside algorithms. We talk with Yoon about how this works, the experiments they ran to test this...

Jun 30, 2017•26 min

25 - Neural Semantic Parsing over Multiple Knowledge-bases

ACL 2017 short paper, by Jonathan Herzig and Jonathan Berant. This is a nice, obvious-in-hindsight paper that applies a frustratingly-easy-domain-adaptation-like approach to semantic parsing, similar to the multi-task semantic dependency parsing approach we talked to Noah Smith about recently. Because there is limited training data available for complex logical constructs (like argmax, or comparatives), but the mapping from language onto these constructions is typically constant across domains, ...

Jun 28, 2017•11 min

← Prev Next →

For the best experience, listen in Metacast app for iOS or Android