NLP Highlights

Allen Institute for Artificial Intelligence•soundcloud.com

**The podcast is currently on hiatus. For more active NLP content, check out the Holistic Intelligence Podcast linked below.** Welcome to the NLP highlights podcast, where we invite researchers to talk about their work in various areas in natural language processing. All views expressed belong to the hosts/guests, and do not represent their employers.

Last refreshed: January 3rd, 2026 at 4:41 AM ⓘ

Follow this podcast in the Metacast mobile app to refresh it and see new episodes.

Follow on

Apple Podcasts

Spotify

RSS

Podcasts are better in Metacast mobile app

Don't just listen to podcasts. Learn from them with transcripts, summaries, and chapters for every episode. Skim, search, and bookmark insights. Learn more

Episodes

84 - Large Teams Develop, Small Groups Disrupt, with Lingfei Wu

In a recent Nature paper, Lingfei Wu (Ling) suggests that smaller teams of scientists tend to do more disruptive work. In this episode, we invite Ling to discuss their results, how they define disruption and possible reasons why smaller teams may be better positioned to do disruptive work. We also touch on robustness of the disruption metric, differences between research disciplines, and sleeping beauties in science. Lingfei Wu’s homepage: https://www.knowledgelab.org/people/detail/lingfei_wu/ P...

Mar 26, 2019•39 min

83 - Knowledge Base Construction, with Sebastian Riedel

In this episode, we invite Sebastian Riedel to talk about knowledge base construction (KBC). Why is it an important research area? What are the tradeoffs between using an open vs. closed schema? What are popular methods currently used, and what challenges prevent the adoption of KBC methods? We also briefly discuss the AKBC workshop and its graduation into a conference in 2019. Sebastian Riedel's homepage: http://www.riedelcastro.org/ AKBC conference: http://www.akbc.ws/2019/

Mar 13, 2019•38 min

82 - Visual Reasoning, with Yoav Artzi

In this episode, Yoav Artzi joins us to talk about visual reasoning. We start by defining what visual reasoning is, then discuss the pros and cons of different tasks and datasets. We discuss some of the models used for visual reasoning and how they perform, before ending with open questions in this young, exciting research area. Yoav Artzi: https://yoavartzi.com/ NLVR: https://github.com/clic-lab/nlvr/tree/master/nlvr NLVR2: https://github.com/clic-lab/nlvr/tree/master/nlvr2 CLEVR dataset: https...

Mar 06, 2019•42 min

81 - BlackboxNLP, with Afra Alishahi and Tal Linzen

Neural models recently resulted in large performance improvements in various NLP problems, but our understanding of what and how the models learn remains fairly limited. In this episode, Tal Linzen and Afra Alishahi talk to us about BlackboxNLP, an EMNLP’18 workshop dedicated to the analysis and interpretation of neural networks for NLP. In the workshop, computer scientists and cognitive scientists joined forces to probe and analyze neural NLP models. BlackboxNLP 2018 website: https://blackboxnl...

Feb 06, 2019•31 min

80 - Leaderboards and Science, with Siva Reddy

Originally used to entice fierce competitions in arcade games, leaderboards recently made their way into NLP research circles. Leaderboards could help mitigate some of the problems in how researchers run experiments and share results (e.g., accidentally overfitting models on a test set), but they also introduce new problems (e.g., breaking author anonymity in peer reviewing). In this episode, Siva Reddy joins us to talk about the good, the bad, and the ugly of using leaderboards in science. We a...

Jan 29, 2019•30 min

79 - The glass ceiling in NLP, with Natalie Schluter

In this episode, Natalie Schluter talks to us about a data-driven analysis of career progression of male vs. female researchers in NLP through the lens of mentor-mentee networks based on ~20K papers in the ACL anthology. Directed edges in the network describe a mentorship relation from the last author on a paper to the last author, and author names were annotated for gender when possible. Interesting observations include the increase of percentage of mentors (regardless of gender), and an increa...

Jan 21, 2019•27 min

78. Where do corpora come from?, with Matt Honnibal and Ines Montani

Most NLP projects rely crucially on the quality of annotations used for training and evaluating models. In this episode, Matt and Ines of Explosion AI tell us how Prodigy can improve data annotation and model development workflows. Prodigy is an annotation tool implemented as a python library, and it comes with a web application and a command line interface. A developer can define input data streams and design simple annotation interfaces. Prodigy can help break down complex annotation decisions...

Jan 15, 2019•30 min

77. On Writing Quality Peer Reviews, with Noah A. Smith

It's not uncommon for authors to be frustrated with the quality of peer reviews they receive in (NLP) conferences. In this episode, Noah A. Smith shares his advice on how to write good peer reviews. The structure Noah recommends for writing a peer review starts with a dispassionate summary of what a paper has to offer, followed by the strongest reasons the paper may be accepted, followed by the strongest reasons it may be rejected, and concludes with a list of minor, easy-to-fix problems (e.g., ...

Jan 07, 2019•38 min

76 - Increasing In-Class Similarity by Retrofitting Embeddings with Demographics, with Dirk Hovy

EMNLP 2018 paper by Dirk Hovy and Tommaso Fornaciari. https://www.semanticscholar.org/paper/Improving-Author-Attribute-Prediction-by-Linguistic-Hovy-Fornaciari/71aad8919c864f73108aafd8e926d44e9df51615 In this episode, Dirk Hovy talks about natural language as social phenomenon which can provide insights about those who generate it. For example, this paper uses retrofitted embeddings to improve on two tasks: predicting the gender and age group of a person based on their online reviews. In this ap...

Nov 27, 2018•30 min

75 - Reinforcement / Imitation Learning in NLP, with Hal Daumé III

In this episode, we invite Hal Daumé to continue the discussion on reinforcement learning, focusing on how it has been used in NLP. We discuss how to reduce NLP problems into the reinforcement learning framework, and circumstances where it may or may not be useful. We discuss imitation learning, roll-in and roll-out, and how to approximate an expert with a reference policy. DAgger: https://www.semanticscholar.org/paper/A-Reduction-of-Imitation-Learning-and-Structured-to-Ross-Gordon/17eddf33b513a...

Nov 21, 2018•44 min

74 - Deep Reinforcement Learning Doesn't Work Yet, with Alex Irpan

Blog post by Alex Irpan titled "Deep Reinforcement Learning Doesn't Work Yet" https://www.alexirpan.com/2018/02/14/rl-hard.html In this episode, Alex Irpan talks about limitations of current deep reinforcement learning methods and why we have a long way to go before they go mainstream. We discuss sample inefficiency, instability, the difficulty to design reward functions and overfitting to the environment. Alex concludes with a list of recommendations he found useful when training models with de...

Nov 16, 2018•41 min

73 - Supersense Disambiguation of English Prepositions and Possessives, with Nathan Schneider

ACL 2018 paper by Nathan Schneider, Jena D. Hwang, Vivek Srikumar, Jakob Prange, Austin Blodgett, Sarah R. Moeller, Aviram Stern, Adi Bitan, Omri Abend. In this episode, Nathan discusses how the meaning of prepositions varies, proposes a hierarchy for classifying the semantics of function words (e.g., comparison, temporal, purpose), and describes empirical results using the provided dataset for disambiguating preposition semantics. Along the way, we talk about lexicon-based semantics, multilingu...

Nov 13, 2018•53 min

72 - The Anatomy Question Answering Task, with Jordan Boyd-Graber

Our first episode in a new format: broader surveys of areas, instead of specific discussions on individual papers. In this episode, we talk with Jordan Boyd-Graber about question answering. Matt starts the discussion by giving five different axes on which question answering tasks vary: (1)how complex is the language in the question, (2)what is the genre of the question / nature of the question semantics, (3)what is the context or knowledge source used to answer the question, (4)how much "reasoni...

Oct 16, 2018•43 min

71 - DuoRC: Complex Language Understanding with Paraphrased Reading Comprehension, with Amrita Saha

ACL 2018 paper by Amrita Saha, Rahul Aralikatte, Mitesh M. Khapra, Karthik Sankaranarayanan Amrita and colleagues at IBM Research introduced a harder dataset for "reading comprehension", where you have to answer questions about a given passage of text. Amrita joins us on the podcast to talk about why a new dataset is necessary, what makes this one unique and interesting, and how well initial baseline systems perform on it. Along the way, we talk about the problems with using BLEU or ROUGE as eva...

Oct 12, 2018•34 min

70 - Measuring the Evolution of a Scientific Field through Citation Frames, with David Jurgens

TACL 2018 paper (presented at ACL 2018) by David Jurgens, Srijan Kumar, Raine Hoover, Daniel A. McFarland, and Daniel Jurafsky David comes on the podcast to talk to us about citation frames. We discuss the dataset they created by painstakingly annotating the "citation type" for all of the citations in a large collection of papers (around 2000 citations in total), then training a classifier on that data to annotate the rest of the ACL anthology. This process itself is interesting, including how e...

Sep 18, 2018•41 min

69 - Second language acquisition modeling, with Burr Settles

A shared task held in conjunction with a NAACL 2018 workshop, organized by Burr Settles and collaborators at Duolingo. Burr tells us about the shared task. The goal of the task was to predict errors that a language learner would make when doing exercises on Duolingo. We talk about the details of the data, why this particular data is interesting to study for second language acquisition, what could be better about it, and what systems people used to approach this task. We also talk a bit about wha...

Sep 10, 2018•35 min

68 - Neural models of factuality, with Rachel Rudinger

NAACL 2018 paper, by Rachel Rudinger, Aaron Steven White, and Benjamin Van Durme Rachel comes on to the podcast, telling us about what factuality is (did an event happen?), what datasets exist for doing this task (a few; they made a new, bigger one), and how to build models to predict factuality (turns out a vanilla biLSTM does quite well). Along the way, we have interesting discussions about how you decide what an "event" is, how you label factuality (whether something happened) on inherently u...

Sep 04, 2018•37 min

67 - GLUE: A Multi-Task Benchmark and Analysis Platform, with Sam Bowman

Paper by Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R. Bowman. Sam comes on to tell us about GLUE. We talk about the motivation behind setting up a benchmark framework for natural language understanding, how the authors defined "NLU" and chose the tasks for this benchmark, a very nice diagnostic dataset that was constructed for GLUE, and what insight they gained from the experiments they've run so far. We also have some musings about the utility of general-purp...

Aug 27, 2018•39 min

66 - Gender Bias in Coreference Resolution: Evaluation and Debiasing Methods, with Jieyu Zhao

NACL 2018 paper, by Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, and Kai-Wei Chang. Jieyu comes on the podcast to talk about bias in coreference resolution models. This bias makes models rely disproportionately on gender when making decisions for whether "she" refers to a noun like "secretary" or "physician". Jieyu and her co-authors show that coreference systems do not actually exhibit much bias in standard evaluation settings (OntoNotes), perhaps because there is a broad document co...

Aug 20, 2018•26 min

65 - Event Representations with Tensor-based Compositions, with Niranjan Balasubramanian

AAAI 2018 paper by Noah Weber, Niranjan Balasubramanian, and Nathanael Chambers Niranjan joins us on the podcast to tell us about his latest contribution in a line of work going back to Shank's scripts. This work tries to model sequences of events to get coherent narrative schemas, mined from large collections of text. For example, given an event like "She threw a football", you might expect future events involving catching, running, scoring, and so on. But if the event is instead "She threw a b...

Aug 13, 2018•39 min

64 - Neural Network Models for Sentence Pair Tasks, with Wuwei Lan and Wei Xu

Best reproduction paper at COLING 2018, by Wuwei Lan and Wei Xu. This paper takes a bunch of models for sentence pair classification (including paraphrase identification, semantic textual similarity, natural language inference / entailment, and answer sentence selection for QA) and compares all of them on all tasks. There's a very nice table in the paper showing the cross product of models and datasets, and how by looking at the original papers this table is almost empty; Wuwei and Wei fill in a...

Aug 08, 2018•36 min

63 - Neural Lattice Language Models, with Jacob Buckman

TACL 2018 paper by Jacob Buckman and Graham Neubig. Jacob tells us about marginalizing over latent structure in a sentence by doing a clever parameterization of a lattice with a model kind of like a tree LSTM. This lets you treat collocations as multi-word units, or allow words to have multiple senses, without having to commit to a particular segmentation or word sense disambiguation up front. We talk about how this works and what comes out. One interesting result that comes out of the sense lat...

Aug 02, 2018•30 min

62 - Sounding Board: A User-Centric and Content-Driven Social Chatbot, with Hao Fang

NAACL 2018 demo paper, by Hao Fang, Hao Cheng, Maarten Sap, Elizabeth Clark, Ari Holtzman, Yejin Choi, Noah A. Smith, and Mari Ostendorf Sounding Board was the system that won the 2017 Amazon Alexa Prize, a competition to build a social chatbot that interacts with users as an Alexa skill. Hao comes on the podcast to tell us about the project. We talk for a little bit about how Sounding Board works, but spend most of the conversation talking about what these chatbots can do - the competition setu...

Jul 30, 2018•31 min

61 - Neural Text Generation in Stories, with Elizabeth Clark and Yangfeng Ji

NAACL 2018 Outstanding Paper by Elizabeth Clark, Yangfeng Ji, and Noah A. Smith Both Elizabeth and Yangfeng come on the podcast to tell us about their work. This paper is an extension of an EMNLP 2017 paper by Yangfeng and co-authors that introduced a language model that included explicit entity representations. Elizabeth and Yangfeng take that model, improve it a bit, and use it for creative narrative generation, with some interesting applications. We talk a little bit about the model, but most...

Jul 23, 2018•31 min

60 - FEVER: a large-scale dataset for Fact Extraction and VERification, with James Thorne

NAACL 2018 paper by James Thorne, Andreas Vlachos, Christos Christodoulopoulos, and Arpit Mittal James tells us about his paper, where they created a dataset for fact checking. We talk about how this dataset relates to other datasets, why a new one was needed, how it was built, and how well the initial baseline does on this task. There are some interesting side notes on bias in dataset construction, and on how "fact checking" relates to "fake news" ("fake news" could mean that an article is acti...

Jun 28, 2018•29 min

59 - Weakly Supervised Semantic Parsing With Abstract Examples, with Omer Goldman

ACL 2018 paper by Omer Goldman, Veronica Latcinnik, Udi Naveh, Amir Globerson, and Jonathan Berant Omer comes on to tell us about a class project (done mostly by undergraduates!) that made it into ACL. Omer and colleagues built a semantic parser that gets state-of-the-art results on the Cornell Natural Language Visual Reasoning dataset. They did this by using "abstract examples" - they replaced the entities in the questions and corresponding logical forms with their types, labeled about a hundre...

Jun 12, 2018•35 min

58 - Learning What’s Easy: Fully Differentiable Neural Easy-First Taggers, with André Martins

EMNLP 2017 paper by André F. T. Martins and Julia Kreutzer André comes on the podcast to talk to us the paper. We spend the bulk of the time talking about the two main contributions of the paper: how they applied the notion of "easy first" decoding to neural taggers, and the details of the constrained softmax that they introduced to accomplish this. We conclude that "easy first" might not be the right name for this - it's doing something that in the end is very similar to stacked self-attention,...

Jun 08, 2018•47 min

57 - A Survey Of Cross-lingual Word Embedding Models, with Sebastian Ruder

Upcoming JAIR paper by Sebastian Ruder, Ivan Vulić, and Anders Søgaard. Sebastian comes on to tell us about his survey. He creates a typology of cross-lingual word embedding methods, and we discuss why you might use cross-lingual embeddings (low-resource languages in particular), what information they capture (semantics? syntax? both?), how the methods work (lots of different ways), and how to evaluate the embeddings (best when you have an extrinsic task to evaluate on). https://www.semanticscho...

Jun 05, 2018•32 min

56 - Deep contextualized word representations, with Matthew Peters

NAACL 2018 paper, by Matt Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Chris Clark, Kenton Lee, and Luke Zettlemoyer. In this episode, AI2's own Matt Peters comes on the show to talk about his recent work on ELMo embeddings, what some have called "the next word2vec". Matt has shown very convincingly that using a pre-trained bidirectional language model to get contextualized word representations performs substantially better than using static word vectors. He comes on the show to give us some...

Apr 04, 2018•30 min

55 - Matchbox: Dispatch-driven autobatching for imperative deep learning, with James Bradbury

In this episode, we take a more systems-oriented approach to NLP, looking at issues with writing deep learning code for NLP models. As a lot of people have discovered over the last few years, efficiently batching multiple examples together for fast training on a GPU can be very challenging with complex NLP models. James Bradbury comes on to tell us about Matchbox, his recent effort to provide a framework for automatic batching with pytorch. In the discussion, we talk about why batching is hard, ...

Mar 28, 2018•32 min

← Prev Next →

For the best experience, listen in Metacast app for iOS or Android