LLMs Get Lost In Multi-Turn Conversation

Best AI papers explained

Jun 09, 2025•21 min

Episode description

This paper examines how Large Language Models (LLMs) perform in multi-turn conversations compared with single-turn interactions. The authors developed a method for turning fully specified tasks into "sharded" instructions, which allows controlled simulation of underspecified, multi-turn exchanges. They found that LLMs perform significantly worse and become far less reliable in multi-turn settings. They attribute this "lost in conversation" phenomenon mainly to poor context management and to premature, incorrect assumptions. The study concludes by urging LLM builders to improve multi-turn reliability alongside single-turn aptitude, because current remedies such as lowering the temperature or using agent-like frameworks yield only limited improvements.

For the best experience, listen in Metacast app for iOS or Android