When and Why LLMs Fail to Reason Globally

May 31, 2025 · 18 min

Episode description

This research explores why Large Language Models (LLMs) struggle with tasks that require global reasoning over long inputs. The authors propose that these limitations stem from constraints on information flow within LLMs and formalize this idea with the Bounded Attention Prefix Oracle (BAPO) model. They classify problems as BAPO-easy or BAPO-hard and predict that LLMs will fail on the latter. Empirical results with models such as GPT-4o, Claude, and Gemini support this prediction, showing poor performance on BAPO-hard tasks even for relatively small inputs. Crucially, the paper demonstrates theoretically and empirically that Chain of Thought (CoT) reasoning can transform BAPO-hard problems into BAPO-easy ones, significantly improving performance, though potentially at the cost of high token usage.
