When and Why LLMs Fail to Reason Globally

Best AI papers explained

May 31, 2025 • 18 min

Episode description

This research explores why Large Language Models (LLMs) struggle with tasks that require global reasoning over long inputs. The authors propose that these limitations stem from constraints on how information can flow within an LLM, and they formalize the idea with the Bounded Attention Prefix Oracle (BAPO) model. Problems are classified as BAPO-easy or BAPO-hard, with the prediction that LLMs will fail on the latter. Experiments with models such as GPT-4o, Claude, and Gemini support this prediction: performance on BAPO-hard tasks is poor even for relatively small inputs. Crucially, the paper shows both theoretically and empirically that Chain of Thought (CoT) reasoning can turn BAPO-hard problems into BAPO-easy ones, significantly improving performance, although it may require a large number of reasoning tokens.
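
The information-flow argument can be made concrete with a toy sketch. It assumes a deliberately simplified model and does not reproduce the paper's formal BAPO definition. In this model, the input is split into a prefix and a suffix. The prefix side may pass a short bit-string message forward, and the suffix side may also read a few prefix tokens directly. The functions `index_protocol`, `majority_protocol` and `majority_with_cot` are hypothetical illustrations, as is the choice of a lookup task and a majority task. They make no claim about how the paper classifies these problems.

```python
import math

# Toy model (an assumption for illustration, not the paper's formal BAPO
# definition): the input is split into a prefix and a suffix. The prefix
# side may pass a short bit-string message forward, and the suffix side
# may also read ("attend to") a limited number of prefix tokens.


def index_protocol(prefix, query):
    """Lookup: return prefix[query], where the query index lives in the suffix.

    The prefix side never sees the query, but the suffix can attend to the
    single relevant token, so no forward message is needed.
    Returns (answer, message_bits_used, tokens_attended).
    """
    message = ""                 # nothing is sent forward
    attended = [prefix[query]]   # one attended token is enough
    return attended[0], len(message), len(attended)


def majority_protocol(prefix, suffix):
    """Majority: are more than half of all input bits equal to 1?

    The natural protocol sends the prefix's count of ones as a binary
    message; its width, ceil(log2(len(prefix) + 1)) bits, grows with the
    input instead of staying constant.
    Returns (answer, message_bits_used, tokens_attended).
    """
    count = sum(prefix)
    width = max(1, math.ceil(math.log2(len(prefix) + 1)))
    message = format(count, f"0{width}b")     # fixed-width binary count
    total_ones = int(message, 2) + sum(suffix)
    n = len(prefix) + len(suffix)
    return total_ones > n / 2, len(message), 0


def majority_with_cot(bits):
    """Toy chain of thought: write a running tally, one step per input bit.

    Each step reads only the previously written tally and one input bit,
    so the information passed per step stays small, but the number of
    written steps grows linearly with the input length.
    Returns (answer, steps_written).
    """
    trace = []
    tally = 0
    for b in bits:
        tally += b
        trace.append(tally)   # each written step is the tally so far
    return tally > len(bits) / 2, len(trace)


if __name__ == "__main__":
    print(index_protocol([7, 3, 9], 2))            # (9, 0, 1)
    for m in (4, 64, 1024):
        _, bits_used, _ = majority_protocol([1] * m, [0])
        print(m, bits_used)                         # 4 3, then 64 7, then 1024 11
    print(majority_with_cot([1, 1, 0, 1, 0, 1]))    # (True, 6)
```

In this toy, the lookup needs no forward message and only one attended token. The majority message widens from 3 to 11 bits as the prefix grows from 4 to 1024 bits. The chain-of-thought variant keeps each step small but writes one step per input bit. That loosely mirrors the trade-off between per-step information flow and token usage described above.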
