Base models know how to reason, thinking models learn when

Oct 11, 2025 · 12 min

Episode description

This paper argues that thinking language models (LLMs that reason step by step) do not acquire entirely new capabilities during post-training; rather, they learn when to deploy reasoning mechanisms that are already latent in their base counterparts. The authors use an unsupervised clustering method based on Sparse Autoencoders (SAEs) to derive an interpretable taxonomy of distinct reasoning behaviors, such as numeric computation and planning next steps. They then build a hybrid model in which the base model generates the text while steering vectors, applied according to the thinking model's activation patterns, activate specific reasoning behaviors. This hybrid approach recovers up to 91% of the performance gap between base and thinking models on reasoning benchmarks such as MATH500 while steering only a small fraction of tokens, which supports the idea that the primary benefit of complex training is teaching the model to deploy its existing mechanisms efficiently.
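As a rough illustration of the two ideas in the description, the sketch below shows how a steering vector might be added to the hidden states of selected tokens only, and how a "fraction of the gap recovered" figure could be computed. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`steer`, `steer_selected`, `gap_recovered`), the scaling factor `alpha`, the boolean `steer_mask`, and the definition of gap recovery as (hybrid - base) / (thinking - base) are all hypothetical choices made for illustration.

```python
from typing import List, Sequence


def steer(hidden: Sequence[float], vector: Sequence[float], alpha: float) -> List[float]:
    """Add a scaled steering vector to one token's hidden state (assumed additive steering)."""
    if len(hidden) != len(vector):
        raise ValueError("hidden state and steering vector must have the same dimension")
    return [h + alpha * v for h, v in zip(hidden, vector)]


def steer_selected(
    hidden_states: Sequence[Sequence[float]],
    vector: Sequence[float],
    alpha: float,
    steer_mask: Sequence[bool],
) -> List[List[float]]:
    """Steer only the tokens flagged in steer_mask; copy all other tokens unchanged.

    In this sketch the mask stands in for a signal derived from the thinking
    model's activations; how that signal is obtained is not modeled here.
    """
    if len(hidden_states) != len(steer_mask):
        raise ValueError("need exactly one mask entry per token")
    return [
        steer(h, vector, alpha) if flag else list(h)
        for h, flag in zip(hidden_states, steer_mask)
    ]


def gap_recovered(base_acc: float, hybrid_acc: float, thinking_acc: float) -> float:
    """Fraction of the base-to-thinking accuracy gap closed by the hybrid model (assumed definition)."""
    gap = thinking_acc - base_acc
    if gap <= 0:
        raise ValueError("thinking accuracy must exceed base accuracy")
    return (hybrid_acc - base_acc) / gap
```

With made-up accuracies of 0.2 (base), 0.7 (hybrid), and 0.8 (thinking), `gap_recovered` returns roughly 0.83; these numbers are purely illustrative and are not results from the paper.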
