General Intelligence Requires Reward-based Pretraining

Best AI papers explained

Jun 25, 2025 • 17 min

Episode description

This position paper argues that Large Language Models (LLMs), despite their current utility as Artificial Useful Intelligence (AUI), often lack the robust, adaptive reasoning required for Artificial General Intelligence (AGI), because their training methods overfit to specific data patterns. The authors propose shifting from the current supervised pretraining (SPT) paradigm to reward-based pretraining (RPT), much as AlphaZero surpassed AlphaGo by learning purely through self-play reinforcement learning, without human game data. To get there, they suggest two steps: training on synthetic tasks with reduced token spaces to foster generalizable reasoning skills, and decoupling knowledge from reasoning through an external memory system. In this architecture, the reasoning module would operate with a smaller context and rely on learned retrieval mechanisms to fetch information, which the authors argue would promote more robust generalization to novel domains.
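To make the SPT/RPT distinction concrete, here is a minimal toy sketch, not the paper's method. It assumes a single categorical "policy" over a few tokens and uses textbook REINFORCE with a running-average baseline as a stand-in for reward-based training. The supervised update needs the correct target token; the reward-based update only needs a scalar reward for whatever the model tried. All names (`spt_grad`, `rpt_grad`, `train_rpt`) are hypothetical illustrations.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def spt_grad(logits, target):
    """Supervised-style signal: gradient of -log p(target) w.r.t. the logits.
    Requires knowing the 'correct' token in advance."""
    probs = softmax(logits)
    return [p - (1.0 if i == target else 0.0) for i, p in enumerate(probs)]


def rpt_grad(logits, action, reward, baseline=0.0):
    """Reward-based signal (REINFORCE): gradient of -(reward - baseline) * log p(action).
    Requires only a scalar reward for the token the model actually sampled."""
    probs = softmax(logits)
    advantage = reward - baseline
    return [advantage * (p - (1.0 if i == action else 0.0)) for i, p in enumerate(probs)]


def train_rpt(reward_fn, n_actions, steps=2000, lr=0.1, seed=0):
    """Toy reward-based training loop: sample, score, update; no demonstrations used."""
    rng = random.Random(seed)
    logits = [0.0] * n_actions
    baseline = 0.0
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(n_actions), weights=probs)[0]
        reward = reward_fn(action)
        grad = rpt_grad(logits, action, reward, baseline)
        logits = [z - lr * g for z, g in zip(logits, grad)]  # gradient descent on the loss
        baseline = 0.9 * baseline + 0.1 * reward  # running average reduces variance
    return softmax(logits)


if __name__ == "__main__":
    # Toy task: only token 2 earns reward, and the model is never shown that answer.
    probs = train_rpt(lambda a: 1.0 if a == 2 else 0.0, n_actions=4)
    print([round(p, 3) for p in probs])
```

The point of the contrast is what each update consumes: `spt_grad` imitates a given answer, while `rpt_grad` reinforces whatever behavior earns reward, which is the property the authors want pretraining itself to have.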