General Intelligence Requires Reward-based Pretraining

Jun 25, 2025 · 17 min

Episode description

This position paper argues that Large Language Models (LLMs), although useful today as Artificial Useful Intelligence (AUI), often struggle with the robust, adaptive reasoning required for Artificial General Intelligence (AGI) because their training methods overfit to specific data patterns. The authors propose shifting from the current supervised pretraining (SPT) paradigm to reward-based pretraining (RPT), much as AlphaZero surpassed AlphaGo by learning purely through reinforcement. To that end, they suggest training on synthetic tasks with reduced token spaces to foster generalizable reasoning skills, and decoupling knowledge from reasoning by means of an external memory system. In the proposed architecture, the reasoning module would operate over a smaller context and rely on learned retrieval mechanisms for information, which the authors argue would promote more robust generalization to novel domains.
