Scaling Test Time Compute to Multi-Agent Civilizations — Noam Brown, OpenAI - podcast episode cover

Scaling Test Time Compute to Multi-Agent Civilizations — Noam Brown, OpenAI

Jun 19, 2025
--:--
--:--
Listen in podcast apps:
Metacast
Spotify
Youtube
RSS

Episode description

Solving Poker and Diplomacy, Debating RL+Reasoning with Ilya, what's *wrong* with the System 1/2 analogy, and where Test-Time Compute hits a wall Timestamps 00:00 Intro – Diplomacy, Cicero & World Championship 02:00 Reverse Centaur: How AI Improved Noam’s Human Play 05:00 Turing Test Failures in Chat: Hallucinations & Steerability 07:30 Reasoning Models & Fast vs. Slow Thinking Paradigm 11:00 System 1 vs. System 2 in Visual Tasks (GeoGuessr, Tic-Tac-Toe) 14:00 The Deep Research Existence Proof for Unverifiable Domains 17:30 Harnesses, Tool Use, and Fragility in AI Agents 21:00 The Case Against Over-Reliance on Scaffolds and Routers 24:00 Reinforcement Fine-Tuning and Long-Term Model Adaptability 28:00 Ilya’s Bet on Reasoning and the O-Series Breakthrough 34:00 Noam’s Dev Stack: Codex, Windsurf & AGI Moments 38:00 Building Better AI Developers: Memory, Reuse, and PR Reviews 41:00 Multi-Agent Intelligence and the “AI Civilization” Hypothesis 44:30 Implicit World Models and Theory of Mind Through Scaling 48:00 Why Self-Play Breaks Down Beyond Go and Chess 54:00 Designing Better Benchmarks for Fuzzy Tasks 57:30 The Real Limits of Test-Time Compute: Cost vs. Time 1:00:30 Data Efficiency Gaps Between Humans and LLMs 1:03:00 Training Pipeline: Pretraining, Midtraining, Posttraining 1:05:00 Games as Research Proving Grounds: Poker, MTG, Stratego 1:10:00 Closing Thoughts – Five-Year View and Open Research Directions Chapters 00:00:00 Intro & Guest Welcome 00:00:33 Diplomacy AI & Cicero Insights 00:03:49 AI Safety, Language Models, and Steerability 00:05:23 O Series Models: Progress and Benchmarks 00:08:53 Reasoning Paradigm: Thinking Fast and Slow in AI 00:14:02 Design Questions: Harnesses, Tools, and Test Time Compute 00:20:32 Reinforcement Fine-tuning & Model Specialization 00:21:52 The Rise of Reasoning Models at OpenAI 00:29:33 Data Efficiency in Machine Learning 00:33:21 Coding & AI: Codex, Workflows, and Developer Experience 00:41:38 Multi-Agent AI: Collaboration, Competition, and Civilization 00:45:14 Poker, Diplomacy & Exploitative vs. Optimal AI Strategy 00:52:11 World Models, Multi-Agent Learning, and Self-Play 00:58:50 Generative Media: Image & Video Models 01:00:44 Robotics: Humanoids, Iteration Speed, and Embodiment 01:04:25 Rapid Fire: Research Practices, Benchmarks, and AI Progress 01:14:19 Games, Imperfect Information, and AI Research Directions
For the best experience, listen in Metacast app for iOS or Android
Open in Metacast
Scaling Test Time Compute to Multi-Agent Civilizations — Noam Brown, OpenAI | Latent Space: The AI Engineer Podcast - Listen or read transcript on Metacast