OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

Jan 23, 2026 · 55 min · Season 1 · Ep. 35

Episode description

AI reasoning models don’t just give answers — they plan, deliberate, and sometimes try to cheat.


In this episode of The Neuron, we’re joined by Bowen Baker, Research Scientist at OpenAI, to explore whether we can monitor AI reasoning before things go wrong — and why that transparency may not last forever.


Bowen walks us through real examples of AI reward hacking, explains why monitoring chain-of-thought is often more effective than checking outputs, and introduces the idea of a “monitorability tax” — trading raw performance for safety and transparency.


We also cover:

  • Why smaller models thinking longer can be safer than bigger models

  • How AI systems learn to hide misbehavior

  • Why suppressing “bad thoughts” can backfire

  • The limits of chain-of-thought monitoring

  • Bowen’s personal view on open-source AI and safety risks

If you care about how AI actually works — and what could go wrong — this conversation is essential.


Resources:

  • Evaluating chain-of-thought monitorability (OpenAI): https://openai.com/index/evaluating-chain-of-thought-monitorability/

  • Understanding neural networks through sparse circuits (OpenAI): https://openai.com/index/understanding-neural-networks-through-sparse-circuits/

  • OpenAI's alignment blog: https://alignment.openai.com/

👉 Subscribe for more interviews with the people building AI

👉 Join the newsletter at https://theneuron.ai