🎙️ EP 107: The AI Nobel Winner Who Says ChatGPT Is A Dead End

Sep 29, 2025•11 min

Episode description

A Turing Award winner just said what most of Silicon Valley won’t: LLMs like ChatGPT might never reach AGI. In today’s episode, we break down Richard Sutton’s surprising take, why he thinks we need goal-driven AI (not next-word predictors), and what his OaK agent architecture means for the future. Also: DeepMind’s Veo 3 might be the GPT-3 moment for video, and OpenAI plans to use more power than all of India.

We’ll talk about:

  • Why Richard Sutton thinks GPT-6 is going nowhere
  • What “Chain-of-Frames” means for video models like Veo 3
  • The wild energy chart behind OpenAI’s 125× compute plan
  • The deepfake medbed video Trump posted… and deleted

Keywords: Richard Sutton, AGI, Veo 3, DeepMind, ChatGPT, OpenAI, Chain-of-Frames, Turing Award, OaK agent, video AI, medbeds, compute scale

Links:

  1. Newsletter: Sign up for our FREE daily newsletter.
  2. Our Community: Get 3-level AI tutorials across industries.
  3. Join AI Fire Academy: 500+ advanced AI workflows ($14,500+ Value)

Our Socials:

  1. Facebook Group: Join 259K+ AI builders
  2. X (Twitter): Follow us for daily AI drops
  3. YouTube: Watch AI walkthroughs & tutorials

Transcript

How can the most powerful AI today, you know, GPT-4, these huge models, how can they also be called a philosophical dead end? Yeah, it's a real head-scratcher, isn't it? You've got Richard Sutton, who basically invented modern reinforcement learning, saying the tech winning right now... Yeah. It's not the path to true AGI. Right. And that's what this deep dive is all about. We're going to look past the benchmarks, look at the

actual architecture underneath. We've pulled sources on this critique, but also on the... well, the crazy cost of scaling this stuff. And our goal really is to figure out where the smart money, the smart thinking is going for AGI. Is it just making current models bigger or is it something fundamentally different, like a new way to learn? Okay, so we've got three main areas for you today. First, why Sutton thinks LLMs have this fatal flaw. Second, the absolutely

staggering scale. And yes, the cost of this AI arms race. And third, these new video models that seem to actually kind of... think over time. Sounds good. Let's dive in. Let's start with Sutton, the Turing Award winner. He's not saying LLMs are useless, right? No, not at all. They're amazing prediction machines, obviously. But Sutton comes from reinforcement learning, which is all about agents acting in the world and learning from feedback. He says LLMs lack that core loop.

Well, passive. Passive meaning they don't have goals. They can't be surprised. Is that the idea? Exactly. And crucially, they don't learn from consequences. They're just incredibly good mimics of human text. They don't truly understand the real-world impact of the words they string together. OK, I get the difference. But isn't being passive, goalless, actually safer? I mean, the whole market seems built on making these things predictable, controllable. Why build something goal-driven

if it's riskier? That's the big tradeoff. Right. Their safety comes from that passivity. But Sutton argues that very passivity limits their intelligence potential. Just scaling up imitation, even to GPT-7 or 8, won't get us to AGI. So he has an alternative. Yeah. He proposes this new architecture called OaK. The key thing is it learns on the fly. It doesn't need that massive, hugely expensive pre-training phase that LLMs rely on. Okay,

can we give folks an analogy? So if an LLM is like a giant static textbook that predicts the next word, what's OaK? Is it like an agent that can actually, you know, burn its hand on a stove and learn "don't touch"? That's a pretty good way to put it, yeah. It's about learning through doing, through direct consequence. It's not just copying patterns from a giant pile of text. AGI, in this view, needs these smarter loops. Action,

feedback, memory, even motivation. So if LLMs are flawed because they just imitate, what exactly is the mechanism Sutton proposes for real consequence-driven learning? It's about learning via action and direct consequence, not massive pre-trained imitation. Gotcha. OK, so if real intelligence needs this whole new architecture, then this race to just build bigger and bigger LLMs, it's a massive bet, isn't it? Strategically speaking. It really is. And yet the scaling is happening

at a rate that's hard to comprehend. Yeah. Let's talk about those numbers. The sources we saw on OpenAI's compute plans, just wild. Totally wild. They apparently 9x their compute power in 2025 alone. Okay. Huge jump. But then by 2033, the projection is 125 times bigger than that.

Two sec silence. Whoa. Okay, wait. That number, 125x, it kind of breaks my brain. To put that in perspective for you listening, that could mean needing more electricity than, like, the entire country of India uses today for 1.4 billion people. It just sounds physically almost impossible. It does. It's like stacking Lego blocks of data centers reaching into the clouds. Right. And it explains why we're seeing these massive investments like Nscale raising that record $1.1 billion

Series B. Yeah, biggest in Europe. Yeah. And that money is specifically earmarked for building these AI factories. We're talking facilities with like 100,000 NVIDIA GPUs each. Backed by huge names: Nokia, Dell, NVIDIA itself. Yeah. They're building the infrastructure for that 125x future. But then there's the flip side, the operational cost. We saw that post from the developer, right? Built over 30 AI agents. And tracked the cost. And called it the brutal cost

truth. Basically, running complex agents in the real world gets really expensive, really fast. It's not just the upfront training cost. Absolutely. That massive scale translates directly to higher running costs for everyone using these models. And meanwhile, you see OpenAI launching its biggest ad push ever for ChatGPT. Streaming, billboards, influencers. They're trying to lock in that market

share now. Given this exponential compute growth, is the cost of running real-world agents actually sustainable for, say, the average developer or a small company? Initial data suggests complexity dramatically increases operational expenditures. Yeah, that's a tough reality check. Okay, let's pivot a bit. Away from the cost side, towards a big technical leap. The official paper just dropped on Veo 3. That's Google DeepMind's new video model. And people are calling this Google's

GPT-3 moment for vision. Which is a big claim. Why? What's the big deal? Well, it connects back to Sutton's point, actually. It looks like we're seeing a shift from just imitation to something more like reasoning, but in the visual domain. Veo 3 seems to be able to reason across a video scene over time, not just generate pretty frames one after another. Okay. And the key concept here is chain of frames. Sounds like chain of thought for LLMs. Exactly. It's the visual equivalent.

Chain of thought helps LLMs break down text problems step by step. Chain of frames lets the video model think across time. It can anticipate what happens next, understand physics in a basic way. It's not just processing static images. That's fascinating. But is it really reasoning? Or is

it just a super sophisticated mimic? If it's seen millions of videos of Jenga blocks falling, is it understanding physics or just generating the most likely visual sequence based on that data? How do we know it's not just a fancy deepfake? That's the million-dollar question, always. But the evidence suggests it's more than mimicry because of its zero-shot performance on diverse tasks it wasn't explicitly trained for. Like what? Well, it can solve complex mazes visually.

It can simulate physics pretty accurately, predicting how those Jenga blocks will fall. It can even restore blurry images or animate a scene from just a rough hand-drawn sketch. Things that require some kind of internal model of the world. Right. They use that four-level framework to measure it. Perception, modeling, manipulation, and then reasoning at the top. Simulating physics definitely feels like it's up in the modeling or even reasoning category. Exactly. And the implication is huge.

One good prompt into a model like this might eventually replace dozens of specialized computer vision tools engineers use today. It could simplify workflows dramatically. So how drastically does this new chain of frames approach change the job of computer vision engineers? One prompt can replace dozens of specialized vision tools, simplifying workflows dramatically. OK, let's shift to AI out in the wild because this power,

it has immediate, messy consequences. We saw the political example, and we need to be neutral here: the instance with Trump posting an AI-generated clip. Yeah, the one claiming medbeds could cure anything, reportedly from Fox News, but AI-generated. He later deleted it. But it shows how fast this stuff can spread and how convincing it can look, even if it's totally fabricated. It's a huge challenge for, you know, figuring out what's

real online. The speed is incredible. Yeah. I mean, I still wrestle with prompt drift myself sometimes, just trying to get an AI to do what I want consistently. Vulnerable admission. So seeing these convincing deepfakes pop up, it is worrying. It really highlights the complexity we're dealing with. Beat. And the battles aren't just political. They're corporate, too. Oh, yeah. Elon Musk's xAI is now suing OpenAI. The claim? Yeah. Stealing trade secrets. It's getting litigious.

And Meta is reportedly talking to Google about potentially using their Gemini model. Seems like it. The big players are definitely maneuvering, making alliances, getting ready for the next phase. At the same time, platforms are just drowning in AI content. Spotify deleting 75 million AI-generated tracks. 75 million. Just using their spam filters. It shows the sheer scale of automated content generation they're fighting. It's like a tidal wave. Wow. But it's not all problematic.

There are useful tools emerging too, right? Like that Kimi assistant from Moonshot AI. Right, backed by Alibaba. It apparently has an agent mode now. You give it a simple prompt and it can create complex things like a multi-page website draft or editable presentation slides. Stuff that takes real work. With the volume of AI-generated content exploding, both useful and not, can platforms realistically keep pace with the necessary filtering and moderation?

The 75 million deleted tracks suggest filtering is already a massive ongoing battle. Okay. We've definitely covered a lot today. Philosophy, physics simulation, billion-dollar funding rounds, fake news. Let's just take a quick pause. When we come back, let's boil it down. What's the single biggest idea, the main takeaway about this architectural shift for you, the listener? Midroll sponsor

read. All right. So if we boil down everything we discussed, the core tension really is between two fundamentally different approaches to AI. You've got the passive imitation that powers today's big LLMs. And then you have this idea of active consequence-driven learning. That's the goal behind stuff like OaK. And it seems to be what's making models like Veo 3 capable of visual reasoning. Passive imitation versus active learning. LLMs are incredible statistical parrots,

basically. But if Sutton's right, getting to true AGI means moving beyond just predicting the next word or pixel. It means building systems that can actually model consequences, understand cause and effect, maybe even have intrinsic motivation. So the key thing for you to take away today is probably this. We're not just seeing AI get bigger. We might be seeing the very architecture of AI begin to shift, that chain of frames concept

in Veo 3. It's a sign we're moving beyond just mimicking language towards models that can actually reason visually, simulate outcomes, understand things across time. Right. That shift from mimicry to modeling consequences, that feels like the really big story here. So maybe a final thought for you to chew on. If Veo 3 can simulate simple

physics today, like Jenga blocks falling, what does it mean when we start handing over critical real-world decisions to models that can simulate the future consequences of their own potential actions? Beat. That feels significant. Definitely something to think about. That distinction between passive safety, which we have now, and truly goal-driven intelligence. Yeah. That's the next frontier, potentially the next big challenge.

Well, thanks for joining us on this deep dive into AI, architecture, scale, and everything in between. We hope it gave you some things to think about. We'll catch you next time. Outro music.

Transcript source: Provided by creator in RSS feed: download file