🎙️ EP 70: GPT-5’s “Apple Moment”, $1.5M AI Talent War, Microsoft’s Model Thinks for Itself

00:00

Imagine having a team of brilliant experts. Not just a chatbot giving facts, but something that really thinks, plans, acts for you. Instantly, they're in your pocket. This isn't sci-fi anymore. It's actually here. It really feels that way. Welcome to the Deep Dive. Today, we're plunging into what feels like, well, a huge moment in AI. Yeah, the GPT-5 launch. Exactly. OpenAI's GPT-5. Our mission is to unpack the pretty bold claims. Which are surprising. Definitely surprising.

00:32

And what this new unified model really means for you. We'll cover its capabilities, its immediate impact, and then look at other big A. So it's not just about being faster then. It's about a different kind of intelligence working together. I think that's the idea. It's a significant leap, yeah. It allows for way more... Okay, that's different. And there's this real-time routing thing, too. The model decides for itself. Quick answer needed? Or does this need deeper thought?

02:16

And it just does it. Yeah, no user input needed for that switch. That's why OpenAI is calling it an agent over chatbot. Agent over chatbot. Their pitch is basically a team of PhD-level experts in your pocket. That's quite the image. And they're saying it's the best model in the world. Do the numbers back that up? What about those benchmarks? The numbers are pretty strong, yeah. Coding, for example, 74.9% accuracy. Okay, 74.9. How's that compare? It just edges

02:44

out Claude Opus 4.1, which was at 74.5%. Very close. Very close. Yeah. But it really pulls ahead of Google's Gemini 2.5 Pro, which was at 59%. That's a big gap. What about other areas like science? Science Q&A, it hit 89.4%. That beats Claude Opus 4.1 at 80.9%. And even Grok 4 Heavy at 88.9%. Impressive. But I saw something about healthcare accuracy. That seemed really striking. Yes, that's maybe the biggest deal. The hallucination rate, you know, when the AI

03:16

just makes stuff up. Yeah, always a worry. It dropped to just 1.6%. 1.6%, seriously? Seriously. Compare that to GPT-4o's 12.9% or o3's 15.8%. Massive reduction. Wow. What does that mean, practically? Well, it's not really for solo diagnosis, obviously. But for things like summarizing research papers for a doctor, much, much more reliable. Humanity's Last Exam. Super hard reasoning benchmark. OK. GPT-5 scored 42 percent, which is just slightly

03:56

under Grok 4 Heavy's 44.4 percent. So it's. 700 million people? That's almost 10% of the planet. Getting access overnight? Instantly. It's a massive strategic play by OpenAI. Keep users in their ecosystem. You know, fight off Anthropic, xAI, Google. Makes sense. Lock them in with the best stuff. Exactly. It's about market share and keeping users engaged. You know, it's funny. I still wrestle with confidence myself

04:41

sometimes. Getting the AI to consistently understand what I mean, especially over several turns on a complex task. This is sort of the idea of a model that just gets it for those multi-step things without me having to constantly tweak and rephrase. Right. Less babysitting. Yeah. That feels like a genuine game changer for actually using it effectively day to day. Definitely. So boiling it down. Yeah, the rivalries are heating

05:33

up, aren't they? Like, hours after GPT-5 dropped, Elon Musk claimed his Grok 4 Heavy was already smarter. Right, said it was smarter two weeks before the launch, though, you know, no published benchmarks to back that up yet. Still, it shows that intense personal and tech rivalry. And speaking of competition, OpenAI reportedly spent, supposedly, $1.5 million per employee. Yeah, for about 1,000 staff. Over $1.5 billion total right before the launch. Just to keep talent from jumping

06:01

ship. Seems like it. It's a war for talent out there. Huge money flying around. And huge... They just raised $100 million at a $3.1 billion valuation. Wow, that's a steep climb. Yeah, a 6x increase in less than a year. And get this, they've been profitable since day one. Profitable in AI research, that's rare. Very. And they massively cut the cost of using their video models from like $1,400 an hour down to $0.25 an hour.

06:56

Okay, wait, $1,400 down to $0.25? Yeah, that's completely disruptive for anyone working with video generation. It makes it accessible. That cost reduction alone is revolutionary. And on a more relatable note, there is this funny thing on Reddit. Yeah. A senior D. And from people wanting to learn this stuff. Andrew Ng, you know, the AI pioneer. Sure. He announced a big free course on Claude Code partnering with Anthropic.

07:42

Free course from Andrew Ng, nice. Yeah, focused on what they call agentic AI, AI that acts with more autonomy. Great opportunity to learn from one of the best. Okay, so when you put all these things together, the rivalries, the new tools like 1, 2.2, the cost drops from Decart, even the funny explanations and the courses. What picture does that paint of the AI landscape right now? It's intense competition driving incredibly rapid innovation, leading to practical, accessible

08:10

applications emerging almost constantly. Constant mode. And they're already weaving GPT-5 into basically everything. Yeah, consumer stuff, developer tools, corporate offerings. They're moving fast. And Duolingo. Remember the backlash when they went AI first? Well, it seems like that's died down. One release isn't hurting their momentum much. Interesting. People adapt quickly, maybe? Maybe. Okay, here's one that's a bit out there. Oh. Go on. Oh. Imagine an AI system discovering

09:00

a new kind of physics. Yeah, that expands what discovery even means. Wow. That idea of AI discovering new physics, it really is profound. Kind of humbling. Absolutely. Also, a quick one. Adapt their reasoning in real time. Adapt their reasoning, meaning? Meaning they move beyond just using their static pre-trained knowledge. They're not just reciting facts they learned. Okay, so it's more dynamic than just recalling information. Exactly. And

09:54

the results are impressive. It tripled GPT-4.1's accuracy on tough biomedical questions. Tripled it? Yep. And it even outperformed o3, which is already a top reasoning model. How does it do that? It generates these internal feedback loops while it's working. It can manage its own memory, recognize when it's uncertain about its own reasoning, and then, this is the key part, revise its answers mid-process. So it's like it's thinking out loud to itself and correcting

10:23

its own path. That's a great way to put it, a kind of self-reflection and correction. Like an internal monologue almost. Yeah. And what's really cool for users, especially scientists or devs, is the control it offers. Control how? You can set uncertainty thresholds, basically telling it how sure it needs to be. You can rerun its reasoning paths, even tweak them. So you can see how it's thinking and guide it. Exactly. You can oversee how it handles tricky, ambiguous

10:49

problems. And it gives you rigor, but also flexibility. That sounds incredibly useful, not just getting the answer, but understanding the process. Right. It's like a GPS. Yeah. Instead of one fixed route. Yeah. It's like a GPS that can dynamically rethink your path mid-drive if it hits traffic or finds a better way. That's a fantastic analogy, adapting on the fly. It really is. And it raises a big question, doesn't it? Which is? If Microsoft can make GPT-4.1,

11:15

I think, steer and adapt like this, what happens when they apply CLIO to their next-generation models? Oh, yeah. The potential there seems huge. Immense. Right. Powerful AI reasoning, suddenly accessible to hundreds of millions. It's a huge shift from just chatbots to these dynamic assistants that can actually do complex things. And then you add things like Microsoft CLIO. And you see AI itself becoming more fluid, more adaptive. Yeah, not just static knowledge anymore. Exactly.

12:09

They're becoming systems that can think, self-correct, learn in real time, explore new possibilities. The pace is just wild. The competition's intense. But the end result seems to be... AI that's more capable, more practical. And just more integrated into everything we do. It's an incredible moment to watch unfold. It really is. So maybe something for you listening to think about. Yeah. How might these new capabilities, especially this idea of an AI agent in your pocket, actually impact

12:36

your daily life? Yeah. Your work. What specific task would you just hand over to this new level of AI? Yeah. What's the first thing you'd want it to handle for you? And maybe a bigger thought. If AI can discover new physics like we talked about or adapt its thinking like CLIO, how does that change our own understanding of intelligence itself in the years ahead? What does it mean for us? Deep questions. Lots to ponder. Definitely. We hope you found some aha moments in this deep

13:07

dive. And hopefully you'll keep exploring all these fascinating developments with us. Thanks so much for joining us. Until next time, keep exploring.

Transcript source: Provided by creator in RSS feed
