🎙️ EP 27: OpenAI's Brand New Model Outsmarted Claude and Gemini, Seriously

00:00

Welcome to the Deep Dive. We're really glad you're joining us today, ready to take a closer look at some significant developments happening right now in the fast-moving world of AI. Yep, definitely fast-moving. We've got this stack of sources here. It's a mix of recent news excerpts, a couple of market reports, and some quick highlights pulled from various places. That's right. And our mission for this Deep Dive is basically to unpack these pieces of news, get past just the

00:26

headlines, and figure out what's the most important stuff buried within them. What does it really mean? Exactly. What does it really mean for the evolving AI landscape that you're navigating? Right. So we're going to dig into the details that matter. We'll be talking about OpenAI's new model that's quite the brainiac, apparently. Some pretty interesting moves Google is making, not just in video, but also, kind of surprisingly,

00:51

in government efficiency. And yeah, honestly, a bunch of other fascinating tidbits that really show us, you know, where things are heading and the sheer breadth of AI's impact right now. It's pretty wild. Okay, let's unpack this stack. First up, the big news coming out of OpenAI. They have just unveiled a new model. It's called O3 Pro. Right, O3 Pro. And what's fascinating here, and the sources make this super clear, is its core

01:17

identity. This model isn't positioned as another lightning-fast chatbot built for casual conversation. It's specifically described as a reasoning-focused AI model. It's an evolution from their previous O3, but it's fundamentally engineered with thinking

01:35

and logic as its primary function. Reasoning first. Yeah, I kind of love that framing: built for thinking. And the source material really highlights where this focus pays off. We're not talking about, like, writing a quick email here. No, definitely not. The areas it shines in are quite demanding: coding, math, science, complex academic writing, and detailed business analysis. These are places where, you know, you need robust step-by-step logic, not just speed. Right. And it's not just

01:59

some sort of experimental model being tested in a lab somewhere. It's actually already rolled out. Oh, really? Yeah, as of June 10th, it replaced O1 Pro in their ChatGPT Pro and Team plans. So if you're using one of those, you now have access to this model. It's also available via their API, and the sources give specific pricing. Let's see, $20 per million input tokens and $80 per million output tokens. Okay, so the reasoning-first approach. How does that actually translate into how the

02:27

model works? What's the, like, technical difference the sources hint at? Well, this raises an important point about model architecture and design philosophy, right? Instead of simply predicting the next most statistically probable word or phrase in a rapid sequence, which is how many models achieve speed, O3 Pro is designed to process information more deliberately. It's built to simulate a step-by-step problem-solving process. It kind of thinks

02:53

things through methodically. That's the fundamental difference driving its capabilities in those complex domains we just mentioned. And the sources give us some pretty compelling proof points that this isn't just marketing fluff. They show benchmarks where it, well, frankly, outsmarts some of the top competitors on specific difficult tests. Yes, the benchmark results provided are quite telling and really give weight to that reasoning-focused claim. For instance, it beat Google's

03:18

Gemini 2.5 Pro on the AIME 2024 math test. AIME, and that's like serious math, right? Yeah. AIME is the American Invitational Mathematics Examination. It's a very challenging competition math test, way beyond simple arithmetic. Beating a top competitor there is significant. Oh, wow. So not just high school math, but like competitive math. Exactly. And it also beat Claude 4 Opus on the GPQA Diamond test. GPQA. What's that? GPQA stands for Graduate-Level Google-Proof

03:46

Q&A. The Diamond subset is specifically curated for extremely difficult questions that often require deep, nuanced understanding across various scientific fields. It's essentially testing PhD-level science knowledge and reasoning. PhD-level science. OK. Yeah. And expert reviewers who tested O3 Pro alongside O1 Pro and O3 also consistently ranked it higher across various tested areas, confirming the perceived improvement

04:09

in logical processing. Beating top models on both challenging math and complex multidisciplinary science, that really does back up the idea that it's designed for deeper thinking. It seems so. And it still has all the modern AI capabilities, right? Like browsing the web, using Python code interpreters, analyzing documents, processing visual inputs, even using memory for personalized interaction. Correct. It's fully featured in terms of tool access and multimodal capabilities.
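A minimal sketch, assuming the API prices quoted earlier in the episode ($20 per million input tokens and $80 per million output tokens), of how the cost of a single request could be estimated. The function name and the example token counts are hypothetical illustrations and do not come from the sources.

```python
# Per-million-token prices as quoted in the episode (USD).
INPUT_PRICE_PER_MILLION = 20.0
OUTPUT_PRICE_PER_MILLION = 80.0


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request at the quoted rates."""
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MILLION
    return input_cost + output_cost


# Hypothetical request: 10,000 input tokens and 2,000 output tokens.
print(f"${estimate_cost(10_000, 2_000):.2f}")
```

Because output tokens are priced at four times the input rate, a long answer can dominate the bill even when the prompt is short.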

04:38

You get that enhanced reasoning, plus the ability to interact with various data types and tools. Okay, so you get all that power. But the sources also point out a significant trade-off, right? There's got to be a catch. They do. And this is crucial. The sources emphasize that responses from O3 Pro are notably slower than those from its predecessor, O1 Pro, and certainly slower than models optimized purely for speed. And this isn't a bug. It's a direct consequence of its

05:03

design. That deeper step-by-step reasoning process simply takes more computational time than quick prediction. It just takes longer to think. Okay, so what does this all mean? Why would OpenAI, in a market obsessed with speed, release a model that is explicitly slower? What's the play here? This connects directly to the bigger picture of OpenAI's strategic positioning,

05:26

according to the sources anyway. They frame O3 Pro as their response to what they see as a growing issue with some speedy AI models in the market. Right. Which they imply can hallucinate too much or produce illogical outputs when under pressure to respond instantly. This new model is explicitly designed for reliability and trust in complex tasks over sheer speed. Ah, gotcha. So they're splitting their offerings, kind of specializing. Precisely. It looks like a deliberate split lineup

05:52

strategy. Yeah. You have GPT-4o, which is incredibly fast, great for real-time interactions, handles multimodal inputs seamlessly, good for creative tasks, quick summaries, conversations. Yeah, the flashy one. Kind of. And now you have O3 Pro, which is purpose-built for deep logic, accuracy, and trust in those highly demanding, reasoning-heavy applications. Okay, that makes

06:12

a lot of sense. So for you, the listener, this distinction is really important because it means choosing the right tool for the specific job you need done. If you need rapid creative brainstorming, quick information retrieval, or just conversational flow, GPT-4o might be your best bet. But if you're tackling a complex coding problem, analyzing dense financial reports, writing a detailed academic paper, or trying to solve a difficult scientific query where the accuracy and soundness of the

06:40

logic are paramount. Yeah, where you really need it to be right. Then O3 Pro is specifically designed for that kind of deep work, even if it takes a little longer to give you the answer. It's about balancing speed with trustworthiness depending on your task. Exactly. It caters to different user needs by offering specialized strengths. Makes sense. All right. Shifting gears a bit, let's look at what Google has been up to. According to these sources, they've got some pretty diverse

07:05

things happening, too. Definitely. On the creative side, building on their existing capabilities, they've unveiled Veo 3 Fast. This is an update to their video generation tool. Veo 3 Fast. Okay. The key highlight here is speed. The sources say it's generating videos two times faster now. They're also mentioning improved serving optimizations, which implies it's also getting better at delivering those videos efficiently. It maintains a 720p resolution as well. Two times faster is pretty

07:34

significant for video generation. That can be a bottleneck, right? Waiting around for renders. Oh, absolutely. Cuts down waiting time. And there's this wild user example the sources pointed to. Someone apparently used Veo 3 for these Stormtrooper-style vlogs. Yeah, the Stormtrooper-style vlogs. Quite a mental image. The report mentioned that one such account reportedly garnered something like 8 million views on Instagram in a single day using videos generated with Veo 3. Whoa! 8

08:00

million views in one day. That's crazy viral. Just for Stormtrooper vlogs. It really underscores the power of combining a novel creative idea with accessible, fast generation tools. You can produce content at a volume and speed previously impossible, and apparently turning everyone into a Stormtrooper resonates with 8 million people very quickly. Who knew? Unbelievable. OK, so from viral Stormtrooper vlogs to government bureaucracy, the sources also mention Google partnering with

08:29

the UK government. That seems like a jump. Yes. And in my view, this particular application is one of the most practical and immediately impactful deployments highlighted in the sources. Google's Gemini Extract is being leveraged to tackle a massive real-world bottleneck in the UK public sector: the incredibly slow infrastructure planning process. Right. Think about everything involved in getting approval to build houses, roads, or other essential infrastructure. Just mountains

08:57

of paperwork. Government paperwork. And planning documents specifically can be a notoriously complex, messy thing. Absolutely. And the problem is often the format. These aren't always neat digital files. Extract is designed to scan and process incredibly messy, handwritten or scanned planning documents. OK. And the sources are specific here. It can handle things like blurry maps and even handwritten notes scrawled in the margins. Stuff that is usually really hard for computers. Oh,

09:23

wow. So it's not just OCR on a clean typed page. It's dealing with, like, the real messy analog world, coffee stains and all. Exactly. It's built to interpret and understand unstructured data that isn't in a standard digital format. Its job is to convert that physical, often chaotic information, whether it's a faded stamp, a drawing on a map, or handwritten comments, into searchable, structured digital data. Ah, searchable and structured.

09:53

That's key. Data that planners and decision makers can actually work with efficiently in a database or system. OK, so it doesn't just digitize an image. It makes the information within that image usable and searchable. That's a big step. And the statistic quoted in the trials for this is

10:07

pretty jaw-dropping. It really is. According to the early trials mentioned in the sources, a process that previously took a human planner two hours of manual work to extract key information is cut down to just 40 seconds using Gemini Extract. 40 seconds from two hours. I mean, that's a monumental efficiency gain for that specific task. It's a concrete, quantifiable benefit. And the goal here is clear and directly addresses a major

10:33

public sector issue. By accelerating the data extraction from these planning documents, they can speed up notoriously slow decisions on infrastructure and housing projects in the UK. Makes sense. It removes that incredibly tedious, mind-numbing manual data entry, freeing up trained planners to focus on their actual expertise: making informed planning decisions. It's designed to cut through the massive backlogs that have reportedly stalled

11:00

development for years. That feels like such a powerful practical use case, not some, you know, far-off futuristic concept, but using AI to fix a deeply rooted systemic problem that has real consequences for things like housing shortages. It absolutely is. The source specifically highlights this, calling it one of the most practical AI deployments yet in the public sector. High praise. And this move doesn't just solve a UK problem. It also significantly strengthens Google's position

11:26

in the enterprise and government AI market. By directly unblocking these slow, data-intensive processes, they're supporting a major national target, like the UK's goal of building 1.5 million homes. It's AI solving a fundamental real-world bottleneck using existing documents. That's incredibly insightful. OK, let's zoom out a bit and hit some of the other quick takes from the sources, because there are quite a few other interesting nuggets that give us a broader picture of the

11:56

AI landscape right now. Sure. Yeah. There are a number of points that paint a wider picture. For instance, the recent ChatGPT worldwide outage. Oh, yeah. I remember seeing reports on that. Downdetector lit up with nearly 2,000 reports of it just being down globally for a bit. Right. While it was a temporary inconvenience for many, it really just served as a stark reminder of something important, a rapidly growing reliance on these

12:18

AI systems. Totally. When they're integral to so many workflows, even a relatively short outage highlights how dependent we're becoming. Things just stop. Yeah, it makes you think about the infrastructure supporting all this. And then there was this little kerfuffle the sources mentioned about X users kind of dragging Apple. What was that about? Ah, yes. That was a bit of digital

12:39

commentary. Apparently, Apple published some research or analysis pointing out perceived flaws or limitations in current AI reasoning models. Okay. And some users on X were quick to retort, essentially pointing out that while Apple is critiquing reasoning models, they haven't actually launched their own foundational large language model yet. A bit of a glass houses situation, maybe. Pot calling the kettle black. [Slight chuckle.]

13:06

Something like that. It sparked a bit of debate online, though, about the state of LLMs and who has the right to critique whom. It really shows the competitive heat in the space right now. Yeah, definitely. But OK, completely different note. Here's one that really surprised me and shows the unexpected places AI is going: a medical application. Doctors at Columbia University reportedly used AI to help a couple with 19 years of infertility finally achieve pregnancy. That story was really

13:30

quite moving, wasn't it? The source specifically states it's the first known case of pregnancy made possible through AI. Now, it wasn't like the AI performed the medical procedure itself. Right, right. But rather, it likely analyzed vast amounts of patient data, treatment histories, maybe genetic information, to identify factors or potential pathways that human doctors might have missed over those nearly two decades. Wow.

13:55

Beyond the productivity tools and the big models, AI directly impacting lives in such a profound human way. That's pretty incredible. Really shows the potential breadth. And speaking of unexpected connections, there was that intriguing business detail, OpenAI quietly signing a cloud deal with Google. Aren't they, like, direct competitors? Yeah, this is definitely one of those behind-the-scenes moves that caught attention in the

14:18

sources. OpenAI is famously funded and heavily supported by Microsoft, one of Google's fiercest competitors in the cloud and AI space. For OpenAI to quietly sign a cloud deal with Google Cloud Platform? The sources frame this as a kind of arms-dealer role by Google, basically selling their infrastructure capabilities even to rivals. If you need compute, we've got it. And the sources suggested it might indicate something about OpenAI's relationship with Microsoft, like things are

14:47

shifting. Potentially. It could be interpreted in a few ways. One, as the source mentions, it could suggest that Microsoft's tight grip on OpenAI might be loosening slightly, or at least that OpenAI is asserting more independence in its infrastructure choices, making its own decisions. Two, it could simply be a pragmatic move by OpenAI to diversify its infrastructure providers, which is a standard practice for ensuring resilience and redundancy, and maybe leveraging competitive

15:14

pricing. Hedging their bets. Right. Don't put all your eggs in one basket. Either way, it's a fascinating dynamic in the competitive AI ecosystem. Lots of maneuvering. Complex corporate stuff going on there. Also saw some funding news that highlights where investment is heading. Right. Plug and Play, the global accelerator, secured a substantial $50 million fintech and AI fund.

15:34

$50 million. Okay. The focus is specifically on AI applications within financial services, which is a massive industry ripe for AI disruption and efficiency gains. Shows investor confidence in that particular vertical. Money's flowing there. And just a few quick hits to sort of round things out and give a sense of the pace of development. Saw mentions of Apple's expected AI announcements at WWDC 2025. Hinting perhaps at things like visual AI capabilities integrated into their

16:02

ecosystem. Yeah, always anticipation around Apple events. And OpenAI hitting that pretty staggering $10 billion annual recurring revenue mark. Wow, $10 billion ARR. That's approximately $833 million a month. It shows the commercial scale they've reached surprisingly quickly. Also, news that their first planned open model in years has been delayed until later this summer. Yeah, that delay is interesting, especially after their big closed

16:25

model announcements like O3 Pro. Also saw AI companies prominently featured on the CNBC Disruptor 50 list, which again just underscores how much AI is seen as reshaping industries right now. No surprise there. And Microsoft-backed Mistral launching its own reasoning model specifically positioned to rival OpenAI. Exactly. That Mistral point is key because it shows that the competition isn't just in speed or size, but also in that specific crucial capability of logical reasoning

16:54

that OpenAI is emphasizing with O3 Pro. So everyone's jumping into the reasoning game now. The race is definitely happening on multiple fronts. It's not just one dimension. And just a couple examples of, like, practical tools becoming available, things like Hunter, an AI tool that can provide a resume review in under five minutes. Or Bubble, which lets you build no-code applications powered

17:14

by AI. So tools for everyone. Right. Those highlight that it's not just about the foundational models, but the proliferation of applications and tools built on top of them that are directly impacting workflows and creating new possibilities for individuals and businesses. The ecosystem growing around the big models. Okay. So wrapping up this deep dive. What are the big picture takeaways that surface from this collection of sources today? What should we leave people with? I think

17:39

the key insights, pulling it all together, are quite clear. First, we're seeing the AI frontier push towards not just bigger and faster, but smarter and more reliable models like OpenAI's O3 Pro. The reasoning focus. Specifically engineered for complex logic and trustworthiness, even if it means accepting a tradeoff in speed. Second, AI is clearly enabling faster, more accessible creative workflows like Google's Veo 3, potentially changing how content is generated and consumed,

18:08

you know, like those Stormtrooper vlogs. And perhaps most significantly, AI is moving beyond just high-tech or creative domains and is increasingly tackling fundamental, often mundane, but absolutely critical real-world problems like that government paperwork bottleneck with Gemini Extract or even enabling breakthroughs in deeply human areas like health care. Yeah, it's not just about the raw advancement of the models themselves anymore,

18:36

is it? It's this parallel trend of specialization, speed versus logic, for instance, and this really accelerating integration into fundamental parts of society, business, and even our personal lives. The integration point is vital. It's showing up not just in our chatbots, but in the infrastructure of government, in healthcare, in finance. It's becoming woven into the fabric of how things

18:58

work, sometimes invisibly. Which brings us to a final provocative thought for you to consider based on everything we've just explored in these sources. Given the clear tradeoffs we're seeing in these new AI models, things like balancing speed versus reliability and deep reasoning and recognizing our increasing dependence on these systems, starkly highlighted by things like that recent widespread outage. Yeah, that dependence

19:19

is growing fast. What aspects of AI do you find yourself prioritizing as these tools become more powerful and integrated into your own life or work? How do we collectively and individually think about balancing this incredible push for innovation and speed with the absolutely critical need for robustness, accuracy, and trust in systems we're starting to rely on so heavily? That's the big question, isn't it? It's something worth mulling over as AI continues its deep dive into our world.

Transcript source: Provided by creator in RSS feed
