🎙️ EP 77: GPT-5 Beats Doctors & Why “Cheap” AI Could Break Your Budget

00:00

Have you ever wondered if that free AI you're hearing about is maybe actually a hidden money pit for businesses? Or if a machine could genuinely outperform a doctor in training on really complex medical stuff? Today, we're taking a deep dive into some pretty surprising AI realities. Welcome to the deep dive. Yeah, we're here today to unpack a fascinating stack of sources. They really challenge some of our core assumptions about artificial

00:25

intelligence. That's right. Our mission today really is to give you a shortcut to understanding AI's rapidly changing landscape. We'll start by busting the myth of cheap open-source AI. It's not always what it seems. Definitely not. Then we'll sort of race through a series of rapid-fire AI highlights, everything from, believe it or not, viral cat videos to some big corporate strategy shifts. And finally, we'll grapple with a truly groundbreaking study, an AI that, well,

00:53

it just outperformed human medical interns. It's quite the journey into AI's surprising new landscape. Okay, so let's unpack this first big idea. Many people assume open-source AI is always the most affordable option. It just feels intuitive, right? Free code. It does feel that way, but a recent study tells a very different story about the actual cost. Yeah. Tell us about that. It really does. A new Nous Research study just dropped, and their findings are pretty eye-opening, especially

01:21

for anyone actually running AI systems. They found that these so-called free open-weight models can actually cost you more in the long run than maybe using something like OpenAI's APIs. Oh, how so? Well, the research shows these open-source models can burn through 1.5 to 4 times more tokens than their closed counterparts. Okay, wait. When you say tokens, what exactly are we talking about here? For folks maybe not

01:46

deep in the weeds on this. Good question. So tokens are like the small pieces of words or data an AI processes. You can think of them as tiny building blocks of information the AI works with. So more tokens just means more work for the AI, more processing power needed. And ultimately more cost to you, the user. Right, exactly. So if a model uses more tokens for the very same task, it's essentially less efficient. And the study highlighted this inefficiency even

02:13

further, didn't it? It got pretty extreme in some cases. Oh, absolutely. For really simple Q&A tasks, some of these open-source models used a shocking 10 times more tokens. Imagine asking, you know, what's the capital of Australia? And the AI basically writes a short novel to give you Canberra. It's overkill. Wow. That's

02:34

significant. Yeah. Meanwhile, closed models like OpenAI's o4-mini demonstrated superior efficiency, particularly with complex tasks like math, where their internal reasoning seems much more compressed. That's a huge difference in efficiency. And you mentioned something like, what, 51% of companies are already running AI in production. This kind of inefficiency quickly

02:56

translates into runaway compute costs. It really highlights how that per-token pricing we see advertised can be, well, pretty deceiving if the model just eats way more tokens to get the job done. Precisely. Your seemingly cheaper open-source option could quietly, you know, devastate your compute budget if you're not paying close attention to its actual token consumption. So why is this happening? What's the fundamental difference in how these models reason that leads

03:21

to such a big gap? Well, it seems to come down to their architecture, maybe their training philosophy. Closed source providers are internally compressing their reasoning pathways. They've really optimized their models to perform complex tasks with fewer internal thinking steps, basically. That shrinks the token count significantly. Open source developers, on the other hand, often extend these reasoning

03:44

chains. They might add more explicit step-by-step thinking, maybe for accuracy or robustness, perhaps to cover more edge cases. Well, that means more tokens, and inevitably more cost. It's a clear trade-off. So it's not just about getting the right answer anymore. It's becoming about how efficiently the AI gets to that answer. Exactly. Token discipline isn't just some technical jargon anymore.
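
[Editor's sketch] The arithmetic behind this point can be written out in a few lines of Python. The cost of a task is the tokens it consumes times the per-token price, so a lower sticker price only helps if it outweighs the extra tokens. The 1.5x, 4x, and 10x ratios are the ones quoted in the episode; the prices, the token count, and the function names are hypothetical, chosen only for illustration.

```python
def cost_per_task(tokens_per_task: float, price_per_million_tokens: float) -> float:
    """Dollar cost of one task: tokens used times the per-token price."""
    return tokens_per_task * price_per_million_tokens / 1_000_000


def break_even_price(closed_price_per_million: float, token_multiplier: float) -> float:
    """Highest per-million price at which a model that uses `token_multiplier`
    times as many tokens still costs no more per task than the closed model."""
    return closed_price_per_million / token_multiplier


# Hypothetical numbers for illustration only (not taken from the study).
closed_tokens = 1_000   # tokens the closed model spends on one task
closed_price = 4.00     # dollars per million tokens
open_price = 2.00       # dollars per million tokens, half the sticker price

closed_cost = cost_per_task(closed_tokens, closed_price)
for multiplier in (1.5, 4.0, 10.0):  # token ratios quoted in the episode
    open_cost = cost_per_task(closed_tokens * multiplier, open_price)
    verdict = "cheaper" if open_cost < closed_cost else "not cheaper"
    print(f"{multiplier}x tokens: open ${open_cost:.4f} vs closed ${closed_cost:.4f} -> {verdict}")
```

Under these assumed prices, the half-price model still wins at 1.5x the tokens but loses at 4x and 10x. `break_even_price` gives the general rule: at a 4x token ratio, the open model would need to cost no more than a quarter of the closed model's per-token price to break even.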

04:07

It's actually becoming crucial for managing your budget and, frankly, making AI deployment sustainable long-term. So for businesses out there trying to save money with AI, what's the real key takeaway here, the core message? The key isn't just the per-token price. It's the total token efficiency that matters. Okay, let's shift gears now. Let's talk about some of the most talked-about AI happenings right now. Kind of a wild mix, really showcasing how AI is just popping up everywhere. It really

04:39

is. We recently saw an AI builder spark this online challenge, right? Yeah. Inviting all AI creators to share their coolest AI-made art, videos, tools. Yeah, that was neat. And the result was this incredible sort of crowdsourced stream of creativity. It really shows how AI is maybe democratizing artistic creation in a way. And speaking of access, if you're looking to get into that, Harvard University is offering 12

05:02

free online AI courses in 2025. That definitely widens access to some pretty critical knowledge. Okay, now here's where it gets truly interesting. Yeah. Maybe a little weird. AI cat videos have gone super viral. Oh, I saw some of these. And

05:16

they're surprisingly bizarre. We're talking, like, buff cats on revenge missions, or Billie Eilish meows dubbed into 30-second soap operas. Millions are genuinely addicted. It's like a whole new genre of internet culture just spawned. Whoa. The sophistication in these little narratives is kind of wild. Makes you realize how far AI generation has come, even if, you know, many of us still wrestle with prompt drift, just trying to get simpler outputs. Yeah, it's a truly unique

05:43

corner of the Internet, that's for sure. On a more practical note, Genspark AI just launched something called AI Developer, which is like a vibe coding tool. Apparently, one user built a working Mario game in just five prompts. Five prompts. Wow. Talk about rapid prototyping. That really lowers the barrier to entry for coding, doesn't it? Absolutely. We also saw some more controversial news recently, though. IgniteTech's CEO controversially cut 80 percent of staff

06:09

who resisted AI adoption. His quote was something like, "belief was harder than skills." Oof. They even reportedly instituted AI-only Mondays. It kind of highlights the intense pressure some companies feel to integrate AI really fast, sometimes at the cost of human jobs. But then on the flip side, Duolingo's CEO clarified that their AI-first memo was widely misunderstood. He stated no full-time staff lost jobs due to AI, and their AI Fridays are now just a weekly internal

06:40

thing for exploration. Okay, so maybe more of a PR misstep there than a policy shift. Sounds like it. An important clarification about their approach, though. And finally, on the investment front, the U.S. government, alongside NVIDIA, is investing a pretty significant amount, $152 million, into building open-source AI models specifically for science. That's good to see. Yeah, the idea is to help universities catch up, especially as the costs for private, cutting-edge

07:06

AI models continue to soar. So thinking about all these incredibly diverse headlines, what's the biggest takeaway here? What connects them? I think it's that AI isn't just a tech trend anymore. It's fundamentally reshaping culture, business, and education. OK, now let's zip through some quick, intriguing insights from the AI world. These really reveal how we're starting to interact with it, maybe without even noticing sometimes

07:29

day to day. All right. Rapid fire. PwC University apparently offers five distinct strategies to help avoid what they're calling AI paralysis. Essentially, how not to get killed by AI, in their words. Character AI is betting big on these persona-based AIs. You know, the idea is to give you a personalized bestie for conversation, companionship, maybe more. Interesting. On a more scientific note, AI is now actually designing bizarre new physics experiments that, get this,

07:59

actually work. Really? That's wild. Yeah. Google's former AI lead offered an interesting perspective, too, saying it's basically too late now to start a PhD specifically to catch the AI boom. Right. Yeah. And finally, Gemini Canvas now lets anyone tweak app designs using just simple descriptive words like "make the button blue." So putting these snapshots together, what do they reveal about how AI is kind of weaving itself into our daily lives and workflows? I'd say AI is fast becoming

08:25

a creative partner and just a seamless daily tool for many people. All right. This next deep dive is, well, it's truly remarkable. Maybe bordering on astounding, frankly. GPT-5 has demonstrably outperformed human medical interns in specific diagnostic tasks. This isn't just like a small step forward. It feels like a significant leap. Yeah, this comes from a new study by Emory University's radiation oncology team. And they really put

08:53

GPT-5 through its paces. They pitted it against earlier AI models like GPT-4o and actual human medical interns. And crucially, GPT-5 wasn't fed exam answers or pre-digested information. It was a pure test of its analytical and reasoning capabilities in a real diagnostic context, or a simulated one anyway. And how exactly did they test its reasoning? What kind of prompts were involved? That seems key. They used something called zero-shot

09:19

chain of thought, which is a bit technical, but basically means the AI thinks step by step to reach an answer without needing specific examples or training on those exact problems first. It's kind of like it figures out the logical path on its own, showing its internal thinking process. Okay, so it's reasoning from first principles, essentially. And they gave it some incredibly complex scenarios too, right? Demanding more than just crunching text. Oh, absolutely. They

09:42

used multimodal prompts. Now that means they combined patient history, so the text, with actual medical images. Things like CT scans, MRIs, or X-rays. GPT-5 had to understand both the visual data and the textual context, then connect the dots to make a diagnosis. That's a very human-like, high-level task. And the results were, well, quite definitive. Probably surprising for many in the medical field, I'd

10:07

imagine. Yeah, GPT-5 crushed it. On these multimodal reasoning tasks, it showed a nearly 30% gain in logic and over 36% gain in understanding compared to GPT-4. That's a huge jump between versions. And where GPT-4 actually lagged behind the interns by about 5% to 15% on these specific tasks, GPT-5 surged ahead by over 24%. It's a significant, quantifiable leap in its diagnostic capabilities in this controlled setting. But there's always a catch, isn't there? A but. But

10:34

this was all in ideal lab settings. Real hospitals are just incredibly messy. Very different environment. Yeah. You've got incomplete records, the complexities of human emotion, huge ethical concerns, legal constraints everywhere. An AI might ace an exam like this, but building bedside trust with patients, that's a completely different challenge. Honestly, I still wrestle with the concept of fully trusting AI in critical life-and-death scenarios myself. It's a huge leap

11:01

from the lab to the actual clinic floor. Right. And if GPT-4 was maybe like a helpful but still learning med student in these tests, GPT-5 is certainly performing at the level of an attending physician, maybe even beyond, on these specific reasoning tasks. The capabilities showcased here are truly astounding. They really push the boundaries of what we thought AI could do autonomously in diagnostics. Yeah, this isn't just about technical

11:25

accuracy. It brings up huge questions about regulations, liability, how existing medical workflows would even adapt. So the implication seems clear, at least from this study. In specific, complex tasks involving both text and image data from patients, GPT-5 is demonstrably beyond doctor level, outperforming humans in multimodal reasoning within that test environment. It's pretty staggering. So what do you think is the biggest hurdle for AI moving from these amazing lab results to actually being

11:53

used widely in real-world hospitals? Oh, it's got to be trust, dealing with human complexity, and just adapting to all that real-world chaos. So wrapping this all up, what does this all mean for us? We've seen that AI's promises, and maybe its pitfalls, are often far more complex than they might initially appear. From those hidden costs that can really impact a company's bottom line to these truly groundbreaking capabilities

12:18

emerging in fields like medicine. Yeah, it seems like efficiency, ethical application, and just having a nuanced understanding of AI's true strengths and weaknesses, those are becoming absolutely paramount for navigating this whole evolving landscape. This deep dive really just scratched the surface of our sources today. It really did. We definitely encourage you to keep exploring these fascinating shifts. You know, ask yourself, how might AI's efficiency gains affect your work?

12:43

And where will we see AI's beyond-human abilities maybe emerge next outside of the lab? And here's something to maybe ponder. If AI can now demonstrably out-diagnose human interns in specific, complex tasks, what truly fundamental human skills will remain irreplaceable in an increasingly AI-driven world? That's the big question, isn't it? Thank you for joining us on this deep dive.

Transcript source: Provided by creator in RSS feed