🎙️ EP 96: Ex-OpenAI CTO Just Fixed a Bug Google Couldn't for 5 Years

Sep 12, 2025•16 min

Episode description

One tiny team just did what OpenAI, Google, and Meta couldn't: fix the most frustrating bug in AI. Mira Murati’s new startup dropped open-source code that could finally make AI stable. That’s just the start.

We’ll talk about:

  • The shocking AI bug that made ChatGPT give different answers to the same question
  • How a $12B startup fixed it before anyone else (and why it matters)
  • A side-by-side test of ChatGPT vs. DeepSeek in healthcare (the results were wild)
  • A look into what’s really coming next from Thinky Labs (spoiler: it’s not just research)

Keywords: Mira Murati, ChatGPT, DeepSeek, Thinking Machines Lab, LLM, reproducibility, AI tools, batch invariance, GPT-4o

Links:

  1. Newsletter: Sign up for our FREE daily newsletter.
  2. Our Community: Get 3-level AI tutorials across industries.
  3. Join AI Fire Academy: 500+ advanced AI workflows ($14,500+ Value)

Our Socials:

  1. Facebook Group: Join 255K+ AI builders
  2. X (Twitter): Follow us for daily AI drops
  3. YouTube: Watch AI walkthroughs & tutorials

Transcript

Have you ever asked an AI the same question twice with all the settings exactly the same and still gotten wildly different answers? Yeah. It's like walking into your favorite coffee shop ordering the exact same drink, you know? But it tastes completely different depending on how busy the barista is. Right. That kind of inconsistency, well, it can be incredibly frustrating, especially when you actually need... reliable outputs. Absolutely. That unpredictable quality has been a silent

bug, really. A pervasive headache for AI models for years. It just fundamentally undermined trust and scientific rigor. Yeah. But what's truly exciting now is that a small, really focused team has finally delivered a major fix. And today, we're diving deep into what that breakthrough means for all of us. Welcome to the Deep Dive. Today, we're going to unpack some, well, truly incredible breakthroughs and also confront a few humbling realities shaping the world of artificial

intelligence. Our mission here is really to distill these insights, giving you a shortcut to being well-informed. Yeah, we'll explore that surprising fix for an old, persistent AI problem. Then we'll shift gears a bit, touch on some exciting new tools pushing creative boundaries. Okay. We'll also discuss major industry shifts, the immense financial investments pouring into the sector, and then look at what AI still can't quite do, especially in critical areas like, say, healthcare.

Right. Get ready for some genuine aha moments, you know, things that might reframe how you think about AI's future. Okay, let's jump right into what might be the most significant recent development then: solving the frustrating inconsistency of large language models. Yeah, the big one. For years, you could give an AI the same prompt, often with its temperature setting at zero, which means it should be at its most deterministic, right? Most predictable. Should be, yeah. And

you'd still receive varied outputs. It just felt like rolling the dice sometimes. It was exactly like your coffee analogy. Even when you tell the AI, look, don't be creative. Just give me the most likely consistent answer. It still drifted. Researchers, developers, businesses. They've all been scratching their heads. Because if you can't reliably reproduce an AI's output, I mean, how can you build critical systems on it? Exactly.
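
The temperature-zero point above can be illustrated with a minimal sketch. This is not any vendor's decoding code: `sample_token`, the made-up `logits`, and the treatment of temperature zero as plain greedy decoding are all assumptions for illustration. It shows why temperature zero is expected to be repeatable, and why that expectation only holds if the scores themselves are identical from run to run.

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Pick a token index from raw scores (hypothetical helper)."""
    if temperature == 0:
        # Greedy decoding: always take the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Softmax with temperature; subtract the max for numerical stability.
    scaled = [x / temperature for x in logits]
    top = max(scaled)
    weights = [math.exp(s - top) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

logits = [2.0, 1.9, 0.5]  # made-up scores for three candidate tokens
rng = random.Random(0)

# Temperature 0 gives the same token every time for the same scores.
greedy = {sample_token(logits, 0, rng) for _ in range(100)}

# Temperature 1 spreads the choice across several tokens.
sampled = {sample_token(logits, 1.0, rng) for _ in range(100)}

# A tiny numerical wobble in the scores is enough to flip the greedy pick.
nudged = sample_token([2.0, 2.0000001, 0.5], 0, rng)
```

Under these assumptions, greedy decoding is only as deterministic as the numbers feeding it, which is where the server-side explanation that follows comes in.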

And the core of the problem, as this brilliant team recently discovered, wasn't necessarily inside the AI model itself. Right, not the model logic per se. But in the server environment, your output actually changed depending on how many other people were hitting the AI server at the exact same time. Huh. So it's like the system was getting distracted? Kind of, yeah. Its internal state subtly altered by all those concurrent requests, leading to these non-deterministic

results. Wow. Okay, the implications of that discovery, they're enormous then. Totally. This irreproducibility messed with everything from scientific research, where consistent results are just paramount. Absolutely. Business reliability for AI-powered apps, and even the efficiency of training new models. You'd optimize something, run a test, get a great result. And then you can't get it again. Exactly. You couldn't replicate it consistently. Huge problems for any serious

application. So what's the fix? What did they do for this foundational challenge? Okay, so this team at Thinking Machines Lab, impressively led by ex-OpenAI CTO Mira Murati, along with a Meta researcher, developed something truly groundbreaking. It's called batch-invariant kernels. Batch-invariant kernels. Okay, what does that mean in, like, plain English? Simply put, think

of them as a new computational method. They ensure the AI's internal state for your specific query stays completely unaffected by what other users are doing simultaneously. Imagine having like a dedicated, perfectly soundproofed workspace just for your task within the AI. It guarantees the exact same outcome every single time, no matter how busy the main office gets. Got it. It's a foundational stability layer. It just eliminates a huge source of irreproducibility.
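
The transcript does not spell out the mechanism, so the sketch below rests on a common explanation that is assumed here: floating-point addition is sensitive to the order of operations, and a server may split the same sum into different chunks depending on how many requests share a batch. The functions `chunked_sum`, `load_dependent_reduce`, and `batch_invariant_reduce` are hypothetical stand-ins, not the team's actual kernels.

```python
def chunked_sum(values, chunk):
    """Sum each chunk left to right, then add the partial sums in order."""
    partials = []
    for start in range(0, len(values), chunk):
        acc = 0.0
        for v in values[start:start + chunk]:
            acc += v
        partials.append(acc)
    total = 0.0
    for p in partials:
        total += p
    return total

def load_dependent_reduce(row, batch_size):
    # Hypothetical heuristic: split the work differently when the server is busy.
    chunk = len(row) if batch_size == 1 else 2
    return chunked_sum(row, chunk)

def batch_invariant_reduce(row, batch_size):
    # Same chunking, and therefore the same addition order, for every batch size.
    return chunked_sum(row, 2)

# One user's data; the large and small magnitudes make rounding visible.
row = [1e16, 1.0, -1e16, 1.0]

quiet = load_dependent_reduce(row, batch_size=1)  # server handling one request
busy = load_dependent_reduce(row, batch_size=8)   # same request, busier server

stable_quiet = batch_invariant_reduce(row, batch_size=1)
stable_busy = batch_invariant_reduce(row, batch_size=8)
```

In this toy setup `quiet` and `busy` disagree even though the input is identical, while the two batch-invariant results match. Fixing the order does not make the sum more accurate; it only makes it repeatable, which is the property the hosts describe.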

So the LLMs ignore the noise from other prompts being processed in the same batch? Honestly, what strikes me as truly remarkable here is that this kind of fundamental problem, right? Five years old, major players like OpenAI, Meta, Google, they seemingly hadn't solved it. Yeah, with all their resources. And then this relatively small team cracked it. That's... genuinely inspiring, I think, for anyone working in tech. It really is. It shows that innovation doesn't always need

an army of engineers. A focused, agile team with a fresh perspective can spot a blind spot that maybe larger organizations overlook because of existing infrastructure or just different priorities. Whoa. I mean, just imagine scaling this newfound consistency. Think about a billion queries. All reliable. A truly reliable AI could transform so many industries. This is a huge step for trust and broad adoption. This batch invariant kernels

fix sounds monumental, clearly. But zooming out just a bit, why is this specific kind of AI consistency so utterly critical for the whole future trajectory of AI development? I mean, beyond just lab reproducibility. Well, consistent AI builds trust. That's huge. It enables scientific rigor and it makes business use cases truly dependable. Right. Dependability. That's key. So that foundational fix for consistency is a game changer, setting the stage for more

reliable AI everywhere. But even as we tackle core stability, AI is already hurtling forward. Incredible new capabilities seem to pop up constantly. What's caught your eye recently as a sign of where AI is heading next? Well, Google Veo 3 is making massive waves right now. It's a text-to-video model, but it can turn any image into accurate, highly realistic, lip-synced talking videos. Any image. Wow. Yeah. Think viral storytelling

on a whole new level. Imagine taking a static photo, maybe a historical figure, and having them deliver a compelling speech, perfectly lip-synced. Okay, that's pretty wild. For content creators, it dramatically cuts down production time for things like character animation or explainer videos. The fidelity is just astonishing. It really blurs the lines between a static image and a dynamic narrative. That opens up so many creative avenues and maybe some thought-provoking

ones too. Okay. And on the more practical, maybe operational side, OpenAI just released full support for MCP in ChatGPT. Right, MCP. Let's unpack that. What exactly does the Model Context Protocol let users do beyond just searching for information? MCP, the Model Context Protocol. It fundamentally shifts ChatGPT from being just a conversational answer machine to more of an action engine. Instead of just telling you how to update a ticket in your project management software, for example.

Okay. MCP could actually trigger that update itself, or generate a detailed report, or even initiate an email sequence, all based on your natural language commands. It acts. It turns spoken or written intent into direct, actionable steps within other applications, linking different workflows together. It's moving from conversational tool to operational one. That's a big shift. And speaking of OpenAI, big news also for their structure, right? A non-binding deal with Microsoft letting

them restructure as a for-profit company. Yeah, that's significant. So what does this all mean for their future direction and maybe the broader AI landscape? It really points towards a heavy push for commercial viability, you know, maybe even an IPO in the not-too-distant future. This move signals a significant acceleration towards productization and market dominance. And we're seeing similar financial trends elsewhere too. Oracle's Larry Ellison, for instance, briefly became the

world's richest man. Right. I saw that. Thanks largely to an AI-driven surge in demand for cloud infrastructure. Then you've got Coupang, the South Korean e-commerce giant, investing about $54 million into a fund supporting 14 AI startups just in South Korea. Wow. The sheer amount of money pouring into this space right now. It's just staggering. So all this investment is clearly a huge vote of confidence. But, you know, for a deep dive, are there any potential

downsides we should think about? Maybe bubbles, given this huge capital influx and rapid expansion? Well, yeah, rapid growth always carries risks. While the opportunities are vast, we are seeing some consolidation. And the valuations, well, in some cases, they might be running a bit ahead of actual revenue generation. Okay. It's definitely something to watch closely, especially in the startup ecosystem. But overall, AI is certainly moving strongly into video, operational automation,

and becoming a major economic force. Okay, so we've covered some big picture shifts, groundbreaking fixes. Let's maybe pivot to some rapid fire updates. New tools you might want to know about just showing how quickly the landscape is evolving. Absolutely. Beyond the big platforms, there's just this constant stream of new, often niche AI tools emerging daily. Right. On the creative side, you've got tools like Quick Deep Thick, pushing boundaries

with immediate face swapping. AI Figure Generator for turning 2D pictures into poseable 3D figures for artists, designers. creation from simple text, images, or even audio makes content production accessible to way more people. And for businesses trying to navigate this explosion, there's ThirdEye. It helps brands stay discoverable by tracking all the various AIs and their capabilities. That's a fascinating new challenge, right? Brand management

in the age of AI. Keeping track of the trackers, that's quite a lot of movement in creative and business apps. What about broader industry news? Well, Grammarly, a tool I think many of us use daily, is significantly expanding its reach. It now supports 19 languages beyond English. Wow, 19. That's a huge leap for global accessibility. It really is. And while Google's highly anticipated Gemini 3 isn't out this month, they've promised it's coming soon after their current 2.5 Pro

version. So anticipation is building there. Okay. Keeping an eye on that. Right. And the language expansion is a big one for global users. We also have Stability AI launching Stable Audio 2.5. This lets users create pretty impressive music tracks up to three minutes long now, with unprecedented quality for AI music gen. Three minutes. Wow. In a really strategic development, both Alibaba and Baidu in China are beginning to use their own custom chips to train AI models. Ah, interesting.

Moving away from NVIDIA, maybe. Potentially, yeah. It's a significant move, not just for cost efficiency, but definitely for national tech independence, reducing reliance on external semiconductor suppliers. Right. That makes sense. Finally, on the education front, Florida State University is joining the Google AI for Education Accelerator. Okay. So it's truly clear, isn't it? AI is integrating into every corner of our lives. Creative work, national infrastructure, even how we learn. Honestly,

sometimes I still wrestle with prompt drift myself, just trying to keep up with all these new features and tools. It's a lot to process and actually integrate into daily workflows. It really is a relentless pace of innovation, isn't it? What's the common thread you see weaving through all these diverse rapid-fire developments we just touched on? I guess AI is just integrating deeper into everyday tasks, creative work, and yeah, global tech infrastructure itself. Mid-roll

sponsor read. Okay, let's shift gears now to a really important application of AI: healthcare. We're going to discuss a recent study focused specifically on AI's role in patient education. Yes. A recent study put two prominent large language models, ChatGPT-4o and DeepSeek-V3, to the test. Okay. They were given identical prompts to generate patient education guides for four different chronic

diseases. The goal was really to see how well these AIs could produce clear, understandable, and reliable health info for the average person. And the results were quite illuminating, weren't they? Both models generated guides that typically landed at about a 9th-to-10th-grade reading level. Right, which is generally considered pretty accessible for adults. And they also scored over 80% on understandability, meaning the content was easy

enough to grasp. And in terms of reliable quality, they scored an average of 47 out of 80 on the DISCERN scale, which, for those unfamiliar, is a widely recognized tool for assessing health info quality. Yeah, it's a standard measure. So a score in that range indicates the guides provided solid foundational information. Good for basic understanding. For basic health info, they'll certainly do the trick. DeepSeek-V3 did show a few standout moves that are worth noting,

I think. It exhibited more original writing. Okay. Its Turnitin similarity score was 32.5% versus ChatGPT's 46%. Now that's quite telling. A lower similarity score suggests less boilerplate language, more unique phrasing. Which you'd want in patient education. Exactly. Something crucial for engaging materials that don't sound like they were just copied from a textbook. DeepSeek also offered more actionable content. It gave clearer next steps for patients 65% of the time

compared to 50% for ChatGPT. That distinction, originality and actionability, that really matters if you're trying to avoid generic language or give patients something they can actually do with the information. Right. But what does this study teach us about AI's capabilities when it comes to critical thinking, especially in a field as nuanced as medicine? Well, yeah, this is where

both models kind of fell flat. When asked for more complex tasks like providing financial style projections for health outcomes or evaluating nuanced personalized care strategies, they really struggled. They can articulate existing knowledge, structure it well. Sure. But. They're not critical thinkers yet. They sound intelligent, but they don't reason the way a human expert does. And crucially, this study strongly reinforced that they cannot and really should not replace doctors

or human educators for personalized advice. Not even close. Their role remains supportive, definitely not primary. This study offers a very clear picture then. What fundamentally does it teach us about AI's current and maybe future role in these complex human-centric fields like medicine? Well, it shows AI supports basic information delivery quite well now. But critical thinking and that nuanced care aspect, they absolutely remain human

domains. What a deep dive indeed, from fixing a fundamental AI unpredictability bug with those batch-invariant kernels. I mean, that genuinely changes the game for reliability. Huge. To exploring new creative tools like Google Veo 3, and then finally understanding AI's very real, humbling limitations in critical areas like healthcare. The pace of innovation is truly staggering. Absolutely. We've seen incredible breakthroughs showing how even small, focused teams can profoundly impact

the future of AI. That's exciting. Yeah. But we also saw a clear, important reminder. While AI can process, synthesize, format information brilliantly, the capacity for genuine critical thinking, nuanced human insight, and truly personalized care, that remains uniquely ours. It's a powerful

balance to understand. So as you continue to interact with and apply AI in your own world, maybe consider where its strengths truly lie, particularly its newfound consistency and where that invaluable human touch is absolutely essential. Yeah. Think about how we might leverage AI's reliable information delivery now that it's less like that unpredictable Starbucks, right? To free up human experts for what only they can

truly do. The critical thinking. The empathy, the judgment, all the stuff AI simply hasn't mastered. Thank you for joining us for this deep dive. We hope you gained some valuable insights and maybe a clearer perspective on this rapidly evolving field. Keep exploring. Keep learning.

Transcript source: Provided by creator in RSS feed