🎙️ EP 86: OpenAI’s Voice AI Just Got Real. And MIT Beat the WHO?

00:00

Imagine a voice AI so natural it actually detects your laughter and adjusts its tone. Or think about an AI predicting flu strains, maybe even better and faster than global health organizations. This isn't like some distant sci-fi idea, is it? Well, it's happening right now. Welcome to this deep dive, everyone. Our mission today really is to cut through all the noise. We want to pull out the crucial insights from the, well, the

00:27

absolute latest in AI. We're going to kick things off by unpacking a breakthrough in human-like voice AI. Then we'll skim across some other really exciting stuff: creative tools, coding assistance, things changing how we work. And finally, we'll land on something truly groundbreaking, a medical application that could honestly reshape public health globally. It's going to be a fast ride, but yeah, it's all pretty key to understanding where AI is today. Okay. Let's dive in then.

00:53

Our first big story. It feels like a really profound shift in how we interact with machines. This new era of voice AI, specifically OpenAI's GPT Realtime. For years, we've heard these promises, you know, genuinely human-like voice agents. Always just around the corner. Exactly. Always just around the corner. But our sources are saying that corner has finally been turned. This thing we've been waiting for, the truly natural voice interface. It seems like it's actually arrived.

01:22

It really feels like it. And what hits you immediately is just how natural it sounds. OpenAI has got these two new voices, Marin and Cedar. They're described as just incredibly smooth, really nuanced. And for you, the user, it gets even more specific. You can tell it to talk faster, talk softer, sound warm, even sound cold. You can even ask for a specific accent. All 10 of their voice options

01:43

apparently sound way more natural now. A leap, more than just clear pronunciation, right? It's beyond just clarity. And the clever part isn't only the voice quality. It's the understanding behind it. Exactly. This GPT Realtime, it can pick up on non-verbal stuff, like actual laughter or sighs or pauses. Yeah. And crucially, it handles code switching like a champ. Yes. Switching languages mid-sentence. It just flows with you. Doesn't get tripped up. Doesn't force you to pick one

02:12

language. That's pretty advanced. Yeah. It's not just mimicking them. Yeah. It's closer to real comprehension, real responsiveness. Which leads to the accuracy question, right? How much better is it? And the numbers are pretty striking. Okay. On Big Bench Audio, a reasoning test for audio understanding, accuracy jumped to 82.8%. Wow. What was it before? It was 65.6%. So big jump. Then instruction following on MultiChallenge, that went up to 30.5%

02:41

from 20.6%. Also significant. And maybe most important for, like, actually using it day to day.

02:47

Function calling on ComplexFuncBench. It hits 66.5%, up from 49.7%. Okay, hang on. Function calling. Just define that simply. For sure. It just means the AI actually doing something for you, performing an action or task. Like, "Book me a table." Exactly like that. It goes and does the booking. That's function calling. Got it. And that improvement makes it way more useful, obviously. Definitely. And here's another detail I thought was really smart. Really speaks to making it feel seamless.

03:13

Oh, yeah? If there's a long API call, like if the AI needs a moment to fetch some info, GPT Realtime keeps chatting. It doesn't just go quiet. Nope. Doesn't leave you hanging, wondering if it crashed. It's more like how a person might pause, maybe say, hmm, let me look that up. Right. It stays engaged. Yeah. That small thing, maintaining the flow, it really changes the feel of the interaction, makes it less... Tense somehow. Yeah, I can see that. And this isn't just a small update, right?

03:42

No, not at all. Our sources are really emphasizing this. GPT Realtime is the first OpenAI model built as a voice-native AI agent right from the start. Meaning? Meaning it integrates audio, images, live function calling right into its core design. It's multimodal, built for conversation first. Okay. A multimodal beast designed for conversation, and the impact is pretty clear then. Oh, yeah. Sources say it's kind of making competitors like ElevenLabs maybe sound a bit

04:09

dated already. And it's definitely a direct, serious challenger to Google Gemini Voice. It marks a real transition point, from voice AI being mostly a demo gimmick, cool for a minute but not that useful, to being a genuinely deployable product, something you can actually build things with. Okay. So boiling it down, what's the biggest shift this new voice AI brings to our daily interactions? It makes natural, real-time voice conversations with AI truly practical, a tool you can actually

04:39

rely on. Right, practical and reliable, not just a novelty anymore. That leap from demo gimmick to deployable product for voice, it really is something. It makes you wonder how that's playing out elsewhere. Let's talk about AI in... creativity and maybe more practical workflows. Absolutely. So on the creative front, Google's got this new image model, Nano Banana. Nano Banana. Yeah. Sounds fun, right? But it's becoming a real fan

05:04

favorite. People are using it for all sorts of diverse things, unique visuals, experimental art. Seems like it's sparking a lot of creativity. Interesting. And sticking with visuals, I saw something about UCLA researchers. An AI image generator using light instead of electricity. Oh, yeah. I saw that, too. Wild stuff. Almost zero power consumption, apparently. Those results are good. Supposedly really impressive artistically. It's just a totally different way of thinking

05:28

about computation using light itself. Very cool. Okay, so switching gears to more practical stuff. Gemini 2.5 Pro, that's getting buzz for workflow automation, right? Huge potential there for streamlining tasks, yeah. Okay, here's a vulnerable admission. I still wrestle with prompt drift myself sometimes. Oh, tell me about it. Trying to get the most out of these big models, you know, where the AI starts kind of wandering away from your original instructions after a while. Yeah, losing

06:00

the plot a bit. Happens to everyone. It's definitely a learning curve. Even for us, right? It shows that, you know, human guidance is still pretty key. Absolutely. It's not just plug-and-play magic yet. And we're seeing other specialized tools pop up, too, like xAI's Grok Code Fast 1. For coding. Yeah. People are calling it good for vibe coding, like capturing the feel or intent of the code quickly without getting totally bogged down. Vibe coding. I like that.

06:23

And then there was that weird, brief, AI-enhanced Wizard of Oz thing. At the Las Vegas Sphere. Oh, right. With the faces of James Dolan and David Zaslav popping up for a second. Yeah. Super random. Super brief. But kind of shows AI bleeding into entertainment and spectacle in unexpected ways. Totally unexpected. Now, all this innovation, this deployment. Yeah. It takes serious money, right? Right, for sure. And the investment is pouring in. Like Axelera AI seeking, what, $150

06:53

million? Yep, backed by Samsung, too, for their energy-efficient AI chips. And Google put in a massive $9 billion. Plus, Cohere raised $500 million for enterprise AI. Exactly. This isn't just garage tinkering anymore. It's big players, big money. It signals huge confidence across the

07:10

whole sector, a real industrial shift. So when you look at all these different things, creative tools, workflow helpers, coding aids, massive investments, how do these varied innovations reflect the current state of AI development? AI is getting both more specialized for specific jobs and more deeply woven into our basic tools and infrastructure. Right. Specialized and integrated. Building that foundation. Okay. That makes sense. Let's zoom out a bit now. Broader AI trends,

07:37

maybe where things are heading next. Feels like development has been just... It's incredibly intense lately. Intense is the word. Almost a Cambrian explosion, right? New models everywhere. Yeah. What really caught my eye was that one week, GPT-5, Claude 4.1, new Gemini models. All landed practically at once. I know. My social feed just exploded. It was like trying to drink from a fire hose, just keeping up. Exactly. It really hammers home how fast we have to relearn

08:05

what's even possible. Definitely. And beyond those huge foundational models, there were other interesting little bits. Like MathGPT expanding. It's positioning itself as a cheat-proof tutor, apparently rolling out to over 50 organizations. Interesting angle. Cheat-proof. And even Microsoft's CEO, Satya Nadella, shared his top five favorite prompts for GPT-5 and Copilot. Shows how leaders are actually using these tools day to day and, you know, that everyone's still figuring

08:32

out the best ways to talk to these things. That's a good point. We're all still learning. What about bigger strategic moves? Well, China's plans to triple its AI output stand out. Triple it. Wow. Yeah, it's a clear strategic push. Bolster their own companies, rely less on foreign tech. Makes sense from their perspective. And on the flip side, OpenAI. They seem focused on user safety. Talking about parental controls, emergency contacts for ChatGPT. So innovation, but also trying

09:00

to manage the risks. Exactly. That constant balancing act. Push forward, but do it responsibly. And looking further out, what glimpses are we getting of the future? Well, Microsoft Copilot being embedded into Samsung's 2025 TVs is pretty telling. AI in the TV. As standard. Seems like it. Think about that. AI isn't just an app on your phone anymore. It's becoming a default feature in your living room. Your TV is an actual intelligent

09:26

companion, not just a screen. Right. That fundamentally changes how we interact with our home tech, doesn't it? Much more pervasive. It really does. Okay, so looking at all these trends, the rapid model launches, the strategic plays, the safety features, the deep integration, what's the overarching theme these suggest for AI's immediate future? AI is shifting from niche tools to being everywhere, often invisible, acting like cognitive backup for daily life. Ubiquitous, invisible, cognitive

09:56

backup. Got it. Okay. Now, for our final segment, let's turn to something, well, truly profound. AI revolutionizing healthcare. MIT researchers have introduced VaxSeer, an AI system that predicts flu strains. And get this, apparently outperforms the World Health Organization. Yeah, this one really made me pause. It's fascinating when you compare it to the traditional way. Which is? Well, picking flu vaccine strains is tough.

10:21

They have to make the call six months, sometimes more, in advance, just to get production going globally. Right, it's a long lead time. So VaxSeer, it takes a totally different approach. Proactive, data-driven. It uses something called protein language models. Protein language models. Explain that. Okay, think of it like AI learning the language or grammar of proteins. The grammar of life itself. Kind of. It learns how the building blocks, the amino acids, arrange themselves and

10:46

interact. This helps it predict how a virus's surface proteins, the parts the vaccine targets, might change or mutate over time. Okay, so it's understanding the virus at a fundamental level. Exactly. Then it combines that understanding with real -world lab data, plus simulations of how diseases spread. And what does it do with all that? It doesn't just predict which flu strains might be dominant next season. It goes further.

11:10

It identifies the best specific vaccine strains to actually neutralize those predicted dominant strains. So it's predictive and prescriptive, finding the best countermeasure. Precisely. It's a much more holistic, proactive way to tackle an old problem. And the evidence? Does it work? This is the really incredible part. They did a 10-year retrospective study. Looking back, VaxSeer beat the WHO's selections in nine out of those 10 seasons. Nine out of 10. Seriously.

11:37

Seriously. And it matched or outperformed the WHO in six seasons overall. Pretty consistent. And there was one specific case back in 2016. It picked the right strain a full year before the WHO did. Whoa. Hold on. A year ahead. A full year. Imagine that. AI predicting the flu, not just, you know, slightly better, but potentially years ahead. That feels like a whole new frontier for proactive health. Game changing. That is genuinely mind blowing. A year's warning. And

12:07

this wasn't just a simulation, right? It matched real-world data. Completely. The predictions lined up with actual effectiveness data from the CDC in the U.S., Canada's Sentinel program, and Europe's I-MOVE initiative. So it's validated. This isn't just a lab curiosity. Nope. It's real science. And the implication is huge. It suggests a whole new way of thinking about fighting disease, moving from just reacting to outbreaks... To predicting and preventing them on a global scale. Exactly.

12:33

Proactive, predictive, preventative. Okay. So stepping back from just the flu. Yeah. What does VaxSeer's success suggest for AI in broader scientific prediction, like other diseases or even other fields? It shows AI can seriously speed up and improve scientific discovery, moving us from reaction to proactive problem solving everywhere. From reaction to proaction. A fundamental shift enabled by AI. Wow. Okay. So let's try and wrap this up. To recap our deep dive today.

13:04

The core takeaway, it feels pretty clear, AI has really crossed the chasm, hasn't it? Yeah, definitely. From being experimental tech to being, well, deployable, genuinely impactful products. We've seen it in voice interactions becoming... Incredibly natural, almost human. And creative tools, smart coding help, making workflows better. Right. All the way to potentially life-saving medical predictions like VaxSeer. The impact just feels undeniable now, and it's growing fast.

13:32

And what gets me is not just the depth of it, but the sheer speed and breadth. Yeah, the pace is something else. These breakthroughs are hitting everything. Our daily work, how we create things, even like we just discussed, the foundations of global public health. It's exciting, maybe a little dizzying sometimes, but you can't ignore it. No, you really can't. So maybe a final thought for you, our listener. As you encounter these evolving AI capabilities, think about how these

13:56

advancements are changing things. From a voice that seems to really understand you to medicine guided by AI. How is this reshaping our trust in technology? And maybe even, you know, reshaping our own human potential. Lots to think about there. Thank you for joining us on this deep dive. Until next time, keep exploring.

Transcript source: Provided by creator in RSS feed
