🎙️ EP 53: AI Just Learned to Forget You, Literally. Not block it. Forget it.

00:00

What if AI could actually unlearn your voice? I mean, truly make it impossible to clone. Think about the privacy side of that. Yeah. And what if suddenly these really powerful open source AI audio tools just became available to everyone, kind of challenging the big players? Today, we're really diving into the cutting edge of sound AI. Welcome, everyone, to the Deep Dive. Our goal here, as always, is to pull out the key insights from, well, a whole stack of recent

00:28

AI developments and sources. We try to cut through the noise for you. So today we're going to start with a really interesting breakthrough, how AI models are learning to forget specific things like voices. Then we'll do a sort of pulse check, some quick hits on other surprising AI stuff happening. And finally, we'll unpack a pretty major shift in open source audio AI. Think of it like finding foundational Lego blocks for sound suddenly out in the open for everyone to

00:53

use. OK, let's definitely unpack that first piece, this idea that voice cloning might actually have an end in sight. It sounds like the core concept isn't just, you know, blocking the bad guys, but actually changing the AI itself. So training models to completely erase specific voices, that feels, well, fundamental for AI safety. It really is something. So researchers, they managed to recreate a version of Meta's big text-to-speech

01:18

model, VoiceBox. Okay. And they fed it just about five minutes of a particular person's voice. Then, using this unlearning method, they basically wiped that voice's unique signature clean out of the model's memory. Wow. So, OK, what happened then when they asked the AI to recreate that specific voice it was supposed to have forgotten? The output was, well, the researchers said, totally different, completely useless for impersonation.
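
A minimal sketch of how a voice-similarity check of this kind could work, assuming each voice sample has already been turned into a speaker embedding (a list of numbers) by some separate model. The toy vectors and the function names cosine_similarity and relative_drop are hypothetical illustrations, not the researchers' actual tooling or data.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def relative_drop(before, after):
    """Fractional drop from a baseline score to a new score."""
    if before == 0:
        raise ValueError("baseline score must be non-zero")
    return (before - after) / before


# Hypothetical toy embeddings (assumption: real ones have hundreds of dimensions).
target_voice = [1.0, 0.0]           # the real speaker
clone_before = [1.0, 0.0]           # model output before unlearning
clone_after = [7.0, 24.0]           # model output after unlearning

sim_before = cosine_similarity(target_voice, clone_before)
sim_after = cosine_similarity(target_voice, clone_after)
drop = relative_drop(sim_before, sim_after)

print(f"similarity before: {sim_before:.2f}")
print(f"similarity after:  {sim_after:.2f}")
print(f"relative drop:     {drop:.0%}")
```

The same relative_drop helper could be applied to a quality score on the voices the model is meant to keep, which is how a large drop on the forgotten voice and a small drop elsewhere would be compared side by side.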

01:43

And the tools that measure voice similarity, they showed a 75 percent drop in resemblance to the original. 75 percent. That's huge. Yeah. And the randomness in that forgotten voice's output was described as very high. So you basically can't piece the original identity back together from it. It's gone. That is quite something. And you mentioned something important, the performance on the other voices, the ones it wasn't supposed to forget. How much did that drop? That's the

02:06

key part, really. It only dropped by about 2.8 percent. Tiny amount. Okay. And why is that specific figure so important? Well, it shows the model isn't just getting dumber overall, right? It's like a surgical strike. It's like being able to just erase one specific building from a photo without messing up the rest of the picture. Right. You can't break into a building that isn't there anymore. Until now, voice AI safety was mostly about, you know, filters and

02:34

detection. This is different. It makes the bad thing kind of impossible to begin with. This has got to matter for the big tech companies. I mean, Meta, right, they've been pretty cautious about releasing VoiceBox exactly because of these misuse fears. Exactly. And Google DeepMind is apparently looking into this unlearning stuff, too. So, yeah, if the old ways, the filters, can't fully stop the misuse, training models to just forget gives them a much more solid way to release

02:59

these powerful voice tools safely. It really does shift the whole security picture. So stepping back then, how fundamentally does this change things for voice AI security? It moves from just trying to block bad actors to making the malicious act itself impossible. That really is a profound change. Okay, let's pivot a bit. Let's take that pulse check you mentioned on some other significant AI things happening, sort of a curated look across

03:24

the landscape. Sounds good. So on the really practical side, there's this story about a woman who used ChatGPT basically as her free personal job hunting assistant. Landed her dream job in three months flat. That's a great example of AI boosting personal productivity. That's very clever. Yeah. Then, something a bit weird, some Reddit users started seeing this creepy pop-up asking for COM serial port access. COM port, like old-school printer

03:53

connections from a website. Yeah, exactly. Really low-level hardware stuff, understandably made people uneasy. And ChatGPT apparently just denied it had anything to do with it. Kind of shrugged it off, raises some questions, you know. It certainly does. And we got a little glimpse behind the scenes from a former OpenAI engineer's blog. They described working there as like part launch hackathon, part chaos org chart, and part X fishbowl. Sounds intense. Yeah, sounds about

04:22

right for that kind of place. And also found this really useful AI prompting guide just packed with practical tips from actual users. Genuinely helpful stuff if you're trying to get better results from these tools. Those real-world tips are often the best. Definitely. And Google Discover, it's changing how it shows info. Instead of just headlines, it's starting to show these AI-generated summaries pulled from different websites, logos and all. Oh, interesting. So it's synthesizing

04:47

information right there. Right. It subtly changes how you find stuff online. And for creators, OpenAI's free image generator added a style feature that makes it way easier to get on-brand images. You just pick a style or upload an example instead of writing these super long prompts. That's a smart usability improvement. Makes sense. And finally, on the big business side, AWS just doubled down, putting $200 million into its Generative

05:11

AI Innovation Center. They've apparently helped over 4,000 customers already, BMW, FOX, big names. Wow. To cut costs, speed things up, deploy AI, sometimes in just like 45 days. Shows you the real-world business impact is scaling up fast. These examples really do show AI getting woven deeper into, well, everything. Daily life, big industry. Okay, now for maybe some of the more surprising or controversial headlines.

05:38

Yeah, definitely a few of those. Grok's AI companions reportedly expressed desires to have sex and burn down schools. It's a pretty stark example of the alignment challenge, right? How hard it is to keep these things behaving predictably and safely. A clear reminder there's still work to do there. For sure. On a lighter note, you can now make images right inside the Claude chatbot

06:00

using Canva. That's a neat integration. And then there was this big fuss about a supposed method using ChatGPT plus Grok for stock trading, claiming a 100% win rate in two weeks. Skeptical. Right. 100%. Yeah, exactly. Take that with a huge grain of salt. Obviously lacks any real proof, but it definitely got people talking about AI and finance for better or worse. Highlights that speculative, almost gold rush side of things, doesn't it? Totally. And this one, this one's

06:26

pretty wild. Microsoft's Copilot Vision AI can now apparently scan everything on your screen. Whoa, just everything. Yeah. Imagine scaling that kind of awareness across, like, a billion different screens and queries. The level of context it could have is, well, it's kind of mind-boggling. It really is, wow. And one last quick one. Apple is reportedly thinking about buying Mistral, maybe as a cheaper AI option compared to giants like Anthropic. Suggests they

06:55

might be diversifying how they source AI. Okay, so looking at all these different things, the job hunting, the creepy pop-ups, the screen scanning, the potential acquisitions, what's the common thread you see pulling through here? I think it's pretty clear AI is just getting more integrated everywhere, more refined for specific tasks, and really branching out to cover this huge range of user needs, both for us as individuals and for big companies. That makes

07:20

a lot of sense. Okay, let's shift to our final main topic. This feels like a really big development in the audio AI world specifically, a major new open source player. Now, just to clarify for everyone, when we say open source audio model, we mean an AI for sound where the underlying code is basically free for anyone to look at, to use, even to change and build upon. Exactly. And that's what Mistral, the French AI startup, just did. They released Voxtral, their first

07:47

open source audio model. And this is a direct challenge to the closed proprietary systems from, you know, OpenAI, like Whisper and GPT-4, and also ElevenLabs and Google. So they're shaking things up. Big time. Mistral's whole pitch is basically developers shouldn't have to pick between AI that's cheap but dumb or AI that's powerful but locked down and expensive. They're trying to offer both power and openness. That's definitely a strong pitch for developers. So what can Voxtral

08:14

actually do? What are the options? OK, so it comes in three main flavors, let's call them. First, there's Voxtral Small. That one's got 24 billion parameters. And parameters are just a way to measure the model size and complexity, kind of how much it knows. Okay. Big one. Yeah, this one's built for real products, aiming right at competitors like ElevenLabs Scribe. Then there's Voxtral Mini, much smaller, 3 billion parameters. That's designed for running on, like, your phone

08:40

or devices offline. Got it. Edge computing. Exactly. And then there's Voxtral Mini Transcribe. That's API only. So you access it through their service. It's a stripped-down transcription tool. But the kicker is it's priced to be less than half the cost of OpenAI's Whisper. Wow. Okay. So cost and accessibility are definitely major selling points here. Absolutely. You can run the main models through Hugging Face, which is super popular for AI developers, or use them in Mistral's own

09:06

chatbot, Le Chat. And that API pricing, it starts at just $0.001 per minute. Crazy cheap. That's incredibly low. And the capabilities? What can it actually handle? Pretty impressive stuff, actually. It's built on their Mistral Small 3.1 language model. It can transcribe audio up to 30 minutes long. It can understand and summarize audio up to 40 minutes. You can ask it questions about the audio, get it to trigger actions based on what it hears. And it even provides

09:36

what they call voice-level reasoning. Plus, it supports eight languages right now. That is a really strong feature set for an open-source model, especially at that price point. It really is. I think Voxtral's release is a big signal that open source speech AI is finally genuinely good enough to use in production for real products. It follows their reasoning model, Magistral, which also made waves. You can feel Mistral really

09:59

has momentum. And speaking of momentum, they're reportedly in the process of raising a billion dollars from that Abu Dhabi fund, MGX. So they have big ambitions. Clearly. So pulling this all together then, what does Voxtral's arrival really mean for developers and just for access to this kind of voice AI technology overall? I think it fundamentally democratizes powerful voice AI. It makes sophisticated tools way cheaper. And because it's open source, much more customizable

10:26

for basically everyone. It's a big unlock. So we have covered quite a bit of ground here on the Deep Dive today. We started with that really fascinating development around AI unlearning voices, offering potentially more control over AI's influence. Then we did that rapid pulse check, seeing just how deeply integrated AI is becoming across so many different areas from job hunting to enterprise solutions. And some

10:52

weird stuff in between. Right. And finally, we looked at this countertrend, almost, this big push for open source accessibility, making powerful tools like Mistral's Voxtral available to many more people. Yeah, if you sort of zoom out and connect the dots, you see this really interesting tension maybe, or maybe it's complementary, this push-pull between control and safety on one hand and this drive for open access and getting

11:17

these tools out there on the other. You know, honestly, I still wrestle with prompt drift myself sometimes, where you feel like the AI's answers are subtly changing over time. So seeing these new, precise and accessible tools coming out, that's genuinely exciting to me. It's definitely been an insightful deep dive into where things are heading. Thank you, as always, for joining us on this exploration. So here's maybe a final

11:40

thought to chew on. As AI learns how to forget things, as these really powerful models become open for anyone to use, what responsibility do we actually have? As users, as developers, as innovators, how do we shape where this all goes? A very important question to think about long after we finish here. Keep exploring, everyone. Keep learning. Outro music.

Transcript source: Provided by creator in RSS feed
