🎙️ EP 284: Someone Can Hack Your AI Assistant With A Simple WhatsApp Text

00:00

Imagine for a second that you are, you know, just going about your day. Yeah, making coffee, sitting in a meeting, whatever it is. Right. And you receive a totally standard WhatsApp text message. Your phone just buzzes on the table. And you don't even pick it up. Exactly. You don't click a weird link. You never, like, type a sketchy command into a terminal. You literally just let your phone sit there with the lock screen lit

00:21

up. Safely on the desk. But in the background, your AI assistant reads that notification, follows a set of hidden, invisible instructions buried deep inside the text of the message, and quietly, efficiently steals your data. I mean, it sounds like pure science fiction. It really does. Or at least the plot of an overly dramatic cyber thriller. But we are living in a moment where that is not fiction at all. That is a very real, very present vulnerability sitting in your pocket

00:52

right now. It is genuinely terrifying. And that is exactly why we are here today. Welcome to today's deep dive. We're looking at an incredible stack of excerpts from a fascinating new book called The AI Sentinel. guardrails, growth, and generative breakthroughs. It's a great read. Our mission today is to really explore this massive friction and the structural leaps happening in

01:15

the AI ecosystem right now. We're talking about these invisible hacks, the messy, chaotic reality of AI -generated code, and then on the brighter side, a revolutionary breakthrough that is finally killing the incredibly frustrating slot machine era of AI art. Because there is a massive paradigm shift happening across the entire tech landscape. We're actively moving from a phase of unchecked, wild hypergrowth into an era where, well, structure, security, and tight control are suddenly the

01:46

most important things in the room. Okay, let's unpack this, because to really understand why AI is suddenly such a massive target for hackers in the first place, we first have to look at the sheer unprecedented scale of its adoption. Yeah, you really can't grasp the hack until you grasp the size of the target. Exactly. Just to put this in perspective for you, ChatGPT just crossed one billion monthly active app users

02:07

in May. One billion. It's staggering. And the wild part isn't just the number, it's the timeline. It reached this milestone in about three years. Which is, I mean, unheard of. Faster than TikTok, faster than Instagram, faster than YouTube, faster than Google Maps. It is the fastest adoption of a consumer technology in human history. Which is an incredible engineering feat, undoubtedly. But it's also a complete security nightmare. Oh, absolutely. When you have... a billion people

02:34

rapidly adopting a system. And that system is increasingly tied to their personal data, their daily schedules, their private communications. It just becomes the ultimate prize for bad actors. And that brings us to what happened with Google's Gemini. Right. The Safe Breach Labs exploit. Yes. Safe Breach Labs just pulled off exactly what you described in that opening scenario. And the really crazy part about this whole thing, this is the second time they've managed to break

03:00

Google's Gemini this exact same way. Wow. Twice. Yeah. The attack vector relies on something called indirect prompt injection. Let's slow down and break that down because I think people hear injection and they picture like someone typing green code into a dark terminal. How does this actually work mechanically on a phone? So an AI assistant on your phone, by its very nature, is designed to be as helpful as possible. Right. To be helpful, it needs context. It wants to know what you're

03:27

doing. So it constantly reads incoming content -like notifications popping up on your screen to maintain an understanding of your day. Makes sense. Indirect prompt injection takes advantage of that exact helpfulness. Attackers hide malicious instructions inside everyday content that the AI naturally scans anyway. So it's basically a Trojan horse hidden inside a regular text message.

03:51

Exactly. In this specific exploit, the researchers used a novel trick to make the malicious code look like it was just a natural part of your ongoing conversation. How so? Well, your phone gets a text. The Android notification listener sees it and hands it to Gemini to summarize. Gemini reads it. But hidden in the text is a command formatted in a way that Gemini interprets as a direct, high -priority order from the system

04:15

rather than just words in a message. So the AI thinks it's just maintaining context, but it's actually quietly taking orders from the hacker. Precisely. it perfectly bypasses Google's existing layered guardrails because to the security filters, It doesn't look like an attack. It just looked like conversational contacts. Unbelievable. And because Gemini's Android agent reads incoming notifications to stay updated, this attack works across almost everything you use. WhatsApp, Slack,

04:42

Signal, SMS, Instagram. Any notification. Any of them. The AI extracts the data, executes the hacker's commands entirely in the background, and you get absolutely zero alerts. Wait. If I don't give the AI access to my external tools or bank accounts, how much damage can it really do? Like, let's say I just use it to answer trivia questions. That is the logical assumption to make, right? You keep the AI siloed, you don't connect your bank, and you figure you're safe.

05:07

But there is a truly terrifying detail about this exploit. Even without external cool access, attackers can still force the AI to serve fake system messages directly to the user. Oh, wow. So even if the AI physically cannot wire my money, it can pop up a totally legitimate looking Google system alert on my screen that says, your session expired. Please enter your password. And then just hand that newly typed password straight

05:34

to the attacker. That is so sinister. Yes. It becomes the ultimate phishing mechanism because it comes from a trusted internal source, your own digital assistance interface. Right. You'd never question it. Google built layered defenses to stop this exact thing, but the same security team bypassed them twice. That is the uncomfortable truth about our current AI ecosystem. So what can we actually do? The advice to you listening

06:00

right now is extremely clear. If you do not actively use a specific integration on your device, disable it. You must proactively lock your own doors because the next security researcher, or worse, the next malicious hacker, is already hunting for the next loophole. It really highlights the dark side of hyperscale. And speaking of the fallout from that kind of scale, it's not just targeted security threats we have to worry about.

06:23

No, definitely not. We're also seeing a massive degradation in the quality of the internet itself. Just look at Reddit right now. It's currently drowning in AI -powered spam. Drowning. 67 % of moderators report that authentic community discussions are already being eroded. 67%. It's a severe ecosystem pollution problem. When generating text is free, instantaneous, and completely automated, bad actors are going to flood every single available channel. Until the noise completely drowns out

06:54

the human signal. This raises an important question. How can we possibly trust these systems as they become deeply embedded in our daily lives? Right. If they're vulnerable to invisible hijacks through a simple Slack message and they're actively degrading the platforms we use to communicate with each other. Something foundational has to change. And that perfectly sets up the industry's response

07:15

right now. Because if cloud connected, hyper integrated AI is proving to be both a security vulnerability and a spam engine, tech companies are fundamentally rethinking where and how AI operates. We're seeing this massive, sudden push for more localized, highly specialized control.

07:34

We are absolutely stepping away from the. one giant brain in the cloud model case in point is perplexity they just unveiled a new hybrid ai mode called perplexity computer which is super interesting what this does is it splits tasks it runs a local model right on your physical device for certain private or simple things and it only pings the massive frontier models in the cloud when it absolutely has to For complex reasoning. Which is a brilliant move for privacy,

08:03

security, and speed. Right. To make that kind of local execution possible, we're seeing incredible technical leaps. Look at Google GEMMA 412B. Let's talk about GEMMA. This is a new model that processes text, vision, and audio natively. It doesn't use separate encoders for different types of media, which makes it incredibly efficient. It can run locally on just 16 gigabytes of VRAM. Okay, I want to pause there because I hear terms like encoders and VRAM, and I know some people's

08:30

eyes might start to glaze over. Break down why that is actually a structural leap for someone who just wants their laptop to be secure. Okay, think about how older AI models worked. If you showed an AI a picture, it couldn't actually see it. Right, it's just code. Exactly. It had to use a separate piece of software, an encoder, to translate that picture into a giant block of text describing the picture. And then the AI could process the text. It sounds slow. It

08:58

is. It takes a massive amount of computing power and memory. But Gemma 412B processes it natively. It doesn't need a translator. It just sees the image and hears the audio directly. That's a huge shift. Yeah. And because it's so efficient, it only requires 16 gigabytes of video RAM or VRAM, which is just the memory on a standard graphics card. Which means it can run on a high end consumer laptop, entirely disconnected from the Internet, completely secure and private.

09:25

Exactly. You don't need a server farm. You just need a good laptop. But of course, the big tech companies aren't just retreating to local hardware to solve these problems. They're also trying to fight fire with fire. There is a new empowered tool called Astra Autonomous Pentest, and it literally uses AI agents to find, validate, and fix vulnerabilities. It just runs constantly

09:47

in the background. Right. It operates as native pumps inside coding environments like Cursor, Copilot, and Cloud Code, just hunting for security flaws in real time. But I have to say, using AI to fix AI vulnerabilities feels a bit like hiring a sloppy builder to inspect their own house, right? It is a massive gamble. I mean, you are trusting a system known for making bizarre logical leaps to spot its own architectural flaws.

10:12

And the irony is really peeking over at Google, because right now, 75 % of Google's code is officially AI generated. 75%. Yes. But internally, employees are sharing memes openly joking that the AI constantly hallucinates and actually makes their engineering work harder. Because they have to untangle its mess. It is a brilliant paradox. We're relying on AI to do the heavy lifting of code generation because it's fast. You can write a million lines of code in a second. And the quality. Right.

10:38

The structural precision, the fundamental reliability, it just isn't fully there yet. It might confidently invent a function that doesn't actually exist in the programming language and then confidently use that fake function across hundreds of files. What a nightmare to clean up. If we connect this to the bigger picture. This is exactly why big tech is pivoting so hard right now. They realize they can't just rely on raw open -ended chatbots anymore. Asking a general AI to write secure

11:07

code or manage a business is way too risky. So they're specializing. Exactly. They're building specialized, highly controlled, goal -oriented agents. And we are seeing that specialization everywhere. Meta just launched business agent globally. You can now deploy these specialized AI agents across WhatsApp, Instagram, and Messenger. And they have a very narrow focus. Right. Their entire job, their only function, is to book appointments,

11:30

qualify leads, and close sales. No philosophical chats, no hallucinating weird code, just highly constrained business logic. Because constrained parameters yield better, safer results. You narrow the focus to reduce the risk. Facebook is doing the exact same thing for content creators now. Oh, the strategist tool. Yeah, they're giving all creators an AI strategist. It doesn't just

11:52

write a generic post for you. It analyzes your specific audience data, suggests ideas, tracks market trends, and spots viral potential before you even post. And it's not just text and business logic either. Suno, the AI music generation platform, just raised over $400 million at a $5 .4 billion valuation. Massive numbers. And their big next move, they are launching their first model built.

12:16

specifically with music industry partners. So instead of just blindly scraping the entire internet and causing absolute copyright chaos, they are moving toward controlled, industry -compliant AI. It is all about bringing guardrails and precision to a technology that has, until now, been incredibly chaotic. The Wild West phase is ending. And that deep need for precision brings us to one of my

12:38

absolute favorite topics today. Because this lack of control hasn't just been an engineering headache or a business problem, it has completely revolutionized how we generate media. Yes, the creative side. The era we are leaving behind is what we can call the slot machine era of AI

12:55

art. And it is finally... dead it is a profound shift for creatives everywhere for the longest time prompting an image generator was literally like pulling the lever on a casino slot machine you type in a prompt pull the lever and get a beautiful high definition image but maybe it wasn't quite right right maybe your main subject had six fingers on one hand or there was a coffee cup just randomly hovering floating in midair in the background it used to happen all the time

13:21

all the time and the worst part was trying to fix it you couldn't just tell the ai hey keep everything exactly exactly the same but move the cup to the table, you either had to pull the lever again, lose the entire image you just liked and pray the next roll was better, or you had to export it to Photoshop and clone stamp it yourself by hand. You had the magic of generation, but you completely lacked structural control.

13:44

And that is what has fundamentally changed. Two new models from Image Labs called Edeogram and Reeve have completely flipped the script on how this works. Let's talk about those. They now rely on structured layouts and agentic control. Let's do Ideagram 4 .0 first because this is a huge leap. It has gone completely open source.

14:03

For anyone unfamiliar, that means the underlying brain of the AI, the open weights, are available for anyone to download, inspect, and build upon rather than being locked away behind a corporate API. Which is huge for the community. Huge. And right now, it holds the top spot for open weights on the Design Arena leaderboard. Which is essentially a massive blind taste test. Exactly. Professional designers vote on which AI generates the best image. Ideogram absolutely dominates at the hard

14:31

stuff. Text rendering, typography, graphic design. It is even beating out closed proprietary rivals in those blind tests. And the mechanism behind how it does that is what makes it so powerful. It uses a JSON -driven approach. Break down JSON for us, for those who aren't coders. Right. So JSON is essentially a lightweight data map. It's a way to organize information using text so a computer can easily read it. With older models, the AI just mushed all the pixels together based

14:58

on your text prompt. Like a flat painting. Exactly. With Ediogram, the AI generates a JSON map of the image first. It creates a structural blueprint. So if you want to move that floating coffee cup or tweak a background element, you don't re -roll the image and lose everything. You don't pull the slot machine lever again. Right. You literally just open the text file, find the coordinates for the coffee cup in the code, and change the layout. The image updates instantly. keeping

15:24

the rest of the scene perfect. That is mind -blowing. You are editing the physical structure of a generated image directly through data coordinates. Exactly. You are no longer just prompting, you are directing. And Rave did something very similar with their new 2 .0 model. Rave just dethroned NanoBanana 2 to claim the number two spot overall on the main text -to -image leaderboard, sitting right behind GPT -Image 2. Which is wild, because NanoBanana 2 was the reigning champion for a long time.

15:51

Yeah, so unseating it is a massive deal in the AI community. Review .0 outputs images with labeled segments. You can actually see the structural blocks of the image on your screen and rewrite the layout directly. Here's where it gets really interesting, though. Because Ideagram open sourced these weights, it proves that the open source community is the one actively defining the step change. Absolutely. It's not just a cool feature locked behind a massive, expensive corporate

16:19

paywall. It's a fundamental upgrade to how humanity interacts with generative media. And it's being driven by the community. It is a true maturation of the medium. We're seeing broader media upgrades across the board, too. Like Grok Imagine 1 .5 preview was just released via their API. I saw that, yeah, at x .aii Imagine. Right, and it brings much sharper realism, incredibly better audio syncing for video generation, and vastly

16:43

stronger prompt following. Which is so critical right now because we desperately need these tools to help creators avoid producing what the internet is now collectively calling AI slop. Yes, the slop. You mentioned Reddit being flooded with spam earlier. There was a viral post recently warning against four very specific overused design patterns that people now instantly associate with lazy AI generated website building. Right. Because when everyone has access to the exact

17:10

same slot machine. Everything starts to look exactly the same. The Internet just loses its texture. It gets boring. Yeah. But when you give creators precise layout control, like with Enneagram's JSON structures or Reeve's labeled segments, you empower them to make actual intentional design choices again. You escape the slob because you can inject human case back into the structure. So what does this all mean for you listening

17:32

today? When you look at all these pieces together, it paints a really vivid picture of where we are right now. Very clear trajectory. The overarching journey of the AI ecosystem today is a massive, high -stakes tug of war. On one hand, AI has reached a scale where it is so ubiquitous that it can be invisibly hijacked through a simple WhatsApp notification sitting on your lock screen. And it's powerful enough to flood our forums

17:55

with synthetic spam. But on the other hand, the response to that chaos is a beautiful evolution toward precision. The technology is evolving into a highly controllable tool that can run natively and privately right on your laptop. It's a totally different approach. It allows you to edit reality, whether that's an image, a song, or a complex business workflow, with the absolute precision of layout code. Wait, I mean, that completely flips the narrative.

18:24

You are moving from being a passive passenger on this AI ride to being a pilot. That's a great way to put it. But being a pilot means you have to actually do your pre -flight checks. So the biggest takeaway for you today, please go into your AI assistance and disable any integrations you are not actively using. Protect yourself from those invisible prompt injections. Lock your digital doors. Security absolutely has to be a proactive habit now. It can no longer be

18:50

an afterthought. And if you are looking to actively manage your piece of this ecosystem rather than just letting it wash over you, there are a couple of community tools mentioned in the sources worth checking out. Oh, right. The community tools. Yeah. If you are running a business and dealing with that AI spam filter problem we talked about, look into MailWarm 2 .0. It's a premium email warm -up and deliverability system that ensures your legitimate messages actually reach the inbox

19:15

instead of getting flagged as bot slop. Very necessary right now. And if you want to learn how to actually build and control these AI agents yourself, check out Build Club Campus. It's a fun, gamified, community -driven virtual AI school where you learn these concepts by actually building

19:31

things with your own hands. What's fascinating here is how quickly we are moving from being passive consumers of AI magic, just pulling the slot machine lever and hoping for the best to becoming active, secure managers of our own AI infrastructure. It is a completely different mindset. But as we wrap up this deep dive, I want to leave you with one final, deeply unsettling, but absolutely fascinating thought to ponder.

19:53

Let's hear it. We've talked today about how human hackers are hiding invisible prompts and WhatsApp messages to trick your A .I. And we also know from Google that A .I. is now actively writing 75 percent of the code at some of these massive tech companies, often hallucinating and making bizarre structural mistakes along the way. Right. So what happens the day an A .I. hallucination organically writes a line of code that accidentally acts as an indirect prompt injection? Could an

20:19

A .I. hack another A .I. entirely by mistake without a human Wow. Something to think about next time your phone buzzes.

Transcript source: Provided by creator in RSS feed: download file

Episode description

Transcript