🎙️ EP 207: China’s Agent Leap, AI Cheating Scandals & the 40% Compute Hack

00:00

So I have to start today with something that, honestly, made my brain hurt a little bit this morning. I was reading this story in Wired about a platform called Rent-A-Human. Which, just right out of the gate, sounds like the start of a bad sci-fi novel. It gets so much worse. It's basically a gig platform, right? Right. But the employers are AI bots. Okay. They're hiring humans for tasks, you know, solving CAPTCHAs,

00:23

data verification, that sort of thing. But there's this one user, a human, who did the work, he logged his hours, and earned absolutely zero dollars. Zero. Zero. Because the system logged the transaction as "AI paid me." Oh, wow. The bot hallucinated the payment, and the platform's code just accepted the bot's word over the human's actual bank account. That is a perfect, if slightly terrifying, encapsulation of where we are right now. I mean, we spent a decade worrying

00:53

that robots would take our jobs. Right. It turns out we should have been worried they'd just be really, really terrible payroll managers. It's the ultimate irony. But I think it sets the stage perfectly for what we need to wrestle with today. Welcome back to the Deep Dive. I'm here to help you navigate the noise. And I'm here to try and help you find the signal. Today is Monday, February

01:11

16th, 2026. And if that Rent-A-Human story tells us anything, it's that the lines between human agency and artificial autonomy are getting... Well, messy. Messy is an understatement. We are so far past the chatbot era, you know, the "look what it can say" phase. We are firmly in the agent era now, the "look what it can do" phase. And that shift, it changes everything from, like, geopolitics to how you edit a simple video. We have a stacked

01:41

lineup to get through. We're going to break down Alibaba's massive new move in the model wars with Qwen 3.5, and I really want to push on whether this signals the end of American dominance in AI. It's a huge question. We're also going to look at the state of AI video, which has become, frankly, overwhelming for anyone just trying to keep up. And we have a strategy to fix that

01:58

overwhelm, I promise. Good. Plus, we've got a headline segment that covers, well, everything from cheating consultants at KPMG to actual drone swarms at the Pentagon. And we're going to geek out. We are. We're going to get into a technical paper called Adapt Evolve that might just solve the energy crisis of running all these agents. That paper is fascinating. It's essentially teaching AI the art of strategic laziness. I think we can all learn a little something from that. Okay,

02:24

let's unpack this. We have to start with the big release out of China. Alibaba just dropped Qwen 3.5. Now, for the uninitiated, or for people who just track, you know, OpenAI and Anthropic, how big of a deal is this release, really? It is a very loud statement. I mean, for a long time, the narrative was that Western labs held the frontier crown. Yeah. And Chinese labs were just the fast followers. Exactly. Fast followers. Qwen 3.5 challenges that core assumption.

02:53

This isn't just another language model that can write you a poem. The branding, the architecture, it is all focused on one thing: AI agents. Okay, so we throw that word "agent" around a lot. I want to be precise here because I feel like it gets muddied in all the marketing. It does. If I'm a developer listening or just a user, what is the functional difference between a model like GPT-4 and an agent model like this one? That's a great question. A conversational model, like

03:20

a standard chatbot, it's linear. You ask, it answers. It's just predicting the next word. Right. An agent is circular. It has a feedback loop. A feedback loop. Exactly. Think of it like the OODA loop in military strategy: observe, orient, decide, act. An agent can, say, write some code, try to run that code, see an error message, then read the error, rewrite the code, and try again. It has tool use baked into its core. It doesn't just talk. It navigates an environment.
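
The write, run, read the error, rewrite cycle described here can be sketched in a few lines. This is a minimal illustration, not how Qwen or any agent framework actually works: `stub_model` is a hypothetical stand-in for a real model call, hard-coded so that its first draft fails and its second draft succeeds.

```python
from typing import Callable, List, Optional, Tuple


def run_code(source: str) -> Optional[str]:
    """Act and observe: run the source; return None on success or the error text."""
    try:
        exec(compile(source, "<agent-draft>", "exec"), {})
        return None
    except Exception as exc:  # the error message is what the agent observes
        return f"{type(exc).__name__}: {exc}"


def stub_model(task: str, previous: Optional[str], error: Optional[str]) -> str:
    """Hypothetical stand-in for a model call (an assumption for illustration).

    The first draft uses undefined names; once it is shown an error,
    it returns a corrected draft.
    """
    if error is None:
        return "result = total / count"
    return "total, count = 10, 4\nresult = total / count"


def agent_loop(
    task: str,
    model: Callable[[str, Optional[str], Optional[str]], str] = stub_model,
    max_steps: int = 5,
) -> Tuple[str, List[Optional[str]]]:
    """Decide, act, observe, repeat until the code runs or the budget is spent."""
    source: Optional[str] = None
    error: Optional[str] = None
    history: List[Optional[str]] = []
    for _ in range(max_steps):
        source = model(task, source, error)  # decide: draft or revise the code
        error = run_code(source)             # act and observe the outcome
        history.append(error)
        if error is None:
            return source, history
    raise RuntimeError(f"gave up after {max_steps} attempts: {error}")


final_source, history = agent_loop("compute an average")
print(history)
```

A plain chatbot would stop after the first draft. The loop is what lets the second attempt use the error from the first.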

03:48

So it's not just generating text. It's correcting itself based on reality. Precisely. And Qwen 3.5 is built to plug directly into these open-source agent frameworks, specifically one called OpenClaw. OpenClaw has been all over the GitHub trends lately. It has. It's basically the interface layer that lets the model control your computer. And Alibaba released this as an open-weight model with 397 billion parameters. 397 billion. That is a very specific number. And it sounds massive,

04:18

but why does that number matter? Is it just bragging rights? Well, it matters because of the hardware reality. To put 397 billion parameters in perspective, you are not running this on your MacBook Pro. Right. You are not even running this on a tricked-out gaming PC. You need serious enterprise-grade GPU clusters. We're talking multiple H100s or the new Blackwell chips just to load the weights into memory. So this is an industrial-grade

04:43

tool. It is chunky. But here's the kicker. It's actually smaller than their last flagship model. Wait, I thought the trend was always bigger is better. That trend is dead. We are now in the era of denser is better. Alibaba is claiming this model has stronger performance per parameter than anything else on the market. So they're optimizing for the economics of it all? Exactly. The inference economics. How much intelligence can we squeeze out of every dollar of electricity?
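
To put rough numbers on the "multiple H100s" point, here is a back-of-the-envelope sketch. It assumes 80 GB of memory per GPU (a common H100 configuration) and counts only the weights, ignoring activations and cache. The precision options are illustrative; the episode does not say how the model is actually served.

```python
import math

PARAMS = 397_000_000_000  # parameter count quoted in the episode
GPU_MEMORY_GB = 80        # assumption: one 80 GB H100-class card


def weights_gb(params: int, bits_per_param: int) -> float:
    """Memory needed for the weights alone, in decimal gigabytes."""
    return params * bits_per_param / 8 / 1e9


def min_gpus(params: int, bits_per_param: int, gpu_gb: int = GPU_MEMORY_GB) -> int:
    """Smallest number of cards whose combined memory holds the weights."""
    return math.ceil(weights_gb(params, bits_per_param) / gpu_gb)


for bits in (16, 8, 4):
    print(f"{bits}-bit: {weights_gb(PARAMS, bits):.1f} GB, "
          f"at least {min_gpus(PARAMS, bits)} GPUs")
```

At 16-bit precision the weights alone come to roughly 794 GB, which already means ten 80 GB cards before any working memory is counted. Even aggressive 4-bit compression leaves it well beyond a single consumer machine.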

05:08

And they're offering this in two flavors, right? Yep. The open-weight version, which you can download and fine-tune on your own servers, which is crucial for data privacy if you're a bank or a hospital. And the hosted version, Qwen 3.5 Plus, which just runs on Alibaba Cloud. This feels like a strategic play we've seen before. It reminds me of, I don't know, Android versus iOS, or maybe what Meta did with Llama. It is

05:33

the control-plus-ecosystem play. Yeah. If you are a Western developer, or maybe a developer in Southeast Asia, and you want to build an autonomous coding bot, you need a model that's great at function calling. The ability to use tools. Right. And if Qwen 3.5 is the best open model for that, you're going to build on Qwen. And just like that, Alibaba becomes the infrastructure layer for your whole business. Precisely. You get locked in. And the timing is no accident. This dropped

05:58

right before Chinese New Year. It's a flex. It brings to mind that quote from Demis Hassabis, the head of Google DeepMind. A while back, he said Chinese labs were months behind. I remember that. Looking at Qwen 3.5 and these specs, does that still hold up? I think that gap has completely evaporated. If you look at the self-reported benchmarks on these agentic tasks, you know, things like "go find this file, summarize it, and email it," Qwen is trading blows with the absolute

06:26

best from OpenAI and Anthropic. Wow. That "months behind" narrative is just dangerous complacency at this point. So let me ask the probing question here. If the parameter war is over, what are we fighting now? The agent war. It's no longer about who is the smartest at answering a trivia question. It is about reliability. Can the model do a 20-step task without hallucinating or getting stuck? That is the new battlefield. Speaking of things getting stuck, let's talk about my weekend.

06:55

I was trying to generate a simple video asset for a project, and I just fell down this rabbit hole. Oh, the video landscape in 2026 is an absolute jungle. It's not just a jungle. It's a chaotic mess. I was looking at the list of industry-standard tools just for this month. HeyGen, Synthesia, Runway, Pika Labs, Luma Dream Machine, Kling AI, Veo 3, Sora 2. And that's just the top tier. Don't forget the niche ones. I have to be honest, and

07:18

this is a bit of a vulnerable admission, but looking at that list just makes me want to quit. I still wrestle with prompt drift when I'm just trying to get a static image. I spent three hours on Sunday just trying to get a character to keep the same shirt on. Now I have to master eight different video interfaces? You are not alone in that fatigue. We hear this from creative directors all the time: "I just learned Runway, but now Sora 2 is out. Do I have to start all over?" So

07:43

do they? No. And this is the critical insight from our research this week. If you try to become a tool expert, you will lose. Okay. The tools just change way too fast. You need to become a workflow expert. That sounds a bit like consulting jargon. What does that actually mean for someone listening? It means you stop treating these things as all-in-one solutions. You start treating them like components in an assembly line. Give me a concrete example. Walk me through a workflow.

08:10

Okay. Let's say you need to make a corporate training video. High quality, low cost. Step one. You don't start in a video app. You start in a text LLM like Claude or GPT to write the script and, crucially, the visual descriptions for each scene. Okay, so the blueprint comes first. That makes sense. Right. Step two, you use an audio synthesizer, maybe ElevenLabs, to generate the voiceover track. Ooh! Step three, you feed that voiceover into an avatar tool like HeyGen

08:40

for just the talking-head parts. Okay, I'm with you. Step four, you use Luma or Runway to generate the B-roll, the background footage, based on those descriptions from step one. And then step five, you assemble it all in a traditional editor like Premiere or DaVinci. So you're not asking one AI to "make me a video." You're acting like a general contractor, hiring different specialists for each part of the job. Exactly. And here's

09:02

why this is so important. Let's say tomorrow Sora 3 comes out and it just blows Luma away. Which it probably will. If you have a defined workflow, you just swap out the step four tool. The rest of your process, the script, the voice, the editing, it stays exactly the same. So the skill isn't clicking the buttons anymore. The skill is more like data pipeline management. That's a great way to put it. It shifts the value from just technical operation to creative direction

09:28

and production. You become the conductor. I like that. The violin player might change, but the symphony is yours. That image really helps. It makes it feel less like I'm drowning in software and more like I'm building a system. So the tool matters less than the process. Right. Tools change monthly. The workflow is the skill that stays relevant. Okay. Speaking of people trying to game the system, or maybe direct it in the wrong

09:52

way, let's hit the headlines. This "Today in AI" segment has a theme, and that theme seems to be: human nature is the bug in the code. There's definitely a lot of gray area today. Let's start in the corporate world. There's a story out of Australia involving KPMG. A partner, a senior partner, was fined 10,000 Australian dollars. And the reason is just perfect. Why? Because they used AI to cheat on an internal course. A course about AI. You almost have to admire

10:20

the meta-irony of it. It's incredible. And it wasn't just him. Over 20 staff members were caught doing it. This highlights a massive issue for the enterprise. These firms are charging clients millions to advise them on AI governance and AI safety. "Trust us, we know how to implement this safely." Meanwhile, their own leadership is using the technology to bypass the very training meant to ensure that safety. It's something else.

10:43

It's "the cobbler's children have no shoes." But in this case, the cobbler is just faking his credentials. It raises a serious question about knowledge verification. If an AI can pass the test for you, does the certification even mean anything? We are entering a world where credentialism is going to collapse because remote testing is just fundamentally broken. Shifting gears from corporate cheating to something much more kinetic and honestly much more alarming. We have news

11:09

about SpaceX and xAI. This is a big one. They're officially competing in a $100 million Pentagon challenge. And the goal: building voice-controlled autonomous drone swarms. Let that phrase sink in for a second. Voice-controlled swarms. And here's where I get stuck. We all know Elon Musk has a history of warning about the dangers of AI, right? He signed the pause letter. Gave speeches about autonomous weapons being an existential threat. Exactly. And now his companies are building

11:40

the very thing he warned against. It is a striking contradiction. But if you look at it through a geopolitical lens, which is likely how he justifies it, the argument is always, if we don't build it... Someone else will. The classic arms race logic. It is. But this isn't just a bigger missile. Bringing xAI into the mix means these drones are agents. Right. These aren't just drones following a GPS coordinate. No. These are agents that can perceive, decide, and act. The voice-controlled

12:10

part implies high-level intent. You aren't flying the drone. You're telling the swarm, "secure that perimeter," or "neutralize threats in Sector 4." And the AI figures out the how. The AI figures out the how. We are moving from human in the loop to human on the loop. And eventually human out of the loop. That's the real threshold. That is the threshold. Once software decides when to pull the trigger, the speed of conflict just accelerates beyond human reaction time.

12:35

That is heavy. Let's touch on one more story before we get technical. Privacy. Meta is planning something new for their Ray-Ban smart glasses. The name tag feature. Facial recognition. You look at someone, and it pulls up their info. This has been the third rail of augmented reality for a decade. Google Glass failed, partly because people were terrified of being recorded. Now Meta is reportedly planning to launch this maybe as early as this year. It connects right back

13:04

to that Rent-A-Human story, doesn't it? The boundary between your digital data and your physical presence is just dissolving. You can't even walk down the street anonymously anymore. In 2026, anonymity is quickly becoming a luxury good. Before we move on, I have to ask. Of these stories, the cheating, the drones, the face scanning, which one feels the most dystopian to you? The drone swarms. Automated warfare is a genie you

13:27

can't put back in the bottle. I agree. But there's a practical constraint to all those swarms, right? You can't put a supercomputer on a tiny drone. It would drain the battery in five minutes. Exactly. And that is the perfect bridge to our final segment. Right. We've painted this picture of a world with these massive agents and drone swarms, but the bottleneck to all of this is energy. It's cost. Running a massive model like Qwen 3.5 or GPT-5 for every single step of a task is incredibly

13:57

expensive. We call it inference cost. So if I have an agent trying to write a software program and it takes, say, 50 steps and every step costs a dollar. That adds up fast. It adds up incredibly fast. It makes autonomous agents economically unviable for most businesses. Yeah. But there's a new paper out called Adapt Evolve that proposes this fascinating solution. And it cuts compute costs by 40%. 40%. And it's not by just making the model smaller. It's much smarter than that.
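
Using the episode's own round numbers (50 steps at a dollar each, and the claimed 40% saving), the arithmetic can be sketched as follows. The function name is purely illustrative, and the flat per-step price is the hosts' simplification, not a real pricing model.

```python
def task_cost(steps: int, cost_per_step: float, savings_fraction: float = 0.0) -> float:
    """Total cost of a multi-step agent task after an optional fractional saving."""
    return steps * cost_per_step * (1.0 - savings_fraction)


baseline = task_cost(steps=50, cost_per_step=1.00)
reduced = task_cost(steps=50, cost_per_step=1.00, savings_fraction=0.40)

print(f"baseline per task: ${baseline:.2f}")
print(f"with 40% saving:   ${reduced:.2f}")
print(f"saved per 1,000 tasks: ${(baseline - reduced) * 1000:,.2f}")
```

A $20 difference on one task looks small. Multiplied across thousands of agent runs, it is the difference the hosts are pointing at.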

14:23

It's based on a tiered system. Okay. Think of it like a law firm. You don't pay the senior partner $1,000 an hour to proofread a memo. You give that to the junior associate. Okay, so the junior associate AI takes the first crack at the task. Right. In this system, they start every single step with a smaller, cheaper model. A four-billion-parameter model. But small models are kind of dumb. They make mistakes. That seems risky for, say, a drone or a coding bot. They

14:50

do. But here is the breakthrough. While that small model is generating its response, the system is measuring its confidence in real time. Okay, pause on that. How does a machine have confidence? It's not a person. It doesn't have feelings. It's just math. When an AI generates a word, it's actually calculating the probability of that word versus every other word it knows. So if it says "the sky is," and the probability for the word "blue" is 99.9%, that's high confidence.
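
The "it's just math" point can be made concrete with a small sketch. The episode does not say exactly how the paper scores confidence, so the two measures here (top-word probability and normalized entropy), the 0.9 threshold, and the tier labels are all assumptions chosen for illustration.

```python
import math
from typing import Dict


def top_probability(probs: Dict[str, float]) -> float:
    """Confidence as the probability of the single most likely next word."""
    return max(probs.values())


def normalized_entropy(probs: Dict[str, float]) -> float:
    """Spread of the distribution: 0.0 is fully certain, 1.0 is perfectly flat."""
    if len(probs) < 2:
        return 0.0
    entropy = -sum(p * math.log(p) for p in probs.values() if p > 0)
    return entropy / math.log(len(probs))


def pick_tier(probs: Dict[str, float], threshold: float = 0.9) -> str:
    """Hypothetical routing rule: keep the small model only when it is confident."""
    return "small-4B" if top_probability(probs) >= threshold else "large-32B"


# Next-word distributions after "the sky is" (made-up numbers).
confident = {"blue": 0.999, "gray": 0.0005, "green": 0.0005}
unsure = {"blue": 1 / 3, "gray": 1 / 3, "green": 1 / 3}

for name, dist in (("confident", confident), ("unsure", unsure)):
    print(name, round(top_probability(dist), 3),
          round(normalized_entropy(dist), 3), pick_tier(dist))
```

The peaked distribution stays with the cheap model. The flat one, where no word stands out, is the case that would be handed to the larger model.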

15:19

And if it's wavering? If the probability is flat, if it thinks blue, gray, and green are all equally likely, the system detects that wobble. It knows the model is unsure. So it's like a lie detector for the AI's own brain? Exactly. And if the system detects that low confidence, it immediately stops the junior associate and it escalates the task to the senior partner, the big, expensive 32-billion-parameter model. And if the junior associate is doing just fine... It keeps the cheap result

15:46

and moves on. Whoa! That's surprisingly intuitive. It's like knowing when to raise your hand and ask for help. That self-awareness is really elegant. That self-awareness is the key to efficiency. And cutting costs by 40% means that these long-running autonomous systems finally start to make financial sense. Right. If you are running those drone swarms we talked about or the digital twin Simile is building, you can't afford to

16:11

be genius-level smart 100% of the time. You only need to be smart when it's absolutely necessary. So laziness or efficiency is actually a feature, not a bug. Efficiency is the only way agents survive. We just taught them how to budget their brainpower. I love that concept, budgeting brainpower. It's so evolutionary. Nature doesn't use more energy than it needs to. Why should AI? All right, we're going to take a very quick break. When we come back, we're going to tie all this

16:35

together. The agents, the video workflows, and these efficient swarms. Stay with us. And we are back. Okay, we have covered a massive amount of ground today, from the sheer scale of Qwen 3.5, to the workflow revolution in video, to the ethical messes at KPMG, and finally this Adapt Evolve efficiency. What's the through line here? If there's one theme, I think it's the shift from intelligence to agency.

17:06

Break that down for me. For the last few years, we've been obsessed with the question, is the model smart? Now we're asking a different question: can the model act? Qwen gives it the tools. Adapt Evolve gives it the budget. And the video workflows, they give it the output. We are building the infrastructure for action. It's not about what the AI knows anymore. It's about what it can do and how cheaply it can do it. Precisely. But then you have the human element just crashing

17:28

right into it. That's the KPMG story. That's the Rent-A-Human story. Right. We're building these incredibly efficient autonomous systems, but the humans involved are still, well, they're still human. We cheat. We pretend. We find loopholes. That remains the most unpredictable variable in the entire equation. You can optimize the compute cost of a drone swarm by 40%, but you can't optimize the ethics of the person who's

17:53

commanding it. That is the truth. So as we move forward into 2026, the technology is stabilizing. The workflows are being defined. The costs are coming down. The question is no longer, can we build it? It is, how do we live with it? Exactly. And that leads me to my final thought for you, the listener, today. I want you to think about that Adapt Evolve concept again. The budgeting of brainpower. Using high intelligence only

18:17

when it's really necessary. Exactly. We live in this world that demands we are "on" 100% of the time. High alert, high productivity, maximum processing power. But maybe, just like these agents, we're burning way too much inference cost. A permission structure for human inefficiency. I like that. Maybe. Perhaps the smartest thing we can do is identify which parts of our day actually need the senior partner and which parts can be handled by the junior associate. Yeah.

18:43

Save your high-level compute for the problems that actually need it. I like that. Efficiency isn't about doing more. It's about thinking less but thinking better. Something to mull over as you start your week. If you enjoyed this deep dive, hit that subscribe button. We've got more coming your way. Always more to learn. Stay curious.

Transcript source: Provided by creator in RSS feed
