#199 Neil: This 3-Tool System Makes 20+ AI Avatars For Almost Nothing

00:00

So if you're trying to scale video content today, you've probably run headfirst into this conflict. You need these professional, realistic AI avatars, maybe for marketing, maybe explainer videos, but every platform, it seems like every single one, traps you pretty quickly. Yeah, it's the

00:20

expensive subscriptions, right? Yeah, and the credit limits. They get you every time. You feel kind of locked in. Exactly. And you realize, wow, if I want to make, say, 50 videos this month, you're gonna need a second mortgage. Yeah, it's nuts. But you know, the core idea from the sources you sent over, it's actually brilliant. We can, like, completely sidestep that whole system. Right. And the promise here, it isn't just about getting

00:41

better avatars. It's about getting, well, unlimited creative freedom and really cutting down the costs. That's it. So today we're diving into this custom three-tool blueprint. The idea is it turns this expensive, limited process into something faster, like an asset creation machine. Yeah, that's the mission today. We're going to

00:59

unpack this workflow. It really focuses on consistency first, using a tool called Nano Banana, then smart scripting using ChatGPT, and then finally bringing in the heavy hitter for realism. That's HeyGen, but specifically with Google's Veo 3.1 model integrated. All right, let's unpack this then, starting with the problem, the bottleneck, financially and creatively. Why does just using one platform, let's say sticking only with HeyGen, why does that fail if your goal is serious, scaled content?

01:30

Well, the single-tool problem, it's like you're paying the vendor for two really different things at the same time. Okay. You're paying them for the creation part, which is actually pretty costly: getting the avatar just right, the outfit, the look. And then you're also paying them for the final video render. That creation part, it just burns through credits like crazy. And I guess if you need just a small change, like the avatar needs a blue shirt today instead of yesterday's

01:51

gray one. Exactly. You're basically paying for creation all over again. You're stuck with their options, their internal library. Any change means another expensive credit burn. The whole trick is to sort of decouple those two steps. Ah, so the solution is: don't start building the avatar from scratch inside the expensive video app. You pre-make it, pre-customize it somewhere else first. Yeah, use cheaper tools or even free ones to get the look right. Then you just import

02:16

the finished asset. It's a shift in where the cost happens. It's just smart economics, really. And that brings us right to the three tools. Nano Banana is the base, the foundation. It makes that consistent face, but lets you change everything else: clothes, backgrounds, emotions. It locks the identity. Okay, tool one, identity lock. Now tool two is ChatGPT. You call it the strategist. Or like the creative director. It helps you generate the perfect detailed instructions, the prompts

02:44

for Nano Banana. And crucially, it helps you write conversational scripts. Scripts that actually sound human, which is harder than it sounds. And the final piece, the heavy lifting for the video itself, that's left to HeyGen. Yep, HeyGen, but only for the final video render. You feed it that pre-made image from Nano Banana, and Veo 3.1 turns that static picture into this really lifelike video. But you skip the big creation fee inside HeyGen itself. OK, this makes sense.
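
The cost split described above can be put into rough arithmetic. This is a minimal sketch, not HeyGen's actual pricing: the credit values and the function names `single_tool_cost` and `decoupled_cost` are hypothetical, chosen only to show why moving avatar creation out of the video app lowers the bill.

```python
# Hypothetical credit prices, used only to illustrate the arithmetic.
# They are NOT taken from any real pricing page.
CREATION_CREDITS = 10  # assumed cost of creating or changing an avatar in-app
RENDER_CREDITS = 5     # assumed cost of rendering one finished video


def single_tool_cost(videos: int, looks: int) -> int:
    """Every avatar look is created inside the video app, then each video is rendered."""
    return looks * CREATION_CREDITS + videos * RENDER_CREDITS


def decoupled_cost(videos: int) -> int:
    """Looks are made elsewhere at no credit cost, so only rendering is billed."""
    return videos * RENDER_CREDITS


if __name__ == "__main__":
    # 50 videos, each with its own outfit or background.
    print("single tool:", single_tool_cost(videos=50, looks=50))
    print("decoupled:  ", decoupled_cost(videos=50))
```

Under these assumed prices the render cost is identical in both cases; the whole difference comes from the creation term, which is the part the workflow moves to a free tool.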

03:11

The whole system seems to rely on separating that cost. So why is the order of using these three tools so critical? Why does it save money and give more freedom? The order saves money because creation happens before expensive video rendering costs are applied, right? You front-load the free creation work. Got it. So step one then, Nano Banana. You called it magic. What exactly is this thing, and how does it fix that big avatar problem, the face changing all the

03:36

time? So Nano Banana, it's a pretty powerful Google image model. Right now you can access it for free inside Google AI Studio, which is like their testing area. Okay. And its killer feature?

03:45

The thing that makes it special is character consistency. Character consistency. That sounds a bit like jargon maybe, but it's the fix for that image drift, right, where the face subtly changes? Exactly. It means keeping the person's face, their core identity, precisely the same, even when you change the outfit, the background, the lighting, maybe even the expression slightly. This is the big problem with other tools like, say, DALL-E or Midjourney. The face tends to wander

04:11

with each new prompt. You know, I still wrestle with prompt drift myself sometimes, especially when I try to keep things consistent across different scenes, adding more details about the background or whatever. That subtle face change, it's tough to nail down. Yeah, it's a real frustration. And it just shows why starting simple is usually better. You upload a base image, a good clear headshot works best. That acts as the identity lock. Then you just start making variations,

04:37

unlimited variations, really. Can we run through a couple of examples? Like, how different can the variations be using the same face? For sure. Take a simple headshot. Example one, you prompt: change the shirt to a dark red turtleneck sweater, put him in a cozy coffee shop with dim lights. Boom. Same face, totally new context. Takes like 15 seconds. And what if I need that same person, that same face, but for like a professional pitch video? Easy. Same face, new prompt. Change the

05:03

outfit to a sharp gray suit and tie. The background is a modern office skyscraper lobby. Bright window light. Done. You could honestly generate 50 versions for 50 different uses in maybe an hour. The sources also mentioned trying to avoid that kind of generic, too-polished AI look. What are the best practices for prompts to get more realism with Nano Banana? OK, yeah, good point. Be specific with the details. Like, don't just say jacket,

05:25

say black leather jacket. And crucially, you have to proactively add phrases about realism. Things like "natural imperfections" or "subtle skin texture." Oh, OK. And definitely avoid telling the AI to make it flawless or perfect. That's like a direct ticket to the uncanny valley. It looks weird. Right. So if you had to pick just one prompt phrase, what's the most important one for fighting that super smooth, uncanny AI

05:50

look? Adding subtle texture details like film grain combats the uncanny, too-perfect AI look. Film grain. Interesting. Okay, moving to step two, ChatGPT. We're using it for more than just writing the script, right? You said, like, the architect. Yeah, exactly. We use it to build the character's, okay, scaffolding, let's call it. Like, what's their persona? Is this avatar a serious finance analyst or, I don't know, a chill fitness guru? ChatGPT helps define their style, maybe common

06:17

phrases they'd use. It makes them feel more believable than just some random face. And it also helps engineer the prompts for Nano Banana. Massively speeds things up. You can tell ChatGPT, hey, I need five detailed prompt ideas for a professional tennis coach avatar. Give me different settings. And boom, it structures them for you: on the court, in the locker room, doing a practice drill, maybe an interview setup. You're not staring at a blank page trying to think of variations

06:41

every single time. This is also where we tackle that robot voice problem you hear in so many AI videos. Why do they often sound so stiff? Well, usually it's because people write the scripts like they're writing a formal email. Yeah. All proper and buttoned up. Right. We have to specifically tell ChatGPT: use a casual tone, use the first person, make the avatar say "I," and really aim for a conversational flow, like how people actually

07:07

talk. And even if ChatGPT drafts the script, those editing tips you mentioned sound, well, non-negotiable. Oh, absolutely critical. Use simple words, like "use" instead of "utilize." Keep sentences short. Lots of contractions: don't, can't. That's conversational. And the number one test: read the script out loud yourself. If you stumble over the words, or if it sounds formal coming out of your mouth, the AI voice

07:31

is going to sound even more unnatural. So it's not just generating prompts, it's building this reusable character library framework. Beyond the scripting part, how else does ChatGPT really save time in this whole avatar creation process? It dramatically speeds up the process by creating reusable, customizable prompt templates immediately. Got it. Reusable templates, makes sense. Mid-roll sponsor read placeholder. Welcome back to the

07:58

Deep Dive. Okay, so we've got our consistent custom avatar images made with Nano Banana, and we've planned out a natural conversational script using ChatGPT's help. Now for the final step: actually bringing that avatar to life using Veo 3.1 inside HeyGen. Right, this is where it all comes together. We take that nice clean avatar image we made for free, remember, using Nano Banana, and we upload that into HeyGen. We're effectively skipping HeyGen's own more expensive avatar creation

08:25

step entirely. And this integration of Google's Veo 3.1 model into HeyGen, it's being talked about as a real leap forward in realism. What specifically makes Veo 3.1 technically better than the older models? It really comes down to about three key upgrades. First is just superior realism. It renders in sharp 1080p, yeah. But the big thing is it uses a much larger understanding of context to figure out lighting and textures. So you get these hyper-natural shadows, subtle

08:55

skin details. It looks much more lifelike. Okay, better textures and light. What's second? Second is perfect sync. The audio track and the video, the lip movements, they're generated together as one unit. This pretty much eliminates that weird robotic mouth movement or the slight delay you used to see all the time. Yeah, that lip sync issue was always the dead giveaway you were watching an AI video. It totally was. And the

09:16

third thing is branding consistency. Because we're starting with that locked Nano Banana image, Veo 3.1 does a much better job of maintaining that exact look across lots of separate videos, which is huge if you're doing, say, a marketing campaign and need the same spokesperson in ten different ads. That claim, perfect sync, sounds almost too good to be true. Are there edge cases where Veo 3.1 might still have trouble, like really fast talking or strong accents? It's definitely

09:41

excellent, Veo 3.1 is. But yeah, extremely rapid speech or maybe some very distinct localized accents, those can still occasionally challenge the model a little bit. The safest approach is just to keep the scripts at a pretty normal conversational speaking pace. Whoa. But just imagine scaling this whole system. You could potentially produce, I don't know, thousands of localized video ads

10:03

really quickly, like for global campaigns. You could A/B test different emotional deliveries, different regional languages, all while using the exact same spokesperson's face. That consistency, that's like exponential value. It completely changes the ROI calculation for your creative assets, yeah. And because we're only paying HeyGen for that final render step, we're skipping the

10:22

most credit-intensive part of the process. OK, on a practical note, what's the longest continuous video clip Veo 3.1 can currently generate in HeyGen before you have to cut and start a new segment? Veo 3.1 allows for single continuous video segments up to 60 seconds long. 60 seconds. Oh, no. OK. Good to know. Now let's talk troubleshooting. Because when you set up a system like this, multi-tool systems, things inevitably go wrong sometimes. People hit snags. Instead of just listing problems,

10:53

maybe we can group the fixes. First, what about fixing issues with the image quality coming out of Nano Banana? OK, yeah. If Nano Banana gives you a weird face, like distorted features or something odd, the first fix is usually just to simplify your prompt. Don't try to describe the entire universe in one go. Right. Focus on the face first: a young woman, neutral expression, soft lighting. Get that right. Then start adding

11:14

complexity like outfits or backgrounds. And what if you get a great image from Nano Banana, you're happy with it, but then HeyGen rejects it when you try to upload? That's almost always resolution. HeyGen needs a minimum image size, typically 1024 by 1024 pixels. So the fix is simple. Before you upload to HeyGen, run your Nano Banana image through a free online upscaler tool. OK, good tip. It just blows the image up to the right size. Don't skip that prep step. It saves headaches.
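
The resolution check before upload can be sketched in a few lines. This is a rough illustration under stated assumptions: `upscale_factor` and `upscaled_size` are hypothetical helper names, and the default minimum of 1024 pixels on the shorter side is an assumption, so confirm the platform's current upload requirement before relying on it. The sketch only computes the whole-number factor to ask an upscaler for; it does not resize anything itself.

```python
import math


def upscale_factor(width: int, height: int, min_side: int = 1024) -> int:
    """Smallest whole-number factor that brings the shorter side up to min_side."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    shortest = min(width, height)
    if shortest >= min_side:
        return 1  # already large enough, no upscaling needed
    return math.ceil(min_side / shortest)


def upscaled_size(width: int, height: int, min_side: int = 1024) -> tuple[int, int]:
    """Dimensions after applying the factor, keeping the aspect ratio."""
    factor = upscale_factor(width, height, min_side)
    return width * factor, height * factor


if __name__ == "__main__":
    # A 512x768 export needs a 2x upscale to clear the assumed minimum.
    print(upscale_factor(512, 768), upscaled_size(512, 768))
```

Whole-number factors are used because many upscalers offer fixed 2x or 4x steps; if a tool accepts arbitrary target sizes, the rounding step is unnecessary.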

11:40

All right, now for some of the more advanced stuff, the pro tips from the source material, let's talk about building an avatar team. What's the idea there? Yeah, this is cool for maybe an explainer video series where you want multiple hosts that look like they belong together. The

11:54

key is what the source called prompt parallelism. Prompt parallelism? Yeah, basically you create the different characters, but you make sure the prompts you use are structurally very similar. Like, use the same lighting style description, the same overall camera angle phrasing. You only change small details: hair color, shirt color, maybe slight facial structure tweaks. They end up looking like they're from the same show or universe. Ah, maintaining stylistic consistency through the

12:21

prompts. Makes sense. You also mentioned combining image elements, like taking a hairstyle from one photo and an outfit from another. Yeah, that's leveraging Nano Banana's ability to use reference images. You can point it to one picture and say, use this hairstyle, point to another and say, use this jacket, all while keeping your main avatar's face locked from the original upload. It's pretty powerful for customization. And the last practical fix, dealing with small errors.

12:49

Like if the AI generates weird fingers or strange eyes, but the rest of the image is perfect. Yeah, that's where inpainting comes in. Inpainting basically means you tell the AI, hey, regenerate only this tiny spot right here, like the weird finger or a shadow that looks off, without messing up the rest of the picture. Tools like Canva or the Photoshop Beta have features for this. It saves you re-rendering the whole thing just

13:11

to fix one little glitch. OK, so when you're building that avatar team and aiming for a cohesive look, what's the single most important element to keep similar across the prompts for the different characters? Keeping the prompts similar ensures a matching look across the entire cast of avatars. Right, the stylistic elements in the prompt. So let's zoom out. What does this whole three-tool blueprint ultimately give someone, a content creator, a marketer? What's the big "so what"?
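
The prompt parallelism idea for an avatar team can be sketched as one shared template with only the small details swapped per character. This is a minimal sketch: `SHARED_STYLE`, `build_prompt`, and the example cast are all hypothetical, and the wording is illustrative rather than a tested prompt for any particular image model.

```python
# Shared stylistic elements: identical lighting and camera phrasing for every
# character, plus the realism phrases mentioned earlier in the episode.
SHARED_STYLE = (
    "soft window lighting, eye-level medium shot, "
    "natural skin texture, subtle film grain"
)


def build_prompt(role: str, hair: str, shirt: str, style: str = SHARED_STYLE) -> str:
    """Fill the same sentence structure for each character; only details vary."""
    return f"A {role} with {hair} hair, wearing a {shirt} shirt, {style}"


# Hypothetical cast for an explainer series with multiple hosts.
CAST = [
    {"role": "friendly science host", "hair": "short black", "shirt": "navy"},
    {"role": "friendly science host", "hair": "curly red", "shirt": "olive"},
]

PROMPTS = [build_prompt(**member) for member in CAST]

if __name__ == "__main__":
    for prompt in PROMPTS:
        print(prompt)
```

Keeping the style string in one place means a change to the lighting or camera phrasing applies to the whole cast at once, which is the consistency the hosts are describing.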

13:37

Well, fundamentally, it delivers efficiency, quality, and adaptability. You're getting genuine cost savings. You're getting effectively unlimited creative variations on your avatars. And you're getting really high visual quality, thanks to that Veo 3.1 integration. Practically, it means you can easily A/B test different avatar styles, maybe male versus female, casual versus formal. You can do that across different campaigns, different

14:00

regions, different languages, even. So it shifts the avatar from just being a visual in one video to being more like a strategic asset you can deploy in many ways. For creators, maybe they can build a consistent brand with multiple hosts without needing actors or complex licensing. Exactly. It's like building an asset library powerhouse. It really transforms your whole approach. You go from slow, expensive, and limited to being rapid, strategic, and pretty expansive in what

14:25

you can create. You know, the real takeaway for me here isn't just the cost savings, though that's huge. It's the speed, the potential speed of asset creation. The sources suggest once you get comfortable with this flow, you could generate maybe 20 or more unique, ready-to-go avatar images in less than half an hour. Yeah, it totally shifts your focus. You stop worrying about budgeting credits minute by minute, and you start thinking about building this digital factory of potential

14:52

spokespeople. Just imagine having, say, 50 unique, consistent characters ready to deploy, tailored for any situation. You need a serious lawyer avatar? Got it. An enthusiastic team for a different campaign? Got it. Fitness guru, corporate CEO, all sitting in your library ready to be turned into high-quality video for fractions of what it used to cost. You could actually start building that library today. The tools are out there. They're mostly accessible. The method seems pretty

15:20

well proven according to the source. Time to start creating. Outro music.

Transcript source: Provided by creator in RSS feed
