#99 Neil: Learn to Engineer Viral AI Videos With This VEO 3 Method

00:00

Have you ever been scrolling, you know, maybe late at night, and you just stop? You see this AI video, and it's kind of astonishing. Maybe it's glass breaking with these impossible sounds, or, like, an epic cinematic shot. Millions of views, and you just think, how? Like, what's the secret there? If you've felt that sort of invisible wall between your ideas and getting that high-quality viral AI stuff out there, well, you're not alone. Today that wall starts coming down. Welcome to

00:26

the deep dive. Our mission today is to give you a detailed map, really a clear compass, for mastering AI video generation. We're going to dig into Google VEO 3. It's a breakthrough tool, and it's really reshaping how we think about creating video. Yeah, absolutely. And we're jumping right in. We'll unpack what VEO 3 actually is and how it works, fundamentally. And we'll shift to thinking like an AI director

00:48

when you prompt. Plus, we're going to lay out some proven viral video formats you can use, like, right away. We'll share advanced techniques, talk post-production polish, and touch on the ethics and the current limits. It's the whole journey, basically. Idea to viral potential. It really is about moving past just that, wow,

01:04

how did they do that? moment to actually understanding the craft behind these things. Exactly. And to really get VEO's power, you've got to understand it's way more than just text-to-video. VEO is this advanced AI. It has, like, deep contextual understanding. It doesn't just read words. It tries to figure out your intent, the emotion, the whole aesthetic you're going for. What makes VEO 3 feel like a revolution, from what we're seeing, is a couple of key things. First, that native synchronized audio,

01:32

that's a game changer, seriously. With other tools, you're kind of scrambling for sound afterwards. VEO can generate the whole soundscape right from the prompt: dialogue with pretty decent lip sync, sound effects like footsteps and glass shattering, ambient noise like wind and traffic, all synced up. It's a huge leap forward. And it's not just sound, is it? It almost feels like it understands physics and film language. It does. It's got this intuitive physics simulation, plus what we're calling cinematic

01:57

literacy. VEO gets how the world works. Water flows, fabric billows, things have weight. But what's maybe more fascinating, it speaks filmmaker. You can ask for a time lapse, an aerial shot, a dolly zoom, and it actually translates those into proper camera moves. It's not just making video, it's kind of directing it. So that really demands a whole different way of thinking, then. This isn't just typing a few words and, you know, crossing your fingers. You have to come at it

02:23

with this AI director mindset. Every prompt isn't just a request. It's a clear directive to a virtual crew. You're the director, screenwriter, sound designer, all rolled into one. It's like VEO is just waiting for you to command it properly. So that deep contextual understanding, what does that actually look like when we're trying to create? Does it really grasp what we want beyond just the keywords? It interprets the intent, the emotion, and the aesthetic you want to convey.

02:50

That deeper understanding, yeah, that makes the whole prompting philosophy just incredibly important. Oh, absolutely. In generative AI, the prompt is, well, it's everything. A good prompt is like the bridge from your imagination to what the AI can actually make. Think of it like a really detailed blueprint, a precise directive for that virtual crew we talked about. It's got to be clear, layered, no room for guesswork. And one really effective way to build those prompts is what we're

03:13

calling the layering technique. It's kind of like an artist sketching first, then adding the detail and color. So layer one, the foundation: that's your shot type, your aesthetic. The framework, perspective, style, the mood. So maybe you start with: cinematic wide shot, epic fantasy aesthetic. Then layer two, the setting. Build the world. Where's it happening? More detail, more believable video. So, adding on: in a sprawling ancient forest at twilight, mist clings to the moss-covered

03:43

trees. Okay, then layer three: subject in action. Put your character or object in there. Give them something to do, a purpose. Continuing that example: a lone elven warrior with glowing silver armor kneels by a stream, dipping her hands into the water. Right. And finally, layer four: the fine details. These are the final touches. They give it soul. Lighting, textures, and importantly, the specific sounds. Finishing it off: the water

04:07

ripples around her fingers. The only sounds are the gentle gurgle of the stream and the chirping of unseen insects. Build it like that, layer by layer, and you've got a really comprehensive directive for VEO. And it's not just the structure, right? It's the words themselves, the power of words. Your word choice makes a huge difference. The AI picks up on nuance, adjectives, verbs surprisingly

04:31

well. Think about, like, "a man walks down the street." Okay, vague. Versus "a weary old man shuffles down a rain-slick, neon-lit cobblestone alley." That second one, it tells a story, gives you a mood, a setting, a character. Use strong verbs, sensory adjectives. Yeah, definitely. And it really helps to practice thinking in frames. Before you even type, like, seriously, close your eyes, visualize the shot, ask those director questions. Where's the camera, moving, static,

04:54

what lens, light source. Once you have that clear image, translating it into detailed language for the AI is much easier, much more effective. That's the real difference, isn't it? Between just getting something random back and intentionally creating an artistic vision. So to really nail that vision, why is visualizing the shot first so vital? It helps translate your clear image into effective, detailed language for the AI. OK, now this is

05:19

the core of today's deep dive. We're going to break down some video formats that, well, they've proven to have serious viral potential online. For each one, we'll look at the what, the how, and crucially, the why: the psychology, and how VEO can nail them. All right, let's jump in. First one: unrealistic object ASMR videos. They seem to be absolutely everywhere. They do. And they hit two things our brains just love. ASMR, that tingly feeling from certain sounds, and our fascination

05:43

with the absurd. See, our brains know physics. So when VEO shows, like, a croissant, but made of obsidian, and then shatters it, it's this weird, safe violation of reality. You can't look away. VEO's physics engine lets us play that trick. For the prompt: camera angle, object material, the main destructive action, how it interacts with the environment, and that ASMR soundscape. Think of an obsidian croissant crumbling with sharp cracks, or a ruby pomegranate spilling glowing

06:09

gems. Pitfall, though: keep the action simple in an eight-second shot. One decisive thing works best. Too much gets messy. Okay, next: selfie-style vlogger videos. There's something really engaging about that format. Totally. Humans connect, right? This format taps into that parasocial thing. Makes it feel authentic, like a direct chat. And the humor comes from the contrast. Put an amazing character in a mundane spot. Imagine an astronaut in a supermarket, totally baffled

06:33

by avocados. Or a Viking king versus a vending machine, you know, yelling for his crisps. VEO's good at those subtle character reactions. Prompt-wise: declare selfie style, describe the character in detail, add self-filmed touches, a modern setting, action and dialogue showing conflict, and ambient audio. Just keep dialogue short, like under 10 to 15 words. Long speeches tend to confuse the AI. And then satirical street interviews. What's the magic there? Subverting expectations.

07:00

We know the news interview format. So when the content is suddenly absurd? Hilarious. It lets you do satire or just pure nonsense. Viewers like feeling they're in on the joke. VEO can create those realistic backdrops and expressions. Think the invisible pet show, people grooming thin air, or a 1920s interview about the shocking new trend of sliced bread. Your prompt needs the format, shot setup, absurd backstory, interviewee details, and a short Q&A script.
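
The five ingredients just listed can be laid out as a small template. This is only a sketch in Python: the `InterviewPrompt` class, its field names, the sample wording, and the 15-word cap (borrowed from the earlier "under 10 to 15 words" dialogue tip for vlogger videos) are all illustrative assumptions, not anything VEO itself defines.

```python
from dataclasses import dataclass

# Assumption: upper end of the "under 10 to 15 words" dialogue guideline.
MAX_DIALOGUE_WORDS = 15


@dataclass
class InterviewPrompt:
    """Hypothetical container for the five ingredients of an interview prompt."""
    format_line: str   # the format being parodied
    shot_setup: str    # camera and framing
    backstory: str     # the absurd premise
    interviewee: str   # who is answering
    question: str      # short Q&A script, part one
    answer: str        # short Q&A script, part two

    def too_long(self) -> list[str]:
        """Return any dialogue lines that exceed the word cap."""
        return [line for line in (self.question, self.answer)
                if len(line.split()) > MAX_DIALOGUE_WORDS]

    def render(self) -> str:
        """Join the ingredients, in order, into one prompt string."""
        return " ".join([
            self.format_line,
            self.shot_setup,
            self.backstory,
            self.interviewee,
            f'Reporter asks: "{self.question}"',
            f'Interviewee replies: "{self.answer}"',
        ])


# Sample wording is made up for illustration, based on the sliced bread idea.
prompt = InterviewPrompt(
    format_line="Satirical street interview, 1920s newsreel style.",
    shot_setup="Medium two-shot on a busy city sidewalk, static camera.",
    backstory="The town is scandalized by a shocking new trend: sliced bread.",
    interviewee="An indignant baker in a flour-dusted apron.",
    question="What do you make of this shocking new trend, sliced bread?",
    answer="Pre-cut loaves? Next they'll want someone to chew it for us!",
)

text = prompt.render()
print(text)
print("Lines over the word cap:", prompt.too_long())
```

Keeping the question and answer as separate fields makes it easy to spot a line that has grown into a speech before any credits are spent.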

07:26

Key thing. Keep the premise simple. The humor's in the basic absurdity. So what about other formats that use similar ideas, like incongruity or just strong visuals? I'm thinking news reports with a twist or even those really cinematic videos. Well, yeah, absolutely. They tap into similar core psychology. Take the news report with a twist. That's all about comedic timing and incongruity. Our brain focuses on the person talking. Right?

07:48

So when something crazy happens in the background, unnoticed by them, it shatters expectations. Serious foreground, chaotic background, classic visual comedy. Like a construction safety expert talking confidently while someone falls perfectly into wet cement behind him. Or an environmental activist giving a speech as the wind blows all his fliers away in a clean park. VEO handles those complex scenes pretty well. Then, totally different vibe: cinematic-style videos. That's

08:15

about emotion, beauty, grandeur, visual storytelling. We don't just see it, we feel it. The high production value VEO gives feels professional. Picture a sci-fi scene, Blade Runner vibe, maybe an android with one tear as a slow crane shot reveals a future city. Or the lone samurai, Kurosawa style, drawing his sword on a cliff. VEO gets those complex camera moves and lighting. We also see huge engagement

08:37

with stuff like the animal Olympics. That's pure anthropomorphism, giving animals human traits, especially in a serious context like the Olympics. Instant comedy. We laugh at a grizzly bear weightlifting or Siamese cat gymnastics, but we kind of root for them too. And for controlled chaos fans, satisfying object destruction. Our brains like patterns, but also get satisfaction from controlled destruction. Slow motion, which VEO does okay,

09:01

shows off the physics details. Watching something go from order to chaos predictably, it's almost meditative. Think ink blooming in milk, creating those cool fractal patterns. Or a time lapse of a melting ice car. And finally, maybe a personal favorite: historical figures in modern situations. Classic fish out of water. Anachronism. Humor from the time clash. Works best when their reaction fits their personality or achievements. Think Nikola Tesla at the Apple Store, annoyed by cat

09:27

pics instead of wireless power. Or Marie Curie tries an energy drink and discovers some new energy. OK, so if you pull all those different formats together, what's the common thread? What's the core idea uniting their success? They all tap into universal human psychology for viral appeal and engagement. All right, we talked about making great-looking videos, understanding the

09:47

psychology. Now let's talk about control. Controlling it consistently, efficiently, moving beyond just "looks good" to making it predictable, repeatable. Yeah, this is super important if you're doing a series or telling a story over multiple shots. If your character suddenly looks different, it just breaks everything. The fix is what we call the character sheet. It's not just a description. It's a standardized block of text, your character's

10:10

DNA. Needs extreme detail. Age, build, hair, eyes, skin. Signature outfit: get specific on fabric, color, style. And defining features: scar, tattoo, mole, maybe a specific walk, an accessory, whatever makes them unique. I still wrestle with prompt drift myself when I try to maintain character consistency across scenes. It's tougher than it looks. It's amazing how just changing "black leather jacket" to "black jacket made of leather" can mess things up. So this character

10:35

sheet, it sounds essential, but how do we make it really work every time? Precision. Total precision. You write the sheet once, maybe in a text editor. Then, for every single prompt with that character, you copy and paste the exact block of text. 100% accurate. Beginning of the prompt. Even a comma changes things sometimes. Beyond characters, there's mastering the camera. For really impressive shots, speak cinema language. Think dolly:

11:01

camera moves closer or further. Crane shot: smooth vertical moves. Tracking shot: moves alongside the subject. Rack focus: shifts focus within the shot. Dutch angle: tilted camera for unease. The key is describing the camera's journey, not just one word. Don't just say "dolly in." Try: "Camera starts close on hands assembling a device, tilts up smoothly to their face, then dollies out to show them alone in a vast workshop." More nuanced. And sound. Sound's huge. VEO lets you

11:26

direct that, too. It's important to know the difference between diegetic and non-diegetic, right? Exactly. Diegetic is what the characters hear: dialogue, footsteps, rain, part of their world. Non-diegetic is just for the audience: musical score, narrator. It enhances emotion. Okay, finally, credit-saving strategies. AI credits aren't cheap. Got to work smart. Use fast mode, if VEO has it, for testing ideas, drafts, quick lower-res versions. Good for iterating before the big

11:51

render. And here's a big pro tip. Before you burn credits on an eight-second video, take your core visual description and pop it into an image generator, like the one in Gemini. It's cheap, and it lets you quickly check aesthetic, composition, and character design without spending video credits. And always, always finalize prompts in an external text editor, Google Docs, whatever. Avoid accidental generations from typos. Great tips there. Which one would you say offers the biggest efficiency

12:17

gain, saves the most time and resources? Testing core visual descriptions with an image generator before costly video generation. So once you've got that VEO output, remember, it's raw material. Post-production is where you polish it, turn it into something professional, ready to go. So the workflow is kind of like, VEO generates the clip, then an upscaler, then an editor, then publish? Pretty much. VEO usually outputs 720p, maybe 1080p. For top quality, you want to upscale

12:43

to 4K. AI upscalers like Topaz Video AI are great. They don't just stretch pixels. They intelligently paint in details, denoise, sharpen, and it looks natural. Then, editing and color grading. DaVinci Resolve is amazing: powerful free version, great for color. CapCut's good too, beginner-friendly, on mobile and desktop. Color grading is vital. Adjust colors for mood: cool blues for horror, warm oranges for sunset. And even with VEO's audio, adding extra layers in post makes it more immersive.
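
To put rough numbers on that upscaling step, here is a small arithmetic sketch in Python. The pixel dimensions are the standard ones for 720p, 1080p, and 4K UHD; the function name `upscale_cost` is made up for illustration, and "new" here just means pixels in the 4K frame beyond what the source clip contained.

```python
# Standard pixel dimensions (width, height) for the resolutions mentioned.
RESOLUTIONS = {
    "720p": (1280, 720),
    "1080p": (1920, 1080),
    "4K": (3840, 2160),
}


def upscale_cost(source: str, target: str = "4K") -> tuple[float, float]:
    """Return (per-side scale factor, fraction of target pixels that are new)."""
    source_w, source_h = RESOLUTIONS[source]
    target_w, target_h = RESOLUTIONS[target]
    side_factor = target_w / source_w
    new_fraction = 1 - (source_w * source_h) / (target_w * target_h)
    return side_factor, new_fraction


for name in ("720p", "1080p"):
    factor, new = upscale_cost(name)
    print(f"{name} -> 4K: {factor:.0f}x per side, {new:.0%} of the frame is new")
```

Going from 720p to 4K means tripling each side, so about eight of every nine pixels in the final frame have to be inferred by the upscaler, which is why a tool that adds plausible detail matters more than one that only stretches.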

13:10

Background music, extra sound effects, a whoosh for movement, stuff like that. And then reformatting. That's a big one for social media. Video is horizontal, 16:9. TikTok, Reels need vertical, 9:16. Cropping just butchers it, usually. Yeah, cropping loses too much. The solution: AI video outpainting tools. Runway, Luma Labs have these. They use AI to paint the missing space above and below your horizontal video. That creates a seamless vertical frame and keeps your original content intact. It's

13:38

like magic for social media. That is a powerful trick. So thinking about post-production for social, what's the biggest mistake people tend to make? They often just crop horizontal videos, losing most of the actual frame. Okay, so with all this incredible creative power, there's also responsibility, especially now in the AI era. Oh, definitely. We have to talk about misinformation, deepfakes. It's a real risk. This tech can be misused: fake news, deceptive videos, harmful

14:05

stuff. As creators, we kind of have to commit to using it for art, entertainment, positive storytelling, never to deceive or harm. And transparency is key. Always tell your audience it's AI generated. Hashtags like hashtag AI generated, hashtag made with VEO, or just a little text overlay. It builds trust. Helps keep the whole AI creative space healthy. And VEO 3, even as powerful as it is, it still has limits, right? It's not perfect yet. No tech is. VEO still struggles with hands

14:31

sometimes. Extra fingers, missing fingers, it happens. Physics simulation is good, but complex, illogical stuff can still look weird. And crucially, text in video. It just can't do legible text. It comes out garbled. Workarounds: for hands, use gloves, make fists, hide them, frame them out. For physics, keep actions simpler, plausible. For text, always add it in post using your editor. Don't even ask VEO to generate it. So connecting

14:56

this back to the bigger picture, what would you say is the most significant ethical concern with this tech right now? The potential for misinformation and those deceptive deepfake videos. We've covered a lot of ground in this deep dive, from really understanding VEO's power and the prompting philosophy to viral formats, advanced techniques, and putting together a whole strategy. Yeah. And remember, the tech is just a tool. The real power is your storytelling ability, the detail in your directives.

15:22

A well-crafted prompt makes art, not just random outputs. And things are moving so fast. Whoa! I mean, imagine the future. Perfect character consistency across dozens of scenes, real-time video generation. The creator's role is really shifting, isn't it? More like a true AI director, a narrative designer telling the AI exactly what complex idea to build. You've got the map now. You've got the toolkit. It's really your turn to explore, push the limits, become a creator

15:48

shaping what's next. Pick a format we talked about. Make it yours. Make that first video. Don't worry about failing. Every generation teaches you something. The future of storytelling is here, and it's literally in your hands. Now go create.
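
As a starting point for that first video, the layering technique and the character sheet from earlier in the episode can be combined in a few lines of Python. This is a minimal sketch under stated assumptions: the `build_prompt` helper, the sample character sheet, and the second scene are hypothetical, and nothing here calls VEO itself. It only assembles the text you would paste in.

```python
# Hypothetical character sheet: write it once, then reuse it verbatim.
CHARACTER_SHEET = (
    "Elven warrior, mid-20s, tall and lean, long silver hair, pale green eyes, "
    "glowing silver plate armor with leaf engravings, thin scar across left cheek."
)


def build_prompt(foundation: str, setting: str, action: str, details: str,
                 character_sheet: str = "") -> str:
    """Assemble a prompt: character sheet first, then layers one to four."""
    parts = [character_sheet, foundation, setting, action, details]
    return " ".join(part.strip() for part in parts if part.strip())


# Layers taken from the example in the episode.
scene_one = build_prompt(
    foundation="Cinematic wide shot, epic fantasy aesthetic.",
    setting=("In a sprawling ancient forest at twilight, "
             "mist clings to the moss-covered trees."),
    action="She kneels by a stream, dipping her hands into the water.",
    details=("The water ripples around her fingers. The only sounds are the "
             "gentle gurgle of the stream and the chirping of unseen insects."),
    character_sheet=CHARACTER_SHEET,
)

# A made-up second scene that reuses the identical character block.
scene_two = build_prompt(
    foundation="Slow tracking shot, epic fantasy aesthetic.",
    setting="On a narrow stone bridge above the same forest at dawn.",
    action="She walks toward the camera, one hand resting on her sword hilt.",
    details="Soft golden backlight. Only her footsteps and distant birdsong.",
    character_sheet=CHARACTER_SHEET,
)

print(scene_one)
print(scene_two)
```

Because the sheet lives in one constant, every scene starts with exactly the same block of text, which is the copy-and-paste discipline described for avoiding prompt drift.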

Transcript source: Provided by creator in RSS feed
