#266 Max: The Only 7 Prompts You Need to Create Any AI Video (Stop Wasting Time)

00:00

So the frustration is real. I think a lot of people feel this. You find yourself writing these, I don't know, 500-word novels for an AI video tool. And what you get back is still blurry or inconsistent. It just looks like amateur garbage. You're spending hours on it. You're trying to describe the lighting, the mood, the exact movement. And the model just seems to ignore the most important parts. It's exhausting. Stop wasting that time. I think we've figured this out. The problem isn't

00:28

that you need more words. It's that you need a stronger foundational framework. The real fix here is mastering seven, just seven simple prompt styles. Welcome to the deep dive. And yeah, that novel you're trying to feed the AI, it's not designed for that. It's looking for structural cues, not, you know, beautiful prose. We've basically boiled down the key workflows from all the top AI filmmakers. And the main insight is this. The best results, they come from frameworks that

00:53

give you directorial control. And these aren't unique to one tool. These seven styles work across, well, everything. Veo, Kling, Runway, Sora 2, Pika. It's about moving from just describing a scene to actually directing it. So our mission today is to really distill this craft, to give you the architecture to stop guessing and start creating high-fidelity, consistent work. Let's

01:16

just jump in. Framework one is all about getting that directorial control over the space in your scene. We call this one prompt style number one, cinematic prompts. This is where you use the language of film to tell the AI what to do. It's not just what's in the scene, but how the virtual camera is seeing it. A tiny tweak to the camera movement can completely change the whole psychological impact of the clip. Yeah, think about it like passive versus active observation. If your prompt is just

01:44

an artist in a studio, the model picks the perspective. It's almost always flat and boring. Right. But if you start using intentional camera language, you change the whole experience. Yeah. Let's contrast a static shot, where the camera's just locked down, which gives you a feeling of stillness, maybe contemplation, with a rotating zoom in. Exactly. That rotating zoom instantly tells you something's changing, right? It builds suspense or maybe a really intimate emotional

02:09

connection. The model knows what these moves mean. And for dynamic movement, you have to be really specific. A tracking shot that runs alongside someone, that creates a sense of flow, of progression. But if you swap that for a handheld drift. Oh, that little wobble. Yeah, that slight instability. It injects this human tension, this realism, that a perfect, smooth tracking shot just doesn't have. And we've seen people get really good with

02:35

the vertical element, too. A vertical tilt up, it can suggest something huge, ambitious, maybe even overwhelming. A tilt down, on the other hand, it immediately suggests fatigue or someone searching for something. It's psychological control with just two words. And the pro move, I think, is combining these. You can actually string them together. You can prompt: the camera starts with a slow tracking shot from behind, then it smoothly tilts up to their face, and then it circles around

02:59

to reveal the city below. That's a whole dynamic sequence built into one cohesive prompt. So how does controlling the camera angle fundamentally change the meaning? It just changes how the audience experiences the scene completely. So we've got control over space. Now we need to talk about control over time. And that brings us to style number two, timestamp prompts. This is so important.

03:24

I've tried to do this without timestamps. If you try to script a sequence, like a character walks in, finds a key, then reacts to a sound, and you put that in one paragraph, it's a mess. The AI just jumbles the timing. It might show the reaction before the key is even found. Or it merges everything into this one chaotic simultaneous moment. So timestamp prompts, they basically

03:46

turn you into an editor. You're dictating the timeline second by second, and you segment your video into these exact blocks, and you force the model to follow a precise sequence. It's how you choreograph something complex. Let's use an example, like an eight-second video of a barista. Right. You wouldn't write one block of text. You'd break it down. Zero to three seconds. Slow pan across the empty coffee shop, stopping exactly on the barista. Then you force the shift.

04:08

Three to five seconds. Camera pushes in close as the barista pours steaming milk. Very focused. And the final beat. Five to eight seconds. Camera tilts up to the barista's face as they pause, exhale, and look out a rain-streaked window. Does this method prevent that chaotic randomness we were talking about? Yes, it dictates the timeline and the sequence precisely. No more guesswork. Okay, that makes sense. But what if you need more than just one smooth shot? What if you need

04:38

dramatic visual variety? That leads us to style number three, cutscene prompts. Right. This technique lets you script actual cuts like you're in an editing bay. If timestamp prompts control when things happen, cutscene prompts dictate a hard, instant change in the camera angle mid-video. And the trigger is usually super simple. Something like the phrase "cut to" or "new shot." So you could have a wide shot of a street photographer walking,

05:00

you see their whole body. And then you dictate, cut to a close-up, a macro shot, focusing on their fingers, pressing the shutter. It's that immediate shift that builds impact. But the best way, the pro workflow, is combining these two. You'd write: zero to three seconds, wide shot. Three to five, cut to a close-up of the eye at the viewfinder. Five to eight, cut back to a medium shot as they walk away. That's multi-angle storytelling in one prompt block. Now here's a crucial warning.

05:28

Don't, and I mean don't, cut to radically different visual styles. Oh yeah, I've seen this happen. If you go from, say, a photorealistic street scene to like a 3D anime character. The AI just freaks out. It tries to smear the two styles together and you get this incoherent mess. You have to maintain visual consistency. So what's the biggest risk when using the cutscene technique? Cutting between radically different visual styles, it breaks the AI's brain. Okay, so we've directed

05:55

the camera and the clock. Now let's get some help writing the script itself. This is prompt style number four, GPT prompts. We should be using AI to help us write for another AI. The concept is, well, it's pretty elegant. You create your own specialized prompt helper. You take a large language model like a custom GPT, and you feed it the official documentation for your video tool. So you give it all of Runway's rules and syntax guidelines. Exactly. And the job description

06:19

you give it is simple. You are an expert prompt writer for Runway. I give you a scene. You generate an optimized prompt with all the right language. It's meant to save you hours of manual formatting. So the GPT spits out this technically perfect, beautifully structured prompt. It follows all the rules and it sounds like the perfect solution. But, and I'll admit this, I still get tripped up here. I wrestle with prompt drift myself when I lean on these helpers too much. There's

06:47

a deadly blind spot. And that blind spot is the visual rendering capability of the actual video model. The GPT knows language, but it doesn't know what the video AI is currently, you know, bad at rendering. Like crowds. If you ask the GPT for an angry mob of a thousand people shouting. It'll write you a beautiful detailed prompt. But when you feed that to the video AI. Blobby robots every time. Because rendering hundreds of distinct moving individuals is one of the

07:13

hardest things for these models right now. The human fix is so important here. You have to simplify the action but keep the emotion. So instead of a shouting mob, you change it to silent, still townsfolk watching with growing, ominous unease. It simplifies the geometry but keeps the tension. So why is human judgment still necessary, even with a prompt helper? AI generates technically perfect prompts, but lacks awareness of what it can render visually. Okay, that makes total

07:41

sense. Now let's talk about making sure our scenes stay the way we designed them. Which brings us to prompt style number five, anchor prompts. This is your consistency insurance. AI loves to hallucinate away details. The things you care about most, like a unique scar on your hero's face. Or the color of a prop. It can just disappear. Or change from one shot to the next. Your protagonist suddenly looks like a different person. Anchors

08:06

are the fix. They are specific, repetitive phrases you use to lock down those critical details: physical appearance, like scars or wrinkles, or, even more important, spatial relationships. There was a great example of this: an orc warrior riding a dire wolf. Right. Without an anchor, you get the shot back and the orc is just floating, like, three inches above the wolf's back as it runs. It just breaks the whole illusion. So the anchor is a phrase you add and repeat: the orc is securely

08:32

seated on the back of the dire wolf, locked in position. You're just forcing the AI to maintain that connection across the whole movement. And they're also great for off-screen details. If your warrior has armor only on their left shoulder, you anchor that detail. So when they turn and you see them from the other side in the next shot, the AI remembers the armor is still there. So are anchor prompts essentially just forcing visual consistency? They are an insurance policy

08:57

against the AI assuming details disappear. Perfect. So if anchors handle consistency, then style number six, image prompts, is all about defining the style of that consistency. This is so key. Image-to-video workflows are absolutely essential for professional output. Text describes motion. The image defines the style. Trying to describe a really specific visual look with just text, like hyper-detailed cinematic photorealism with a 1980s neon color palette, is so hard to

09:27

do consistently. Yeah, you'll get 10 different versions of that. The professional workflow is this. You generate a high-quality base image first. Use something like Midjourney or DALL-E. That image establishes the exact composition, the lighting, the style you want. Then you just feed that image into your video tool, like Kling or Runway, and use a simple text prompt to describe the motion. Something like, the character slowly

09:52

raises their head. The model keeps the style from the image, and the text just handles the action. And this is the secret for consistent characters. You don't try to generate a 30-second video of your character walking and talking. You generate a perfect, high-fidelity base image of them first. Then you use an image tool to generate variations, a side profile of them

10:12

sitting, an action pose. And then you animate each of those images separately, so you end up with five different perfectly consistent shots of the same person. So which is more critical for setting the visual tone? The image or the text? The image defines the style, and the text describes the motion. Simple as that. Okay, last one. Prompt style number seven, negative prompts. This is often the easiest and fastest tool for

10:36

a quick surgical correction. Yeah, instead of adding more instructions, you just tell the AI what you don't want. They're super targeted corrections for when the AI makes some weird, unwanted assumption. Like if you're generating a futuristic city and the AI keeps adding flying cars and drones because that's what it thinks future means. Right, you just add a negative prompt, no flying vehicles, no drones, no traffic in the sky, and it just cleans the scene right up. They're also really

11:01

good for sound control. A lot of these models are starting to add their own sound design now, which is terrible if you want to do professional audio in post. Exactly. So if you're generating a meditating monk and the AI adds, I don't know, dramatic chanting. You just add total silence, no chanting, no music, no ambient sound, and you get a clean slate for your own sound design. So when should negative prompts be your first instinct? When the AI's default assumptions conflict sharply

11:28

with your vision. Okay, that covers all seven styles. But these aren't just individual tricks. The real power is when you see how they all stack together. Yeah, let's do a quick recap of the function for each one. Cinematic. That's controlling the camera. Timestamp. Choreographing your sequences. Cutscene. That's for visual variety. GPT prompts. Speeding up the writing, but with human oversight. Anchor prompts. Locking in all your details. Image prompts. That establishes

11:55

your style and consistency. And finally... Negative prompts. That's for surgical removal. And when you look at how a 30-second short film is built, you see this integrated approach. You don't start from scratch. You start with control. Right. A single shot might begin with a strong image prompt to set the aesthetic. Then you layer an anchor prompt on top to lock down a crucial detail like ash on the character's armor. And you finish it with a negative prompt to block any distracting

12:23

music that the AI might want to add. And then you move to the next shot where you dictate the timing with a timestamp and you integrate a cutscene halfway through to jump to a close-up for dramatic pacing. This whole framework just moves you completely beyond guessing. It's repeatable creative control. This is what separates hobbyist work from professional asset creation. And when you think about that,

12:46

whoa. I mean, imagine scaling that level of precise directorial control across hundreds of assets. For a full-length project, the efficiency gain is just massive. That's the main takeaway here. Prompting is a craft. It needs a directorial structure. And the future of creative video production, it's going to depend entirely on people who understand how to direct the AI, not just use it like a magic button. These seven styles are the foundational techniques that every pro AI filmmaker is using

13:16

right now. Master these, and you will fundamentally change the quality and consistency of everything you make. So now that you know how to truly direct, what is the first impossible shot you're going to create, something that was visually out of your reach before this? We look forward to seeing what you build.
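
Editor's sketch: the stacking described in the episode (timestamps, cuts, anchors, negatives in one prompt block) can be illustrated with a small Python helper. This is a minimal sketch under stated assumptions, not any video tool's API or required syntax. The names Beat and ShotPrompt, the "0-3s:" label format, and the anchor sentence in the usage example are all hypothetical and exist only for illustration. The beats and negatives reuse the street photographer and silence examples from the episode.

```python
from dataclasses import dataclass, field


@dataclass
class Beat:
    """One timed segment of a shot (hypothetical helper, not a tool API)."""
    start: int         # segment start, in seconds
    end: int           # segment end, in seconds
    action: str        # camera language plus what happens on screen
    cut: bool = False  # True means the segment opens with a hard cut


@dataclass
class ShotPrompt:
    """Stacks timestamp, cutscene, anchor, and negative styles as plain text."""
    beats: list[Beat]
    anchors: list[str] = field(default_factory=list)
    negatives: list[str] = field(default_factory=list)

    def render(self) -> str:
        lines = []
        for beat in self.beats:
            # "Cut to" is the trigger phrase mentioned in the episode.
            prefix = "Cut to " if beat.cut else ""
            lines.append(f"{beat.start}-{beat.end}s: {prefix}{beat.action}")
        # Anchors are appended verbatim so the same wording can be
        # repeated across every shot that needs the detail locked down.
        lines.extend(self.anchors)
        if self.negatives:
            lines.append("No " + ", no ".join(self.negatives) + ".")
        return "\n".join(lines)


if __name__ == "__main__":
    shot = ShotPrompt(
        beats=[
            Beat(0, 3, "wide shot of a street photographer walking"),
            Beat(3, 5, "a close-up of the eye at the viewfinder", cut=True),
            Beat(5, 8, "a medium shot as they walk away", cut=True),
        ],
        # Hypothetical anchor sentence, invented for this example.
        anchors=["The photographer holds the camera in both hands."],
        negatives=["music", "ambient sound"],
    )
    print(shot.render())
```

The rendered text is still just a prompt. Whether a given model honors the time labels or the "Cut to" phrasing depends on the tool, so treat the format as a starting point to adapt.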

Transcript source: Provided by creator in RSS feed.