#245 Neil: This New Google AI Thinks Before It Draws To Fix Fake Looking Photos

00:00

Have you ever tried to get an AI to make a group photo? Oh, it's a nightmare. It's an absolute disaster. For years, you'd ask for something simple. You know, five friends at a picnic, and you'd get, well, you'd get a thumb where it shouldn't be. Or that one blurry face in the back that looks like it's melting. Exactly. Or just that awful flat lighting that makes everyone look like a plastic doll. AI images were. They were

00:23

fun toys. A novelty. A novelty, for sure. But what if that all just changed almost overnight? Welcome to the deep dive. Today, we are digging into a massive update to Google's image generator. Officially, it's called Gemini 3, maybe Imagen 3. But the community has its own names. It really does. Some are calling it Nano Banana Pro. Which is, I mean, it's just jargon. But it gets at the feeling here. It does. It shows this isn't just a small step. The source material we're

00:51

looking at calls it a giant leap. So our mission today is to unpack what makes this so different. We'll show you how to use it for free right now, and we're even going to share some prompts to get you those stunning results. Yeah, we're going to get past the plastic faces and straight to photorealism. So where do we start? I think we have to start with the physics. The huge jump in realism and why that plastic look is finally

01:16

a thing of the past. OK. Then we'll get into the AI's memory, which is really where the magic is for creative work. And finally, we'll cover how you can actually use this professionally. Let's do it. So the fundamental problem with the old models, it was always about texture. They could see shapes. They could see colors. But they didn't understand reality. If you ask for five friends on a beach, you just got this flat, cartoonish image. It was never convincing.

01:42

It just felt fake. Totally fake. Yeah. The breakthrough here, and what the sources are all pointing to, is that Gemini 3 gets nuance. It understands that human skin has pores, imperfections. It's not smooth plastic. Exactly. It knows wood has grain. It processes how light actually works in the real world. And that's the biggest giveaway for an AI image: bad lighting. And this update seems to fix that. It pretty much eliminates it. OK, so let's unpack the tech behind this.

02:09

The sources are highlighting, I think, four major improvements that get rid of that flatness. Yeah. And the first one is the big one. Which is that it actually thinks. It actually thinks. Before, a model would just start drawing pixels immediately. This one, it pauses. It actually takes a moment to plan the composition. Like an artist doing a quick sketch first. Precisely. A wireframe, a thumbnail sketch. It plans before it renders. And that leads directly to the second point,

02:39

which is just the realism. That smooth, plasticky skin is gone. The photos look like they were taken with a high-end camera. A DSLR, for sure. It's a different quality altogether. Then there's the text handling. This is huge. Oh, this has been a problem for years. You'd get a great image, but the sign in the background would be just gibberish. Now, if you want a sign that says coffee shop, it actually spells coffee shop. Which is critical for any kind of commercial

03:04

work. And that brings us to the last point. Consistency. The holy grail. You can actually keep the same character, the same face, across a bunch of different images. That used to be almost impossible. It was. And that planning phase you mentioned, that's the engine behind all of this. It's what allows for the realism, the good text, everything. Given that the AI now thinks, how does that planning phase fundamentally alter the quality compared to the

03:31

older models? It allows for complex cinematic lighting and detailed realism, which were previously impossible. The planning step lets the AI build the physics of the scene first. Exactly. It plans the physics before it draws the picture. So how do we actually get our hands on this? Is it, you know, some special software? Not at all. It's available right now in the browser. Just

03:53

on Google Gemini? Yep. You just have to look for the Gemini Advanced model, or sometimes you'll see a little label that says thinking with three. That's how you know you're using the new engine. And it's free. There are limits for free users, usually about three to five high-quality images a day. And there's a catch, right? The watermark. Right. All the images have an invisible digital watermark. You can't see it, but it's embedded in the file's data. Which is for traceability,

04:18

I assume, to fight deepfakes. That's the idea. It's Google's way of creating a digital signature, so you can always check if an image was AI-generated. For serious users, that's actually a good thing. It builds trust. Okay, that makes sense. Now, let's talk about the feature that everyone is

04:35

buzzing about, this continuous context editing. This is the one. I mean, we've all been there. I still wrestle with prompt drift myself. You get the perfect character, the perfect face, and then you want to change one tiny thing. Exactly. You'd say, okay, now make the cat wear a hat, and poof, it would generate a completely different cat. You'd lose the one you liked. The AI had zero memory. Zero. This new model, it remembers what it just created. It holds onto the context from

05:01

one prompt to the next. For storyboarding or any sequential work, it's a game changer. I love the coffee shop experiment they walk through in the source material. It shows this perfectly. Yeah, that's a great example. So the first prompt sets the scene. Something like, create a realistic cinematic photo of a young male barista in a green apron. Warm lighting. 16:9 aspect ratio, and you get a great image. Right, your starting point. But then the magic word is just next.

05:30

You just type next. And what happens? You get a new angle of the same barista and the same cafe. The face is the same, the apron, the lighting. It's all consistent. It feels like a second shot from the same photo shoot. It's incredible. Then you can change the action. You prompt, now show him serving coffee to a customer. Yeah. And it keeps the barista identical, but just changes what he's doing. We've never had that level of

05:51

control. So beyond just making a few sequential shots, what is the core implication of retaining that same character across, say, hundreds of requests? Businesses can now create reliable brand mascots and models without needing expensive photo shoots. You're creating a persistent digital asset you can use over and over. Exactly. OK, let's move from the how to the what, specifically what to type. If you're still just prompting woman in Japan street, you're going to get flat,

06:19

generic results. You need the pro prompt blueprint. The sources are really clear on this. The best prompts use terms from photography and physics. Let's do a comparison. Take a travel blogger. The bad prompt is just subject and location. Right. A pro prompt gets specific. It uses phrases like candid, realistic medium shot. That controls the framing. And lighting. It would specify golden hour lighting. Exactly. And my favorite one? Background is slightly blurry. Okay, that instantly

06:51

adds depth and makes it look professional. You're basically telling the AI which camera lens to use. Okay, what about a food ad? Instead of just picture of a burger, you use sensory details. Professional food photography. You tell it the cheese is melting down the side. You can get that specific. Oh, yeah, and steam is rising from the patty. Then you control the lighting. Dark, moody background with a spotlight. Whoa, imagine generating hundreds

07:15

of perfect product shots an hour, where you can control the exact moment the cheese melts. That's profound. It's a massive shift. And you see the same thing with a sneaker ad. The prompt isn't just sneaker, it's dynamic action shot. And the camera angle. Low angle view, ground level. And this is the craziest part. Water droplets are frozen in midair. You're directing physics. So let's talk about where this is already being used. The sources list a few key business applications.
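
Editor's sketch: the prompt pattern described above (style, framing, subject, sensory details, lighting, background, aspect ratio) can be assembled programmatically. This is only a minimal illustration of that structure. The `PromptBlueprint` class and its field names are hypothetical and not part of any Google tool or API; the output is plain text you would paste into the image generator yourself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PromptBlueprint:
    """Hypothetical container for the parts of a 'pro prompt' (not a real API)."""
    subject: str
    style: str = ""            # e.g. "professional food photography"
    framing: str = ""          # e.g. "low angle view, ground level"
    details: List[str] = field(default_factory=list)  # sensory or physical details
    lighting: str = ""         # e.g. "golden hour lighting"
    background: str = ""       # e.g. "background is slightly blurry"
    aspect_ratio: str = ""     # e.g. "16:9"

    def render(self) -> str:
        """Join the non-empty parts into one comma-separated prompt string."""
        ratio = f"{self.aspect_ratio} aspect ratio" if self.aspect_ratio else ""
        parts = [self.style, self.framing, self.subject, *self.details,
                 self.lighting, self.background, ratio]
        return ", ".join(p.strip() for p in parts if p and p.strip())


# The burger ad from the episode, expressed with the hypothetical blueprint.
burger = PromptBlueprint(
    subject="a burger",
    style="professional food photography",
    details=["cheese melting down the side", "steam rising from the patty"],
    lighting="dark moody background with a spotlight",
)
burger_prompt = burger.render()

# The sneaker ad, with framing and an explicit aspect ratio.
sneaker = PromptBlueprint(
    subject="a sneaker",
    style="dynamic action shot",
    framing="low angle view, ground level",
    details=["water droplets frozen in midair"],
    aspect_ratio="16:9",
)
sneaker_prompt = sneaker.render()

print(burger_prompt)
print(sneaker_prompt)
```

Keeping each part in its own field makes it easy to vary one element, such as the lighting, while holding the rest of the prompt fixed.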

07:44

Real estate agents are a big one. Right. They can upload a photo of an empty room and just tell the AI to furnish it, add a modern gray sofa, but keep the windows and walls the same. It's instant digital staging. Saves a ton of money. And for e-commerce, you can take a simple photo of your product, say a candle on your kitchen table. You prompt, place this candle on a wooden spa shelf, soft lighting, and suddenly your basic photo looks like a high-end ad. And the last

08:10

one was for event planners. Yeah, this one's cool. They can upload a napkin sketch of a wedding layout. A literal napkin sketch. A photo of it. And prompt, turn this sketch into a realistic photo of a wedding reception with flowers and gold lights. It goes from a vague idea to a photorealistic concept in seconds. OK, so considering the quality we're talking about, especially for those food ads, is this tool already replacing entry-level

08:35

commercial photographers? For initial mock-ups and marketing visuals, yes, the speed and cost savings are enormous. So it's not killing the creative vision of the artist, but it's automating the technical execution. Exactly. It's democratizing the ability to create really persuasive visual content. Okay, so based on all the tests in the source material, there are four, let's call them secrets, to getting truly professional results. Or master tips. Master tips. And tip number one

09:05

is the face-first rule. Okay, what's that? If you're combining things, like a specific person with a car and a dog, always upload the photo of the person first. The AI gives the most attention to the initial input. So if you lead with the car, the car will be perfect, but the face might be messy. Exactly. The AI needs you to tell it what the most complex and important part of the image is. That makes sense. What's number two? Always specify the aspect ratio. The shape of

09:30

the picture. Yep. 16:9 for wide video, 9:16 for a tall social media story, 1:1 for a square post. If you don't tell it, it defaults to a square and can crop your image badly. Simple but crucial. Okay, tip three. Trust the thinking process. When it says analyzing or thinking, don't cancel it. Be patient. You have to be. That pause is the planning phase we talked about. It's where the quality comes from. If you interrupt it, you get a rush job. Got it. And the last

10:00

one. Know when to use specialized tools. Google is pretty strict on safety. If you need super high resolution, like 4K, or you're creating content that's a bit more edgy, you might need a paid tool like Higgsfield AI or something in Adobe Photoshop. That's a great list. So if a listener only takes one tip away from this whole deep dive, which one should they prioritize? What's the number one thing to do? I would say tip three. Wait for the AI to finish its analysis.

10:26

That pause is the secret ingredient for quality because you're letting it run its physics simulation. See, I thought you were going to say the face-first rule. That seems like it fixes the most common problem, you know, messy hands and faces. And it is an easy fix, for sure. But if you cancel the thinking phase, even a perfect face-first image is still going to have flat lighting and bad texture. The planning is foundational. It sets the quality for everything else. So prioritize

10:51

the process over the subject. Precisely. So we've covered a lot of ground. I think the big idea here is that this update, Gemini 3, it transforms the image generator from a fun little toy into a professional artist inside your computer. And that transformation comes down to two things. Two things. Hyperrealism, which it gets through that planning phase, and continuous memory, which gives you consistent characters. So our challenge

11:17

to you, the listener, is pretty simple. Go to Google Gemini today, think of a scene, your dream kitchen, a funny picture of your pet, and apply the prompt tips we shared. Yeah, focus on the lighting, the texture, the camera angle. See what you can create. The only way to really keep up with this stuff is to get your hands dirty and start playing with it. And we have to think

11:35

about what this means. I mean, what if, in a year, the line between a real photo and an AI image becomes completely invisible, even to experts? That's the path we're on. It is. We're heading into a world where visual truth is going to be a lot harder to define. Keep testing those boundaries. Keep playing with the tech. Until next time, keep digging deeper.

Transcript source: Provided by creator in RSS feed