#04 Robin: Gemini 3 Pro + n8n: The "Brain" Your Automation Needs (But Keep It Away From Tools)

00:00

If you've spent any time building AI automation, I'm sure you've hit this infuriating wall. Oh, yeah. You know, wiring up the system, connecting the triggers, moving the data around. That's the easy part. And that's a solved problem, really. Exactly. But the moment you ask the brain to handle real complexity, like a 100 -page policy manual, a detailed diagram, a messy financial report, the whole workflow just becomes instantly fragile. It works on your machine and it just

00:27

shatters in production. It really does. Well, this deep dive is about the technology that have the potential to solve that exact fragility We're talking about Google's new Gemini 3 Pro model. Yeah But as always with that immense power comes a whole new set of limitations. We have to navigate Welcome. We're going to dive into a necessary

00:48

reality check on Gemini 3 Pro today. We really need to go past the press releases and focus squarely on what actually matters for building reliable AI automation inside a platform like NEN. Our mission today is pretty simple. Cut through the marketing noise and get real. We have to understand its huge strengths, context, planning, but we also have to face its current

01:09

weaknesses. Which are? Primarily cost and some very specific tool calling bugs So we'll cover why it's sheer capacity is a game changer the trade -offs and how you connect it a hidden cost saving setting that most people miss and the big conclusion For reliability right now. You have to mix your models. Okay, let's unpack that the first major shift isn't really about raw intelligence, is it? It's about sheer capacity.

01:35

That's it. Before Gemini 3 Pro, handling these massive documents was, I mean, an engineering nightmare. It was the foundational problem. You were solving data limitation problems before you could even touch the actual business logic. We were always stuck with workarounds, right? Yeah. We had to do things like chunking, splitting a document into tiny pieces or use these expensive vector databases just to feed the model relevant

01:55

snippets. Mm -hmm. It felt like stacking Lego blocks of data and just hoping the AI saw the full picture. Yeah, hoping. Precisely. Now, just look at the numbers. Gemini 3 Pro supports roughly 1 million input tokens. A token, to keep it simple, is just a piece of information the AI processes, usually a word or part of a word. A million tokens. That sounds abstract, so give us the practical scale of that. What does that actually mean? It's massive. Think of it like this. The entire

02:23

novel Moby Dick is about 200 ,000 words. You could paste multiple full copies of long legal documents, internal procedure manuals, or huge financial reports directly into the prompt. The whole thing. Without hitting a limit. Without hitting a limit. This just removes a whole layer of engineering complexity for so many internal automations. You're not spending days building retrieval pipelines anymore. That context. is

02:47

incredible. But like you said, the moment we solve that engineering headache, we slam right into the budget. Yeah. Let's talk economics here. because the price tag is significant. It's a critical dilemma. You have to get this up front. Gemini 3 Pro is significantly more expensive, we're talking orders of magnitude, than its little brother, Gemini 2 .5 Flash. Right. And if you fall into that trap of using the best model for everything just because you can, your cloud bill

03:12

is going to spiral out of control fast. So when is Gemini 2 .5 Flash good enough? It's cheap. It's fast, and it's perfectly fine for bulk tasks. Things like simple document tagging, email routing, basic summaries. You don't need a sledgehammer for that. Wait, OK. If Flash is cheaper and can handle, say, 80 % of our simple tasks, why even bother with Pro? Isn't managing two models more

03:39

of a headache than just eating the cost? Only if your volume is really low, Pro earns its keep only when that deep structural reasoning or that massive context window actually changes the quality of the output in a meaningful way. So we shouldn't just be looking at general benchmarks. The scores that matter for automation builders are more practical. Exactly. I'm thinking about tests like image understanding, things like the ScreenSpot Pro benchmark. It's not just counting objects

04:02

in a picture. No. It's evaluating how well the model understands the structure of a diagram or a complex flow chart. Right, and also long horizon tasks like the vending bench benchmark. That one focuses on multi -step complex planning. The model has to plan five steps ahead and remember the constraints from step one all the way to the end. And that's where Pro justifies its cost.

04:24

That's it. If your automation needs to generate highly structured data or execute a complex plan based on a 100 page document, that deep reasoning is where Pro pays for itself. So what is the main takeaway? regarding its expense. It's strategic role assignment. Use the best model for the specific task required. That makes a lot of sense. So if we accept that strategy, how do we actually connect this thing to our automation platform? There are basically three ways to hook it into

04:52

N8n, each with different trade -offs. Yep. The first two are the most straightforward. You've got the native Google Gemini node. It's the fastest setup. It's great for quick tests, but you have very limited control over advanced settings. And it's not great for agents. Not at all. Then you have the AI agent node. This treats Gemini Pro as the brain for multi -step reasoning. You use it when the AI needs to actually think, not just describe something. But it still has limits.

05:17

It does. While it's great for planning, it still lacks that granular control over the model's internal workings. And that's where reliability starts to suffer. And I'll share a quick vulnerable admission here. I mean, after years of building these systems, I still wrestle with prompt drift myself. A tiny change in the model or the input can just throw everything off. And that's why having that low level predictable control is so crucial for production systems. That's reassuring

05:45

in a painful kind of way. What's an example of that drift causing a real problem? We had a user whose agent was supposed to classify support tickets and just pass a tag to a database. Simple. But it started adding a verbose explanation before the tag. Oh, no. The database only wanted the tag, so the system just broke silently. It lost data for three hours before anyone even noticed the structure had changed. Small drift, big error. Wow. OK, those simple connections can hide some

06:11

really costly details. Which brings us to the more advanced option. Something like OpenRouter. OpenRouter centralizes your billing and gives you a clean way to swap and compare different models, Gemini, OpenAI, Anthropic All, with a single API key. It's great for testing that role assignment idea we talked about. It is. It's often the cleanest setup for serious high -volume users. But it still abstracts away some of those deep model -specific controls that, as we're

06:40

finding out, Gemini Pro really needs. Exactly. So which connection method gives the most granular control over the model? Calling the API directly via the HTTP request node is mandatory for full control. And that brings us directly to a hidden problem. A silent cost leak that's driven by a setting most people can't even access with the easy notes. They can't, no. This setting defaults to maximum performance, which just silently inflates your costs without adding any value

07:07

for simpler tasks. This is what we call the thinking level. Gemini Pro can operate in two states, low, which is faster, cheaper, and uses less deep reasoning, and high, which is the default. It's slower and engages much deeper, more resource -intensive logic. So high thinking is like having an internal QA team checking every logical step, while low thinking is the quick gut reaction response. That's a great way to put it. But here's

07:34

the tooling gap. Gemini 10 currently doesn't expose this thinking level toggle in its native or agent nodes. You can change temperature, token limits, but not this critical cost style. Wow. Which means users are just stuck in the more expensive, higher latency, high thinking mode all the time, even for basic data extraction. At scale, that is a guaranteed insidious cost leak. We estimate you could save 40 to 50 percent.

07:58

You're asking people to ditch the easy native node and write a manual HTTP call just to flip a toggle. I mean, are the savings really worth that extra complexity? They absolutely are if you're running that workflow hundreds or thousands of times a day. We've seen costs drop dramatically overnight for users who make this switch. So it's more setup time up front? A bit more, yeah. But it's the only reliable way to save money

08:20

and reduce latency right now. This is a classic case of the model's capability running way ahead of the integration tooling. Why is this hidden setting so critical for workflows at scale? Hidden high thinking mode increases costs and latency unnecessarily for simple reasoning tasks. Okay, before the break we talked about the control you don't have. Let's pivot now to the execution barrier, which really defines the reliability

08:45

of Gemini Pro today. Let's do it. We saw some amazing success in the image analysis experiments from our source material. When testing flow charts or, say, property damage photos, other models could describe what they saw. Yeah, they could give you a caption, but the key here is the shift

09:00

from what to why. Explain that. Other models might say a dent in the fender or a pipe labeled A. Gemini 3 Pro consistently explained the structure of the diagram, the decision pass in the flow chart, and even the likely causes of the damage based on context. And that's what you need for automation. Exactly. Downstream automations need causes and logic, not captions. That level of interpretation makes the analysis immediately

09:25

actionable. Whoa. Imagine scaling that kind of deep -image analysis across a billion insurance claims or infrastructure audits. That's a profound leap. It is, and that extends directly to the large -context reliability. The test of stuffing a full 126 -page PDF into the prompt without chunking. It was a clean success. It handled the entire document, no crashes, no lost accuracy. So it matched or even beat a complex ARG system. It did, but the real win is the reduced engineering

09:56

time. You just eliminate the whole maintenance headache of chunking and vector databases for your static documents. Okay, but this incredible analytic power just hits a brick wall when the model needs to act. Let's talk about the major limitation. Tool calling. This is the big one.

10:10

We found that in NCI Gemini Pro breaks Specifically when it tries to call an external tool like looking up a record in a database and then tries to resume reasoning Right the action itself executes, but the agent just errors out right after what's the technical reason for that? It comes down to something called thought signatures basically the agent errors out because NAN doesn't yet support Gemini's specific internal format for

10:36

its planning notes. So the model is like taking notes for itself on how to solve the problem. Exactly. And when it comes back from using a tool, say, after looking up that database entry, it needs those internal notes to continue. But the current integration drops them, so the agent forgets its place and just crashes. So if I set up a Flocalize with the PDF, look up a customer. then send an email. It fails between the customer

11:00

lookup and the email. That's right. That makes it completely unsafe for production execution flows. Precisely. It's amazing at planning, but when it goes to actually do something, the flow is just too brittle right now. This brings us to the final critical insight. The winning strategy is intentional model segmentation. Spligging the roles. You got it. Use Gemini 3 Pro as the planner and analyst. Let it understand the complex input. Let it design the logic. But then let

11:26

another model do the work. Let other more stable models like Gemini Flash or even some open AI models serve as the executor. Let them handle the simple tool calls and action steps. What is the main risk of using Gemini in any 10 today? Tool -heavy execution fails due to unsupported internal thought signatures. That duality is really the big idea here. Gemini 3 Pro has absolutely raised the ceiling on what AI can understand. Complex context, images, deep reasoning. It has.

11:55

But its integration is being held back by the tooling, by these hidden settings and that broken tool calling process. So the key to reliability is intentionality. It's strategic. Use Gemini Pro for its genuine strengths, that complex analysis and design. Then let other, more stable models handle the simple execution where cost and stability are what matters most. Test everything. Right.

12:16

And this dynamic, where the model's capability is running so far ahead of the tools we use every day, it just shows how fast this whole field is moving. It's incredible. And the provocative thought to leave you with is this. What happens when the tools finally catch up? When we get full cost -effective control over that deeper thinking level, even for small, fast tasks, that's when the real massive shift in automation truly begins.

Transcript source: Provided by creator in RSS feed: download file

Episode description

Transcript