#233 Neil: Google’s Gemini 3 Pro 1M-Token AI Is FREE: See Our Hardest Tests

00:00

What if an AI decided to stop chasing pure speed and instead started focusing on deep, slow, careful reasoning? That fundamental shift is here. I mean, it's officially here. It's what Google is calling their new thinking model. And crucially, it's paired with this massive infrastructural upgrade. We're talking about a 1-million-token context window. This is not just another small step. It really changes how we can handle huge amounts of data. Welcome to

00:33

the Deep Dive. Today, we're doing a really critical analysis of Google's new Gemini 3 Pro. And based on some pretty extensive real-life testing we've done across a lot of different uses, from finance to visual puzzles, the hype seems, well, it seems genuinely warranted. It does. It feels like a very serious contender in the high-end AI space now, maybe an unavoidable one. So our mission for this deep dive is to really distill what

00:58

makes this model unique. We're going to unpack the philosophy behind that thinking model and explain what that super memory means in a practical sense. And then get into the test results. Exactly. We're talking about building interactive dashboards, solving some tricky visual puzzles, and even its surprising ability to analyze silent video footage. OK, so let's unpack that core concept first: Gemini 3 Pro. It's the first model in the new Gemini 3 series, and Google is deliberately calling

01:25

it a thinking model. Right. That sounds great. But what does that actually mean for someone using it day to day? It's a huge shift away from latency minimization, you know, that obsession with giving you an answer instantly. OK. This model is built to pause. When you ask it something complex, it doesn't just spit back the first thing it finds. It engages in what you could call an internal reasoning chain. So less like a search engine and more like someone actually

01:53

thinking a problem through. Exactly. Think of it like someone carefully working through a multi-step math problem instead of just recalling an answer. So those extra, what, 10 to 20 seconds it takes to respond? That's deliberate. That's not wasted time. That's the key. During that delay, it's essentially running premortems on its own logic. It's checking for consistency, making sure step five still aligns with the rules you set back in step one. And that rigor is what

02:18

leads to higher quality. Precisely. It's why the final outputs are so much more accurate. And the performance data seems to back this up. I mean, we're seeing benchmarks where Gemini 3 Pro beat competitors like Claude and the latest GPT models, by the biggest performance gap ever recorded in these kinds of head-to-head tests. Right, but reasoning is only half the picture. You can have the best reasoning in the world, but if you forget what the initial

02:42

request was, it's useless. Which brings us to the memory: the context window, the other massive technical leap here. OK, so let's talk numbers. Remind us what a context window is in a practical sense. It's basically the AI's short-term memory during your conversation: all the info it can hold in its head at once. And most models today are around, what, 256,000 tokens? Yeah, around there, which is already pretty impressive. And a token is, you know, roughly a word or

03:10

part of a word. But Gemini 3 Pro roughly quadruples that, up to a full one million tokens. So you can think of the super memory as giving the AI about four times the RAM. So what's the practical benefit for you listening? It means the AI can digest truly huge documents. We're talking thick legal contracts, 300-page reports, even entire books, without forgetting the details from page one. So it just reduces that management overhead. You don't have

03:39

to keep reminding it of things. That's it. It just manages the complexity better. It gets rid of that frustrating moment where an AI gets three quarters of the way through a project and starts making things up because its memory just ran out. The best news for getting started: it's free. It's accessible right now. Yep. Just go to gemini.google.com and it's the default model. But you have an important tip here. A very important

03:59

tip. To make sure you're actually using this high-quality thinking model, you have to go into the settings and find the specific option to turn it on. Otherwise, it might default to a faster, less thorough mode. Exactly. And for developers listening, it's already integrated into Google AI Studio and Vertex AI. So, if the model is designed to pause, is that extra quality really worth the annoyance of waiting? For complex tasks, the quality leap is undeniable. That waiting

04:31

period is where the magic happens. Speaking of complexity, let's look at the first real-life test: building interactive dashboards. We gave it a deliberately practical challenge. We asked it to create a full financial calculator for a multi-unit rental property. Not just math, but a structured, interactive tool. Right. So the prompt had a lot of variables: purchase price, loan rate, down payment. Income, costs, all the

04:55

usual stuff. But the real challenge we threw in was an interactive slider for the vacancy rate. OK, so the user could drag it from, what, 0% to 30%? Exactly. And the result was genuinely impressive. It built a fully functional dashboard right there in the chat. So as you move the slider, the profit numbers instantly update. You can see your profit shrink or even watch the property

05:14

start losing money in real time. And it also very smartly included a Generate Report button that gave you a written summary based on wherever the slider was set. That is a huge time-saver for quickly checking whether an idea is feasible. It is. And a big pro tip here: you can share these tools just by sending the Gemini link, so you can collaborate on the model with someone else. OK, so from complex math to pure writing, the next test was about nuance, right? Yeah, passing

05:41

subtle, human-imposed constraints. The challenge was to write a 700-word SEO article on AI in marketing for a non-technical audience, using web search for current info. And here's the tricky part: it was forbidden from using em dashes, a style rule that even human writers mess up all the time. And how did it do? The writing was excellent, very natural, very helpful, and most importantly, it followed the rule. No em dashes. It showed a really high level of obedience to a subtle

06:11

style guide. But there was a small issue. There was. And here's a vulnerable admission from me: I still wrestle with prompt drift myself, and even this top-tier model had a little bit of it. What happened? It missed the word count. The goal was exactly 700 words, and it produced 785. And for anyone who hasn't run into it, what is prompt drift? It's when the model, deep in a conversation, starts to prioritize the quality of what it's creating over your original structural

06:39

rules. So it wanted to finish the thought properly, even if it meant going over the word count. That's my guess. It's also a reminder that these AIs work in tokens, not exact words, so there's always a bit of ambiguity in a word limit. These dashboards seem great for initial modeling, but how reliable are they for serious financial work? They're outstanding for that initial analysis. For precision, always double-check the logic, but the structure

07:03

is very reliable. OK, moving beyond text and numbers into the area where this model is really expected to shine: multimodal power. Right, analyzing images and video. So test three was about visual reasoning. We gave it two classic spatial puzzles. The first was a photo of stacked colored cubes, and we asked it to count the total, including the ones it couldn't see. That's a puzzle that famously trips up older AIs. They just can't infer what isn't directly visible. But Gemini 3 Pro got

07:33

it. Yep. It took about 10 seconds, using that thinking model, and then it showed its work. It explained how it counted the hidden blocks needed for support, and it gave the correct total. And the second puzzle? It was identifying the top-down view of a pyramid based on subtle color patterns from different photos. It got that one right too, explaining its reasoning based on how the colors had to line up. That level of spatial analysis is a huge deal. It's not just

07:59

recognizing an object. It's not. And this is critical for fields like architecture, engineering, or even medical image analysis, where you need to understand what lies beneath the surface. The next test was maybe even more impressive: video analysis, but with no sound. This is a breakthrough capability I think people are sleeping on. I uploaded a silent five-minute screen recording of me just working. And you just asked it, what am I doing? Pretty much. What am I doing and

08:25

what info can you see? And the result was? Startlingly detailed. It spotted a tiny pop-up notification that was only on screen for a second. It read my name from the user interface. And it accurately described what I was doing, saying I was changing parameters in a financial calculator tool. It understood the purpose of my actions from visuals alone. Whoa. Yeah. Just imagine scaling that kind of video understanding to a billion security camera queries. Oh. Or

08:55

analyzing sports footage. Exactly. We did another test with a pickleball clip, and it worked perfectly there, too. It opens up entirely new workflows. And the final test in this segment was building a simple app, a game. And it delivered. It created a playable game called Neon Swarm, kind of like Galaga, using Gemini's Canvas tool. It had keyboard controls, score tracking, the works. So, a crucial tip for anyone listening who wants to try making something visual like that. Always, always make

09:21

sure you've activated the Canvas tool. That's the environment where it can actually execute and build these interactive things. OK. And you had a quick note on debugging these projects. I do. If you go back and forth with the AI more than, say, 10 times, the canvas can sometimes get confused and things start to break. So what's the fix? Honestly, it's usually faster to just start a new chat with a clean prompt than to

09:44

try to fix the old one. So if the model is so capable, what's the biggest limiting factor right now when you're trying to create these complex visual tools? It's that the memory of the tool's visual state can sometimes break after too many back-and-forth refinements. A fresh chat usually solves it. Let's talk about daily workflow. We've confirmed the power of the thinking model, but it takes 10 to 20 seconds. In a fast-paced job, isn't that delay a deal-breaker? It's a fair

10:11

question. That wait time does feel a little annoying at first. We're all conditioned for instant AI replies. But the trade-off is that the quality is so much higher that you spend less time revising and fact-checking. For high-stakes work like drafting a contract summary, that depth is absolutely worth the wait. You use fast tools for fast work and this for deliberate work. For those ready to adopt it, you mentioned two essential settings to turn on right away. Yes. First is personal

10:37

context. Turn this on. It lets Gemini learn your style, your tone, your jargon from past chats. So it becomes more personalized over time. Exactly. It'll start writing sales emails in your formal tone automatically, for example. And the second must-have setting? Custom instructions. This is where you set permanent rules for every single chat, things like always avoid complex jargon, or frame all advice around website speed. And you're saying these stick much better in this

11:06

new version. Far more reliably than they did before, yes. OK, what about AI Mode in Google Search? How did that perform against traditional search? I ran a comparison looking for a hotel in a specific price range. Regular Google Search was perfect: fast, precise, clean links. AI Mode was inconsistent. It would sometimes show conflicting information. How so? The little AI-generated summary might quote one price, but the actual link it provided showed a price that

11:32

was 50% higher. It was hallucinating details to make the summary sound good. So the problem isn't speed, it's reliability for real-time facts. Exactly. For simple factual queries, regular Google Search is still much more reliable right now. And we have to talk about the future here: agent mode. This is the big one. This is the idea that signals the shift from a chat tool to a digital colleague. So what does it do? It can autonomously do things on the web for you.

12:00

It's not just giving you information, it's acting on it. Like booking a reservation or filling out forms. Exactly. It moves the AI from being a passive answer generator to an active doer in your workflow. Given those search inconsistencies, should users rely on any AI, including this one, for precise real-time data? Not yet for simple facts. Regular search is still king for that. But for complex analysis of data you provide, the thinking model is far superior. So let's

12:28

bring it all together. Where does Gemini 3 Pro really shine in a professional workflow? I'd say three clear areas. First, creating structured things like interactive tools. Second, deep reasoning for complex problems. And third, that advanced multimodal analysis. And the context window? Honestly, the 1-million-token context window alone is a compelling reason to try it, especially if you work with long documents. And compared

12:51

to the competition, where does it fit? Well, ChatGPT still has the edge on third-party integrations. Its app ecosystem is huge. Claude is still fantastic, especially for high-level coding. But Gemini 3 Pro is a serious contender. A very serious one. And if you're already deep in the Google ecosystem (Gmail, Docs, Workspace), the integration is so seamless that it makes Gemini a really powerful and easy choice. So what's your final advice

13:18

for our listeners today? If you're totally happy with your current tools for simple things, there's no need to rush. But you should absolutely try Gemini 3 Pro for three specific tasks. Which are? Interactive tool creation, any kind of complex analysis that requires deep reasoning, and processing any long document that's over, say, 100 pages. So the core features, the reasoning, the quality, they're top-notch. They are, despite some minor issues like the missed word count. We really encourage

13:46

you to try them all yourself. It's at gemini.google.com. And remember to go into those settings and turn on the thinking model option to really test it on your own complex work. And building on that idea of agent mode, just imagine an AI not just answering your questions, but automatically completing the next three steps of your workflow: scheduling, drafting, data entry. All of it, autonomously. What kind of fundamental shift happens to our idea of work when the chat tool

14:14

becomes a true digital colleague? Thank you for joining us for the Deep Dive.
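
The context-window numbers discussed earlier in the episode (roughly 256,000 tokens for most models versus 1 million for Gemini 3 Pro, with a token being roughly a word or part of a word) can be turned into a quick "will it fit?" check. This is a rough sketch, not how Gemini actually tokenizes text: the 0.75 words-per-token ratio and 500 words per page are assumed rules of thumb, and real token counts vary by tokenizer, language, and formatting.

```python
import math

# Assumed rules of thumb (NOT Gemini's tokenizer): English prose averages
# about 0.75 words per token, and a dense page holds about 500 words.
WORDS_PER_TOKEN = 0.75
WORDS_PER_PAGE = 500

# Window sizes as quoted in the episode.
TYPICAL_WINDOW = 256_000
GEMINI_3_PRO_WINDOW = 1_000_000


def estimated_tokens(pages: int, words_per_page: int = WORDS_PER_PAGE) -> int:
    """Rough token estimate for a document with the given page count."""
    words = pages * words_per_page
    return math.ceil(words / WORDS_PER_TOKEN)


def fits(tokens: int, window: int) -> bool:
    """True if the estimated token count fits inside the context window."""
    return tokens <= window


if __name__ == "__main__":
    print(f"Window ratio: {GEMINI_3_PRO_WINDOW / TYPICAL_WINDOW:.2f}x")
    for pages in (100, 300, 1000):
        tokens = estimated_tokens(pages)
        print(
            f"{pages:>5} pages ~ {tokens:>9,} tokens | "
            f"256K fits: {fits(tokens, TYPICAL_WINDOW)} | "
            f"1M fits: {fits(tokens, GEMINI_3_PRO_WINDOW)}"
        )
```

Under these assumptions a 300-page report comes to about 200,000 tokens and fits either window, while a 1,000-page book (about 666,667 tokens) only fits the 1-million-token window. The ratio between the two windows is about 3.9, which is why "quadrupling" is a fair shorthand.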

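The multi-unit rental calculator from the dashboard test boils down to a few lines of arithmetic, with the vacancy-rate slider (0% to 30% in the test) feeding straight into monthly cash flow. This is a minimal sketch, not the tool Gemini generated: the 30-year loan term, the standard fixed-rate amortization formula, the function and field names, and the sample figures are all hypothetical, chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class RentalInputs:
    # Hypothetical field names; the episode lists purchase price, loan rate,
    # down payment, income, and costs as the prompt's variables.
    purchase_price: float
    down_payment_pct: float   # e.g. 0.20 for 20%
    annual_rate: float        # e.g. 0.06 for 6%
    units: int
    rent_per_unit: float      # monthly rent per unit
    monthly_costs: float      # taxes, insurance, maintenance, etc.
    loan_years: int = 30      # assumed term; not stated in the episode


def monthly_mortgage(inputs: RentalInputs) -> float:
    """Fixed-rate amortization payment: P*r*(1+r)^n / ((1+r)^n - 1)."""
    principal = inputs.purchase_price * (1 - inputs.down_payment_pct)
    r = inputs.annual_rate / 12
    n = inputs.loan_years * 12
    if r == 0:
        return principal / n  # interest-free loan: straight-line repayment
    growth = (1 + r) ** n
    return principal * r * growth / (growth - 1)


def monthly_cash_flow(inputs: RentalInputs, vacancy_rate: float) -> float:
    """Rent collected after vacancy, minus operating costs and the mortgage."""
    gross = inputs.units * inputs.rent_per_unit
    collected = gross * (1 - vacancy_rate)
    return collected - inputs.monthly_costs - monthly_mortgage(inputs)


def break_even_vacancy(inputs: RentalInputs) -> float:
    """Vacancy rate at which monthly cash flow reaches zero."""
    gross = inputs.units * inputs.rent_per_unit
    return 1 - (inputs.monthly_costs + monthly_mortgage(inputs)) / gross


if __name__ == "__main__":
    deal = RentalInputs(
        purchase_price=500_000, down_payment_pct=0.20, annual_rate=0.06,
        units=4, rent_per_unit=1_500, monthly_costs=1_200,
    )
    # Emulate dragging the slider from 0% to 30% in 10-point steps.
    for pct in range(0, 31, 10):
        flow = monthly_cash_flow(deal, pct / 100)
        print(f"vacancy {pct:>2}%: cash flow {flow:>9,.2f}")
    print(f"break-even vacancy: {break_even_vacancy(deal):.1%}")
```

With these hypothetical figures the mortgage is about 2,398 per month, cash flow stays positive across the whole 0% to 30% slider range, and it only turns negative past a vacancy rate of roughly 40%. As the episode advises for any generated calculator, double-check the logic before relying on it for real financial decisions.
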
Transcript source: Provided by creator in RSS feed.
