#455 Neil: Gemini 3.2 Pro Leak Just Burned The Current AI Rankings

00:00

Well, we are just five days out from Google I/O, and honestly, the metaphorical dam hasn't just broken today. Oh, it is totally washed away. I mean, nobody saw this massive wave coming. Yeah, it really came out of nowhere. Over the last 48 hours, leaks just completely exploded. We are seeing sudden Gemini 3.2 rumors everywhere. GPT 5.6 internal tests are suddenly popping up. Mythos version 2 just successfully completed complex network breaches, and Codex Mobile screenshots

00:29

are randomly surfacing online. It is wild. Welcome to the Deep Dive. I'm very glad you are joining us today. Today is Thursday, May 14th, 2026. Right. And Google I/O kicks off in exactly five days. Exactly. So we are taking your sources to examine all this today. We have four massive leaks completely shaking the space. And we really need to look at what this acceleration means. Because this directly impacts your daily workflow and security setups. Okay, let's unpack this.

00:56

The AI race just got incredibly hot again. It seemed to happen entirely overnight. Google hasn't officially announced Gemini 3.2 at all. No blog posts, no documentation. Right, zero API changelogs. But the developer community is completely losing its mind anyway. Because developers keep sharing these leaked test results. So let us start with Google's unannounced Pro models. The earliest leaked outputs focus really heavily on SVG generation. Yeah, they do. It is a very specific technical

01:27

capability to highlight. And for those wondering, an SVG is an image made of math, not pixels, staying sharp at any size. That is exactly why this specific detail matters so much. An AI generating pixel art is basically painting blindly. Right. But an AI generating an SVG is actively writing code. It has to perfectly plot mathematical coordinates and curves. The first test showed a PS5 controller design. The result looked... well, okay, but nothing truly mind-blowing. Yeah, the buttons

01:55

did not line up properly at all. The overall shape just felt a bit off ergonomically. It understood the basic geometry, but not human hands. Exactly. But the SVG quality improved significantly in later tests. Another test showed a pelican riding a bicycle. And that specific result looked much more coherent visually. The bird had a very clear, realistic body shape. It featured far better physical proportions. It really understood the abstract relationship between bird and machine.
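To make the earlier point concrete, that producing an SVG means writing code rather than painting pixels, here is a minimal Python sketch. The shapes, the coordinates, and the `build_svg` name are illustrative assumptions; none of this is taken from the leaked outputs.

```python
# Illustrative only: a hand-written SVG, not output from any leaked model.
def build_svg(width: int = 200, height: int = 120) -> str:
    """Return a tiny SVG document: two wheels joined by a curved frame."""
    wheel_r = 25
    left_x, right_x, axle_y = 50, 150, 85
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" '
        f'height="{height}" viewBox="0 0 {width} {height}">',
        # A circle is described by a centre and a radius, not by pixels.
        f'<circle cx="{left_x}" cy="{axle_y}" r="{wheel_r}" fill="none" stroke="black"/>',
        f'<circle cx="{right_x}" cy="{axle_y}" r="{wheel_r}" fill="none" stroke="black"/>',
        # A quadratic Bezier curve: start point, control point, end point.
        f'<path d="M {left_x} {axle_y} Q 100 20 {right_x} {axle_y}" fill="none" stroke="black"/>',
        "</svg>",
    ]
    return "\n".join(parts)


svg_text = build_svg()
```

Every visible element here is a set of numbers, so one misplaced coordinate is enough to produce the kind of disconnected wheel or misaligned button the hosts mention.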

02:21

The bicycle itself was mostly complete this time too. No broken parts or completely disconnected wheels. But the really interesting part was the interactive interface. The model generated a full UI right alongside the graphic. Oh, wow. Yeah, users could actively customize the graphic and inspect curves. They could export files and edit specific geometric parts. This suggests a huge shift toward interactive vector workflows. It is not just about basic static image generation

02:49

anymore. The SVG engine inside Gemini looks incredibly robust now. The coordinate shapes are cleaner and the layouts feel stable. But the UI generation still looks very weak, right? Unfortunately, yeah. The interfaces feel extremely flat and quite generic. They lack the deep polish of a flagship model. One example used the Web Audio API specifically. It also incorporated Tailwind CSS into the visual design. Yet the final UI

03:14

still looked very obviously AI-generated. Having a great SVG engine but weak UI generation is like putting a world-class engine in a car with no steering wheel. Exactly, or a massive engine bolted to a skateboard. The raw computing power is definitely under the hood. But if the interactive interface remains that clunky... Everyday developers cannot steer it into their actual workflows. Well, part of this issue could just stem from

03:40

the prompts. If testers focused purely on the vector math quality, the interface results would naturally look much weaker. That makes sense. Based on the leaks, Google clearly prioritized mathematical code quality. User experience definitely looks like the weak spot right now. But the Gemini Pro model surprised people in other ways, too. Another leak showed its true creative generation power. Right. From one simple prompt, it created four distinct robots. And all four robot designs

04:08

looked surprisingly good. They were not just lazy, simple copies of each other. They had completely different colors and structural shapes. Yeah. Unique, small design details were built right in. This proves it can generate multiple creative variations accurately. It does not just return one repetitive, hallucinated output. This is incredibly useful for vector asset design workflows. Icon designers and mascot creators are going to absolutely love this. But the Flash variants

04:34

are where things truly heat up. Let's talk about that. There are multiple Flash variants in active internal testing. The leaks mention code names Fanta, Sprite, and Cola. These are internal names for different experimental model configurations. Each variant targets completely different performance optimization goals. One might be tuned for pure processing speed. Another might focus purely on high factual coding accuracy. Right, this is standard operating practice at major AI labs.

05:03

They train several versions and basically pick the best performer. And the Fanta variant specifically showed very competitive benchmark results. It is testing incredibly well against the Pro model. Fanta is nearly rivaling Pro in many complex aspects. Which raises a big question. Will the Fanta Flash variant cannibalize the Pro model's market if speed wins out? If Google ships a Flash variant this strong, it changes everything. Users prioritize speed for fast, practical, everyday

05:29

engineering work. It could completely rewrite current pricing and tier models. So speed and daily efficiency might actually win out over raw, complex reasoning power. Right. And that is exactly what we are seeing. Pro handles complex reasoning and multi-step architectural planning tasks. While Flash handles the fast, repetitive practical daily coding work. Exactly. But here's

05:53

the really critical thing to consider. If Google nails that daily efficiency with their models, OpenAI cannot just sit around on their hands quietly. Very true. What's fascinating here is the sheer development speed. The leaks show OpenAI is definitely feeling the heat. While everyone watches Google, OpenAI is testing very quietly. GPT 5.6 is already in active internal testing. Yeah, the first development checkpoints were evaluated very recently. Two internal code

06:18

names showed up in the recent leak. Amber Alpha and Beacon Alpha, right? Those are the ones. These are likely two advanced variants being heavily tested. They want to see which gives better complex benchmark results. They run advanced candidates in parallel and pick one. We might see these internal candidates appear very quietly soon. They often show up on LM Arena testing first. They just run public blind testing without telling anyone beforehand. The gap between major

06:43

releases is shrinking drastically. I mean, GPT 5.5 just launched a few short weeks ago. And it was a massive structural success for the company. Reviewers said it outperformed Anthropic's Opus easily. That momentum gives OpenAI strong reason to push forward, especially with Google about to show new flagship models. If OpenAI waits too long, they lose the narrative entirely. Some reliable sources actually suggest a release by next month. Wow. That would be one of the shortest

07:11

release gaps ever. One major reason for this acceleration is quite profound. AI itself now actively helps build new AI models. Right. Models constantly assist with complex code generation tasks now. They handle vast data preparation and evaluation during training. They optimize the actual architectural training process itself continuously. They completely remove the human bottleneck from data collection. The model generates millions of complex edge-case coding problems.
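The loop the hosts describe at this point (a model proposes problems, attempts them, and a reward model grades the attempts) can be sketched as a toy. Everything below is a hypothetical stand-in: the `Problem` type, the addition tasks, and the deliberately flawed solver are assumptions chosen for illustration, and the source does not describe how any lab actually implements this.

```python
from dataclasses import dataclass


@dataclass
class Problem:
    a: int
    b: int


def generate_problems(n: int) -> list[Problem]:
    """Hypothetical stand-in for a model proposing tasks (here: additions)."""
    return [Problem(a=i, b=i * i) for i in range(n)]


def solve(problem: Problem) -> int:
    """Hypothetical solver; deliberately wrong when `a` is a multiple of 4."""
    answer = problem.a + problem.b
    return answer + 1 if problem.a % 4 == 0 else answer


def grade(problem: Problem, answer: int) -> float:
    """Hypothetical reward model: 1.0 for a correct answer, 0.0 otherwise."""
    return 1.0 if answer == problem.a + problem.b else 0.0


def build_training_set(n: int) -> list[tuple[Problem, int]]:
    """Keep only the attempts that the grader scores as correct."""
    kept = []
    for problem in generate_problems(n):
        answer = solve(problem)
        if grade(problem, answer) == 1.0:
            kept.append((problem, answer))
    return kept


kept = build_training_set(8)
```

The design point is the filter: only graded-correct attempts survive, so the quality of the resulting data depends on how trustworthy the grader is.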

07:39

And then it autonomously solves those exact same coding problems. And finally, an internal reward model grades the final solution. You know, Pete, it is a very humbling reality. I still wrestle with prompt drift myself on a daily basis. Yet these models are already debugging their own training code. It really makes you realize how quickly we become bottlenecks. Models generate synthetic data way more efficiently than human teams. They instantly spot critical processing

08:06

bottlenecks earlier than us. Every lab knows slowing down means losing crucial ground. Absolutely. So since AI is building AI right now, are we going to see massive capability leaps or just better integration? We might not see those massive historical capability jumps anymore. Early years were defined by those massive intelligence leaps. Now, updates will focus heavily on pure processing speed. They will focus on smoother workflows

08:30

and deep personalization. Right. Smoother workflows and integration matter far more now than chasing massive capability leaps. And that concept of smoother workflows is actually very important. It connects directly to highly complex, high-stakes digital environments. Which brings us to the intense cybersecurity sector today. This specific space does not get enough mainstream attention at all. It really doesn't. And the cybersecurity AI crown is shifting incredibly

08:58

fast lately. We started with Claude Opus 4.6 leading everyone. Then the original Mythos preview appeared on the scene. It stayed slightly behind the Opus model initially. But then GPT 5.5 Cyber closed that performance gap. It was tuned specifically for complex security network workflows. Right, and now Mythos Preview Version 2 has officially arrived. Some elite developers are currently pronouncing it as Mythos. It just received a major upgrade that is turning industry heads.

09:27

Early benchmark results show a clear capability jump immediately. Institutions like the AI Security Institute are actively testing it. It has confidently jumped back into the absolute lead. This constant back and forth really shows the development speed. The competitive cybersecurity AI race is incredibly intense right now. The most striking test involves a complex network simulation. They use a 32-step

09:52

corporate network attack simulation. And this is definitely not a simple theoretical toy benchmark. The model needs to identify obscure digital system vulnerabilities. It has to actually chain multiple obscure software exploits together. It must execute lateral movement smoothly through the whole network, moving from a compromised printer straight to an admin laptop. It has to complete a full, complex attack path. A human cybersecurity expert needs

10:19

roughly 20 hours total for this. But this new Mythos preview completed the entire simulation rapidly. Yeah, it took only 6 to 10 digital attempts total. This is a dramatic difference in operational security capability. Whoa, Pete. Imagine simulating a 32-step network breach in just a few tries. The magnitude of that technical achievement is honestly staggering. A 32-step attack chain requires deep architectural understanding. It must know exactly which obscure exploits actually

10:47

work. It strings everything together in the correct logical sequence. The fact that an AI does this reliably really matters. It is a massive structural deal for the entire industry. But does giving an AI the ability to chain exploits hand a dangerous weapon straight to bad actors? That is where things become significantly more complicated ethically. The exact same model defending a network can attack it. The potential risks of widespread misuse are incredibly high. That is exactly why

11:16

software releases are carefully staged. Governments and security institutions usually get private access first. They have crucial time to prepare necessary security defenses. They actively do this before tools become widely available. Exactly. Defense and attack grow together. Early access for human defenders is absolutely crucial. But this immense autonomous computing power is expanding very rapidly elsewhere, too. It is not staying securely locked inside corporate data centers

11:41

forever. Right. It is moving directly to your personal mobile phone. Recent leaked screenshots started showing OpenAI's Codex running locally. It is operating directly from a standard mobile device. This immediately caught the attention of many online developers. It does not look like a highly polished app yet. It seems much closer to a remote coding workflow. Yeah, they are using a mobile CLI right now. Let's quickly define

12:05

that. A mobile CLI is a text-based command line to control remote computers from your phone. That is a perfect, concise summary of the tool. You can manage complex coding tasks away from your desk. You can comfortably review outputs and interact directly with Codex. Earlier leaks hinted at much deeper workflow integrations coming too. Tools like NotionHQ are heavily rumored right here. You could plan, document, and code in one single place. There is also specific mention

12:33

of remote control features. You can trigger heavy Codex compiling tasks directly from your phone. Because the heavy processing still runs safely on a remote server. Rohan Burma from the core Codex team was actually mentioned. He was specifically cited as a primary source for this leak. Which adds major industry credibility to these current screenshots. This definitely does not look like random internet speculation. If Codex arrives on mobile, it completely changes developer workflows.

12:59

Right now, most professional tools need a heavy desktop IDE. Mobile access lets you actively review code from anywhere. You can approve massive pull requests while on the go. It's not about typing syntax with your thumbs. It's like carrying a senior developer in your pocket as a remote control. That is exactly how it will psychologically function for people. You could be sitting on a busy train commute. Or you could be sitting

13:21

quietly in a long meeting. You can still keep your complex coding projects moving forward. You open your phone and check the project documentation. You trigger Codex to actively review the new logic. Codex finds a bug, fixes it, and pushes the commits. Will developers actually want to manage heavy processing tasks away from their desks? They'll definitely not do heavy typing on a mobile screen, but they will urgently want

13:46

to unblock their teammates. They will want to review AI-generated code snippets very quickly. No, it adds flexibility to keep projects moving anywhere without replacing the desktop entirely. [Mid-roll sponsor break.] So what does this all mean? If we connect this to the bigger picture today... A very clear, undeniable historical pattern is showing up. All four leak stories point to the exact same trend. AI development labs are releasing massive models faster than

14:13

ever. The technical time between major versions is completely collapsing now. Internal testing cycles overlap heavily with public release cycles. Significant leaks happen before official corporate announcements way more often. Two massive market forces constantly drive this rapid acceleration. Advanced AI tools actively make software development much faster. And extreme competitive corporate pressure drives the entire industry forward. No single lab can afford to look completely behind.

14:41

We are looking at a massive, volatile pressure cooker. OpenAI aggressively pushes 5.6 because of Google I/O. Google rushes Gemini development because Anthropic's Mythos is gaining ground. Anthropic accelerates Mythos testing because of GPT-Cyber's incredible performance. Every strategic move by one lab creates intense market pressure. It forces the other major competitors to sprint even faster. That is exactly why we

15:06

see this massive flood of leaks. The broader AI space moves way faster than people can track. The real practical lesson here is about managing the pacing. The technical field heavily rewards people who stay consistently informed. You must actively act on what actually matters most. Right. Do not treat every single internet leak as gospel. You should patiently wait for the official software releases. Compare the real technical outputs

15:32

when they are finally public. Pay close, careful attention to actual real-world software deployments. Track the quiet, unannounced deployments on LM Arena closely. The most meaningful software improvements are often very quiet. They just secretly make your daily work much faster. They smoothly integrate into your workflow without you even noticing. [Two seconds of silence.] Here's where it gets really interesting for the future. Google I/O officially happens on May the 19th. You need to keep a very

15:59

close eye on it. We must actively see if the Pro versus Flash reality actually matches the leaks. We desperately need to see if Flash truly wins out. I want to leave you with one final thought today. We know AI is successfully debugging its own training code. It is autonomously accelerating the software development loop drastically. What happens when these models begin designing the next generation of physical hardware and chips

16:23

optimized purely for themselves? That is a truly staggering philosophical concept to consider. They could autonomously design the next generation of silicon chips. They could completely remove the human bottleneck from the physical timeline. It is something to seriously ponder on your own. Stay sharp.

Transcript source: Provided by creator in RSS feed