#332 Neil: Gemini 3.5 Snowbunny Is The New AI Beast Coding Emulators Instantly

00:00

Imagine for a second you're sitting in front of a blank chat window, the cursor is just blinking, and you're not about to ask for a recipe or, you know, for a summary of some confusing email. Right. You type a single prompt and you ask it to build a working machine, specifically a Nintendo Game Boy emulator from scratch. Usually, this is the part where the whole illusion just breaks. The model gives you some generic apology, or maybe a few broken snippets of code that don't

00:28

actually run. But this time, it didn't do that. It generated over 3,000 lines of code in one continuous flow. 3,000 lines. One shot. It built the CPU emulation, the memory management, the input handling. It mimicked the actual hardware. And when the user ran it, it played real games. It played Tetris. Honestly, it's hard to wrap your head around. It feels like we just skipped a few chapters. Oh, we absolutely did. Welcome

00:54

back to the Deep Dive. Today, we're unpacking something that feels like a genuine tipping point. We're looking at a stack of reports, developer logs, and some leaked benchmarks around a model that the internet has decided to call Snowbunny. Which is a very online nickname for what is actually a very serious piece of engineering. Right. To be clear, this is Google's Gemini 3.5. But the interesting thing is, this isn't a product you'll see in a glossy press release just yet. It's

01:23

a ghost. It is. It's a leaked version. The coding community started noticing these mysterious model codes, DN9 and D13, popping up in Google AI Studio. OK. And when they started poking at it, running their tests, they realized this wasn't just a slightly faster version of Gemini 3 Flash. This was something else. And that's what we're exploring today. The central idea we're looking at is this shift in our relationship with these tools. We spent the last few years in the era of the chat

01:49

bot. Right. You talk, it talks back. But Snowbunny suggests we're entering the era of the AI director. Exactly. The difference is agency and scope. A chat bot answers a question. A director builds a vision. So let's get into the evidence for that. Because looking at this material, this model is putting up some numbers that don't just inch past the competition. They kind of leapfrog them. The numbers are startling. But what's more interesting to me is how it's getting those numbers.

02:17

It's not just raw power. It seems to be a change in logic. Let's start there then. Segment one, the logic leap. There were two specific tests in these reports that really caught my eye. One is a variation on the classic trolley problem. Ah, yes, the Misguided Attention test. Wow. This is a fascinating look at how these models actually process information. So most of us know the drill. A train is coming, five people on the track. Do you pull the lever? It's Ethics 101. Yeah.

02:44

But this test had a twist. A really important one. Right. The prompt sets up the scenario, but it adds this tiny, crucial detail buried in the text. Yeah. The five people on the track are already dead. Which changes everything. It changes the entire moral calculus. There's no dilemma. But here's the thing. Most AI models, even the really advanced ones, they see the words "trolley problem" and they just go into autopilot. They stop reading and start predicting. Exactly.

03:12

Their safety filters kick in, what we call RLHF, and they start lecturing you about the sanctity of life. They miss the context completely because they're pattern matching the concept, not the reality of the prompt. They're skimming, like a student who didn't do the reading for class. Precisely. But Gemini 3.5, it noticed. It caught that detail and gave the correct logical answer. It basically said, well, it doesn't matter. They're deceased. It scored 68.5% on this test. Which

03:40

is way above the others. It beats out almost every top model out there. That feels significant. It implies the model isn't just retrieving data. It's actually paying attention to nuance. And that means the hallucination rate, where the AI just makes stuff up, could drop, because it's actually grounded in the text you give it. And that was backed up by the second test, the hieroglyph test. This one sounded fascinating. It's not about reading ancient Egyptian, is it? No, no,

04:03

not literally. It's a test of what they call lateral reasoning. The AI gets these strange symbols it has never seen before, and it has to figure out the hidden rules. You can't just look this up on Wikipedia. So it's basically an IQ test for puzzles it's never seen before. Exactly. It has to think on its feet. And the performance gap here is just massive. The older Gemini 2.5 Pro scored about 20%. The model called GPT-5 reasoning reached about 45%. Gemini 3

04:31

.5 Snowbunny hit between 80 and 88%. Wow, that is a staggering jump. We're normally excited about a 5% or 10% increase. Doubling the score of a GPT-5 reasoning model? That's different. So if it notices the dead people detail when others don't, what does that imply about its ability to handle messy real-world data? It means we can finally trust it to read carefully, rather than just pattern match. Which brings

04:53

us to the application of that intelligence. Because being smart is one thing, but can it actually build anything? This brings us to what the community is calling vibe coding. I love this term, vibe coding. It sounds less like computer science and more like a Spotify playlist. But it's about building websites without knowing HTML or CSS. Yeah, that's the core of it. The idea is that you describe the feeling, the vibe, and the AI handles all the syntax. The case study here was

05:22

a project called Cakes from the Heart. Sounds delicious. So the test was to see if the model could generate a professional-grade website for a bakery. The prompt was specific, but descriptive. You know, fancy style, cream and light gold colors, specific menu items. And what was the result? It wrote the entire thing. The hero section, the menu, the contact map, all the styling in a single HTML file. And it did it in about eight minutes. Eight minutes. And the cost? About 38

05:47

cents. That's cheaper than a croissant. Significantly. But the impressive part wasn't just the speed. It was the workflow. The user could ask for dark mode after the fact, or ask to make images move smoothly. The model would just iterate on the code. So it's not just spinning out a template. It's refining a product based on your feedback. It's separating the intent from the execution. That's the director model again. You provide the vision, the AI provides the technical labor.

06:14

Does vibe coding make actual coding obsolete then, or does it just change the barrier to entry? It lowers the barrier, letting you focus on the idea while the AI handles the syntax. So we've got logic, we've got code, but the sources also mention this model is multimodal in a way we haven't really seen before, specifically when it comes to sound. This was a huge leak. A user named legit spotted an A/B test in Google AI Studio. Usually, if you want AI music, you go

06:42

to a separate tool like Suno or Udio. Right. You leave your chat, you go to the music app, paste your lyrics. There's a lot of friction there. Exactly. It's disjointed. But with Gemini 3.5, the music generation is native. It doesn't give you sheet music or a link. It plays an audio file right there in the chat. That integration seems like a small UI change, but the workflow implications feel pretty massive. Oh, it's huge.

07:06

Because of context. Think about it. You spend 20 minutes working with the AI to write a funny script for a video. Right. It has the history. It knows the characters, the timing, the jokes. Then you just say, compose a background track for this. You don't have to re-explain the vibe to a separate music bot. It already knows the script is a comedy. It knows the pacing. It shares the memory. So the magic isn't just the music quality. It's the shared memory between the text

07:31

and the audio. Exactly. It's the seamless workflow. The AI understands the whole project, not just one slice of it. And moving from ears to eyes, we've seen AI generate images for a while now with Midjourney and DALL-E. But the reports on Gemini 3.5 focus on something else: vector graphics, SVGs. Yeah, this is a favorite topic for developers. Generating a JPEG is kind of easy. It's just a grid of colored pixels. If you get a pixel

07:55

wrong, it's just a blurry spot. But an SVG, that's code. It's a set of mathematical instructions on how to draw a line or a curve. So if the code is wrong, the image doesn't just look blurry. It breaks. It completely breaks. The line shoots off the page. The circle doesn't close. It's a much higher bar for accuracy. The leak showed users generating these cyberpunk robot icons. And they were good. Professional quality. But

08:23

what really stood out was the consistency. They generated a whole series, and the robots all had this neon blue and purple theme. They looked like they belonged to the same brand, even though the shapes were unique. There was a specific test mentioned here that I found kind of hilarious. The pelican on a bicycle challenge. The pelican on a bicycle. It's become a standard benchmark for vector intelligence. Why? Is there a big market for cycling birds? No, but it tests spatial

08:47

logic. Most models just fail this. They draw a bird and they draw a bike, but the bird is floating next to it or it's merged into the wheels. It's a mess. OK. So Simon Willison, a well-known researcher, he analyzed the output from Snowbunny and it nailed it. The bird was actually sitting on the seat. Its legs were reaching for the pedals. I have to admit, the idea of spatial logic in a model that is essentially just predicting the next word, I still find that hard to wrap

09:13

my head around. It just feels counterintuitive. It really does. It's like it's deriving physics from language. So why is a bird on a bike a better test of intelligence than writing an essay? It proves the AI understands how objects relate in physical space, not just how words relate in sentences. OK, we've covered logic, websites, music and art, but we have to circle back to the cold open, to the Game Boy emulator, because this feels like the graduation project. This

09:40

is the system architecture test. We mentioned 3,000 lines of code. Can you break down why this is so different from, say, asking it to write a short Python script? Well, a short script is isolated. But an emulator? That is a complex ecosystem. You have the CPU emulation, which has to talk to the memory management. The memory has to talk to the input handling for the buttons. The input has to talk to the display pipeline. And they all have to agree on the rules. Exactly.
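
The interplay described above (CPU, memory, input, display, all agreeing on the same rules) can be illustrated with a toy sketch. This is not the code the model generated, which the episode does not show. All class and function names are hypothetical, only four opcodes are handled, execution starts at address 0 for simplicity, and the joypad register at 0xFF00 is treated as a plain byte rather than with the real hardware's bit layout.

```python
JOYPAD_ADDR = 0xFF00  # joypad register address (bit semantics simplified here)
VRAM_START = 0x8000   # start of video RAM


class Memory:
    """Flat 64 KiB address space shared by every component."""

    def __init__(self):
        self.data = bytearray(0x10000)

    def read(self, addr):
        return self.data[addr & 0xFFFF]

    def write(self, addr, value):
        self.data[addr & 0xFFFF] = value & 0xFF


class Joypad:
    """Input handling: publishes button state into shared memory."""

    def __init__(self, memory):
        self.memory = memory

    def press(self, mask):
        self.memory.write(JOYPAD_ADDR, mask)


class Display:
    """Display pipeline stand-in: reads whatever the CPU left in VRAM."""

    def __init__(self, memory):
        self.memory = memory

    def first_vram_byte(self):
        return self.memory.read(VRAM_START)


class CPU:
    """Fetch-decode-execute loop for a tiny subset of opcodes."""

    def __init__(self, memory):
        self.memory = memory
        self.pc = 0
        self.a = 0
        self.halted = False

    def fetch(self):
        value = self.memory.read(self.pc)
        self.pc = (self.pc + 1) & 0xFFFF
        return value

    def fetch_addr(self):
        low = self.fetch()   # 16-bit operands are little-endian
        high = self.fetch()
        return (high << 8) | low

    def step(self):
        opcode = self.fetch()
        if opcode == 0x00:      # NOP
            pass
        elif opcode == 0xFA:    # LD A,(a16)
            self.a = self.memory.read(self.fetch_addr())
        elif opcode == 0xEA:    # LD (a16),A
            self.memory.write(self.fetch_addr(), self.a)
        elif opcode == 0x76:    # HALT
            self.halted = True
        else:
            raise NotImplementedError(f"opcode {opcode:#04x}")


def run_demo():
    memory = Memory()
    cpu, joypad, display = CPU(memory), Joypad(memory), Display(memory)
    # Program: load the joypad byte into A, store A into VRAM, halt.
    program = bytes([0xFA, 0x00, 0xFF, 0xEA, 0x00, 0x80, 0x76])
    memory.data[0:len(program)] = program
    joypad.press(0x01)
    while not cpu.halted:
        cpu.step()
    return display.first_vram_byte()
```

Even in this toy, every component depends on the same two address constants and the same memory object. That shared agreement is the kind of cross-module consistency the hosts are describing, just at a far smaller scale than a 3,000-line emulator.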

10:07

If the AI forgets a variable name it created on line 50 when it's writing line 2500, the whole thing crashes. This is what we call global consistency. So it's holding the entire blueprint in its head at once. Yes. And developers Jared Liu and Shitas Lua verified this. They ran real game ROMs on the code this model wrote. It required a few tiny manual fixes, but the architecture, the

10:30

heavy lifting, was done in one shot. If it can hold 3,000 lines of logic in its head at once, are we looking at the end of spaghetti code? We're looking at the ability to build full products, not just parts, by describing the system architecture. It's incredible. We're going to take a quick breather here, but when we come back, we're going to talk about what this means for you. If the AI is the director, what is your role? Stay with us. And we are back. We've looked at the capabilities

10:57

of Gemini 3.5 Snowbunny. It's smarter, it's multimodal, and it can architect complex systems. It's a beast. So let's zoom out. What's the big idea here? If I'm a listener and I'm not a professional coder, why should I care that a computer can draw a pelican on a bike? Because the cost of creation is collapsing. We talked about that bakery website costing 38 cents. Right. The pricing structure for these models: about 50 cents per million input tokens, $3 per million for output. It means

11:24

experimentation is virtually free. You can afford to fail. You can afford to try 10 different versions of a website. And this shifts the user's role. We kept using that word, director. That is the key philosophy. You are no longer the one placing every pixel or writing every single line of CSS. You are the one with the vision. Your job is to describe the vibe. Which is a skill in itself. It is. The sources highlight three rules for this new era. One, be specific. Don't just say,

11:52

make a website. Say who it's for, why it exists. Two, describe the vibe. Use emotional words. Warm, modern, aggressive. The AI gets that now. And three, the 80/20 rule. Let the AI do the heavy lifting, the 80%. You come in for that final 20% to polish, to tweak, and to refine. So it's collaborative. You basically have a team of musicians, designers, and engineers inside your laptop. Just waiting for your orders. So

12:19

here's the call to action. You don't have to wait for some big Super Bowl commercial to try this. No. Go to Google AI Studio. It's open. Look for Gemini 3 Flash. If you happen to see the code DN9, you've got the Snowbunny. But even if you don't, the tools are there. The gap between having an idea and making it real has never been smaller. Start playing. Break things. Thanks for diving in with us today. We'll catch you on the next one.
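
The pricing quoted in the episode can be turned into a rough cost estimate. The sketch below assumes the figures mean about $0.50 per million input tokens and $3 per million output tokens. The function name and the token counts in the usage line are hypothetical, since the episode does not say how many tokens the bakery website used.

```python
# Rates as quoted in the episode, read as USD per one million tokens (assumption).
INPUT_USD_PER_MILLION = 0.50
OUTPUT_USD_PER_MILLION = 3.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one or more requests."""
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_USD_PER_MILLION
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical token counts for a session with a few rounds of iteration.
    print(f"${estimate_cost_usd(40_000, 100_000):.2f}")
```

Under these assumed rates, output tokens cost six times as much as input tokens, so a long generated file accounts for most of the bill.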

Transcript source: Provided by creator in RSS feed
