So imagine this. You're sitting in the back of a robotaxi. It's a Tuesday, maybe 2 p.m. You're just, you know, answering emails, watching the suburbs roll by. It's totally mundane. Right. And then suddenly the sky turns this bruised purple. A tornado touches down like three blocks ahead. Oh, wow. Or maybe you turn a corner in Phoenix and there's a Texas Longhorn steer just standing dead center in the lane. Or the neighborhood is quite literally on fire. That is a terrifying
Tuesday. It is. But here's the question that I think keeps engineers up at night. How do you train a computer for that? Yeah. You can't exactly drive a million miles just waiting for a steer to wander onto the highway. You don't wait for the disaster. You hallucinate it. Welcome to the era where AI dreams up nightmares just to keep us safe. It's poetic, sure. But, you know, practically, it's just data efficiency. You can't wait 100 years for a steer to cross the road. You have
to force the error. Force the error. I like it. Welcome to the Deep Dive. It is Monday, February 9th, 2026. We're sitting right at this intersection of simulation and physical reality today. I've got a stack of notes here that paint a pretty wild picture of where we are. And looking at the sources we have today, it feels like we've hit a tipping point. We're seeing what people are calling the super cycle of 2026. Yeah. Really
taking shape. And the numbers are staggering. We'll get to that: 800 million users, which is just mind-blowing. But I want to stick with that image of the robotaxi and the Texas Longhorn for a second, because this comes directly from Waymo. This is the new Waymo world model. And the engine underneath it is what really matters. They're using Google's Genie 3. Genie 3. Okay, let's unpack this. Because usually when we talk about simulations, my mind goes to, like, video
games. Grand Theft Auto, but for robots. Is that what this is? No, and that's a crucial distinction. In a video game engine, like Unreal or Unity, a human programmer has defined all the physics. Right. They wrote the code that says, if car hits wall, then crumple metal. It's rigid. It's just a render. And Genie 3? Genie 3 is generative. It's not rendering the world based on a set of rules. It's dreaming it up based on memory. It has watched millions of hours of driving footage.
So when it creates a scenario, it's predicting the next pixel, the same way ChatGPT predicts the next word. So it's hallucinating the physics. In a way, yes. But here's the kicker. It's not just generating video. It's generating the sensor data. What do you mean? It simulates the LiDAR returns, the radar waves, the camera inputs, all of it. Wait, so does the car's computer even know it's in a simulation? It has no idea.
To the perception stack, the brain of the car, a simulated photon and a real photon are mathematically identical. That is slightly unsettling. It solves the biggest problem in autonomous driving, the long tail. You can drive a billion miles in sunny California and never see a snowstorm or a flood. Yeah, you just won't encounter it. But with Genie 3, engineers can just type in a command. They use natural language to say "add a flood" or "make it nighttime" or "insert a literal elephant." A
literal elephant. That is actually in the notes. It is. And it matters because of this "safe danger" concept. You can test what-if decisions. What if the car swerves left? What if it brakes hard? You can run that scenario 10,000 times in the cloud without ever risking a passenger or, you know, a pedestrian. It's interesting because they call it language control. So an engineer is basically being a god of this little virtual world, just speaking disasters into existence.
Let there be a tornado. Exactly. And controlling the scene layout, the traffic flow, everything. Waymo's bet here is really specific. If you can simulate everything, you can make real-world failure impossible, or at least, you know, statistically negligible. It creates this real moment of wonder for me. Just thinking about the computational power required to simulate how a tornado affects a LiDAR sensor, that's not just pixels, that's light physics being predicted by a neural net.
It's massive. And they're even building a Gemini-based voice assistant for inside the car, so the interaction inside is evolving too. So here's where it gets really interesting for me. If the simulation is indistinguishable from reality for the car, does the real world even matter for training anymore? Not really. If the data is perfect, the training is valid. The car doesn't care if the photon came from the sun or from a server. Okay, so that's simulating the world.
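The "safe danger" testing described here, replaying one scenario thousands of times and comparing maneuvers, is at heart a Monte Carlo experiment. Below is a minimal Python sketch under a deliberately toy physics model; it is not Waymo's system and involves nothing from Genie 3. `Scenario`, `EDITS`, `apply_command`, and `what_if` are hypothetical names, the plain-English edits only imitate the idea of language control, and the speed, friction values, reaction delays, and lane-clear probability are made-up assumptions. Braking uses the standard stopping-distance formula: reaction distance plus v²/(2μg).

```python
import random
from dataclasses import dataclass, replace

G = 9.81  # gravitational acceleration, m/s^2


@dataclass(frozen=True)
class Scenario:
    """Hypothetical toy scenario: an obstacle suddenly appears in the lane."""
    speed_mps: float = 15.0          # car speed (~54 km/h), assumed
    friction: float = 0.8            # tyre-road friction, assumed value for dry asphalt
    reaction_s: float = 0.5          # assumed delay before the brakes engage
    left_lane_clear_p: float = 0.7   # assumed chance a swerve has somewhere to go


# Hypothetical "language control": each plain-English edit tweaks one parameter.
EDITS = {
    "add a flood": lambda s: replace(s, friction=0.4),
    "make it nighttime": lambda s: replace(s, reaction_s=s.reaction_s + 0.3),
}


def apply_command(s: Scenario, command: str) -> Scenario:
    """Look up a supported edit (case-insensitive) and return the modified scenario."""
    return EDITS[command.strip().lower()](s)


def stopping_distance_m(s: Scenario) -> float:
    """Distance covered while reacting plus braking distance v^2 / (2 * mu * g)."""
    return s.speed_mps * s.reaction_s + s.speed_mps ** 2 / (2 * s.friction * G)


def what_if(s: Scenario, trials: int = 10_000, seed: int = 0) -> dict:
    """Replay the scenario `trials` times with a random obstacle gap and
    return the success rate of two candidate maneuvers."""
    rng = random.Random(seed)
    stop = stopping_distance_m(s)
    brake_ok = swerve_ok = 0
    for _ in range(trials):
        gap_m = rng.uniform(10.0, 60.0)                       # distance when the obstacle appears
        brake_ok += int(stop < gap_m)                         # hard braking stops in time
        swerve_ok += int(rng.random() < s.left_lane_clear_p)  # swerve finds an open lane
    return {"brake_hard": brake_ok / trials, "swerve_left": swerve_ok / trials}


if __name__ == "__main__":
    base = Scenario()
    print("dry road:    ", what_if(base))
    print("flooded road:", what_if(apply_command(base, "add a flood")))
```

Because both runs use the same seed, they see the same random gaps, so the comparison isolates the edit: halving friction doubles the braking term, and the brake-hard success rate drops accordingly. A real simulator would replace the one-line physics with generated sensor data fed through a full driving stack.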
We're building the Matrix for cars. But we also have this breakthrough from Harvard and Stanford about moving through the world. And this one seems to bridge the gap between chatbots and robots. This is the OAT system. OAT. The name isn't as important as the mechanism. This is about predicting robot actions like they're text. Right, because we know transformers, the tech behind GPT and Claude, are really good at predicting the next word in a sentence. The cat sat on
the... and the AI knows "mat." Exactly. But robots don't move in words. They move in continuous, kind of messy physical arcs. A robot arm reaching for a cup isn't a discrete word. It's a flow of analog data: voltage, torque, velocity. All of that. So how do you turn a backflip into a sentence? I have no idea. You have to digitize it. Think of it like music. Sound is a continuous wave, right? But to put it on a CD or an MP3, you have to chop it up into digital bits. You
take samples. Okay, I follow that. OAT does that for movement. The researchers built an encoder that takes that continuous motion, the robot arm swinging, and splits it into chunks. Then it uses a process called finite scalar quantization. That is a mouthful. It is. But just think of it as creating a vocabulary. It forces the infinite complexity of a robot arm's arc into a fixed
menu of specific movement words, or tokens. So instead of a smooth wave, it becomes a series of steps, like token A, then token B, then token C. Exactly. And because these tokens flow left to right, just like a sentence in English, standard large language models can process them. So you could feed a movement sequence into GPT-5 or Claude and it can autocomplete the movement. Yes. The breakthrough is that it turns physical dexterity
into a language problem. The first few tokens might describe the general direction, "move arm up," and the later tokens fill in the fine motor details, "rotate wrist 10 degrees." That feels huge. Yeah. It means we aren't reinventing the wheel for robotics. We're just piggybacking on the massive intelligence we already built for chatbots. And it is beating previous methods across 20 different tasks. It's vastly more efficient.
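A minimal Python sketch of the sample, chunk, quantize, and autocomplete pipeline described above. The sampling step is the CD analogy. `fsq` follows the general finite scalar quantization recipe: squash each latent dimension with tanh, scale it to (L−1)/2, and round, so every dimension can take only L integer values. `to_token` then packs the per-dimension codes into one vocabulary id as a mixed-radix number. Everything else is an assumption for illustration: `encode` is a hypothetical stand-in for OAT's trained encoder, the level counts, gain, sample rate, and the toy `arm` motion are made up, and a bigram counter stands in for a transformer. The sketch does not reproduce OAT's coarse-to-fine token ordering.

```python
import math
from collections import Counter, defaultdict

LEVELS = (5, 5, 5)  # assumed: 5 levels per latent dimension -> 5**3 = 125 tokens
GAIN = 5.0          # hypothetical scale so typical chunk motions use several levels


def sample(motion, duration_s: float, rate_hz: int):
    """Digitize a continuous motion the way audio is sampled: one reading per tick."""
    return [motion(i / rate_hz) for i in range(int(duration_s * rate_hz))]


def chunk(samples, size: int):
    """Split the sampled motion into fixed-length chunks, dropping a short tail."""
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, size)]


def encode(ch):
    """Hypothetical stand-in for a learned encoder: the scaled net change of each
    joint across the chunk. A real encoder is a trained neural network."""
    return [GAIN * (end - start) for start, end in zip(ch[0], ch[-1])]


def fsq(z, levels=LEVELS):
    """Finite scalar quantization: tanh bounds each value to (-1, 1); scaling by
    (L-1)/2 and rounding leaves L possible integers (-2..2 when L = 5)."""
    return [round(math.tanh(v) * (L - 1) / 2) for v, L in zip(z, levels)]


def to_token(codes, levels=LEVELS) -> int:
    """Combine per-dimension codes into one vocabulary id (mixed-radix number)."""
    token, base = 0, 1
    for q, L in zip(codes, levels):
        token += (q + (L - 1) // 2) * base  # shift code from -h..h to 0..L-1
        base *= L
    return token


def tokenize(motion, duration_s=2.0, rate_hz=50, chunk_size=10):
    """Continuous motion in, left-to-right sequence of movement tokens out."""
    samples = sample(motion, duration_s, rate_hz)
    return [to_token(fsq(encode(c))) for c in chunk(samples, chunk_size)]


def fit_bigram(sequences):
    """Toy stand-in for a transformer: count which token follows which."""
    follows = defaultdict(Counter)
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            follows[a][b] += 1
    return follows


def autocomplete(follows, prompt, n: int):
    """Greedy left-to-right completion: repeatedly append the likeliest next token."""
    out = list(prompt)
    for _ in range(n):
        options = follows.get(out[-1])
        if not options:
            break
        out.append(options.most_common(1)[0][0])
    return out


def arm(t: float):
    """Hypothetical 3-joint arm: joint angles in radians at time t (seconds)."""
    return (math.sin(t), math.cos(t), 0.2 * t)


if __name__ == "__main__":
    tokens = tokenize(arm)  # 2 s at 50 Hz -> 100 samples -> 10 chunks -> 10 tokens
    print("movement tokens:", tokens)
    model = fit_bigram([tokens])
    print("completion:", autocomplete(model, tokens[:3], 4))
```

The design point is the fixed vocabulary: once every chunk of motion maps to one of 125 integers, the sequence model never sees torques or angles, only token ids, which is what lets text-style next-token prediction be reused for movement.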
The implication is wild. If a robot's brain is just a large language model, then the robot effectively knows everything the Internet knows. So does this mean we can eventually just talk a robot into learning a backflip? Essentially, yes. By treating the backflip as a sentence of movement tokens, you're just prompting it to complete the thought, but physically. Complete the thought physically. I like that. It connects perfectly to the sheer scale of what is happening right
now. We mentioned the date, February 2026, and the industry is calling this the super cycle. The numbers coming out of OpenAI this week are frankly absurd. 800 million weekly users. 800 million. That is nearly the population of the entire Western Hemisphere interacting with these models every single week. I remember when getting to 100 million was the fastest in history. Now they're reporting growth is back to over 10% monthly. And it's not just chat.
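To put "over 10% monthly" in perspective: the episode does not claim that rate will hold, but purely as arithmetic, here is what a constant 10% monthly rate would compound to, starting from the quoted 800 million weekly users. A minimal Python sketch, not a forecast; the constant rate is an assumption made only to show how compounding works.

```python
import math


def compound(start: float, monthly_rate: float, months: int) -> float:
    """Value after `months` of growth at a constant monthly rate."""
    return start * (1 + monthly_rate) ** months


def doubling_time_months(monthly_rate: float) -> float:
    """Months needed to double at a constant monthly rate: ln 2 / ln(1 + r)."""
    return math.log(2) / math.log(1 + monthly_rate)


if __name__ == "__main__":
    weekly_users = 800e6  # figure quoted in the episode
    for months in (3, 6, 12):
        projected = compound(weekly_users, 0.10, months)
        print(f"{months:>2} months: {projected / 1e6:,.0f} million")
    print(f"doubling time: {doubling_time_months(0.10):.1f} months")
```

At a steady 10% a month, 1.1¹² ≈ 3.14, so the base would roughly triple to about 2.5 billion in a year, doubling about every 7.3 months.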
Codex usage, the coding AI, surged 50% just after the GPT-5.3 launch. This tells me that this isn't a novelty anymore. This is infrastructure. Speaking of infrastructure, a16z, that's Andreessen Horowitz, just dropped $1.7 billion into AI infrastructure. Wow. And that's out of a $15 billion fund. They are literally rebuilding the platforms from the ground up. They're seeing 2026 as the year the experimental phase ends
and the utility phase begins. This is about chips, data centers, you know, the pipes that run the Internet. But with utility comes commercialization. And I have to admit, I had a bit of a sigh moment reading the news this morning. The ads. The ads. Ads are officially live in ChatGPT. Adobe is one of the launch partners. They're testing ads for Photoshop and Acrobat right there in the chat interface. It was inevitable, right? I know, I know. But I have to admit, I just hated seeing
it. The pristine white box. It felt like a sanctuary, you know? Just pure intelligence. No noise. Now, if I ask for a summary or a photo editing tip, I might get a nudge to buy Firefly. It feels like the magic just got a corporate logo slapped on it. I get the nostalgia for the research preview era. Yeah. But look at the scale. Right. You can't serve 800 million users on compute-heavy models for free forever. It's the tradeoff. We're moving from the Wild West of discovery
to utility grade, like electricity. And the electric company sends you a bill. That's a fair point. It is now utility-grade infrastructure. It's like electricity or the telephone network. It's just there. It's just there. Right. And because it's just there, the way we use it is changing. So we have the 800 million users and we have the massive silicon build-out. The question is, what are they actually doing with it? Because this week the tools shifted. We moved from "look
at this funny video" to "this is how you run a Fortune 500 company." The AI-powered tool sector is really maturing. But the video generation space is what catches your eye first. ByteDance released Seedance 2.0. The demos for that are wild. I saw one with 2.6 million views. It's a massive leap. And then you have Kling 3.0. Kling 3.0. This is the one claiming scene-level control. What does that actually mean? It means we're moving past the slot machine era of AI
video. You type a prompt, pull the lever, and just hope the video isn't a nightmare. Right. Scene-level means multi-shot control, longer takes. Yeah. It understands continuity. You are directing, not just prompting. But the biggest shift for me, and this ties back to that utility idea, is OpenAI Frontier. Ah, this is the enterprise play, and this is where the money is. Explain the difference here, because we have team accounts, we have enterprise
accounts, what is Frontier? So Frontier isn't just about giving your employees access to a chatbot. It's a platform for managing AI agents. Right, not chatbots that answer questions, but agents that perform workflows. Frontier provides shared context, onboarding for these digital workers, and oversight. It treats the AI as a workforce, not a software tool. See, that's the shift. We're moving from generating content, write me a poem, make me a video, to orchestrating
labor. Exactly. The human role is shifting. You aren't playing the violin anymore. You're waving the baton. You are the conductor. The conductor. That is both exciting and terrifying. It feels like the friction is disappearing, whether it's creating a video scene or automating a business process. The barrier to entry is just melting away. And that brings its own chaos. But it's the reality of 2026. I feel like we need to take a breath and look at the big picture here. We've
covered a lot of ground. We have the god-mode simulation at Waymo. We have the robot language at Harvard. And we have the super cycle of adoption. That's a lot. If we zoom out, what's the thread connecting these for you? I think it's the synthetic tipping point. Look at the three pillars. First, simulation. We're replacing real-world testing with synthetic experiences. That's Waymo. Okay. Second, translation. We're converting physical movement into synthetic language tokens, which
is OAT. And third, scale. We have 800 million people living and working inside this synthetic infrastructure. It really does feel like we're waking up in a different world every Monday. The lines between the physical and the digital are just gone. They're blurring, certainly. I want to leave everyone with a thought to mull over. We talked about Waymo simulating tornadoes. We talked about OAT turning backflips into words.
Right. If an AI can simulate a tornado perfectly, to the point where the sensors can't tell the difference, and another AI can predict robot movements like words, how long until the simulation is the training ground for all physical labor? Like, do we ever need to practice anything in the real world again? That's the billion-dollar question. If the simulation is perfect, the real world is just the final exam. Something to think about while you're looking out the window of your robotaxi.
Thanks for diving in with us. Keep exploring. We'll catch you in the next Deep Dive.
