72%. That's the number that kept me up last night. If you have a digital vault, a crypto smart contract, or really any secure digital asset, as of this week, there's a new AI agent that can break into it and drain it of all funds 72% of the time. And to be clear, that isn't a theoretical figure from a sci-fi novel. That is the actual benchmark result from the latest
coding agent released just days ago. We have crossed a threshold where the digital offense is statistically superior to the digital defense. Welcome back to The Deep Dive. It is Friday, February 20th, 2026. I'm your host, and I'm here with our resident expert to make sense of a week that... honestly feels like a year. It really does. It's great to be here. You know, usually we talk about AI explicitly building things, generating art, writing code, solving protein
folding. But this week, the narrative shifted. It's about AI taking things. And it's about the massive economic machinery being built to support that power. Right. So let's map out where we're going today for you listening. We have a heavy stack of reports to get through. First, Google is hitting back hard in the model wars. Gemini 3.1 Pro is here. And the headline isn't speed. It's reasoning. We need to unpack what that actually means. Exactly. Then we're going to follow the
money. And I mean serious money. OpenAI is looking at an $850 billion valuation. We'll look at why that number is so high and the controversial shift toward advertising that's coming with it. And finally, we are going to do a deep dive into that cold open statistic. We need to talk about why offense is currently beating defense in cybersecurity. And what happens when an AI can execute a flash loan attack all on its own? That is the part that's going to scare some people,
but it's crucial to understand. Okay, let's start with the tools. It feels like we just finished analyzing Gemini 3.0. That was what, November? Yeah, three months ago. Three months. And now we have Gemini 3.1 Pro. Usually in software, a point-one update is a snooze. It's bug fixes. Maybe a new font. But looking at these specs, this seems entirely different. It's completely different. Google has rolled this out with a very specific
focus: reasoning. And I think we need to pause on that word because it gets horribly abused in marketing. Please do. Because to me, reasoning implies consciousness, or at least human-like thought. Is Google claiming their chatbot is actually thinking? Not thinking in the biological sense, but think of it this way. Previous models, the ones we used in 2024 or 2025, were largely System 1 thinkers. If you know Daniel Kahneman's
work, it's fast, instinctive, automatic. You ask for a poem, it predicts the next word immediately. It is reflexive. Okay, like a reflex. Exactly. Gemini 3.1 is moving towards System 2. It's deliberate. It's slow. It's logical. When you ask it a complex question, it doesn't just spit out the statistically probable next word. It holds the problem in its mind, breaks it down into steps, critiques its own plan, and then
executes. It is spending compute at inference time, literally taking time to think before it speaks. So it's the difference between a student blurting out an answer in class versus a student sitting down to work through a math proof step by step. Perfect analogy. And the results are stark. According to the preview, Gemini 3.1 Pro has doubled its reasoning performance compared to 3.0. Doubled in three months. Doubled. I am always skeptical of internal benchmarks.
Companies love to grade their own homework. Did they test this against anything we'd recognize? They did. They used a test called Humanity's Last Exam. Okay, that is a terrifying name for a test. Is that real? It is. It was designed to be the absolute ceiling of difficulty. Questions that require multidisciplinary knowledge to solve. It's the stuff that completely stumped the models from 2025. Gemini 3.1 outperformed its predecessor
there significantly. But the one that really matters to me is the APEX Agents leaderboard. I'm not familiar with APEX. APEX measures real professional task performance. It's not writing a haiku. It's messy. It's taking a scattered data set of CSV files, analyzing the trends, cross-referencing them with a PDF report on climate change, and delivering a strategic insight. So actual knowledge work. Actual work. Gemini 3.1 hit the top spot. It is designed for agentic workflows.
It isn't just a chatbot anymore. It's a research partner. I see. This explains the integration they announced with NotebookLM. I've been using NotebookLM for research, basically dumping PDFs in and chatting with them. But you're saying 3.1 Pro changes the nature of that interaction. Completely. NotebookLM was a retrieval tool. Find me the quote about interest rates. With
3.1 Pro, it becomes an analyst. You can say, look at the interest rates in document A, compare them to the housing trends in document B, and predict the impact on this third variable. Whoa, imagine connecting scattered data into full-picture insights like that at scale. It's stacking Lego blocks of data into a skyscraper. That is a significant shift. But I have to ask, if the cycle is down to three months, are we even learning the old tools before the new ones replace them? Barely.
We're really just adapting to the speed of the upgrade, not the tool itself. We are becoming experts in adaptation rather than experts in software. Barely adapting to the upgrade speed, not the tool itself. Got it. That's a heavy thought. It might be the only skill that matters in 2026. All right, let's pivot. Because building these System 2 thinkers, these reasoning engines, is expensive. The compute costs are astronomical. Massive. The energy bills alone are enough to
bankrupt small nations. Which brings us to the business side. We're seeing reports that OpenAI is finalizing a $100 billion raise. Yes. Which puts their valuation at over $850 billion. It is staggering. To put that in perspective, that's approaching the territory of companies like Meta or Tesla. But OpenAI is a fraction of the age. It's a nearly trillion-dollar valuation for a company that is barely a decade old. And the backers are the usual titans. Amazon,
SoftBank, Nvidia, Microsoft. It feels like the entire tech economy is consolidating around this one entity. It's becoming the too big to fail of the AI era. It is, but here's the tension. To justify an $850 billion valuation, you need massive revenue. You can't just have $20 subscriptions. You need something way more pervasive. And that brings us to the A word. Ads. Ads. OpenAI is beginning to test them. They are. And this is where the market is starting to fracture philosophically.
You have OpenAI needing to monetize that massive valuation, testing ads in ChatGPT. But then look at the competitors. Perplexity, which has been very aggressive, is actually stepping back from in-chat ads. Really? I thought they were leaning into that model. They were, but they issued a warning this week saying that ads can erode user trust. They are pivoting to preserve the integrity of the answer. And then you have Anthropic, who has publicly committed to staying
completely ad-free. It's interesting. It's almost like the business model is becoming part of the product identity, like organic food versus processed food. It absolutely is. Think about it. If you're using an AI for serious research, medical advice, legal strategy, complex coding, do you want an answer that is influenced by the highest bidder? Or do you want the raw truth? I want the truth. But I also know the history of the internet. Free with ads usually wins against
paid and private. That's the trillion-dollar gamble. OpenAI is betting that their model is so smart you'll tolerate the commercials. Anthropic is betting that you'll pay a premium for silence and neutrality. Speaking of commercial, OpenAI isn't just looking at ads. They're looking at Hollywood. They poached Charles Porch from Instagram. Yes, the Hollywood connector. And after that $1 billion Disney deal, he's going on a listening tour to win over creators. Which is smart because
creators are skeptical. Actually, skeptical is polite. They are furious. They see AI as the thing that scrapes their work to replace them. OpenAI knows they need to bridge that gap if they want to be the platform for media as well as text. They need to convince Hollywood that they come in peace. It's a battle for hearts and minds. So let me put you on the spot. With Perplexity pulling back on ads and OpenAI leaning in, who actually wins the user's trust in the
long run? The one that gives the best answer without selling me a toaster? In the short term, OpenAI wins on scale. But for the power-user economy, I think Anthropic or Perplexity has the edge on trust. So the winner is the one giving the best answer without selling a toaster. I don't need my AI selling me appliances while I'm trying to debug code. Not yet, anyway. Okay, let's move to the story that launched the deep dive. The heist. The heist. We have discussed
reasoning. We have discussed money. Now we see what happens when reasoning agents decide to take the money. This comes from a new study involving OpenAI's new coding agent, GPT-5.3-Codex. 5.3. They are iterating fast. Very fast. The study used a benchmark called EVM Bench, built with the crypto firm Paradigm. They set up vulnerable smart contracts, digital vaults, essentially, and unleashed the AI to see if it could break in. And this is where that 72% comes from. Correct.
GPT-5.3-Codex exploited the vulnerabilities 72% of the time. Now for context, compare that to just six months ago. GPT-5 scored 31.9% on the exact same tasks. That is more than a doubling in capability in half a year. That curve is vertical. It is exponential. And it's not just finding a bug. A GPT-5.2 agent was able to execute a full flash loan attack on its own. Can we pause there? Because flash loan attack is one of those crypto terms that sounds exciting but makes no
sense to most people. Break that down for us. Okay. Imagine a bank vault. You walk in and say, I want to borrow $100 million, but I have no collateral. In the real world, the bank laughs at you and calls security. Right. In crypto, the bank says, sure, take the hundred million, but you have to pay it back within the exact same transaction. It's like freezing time. You borrow the money, you run around the market, you buy and sell assets to manipulate the price.
You make a profit and you return the original loan all before the clock starts ticking again. So it all happens in a single, split-second transaction. Exactly. If you fail to pay it back, the whole thing reverses like it never happened. But if you succeed, you keep the profit. It requires incredibly complex coding and timing to pull off. And the AI did this autonomously. Autonomously. No human steering the wheel. That is unnerving. So the offense is elite. What about the defense?
Surely if the AI can break it, the AI can fix it. That's the problem. The defense is lagging. The same study showed that bug detection, the AI trying to find the flaws in the code, tops out around 46%. So it's almost twice as good at breaking in as it is at fixing the lock. Why? It comes down to the nature of the task. Breaking things is often about finding one weak point. Fixing things requires understanding the entire system so you don't break something else. Patch success sits
at only 39%. 39%? That's barely better than guessing. Unless, and this is the key, unless you give the model a small hint. A hint. If you give the AI a nudge, literally telling it roughly where the problem is, patch success jumps to 94%. That is a huge gap. It tells us that the bottleneck isn't intelligence. The AI is smart enough to fix it. The bottleneck is search. It doesn't know where to look for the solution without
guidance. It's like searching for a needle in a haystack versus being told the needle is in the top left corner. That feels incredibly human. I could fix the leak if you just told me which pipe is bursting. Exactly. But in cybersecurity, you don't get hints from the attackers. You know, I have to admit, I still wrestle with prompt drift myself. I am just trying to get the model to write a coherent email sometimes, and it goes
completely off the rails. Yeah. The idea of trusting it to patch a smart contract holding millions of dollars, that feels like a massive leap of faith. It is. But OpenAI sees this gap. They don't want their tools to be super weapons for hackers. They're launching Aardvark, which is their AI security research agent, and they've put up a $10 million fund for cybersecurity researchers. They know that if AI can break systems this fast, defenders need AI just as powerful. Because in
crypto, there is no undo button. Once it's gone, it's gone. So if offense is twice as good as defense right now, is any digital vault actually safe? Not really, unless you have an AI guard dog that's smarter than the AI burglar. And right now, the burglars are winning. So no vaults are safe unless your AI guard dog outsmarts the burglar.
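To make the flash loan mechanics from the heist segment concrete, here is a toy Python model. It is a sketch under loud assumptions, not Solidity and not any real protocol's interface: `Ledger`, `run_transaction`, `flash_loan`, and the two trade functions are hypothetical names, the fee is made up, and the "profitable trade" is a stand-in for the price manipulation described in the episode. What it tries to show is the one property that matters: borrowing, trading, and repaying all happen inside a single transaction, and if repayment fails, every balance snaps back as if nothing happened.

```python
import copy
from dataclasses import dataclass, field


class TransactionReverted(Exception):
    """Raised when any step fails; the whole transaction is then rolled back."""


@dataclass
class Ledger:
    # Hypothetical toy ledger mapping account name -> token balance.
    balances: dict = field(default_factory=dict)

    def transfer(self, src: str, dst: str, amount: int) -> None:
        if self.balances.get(src, 0) < amount:
            raise TransactionReverted(f"{src} has insufficient funds")
        self.balances[src] -= amount
        self.balances[dst] = self.balances.get(dst, 0) + amount


def run_transaction(ledger: Ledger, steps) -> bool:
    """Run `steps` atomically: either every change sticks or none does."""
    snapshot = copy.deepcopy(ledger.balances)
    try:
        steps(ledger)
    except TransactionReverted:
        ledger.balances = snapshot  # roll back "like it never happened"
        return False
    return True


def flash_loan(ledger, pool, borrower, amount, fee, use_funds) -> bool:
    """Uncollateralized loan that must be repaid, plus a fee, in the same transaction."""
    def steps(l: Ledger) -> None:
        l.transfer(pool, borrower, amount)        # 1. borrow with no collateral
        use_funds(l, borrower)                    # 2. trade with the borrowed funds
        l.transfer(borrower, pool, amount + fee)  # 3. repay, or the whole thing reverts
    return run_transaction(ledger, steps)


def profitable_trade(l: Ledger, who: str) -> None:
    # Toy stand-in for a price-manipulation trade: spend 100M, get 105M back.
    l.transfer(who, "market", 100_000_000)
    l.transfer("market", who, 105_000_000)


def losing_trade(l: Ledger, who: str) -> None:
    # Same trade, but the market only returns 99M, so repayment must fail.
    l.transfer(who, "market", 100_000_000)
    l.transfer("market", who, 99_000_000)


if __name__ == "__main__":
    start = {"pool": 100_000_000, "market": 500_000_000, "attacker": 0}
    fee = 90_000  # hypothetical flat fee

    win = Ledger(dict(start))
    print(flash_loan(win, "pool", "attacker", 100_000_000, fee, profitable_trade))  # True
    print(win.balances["attacker"])  # 4910000 kept as profit

    lose = Ledger(dict(start))
    print(flash_loan(lose, "pool", "attacker", 100_000_000, fee, losing_trade))  # False
    print(lose.balances == start)  # True: every balance restored
```

On a real chain the all-or-nothing behavior is enforced by the EVM itself when a transaction reverts; the snapshot and `try`/`except` here only mimic that so the "freeze time, then either keep the profit or undo everything" idea is visible in a few lines.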
We are going to take a quick break. When we come back, we're going to step away from the destruction and look at the builders, including a legend in the field who is building 3D worlds from scratch. Stay with us. Welcome back to the Deep Dive. We've talked about the big models and the big money, but I want to zoom in on the builders for a minute. The people making the utilities we use every day.
The tool belt. Exactly. And there's one name that popped up this week that everyone should know. Fei-Fei Li. The godmother of AI. She is responsible for ImageNet, which basically kickstarted the deep learning revolution over a decade ago. She has a new company, World Labs, and they just raised $1 billion. Which is huge for a startup, but it's the product, Marble, that's interesting. It turns text into editable 3D worlds. 3D worlds. So we aren't just generating images anymore.
No. This is about spatial intelligence. We are entering design pipelines. Imagine typing a description of a medieval city, the cobblestones, the lighting, the physics of the fog, and having a fully editable 3D model generated that you can drop straight into a video game or a movie. That feels like the next frontier. We conquered text. We conquered static images. Now we are conquering space. Spatial computing needs content. Think about the metaverse
or Apple's Vision Pro. It's empty right now because building 3D assets takes humans hundreds of hours. If AI can do it in seconds, that changes the entertainment industry forever. It lowers the barrier to entry to near zero. Speaking of which, there were a few other rapid-fire tools that caught my eye this week. Did you see origami.chat? I did. It is a data enrichment tool. It connects over 100 sources. You can use it to find customers or enrich a CSV file. It's highly practical. It's
unsexy, but it saves hours of grunt work. And chloe.ai. OpenClaw in the cloud. Zero setup. It's basically an autonomous assistant you can spin up in five minutes to browse the web and do tedious tasks for you. It seems like we are moving from chatting with a bot to assigning a bot a job. That is the defining theme of 2026. Don't talk to me, do it for me. Even for content creation, there's Reloop: avatars, cloned voices, and video editing all in one spot. You just chat with it
to make a full video. It's moving so fast. I saw that a Wharton professor released the eighth edition of his AI guide this week. The eighth edition. It just highlights that textbooks are obsolete the moment they're printed. Absolutely. If you're reading a printed book on how to use AI, you're studying history. You are looking at fossils. So, Fei-Fei Li focusing on 3D worlds: is that the next frontier after
text and video? Absolutely. Spatial computing needs content, and only AI can build it fast enough. Spatial computing needs content, and only AI can build it fast. I love that. Let's try to pull this all together. We've covered a lot of ground today. What is the big idea here for the listener? I think if we weave these threads together, we see a clear picture of 2026. First is capability. Gemini 3.1 proves that models are learning to reason, not just predict. They
are becoming methodical thinkers. Second is risk. That reasoning power allows for autonomous offense. The tools are becoming weaponized much faster than they are being secured. And third? The economy. The market values this power at nearly a trillion dollars, but the business model is still totally fractured. We're seeing a split between the ads model where you are the product and the subscription model where you pay for privacy and neutrality.
It's a pivotal moment. You know, I keep thinking about that gap, the offense versus defense gap in code generation. It is the most critical metric right now. It isn't just a crypto problem, is it? No. It's everything. Banking, healthcare, power grids. It's a preview of all cybersecurity. If the AI can pick the locks 72% of the time, it doesn't matter how strong the door is. We need better locks and we need them fast. We do. And we likely need AI to build
them. That is the reality of February 2026. It is a world of incredible speed, high stakes, and brilliant machines that are learning to think. Thank you for diving in with us today. Thanks for having me. And to you listening. Stay curious, stay vigilant, and maybe double-check your digital locks today. We will catch you in the next deep dive.
