GPT 5.2 didn't just process data this week, it actually conjectured a brand new formula for single minus gluon tree amplitudes. Which, if you aren't a theoretical physicist, is a problem that human mathematicians generally describe as incredibly long and painful. Yeah, we are talking about what is typically 1/4 page of really dense messy algebra. And the AI identified a hidden pattern that simplified all of that into a single, elegant product structure.
And that is massively significant because this goes way beyond a calculator doing arithmetic faster than a human right. We are looking at a system analyzing a mess of complexity and spotting A symmetry that human experts had missed for decades, right? It is the difference between simply computing a result and actually understanding the architecture of the physics problem itself. It proves we are building
genuine collaborators now. But while the science side is having this massive Eureka moment, the way these AI companies operate is shifting just as drastically. We are moving entirely away from the era of chat bots where you type a question and get an answer. We are entering the era of agents and Co workers that actually execute tasks. That distinction really matters. A chat bot talks, an agent acts, and right now the industry has split into two very distinct battles to make that happen.
There is one battle for raw speed and hardware infrastructure, and another for deep autonomous execution. Exactly. Which brings us to the core question we are going to be exploring today. In a week where AI is proving theorems and learning to drive, our computers, are the safety guardrails crumbling under the pressure to win massive government and enterprise contracts? It is the defining tension of
everything happening right now. And the autonomous side of that battle just got a huge injection of talent. As of this morning, February 25th, 2026, Anthropic has officially acquired Vercept. This is a major signal flare. Vercept was a Seattle based startup and they were founded by some real heavy hitters from the Allen Institute for AI. They had built this desktop agent called V. I've actually seen demos of V It is a creepy and impressive and
equal measure. It doesn't just read the underlying text code, it physically sees the screen elements. Right, the visual component is the main hurdle everyone has been trying to clear. Yeah, for a long time. If you wanted an AI system to use a computer, you had to hook it up to an API. That is essentially a special backdoor that lets the software talk directly to other software. But the real world is incredibly messy. Most software out there doesn't
have clean API's. Exactly. So VAI interacts directly with the graphical user interface. The actual pixels on the monitor. It looks at the screen just like you do. It identifies that a specific Gray rectangle is a submit button and a specific white box is a text field. Which sounds so simple to us. It does, but for a machine it is incredibly difficult. It knows the grounding problem. You have to perfectly map pixel coordinates to semantic actions.
And this acquisition perfectly explains the sudden jump in Anthropics performance numbers. We saw the new benchmarks for Claude Sonnet 4.6 this week, specifically looking at AUS World. Yes. For context, the Alice World benchmark is the standard test for measuring how well an AI can navigate an operating system. The test asks things like can it open a spreadsheet, copy a specific cell, open a web browser, paste that date into a form and hit enter. Real computer use.
Yes, and in late 2024, the best models in the world were scoring under 15% on that benchmark. Which is functionally useless for a user. You would spend more time correcting its mistakes than the time it theoretically saves you. A 15% success rate is basically a toy, but Sonnet 4.6, which is clearly integrating this new Recept tech, is now hitting 72.5%. That crosses a massive threshold at over 72%. You can actually walk away from
your desk and let the agent run. You can trust it with a multi step workflow. That is entirely why Antropic bought them. They aren't interested in keeping the Vibe brand alive. They are integrating the team of about 20 engineers to make Claude fully capable of direct computer use. They want an AI that writes the e-mail for you, opens your client, attaches the PDF and click send. So Anthropic is heavily betting on the smart autonomous agent that navigates a messy desktop
environment. Open AI, however, seems to be betting on something else entirely. This week they released GPT 5.3 codecs Spark. Spark is the operative word there. This new model is entirely obsessed with latency. And they are achieving this incredible speed through specialized hardware, right? This goes beyond standard software optimization. It is absolutely a hardware
play. Spark is running on Cerebra's Wafer Scale Engine 3. I really need you to visualize this for a second because I saw a photograph of this chip recently. Standard computer chips are small. They are roughly the size of a postage stamp. This Cerebra's chip is the size of a dinner plate. Yeah, it's the size of a giant pancake. It is an entire uncut silicon wafer, usually in standard chip manufacturing where you take a silicon wafer and cut it into hundreds of tiny individual
chips. So reverse just keeps the whole thing intact as one massive processor. Why does keeping it intact matter so much for speed? Because in a traditional setup, you have your memory sitting on one physical stick and the processor sitting on another. Data has to physically travel back and forth through wires to compute anything. Even if we get the speed of light, that travel takes time and consumes a lot of energy.
Right on the wafer scale engine, the memory and the compute cores are right next to each other on the exact same piece of silicon. The data transfer delay is effectively eliminated. It is nearly instant. And the practical result of that architecture is over 1000 tokens per second. Well over 1000. Wait, hold on, let's back up a second. To put that in perspective, a human being reads roughly 5 words a second, generating 1000
tokens. A second means a massive wall of text appears instantly on your screen. You can't even begin to read it as it generates. You absolutely cannot. But for writing code, which is exactly what Spark is designed for, it fundamentally changes the texture of the work. It is essentially 15 times faster at coding than their standard model. They are marketing this as conversational coding. Think about how you use a standard chat bot right now. You type a prompt. You wait maybe 10 seconds.
The code appears block by block. You read it. It feels very much like sending a letter and waiting for a reply in the mail. A turn based interaction. Right, with Spark the code generates so fast you can interrupt at mid thought. You can see it going down the wrong logical path in line three of a function and just stop it instantly. You correct it on the fly. So it feels more like jamming with a musician in a studio. That is the perfect analogy. It creates a tight real time
feedback loop. But there is a serious trade off here. You do not get that kind of speed for free. The model is less rigorous. Exactly. Spark is completely latency first. If you look at the SWE Bench Pro scores, which is the premier software engineering benchmark, Spark scores significantly lower than the full GPT 5.3 codecs model. And perhaps more worryingly, Open AI explicitly states it is not rated for high capability cybersecurity work. So we have a purposeful division
now. If you want the AI to discover the gluon tree amplitude formula, you wait patiently for the slow deep thinking model. If you want to hack together a website infrastructure in 10 minutes, you use Spark for fast execution. It is a deliberate split between deep reasoning and velocity. Speaking of deep reasoning and that physics discovery we mentioned at the start, we sort of glossed over where that actually took place. That discovery did not happen in a standard chat window on a
browser. It happened inside Prism. Open AI Prism. Yeah, this is their brand new workspace designed specifically for scientists, and it is fascinating because it attacks the actual workflow of doing science. It is a fully Latex native writing environment. Latex is the complex typesetting system that pretty much every physicist and mathematician uses to write their formal papers. Prism integrates the AI directly
into that source mode. Plenty of standard text editors have AI plugins these days, but Prism is completely different because of the context window and the deep integration. Prism reads the entire project structure simultaneously. It sees your equations, your citations, your raw empirical data files, and all of your figures.
So if I go in and tweak a variable in my core equation on page 2, Prism automatically knows to update the resulting graph in figure 3 on page 10. Yes, it validates if your empirical results actually match your theoretical model. It can check all your citations against the actual text of the reference papers to ensure you are quoting them correctly. It is doing the tedious grunt work of consistency that usually
drives researchers crazy. And I asked about the business model earlier because I noticed they are offering this entirely for free for personal accounts. That is the classic Silicon Valley play. They want to become the default infrastructure for scientific discovery. Right now, a scientist might use Overly for collaborative writing, Zotero to manage their citations, and Python for data analysis. Prism is designed to replace all
of those fragmented tools. They want the next Nobel Prize winning discovery to happen natively inside an open AI interface. Discovery is an incredibly valuable commodity. If you own the tool where the science happens, you get to see where human knowledge is going
before anyone else does. Which brings us directly to the money, because whether it is via navigating a messy desktop or Prism writing a theoretical physics paper, this technology has to be monetized and simply selling an API key to developers isn't cutting it for these massive valuations anymore. We are seeing the rise of the true AI Co worker and the massive consulting army is required to install them. Open AI refers to these as frontier alliances. They have realized a harsh truth.
You cannot just hand a Fortune 500 company access to a super intelligent model and expect them to magically become more productive. The companies literally do not know how to use it. They have no idea how to wire it into their legacy systems, so Open AI has officially partnered with McKenzie, BCG, Accenture, and Cap Gemini. These are the massive consulting firms you traditionally hire when you want to fire half your staff and completely restructure
the remaining half. Or, phrase more charitably, they're the people you hire when you need to redesign your organization's central nerve, the system. These consulting firms are wiring the frontier platform directly into massive corporate data warehouses and customer relationship management systems. They are actively redesigning organizational workflows to accommodate autonomous agents. It is essentially organizational surgery. And it is very expensive surgery.
Entropic is playing this enterprise game just as hard right now. They are currently hitting a $14 billion revenue run rate, and to fuel that massive infrastructure and enterprise expansion, they just closed their Series G funding round. The number on that round was staggering, $30 billion. $30 billion in cash. That completely values Anthropic at $380 billion. That is an immense, almost incomprehensible amount of
capital. But here is where the tension we talked about earlier really surfaces. We have $30 billion funding rounds, we have heavy military interest, and we have these massive consulting armies deploying agents. What happens to the original safety mission? Enthropic was founded specifically by people leaving Open AI to be the definitive safety company. That core mission is severely colliding with reality right now.
Just yesterday, on February 24th, Anthropic released version 3 Point O of their Responsible Scaling Policy, or RSPI. Read through that document last night. There is a very specific, very controversial change in the language. In the previous versions of the RSP, Anthropic had a hard written commitment. They stated they would unilaterally pause all development if they couldn't meet certain strict safety
measures. If a model was deemed too dangerous or autonomous to contain, they would stop training. Period. And in version 3 point O that is gone. That unilateral commitment is entirely gone. They've replaced it with a clear bifurcation. They now differentiate between industry recommendations and
company plans. So they're publicly recommending that the entire AI industry should pause if things get dangerous, but they are no longer promising that they will pause if their competitors keep going. They are framing it as a classic collective action problem. Their argument, which is highly rational from a pure business perspective, is that if Anthropic pauses to build perfect safety guards, but open AI or a massive state backed lab in China keeps scaling and throffic just loses market
share. They lose their influence over the industry. Exactly. They effectively cede the future to actors who might be far less concerned with safety than they are. It is the ultimate prisoner's dilemma. If I play nice and you decide to play rough, I lose everything, so I am forced to keep playing rough. But there's another massive pressure point here that we absolutely have to discuss the Pentagon.
The US Department of Defense has been getting incredibly loud about AI integration over the last year. They have. The Pentagon explicitly communicated that a strict refusal by an AI company to work on national security tasks was viewed as a supply chain risk. Supply chain risk, that is highly specific bureaucrat speak for we are not going to buy anything from you. Exactly. It is all about reliability.
If the United States government is going to heavily integrate your AI agents into their defense system, they need absolute certainty that you aren't going to suddenly turn the servers off because of an internal moral qualm about how the tech is being used. So this policy shifted. Anthropic isn't just about preserving enterprise market share, it is about making themselves a viable long term
government contractor. This new policy is a calculated pivot to survive in a world where the US government is suddenly the biggest, most important customer in the room. The entire concept of safety is being redefined in real time. It is no longer about slowing down to ensure alignment, it is about staying in the absolute lead so you have the power to
set the rules. And while they are fiercely fighting for those lucrative government contracts, they're also simultaneously writing massive checks to make their foundational legal problems disappear. We really have to talk about the data that powers all of this. The cost of content, we finally have a concrete price tag on it. The Barts V Anthropic Settlement. $1.5 billion. That is a historic number for a copyright lawsuit. It completely sets the precedent for the entire industry.
For a very long time, the standard defense from these AI labs was fair use. The argument was always that an AI learns exactly like a human learns. It reads publicly available information and internalizes the concepts. But the settlement strongly suggests that the actual cost of doing business moving forward involves paying out billion dollar class action settlements to the authors and creators. And the details of this specific
case are crucial. This wasn't just a broad philosophical debate about whether an AI reading a purchased book is fair use. It was specifically focused on the book's three data set. Right, the shadow libraries. This was a massive data set that was largely composed of directly pirated books, so the legal battle shifted away from a high level debate about copyright theory to a very specific,
undeniable accusation. You downloaded this material from a known pirate site and your engineers knew exactly what they were doing. A $1.5 billion penalty absolutely stings, but for a company that just raised $30 billion in cash a few days ago, it is highly affordable. It is literally just a line item on a spreadsheet for them. And that is the dark irony of this entire settlement. A $1.5 billion penalty instantly destroys any new startup.
It completely bankrupts the university research lab trying to build an open source model. But for Anthropic or Open AI or Google, it is just the toll they have to pay to get on the highway. It effectively entrenches the giants. They're the only entities on Earth who can actually afford to retroactively pay for the data they already scraped from the Internet. It completely clears the competitive field.
So tying all of this together, we have AI models right now that are capable of genuine world changing brilliance. They are discovering hidden physics formulas, you're coding at the speed of thought, and we have autonomous agents that can finally navigate the messy pixel based reality of our everyday computers. But to actually get those agents out of the lab and into the real world, these companies are making a very specific set of compromises.
They're wiring themselves deep into the corporate structure through massive consulting firms. They're softening their founding safety pledges to seamlessly aligned with military interests. And they are paying billion dollar fines to retroactively legalize the aggressive data gathering that made the model smart in the first place. We spent years intensely debating whether artificial intelligence would be safe or whether it would be perfectly
aligned with human values. But looking at the reality of 2026, safety isn't a philosophical stance anymore. It is simply a clause in a massive enterprise Oregon government contract. And it is being defined entirely by whoever is signing the biggest check, whether that happens to be the Pentagon or the venture capitalists. If you're not subscribed yet, take a second and hit follow on whatever app you're using. It helps us keep making this. We appreciate you being here.
