Hello and welcome to a Weekend News episode of the Leveraging AI Podcast, the podcast that shares practical, ethical ways to leverage AI to improve efficiency, grow your business, and advance your career. This is Isar Meitis, your host, and we had another explosive week of AI news. It seems to be getting more extreme every week.
The three main topics we're going to talk about today, and they're all tied very closely together, are the amazing releases from both Anthropic and ChatGPT, and the outcome that was given the name SaaS Apocalypse, where major software companies lost hundreds of billions of dollars from their market cap because of these releases.
And then we have lots of other smaller items to talk about, and there are dozens of other articles that are not going to make it into today's show. As always, you can get access to all the articles, including the ones on the podcast and the ones that do not make it to the podcast, in our newsletter. And you can sign up for the newsletter with a simple link that is in the show notes of this show. But since there's a lot to talk about, let's get started.
So I'll start with the outcome that, as I mentioned, got the name SaaS Apocalypse, and the reality is that following the announcements from both Anthropic and ChatGPT this week, the large SaaS companies have lost over $300 billion in market value in just 48 hours. If you want the bigger, broader picture, in this past month Adobe, Microsoft, Salesforce, SAP, ServiceNow, and Oracle have shed more than $730 billion in combined market value.
And Microsoft alone has lost more than 450 billion out of those 730 billion. If you are looking at it from a sector-wide percentage, the iShares Expanded Tech-Software Sector ETF has fallen 28% from its recent high and roughly 20% from the beginning of 2026. The S&P North American Software Index posted a 15% decline in January, which is its worst decline since October of 2008, when we had the financial crisis. Now the analysts are on the fence on how this is gonna end up.
And you have both bear and bull cases, either for or against how this is gonna play out. As an example from the bear side, NYU data science professor Vasant Dhar told ABC News that rudimentary legal services represent low-hanging fruit for AI disruption and specifically said, you go to a lawyer and they charge you thousands of dollars for boilerplate stuff. He also added that reviewing standard contracts isn't a big deal for AI, with which I a hundred percent agree.
This is connected to the fact that Anthropic has released a specific plugin capability skill for Claude to be able to perform legal work, at least at a decent level. They're actually saying it is in support of lawyers, but the reality is it will replace lawyers for many, many companies, definitely smaller businesses, and definitely in cases which have lower risk.
But there are also opinions on the other side, such as Dan Ives, who is a managing director at Wedbush, who said it's a strong model and it's extremely impressive, but I do not see enterprises moving away from traditional vendors because of this, and I agree with him as well. And I will explain what I mean when we finish this segment. So let's look at one specific example, which is the plugin that Anthropic has released.
Again, it's a legal-specific plugin and it knows how to do NDA analysis and compliance workflows and legal briefings and templated responses. Now again, Anthropic is saying that this is supposed to assist legal workflows and not replace real legal advice, but the reality is this is exactly what it's going to do in many, many cases. I've used both Anthropic and ChatGPT for multiple legal reviews and for drafting legal documents where I think the risk is relatively low.
Now, is that the smartest thing in the world or not? I'm not sure, but I'm sure I'm not the only person in the world who's doing this right now. And even if you send the final version that you got to your lawyer for a final review, it means the lawyer is going to work on it for one or two hours instead of six to 16 hours of creating, drafting, or reviewing the document. And hence, that means that the law firm is gonna get significantly less work.
Now, to be fair, this is obviously not a new concept. I've been talking about AI replacing software, definitely in smaller applications, more or less since I launched this podcast almost three years ago. But even Microsoft CEO Satya Nadella warned a year ago that AI agents pose a serious risk to SaaS companies.
And he said, and I'm quoting, I think the notion that business applications exist, that's probably where they will all collapse in the agent era because if you think about it, they are essentially CRUD databases with a bunch of business logic.
So if you think about the capability of an agent to go and connect to data, analyze the data, and provide outputs, it provides a better solution in many cases than exactly what Nadella is describing, which is connecting to a database and providing you a standard response.
Because the agent doesn't have to provide you a standard business logic, it can combine that standard business logic with reasoning and analysis that the regular software just cannot do, and hence it provides more value and in many cases, cheaper and faster than using the old system. Now to some additional opinions. MSP CEO Jason Slagle had a very serious pushback on this, and he said someone vibe codes some AI slop to do a business function. How do they maintain it?
I see things integrating into HubSpot or Salesforce but not replacing it, and he's seeing the stock drop of this week as a correction from a huge overvaluation that we've seen for a number of years.
So the main point that he's making is that maintaining software is also an effort that maybe is not taken into account when you are developing the software itself, because things keep on changing, whether it's the underlying technology, the operating systems, the APIs, the things it needs to connect to, and so on. But we're gonna address later on in this episode why I think he's wrong. But here is the bigger picture of what I think about it.
I think there is a huge difference between blue-chip SaaS like SAP and Oracle and Microsoft infrastructure and so on, versus tens of thousands or maybe even millions of smaller business applications that are being used today. I do not see large-scale companies changing their infrastructure-level tech stack in the near future, just because the risk is so high.
However, every single day I see examples of small businesses, many of them with no technical skills, vibe coding applications that are tailored to their needs and that eliminate the need for them to purchase software solutions for that particular thing. Just yesterday in my Friday AI Hangouts, which is a community that meets every single Friday at 1:00 PM Eastern. You are all welcome to join if you're interested. If you wanna do it, there's a link in the show notes for you to do it.
But one of the participants demoed how he built an entire check-in mechanism for his wife's yoga studio that fully integrates into their existing platform, and he built it in a few hours with zero experience in how to code applications. Now there are multiple companies out there who provide such software solutions for check-in control and tracking who shows up in different places.
So this is what I think about the small applications, but for the big applications, the big infrastructure stack, like SAP and Microsoft and so on, I think the risk is not somebody vibe coding a new SAP, but rather the fact that I can use one agent to do the work of 50 employees, which means they need 50x fewer licenses to do the same kind of work, and later on it's gonna be a hundred x or a thousand x.
And that means that unless these companies find a completely new business model for how to monetize their offering, they will have a very, very serious revenue issue. And I think that even if they do figure out how to monetize the agents that are using it, which I'm not exactly sure how they're going to do, but let's say they will find a way, I have a feeling that the revenue numbers are gonna be significantly lower than what they are right now, when you have thousands or tens of thousands or hundreds of thousands or millions of users using their platforms.
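To make the license math concrete, here is a minimal sketch of the seat-based arithmetic described above. Every number in it (5,000 seats, $1,200 per seat per year) is a made-up assumption for illustration, not a figure from any vendor, and the only thing taken from the discussion is the "one agent does the work of 50 employees" ratio.

```python
def seats_needed(human_seats: int, employees_per_agent: int) -> int:
    """Seats still required if one agent covers the work of N employees (rounded up)."""
    return -(-human_seats // employees_per_agent)  # ceiling division


def annual_seat_revenue(seats: int, price_per_seat: float) -> float:
    """Revenue under a plain per-seat licensing model."""
    return seats * price_per_seat


# Hypothetical customer: 5,000 licensed users at $1,200 per seat per year.
HUMAN_SEATS = 5_000
PRICE_PER_SEAT = 1_200.0

before = annual_seat_revenue(HUMAN_SEATS, PRICE_PER_SEAT)
after = annual_seat_revenue(seats_needed(HUMAN_SEATS, 50), PRICE_PER_SEAT)

print(f"before: ${before:,.0f}  after: ${after:,.0f}  ratio: {before / after:.0f}x")
```

Under per-seat pricing the revenue falls by the same factor as the seat count, which is why a different pricing model would be needed to offset it.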
So do I think it's the end of SaaS? No. Do I think it is the end of smaller applications that do very specific things? Probably, yes. I think that companies, and to be fair agents, will spin up applications that they need on the fly. I'm already seeing it happening in my work with Claude Cowork in the last two weeks. So I do think that smaller software companies and SaaS solutions will either disappear or be eroded dramatically.
And I do think that large-scale SaaS will take a serious hit in the revenue that they're generating. Not to mention the fact that new companies that are smaller right now will vibe code the solutions they need instead of committing to a larger SaaS provider. And that means that over time, because there's gonna be churn among the existing clients, they're going to lose clients.
And in the next 10 to 15 years, they'll have fewer and fewer clients instead of more clients, which means instead of growth, we'll see a decline of these companies. So what was driving all of this, in addition to just the release of a plugin that can do legal work?
Well, Anthropic officially announced Claude Opus 4.6, which is its most advanced AI model to date, featuring state-of-the-art performance on agentic coding and on knowledge work across the board, so not just on coding, plus really complex reasoning capabilities. In addition, it has a context window of 1 million tokens, which is about five x what Opus 4.5 had, which is a huge jump, which means it can work on much larger bodies of work.
And it introduced the capability to run agent teams in parallel, to do multi-agent work fully orchestrated by the AI itself. They also introduced Claude PowerPoint integration, adaptive thinking, and context compaction, which they actually already had in the previous version and just improved the mechanism.
And this is just the beginning of the explosive week that Anthropic had, which is fully aligned with the explosive two months that Anthropic is having, literally becoming the darling of everybody who, like me, is deep into the AI universe and is experimenting and building things. So what is driving this? First of all, Opus 4.6 achieves the highest score on Terminal-Bench 2.0, which is for agentic coding.
It leads all frontier models on Humanity's Last Exam, which is supposed to be the toughest questions from the most advanced disciplines in the world, and it outperforms GPT-5.2 by 144 points on GDPval, which measures real-world knowledge work. It's actually a benchmark that was developed by OpenAI themselves. And now Anthropic is leading on it in finance, legal, and other domains that are being tested.
So it's an extremely, extremely powerful model that is built for agentic work beyond just coding. It also scores extremely well on the MRCR v2 needle-in-a-haystack benchmark, which, as the name suggests, tests models on finding very specific, accurate information in huge volumes of content. So Opus 4.6 scores 76% on the benchmark versus 18.5% for Sonnet 4.5.
So in addition to the fact that it can work with really large bodies of context, it can also retrieve specific pieces of information from it very accurately and significantly better than previous models. And they also introduced, as I mentioned, the agent teams function where, and I'm quoting, instead of one agent working through tasks sequentially, you can split the work across multiple agents, each owning its piece and coordinating directly with the others. So how does it work?
One session acts as the team lead. It's coordinating the work. It is assigning tasks and it is creating the other agents on the fly, basically teammates that can work independently, each and every one of them with its own context window of 1 million tokens.
This basically means you can do a practically unlimited amount of work, because each new agent has its own context window that it can work in independently, and they can communicate with each other, not just through the lead, but also directly with one another. The humans can interact with each and every one of the teammates separately because they each run in a separate instance of Claude. Now, what does that mean for actual work?
Anthropic researcher Nicholas Carlini did a stress test for this system. He had 16 agents, so it doesn't sound like a huge number, but listen to what they did. 16 agents built a new C compiler from scratch. They worked over almost 2,000 sessions and spent $20,000 in API costs, but they were able to build a fully functioning C compiler with over a hundred thousand lines of code. And the compiler they created can run across multiple platforms, including Linux on x86 and ARM and other solutions.
And so it built a really robust capability on its own while working simultaneously across 16 different instances of Claude Code. Now the amazing thing is that making it happen is actually really simple. All you need to do is to enable a function inside the settings of your Claude environment. The setting is called Claude Code experimental agent teams, and all you have to do is turn it on. From that moment on, you, the user, can describe the team structure and the task in natural language.
Claude creates the team, spawns the teammates, and coordinates the work automatically. You don't have to do a thing for this to work. It will decide how many agents, what each agent needs to do, how they are going to communicate with one another, and so on.
And it can also suggest creating the team to you if it sees that a specific task you're doing will benefit from parallel work. Now, the one thing that Anthropic mentioned is that this teammate coordination adds significant overhead from a token usage perspective, because they have to communicate with one another and all the messages back and forth are consuming tokens.
And they also said that there are cases in which the parallel work is actually not beneficial and sequential work actually does better. And I'm sure the system over time will get better and better at identifying this on its own. For now, it does require some human guidance on when to use this and when not to use this, but this connects very well to what we shared just last week.
If you remember, last week Kimi K2.5 came out, including Agent Swarm mode, which basically does the same thing and created a huge level of excitement in the industry around this feature. Well, it didn't take very long, and Anthropic came out with the same thing. And as we'll talk about in a few minutes, you will see that OpenAI now has the same exact feature as well. Now, whether it's fully functional right now or still just a testing beta doesn't really matter.
It doesn't really matter because adaptive thinking was just introduced in the summer of 2025, and it was good but not great. It is now fantastic in the way it is working. And the same thing is going to happen with the ability to coordinate these swarms of agents on the fly for multiple tasks. Now just think about what this actually means.
You communicate with Claude, you tell it exactly what you want to do, and now the single agent you're talking to is creating and spinning up multiple agents. Each and every one of them knows exactly what it needs to do. It has its own context window, and they can communicate between each other on the progress they're having on different tasks.
They're monitoring the progress through a unified environment, and it allows them to run independently while still being synchronized on the bigger, broader tasks. And then it all comes together through the orchestrator main agent, who monitors the progress and decides what needs to happen next. This is exactly how software companies work, and to be fair, more or less any process, but in software companies it is just very well structured.
You have a scrum master, who is the person who decides what the requirements are and which requirements need to be worked on. Then there is a standup meeting in which the tasks get assigned to specific people based on their skills, and based on what needs to be done by the same person who knows all the stuff that's happening versus what can be done in parallel by other people.
The work gets divided, then it gets done by multiple people who then come together to discuss the deployment and verify that everything is working correctly. This is exactly how it works. This means that Claude will be able to do on its own what an entire development team does, and not just individual people in the team. And yes, it will cost more tokens to do this because they need to coordinate with one another, but that cost is going to be significantly lower than hiring a team of developers.
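The lead-and-teammates pattern described here can be pictured with a small simulation. This is only a sketch of the coordination shape (a lead splits work, teammates run in parallel with their own local context, a shared board lets them see each other's status); it does not call Claude or any real agent API, and every name in it (`teammate`, `team_lead`, `board`) is hypothetical.

```python
import asyncio


async def teammate(name: str, task: str, board: dict[str, str]) -> str:
    """One simulated teammate: private context, parallel execution, shared status board."""
    context = [f"{name} received task: {task}"]  # stand-in for a private context window
    await asyncio.sleep(0)                        # stand-in for real model work
    result = f"{task} done by {name}"
    context.append(result)
    board[name] = result                          # peers can read this without going through the lead
    return result


async def team_lead(subtasks: list[str]) -> list[str]:
    """The lead spawns one teammate per subtask and collects results in order."""
    board: dict[str, str] = {}
    workers = [teammate(f"agent-{i}", task, board) for i, task in enumerate(subtasks, start=1)]
    return list(await asyncio.gather(*workers))


def run_team(subtasks: list[str]) -> list[str]:
    return asyncio.run(team_lead(subtasks))


print(run_team(["write parser", "write tests", "write docs"]))
```

In a real system each teammate would be a separate model session, and the messages on the shared board are where the extra token overhead comes from.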
Now, as somebody who has been using Claude Cowork extensively since it came out about two weeks ago, it completely changed the way I work. I'm currently spending about 90% of my AI time in Claude Cowork, and the other 10% divided between the chat models that I used every single day before that, because the benefits are not even close.
Now, if I can get to the point that I can go to Claude and it can do the tasks five to 50 times faster because it can spin up multiple agents, instead of me having to define them and tell them exactly what to do, it puts me in warp speed. I can now do five to 50x more things in the same amount of time with the same amount of effort. And yes, I'm paying for the tokens, but it doesn't make any difference, because I will generate all that work instead of hiring so many people to do this kind of work.
Now from a personal creativity and productivity perspective, that is really, really exciting. I am terrified of the implications of that for the global workforce. Now, if Claude Opus 4.6, which is their largest model, is not enough, TestingCatalog is reporting that Anthropic is apparently on the verge of releasing Sonnet 5, which means the next big model at their middle level, which is the way they released all the previous versions as well.
So Anthropic always has three versions of its model: Haiku, which is the smallest one, then Sonnet and Opus, and they've always released Sonnet first, kind of like the mid-tier level. Apparently they're planning to release that potentially during the Super Bowl.
Now, according to TestingCatalog, early hands-on testing of the non-thinking Sonnet 5 variant, which is not the most powerful variant of it, shows it is competitive with today's frontier models and potentially outperforms or is aligned with Claude Opus 4.5, so the Opus version just before the one that came out this week.
But it does so with significantly fewer tokens and a lower cost structure, which is the way it's always been working: the smaller next version of models always beats the higher, more capable previous version of models. Based on the leaked information, the cost could be 50% less than Opus 4.5 for comparable or superior performance. But that's not all of the announcements that Anthropic has made this week.
They also announced that the Claude side panel in the Chrome browser is now available to all paid users. So previously it was available just to Max users, and now it's available to Pro, Max, Team, and Enterprise. And what it does is open a side panel inside of Chrome that allows Claude to basically do everything that you can do in the browser.
It can read information, understand what it is, fill out forms, extract data, manage multiple tabs simultaneously, and run multi-step workflows across these different tabs. Now the other thing is it's now fully integrated into Claude Code. So if you are developing stuff in Claude Code, you can test things in the browser. You can have access to the browser, see what it's doing, and so on, all without having to use any third-party tools.
The other new feature is that you can schedule recurring tasks daily, weekly, or monthly. So there is a planning mode that allows you to create a plan and approve executions, and then it will independently run multi tab workflows on whatever timeframes that you decide.
So you can teach it how to do a specific piece of work that people used to do before and tell it to do it daily or weekly or every hour or whatever frequency you want, and it will go ahead and execute it, running multiple tabs in your browser at the same time. Now the lowest tier, the Pro subscription, is limited to Haiku 4.5, their lowest model, but the Max and Team and Enterprise users can choose basically any model they want in order to run this operation.
Now Anthropic emphasizes, as they did every time before, that using browser-based AI carries inherent risks, including prompt injections and other risks that it generates. And because of that, the extension asks you before acting on anything that it is doing. But you can also allow it to do a specific function from that moment on without asking you, which I find myself doing more and more, and I'm learning to trust it, not necessarily for all the right reasons.
I'm just excited with what it's doing, and I don't wanna approve every single step. Now, this functionality of having a side panel with agentic capability that can control your browser sounds exactly like Comet or Atlas, which have existed for a while now. These are agentic browsers. Comet is from Perplexity and Atlas is from OpenAI. It even sounds very much like the latest release from Gemini, which has released Gemini for Chrome, where it can control your browser.
But this is actually a very different solution for one big reason, which is the fact that it connects to Claude Code and Claude Cowork. The Claude browser connection gives Claude Cowork an extremely powerful capability to fetch additional information, review outcomes of things it is doing, and test everything it is setting up, whether it's n8n workflows or code or whatever, and it is an incredible amplifier of what Claude Cowork could do without this web browsing access.
I have been using this capability a lot in the last few weeks, and it provides an immense value to basically any new business process that I'm developing. And I'm getting all this value without opening the Claude Sidebar in Chrome even once. So I've used the actual extension inside of Chrome exactly zero times, and yet this capability to control the browser is giving me incredible value, and that's the biggest unlock. It is not about me using Claude in the browser.
It is about the agents that I'm developing that can use Claude in the browser in order to do everything I need to do. As an example, Claude Cowork is now creating 100% of the new n8n workflows that I'm creating, and it can test them, evaluate them, and see their outputs inside of the browser without me having to be involved.
Now, while it's still not perfect and I still have to give it guidance and fixes every now and then, it definitely feels like magic and like a completely new universe of possibilities that are working at a 90% success rate, which is definitely better than I could do on my own. And it is happening in the background while I'm working on other things, which makes this even more powerful. But not all the news from Anthropic this week is good.
Anthropic experienced a major outage on February 3rd that knocked Claude Code out completely, and with it Claude Cowork, and basically took all the API-connected models offline and had issues with the web capabilities as well. It was only down for 20 minutes, but those 20 minutes stopped millions of developers from working at that time. It showed the growing dependency that the software development world has on these models.
I can tell you that I felt like I was wasting a huge amount of time when that happened. And there were actually smaller, not complete, but partial outages over this past week, and it just drove me crazy because I'm now, again, spending most of my time in Claude Cowork. And when it stops working and it's telling you it's not responsive, like, what do you mean? I need you to work. I need to do these things. How can I not do this right now? And this is just a two-week process for me.
And again, I've been using Claude Code way before that, but I'm now building so many things with Claude Cowork, and every moment it's not running, I feel like I am wasting time. The main point I'm trying to deliver here is that you must have redundancy for these tools, at least in your main processes. If a significant part of your value depends on one specific model, when that model is down, you are down and your ability to provide value is down.
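The redundancy point can be sketched as a simple fallback chain: try the main provider first, and move to the next one if it fails. This is a minimal illustration, not a production setup. The provider functions below are stand-ins that simulate an outage, and all names (`ask_with_fallback`, `primary`, `backup`) are hypothetical; in practice each callable would wrap a real vendor SDK.

```python
from typing import Callable, Sequence


class AllProvidersDown(RuntimeError):
    """Raised when every configured provider failed."""


def ask_with_fallback(
    prompt: str,
    providers: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try providers in order; return (provider_name, answer) from the first that works."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # an outage, timeout, rate limit, and so on
            errors.append(f"{name}: {exc}")
    raise AllProvidersDown("; ".join(errors))


# Stand-in providers for illustration only.
def primary(prompt: str) -> str:
    raise ConnectionError("service unavailable")  # simulate the main platform being down


def backup(prompt: str) -> str:
    return f"backup answer to: {prompt}"


print(ask_with_fallback("summarize this contract", [("primary", primary), ("backup", backup)]))
```

The backup does not have to be fully tuned to the process; it only has to be wired in and working before the outage happens.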
And hence, always have a backup plan, whether with OpenAI or Gemini or Claude or whatever, and have several of them connected. Even if it's not fully optimized for your process, at least have it as a working, functioning option that allows you to keep on working while your main platform is down. And I mentioned that there's a potential release of Sonnet 5 during the Super Bowl, but that is not the only Super Bowl related thing that happened with Anthropic this week.
Anthropic released four Super Bowl spots, ads featuring actors playing what looks like a ChatGPT agent that people in different scenarios are communicating with. It is hilarious. It is really, really funny, and it is basically joking about the fact that ads are coming to ChatGPT. So the premise of all of them is that there is a conversation between a person in a specific scenario seeking help from an agent. Both the person and the agent are played by humans.
And I will put links to all four of these in the show notes, but you can just Google it and find all four of them very easily. And in the middle of a sentence, while it is providing value, the agent comes up with an ad that doesn't make full sense in the context of the conversation, and it's really confusing the human who is part of that conversation. And then the tagline says, ads are coming to AI but not to Claude. And the timing couldn't have been more perfect from Anthropic's perspective.
They're gonna release it during the Super Bowl, about two weeks after OpenAI announced that they're going to have ads on their platform and while they're still testing it. They're gonna get a lot of attention because the ads are really, really funny, combined with potentially releasing a new model, and you understand how this is a brilliant PR move from Anthropic. But that didn't go down very easily with Sam Altman, and he was really, really pissed.
Now, every time Sam writes long tweets, you know something went wrong. That has been a consistent pattern multiple times in the past few years. Every time something bad happens, either external or internal, inside of OpenAI, Sam goes to X and writes a really long, detailed post. So let me read you some segments of the post that Sam wrote immediately after the launch, and then I will relate to some of the things that he was saying. Here we go. Here's what Sam wrote.
First, the good part of the Anthropic ads. They are funny and I laughed, but I wonder why Anthropic would go for something so clearly dishonest. Our most important principle for ads says that we don't do exactly this. We would obviously never run ads in the way Anthropic depicts them. We are not stupid, and we know our users would reject that.
So to be fair, OpenAI said multiple times that the ads are gonna be separate from the regular chat answers and are not going to impact the answers, which is exactly what the Anthropic ads are suggesting will happen. And I'm continuing with what Sam wrote. I guess it's on brand for Anthropic doublespeak to critique theoretical deceptive ads that aren't real. But a Super Bowl ad is not where I would expect it.
More importantly, we believe everyone deserves to use AI and are committed to free access because we believe access creates agency. More Texans use ChatGPT for free than total people use Claude in the US. So we have a differently shaped problem than they do. So basically what he's saying is that they have a significantly larger free user base that they want to maintain as free, and hence they need a mechanism to be able to pay for the tokens and the compute to allow this free usage.
I a hundred percent agree with that. Obviously, Sam did not have to step on Anthropic in order to say that, but since they're giving a jab through the ads, he is just fighting back. Then he said Anthropic serves an expensive product to rich people. Now, to be fair, that is not accurate either, because just like ChatGPT has multiple tiers, Anthropic has the same thing. So ChatGPT, you can get it for free, $8, $20, and $200, and you can get Anthropic for free, $17, a hundred dollars, and $200.
So a very similar approach exists with Claude as well. So I think that's, again, not a fair statement by Sam. But in war, like in war, you can do anything you want. And then Sam goes on to state what he's thinking about Anthropic, and I'm quoting again. Maybe even more importantly, Anthropic wants to control what people do with AI. They block companies they don't like from using their coding product, including us.
They want to write the rules themselves for what people can and can't use AI for. And now they also want to tell other companies what their business model can be. So again, to be fair, I think the fact that Anthropic is blocking their competitors from using their tools in order to write code makes perfect sense to me, and I don't really understand how Sam can use that against them. I think this makes perfect sense.
Why would you allow your competitors to use your tools to close the gap against you? I'm fully aligned with the logic behind what Anthropic has done by blocking xAI and OpenAI from using Claude Code. But then finally, in the end, Sam went to the positive side, and I'm skipping a part of the tweet, and he said, we are enjoying watching so many people switch to Codex.
There have now been 500,000 app downloads since the launch on Monday, and we think builders are really going to love what's coming in the next few weeks. I believe Codex is going to win. So what does this whole thing tell us? The gloves are off. As much as there's been fierce competition between these two companies before,
this is a very serious escalation of it, both in going to ads that just joke about the other company, as well as in the level of responses from Sam on X. This is going to be a very dramatic year for the competition between these two companies. But this is a very good segue to talk about the release of the new coding platform from OpenAI. Let's do a little timeline analysis. Both companies apparently were planning to launch their new products, so Anthropic Opus 4.6 and GPT-5.3 Codex.
The plan was to release them at 10:00 AM Pacific time. However, Anthropic jumped the gun and moved first and made the announcement at 9:47, so 13 minutes before the cue, and OpenAI followed at 9:52. So five minutes after Anthropic made their big reveal, OpenAI did the same thing with their product, which is the direct competition for the Anthropic product. So GPT-5.3 Codex is reportedly 25% faster than Codex 5.2.
The model topped both the SWE-Bench Pro and Terminal-Bench 2.0 benchmarks, and OpenAI describes it as the advancement that transforms Codex from a tool that can, and I'm quoting, write and review code, into a capable platform that can perform, and I'm quoting again, almost anything developers and professionals do on a computer. So again, you can see the shift from developing a tool just for software developers to a tool for any professional.
Very similar to what Claude did with Cowork a couple of weeks ago. Now, an interesting point here is the five-minute gap between the two announcements, which tells you very clearly that both these companies know exactly what is happening in the other company.
So the level of competitive intelligence that is happening here is absolutely crazy, and I don't know if it's people just, you know, drinking beer in the same places or real business espionage, but one way or another, releasing two competitive products within five minutes of each other tells you that each of them knows exactly what's happening behind the scenes in the other company.
Now, Codex is available on all paid ChatGPT plans, and you can use it in the standalone Codex app, in the command line interface, in your favorite IDE, and on the web interface, and API access is coming very soon. Now, very similar to the announcements from Anthropic, Codex also has the ability to schedule recurring tasks. So you can tell Codex what you want it to do. You can develop skills for it, very similar to what you can do with Cowork.
And then you can set up and define a specific schedule. When a task gets completed, the results get placed in a review queue for the developer or the professional user to review, and the system runs on worktrees, which basically keep the work on the different repositories separate in order to prevent conflicts when the code or the other tasks get executed and generated.
Now, while right now it runs on the developer's computer, OpenAI is planning a server-side aspect of this that will allow these parallel tasks, whether they are pre-scheduled or created in real time by other agents, to run on the web while the developer's computer is off. So a continuous, never-ending development cycle, very similar to what we heard from Anthropic and from other companies as well.
So the biggest shift from a mindset perspective is that these kinds of back-end, long-running, self-running automations are a complete change from how these models are used right now.
So rather than asking the AI to complete a specific task at this moment, developers can define recurring workflows, entire workflows, and delegate them. Things such as maintenance work on applications or databases can happen on their own in the background, run by multiple agents that execute autonomously and just give humans feedback on the status. But that's not the final thing that OpenAI introduced, or, if you want it from the late-night infomercials: but wait, there's more.
So OpenAI also launched Frontier, which is an enterprise platform for building, deploying, and managing AI agents. And not just their own AI agents: the idea is to create an infrastructure that allows you to run and manage multiple agents from multiple sources, including competitors such as Google, Microsoft, and Anthropic, as well as any homegrown or third-party agents developed by any enterprise, all managed in this one environment.
One of the key features of the Frontier platform is what OpenAI calls a semantic layer for the enterprise, which basically takes siloed data from wherever it lives, such as data warehouses, CRM systems, ticketing tools, or any internal applications, and creates a unified context environment that all the agents can access and reason over.
That basically breaks one of the biggest problems that companies have with data right now, which is siloed, firewalled data that exists in different places, so the AI agents cannot connect to all of it together. Well, this solves that problem. It also introduces the concept of AI coworkers. The name, again, is very interesting after Claude launched Cowork.
And the idea is that agents can be managed similarly to how you manage human employees, complete with an onboarding process, feedback loops, and improvement cycles over time. So the agents build memories and learn from the performance of other agents, and they keep on improving the quality of the output by monitoring themselves, creating dashboards for humans to monitor them and give them feedback, and so on.
Now, early adopters of this platform include companies like Intuit, State Farm, Uber, HP, Oracle, BBVA, Cisco, and T-Mobile, and some companies are reportedly getting 90% more time back for their client-facing teams by using this new architecture. Now, to make all of this more accessible to users, OpenAI also launched the Codex desktop app for macOS. Again, sounds familiar. The same exact thing that Claude did with Cowork just a couple of weeks ago.
So OpenAI released the Codex app on February 2nd, and the desktop app supports multi-agent parallelization, where the coding agents can run multiple threads at the same time. Again, the same exact thing we've seen from Anthropic, from Kimi K2, and from Cursor. Individual agents can run for more than 30 minutes independently before returning and completing the code review with other agents.
There's a centralized plan mode that provides an overview of the state of all the development work, so humans and other agents can inspect what is actually happening, and the full process can be tracked and managed in a more effective way. As I mentioned, it also allows you to schedule tasks that can run in the background, for more or less anything that you want.
I haven't tested the new Codex yet, but it sounds like it has very similar capabilities to Claude Cowork, which, as I've mentioned multiple times in the past three weeks, I'm completely obsessed with. It also allows connecting skills, connecting apps, and connecting MCPs, and building really powerful agent solutions through that.
So I haven't tested it yet, but I'm definitely planning to test it out and compare it to Claude Cowork, and I will report my findings once I have a clear understanding of the pros and cons of each and every one of these platforms. Another interesting thing they did is cross-tool continuity, meaning you can continue in the desktop app sessions that you started in your development platform or IDE, or even in the terminal.
So all of these allow you to start the work in one place and continue it in another. Now, to show you how fast this is evolving, over 1 million developers used Codex in this past month. Usage has nearly doubled since GPT 5.2 Codex came out in mid-December, so less than a month and a half ago, and it has grown 20x since the launch in August of 2025.
So in just a few weeks, we went from several tools that are great at coding, but that require a lot of hands-on work and direct guidance from the developer and can more or less only write code, to autonomous, multi-agent, orchestrated tools that can run 24/7, running scheduled tasks in parallel and going way beyond coding into more or less any knowledge work. We're not just accelerating; the acceleration itself is accelerating.
Now, from a personal perspective, I can tell you that I can feel, literally feel, what they mean by reaching the singularity, meaning I feel that the rate of change is getting closer and closer to a vertical line going straight up. Again, a few weeks ago, we didn't have many of these capabilities that are now all maturing and connecting into extremely powerful capabilities.
And as I mentioned multiple times on this podcast, we are not ready for this from any perspective: from an economic perspective, a social perspective, a psychological perspective. In other words, the human race, all of us, are not ready for such a rate of change, such a rate of improvement, which means that, because we're not ready, we will need to deal with a lot of unknowns and really serious bumps in the road in real time versus planning for them in advance.
And this will become the norm, because the speed of change is so fast and we cannot keep up with it, which, at a global society scale, I believe is a really bad thing. In addition to all these things, OpenAI also announced broader support for MCP apps inside everything OpenAI. So they have had support for MCP since March of 2025, just a few months after Anthropic announced it in late 2024.
But now they have full read and write actions built into the MCP capability inside of OpenAI, which again is similar to what Anthropic already has, and which gives developers in the OpenAI universe a lot more flexibility in what they can do with MCP connectors.
And together with that announcement, they also announced a lot of new MCP connector partners in the OpenAI environment, including Amplitude, Fireflies, sal monday.com, Stripe, Hex, ignite Alpaca, Biore, SEMrush, and many others, including Atlassian, which brings in tools like Jira, Compass, and Confluence. You can now communicate with these through a regular ChatGPT conversation, which allows you to understand what's happening in each and every one of these platforms, but also to make changes to
these platforms. So you can update your Jira just by having a chat with ChatGPT, and the same thing goes for monday.com, even if you're not a developer and you're just working on tasks in a regular knowledge work environment.
Now, if you think I'm the only one who feels that this is moving a little too fast and requires some more guardrails, I mentioned to you a few weeks ago that OpenAI had an open position for a head of preparedness, somebody who will basically manage and oversee the deployment of new systems and the level of risk that they represent. OpenAI found that person. They just hired Dylan Scandinaro, who used to work at Anthropic in a very similar role.
Altman framed the hire in the context of these accelerating capabilities, stating that things, and I'm quoting, are about to move quite fast. And he stated that OpenAI will be, and I'm quoting, working with extremely powerful models soon. In a recent interview with Forbes, Altman hinted that OpenAI has, and I'm quoting, basically built AGI, or something very close to it. In another interview this week, at the Cisco summit, where he was interviewed by Cisco's Chief Product Officer, Jeetu Patel,
Sam mentioned that 2026 is going to see a significant jump in the capabilities of models, and when he was asked whether users are going to feel a 5x, a 10x, or a 50x improvement, he said that he believes it will be about a 10x improvement by the end of this year.
Now, from my personal perspective, I can tell you that as somebody who's been using these models every single day, and they're all extremely powerful right now, I cannot understand what a 10x improvement can look like, because the models are already extremely capable. And he is talking about all of this happening in 2026, which means that in the lab they already have these models running. When these guys make predictions for the next few months, they're not making predictions.
They're basically telling us what they have working in the lab right now that they haven't released yet. And Sam, by the way, confirmed what I've been feeling in the past few weeks, that the convergence of all these things together is what makes it unique. And he noted that, and I'm quoting, code is really powerful, but code plus generalized computer use is even much more powerful, which is exactly what I'm experiencing. I don't really care that it's writing code in the background.
I care about what it enables me to do by connecting the dots across more or less everything a business needs to do from a knowledge work perspective. And this is what these platforms enable right now. And if they're gonna be 10x better by the end of the year, this is insanely powerful. And as I mentioned, nobody is ready for it, especially not companies.
And Sam himself is saying in this interview that the most important thing for companies to do right now is to dramatically accelerate their understanding of how to use AI systems. And he's saying that companies that fail to do that, that are unprepared for integrating AI into everything that they're doing, will face a significant competitive disadvantage in the very near term.
And working with multiple businesses myself as a consultant and as an educator, and showing them what's possible, I can tell you that the companies that are adapting and making these changes are gaining huge competitive advantages over their competitors.
And again, I'm saying this from the positive side, but it obviously means that the companies that are not doing this are going to suffer in a very big way, in a relatively short amount of time. By the way, if you are in a leadership position in a company and you're looking for assistance with that, please reach out to me, either through the link in the show notes, where there's a way to book time with me, or just on LinkedIn.
I'm there every single day, and I will gladly come and provide you advice on how to proceed and what's the best way forward. Now, to establish the fact that things are moving faster:
METR, which is an organization that we talked about many times in the past on this podcast, has developed a benchmark to measure how fast AI is accelerating. The way they measure it, just as a quick reminder, is by seeing how long a task AI can work on in a single session and still complete it successfully 50% of the time. And the 50% success rate itself doesn't really matter, because they just keep on comparing at a 50% success rate over time.
So a 50% success rate is obviously not acceptable from a business perspective, but that's not what they're trying to measure. They're trying to measure, from one model to the next, how much longer a model can work and still complete tasks 50% of the time. Well, they got to the limit of their previous benchmark because the AI was able to complete the tasks successfully almost every single time, relatively quickly.
So they developed a new benchmark that they're calling TH 1.1, versus the TH 1 that came before. But even with this new benchmark, what they found is that the models are now doubling the length of time they can work on a specific task every 131 days, versus 165 days, which was the assumption previously. So previously, models doubled the amount of time they could work on a task every 165 days.
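To put those two doubling times side by side, here is a quick back-of-the-envelope sketch in Python. It assumes the trend is a clean, steady exponential over a full year, which is my simplification and not something METR claims, and the function name is just made up for illustration.

```python
def yearly_growth(doubling_days: float, days: float = 365.0) -> float:
    """Growth factor over `days` for a quantity that doubles every `doubling_days`.

    Assumes steady exponential growth (an illustrative simplification).
    """
    return 2.0 ** (days / doubling_days)


# Doubling times quoted in the episode: 165 days (previous) and 131 days (new).
old_rate = yearly_growth(165)
new_rate = yearly_growth(131)

print(f"doubling every 165 days: about {old_rate:.1f}x per year")
print(f"doubling every 131 days: about {new_rate:.1f}x per year")
```

Under that assumption, a 165-day doubling time works out to roughly 4.6x per year, while a 131-day doubling time works out to roughly 6.9x per year.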
Now they're doing it every 131 days, which tells you again that the improvement is accelerating between the different versions of the models. So what I am feeling, and if you are in this universe on a regular basis, you are feeling it as well, is not a subjective feeling. It is actually what's happening right now. And now to a lot of quick and interesting rapid-fire items. Apple just released Xcode 26.3, which integrates coding tools from both Anthropic and OpenAI,
so the two main things we just talked about, into the Apple development environment, which means you can now use Anthropic agents and OpenAI agents to do development for anything in the Apple ecosystem. So you can develop iOS, macOS, watchOS, tvOS, and visionOS applications using these very powerful tools. Cursor, which is probably the best-known name among the AI development platforms out there, and definitely the one that is used the most by developers, not just by vibe
coders, just released an interesting blog post detailing their findings on how to run multiple AI coding agents simultaneously on a single code base and introducing multiple interesting concepts. And you can see this aligns perfectly with the latest releases from Kimi, OpenAI, and Anthropic on running multiple agents in parallel on a single task. It's a very technical and yet very interesting paper.
And we'll put the link in the show notes, but the bottom line is that the world of parallel agents is already here, and it is going to change every single thing that we know about how work can be done, in a very dramatic way, in the relatively near future. And basically what they're saying is that they've resolved all the issues around conflicts between all these agents. And they've actually resolved it in a very interesting way, where they allow the agents to make mistakes.
So in their initial attempts, they were trying to get the agents to do everything perfectly on two different levels: both the individual piece of code that each agent is generating, and also preventing overlaps and conflicts in the code.
And they basically learned that tolerating a small error rate in both the code and the conflicts actually generates better results, because these errors get resolved by other agents afterwards, providing an overall faster, more efficient, and yet accurate system in the end. And what they're saying is that at scale, the limitation right now is not the agents or their coordination, but the ability to read from and write to the hardware.
So the disk I/O that is used to write or retrieve the data required by these agents, the actual hardware, becomes the limit on how big or fast the system can work, because hundreds of agents running in parallel can generate gigabytes of new code every single second.
This is obviously a very profound situation to be in: the limiting factor for the amount of code you can generate is how fast the disk can write the new information that gets generated, and not how many agents can run in parallel.
Now, something from two weeks ago that I did not report on, but now that we're reporting on a lot of interesting releases I feel that I have to: Google DeepMind released Project Genie, which is the next version of the prototype that was previously released only as a preview for specific companies and organizations. And it's a tool that uses several different capabilities that Google has developed in order to create 3D worlds in real time.
So we talked about world models on this podcast several times in the past. Genie 3 is one of the most advanced of these. And the way it works is that the user defines the world they're trying to create and the character they're trying to create. And then Genie creates it in real time, and it currently allows 60 seconds of free navigation through this new world with whatever character you invented. And it is doing this at a relatively decent quality of 720p at 24 frames per second.
So think about coming up with an idea for a new world and being able to navigate that world with whatever kind of agent represents you in that universe. Whether a bird, a person, a submarine, literally whatever you can come up with can be the entity that is navigating, and it can be a third-person view or a first-person view from that thing, allowing you to navigate that universe.
This is just another step in the intensifying race in the world-models space, where Fei-Fei Li has launched World Labs, which is developing something very similar with a product called Marble. Runway has launched their own world model. And we also talked on the podcast about Yann LeCun, previously the chief AI scientist of Meta, leaving and starting his own company that is going to focus on developing world models, which has incredibly interesting implications.
One of them might be that, just like there was just the SaaS apocalypse, we could see a gaming apocalypse happening, because if anybody can create any game on the fly just with a prompt and then share it on an app store or a gaming platform, then why do you need large companies to create new games? Again, I don't think it will replace all games.
I think it will replace all the less sophisticated games, and it will do this very quickly, and I think it will have profound implications for the gaming world. But to me, and I talked about this in the past when we talked about these models, I see a very scary future of addiction, because if you can experience
anything in any world, whether a realistic world or an imaginary world, and you combine that with virtual reality headsets that will get more and more advanced and with haptic devices that will let you feel and touch anything in that world, we are going to end up in a Black Mirror episode where people will prefer spending time in these virtual universes over the real one.
And I sadly see stuff like that happening in the not-too-far future. Sticking to the visual world and to interesting releases this week, Kling AI just released Kling Video 3, which is now one of the most advanced video and image generation platforms. What's interesting about it is that they created one unified multimodal AI generation engine that integrates text, image, video, and audio into one single training framework, which gives it extremely powerful capabilities.
The length of the generated video jumped from 10 seconds to 15 seconds, and you can control the exact duration when you set up the preset options of the run. They added a really interesting multi-shot feature that allows you to generate up to six distinct camera cuts within a single video, which gives you a lot more control from a storytelling perspective. They also now have native audio generation.
So now it allows you to create the speech of the characters, sound effects, and music in the background, all while generating the first run of the video, similar to what we know from Veo 3. Now, from a quality perspective, the model outputs native 4K resolution at 60 frames per second. It has multi-language support for Chinese, English, Japanese, Korean, and Spanish, as well as different specific dialects.
And it has very strong consistency of subject and environment across multiple generations, which enables you to generate really long videos and combine them together seamlessly, because the environment and the person stay consistent between all these generations. Staying on video generation, Meta just announced that they are launching Vibes, which is a standalone AI video creation app that lets users generate videos from scratch or remix existing content.
You can then add visuals, add music, adjust different styles, and then post directly to the Vibes feed on both Instagram and Facebook. So this is a very similar concept to the Sora app from OpenAI, which tells you that we're going to see a lot more AI-generated content in our feeds.
The more interesting announcement from Meta is something that we discussed briefly last week, which is Mark Zuckerberg basically saying that this year Meta is planning to release completely AI-controlled, autonomously generated advertisements on their platform. So you as the user will submit a product image or a URL and a specific budget, and the AI will autonomously complete everything: images, video, and text. It will determine the optimal targeting,
select the best platform, and basically run the ads for you. This is the collapse, if you want, of an entire industry of marketing and ad agencies that have been doing this since the launch of e-commerce about 25 years ago. And I'm gonna end with a funny, interesting, and yet scary thing that happened. It actually happened several months ago, but I just caught it right now because apparently the phenomenon has been growing.
So an AI-generated travel article on the Tasmania Tours website fabricated, basically hallucinated, a non-existent tourist attraction called the Weldborough Hot Springs. And it included a detailed description of how beautiful the springs are, and also vivid AI-generated images of steaming pools in the lush rainforest. Now, actual real tourists are driving hours through Tasmania's countryside to reach the destination, only to find out it doesn't actually exist.
Now, a local pub owner said, and I'm quoting, it was only a couple of calls to start with, but then people began turning up in droves. I was receiving probably five phone calls a day, and at least two to three people arriving at the hotel looking for the springs. Now, one of the reasons it was so successful in confusing people is that this article, which again was AI-generated, also included well-known attractions like the Hastings Caves and other known attractions in Tasmania.
So it made the made-up location a lot more believable, because a lot of people knew a lot of the other attractions. And by the way, that's what I see with most hallucinations on the day-to-day. They don't just show up as a standalone, clear thing. They are well blended into a great output where most of it is real and some of it is made up. And that's what makes it very, very tricky to find hallucinations. But now I wanna connect some of the dots from this episode together.
I want you to combine this with how we started this episode. So think about it: this particular article is just one article that was created by a person using AI and then posted without checking all the information, but we're talking about a very near future where completely autonomous work cycles are completed by multiple agents running in parallel 24/7, generating whatever output they're generating.
It could be code, but it could be new business protocols, new products and services, or new content or articles on behalf of your company. All while potentially hallucinating in the process. Now, while I'm telling everyone, including myself, to check every output the AI gives you because it might be hallucinating, the reality is that we are getting very close to the point where it is impossible to verify the amount of work that AI is going to generate.
Now, whether it is possible or not, it is very easy to stop doing this. And I told you in the beginning that when you use these agents, they ask you for approval to take different steps. It's the same in Claude Code. It is the same in Claude Cowork. And I very quickly went from looking at every single step and approving it to just giving it a blank check to do whatever it wants, every time it asks for access to a specific platform.
Just because it is so tempting. I just want to allow it to do everything it needs to do. I just want the output, and I want it fast, and I want it to be efficient, and I don't want to be the showstopper or the blocker for the effort, so I'm just allowing it to do the work. Now combine that with systems that are 10x faster and that are deploying these kinds of agents at a global scale, and you understand why this can go terribly wrong.
Now, I'm not saying this to scare you, I'm just telling you what's coming. And sadly, I don't have any good solutions or even suggestions on how we handle this reality, where independent agents are working at scales we cannot monitor, and they're generating content, which may or may not be real, that is gonna be used across literally everything that we do.
And this is without even talking about the possibility that there are actually bad actors who will do this on purpose, who will create such things at scale with swarms of agents to convince millions or potentially billions of people of whatever they want to convince us of. So on this positive note, I will end today's episode.
And I will mention that all I can do at this point is keep you updated and keep you as aware as possible of what is happening, what I think is going to happen, and how it is gonna impact our world. And I really hope that the fact that more of us have that knowledge somehow allows us to be better prepared and reduces the overall risk. If you're finding this podcast helpful, even if scary at times, I would appreciate it if you rate it on your favorite podcasting platform,
write us a review, and share it with other people who can benefit from it. It will take you seconds to do. Literally click the share button and then share it with a few people who can benefit from it as well. Keep on testing AI yourself. Keep sharing with the world what you find. This is our way to be better prepared for what is around the corner. That's it for this week. And until the next episode, have an amazing rest of your weekend.
