Claude Mythos: Anthropic's Leak That's Too Dangerous to Release

⁠¶ AI Breakthroughs and Leaks

00:00

Ejaaz: Three weeks ago, rumors broke that a major AI lab had built a model more powerful, more dangerous, and more expensive than any AI model that we had seen before. We didn't know which model lab it would be. We didn't know what the model was called. And then just a few days ago, Anthropic leaked a model called Claude Mythos, which is supposedly more powerful than any model that they've ever built before, a tier above Opus 4.6, which is what we see today.

00:26

Ejaaz: This model is actually so good that it is considered a cybersecurity threat and can't be rolled out to the public just yet. But it's not just Anthropic that's building a model that is close to AGI like this. OpenAI has a model codenamed Spud, Google has a model codenamed Agent Smith, and there are many more to come this year.

00:43

Josh: But the Anthropic leak wasn't intentional. This was discovered by accident last Thursday, March 26th, by a Fortune reporter who discovered that Anthropic's content management system had a configuration error. And for those who aren't familiar, the content management system, it's how the web server serves files. And within that, there was a config error that leaked nearly 3,000 unpublished assets sitting in this publicly searchable database.

01:07

Josh: Anyone could find them. So two independent security researchers, they went through, they confirmed. And among these files were two blog posts describing a model named Claude Mythos and a new tier named Capybara.

⁠¶ Claude Mythos Unveiled

01:21

Josh: Anthropic immediately removed access to all of this as soon as it came out, but then later on an Anthropic spokesperson confirmed that it represents a step change in AI performance and is the most capable model we've ever built. So they confirmed what we're seeing here is real. Now the problem is, like this image on screen suggests, we're missing a lot of information. This is a leak that something like

01:42

Josh: this exists, but we're not sure exactly what. What we do know is that there is the new model tier, Ejaaz, like you mentioned, named Capybara. It is the new tier that sits above Opus. So now the lineup will kind of look like Haiku, Sonnet, Opus, and then Capybara at the top. It doesn't really sound quite right. Maybe that's an experimental name. They might find something better. And then Mythos is the specific model name within that tier.

02:04

Josh: So you can think of Capybara as the weight class and Mythos as like the fighter. It's the specific model. Now, according to the leaked documents, this dramatically outperforms Claude Opus 4.6 on basically everything, but particularly coding, academic reasoning, and the cybersecurity benchmarks. And I think the cybersecurity one is one of the more interesting points here, because it's so powerful at cybersecurity that one of the main reasons why they

02:26

Josh: can't release it is to actually prevent people from using it maliciously. Is that right?

Ejaaz: Yeah. So actually, if we rewind to about a month and a half ago, Anthropic's head of AI security, who's actually a legend in the industry, gave a talk about Claude Opus 4.6, when it had just been released. And his talk described how the model was pointed at five to 10 very popular open source code bases with no instructions given.

02:53

Ejaaz: And what the model did was very, very interesting. It scanned all those code bases and discovered 500 major security flaws that expert human AI security researchers couldn't discover in the decades that they'd been staring at and using these exact code bases. So Claude did in a couple of hours what many security researchers couldn't do. We're talking about like millions of compute hours and time spent staring at these code bases, testing them.

03:20

Ejaaz: Claude Opus 4.6 managed to figure this out. Now, this created a lot of excitement, but also a lot of concern. Now, because these AI security researchers had a good heart, they weren't using this maliciously. But you could imagine that if this model had been handed to, say, a malicious actor, they could have exploited these for many different reasons.

03:40

Ejaaz: And so these exploits were surfaced and they were fixed. But the question now becomes, what if a more powerful model was made more readily available to anyone, or to an attacker, for example a foreign adversary, that could discover and exploit any future bugs?

⁠¶ Cybersecurity Concerns

03:55

Ejaaz: That's the concern that I have personally around Claude Mythos, or Capybara. This model is supposedly meant to be a tier above anything that we've ever seen before. Apparently it is amazing at discovering and exploiting exploits. So if it is, let's say, two orders of magnitude, let's be conservative, two orders of magnitude better than Opus 4.6, we could have a real problem on our hands.

04:16

Ejaaz: And so what Anthropic has done now is they've started to slow-release this secret model, Mythos and Capybara, to cybersecurity experts first. Why? Because they want them to figure out how they can harden their own defense systems before they publicly release this model and someone, maybe a nefarious attacker, might use it for unachievable gain.

04:37

Josh: I think it's ironic that the company building what it describes as an AI with unprecedented cybersecurity capabilities leaked it because someone misconfigured their blog.

04:45

Josh: Like the irony there is too strong. And you have to wonder, you have to really ask yourself the question, well, what if this model is so smart that it's leaking itself, if it's like poking holes to let people secretly find it? I don't know. The one thing for sure is that, one, this model is going to be incredibly expensive to run, currently at least.

05:03

Josh: That's part of the reason why we're not seeing it now. But the second is it's going to be unbelievably powerful. And the progress that we've had in the last year is going to probably look like nothing compared to what we're going to get for the next three quarters. The market also very much felt the effects of this because, oh my God, these stock charts look absolutely horrendous.

05:19

Ejaaz: Yeah, CrowdStrike, which is like the major cybersecurity firm, was down a couple billion on the news. And Palo Alto Networks, which is another similar company that competes in this space, also suffered from this. Now, these two charts that I'm looking at right now for these specific companies, Josh, give me a little PTSD or deja vu.

05:38

Ejaaz: Because we were talking about this, I think, four weeks ago when Anthropic released their Claude security review feature. Which, you know, wasn't anything to do with Mythos, but basically helped review the vibe code that you produced using Claude. And so cybersecurity stocks dumped again. This is happening seemingly on a monthly basis at this point.

05:58

Josh: Even though these charts are down quite a bit, I'm not sure how concerned the market needs to be immediately, because it appears as if this new model that's coming, this new cybersecurity specialist, is really compute intensive, so much so that it's almost going to be impossible for them to run across all the accounts currently without some serious compression and iteration and figuring

⁠¶ Market Reactions

06:20

Josh: out how to run this more optimally. And it seems like we're starting to see those growing pains, right? It's like, as they're training models like this, as they're running them on their own servers, it's starting to affect the average user. I know sometimes I'll wake up and I'll feel like my Opus is running a little bit dumber than it was the day before. And we actually have data that backs this up.

06:36

Ejaaz: Yeah, so basically over the weekend, Claude servers basically went down or were majorly impaired. There were a bunch of different outages. People were reporting very, very reduced quality in their interactions with Claude. And this has been kind of like a repeating trend over the last couple of weeks. And now we might have the answer why.

06:55

Ejaaz: Typically, major AI labs, the last public bit of information that we had was from OpenAI's 2025 run of a major model. They dedicated 30% of their available compute to a training run. Now, the rumors state that for Claude Mythos, they've dedicated even more and that's like the major architectural breakthrough that they've made.

07:15

Ejaaz: If they've done that, that might be the reason why we aren't able to use the best version of Claude as consumers, because they're too busy using the compute to train the next step or tier in model. I don't know if this is a good or bad thing, but one thing it definitely like screams at me is like, we need a ton more compute.

07:34

Josh: Big time. And it's amazing to think about how far we've come just in the last three months leading up to this moment here. I mean, when you think about it, over the winter break is when people really started to take vibe coding seriously. And since then, companies have gone from a very small percentage of code to almost 100% of code. I mean, this is saying 80% plus of all code deployed is written by Claude Code just for Anthropic.

⁠¶ Predictions for AI Models

07:54

Josh: It's unbelievable. We started with Opus 4.5, which was released in November, and then Opus 4.6 came in February, which took us from a 200,000-token context window to a million. And now whatever this new thing is is going to really drive up the coding capabilities in a really big way. And I think it's probably worth checking in on which model is going to be the strongest model, which company

08:18

Josh: has the best model through the end of June, and thanks to Polymarket we have some interesting stats on this. So the people are betting that Anthropic has a 66% chance of having the best AI model in June, which is huge. And that number has increased very significantly recently. If you look just back in February, it was Google who was the heavy favorite with an almost 80% chance or 70% chance of having the best model.

08:41

Josh: That has changed recently in a big way, perhaps because of this leak.

08:45

Josh: But I'm not sure if this is fully up to date, and it may be missing some information, because we have some news on OpenAI and Google, who are planning to release something really important too. And thank you, Polymarket, for sponsoring that part of the show. But let's talk about OpenAI. There's a new model, codenamed Spud, that's coming, and this is probably going to be the Mythos competitor. So what is this looking like?

Ejaaz: Yeah.

09:05

Ejaaz: Um, that's the issue. We don't really know. All of these models, we don't have the specs. We need the specs to talk about them. There's a few trends or patterns that are happening amongst the hottest, or should I say, top two or three AI labs. We've got Anthropic releasing Mythos, which is their AGI or pre-AGI model, a massive, massive leap ahead. OpenAI is working on the same thing. They've been secretly working on a larger model.

09:30

Ejaaz: This has gone through a few different names. If you remember, Josh, by the end of the year, I think it was referred to as codename Sprout. And now it's referred to as Spud. So I don't know if that implies that it's grown massively since then. It's growing. But these models are supposedly meant to be anywhere between 10 to 20 trillion parameter models.

09:47

Ejaaz: Now, for context, the largest models that we currently look at right now are between one and two trillion. So this is a major order-of-magnitude larger model. They're going to be compute intensive. They're going to be very expensive to serve. So we need to figure out how to scale AI infrastructure and a bunch of other things.

⁠¶ OpenAI's New Direction

10:04

Ejaaz: But OpenAI's model is codenamed Spud, and it's meant to be the competitor to Mythos. People are anticipating that it might be something like GPT 5.5 or rather GPT 6. So again, a tier above what we see today. It's going to be advanced in coding, reasoning, and a lot of the things Anthropic's is as well. When I look at this, Josh, personally to me, this seems to be, one, a massive bid to try and leapfrog each other.

10:31

Ejaaz: And number two, maybe try and juice their numbers ahead of a potential IPO. I don't know whether your reaction to this is the same, but that's like my gut reaction when I read news like this.

Josh: Yeah, it's probably both. They want to juice up things before the IPO, but they also just want to win. And I have some pretty strong speculations just based on vibes of what this is going to look like.

10:48

Josh: I think we've been seeing this recent convergence around OpenAI, particularly on focus and on really dialing in what they're focused on. And we saw a big move last week when they removed Sora. They totally destroyed Sora.

11:01

Josh: They moved a lot of the teams together. They made their chief of product, um, the chief of like AGI release, and it appears as if they're building a mega app, based on the rumors. So part of the reason why I have a difficult time using OpenAI's products is they're kind of spread out everywhere. There's like the Sora app was one, there's Codex, then there's their browser, then there's ChatGPT, and there's a lot of different software.

11:26

Josh: And the same is true with their models, or it was at least, where there was GPT 5.3 Codex, and there was 5.3 High, Mid, Low. There's all these different models that really complicate things and confuse things. With 5.4, they made a singular model. Now 5.4 does your coding and it does the reasoning all in one.

11:42

Josh: And what I suspect with this new model, codename Spud, is it's going to be the kind of pinnacle of this focus, where I'm hoping they release this with their new application, with a singular model. So there's one model that is all-knowing. There's one application, similar to what Anthropic does with the Claude Desktop app, that has all of the functionality under one roof. And I think they're going to probably use this as a point to really...

12:04

Josh: lean into that focus instead of distributing this across a lot of different areas. And I'm hopeful that that will meaningfully change OpenAI more so than it'll change Anthropic, because it actually changes the way that users interface with the product and it becomes a much better product.

Ejaaz: Yeah, I think for the majority of last year, I was pretty upset with the way that Sam and OpenAI were focusing on so many different things.

12:25

Ejaaz: I was just like, just focus on creating a really good model. You're being left behind in coding, Anthropic's eating your lunch, like figure this out. And then since their code red of like, what was it, November last year, they've been like reallocating compute, money, data, and all their resources to focus on building the best general model and the best coding model. So we're starting to see the fruits of that labor.

12:45

Ejaaz: I have a lot of faith now in OpenAI that they're going to produce a really good product that will compete with the likes of Anthropic, which has been eating their lunch. When I look at like the last week, it seems like it's pretty negative for OpenAI. You mentioned that they killed Sora. They also killed the $1 billion deal that they had signed with Disney.

13:00

Ejaaz: And they also shut down ChatGPT adult mode and a bunch of like consumer shopping apps and their like app marketplace as well. They're just focused on these few things right now. But then the other thing is Sam is also kind of defaulting on a few of the major GPU and data center deals, right? So we had the OpenAI and Oracle Abilene deal fall through where they couldn't finance it for a variety of different reasons.

13:24

Ejaaz: Then the other thing is they're defaulting on purchasing up to 40% of the world's memory supply because they haven't figured out their finances right now.

⁠¶ Google’s Agent Smith

13:32

Ejaaz: So I think that OpenAI is going through kind of like a puberty period where they're figuring their stuff out and where to reallocate resources. But I think they're going to pull through.

Josh: And it also seems like this is indeed a serious breakthrough. I mean, Sam, in an internal memo to employees that got leaked out, he said things are moving faster than many of us expected. And he called it a very strong model that can really accelerate the economy.

13:53

Josh: Those seem like pretty large claims to make internally with employees who are also kind of in the know and aware of what's going on. And I just think that a lot of us who are sitting outside these labs are not entirely wrapping our heads around how much progress is actually about to hit us over the next couple of months with these new model releases. It seems like they're step function improvements.

14:15

Josh: And one of the employees from OpenAI actually hinted that Spud contains a capability that is very different from what we've seen before. So while there aren't specifics, there are clearly a lot of these huge novel breakthroughs incoming, which is worth looking out for. There's one final model release, model leak that we have from Google, who has been doing well, kind of chugging along slowly in the background. And this is called Agent Smith.

14:38

Josh: It's a secret AI tool. Do you have any information on this one, Ejaaz?

Ejaaz: Yeah, so there was like a leaked report from an insider at Google. Apparently, Google employees are using a new internal tool called Agent Smith that can automate tasks such as coding, according to three people that were familiar with it.

14:53

Ejaaz: The way that this product is supposed to work is within their vibe coding platform called Antigravity, which exists today but hasn't really had a major upgrade for, let's say, a couple months now, which is like an eternity in the AI world. So they're releasing a new AI model called Agent Smith that is supposed to take a multi-agent approach and use an upgraded version of Gemini 3.1. So it's probably not going to be 3.1. It might be 3.5 or maybe even 4.

15:20

Ejaaz: Again, another order of magnitude leap up. So what we're seeing here is Google working on an AI coding model competitor to try and catch up to Anthropic and the likes of OpenAI's Codex. You've got OpenAI trying to reallocate resources and focus on building the best general model and catch up with Anthropic, which they have at coding.

15:36

Ejaaz: Then you have Anthropic trying to keep these two at bay and make the next order of magnitude up, spending all their compute, but coming at the expense of serving their existing users, which they're adding like a million a day, who are reporting, you know, Claude servers being down and reduced quality of usage. So this is a very, I can like feel the tension in the air between these three companies right now. I don't know what Meta is doing.

15:58

Ejaaz: I don't know where Grok is. I'm rooting for them. I hope they catch up. But it seems to be these three major competitors right now that are in the running for winning this race.

⁠¶ The AI Landscape

16:05

Josh: They're firing. I mean, in the last 90 days, since we started this year to now, we went from 200,000-token context windows to a million.

16:11

Josh: We went from these coding assistants to compiler-writing agents who went from writing a very small amount to now over a quarter of Google's production software and 80% plus of Anthropic's software. Everything we learned this week, the frontier is going to keep moving faster and faster. So we're in for a crazy Q2, Q3, Q4, just a crazy 2026. And as all these things happen, as these IPOs start to happen and they get even more fundraising to deploy

16:35

Josh: these AI data centers at scale, things are really going to get weird in a hurry. But we will be here to cover it as always. Um, if you enjoyed this episode, please don't forget to share it with your friends, uh, like it on YouTube, don't forget to subscribe. If you listen on a podcast player like Spotify or RSS, you could rate us five stars there. It's always really appreciated. Ejaaz, any final notes before we sign off for the day?

Ejaaz: We've

16:56

Ejaaz: been absolutely killing it on our side, uh, loads of new subscribers, loads of new listeners. Thank you guys so much for joining us. Um, and yeah, I have a request, because we always like to give out homework at the end of the episode. Um, if you're listening to this and you are an insider at Anthropic, OpenAI, or Google and you are willing to give an anonymous tip to our accounts, please spin up an anon account on X/Twitter and DM us.

17:20

Ejaaz: I would love to hear from you.

Josh: That'd be great. Well, yeah, thank you guys for watching. We'll see you in the next one.

Transcript source: Provided by creator in RSS feed