THIS WEEK IN AI: ChatGPT 5.5 Beats Claude Mythos, SpaceX Cursor Rumors, Google's New TPUs

⁠¶ Launch of ChatGPT 5.5

00:00

Ejaaz: The most powerful model in the world is here right now. In fact, it's so good that it beats Claude Mythos. OpenAI just released ChatGPT 5.5 and it crushes Claude on every single benchmark. It's the new number one coding model. It can do 20-hour tasks that expert software engineers sometimes can't do. It's already discovered groundbreaking solutions in maths and frontier sciences such as genetics.

00:24

Ejaaz: And it's cheaper than GPT 5.4. This is the result of two years' worth of frontier research released in this one single model. In fact, it's so good that an NVIDIA engineer said, and I quote, "Losing access to GPT 5.5 feels like I've had a limb amputated."

00:40

Josh: I think a lot of people are going to compare this to Opus 4.7, and that's fair, but I really think the true comparison is to Mythos, because Sam Altman just posted something as the model was coming out that felt very much like a jab at Mythos. And we're going to get into the benchmarks comparing them, many of which will

00:56

Josh: actually beat the Claude model. But what I find most interesting about this post is the second paragraph, where he says, we believe in democratization. And he mentioned specifically, we have been tracking cybersecurity as a preparedness category for a long time and have built mitigations we believe in that enable us to make capable models broadly available.

01:14

Josh: So this is very much a dig at Mythos, which is, as we all know, privately available, gated to only the companies that are given access to it. ChatGPT and OpenAI are like, hey, we're going to give you the powerful cybersecurity. We're just going to bake the precautions into the model so that everyone can have it. And it ends by saying this really sweet thing. It's like, we love you and

01:32

Josh: we want you to win. We believe in everyone having access to this intelligence. And I really respect that. And I think it's an awesome way to set the precedent for what the next generation of these models is going to look like. But before we go any further, let's talk about the model itself. It's out right now. If you have a ChatGPT membership, you can go and use it, go and play with it.

01:48

Josh: Ejaaz, what's the TL;DR? What are the high-level things that everyone should know? What's most new and noteworthy about GPT 5.5? Ejaaz: Okay, so inspired by your Mythos comparison, the first question that pops into my head is, I use Claude Opus 4.7 every single day. So I'm like, is it better than this? Like, should I be switching back to ChatGPT right now?

02:06

Ejaaz: The answer might be yes. So if we look at the benchmark scores right here, GPT 5.5 on the left over here absolutely crushes all the standard benchmarks that these frontier models are weighed against. And if you look on the right over here, Claude Opus 4.7 either doesn't even measure in a particular category, or it's completely beaten by GPT 5.5.

02:26

Ejaaz: In fact, the only stat where GPT 5.5 doesn't beat Opus 4.7 is something called Software Engineering Benchmark Verified Pro or something like that. It's like the pro software coding situation. But there's a footnote at the bottom of this blog where OpenAI states, Anthropic has publicly said that they might have gamed that particular benchmark and they need to be re-evaluated. So we might have a complete clean sweep for 5.5 as we see today.

02:52

Ejaaz: So it's an incredibly powerful model. But a question that popped into my head is, does it actually beat Mythos? And we have a direct comparison right here. Josh: Yeah, so it shows that it does across some benchmarks. Now, again, these benchmarks are pretty fuzzy. We don't know which ones are gamed to do what. But there is a world in which GPT 5.5 will outperform Mythos on some things. Which ones, we're not entirely sure.

03:13

Josh: I think, as we kind of figure out ways to describe GPT 5.5, it seems as if it's their first attempt at making a model built for autonomy instead of answers. I think a lot of the benchmarks that they're working on are in agentic coding, things like it handles tasks that are 20 hours long. We'll get into that. It's doing 85% of OpenAI's internal work already.

03:33

Josh: And it also helped rewrite the infrastructure that built it. There was this amazing quote in the blog post. It said, OpenAI says 5.5 itself helped optimize the stack that serves it. Codex analyzed weeks of production traffic and wrote custom heuristics for load balancing that boosted token generation speed by over 20%. So they're using the model to actually build the model and make it

03:53

Josh: maximally efficient based on the data that it's collected from users like us who are interacting with the model on a daily basis. So it's very smart, it's very clever. It's not just there to give you answers. It's there to think deeply and actually solve problems for you in a way that I think Mythos and a lot of these other frontier models are kind of pivoting towards now.

04:11

Ejaaz: The great thing about this model release is it reveals a few things that OpenAI has as an advantage against, say, a frontier lab like Anthropic. It's clear looking at these benchmarks compared to Mythos, which, by the way, the entire world is spiraling because of this model, because it's going to have the cybersecurity ability to take over any kind of government system.

04:30

Ejaaz: This model is pretty close, and Sam is going to be releasing this publicly, or OpenAI is going to be releasing it publicly for everyone to use. So a question that pops into my head is, does this mean that it's a matter of compute, and OpenAI just simply has more of it?

04:44

Ejaaz: Certainly, if you compare Sam Altman's ability to acquire compute and spend all these trillions of dollars to acquire it versus Anthropic, Anthropic has been extremely conservative, and now they're struggling.

⁠¶ Comparing with Mythos

04:54

Ejaaz: They recently signed a $5 billion deal with Amazon, which we'll get to later on. But the point is, this is a tale of two stories. Either OpenAI has enough compute and they're about to leapfrog Claude because of that, and they're proving that through this model that is a very good answer to Mythos. Or, and this is the alternative side, Anthropic's Mythos model is just plainly better than 5.5.

05:15

Ejaaz: And these benchmarks aren't actually verified, which is technically kind of true, because I don't know how official these things are. These are just through tests that a small set of users have done. So it's a game of both. I'm sure Anthropic is watching this and thinking, hmm, maybe we should roll out Mythos, but they don't have the compute.

⁠¶ Inference and Pricing

05:30

Josh: Yeah, they don't have the inference. In fact, speaking of the inference, Sam actually made a post saying that he's really...

05:36

Josh: Excellent work by the inference team to serve this model so efficiently. He wanted to really highlight the fact that, to a significant degree, they've become an AI inference company now. And I think that's a really big difference from what was previously stated. Like, Anthropic has a really tough time serving compute, and we see that. And even if they had Mythos available in a way that was safe, they can't serve it. OpenAI can, and we see it reflected

05:55

Josh: in pricing, because, I mean, we have some pricing for this model, right? And it seems as if it's roughly at par with 4.7, if not slightly better? Ejaaz: Slightly. It's slightly more expensive, but not by much. So for every million tokens of input, it's the same for Anthropic's Opus 4.7 and GPT 5.5. It's $5 in, but the output is $30 per million tokens for 5.5 and $25 per million tokens for 4.7. So it's a little more expensive, but here's where you actually have more of

06:27

Ejaaz: a bargain using the more expensive model, 5.5. It is cheaper than GPT 5.4, and it uses tokens way more efficiently to think. So what does that mean if you are an enterprise that wants to plug in this AI model and not worry about it and just have it power your entire profit engine?

06:45

Ejaaz: Well, you end up using fewer tokens, so you hit your rate limits at a much slower rate, which means that you end up getting more bang for your buck as long as you use the model like 24-7 or you use it effectively. If you are just kind of out there using 5.5 to, like, ask questions that you should maybe be asking Google, this is probably not the model for you. But otherwise, super powerful one.
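
The pricing trade-off described here can be worked through with a small calculation. The sketch below uses only the per-million-token prices quoted in the episode ($5 input for both models, $30 output for GPT 5.5, $25 output for Opus 4.7). The workload sizes and the 20% token saving are hypothetical numbers chosen purely for illustration, not measurements of either model.

```python
# Prices in USD per million tokens, as quoted in the episode.
PRICES = {
    "gpt-5.5": {"input": 5, "output": 30},
    "opus-4.7": {"input": 5, "output": 25},
}


def job_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one job at the quoted per-million-token prices."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000


# Hypothetical workload: 2M input tokens and 1M output tokens on Opus 4.7.
opus = job_cost("opus-4.7", 2_000_000, 1_000_000)

# Assumption for illustration only: GPT 5.5 needs 20% fewer output tokens.
gpt = job_cost("gpt-5.5", 2_000_000, 800_000)

# Output-token ratio at which the two output bills are equal (25 / 30).
break_even_ratio = PRICES["opus-4.7"]["output"] / PRICES["gpt-5.5"]["output"]

print(f"Opus 4.7: ${opus:.2f}")
print(f"GPT 5.5:  ${gpt:.2f}")
print(f"Break-even output ratio: {break_even_ratio:.3f}")
```

Under these assumed numbers, and holding input tokens equal, the higher output price is offset once GPT 5.5 uses fewer than about 83% of the output tokens Opus 4.7 would need for the same job.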

07:07

Josh: Yeah. And if these prices don't mean anything to you, that's fine, as long as you have a $20 a month subscription. In fact, this is going to be available to free users fairly soon, I believe. But anyone who is a subscriber has access to this. You don't need to use the API. There's nothing fancy. You open up your app on your phone, you go to the web browser, it's there, it's available, ready to go.

07:23

Josh: Now, there's a few interesting things that you can do with this model that haven't previously been possible.

⁠¶ New Applications

07:28

Josh: And although we don't quite have access to it just yet, we're recording this right as the model got launched.

07:32

Josh: We do have a blog post from OpenAI themselves, who are showcasing a few demos. So again, take these with a grain of salt. These are straight from OpenAI, but they are seemingly pretty impressive and pretty noteworthy as to what they're capable of doing, starting with this space mission application, which is pretty cool and very reminiscent of the moon mission that we just had.

07:51

Ejaaz: So if you guys don't know, Josh has a secret. He has many secrets on this show. One is he's a massive space fan, and when he's not hanging out with me, he's doing space simulations on whatever he can. Well, okay, maybe part of that is a bit of a lie. But with this new app that we're seeing in front of us right now, this was completely vibe coded using 5.5. And it's used to simulate a specific space mission.

08:15

Ejaaz: Now, if this looks very familiar, it's because we just had a space mission. We went back to the moon for the first time in 53 years, pretty big deal. And we can see a pretty accurate simulation going on right here. So as you can see, there's various different toggles. The physics of the entire thing is very important.

08:30

Ejaaz: And that's another point I want to make about this model. It is being used for frontier research, not just in AI but in mathematics and genetics. Like, it made frontier progress on both of these fronts. And so what we're showing here is that this is a model that goes way beyond just text and telling you what could be. It actually implements this into a lot of different things and understands the world around it, which is extremely powerful.

08:52

Ejaaz: But we have another one here. We have an earthquake tracker. Josh: For anyone who wants to make websites, it's so good at making websites. And this appears to be one of the strong suits. In this case, there's a few things to highlight on this earthquake tracker. One, it's just a pretty elegantly designed website. But two, all of the graphics are interactive. You'll notice that they update

09:11

Josh: dynamically as you hover over them, as you click. It looks very clean. I assume that it is pulling up-to-date information from an API somewhere that it set up.

09:19

Josh: It is just truly competent and capable of doing these kinds of longer-tail tasks that are a bit more complicated than a static landing page, but have dynamic data and the richness that you would expect from a high-end, high-quality, polished website, except just built with an AI model by someone who doesn't need to know anything about coding at all. And then for the gamers, there's also another great example of a dungeon game,

09:41

Josh: which they're describing as a playable 3D dungeon arena prototype built with Codex and GPT models. Now, I think this is something novel to this setup, where Codex handles the game architecture, the combat systems, and the enemy encounters, and then the character models, the character textures, and animations were created with third-party asset generation tools, using something like ImageGen 2.0. So this is also one

10:03

Josh: of the earlier signs where you can actually merge a lot of these tools together to build something dynamic in a way that you couldn't have done before.

10:09

Ejaaz: Yeah. Actually, the quality of this game looks like something out of League of Legends or something like that. At least that's what it reminds me of. Like, these games are getting way more high-def than I expected. I know it's pretty basic, and anyone who's watching this with a finer eye can kind of pick at it, but it's cool. But for those of you who prefer, like, the more traditional side of games,

10:27

Ejaaz: this might be something that you can kind of vibe code in a couple of minutes. Now, it may look basic, but theoretically this is like a 3D, spatially aware game, and that's not something you could achieve, at least very easily, with previous models. What I love about this as well is they've also included the prompt for all of these things. So this is something that you can try right now. Like, look at this. And the prompt

10:49

Ejaaz: is no more than like, what is it? One, two, three, four, like 12 lines. 12 lines, dude. And you can have, like, a fully functioning game. You can probably then add an extra step or extra prompt saying, hey, can you deploy this to Vercel? And send that to your friends. Now you have a game. You're a game creator. You're a game developer.

⁠¶ Impressive Capabilities

11:05

Ejaaz: So the applications for this model cannot be overstated. I'm going to be very honest. I thought this model was going to be just an iterative upgrade. I didn't think it would get anywhere near Claude Mythos. Two stories have now revealed themselves, which is, one, it's the answer to Claude Mythos. And two, it's really damn good. I am now convinced that compute is everything,

11:26

Ejaaz: but not in the way that I thought it would be useful. I thought it would be largely for pre-training. But to Sam's tweet earlier on, and also in a recent interview with Greg Brockman, the president of OpenAI, they're going all in on inference, test-time compute, which just means that if you have more compute and if you have a good enough model, it can do the thing.

11:44

Ejaaz: This thing, like I said, built itself. It's a self-improving model. Very, very impressive. Josh: It's good for solving hard problems. It's good for thinking for a long time. In fact, they marketed it as a model that can now think for 20 hours coherently.

11:56

Josh: Which is almost a full day it can work on a problem. And what you're noticing from this prompt that's on screen is it doesn't take that much to get it going. You don't need to kind of spoon-feed it all the way through anymore. It can make decisions on its own. It can infer conclusions about what you want just based on the knowledge architecture that it currently has. It's amazingly impressive. In fact, one of the people who got access to

12:15

Josh: it early just posted on X. He's posting live as his prompt is seven hours into its task. It has been running for over seven hours. He said this has literally never happened before. The models would maybe run for 30 minutes or so, or, if you really shouted at them, two to three hours. But he's on seven-plus hours. I think this is going to be fun for people with complicated things. If you really

12:37

Josh: want to make a triple-A-feeling video game or a simulator or a really complex website, this is the model to try out, and to use with Codex, and see how all these things kind of piece together. I mean, I didn't have my hopes very high, based on the incremental improvement from Opus 4.6 to 4.7. This seems like a very solid improvement over 5.4. Ejaaz: Absolutely. And listen, if you are listening to this and you're like, listen, I'm not a gamer.

⁠¶ Beyond Coding

13:01

Ejaaz: I can't waste my time with that. I focus on more serious things. Well, for you serious people, if you're a manager at a top company or whatever that might be, this isn't just a toy or a model used for coders. A lot of the examples that we just gave are around coding.

13:15

Ejaaz: You can use this for just admin stuff or managerial work, like the capability of this model to think more strategically and long-term and understand the context of the tasks that you're working towards. Like we said earlier, for coding specifically, it can work on 20-hour-long expert tasks. That also applies for administrative stuff or things that are more generalized, white-collar work.

13:38

Ejaaz: And so in this example, Noam Brown says, I'm a manager at OpenAI, but I'm using this model to basically manage my entire team and make sure we're focused on the right things. And guess what? The output of this team and this product has been pretty amazing. So all around, really excellent work by the entire team, and the inference team specifically, as Sam Altman says here.

13:56

Ejaaz: And yeah, I'm looking forward to using this thing. I don't have access to it right now. I've refreshed my account probably like five times at this point and it hasn't appeared. So maybe it's like a slow rollout. But if you're listening to this and you've tried it out, let us know what you're using it for. Let us know what amazes you. I really want to hear more.

14:11

Josh: Yeah, OpenAI had a pretty incredible week. And this comes on the back of their new ImageGen model that they just released, which was also unbelievable. If you haven't seen that episode, we just recorded it yesterday. So I would advise you to go see it because, oh my God, it is amazing.

14:23

Josh: We also recorded an episode on Apple's new CEO this week and what that means for the company, as well as the hardware race and how this, I mean, this model, Opus, no, not Opus, this is GPT. GPT 5.5 is very much part of the AGI class of models that is built on Blackwell chips, and we've recorded an entire episode all about that. Very interesting, very fascinating. Also interesting and fascinating because

14:46

Josh: as always, this is the weekly roundup. We have a few other topics to talk about. We have some news out of SpaceX, which is a pseudo-acquisition. Now, they haven't quite acquired Cursor, the company in question, but they have at least partnered with them, with the option to either buy Cursor for $60 billion or pay $10 billion for the right to actually work together. This seems like a big deal.

15:07

Josh: This seems like, I mean, xAI, we could call it SpaceX, but SpaceX AI is taking AI very seriously. They're currently behind. They clearly don't want to be behind. This is a huge step and a huge kind of show of support in Cursor, with this minimum of $10 billion going into accelerating their progress and trying to get themselves into this game. Ejaaz: This is actually a genius deal, and there are a few reasons why. So let me explain.

⁠¶ SpaceX Partnership

15:30

Ejaaz: If you're SpaceX AI, which by the way is a ridiculous name now, like, we'll just call them xAI, you are currently harboring 1 to 1.5 million of the frontier GPUs, mainly NVIDIA, in a warehouse. There's one issue. You're not really utilizing all of it, because xAI has had a bit of a slow start to training their models. What's a genius idea? If I rent those out to another company to train their own model,

15:59

Ejaaz: then we can make money from that. Okay, so that's win number one for SpaceX.

16:02

Ejaaz: But then they've thought of another thing, which is, huh, Grok isn't really good at coding, and we are losing the race every single day we don't update our model at coding, because Anthropic and ChatGPT 5.5 are completely running away with it. So how do they leapfrog and get ahead? They should acquire the company that is using their own GPUs to train a frontier coding model. So then the question becomes,

16:27

Ejaaz: well, who the hell is Cursor? What's the moat that they have? Like, why do they have a good shot of training a better coding model than Anthropic and GPT 5.5? Aren't those two companies way ahead? Well, the answer is not quite so. Cursor, for the longest time, was the number one platform and tool for people to use to do their vibe coding. Why?

16:46

Ejaaz: Not only did they have access to frontier coding models from Claude and ChatGPT, they also had something called an agent harness. Now, you'll notice in GPT 5.5, it's really good at coding because of something called agentic coding. That is something that Cursor pretty much pioneered.

17:03

Ejaaz: It's basically the harness, the prompts, the environment that they mold the model, or rather that they mold around the model, that makes it so good and intuitive and lets it remember the context across every single project, like menial things, like understanding your GitHub branches and working on separate flows at the same time. A lot of the top software engineers in the world right now use tools like Cursor and agentic coding to be able to pull this off.

17:27

Ejaaz: So Elon Musk thought, hmm, if I give you the GPUs to train a better coding model, which gives you a better product, I should have the option to acquire you. In acquiring you, I can integrate you with Grok, and Grok somehow becomes the number one coding model over the next year or so, depending on how this deal goes. And if the deal falls through and they create a really bad model, well, you pay me $10 billion for the service.

17:50

Ejaaz: Well, I pay you. Not a bad deal, not a bad deal. Josh: Yeah, it seems like they're going to be continuing to work with other companies to accelerate in places where they're currently weak, because, I mean, they're so strong at building out the hardware and creating these huge data centers. They need someone who can take advantage of all those GPUs. Hopefully this will help serve that cause. And that's not the only SpaceX news

18:09

Josh: this week. The other is that they have officially filed an S-1, which, for those who are not familiar, means they're going public. It's officially official. 100%, they will be going public this year. If there were any doubts, please let them be relinquished here. We have it: SpaceX will be going public. The most interesting thing from this was, I think, the share structure of how they're going to be organizing this for daddy Elon, who's going to be getting

18:32

Josh: quite a big payday if he does well. So we have on screen here just a series of some of the financials. I mean, we know Starlink as a business has been doing unbelievable. They have about $25 billion in cash, $92 billion in assets, $50 billion in liabilities. That's...

18:45

Ejaaz: Quite a lot of liabilities on this, my God. Josh: They got a lot of debt, man. I don't know. We'll see once they finally publish everything. I'm very excited for the first earnings report, where you really get a true peek behind the scenes of what's going on there. But it looks like it's going to be going public at a $1.75 trillion valuation. Now, in terms of pay structure.

19:02

Josh: Elon is poised to get 60 million shares, which is 11 tranches vesting in $500 billion market cap increments from $1.1 trillion to $6.6 trillion. So for those unfamiliar with the current ceiling, I think it's NVIDIA. NVIDIA is what, $5 trillion? Under $5 trillion? Close to $5 trillion? It's like 4.3. Okay, so not even close. They're like 20% away from $5 trillion.

19:29

Josh: SpaceX needs to be, what is that? Like 20-something percent more valuable than the most valuable company in the world. But if they do, Elon gets 60 million shares. Now, I haven't done the math on exactly how much that is. But if we make some assumptions here, the total value at vest looks like it could be about a quarter of a trillion dollars.
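
The vesting arithmetic Josh describes can be sketched in a few lines. The inputs below are only the figures quoted in the episode (11 tranches, $500 billion steps, a $1.1 trillion starting point, a rough $4.3 trillion NVIDIA market cap, 60 million shares, about $250 billion total value). Treating $1.1 trillion as a baseline, with each step above it unlocking one tranche, is an assumption made for illustration; the filing itself may define the milestones differently.

```python
# Figures quoted in the episode, in billions of USD unless noted.
START_CAP = 1_100          # $1.1 trillion starting market cap
STEP = 500                 # $500 billion increment per tranche
TRANCHES = 11
NVIDIA_CAP = 4_300         # rough NVIDIA market cap mentioned on the show
TOTAL_SHARES = 60_000_000  # shares in the award (a count, not billions)
TOTAL_VALUE = 250          # "about a quarter of a trillion dollars"

# Assumption: $1.1T is the baseline and each $500B step above it
# unlocks one tranche, giving 11 milestones in total.
milestones = [START_CAP + STEP * i for i in range(1, TRANCHES + 1)]

final_cap = milestones[-1]
ratio_to_nvidia = final_cap / NVIDIA_CAP
implied_value_per_share = TOTAL_VALUE * 1_000_000_000 / TOTAL_SHARES

print(milestones)
print(f"Final milestone: ${final_cap / 1000:.1f}T")
print(f"Final milestone vs NVIDIA: {ratio_to_nvidia:.2f}x")
print(f"Implied value per share: ${implied_value_per_share:,.0f}")
```

On these figures, 11 steps of $500 billion from $1.1 trillion land exactly on the quoted $6.6 trillion, and that final milestone works out to roughly 1.5 times the $4.3 trillion NVIDIA figure.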

19:49

Josh: So pretty good payday for Elon. I think the most important thing is that he's getting a lot of control over this.

19:54

Josh: It seems as if he's going to have 40-something percent control of the company, which is ultimately what was most important to him as they went public. So, really exciting news. I am hopeful that it happens this June, which we can expect, and it's without a shadow of a doubt going to be the largest IPO in history. I think everyone's going to be talking about it. There is a new vehicle that some people are investing in. We're actually

20:14

Josh: going to have the founder on the show soon, so keep an eye out for that one. And yeah, the SpaceX news is very exciting. Ejaaz: Now, in the world of AI hardware, many people think that NVIDIA has run away with the win. And you could argue that with a $4.3 trillion market cap, not many people are competing, except that there is one company, Google. Now, you might be thinking, Google does all my search engines and stuff.

20:36

Ejaaz: Well, Google is the only vertically integrated Mag 7 company that is involved or has a frontier capability at every single layer of the AI stack. Now, right at the bottom are these things called Google TPUs, Tensor Processing Units. And they're their version of the GPU. In fact, fun fact, Google's Gemini models have never been trained on an NVIDIA GPU.

21:00

Ejaaz: It's all been their own internal warehoused infrastructure. And they've been working on this thing for 10 years. Now, just today, or rather this week, they released their latest generation of TPUs, the TPU-8T and the TPU-8i. Now, the TPU-8T, T stands for training or pre-training. It is highly optimized for the pre-training part of an AI model. So this is like the bulk, arguably the more expensive part of training a model.

21:27

Ejaaz: It's like teaching it, like, hey, these are words. These are the general fundamental set of facts that you need to know before we can kind of put you out into the world and present you to our users. The TPU-8i is specialized, or hyper-specialized, in inference specifically. Now, the important part about inference is it's being used for so many different things.

21:46

Ejaaz: Number one, it's to answer all your different prompts. Whenever you write a prompt and you submit it to an AI model, it is known as inference. It's getting inference. It needs to query the model and make sure it does the right types of thinking and gives you the right answer.

21:57

Ejaaz: But the other part of inference is post-training, where a lot of people train the model, and then they do more training after the fact by using it to help the model reason and think of other alternative facts before it presents you the actual answer. And that's what that second TPU is. Now, Google's TPUs have been used extensively. In fact, their largest customer is a little-known AI lab known as Anthropic,

22:19

Ejaaz: which currently runs 1.5 million TPUs. So the argument can be made that TPUs are largely responsible for Claude's and Opus' success. So very impressive all around, but there's some other facts about this, right? Josh: Yeah, well, I love the dual-architecture training setup that they have here, being hyper-specific. I mean, the 8T chip in particular, it's built to reduce frontier model development cycles, they said, from months to weeks.

22:43

Josh: And then we have the 8i, which is the reasoning engine, which is specifically served for agentic use, to deliver tokens really quick, as fast as possible. And as we know, Anthropic is working closely with them. And also, I mean, Google is making these for themselves.

22:55

Josh: So I think whoever is working with Google, whoever's kind of focused on these accelerators, is probably in for a nice little windfall as it relates to increased velocity of the training and also increased ability to distribute these models. As we know, Anthropic is having a very difficult time with this. Now, NVIDIA and Jensen are probably feeling a little shook.

23:14

Josh: They've got to be feeling a little bit of pressure here. And it seems as if that's why they're pushing to be open source, because if you are in a closed-source world where everyone is making closed-source models on their own architecture, then the NVIDIA edge very quickly disappears. And, I mean, I'm looking at these chips in hand. They look beautiful. They're taped out, ready to be manufactured.

23:32

Josh: And I think you could start getting kind of excited about this new world of accelerated hardware. And we're seeing this happen again and again, because Amazon just made another big investment in who else other than Anthropic. And the deal, I think, like, this has to be close to a record deal. They're owning a tremendous amount of this company now. Ejaaz: Yep. So the news here is Amazon announced they're investing $5 billion into Anthropic.

23:57

Ejaaz: They've just raised $5 billion. Congrats. And so the reason why this is important is, well, there's a few reasons. Number one, Anthropic knows that they don't have enough compute. The argument could be made that's why Claude Mythos hasn't been rolled out. Well, hey presto, now you have $5 billion more worth of compute. Now, for those of you who didn't know, Amazon is a primary investor already in Anthropic.

24:19

Ejaaz: Before this announcement, they owned around 17% of Anthropic. After this announcement, it's closer to 20%. So we're talking about one company that's publicly tradable right now that owns a fifth. Is my math right? Yeah, a fifth of the world's leading AI lab, which is pretty crazy. Now, if we look into the stats of this, this is a five-gigawatt deal, which is more than any single data center that's currently live.

24:46

Ejaaz: It's actually a multiple of five. I think SpaceX AI's Colossus 2 is the largest Ejaaz: right now with their 1 million TB. Ejaaz: So it's going to be 5X larger than the average data center that we're seeing Ejaaz: right now for AI specifically. Ejaaz: And they're aiming to get one gigawatt online by the end of the year. Ejaaz: Now, the reason why this is so good for both teams is Anthropic already Ejaaz: has a close relationship with AWS, Amazon's cloud computing department.

25:09

Ejaaz: So spinning up more compute clusters is gonna be so easy for them. Ejaaz: They have a working relationship. Ejaaz: They're used to training Claude models on this, so it shouldn't be too hard to Ejaaz: ramp this up. If you're Amazon, Ejaaz: welcome back. That $5 billion is going to come right back to you.

⁠¶ Anthropic's Funding

25:24

Ejaaz: So I don't know what kind of, like, circular economy this is, but it's back and Ejaaz: it's very impressive for them. Josh: Is it ironic that today Amazon hit an all-time high? No, maybe, maybe not. Ejaaz: I'm holding stock. I got the stock. Josh: Clearly, clearly they're doing something right. Amazon is a phenomenal company. Josh: They're the largest shareholder in Anthropic. Josh: It's hard not to be bullish on them. It's hard not to be bullish on the accelerated

25:42

Josh: computing stack. And I think that's probably what Jensen is getting nervous Josh: about. That's why NVIDIA is pushing open source. Josh: And the good news is he has some help. He has some assistance Josh: from the folks overseas in China who have been pumping out unbelievable models Josh: all week long as it relates to Kimi and Qwen, our Chinese favorites. Josh: We have Kimi K2.6 and Qwen 3.6. There's a lot of digits and numbers.

26:04

Josh: All you need to know is that the best open source models in the world didn't exist last week. Josh: They now exist this week and they are better at pretty much everything, Josh: but exceptional at coding. Josh: In fact, word on the street is that some of these models are as good as GPT Josh: 5.4 was and only a few points off of Claude.

26:21

Josh: I mean, these are pretty amazing open source models that, again, Josh: are free to run locally on your machine if you have the machine capability of doing so. Josh: That's a big, that's a big game changer. Ejaaz: Okay, so typically the story we tell with these open source models is, Ejaaz: wow, aren't they so amazing?

⁠¶ Rise of Open Source Models

26:36

Ejaaz: Yeah, they're the good younger brother. They're not as good as the Frontier AI Labs. Ejaaz: That completely changed this week. So Kimi K2.6 is the latest model from a Ejaaz: Chinese lab called Moonshot Labs. I believe it's Moonshot or Moonshot AI. Ejaaz: And they released their model, which ends up being as good at coding

26:54

Ejaaz: as Opus 4.7, and it's 100% open source, Ejaaz: like you mentioned, Josh, which means that maybe you could run Ejaaz: this on a local device. Now, the answer that you would typically get back from Ejaaz: this is, hey, like, listen, it's too Ejaaz: large to run on my laptop. And that is true, but with the latest Qwen model, which Ejaaz: is a 3.6 version, you can run it as an 18 gigabyte sized model, slightly quantized,

27:16

Ejaaz: on your laptop today. So the point that I want to make about these models isn't Ejaaz: exactly the specifics, but across all benchmarks, Ejaaz: it's not as good as the Frontier AI Labs, but it's only a few points off. Ejaaz: That difference and gap has closed massively over the last couple of months, Ejaaz: which tells me two things.

27:33

Ejaaz: Number one, China has figured out some kind of groundbreaking way to train their Ejaaz: models that they haven't told the West about, and they're going to keep it closely Ejaaz: guarded and eventually closed-source their model releases going forward. Ejaaz: And number two, they've figured out a new way to use inference to their benefit. Ejaaz: Like, one thing I'm going to highlight here is this new Kimi K2.6 model can Ejaaz: code continuously for 12 hours straight using 300 agents.

27:56

Ejaaz: So the unlock here isn't one model itself. It's spinning up 300 versions of Ejaaz: itself and getting them to attack the problem. Ejaaz: That's something Sam realized and what he's implementing in 5.5. Ejaaz: That's something Opus 4.7 realized and is doing probably similarly with Mythos. Ejaaz: So I have this question here, which is like, how do they have to try to do this?
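The "300 versions of itself" idea is a fan-out pattern: split a job into subtasks and hand each one to its own agent in parallel. The snippet below is only a minimal sketch of that pattern, not how Moonshot, OpenAI, or Anthropic actually do it. The names `run_agent` and `fan_out` are hypothetical, and `run_agent` is a stub standing in for a real model call.

```python
from concurrent.futures import ThreadPoolExecutor


def run_agent(agent_id: int, subtask: str) -> str:
    """Hypothetical stand-in for one model instance working on one subtask."""
    return f"agent {agent_id} finished: {subtask}"


def fan_out(subtasks: list[str], max_agents: int = 300) -> list[str]:
    """Run one agent per subtask, at most max_agents at a time.

    Results come back in the same order as the input subtasks.
    """
    workers = max(1, min(max_agents, len(subtasks)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_agent, range(len(subtasks)), subtasks))


# Assumed example workload: 300 independent pieces of one coding task.
results = fan_out([f"module_{i}" for i in range(300)])
```

A real system would also need a step that merges or reviews the agents' outputs. That part is left out here because the episode does not describe it.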

⁠¶ Anthropic vs Figma

28:12

Ejaaz: Well, I think every three months that there's a new open model that gets released, Ejaaz: they're making these jumps because they're using these models to train themselves. Ejaaz: We proved that with Kimi K2.5. There's too many two point whatevers. And Ejaaz: the same thing is happening with Qwen. It's just all around pretty amazing stuff. Josh: Yeah, Qwen is crushing. Okay, so before we go, we have two quick things to hit, the

28:31

Josh: first being one that we missed last week, which we need to touch on quickly: Anthropic Josh: has a design tool now. If you are a designer, if you are interested in building Josh: web pages, videos, graphics, slideshows, pitch decks, any type of visual asset, Josh: Claude now has an entire design suite built just Josh: for this purpose. It's called Claude Design. It exists separately. Josh: You can access it through the desktop app or on your browser, and

28:53

Josh: it basically allows you to build visual assets in a way that Josh: you couldn't previously. Previously, with Claude, you had artifacts. With an Josh: artifact, you could generate something dynamic. It could kind of Josh: build you a web page. This takes it to a whole new level. You could generate wireframes Josh: if you want to try to use fewer tokens. You could fill it out and create proper Josh: prototypes that are actually clickable. It's amazing. The video we're

29:13

Josh: seeing on screen highlights a few of them. Unfortunately, there was a big loser Josh: in this, because this sounds like a lot of what that little design company named Figma does. Ejaaz: Yeah, the little company. Josh: The stock market did not love that, did it? Nope.

29:26

Ejaaz: Nope, it is down almost 20% on the week. I actually tracked the stock price Ejaaz: after the announcement was made. So, like, it wasn't even readily available. It Ejaaz: was literally just the tweet. 20 minutes after it was tweeted, the stock was down Ejaaz: six percent. So the point being, Ejaaz: whether this is market speculation or not, like, listen, Ejaaz: Claude Design isn't as good as Figma. Ejaaz: They're working with a few of these different partners, such as Canva.

29:49

Ejaaz: Two weeks ago, one of Anthropic's foremost execs left the board of Figma. Ejaaz: And the rumor was that it was because they were building a competitor. So it's pretty clear. Ejaaz: Anthropic is going after every single sector, whether you're a designer, Ejaaz: a software engineer, a mathematician, a research scientist, doesn't matter. Ejaaz: They're going after everything because the model is applicable to everything.

30:08

Ejaaz: And I don't know what this means for certain moats that companies like Figma Ejaaz: hold, but it's certainly going to affect the stock price. Josh: Can you do me a favor and click the max button real quick for me just to show the chart? Ejaaz: Oh! Josh: Yeah, minus 86% since IPO for those who are not watching on screen. Josh: It's been a pretty bad, rough run for Figma.

30:27

Ejaaz: We have to start naming Anthropic the stock killer, Josh. This is like every Ejaaz: single tweet is tanking a stock. Josh: No, it's tough. It's brutal. We had one last thing that you wanted to mention. Josh: I know. We got to end on this strong. What do we have? Ejaaz: How good is your accent or impersonation of your president, of our president, Josh?

⁠¶ President's Comments on AI

30:45

Josh: Pretty horrible. Ejaaz: Not good. Okay, well, then we're not going to attempt it. Josh: I'd love to hear your British take on it, though, if you're feeling ambitious. Ejaaz: Okay, so my British take on this is that it's both hilarious and somewhat Ejaaz: terrifying that the President of the United States is saying this. Ejaaz: He commented, okay, on the government's relationship with Anthropic.

31:04

Ejaaz: Now, if you're wondering why on earth he's commenting on it, Ejaaz: they're going to be releasing this Claude Mythos model. Ejaaz: It might be a security risk. It's probably good for the government to have access Ejaaz: to this thing and prepare as necessary. Ejaaz: The government has been having very important conversations with bankers and Ejaaz: governments all around the world, just to try and figure out, you know, Ejaaz: how best to prepare for this.

31:20

Ejaaz: And after having an in-depth discussion with Dario Amodei, which by the way, Ejaaz: he blacklisted that CEO and Anthropic entirely from the government using it. Ejaaz: He's now rekindling it and saying, maybe there's a deal on the line. Ejaaz: He goes, and I quote, I'm not going to do the accent: Ejaaz: We'll get along with Anthropic just fine, Trump said on CNBC. Josh: We'll get along with Anthropic just fine. I think they can be of great use to us.

31:43

Josh: They're high IQ people. Very good. Very good. They tend to be on the left, Josh: radical left, but we get along with them. Josh: I don't know. That's all I got. But that is what he said. Ejaaz: Were you practicing that? That was actually pretty good. I was practicing in my head. Josh: I was rehearsing. Ejaaz: Damn. I closed my eyes whilst you were doing that, whilst I was laughing. Did it feel right? Josh: It sounded like him? Good.

32:00

Ejaaz: It channeled his spirit. It was there. It was a good effort. Ejaaz: But I believe that's it. That is the end of the roundup. Ejaaz: Josh and I are recording this. FYI, it's 4 p.m. over here. Typically, we're morning birds. Ejaaz: We deliver this in the morning, but we waited for the announcement of Spud GPT Ejaaz: 5.5 just for you guys. And we're going to be bringing you the cutting edge news every single week.

32:22

Ejaaz: As Josh mentioned, we had three other amazing episodes that we filmed earlier Ejaaz: this week. Definitely go check them out. They're each about 20 minutes long. Ejaaz: It's your commute to work. Ejaaz: It's your gym session if you're not that active. Ejaaz: Definitely go check it out and let us know what you think. But yeah, Ejaaz: Josh, any final thoughts? Josh: Call me crazy, but I like the afternoon recordings. I got good energy.

32:39

Josh: I'm like woken up. I'm 100% right now. I'm rocking and rolling. Josh: I'm feeling good. So I don't know. Maybe we'll have to lean into this a little Josh: bit more, but that's everything. Josh: If you've made it this far, if you're still listening to this and you've heard Josh: our other episodes, you're caught up. You're done for the week. Josh: You can go touch grass. Enjoy your weekend.

32:52

Josh: There will be a lot more to talk about next weekend. But for now, Josh: you have fully synchronized with all of the chaos happening on the frontier of AI and technology. Josh: Thank you so much for watching. As always, we very much appreciate it. Josh: If you enjoyed this episode or any of our previous episodes from this week, Josh: don't forget to share them with a friend who also might enjoy them, possibly. Josh: We have a newsletter on Substack that goes live twice a week.

33:12

Josh: Just went live yesterday, going live again tomorrow. Josh: The Friday issue is a recap of everything that happens this week, Josh: which is always fun and exciting. Josh: In fact, I'm gonna go write that as soon as we finish this episode. Josh: So thank you all for watching. Josh: As always, don't forget to subscribe, like, comment, all the good things, Josh: and we will see you guys next week.

Transcript source: Provided by creator in RSS feed