The Coding Model Wars: Claude Opus 4.6 vs GPT-5.3 Codex

⁠¶ Intro / Opening

00:00

Ejaaz: 48 hours ago, Anthropic dropped Claude Opus 4.6, the world's most powerful AI model.

⁠¶ AI Showdown: Claude vs. Codex

00:06

Ejaaz: And literally 20 minutes later, OpenAI dropped Codex 5.3, which is not only better, but also built itself. Now, to say both of these models are powerful would literally be the understatement of the century. By the time I'd eaten breakfast yesterday, one of the models had discovered 500 security flaws, which no one else had discovered before.

00:24

Ejaaz: And by lunchtime, a bunch of software stocks were down hundreds of billions of dollars out of fear that these models would replace entire teams. And it's actually already happened. These models can replace a team of 50 software engineers, rebuild Pokemon from scratch, and so much more. And in this episode, we're going to be doing a live demo side by side to show you which model is the best.

⁠¶ Live Demo of Coding Models

00:44

Josh: Yeah, this is pretty cool. I wanted to spend a lot of time this episode kind of introducing people to these models, what they could do, how they work, through demos that we're going to perform ourselves.

00:52

Josh: These are definitely two frontier models, but I think more importantly they're frontier coding models. And when people hear that, I think a lot of them get turned away because it seems like this complicated thing, like you need to be a developer in order to use them. And we are here to tell you that is not the case. As from one non-technical person to another, I fed this model a prompt, I fed it some assets, and then I pressed play, and what I got is a

01:16

Josh: side-scrolling game, which was exactly what I asked for. So on the screen now you're seeing the one-shot prompt that I fed this model to ask it to create a side scroller that was like Mario that we can actually play. So it has coins, and I don't think the gravity quite works. What you're seeing is that it understands physics, it is able to generate graphics, and it plays like a pretty solid side scroller. And I created this in five minutes,

01:40

Josh: with one prompt, and it actually works.

01:43

Ejaaz: What was the prompt that you used, Josh?

Josh: Yeah, so I'll pause playing this game to actually show you the prompt. It was very simple. It was this one paragraph: I want you to make a game. You can use Python or C++, whatever you find the most convenient. A 2D platformer that closely resembles Super Mario. Use the attached background image and sprites found in the asset folder. Take into account that the sprites don't come with transparent background

02:04

Josh: but pink ones, so you need to filter the background. And for those who are watching, you can actually see the sprites on my screen. They were just a series of assets; there was no context given as to what each one of them was. But the model reasoned through it, it removed the background, and it actually generated a pretty good representation of that. Now, this was built one-shot on Codex, which is the new OpenAI Mac application that just released this

02:25

Josh: week. And I wanted to compare it to Claude. So I have another instance here on the screen with Claude. This is using Opus 4.6, the newest frontier model that they just released this week, and I want to do an exact one-to-one comparison. So I'm going to launch the same exact prompt. We're going to have that cook on Codex, or rather, we're going to have that cook in Claude Code, and in the meantime, maybe we can kind of talk about more of what

02:46

Josh: these models do and how they work.

Ejaaz: Well, before we do that, actually, as you set this game up, I ran it on Claude Opus 4.6 as well, but with a slight twist.

Josh: Okay, let's see your output. What do we have?
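The prompt above asks the model to filter out the pink background that the sprites ship with. A minimal sketch of what that step could look like, in plain Python on raw pixel tuples: the key color, the tolerance, and both function names are assumptions for illustration, and nothing here is taken from the code either model actually generated.

```python
# Hypothetical sketch: key out a solid pink background from sprite pixels.
# KEY_COLOR and TOLERANCE are assumed values, not taken from the episode.

KEY_COLOR = (255, 0, 255)  # assumed "pink" (magenta) background color
TOLERANCE = 40             # assumed per-channel tolerance


def is_background(pixel, key=KEY_COLOR, tolerance=TOLERANCE):
    """Return True when every RGB channel is within `tolerance` of the key color."""
    return all(abs(channel - target) <= tolerance
               for channel, target in zip(pixel, key))


def key_out_background(rgb_pixels):
    """Convert RGB pixels to RGBA, making background-colored pixels transparent."""
    rgba = []
    for r, g, b in rgb_pixels:
        alpha = 0 if is_background((r, g, b)) else 255
        rgba.append((r, g, b, alpha))
    return rgba


if __name__ == "__main__":
    # Two pink-ish background pixels followed by two sprite pixels.
    sprite_row = [(255, 0, 255), (250, 10, 240), (200, 30, 30), (0, 0, 0)]
    print(key_out_background(sprite_row))
```

A real game would more likely do this through its graphics library's color-key or alpha support; the loop above only shows the idea.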

02:58

Ejaaz: I don't know if you can see my screen, but it is the exact game that you just created. But I don't know if those characters look kind of familiar to you. We have the hero protagonist character, which is my beautiful face and my beautiful person, Ejaaz. And we have, who's this enemy over here that looks a lot like the bear guy? And listen, we can double jump here, Josh, and I think, yep, I can crush you every time.

03:27

Ejaaz: I mean, jokes aside, this is insane. This took me like around three minutes to build end-to-end. I used the exact same prompt that you gave me. And we didn't have sprites ready-made of ourselves, right? We didn't have like cartoon images of ourselves. So I uploaded an image that we had taken, I don't know, like six months ago and said, hey, can you make game avatars out of this?

03:48

Ejaaz: It did it in 20 seconds. And then I said, could you add these to the game and replace the enemy with Josh and the protagonist with Ejaaz? And it did it in a minute.

Josh: So here we go. That's pretty amazing. And these are really just using standard desktop applications. So what you're using right here, this was done in Claude Code, right? You just went onto Claude, the Mac app. You downloaded it. You put in the prompt.

04:10

Josh: You shared some assets. And now it built this amazing game in one single prompt.

⁠¶ Comparing Model Outputs

04:13

Josh: And we're actually going to experiment further in this episode where we're going to create a trading room that does actual real-time stock analysis. So as I'm curating the prompts and as we're getting ready for that second demo, maybe we could walk through what makes these models so exceptional.

Ejaaz: Yeah, well, you might actually notice the first difference on screen right now. If you look closely, my avatar is kind of glitching out, right?

04:36

Ejaaz: And if you compare it to your Codex game that you just coded up, there's no glitches. It runs super smoothly. And the main takeaway here is Codex 5.3 is a superior coding model to Anthropic's.

⁠¶ Codex vs. Claude Performance

04:48

Ejaaz: And that's a sentence I never thought I would say, at least for the next couple of years, because Anthropic has held that prestige and title for so long. But since Code Red was initiated at OpenAI around three months ago, Sam has devoted pretty much all his resources towards building the best coding model. And the benchmarks don't lie. It is a full 12 points ahead of Claude Opus 4.6 on the software engineering benchmark.

05:11

Josh: That's a pretty significant difference.

Ejaaz: So I've actually pulled up a more general comparison between the two models here. And it summarizes it really well. So if we look at Claude's model, Opus 4.6, what's good about it? Well, they've 5x'd the context window.

05:26

Ejaaz: So it's gone up to a million tokens, or rather characters, that you can put in a single prompt, which, if you want to understand how powerful this is, you can just put way more information into your initial prompt. It has much better context and memory. So you can end up cooking up much better products overall, which is very, very impressive and important to have. Number two, I would think about this as an orchestration model.

05:49

Ejaaz: So if you look at specific benchmarks, it has beaten OpenAI at GDPval. GDPval is a benchmark where they go out and they test a model's performance at a really complex task versus a professional human that would normally do that task. And the decision is, would you use the AI model or would you use the human? And in this case, you would choose Claude 4.6 over humans way more than you

06:12

Ejaaz: would choose OpenAI's latest model. So that's a really important thing.

⁠¶ Exploring the Models' Features

06:16

Ejaaz: And the point around Claude's latest model is that it doesn't code as well as Codex, but it can orchestrate a bunch of agents and overall activity better than OpenAI. Now, if you look at Codex and OpenAI's new models specifically,

06:31

Ejaaz: it wins on the software engineering. It is simply a better software engineer than Claude is, which is a massive flip around and a testament to how much OpenAI has been able to achieve with its resources and fine-tuning.

Josh: And to the note on the quality of the models here, my prompt is done in Claude Code, the same one that we used in Codex. And I'm going to run it here for the first time now.

06:53

Josh: You can see on screen, and we'll see what it looks like. So underneath, we have our Codex version, which looks beautiful.

06:59

Josh: On top we have our brand new version that was just made by Opus. Now, I haven't tried this yet, so we're going to see what happens when I press space to start. So it looks like Opus has failed to create a floor, so I am just falling through the floor until the game ends. Okay, so just based on this one demo alone, this is a fairly significant difference, where GPT's Codex has created a beautiful side scroller. It doesn't have gravity, but

07:25

Josh: I could just ask it to, or, it has gravity, it's a little too much, I could ask it to lower it. Opus doesn't even work at all. And again, the test was just a one-shot prompt. So I'm going to get back to work prompting it again to build this new application, the trading application. We'll follow up with that. But I think that's a funny kind of demo just to showcase that one actually is kind of superior to the other in this one use case, at least.
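The two failure modes described above (gravity that is a little too strong, and a missing floor that lets the player fall until the game ends) both live in a few lines of a typical platformer update loop. A rough sketch of that loop follows; the constants and function names are made up for illustration and are not taken from either generated game.

```python
# Hypothetical sketch of per-frame gravity with a floor check.
# GRAVITY and FLOOR_Y are assumed values for illustration only.

GRAVITY = 0.5    # assumed downward acceleration, in pixels per frame squared
FLOOR_Y = 300.0  # assumed y coordinate of the floor (y grows downward)


def step(y, velocity_y, gravity=GRAVITY, floor_y=FLOOR_Y):
    """Advance one frame: apply gravity, move, then clamp the player to the floor."""
    velocity_y += gravity
    y += velocity_y
    if y >= floor_y:  # without this check the player falls forever
        y = floor_y
        velocity_y = 0.0
    return y, velocity_y


def simulate(frames, y=0.0, velocity_y=0.0, **kwargs):
    """Run `step` for a number of frames and return the final position and velocity."""
    for _ in range(frames):
        y, velocity_y = step(y, velocity_y, **kwargs)
    return y, velocity_y


if __name__ == "__main__":
    print(simulate(200))                          # player comes to rest on the floor
    print(simulate(200, floor_y=float("inf")))    # no floor: keeps falling
    print(simulate(200, gravity=0.25))            # "lower the gravity" is one constant
```

Seen this way, "ask it to lower the gravity" is a one-constant change, while a missing floor means the collision check was never written or never runs.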

07:46

Ejaaz: Yeah, I mean, you said it pretty clearly, which is Codex is the best coding AI model. And I can't emphasize that enough, because OpenAI for a long time was behind Anthropic, and by a massive margin, and in some way, shape, or form, they've been able to catch up. Now, what's interesting here is both companies have focused on each other's goals.

08:10

Ejaaz: So when Anthropic was typically meant to be the leading frontier model in coding, it now has decided to focus on what OpenAI was really good at, which is overall orchestration and being a better generalized model, right?

Josh: They're taking each other's lunch.

Ejaaz: Yeah, exactly. OpenAI has decided to eat Anthropic's lunch and say, okay, we've got the generalized stuff sorted out. Let's try and figure out the coding-specific niche, highly defined,

08:34

Ejaaz: professionalized functions. And it's produced the best coding model. So it's kind of a weird win-win for both labs. And what's awesome about this is they both now have really well-rounded, but also very specialized models. And the reason why this is important is, and this is like kind of maybe my hot take, I don't think the coding models matter, Josh. I actually don't think the generalized models matter either.

⁠¶ The Future of Work with AI

08:59

Ejaaz: I think they're both going after something much bigger, which is creating the operating system for the future of work. They know that AI models and AI agents are going to automate a ton of different industries, and the industries are only going to pick you if you can do both generalized work and hyper-specific work really well. That is coding and orchestration and managing your data.

09:20

Ejaaz: And now we have two amazing models dropped within 20 minutes of each other that do exactly that to the highest performance metric that we've ever seen before.

Josh: They're pretty exceptional. So now for this next demo, I have it queued up here.

⁠¶ Building a Stock Analysis Tool

09:32

Josh: What we're going to do is, what I did is ask the model itself to build me a prompt for this. So I wanted it to create me an AI stock portfolio war room. And I asked, hey, I want to create this, create me a fully fleshed out prompt that kind of should solve this problem with one shot. So what I did is I loaded it up here in our Claude Code app.

09:52

Josh: And then I also loaded it up into the Codex app. I created its own project folder, and now I'm going to hit send. So both of these things are thinking in real time. We will check back in once their outputs are done, and we'll compare again the second version, which is more of a robust one. I mean, you'll see on the Claude screen it has this whole list of to-dos that it wants to do. It has

10:10

Josh: an entire plan. There's nine different panels that it's going to build. It's going to do a risk analysis matrix and portfolio action bars and all this stuff. So we'll let that cook, and let's get back to what separates these, what people have been freaking out about on the internet, more as these things get going.

Ejaaz: Could I take three minutes to show you some wild demos?

Josh: Yeah, let's see what the internet's been demoing while we wait for ours to cook.

10:30

Ejaaz: Cool. Like, listen, our 2D Mario-inspired game was cool, but imagine if I told you you could recreate the entire Pokemon game, including levels, cities, characters, and creatures that you fight, from scratch in about an hour and 30 minutes. That's pretty impressive. That's what we're looking at right now.

Josh: Wow, it even has the fighting.

Ejaaz: Yeah, yeah, yeah. And buttons and the multimodal gameplay.

10:53

Ejaaz: And obviously this looks like it's been made by a child, image-wise, but it's probably going to take you, what, another couple of hours to make a really high fidelity game that you could probably run on your Nintendo Switch or whatever. It is just so impressive that we can do these things.

11:07

Ejaaz: Anyone can do these things with no previous background. Just upload a few images or generate a few images and you can create childhood nostalgic games that are worth billions of dollars, which is just super cool to see.

Josh: Yeah, one of the cool things that I think it's really important to note is how approachable this is.

11:21

Josh: Like for the recent example that we're having run right now on my screen, all I did was tell it what I wanted and ask it to develop the prompt with me. So even if it feels overwhelming, like you don't really know how to code, you don't know how to prompt things, you can actually just ask the model to help you generate the prompt, help explain to you how it works. And it's a really easy way to build basically anything you can imagine.

11:41

Josh: It's not just games. It's productivity tools. It's CRM tracking.

⁠¶ Technical Demos Unveiled

11:45

Josh: It's whatever you want it to be. So I think that's really interesting, but it also goes much more technical, right? I saw another crazy example with the compiler.

11:52

Ejaaz: Okay, so for the tech nerds out there that have spent a lot of time coding, you are going to be wowed by this. For one of their flagship demos for Opus 4.6, the Anthropic team decided to task the model with building a C compiler, which is an incredibly complicated execution tool that is required to code up some of the craziest types of apps.

12:17

Ejaaz: And they just walked away. And they just kind of like looked at it, monitored it, made sure that it wasn't going awry. And in two weeks, let me emphasize that, two whole weeks, 14 days, it coded nonstop and built this compiler. Now, you might think two weeks is quite a long time. I want my thing done in an hour and a half.

12:36

Ejaaz: Well, let me hearken back to history, where previously, if you wanted to create something like this, in today's world, it would take a team of around 50 or so humans, and it would take them a few months to build from scratch. That's today. But back in the day, it would technically have taken them around a decade to build, and like thousands of people.

12:56

Ejaaz: So we have just kind of condensed the timeline to create really complicated tools to a matter of hours, or weeks in this case. Now, the second thing I want to point out is the fact that these models can go untouched for two weeks is just insane. There was another stat that was released today by OpenAI, sorry, yesterday, with OpenAI's 5.2, I think, 5.2 high, I believe, where it can go pretty much a 50% hit rate for a 6.6-hour time horizon.

13:28

Ejaaz: So that means if you gave it any kind of complicated coding task, 50% of the time in 6.6 hours, it would get that done, completely done. And it would nail it 50% of the time, which is just such an impressive track record when you look back a year. And that time was, what was it, like 30 minutes, maybe an hour. So every iteration, we see this thing double. It's just so insane.
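As a quick check on the "every iteration, we see this thing double" remark, the figures quoted above can be turned into a number of doublings. This is only arithmetic on the numbers as stated in the episode (30 minutes to an hour a year earlier, 6.6 hours now); the function name is made up, and the underlying measurements are not verified here.

```python
import math


def doublings(start_hours, end_hours):
    """Number of times start_hours must double to reach end_hours."""
    return math.log2(end_hours / start_hours)


if __name__ == "__main__":
    # Figures as quoted in the episode: 30 minutes to 1 hour a year ago, 6.6 hours now.
    low = doublings(1.0, 6.6)   # if the earlier horizon was one hour
    high = doublings(0.5, 6.6)  # if the earlier horizon was 30 minutes
    print(f"between {low:.1f} and {high:.1f} doublings in roughly a year")
```

Under those assumptions the quoted numbers imply somewhere between two and four doublings over the year, depending on which starting point is used.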

13:49

Josh: Yeah, it's really, it's unbelievable and almost like intimidating how capable and competent it is, even for someone who is a novice at writing code. It's not about writing code. It's about being able to generate whatever you want it to. So if you think of it, in a way it abstracts the code away and allows you to just speak the English language and get what you want from speaking English,

14:11

Josh: and in a way that you understand, and it will help walk you through the way. One of the things that I love about Claude Code in particular is the plan mode. If you leave a lot of things out of your prompt, it'll actually just continue to prompt you with additional questions to understand what you want. And one of the most fascinating things that I read about GPT's 5.3 Codex in particular is, like you mentioned in the intro, it helped build itself.

14:33

Josh: And I don't think that can be overstated, because this is the first model in the history of OpenAI that has helped with the building and construction of itself.

⁠¶ Self-Improving AI Models

14:42

Josh: And what happens as that starts to ramp up, right? If you think of each model iteration as a flywheel, what is the constraint? The two constraints are the speed at which a developer can actually build it and then create the tests for it and make sure that it's safe and ready to deploy. And then it's the hardware that's required to actually train the model.

15:00

Josh: What we're seeing with Codex and Opus, which I really believe was kind of Sonnet, is the incremental improvements. Now, for the incremental improvements that don't require an entirely new training run, the real constraint is the actual software and what you could squeeze out of it.

15:13

Josh: And when you have a model that's helping you build this software, that can think for 6, 12, 24 hours at a time, even longer, it kind of creates this like self-fulfilling loop, right? Where the models use the new models to make the future models stronger and more powerful and better. And I thought that was a really interesting thing to note, that this is the first self-propagating model, where it ran

15:35

Josh: a lot of the tests for itself, it introduced new code that made itself better. And as we continue to see that, you can start to imagine that exponential progress line going pretty close to vertical and things getting really good, like really, really quick.

Ejaaz: I think what most people listening to this might think is that, well, what was different before? Well, previously, models would just kind of work in a very analog mode.

16:00

Ejaaz: You would just point it at a problem and it would just understand what the problem was and then solve it. But it lacked that awareness and wider context as to like what the wider vision and goal was to achieve, and then figuring out stuff for itself. You always had to kind of handhold it. But now with its ability to kind of like understand what it's trying to do and look internally and say, huh, I made that mistake because of this error in my code.

16:24

Ejaaz: I'm going to now like rewrite my code and then I'll be better at it. It kind of functions similarly to a human. Now, I actually saw a great analogy. I forgot who wrote it, but it's fantastic, where if you imagine yourself standing on a sidewalk, right? And a Bugatti Veyron drives super fast by you at, let's say, 200 miles an hour, you'll be like, wow, that's kind of fast.

16:47

Ejaaz: And then two minutes later, another Bugatti drives by you at 300 miles an hour. You'll be like, wow, that's kind of fast. But you wouldn't really notice that 100 mile an hour difference, right? But if you were in the car strapped in, you would notice it is significantly improved. And that's how software engineers feel right now.

17:07

Ejaaz: Now, if you're someone that doesn't code all the time, you're not necessarily going to understand these impacts, but it's really important for those of you listening to this to figure out that this is massively impactful and will change the way that a lot of things are happening today.

⁠¶ Automating Complex Tasks

17:19

Ejaaz: I mean, just take a look at this, right? This is a direct quote from someone who is building at a major tech company, Rakuten. And the quote here says, Claude Opus 4.6 autonomously closed 13 issues and assigned 12 issues to the right team members in a single day, managing a 50-person organization across six repositories. Josh, do you know who else is responsible for doing that?

17:44

Ejaaz: An entire team of product managers that each get paid a quarter of a million dollars in compensation automatically.

Josh: Minimum, per year at least, yeah.

17:52

Ejaaz: Their jobs are automated now.

Josh: Well, one of the earlier moments in which I realized this was pretty profound is when Claude Cowork, they said they built it with, what, just like four people over the course of 10 days, and it was 100% built by the current model of Claude, which was Opus 4.5 at the time. Like, the amount of leverage from these tools is so high, but it cuts both ways. It's like, if you can design and develop a product in 10 days,

18:19

Josh: then that means another company can probably do that in five. And it starts to lower the competitive threshold for these companies to catch up. And it starts to raise the bar of what is possible. Like if you could build something that profound in 10 days, what can you build over the course of six months? Like, can you really build something fantastic that has a moat, that like actually delivers on the total power that you have by leveraging this AI?

⁠¶ The Competitive Landscape

18:46

Josh: It's going to be interesting to see, because, I mean, what we're finding even with the Codex and Opus dual launch is that these companies are right next to each other, and if one publishes something profound or something that attracts a lot of users, they're just a few days and a few prompts away from copying it. And that's like a pretty difficult thing to compete against on the software front.

19:06

Ejaaz: Well, that's why if we look at the stock market over the last couple of days, it's down trillions of dollars, and I'm not exaggerating. If you look at Microsoft over the last two weeks, the stock is down 20%. It's trading like a meme stock, which is just insane.

19:20

Ejaaz: And the reason why that is, is a lot of investors are anticipating that these models, specifically Opus 4.6 and Codex 5.3, will just create the tools that these billions of dollars' worth of SaaS companies have spent or valued their entire lives on, in a couple of seconds, just as you described.

19:40

Ejaaz: Now, the counter argument to this, Josh, is, and Jensen Huang actually kind of went live at a conference and spoke about this and made this point: if you're an AI agent or AI model that is capable of building these tools, right? Why would you rebuild the tool every single time you do a function? Surely you would just access the best tool and use it.

20:03

Ejaaz: So there's a bit more nuance, where AI models aren't just going to recreate your entire software stack if you are at a Fortune 500 company. That kind of doesn't make any sense. There are a bunch of tools that are hyper-optimized to do that. But what it will do is it will connect all of these tools and silos in a much more effective way. And maybe that requires rebuilding parts of it.

20:23

Ejaaz: Maybe it requires kind of connecting them in different ways, but not rebuilding the entire tools. And whatever operating system that ends up becoming will be the most sticky and valuable company ever.

⁠¶ Investor Perspectives on AI

20:33

Ejaaz: Now, that could be Salesforce, or it could be someone completely different, a startup that we haven't even heard of. And I think that's really important to understand, but people are experimenting. And if you look at this graph right here, which may not look insane to some, but is insane to me at least, 4% of daily GitHub commits are now Claude Code. That was, I think, 5% of what it is today two months ago.

20:57

Ejaaz: So the ascent has just been insane. These companies are adopting it and they are using it.

Josh: Yeah, the number is just going to keep going up and there's no reason why it wouldn't. It's such a testament. One, the speed. It feels like we're strapped in that car and now we're flying.

21:11

Josh: To an outsider, it might not look like it. It certainly feels like that on the inside, and I think a lot of people are starting to notice this and get a little nervous about it too. Like, look at this example on the screen right now. This is a prompt from GPT 5.3 Codex, which basically created an entire Minecraft clone in a single prompt, and it looks awesome and it works really fast and it was super lightweight.

21:34

Josh: And it says, I also tried on Opus 4.6, but for some reason it got stuck. But you can build anything that you want very, very quickly, like very cheaply as well. What Opus 5.3, or Opus 5.3, I'm getting them all mixed up. What GPT 5.3 Codex offered is double the rates, double the token rates, for the next couple of months. So you actually have the freedom on their $20 a month plan to go and build whatever you want.

22:00

Ejaaz: Can I maybe deliver a hot take, Josh?

Josh: Yeah, what do you got?

Ejaaz: I think the most exciting part about these model releases isn't the models themselves. Largely, I think the models are kind of similar in capabilities. They are around the same coding benchmarks, and they can roughly do the same things. They can spin up a bunch of agents and orchestrate themselves.

22:19

Ejaaz: The bigger picture, which I think a lot of people missed, was both companies, Anthropic and OpenAI, are at war with each other. And they're trying to basically build and own the operating system for work, which isn't just a model, it's a software suite. So this week alone, OpenAI didn't just release this new model.

⁠¶ Major Updates from OpenAI

22:37

Ejaaz: They released the Codex app, which is a desktop Mac app, which is kind of like a command line interface, which makes the coding experience way better. And they also launched an enterprise platform called Frontier, which allows Fortune 500 companies to basically take this magical model and

22:54

Ejaaz: give it to non-coders and let them do magical things. Now, all of these products together create a very sticky experience where it starts to make sense for software engineers and non-software engineers to use these products. And it becomes incredibly sticky, which results in billion-dollar contracts, right? Anthropic has done the same thing over the last two weeks.

23:14

Ejaaz: They released Claude Cowork, they released agent teams this week, and then they released this new model. They're going after the same thing, which kind of makes sense of why they're releasing Super Bowl ads that are kind of shitting on each other now. It makes a lot of sense. And so the point is, if they can own this operating system, this future of work, they will basically be the most valuable company.

23:35

Ejaaz: And I think it's going to be winner takes most.

Josh: I have to interrupt you here. We have some developments on our prompts that we've been working on, our AI stock war room.

Ejaaz: Let's go.

Josh: That I'm going to have to share on the screen right now.

⁠¶ Real-Time Quality Assurance Testing

23:44

Josh: So currently what it's doing is it's asking to do some quality assurance testing. So you'll see it's actually taking over control of my browser and it's asking to make prompts on the screen. So all of this that you're seeing right here is generated live, and it's doing an actual real-time debug of the product that it made.

24:02

Josh: It's clicking around, it's resizing things, it's going through the links, and it's running real quality assurance testing on the actual product. It's really amazing to see.

24:12

Josh: This was all just built, all these visual charts, and they're all accurate. So right now we're looking at NVIDIA. We have a chart, and I'm not going to mess with it because it's doing the real-time manipulation to do quality assurance checks, but it's actually clicking through, it's making sure the stats are accurate, it's making sure all the widgets work. And look, it has these amazing graphs already. It has sentiment analysis: 85 percent of people are bullish

24:32

Josh: on NVIDIA. It has recent signals from the news. It has a risk assessment matrix where it shows the like export controls and chip controls. It has revenue and earnings every single quarter, charted, competitive moats. It has sector comparisons. It's like, this is unbelievable. And it just generated this in a single prompt. And I just find it really funny that we can actually watch this do it in real time.

24:55

Josh: So you'll see in this prompt, it's clicking through, it's taking screenshots of what it's seeing. And then it's digesting, analyzing, and understanding what it made, what it messed up, and what it actually still has left to finish. And it generated everything, all of this, in real time as we're recording this episode. So fascinating.

25:14

Ejaaz: Wow, it reminds me of some of the research platforms at the former companies that I used to work at, and they would pay, I'm not joking, millions of dollars a year to get access to these types of platforms that would give them analysis like what you're showing on the screen right now. And you just built it from scratch.

Josh: From scratch, and look, it's doing this.

25:32

Josh: I'm not even touching my keyboard. I just searched for Apple, and now I'm sure Josh: if I go over to the prompt, Josh: it's taking screenshots of Apple. It says, "Apple dashboard Josh: looking great. Let me scroll to see the new three-column button row layout," and Josh: it's checking the button rows. And it's really unbelievable. Like, we have the Josh: investment thesis, the bull case for it, the bear case for it, catalysts and timelines.

25:51

Josh: It has WWDC built in. It has the iPhone 18 launch props, um, set up for September. Josh: It's, like, so cool. It's absolutely unbelievable. And now this is a real tool Josh: that I'll be able to use to type Josh: in whatever stock I want to look at and actually get some analysis on it. Josh: Now, I'll go over to Codex over here and it looks like Codex is taking its sweet time.

26:12

Josh: It's still zero out of six tasks completed. So it might take a little while Josh: for us to get a visual on that, but it's just amazing to watch this happen in Josh: real time as at least Claude Code and Opus 4.6 Josh: do some quality assurance testing live by taking over my browser and running Josh: it for itself. I just think this is like, this is amazing.

26:31

Ejaaz: It's magic. Something I just noticed in your Opus chatbot screen when it's going Ejaaz: through its thinking, it seems to have like spun up a few different agents or Ejaaz: instances of its own self to pull this off.

26:44

Ejaaz: Like I think if you scroll up, like I saw a few kind of like prompts that like Ejaaz: suggested that that's what it was doing, Ejaaz: which I think underscores a very important thing that both of these models Ejaaz: can do, which is they can spin up multiple versions of the same model and task Ejaaz: them with different things to run in parallel.

27:02

Ejaaz: What this means is you can get a really complicated product like what you're Ejaaz: seeing on the screen right now in a matter of minutes because it's running in parallel. Ejaaz: So imagine having a bunch of computer science geniuses that you can just duplicate Ejaaz: immediately and run at a fraction of the cost of electricity, the cost of inference. Ejaaz: And now you start to see why all these NVIDIA chips and stuff are worth so much.

27:24

Ejaaz: Because you want to do cool stuff like this. This is insane. Josh: It's actually incredible. Okay, so now I want to test it on Tesla. Josh: So I'm going to choose Tesla and see if it actually can do it in a non-controlled environment. Ejaaz: This UI is so cool. Josh: It's very pretty. What the hell? This looks great. Okay, so here we have Tesla. Josh: It has the charts. We're going to click through the charts. It has the one-week

27:39

Josh: chart, the one-month chart, the three-month chart. That looks fairly accurate. Josh: It has the price-to-earnings ratio, the 52-week high, 52-week low. Josh: So it looks like at one point it was trading at $488, now it's trading at $389. Josh: The bull case for Tesla: RoboTaxi and FSD driving licenses could unlock $500 Josh: billion in revenue by 2030. Josh: It has the RoboTaxi service launch in Austin that it's preparing for.

28:03

Josh: And let's see the sector comparison. So it's comparing it to Rivian, Baidu, Toyota, Ford. Josh: It has the competitive moat where it says it's most strong in brand power, Josh: IP patents, and cost advantages. Josh: You can see the revenue, the estimated earnings per share. Josh: Sentiment is much worse on Tesla than it was on Apple. It's at 52% right now. Josh: And it looks like, as it relates to the risk assessment, valuation, competition, Josh: and execution are all very high risk.

28:33

Josh: And that's probably an accurate assessment, although I'm not sure the competition Josh: is really a problem. The execution is certainly going to be an issue. Josh: But it's just amazing to see how well it does. And it even gives it a verdict. Josh: So the AI verdict on Tesla is: Josh: it's a hold. "Tesla's optionality is enormous, but current valuation already Josh: prices in multiple moonshots. Josh: Execution on RoboTaxi will be the key catalyst." That sounds about right.

28:57

Josh: And it's amazing that we just built this with a single prompt without any oversight from me. Josh: And it works. It actually works. It's really just unbelievable how capable these things are.

⁠¶ Creating a Stock Dashboard

29:08

Josh: And now I have a dashboard that anytime I want to make a decision, Josh: I can type in the ticker and get all this, um, optionality. It even has menus that Josh: work. Look at this: profit margins, PE ratios, market cap. Wow. Pretty unbelievable.

29:21

Ejaaz: It's a reactive, real-time Bloomberg terminal, oh wait, for the modern age. Josh: There's, um, there's another feature here that looks like you could compare stocks. Josh: Let's see if this actually works here. So if I type in, let's say, Apple's ticker Josh: and I hit go, will that compare the two? Now, it looks like that doesn't work very Josh: well. Oh my god, but it has moving average lines and everything. This is pretty robust. Ejaaz: I know, it's like the trader's and investor's dream. Just crazy.

29:48

Ejaaz: Kind of like a side note on this, but like, Ejaaz: The fact that Tesla's down and everyone's kind of like bearish on this company, Ejaaz: even though they're like rumored to be merging and stuff like this. Ejaaz: Like the point being is there's an asymmetry between what the market is seeing Ejaaz: and what these inventors and builders are seeing. Ejaaz: These AI labs have created what they define as pretty much a low form of AGI.

30:12

Ejaaz: You literally have an AI model that is building the next version of itself. Ejaaz: That by description is like a super genius and it's only limited by the function Ejaaz: of energy and compute, right? Ejaaz: And then investors are looking at this and saying, huh, Amazon and Google are Ejaaz: about to spend a combined $500 billion worth of CapEx this year.

30:32

Ejaaz: Kind of bearish, that's a lot of money. So there is a real investment opportunity Ejaaz: here to really understand the difference of what these things can actually do. Ejaaz: And that might lead to a lot of like opportunities to invest.

30:43

Ejaaz: I don't know, but I know that I'm buying Tesla today and a bunch of Google stock. Josh: Yeah, I mean, look at this Google valuation. One, this chart looks absolutely gorgeous. Josh: But two, um, the AI verdict is a buy. Even the AI thinks Google is a buy because Josh: they just have, um: "Alphabet offers the best value in mega-cap tech: dominant AI Josh: capabilities, diversified growth, and a cheap valuation if search moat holds."

31:03

Ejaaz: Yeah, give me the week, give me the week. Josh: Let's see the weekly chart here. Do you want some moving average lines as well? Josh: Because we could drop those in. Ejaaz: Please. Let's see, let's see. I'm actually super, yeah, look, see, it's had a slight dip. Markets are so reactive. Crazy. Josh: Yeah, and I think to the point of the CapEx, markets are viewing that as a scary, high-risk statement.

31:22

Josh: But while that's true, I also think it's a testament to the fact that scaling Josh: laws are going to work, and the largest companies in the world are betting on Josh: the continuation of them working. Josh: And the shared consensus between all of these large-cap companies deciding to Josh: spend record CapEx this year, Josh: is a testament to the fact that things are only going to go faster. Josh: And they believe that the more money they put in, the more outputs they will get.

31:47

Josh: And they're going to continue to put their foot on the gas. So I think any question Josh: that anyone had, if these scaling laws could continue to hold up and we could Josh: continue to be on the path to whatever AGI looks like and beyond, Josh: I think that was answered this week through these earnings reports. Josh: And the overwhelming answer is yes, it's true. Josh: It is likely that this is going to happen and everyone is betting their entire company on it.

32:06

Ejaaz: I think we have done a great job, if I pat ourselves on the back virtually, Ejaaz: Josh, of showing what these models are capable of. Ejaaz: And remember, it's been less than 48 hours that these models have been alive. Ejaaz: In fact, I think it's been like 36 hours. So if any of you are interested in Ejaaz: trying these out, I cannot urge you enough to go out and try these things.

32:28

Ejaaz: Try to solve a problem that you're finding at work or try to solve a problem Ejaaz: that you're finding just in your casual leisure time to code up a hobby or a Ejaaz: project in a matter of seconds. It's so, so easy. Ejaaz: And it'll put you at an advantage to understand how these tools work and why Ejaaz: they're really changing the world as we see it around us, why stocks are dumping, Ejaaz: why some stocks are pumping.

32:48

Ejaaz: But yes, go demo it. Let us know what you actually end up building. Ejaaz: Josh and I are trying to give you more live demos in a lot of the episodes that we put out. Ejaaz: And with every other model release and feature that drops, we are going to be Ejaaz: trying and testing these things so we can bring to you exactly what these things Ejaaz: can do and show you kind of like the benefits and disadvantages, Ejaaz: what's real and what's really not.

33:08

Josh: Yeah. And I can't stress this enough. The best way to stay on top of things, Josh: the best way to feel like you're not being left behind is just to use the tools Josh: as they come out and to understand them and what makes them different. Josh: And for a single subscription to ChatGPT or to Claude, you can access tools Josh: just like this and build stuff just like this. Josh: I'm not, this wasn't like an incredibly difficult technical challenge.

33:29

Josh: You just ask it what you want and you ask it to help you. Josh: And it will actually walk through and help you through the process and build whatever you want. Josh: So the most important thing for anyone listening is just to train that muscle and to get familiar with Josh: these tools and these skills so that you're able to leverage them to your advantage, Josh: however it may best fit in your life. Josh: And that's kind of what we wanted to share with you.

33:50

Josh: Like, it's simple. You download the app, you log into your account, Josh: and you're on your way. It's really Josh: not as difficult as I think a lot of people make it seem like it is. Josh: And I mean, this beautiful dashboard is a testament to that.

34:02

Josh: Okay, so Ejaaz, it also looks like our Codex output Josh: has finished itself. So we have here on the Josh: screen, we have Opus, which we saw, which is Josh: really a lovely dashboard, but it seems like Codex Josh: now has its own version that we could quickly compare. So maybe we'll try, we'll Josh: go to our favorite, Google. We'll type Google in and we'll click analyze and kind Josh: of see how this compares. I find it funny how they've, they've merged on the same

34:24

Josh: type of design style. But yeah, oh, okay, this, whoa, this is interesting. This is Josh: different. So it has the moving averages select. Oh, is that, Josh: okay, yeah, so it has the charts. Ejaaz: Is that accurate? Josh: It has the PE ratio. Yeah, that's what I was looking at. Let's go to that one-week chart and see. Josh: I have some questions about these. It looks pretty right. Ejaaz: Okay. That looks very wrong. Josh: Yeah, the one-year, I'm a little confused about. Let's compare it to Claude here.

34:53

Josh: Let's go to Google and we'll analyze that. While it thinks, we can look at the Josh: rest. So it looks like it emulated pretty well. Josh: It has the verdict. It has the same stats. Josh: The risk assessment matrix is good, but you could see, like, some of the text Josh: you can't really read because it's black on black. Um, but nonetheless, pretty Josh: interesting. They both succeeded.

35:12

Ejaaz: Yeah, I mean, as we said before, like, these models are very equally capable. And, Ejaaz: you know, maybe it's just the way that you prompt something or, uh, the way that Ejaaz: some of these things work, but largely they kind of achieve the same goal and Ejaaz: same quality. Um, and like, listen, like, we're talking about, like, minor discrepancies here. Ejaaz: I can't wait to see what we will build with this. Like, this is insane. Josh: It's amazing. Both of these were one-shot prompts. I didn't touch anything.

35:38

Josh: And here we are. I do think that Google one-year chart is wrong. Josh: I think Claude got that one right. Josh: But overall, both succeeded in the mission. Both look great. Josh: And both are just excellent models.

⁠¶ Conclusion and Future Insights

35:46

Ejaaz: Amazing. Okay, well, that's it. Wherever you're listening to this, Ejaaz: if it is on YouTube and you're watching our lovely faces, or if you're listening Ejaaz: to us on Spotify, Apple Music, or wherever you listen to us, Ejaaz: please subscribe, give us a rating, leave us some comments.

35:59

Ejaaz: We love your feedback and we respond to pretty much every single comment because Ejaaz: we're trying to figure out how to make this show better and bring you the content Ejaaz: that you guys deserve and want. Ejaaz: Turn on notifications because we are releasing more and more videos every week Ejaaz: on the hottest topics as they come out. Ejaaz: We also have the sickest newsletter ever where one of us will either write an Ejaaz: essay or give you the five top highlights of the week.

36:22

Ejaaz: So if you don't want to watch any of these videos, you can just read and digest Ejaaz: that and you'll know everything that you need to know in AI and frontier tech. Ejaaz: Thank you for listening, and we will see you on the next one. Josh: See you in the next one. Peace.

Transcript source: Provided by creator in RSS feed