
The Coding Model Wars: Claude Opus 4.6 vs GPT-5.3 Codex

Feb 07, 2026 · 37 min · Ep. 122

Episode description

Anthropic's Claude Opus 4.6 and OpenAI's Codex 5.3 have come out back to back, so we dive in and compare their shocking capabilities and implications for AI development. 

We compare Claude's orchestration skills against Codex's superior coding efficiency through live demos, revealing the potential impact on job automation in tech. Try them out, see which one you prefer, and let us know!

------
🌌 LIMITLESS HQ ⬇️

NEWSLETTER:    https://limitlessft.substack.com/
FOLLOW ON X:   https://x.com/LimitlessFT
SPOTIFY:             https://open.spotify.com/show/5oV29YUL8AzzwXkxEXlRMQ
APPLE:                 https://podcasts.apple.com/us/podcast/limitless-podcast/id1813210890
RSS FEED:           https://limitlessft.substack.com/

------
TIMESTAMPS

0:05 AI Showdown: Claude vs. Codex
0:43 Live Demo of Coding Models
4:13 Comparing Model Outputs
4:47 Codex vs. Claude Performance
6:15 Exploring the Models' Features
8:58 The Future of Work with AI
9:32 Building a Stock Analysis Tool
11:44 Technical Demos Unveiled
14:41 Self-Improving AI Models
17:19 Automating Complex Tasks
18:46 The Competitive Landscape
20:32 Investor Perspectives on AI
22:36 Major Updates from OpenAI
23:43 Real-Time Quality Assurance Testing
29:07 Creating a Stock Dashboard
35:45 Conclusion and Future Insights

------
RESOURCES

Josh: https://x.com/JoshKale

Ejaaz: https://x.com/cryptopunk7213

------
Not financial or tax advice. See our investment disclosures here:
https://www.bankless.com/disclosures

Transcript

Intro / Opening

Ejaaz: 48 hours ago, Anthropic dropped Claude Opus 4.6, the world's most powerful AI model.

AI Showdown: Claude vs. Codex

Ejaaz: And literally 20 minutes later, OpenAI dropped Codex 5.3, which is not only better, but also helped build itself. Now, to say both of these models are powerful would literally be the understatement of the century. By the time I'd eaten breakfast yesterday, one of the models had discovered 500 security flaws that no one had discovered before.

Ejaaz: And by lunchtime, a bunch of software stocks were down hundreds of billions of dollars out of fear that these models would replace entire teams. And it's actually already happened. These models can replace a team of 50 software engineers, rebuild Pokemon from scratch, and so much more. In this episode, we're going to do a live demo, side by side, to show you which model is the best.

Live Demo of Coding Models

Josh: Yeah, this is pretty cool. I wanted to spend a lot of time this episode introducing people to these models, what they can do, and how they work, through demos that we're going to perform ourselves.

Josh: These are definitely two frontier models, but I think, more importantly, they're frontier coding models. And when people hear that, I think a lot of them get turned away, because it seems like this complicated thing, like you need to be a developer in order to use them. We are here to tell you that is not the case. From one non-technical person to another: I fed this model a prompt, I fed it some assets, and then I pressed play, and what I got is a side-scrolling game, which was exactly what I asked for.

Josh: So on the screen now, you're seeing what came out of the one-shot prompt I fed this model, asking it to create a side-scroller like Mario that we can actually play. It has coins, and I don't think the gravity quite works, but what you're seeing is that it understands physics, it is able to generate graphics, and it plays like a pretty solid side-scroller. And I created this in five minutes, with one prompt, and it actually works.

Ejaaz: What was the prompt that you used, Josh?

Josh: Yeah, so I'll pause playing this game to actually show you the prompt. It was very simple, just this one paragraph: "I want you to make a game. You can use Python or C++, whatever you find the most convenient. A 2D platformer that closely resembles Super Mario. Use the attached background image and sprites found in the asset folder. Take into account that the sprites don't come with a transparent background but pink ones, so you need to filter the background."

Josh: And for those who are watching, you can actually see the sprites on my screen. They were just a series of assets, with no context given as to what each one of them was, but the model reasoned through it, removed the background, and generated a pretty good representation of that. Now, this was built one-shot in Codex, which is the new OpenAI Mac application that was released this week, and I wanted to compare it to Claude.

Josh: So I have another instance here on the screen with Claude. This is using Opus 4.6, the newest frontier model that they released this week, and I want to do an exact one-to-one comparison. So I'm going to launch the exact same prompt. We had it cook in Codex, now we're going to have it cook in Claude Code, and in the meantime maybe we can talk more about what these models do and how they work.

Ejaaz: Well, before we do that, actually, as you set this game up, I ran it on Claude Opus 4.6 as well, but with a slight twist.

Josh: Okay. Let's see your output. What do we have?

Ejaaz: I don't know if you can see my screen, but it is the exact game that you just created. But I don't know if those characters look kind of familiar to you. We have the hero protagonist character, which is my beautiful face and my beautiful person, Ejaaz. And who's this enemy over here that looks a lot like the bear guy? And listen, we can double jump here, Josh, and I think, yep, I can crush you every time.

Ejaaz: Jokes aside, this is insane. This took me around three minutes to build end-to-end. I used the exact same prompt that you gave me. And we didn't have ready-made sprites of ourselves, right? We didn't have cartoon images of ourselves. So I uploaded a photo that we had taken, I don't know, like six months ago, and said, hey, can you make game avatars out of this?

Ejaaz: It did it in 20 seconds. And then I said, could you add these to the game and replace the enemy with Josh and the protagonist with Ejaaz? And it did it in a minute.

Josh: So here we go. That's pretty amazing. And these are just using standard desktop applications. So what you're using right here, this was done in Claude Code, right? You just went onto the Claude Mac app, downloaded it, put in the prompt, shared some assets, and now it has built this amazing game from one single prompt.
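Josh's prompt asks the model to "filter" the pink background out of the sprites. The episode doesn't show the code either model wrote; one common way to do it is chroma keying, where every pixel close enough to a key color is made fully transparent. Below is a minimal, dependency-free sketch that works on rows of RGBA tuples. The key color (pure magenta) and the tolerance of 40 are assumptions, not values from the episode, and in a real game you would first load the sprite's pixels with an image library.

```python
from typing import List, Tuple

RGBA = Tuple[int, int, int, int]

# Assumption: the "pink" background is pure magenta. Adjust to the sprite sheet's actual color.
KEY_COLOR = (255, 0, 255)


def remove_key_background(
    pixels: List[List[RGBA]],
    key: Tuple[int, int, int] = KEY_COLOR,
    tolerance: int = 40,
) -> List[List[RGBA]]:
    """Return a new image in which pixels close to `key` are fully transparent."""
    result = []
    for row in pixels:
        new_row = []
        for r, g, b, a in row:
            # Largest per-channel difference from the key color.
            distance = max(abs(r - key[0]), abs(g - key[1]), abs(b - key[2]))
            new_row.append((r, g, b, 0) if distance <= tolerance else (r, g, b, a))
        result.append(new_row)
    return result


if __name__ == "__main__":
    sprite = [
        [(255, 0, 255, 255), (200, 50, 40, 255)],   # background, red pixel
        [(250, 10, 245, 255), (255, 0, 255, 255)],  # near-pink background, background
    ]
    cleaned = remove_key_background(sprite)
    print([[p[3] for p in row] for row in cleaned])
```

Matching within a tolerance, rather than on one exact value, matters because anti-aliased or compressed sprite sheets rarely contain a single exact shade of pink.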

Comparing Model Outputs

Josh: And we're actually going to experiment further in this episode: we're going to create a trading room that does actual real-time stock analysis. So as I'm curating the prompts and getting ready for that second demo, maybe we could walk through what makes these models so exceptional.

Ejaaz: Yeah, well, you might actually notice the first difference on screen right now. If you look closely, my avatar is kind of glitching out, right?

Ejaaz: And if you compare it to the Codex game that you just coded up, there are no glitches. It runs super smoothly. The main takeaway here is that Codex 5.3 is a superior coding model to Anthropic's.

Codex vs. Claude Performance

Ejaaz: And that's a sentence I never thought I would say, at least for the next couple of years, because Anthropic has held that prestige and title for so long. But since the Code Red was declared at OpenAI around three months ago, Sam has devoted pretty much all his resources to building the best coding model. And the benchmarks don't lie. It is a full 12 points ahead of Claude Opus 4.6 on the software engineering benchmark.

Josh: That's a pretty significant difference.

Ejaaz: So I've actually pulled up a more general comparison between the two models here, and it summarizes it really well. If we look at Claude's model, Opus 4.6, what's good about it? Well, they've increased the context window fivefold.

Ejaaz: So it's gone up to a million tokens that you can put in a single prompt. If you want to understand how powerful this is: you can put way more information into your initial prompt. It has much better context and memory, so you can end up cooking up much better products overall, which is very impressive and important to have. Number two, I would think about this as an orchestration model.

Ejaaz: If you look at specific benchmarks, it has beaten OpenAI at GDPval. GDPval is a benchmark that tests a model's performance at a really complex task against a professional human who would normally do that task. And the question is, would you use the AI model or would you use the human? In this case, you would choose Claude Opus 4.6 over humans way more often than you would choose OpenAI's latest model. So that's a really important thing.

Exploring the Models' Features

Ejaaz: And the point about Claude's latest model is that it doesn't code as well as Codex, but it can orchestrate a bunch of agents and overall activity better than OpenAI's. Now, if you look at Codex and OpenAI's new models specifically, it wins on software engineering. It is simply a better software engineer than Claude, which is a massive turnaround and a testament to how many resources and how much fine-tuning OpenAI has put in.

Josh: And on the note of model quality, my prompt, the same one that we used in Codex, is done in Claude Code, and I'm going to run it here for the first time now. You can see it on screen, and we'll see what it looks like. Underneath, we have our Codex version, which looks beautiful.

Josh: On top, we have our brand new version that was just made by Opus. Now, I haven't tried this yet, so we're going to see what happens when I press space to start. It looks like Opus has failed to create a floor, so I am just falling through the floor until the game ends. Okay. So, based on this one demo alone, this is a fairly significant difference: GPT's Codex has created a beautiful side-scroller. It doesn't have gravity, but I could just ask it to... or, it has gravity, it's just a little too much, and I could ask it to lower it. Opus doesn't even work at all.

Josh: And again, the test was just a one-shot prompt. So I'm going to get back to prompting it to build this new application, the trading application, and we'll follow up with that. But I think that's a funny demo to showcase that one is actually superior to the other, in this one use case at least.

Ejaaz: Yeah, I mean, you said it pretty clearly: Codex is the best coding AI model. And I can't emphasize that enough, because OpenAI was behind Anthropic for a long time, and by a massive margin, and in some way, shape, or form, they've been able to catch up. Now, what's interesting here is that both companies have focused on each other's goals.

Ejaaz: So while Anthropic was typically meant to be the leading frontier model in coding, it has now decided to focus on what OpenAI was really good at, which is overall orchestration and being a better generalized model, right?

Josh: They're taking each other's lunch.

Ejaaz: Yeah, exactly. OpenAI has decided to eat Anthropic's lunch and say, okay, we've got the generalized stuff sorted out; let's figure out the coding-specific niche, the highly defined, professionalized functions. And it's produced the best coding model. So it's kind of a weird win-win for both labs. What's awesome about this is that they both now have really well-rounded, but also very specialized, models. And the reason this is important is, and this is maybe my hot take, I don't think the coding models matter, Josh. I actually don't think the generalized models matter either.

The Future of Work with AI

Ejaaz: I think they're both going after something much bigger, which is creating the operating system for the future of work. They know that AI models and AI agents are going to automate a ton of different industries, and those industries are only going to pick you if you can do both generalized work and hyper-specific work really well. That means coding, orchestration, and managing your data.

Ejaaz: And now we have two amazing models, dropped within 20 minutes of each other, that do exactly that, at the highest performance we've ever seen.

Josh: They're pretty exceptional. So now, for this next demo, I have it queued up here.

Building a Stock Analysis Tool

Josh: For this one, I asked the model itself to build me a prompt. I wanted it to create an AI stock portfolio war room, so I said, hey, I want to create this; write me a fully fleshed-out prompt that should solve this problem in one shot. Then I loaded that prompt into our Claude Code app.

Josh: And I also loaded it into the Codex app, in its own project folder, and now I'm going to hit send. So both of these things are thinking in real time. We'll check back in once their outputs are done and compare them again on this second, more robust version. You'll see on the Claude screen that it has a whole list of to-dos. It has an entire plan: there are nine different panels it's going to build, including a risk analysis matrix, portfolio action bars, and all this stuff. So we'll let that cook, and while these things get going, let's get back to what separates them and what people have been freaking out about on the internet.

Ejaaz: Could I take three minutes to show you some wild demos?

Josh: Yeah, let's see what the internet's been demoing while we wait for ours to cook.

Ejaaz: Cool. Listen, our 2D Mario-inspired game was cool, but imagine if I told you that you could recreate the entire Pokemon game, including levels, cities, characters, and the creatures that you fight, from scratch in about an hour and 30 minutes. That's pretty impressive, and that's what we're looking at right now.

Josh: Wow, it even has the fighting.

Ejaaz: Yeah, yeah, yeah. And buttons, and the multimodal gameplay.

Ejaaz: And obviously, image-wise, this looks like it's been made by a child, but it's probably going to take you, what, another couple of hours to make a really high-fidelity game that you could run on your Nintendo Switch or whatever. It is just so impressive that we can do these things.

Ejaaz: Anyone can do these things with no previous background. Just upload a few images, or generate a few images, and you can recreate nostalgic childhood games that are worth billions of dollars, which is just super cool to see.

Josh: Yeah, one of the things that I think is really important to note is how approachable this is.

Josh: For the example that we have running right now on my screen, all I did was tell it what I wanted and ask it to develop the prompt with me. So even if it feels overwhelming, and you don't really know how to code or how to prompt things, you can just ask the model to help you generate the prompt and to explain to you how it works. It's a really easy way to build basically anything you can imagine.

Josh: It's not just games. It's productivity tools. It's CRM tracking.

Technical Demos Unveiled

Josh: It's whatever you want it to be. So I think that's really interesting, but it also goes much more technical, right? I saw another crazy example with the compiler.

Ejaaz: Okay, so for the tech nerds out there who've spent a lot of time coding, you are going to be wowed by this. For one of their flagship demos for Opus 4.6, the Anthropic team tasked the model with building a C compiler, an incredibly complicated piece of software that translates C source code into machine code a computer can run, and that some of the most demanding types of apps depend on.

Ejaaz: And they just walked away. They kind of looked at it, monitored it, and made sure that it wasn't going awry. And in two weeks, let me emphasize that, two whole weeks, 14 days, it coded nonstop and built this compiler. Now, you might think two weeks is quite a long time: I want my thing done in an hour and a half.

Ejaaz: Well, let me hearken back to history. In today's world, if you wanted to create something like this, it would take a team of around 50 humans a few months to build from scratch. That's today. But back in the day, it would have taken around a decade, and thousands of people.

Ejaaz: So we have condensed the timeline for creating really complicated tools to a matter of hours, or weeks in this case. Now, the second thing I want to point out is that the fact these models can go untouched for two weeks is just insane. There was another stat released yesterday for OpenAI's GPT-5.2, I think 5.2 high, I believe, giving it a 50% time horizon of 6.6 hours.

Ejaaz: That means if you gave it a complicated coding task that would take a human around 6.6 hours, it would get it completely done, and nail it, 50% of the time. That's such an impressive track record when you look back a year, when that figure was, what, 30 minutes, maybe an hour. So with every iteration we see this thing double. It's just so insane.
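The "every iteration we see this thing double" claim can be checked with a little arithmetic: going from an old time horizon to a new one takes log2(new / old) doublings. A minimal sketch using only the rough figures quoted in the conversation (about 30 minutes or an hour a year ago, 6.6 hours now); these inputs are recollections from the episode, not measured data, and the calculation assumes steady exponential growth.

```python
import math


def doublings(old_hours: float, new_hours: float) -> float:
    """Number of doublings needed to grow from old_hours to new_hours."""
    if old_hours <= 0 or new_hours <= 0:
        raise ValueError("time horizons must be positive")
    return math.log2(new_hours / old_hours)


def months_per_doubling(old_hours: float, new_hours: float, months_elapsed: float) -> float:
    """Average months between doublings, assuming steady exponential growth."""
    d = doublings(old_hours, new_hours)
    if d <= 0:
        raise ValueError("no growth between the two time horizons")
    return months_elapsed / d


if __name__ == "__main__":
    # Figures as quoted in the episode: "30 minutes, maybe an hour" a year ago, 6.6 hours now.
    for start in (0.5, 1.0):
        d = doublings(start, 6.6)
        m = months_per_doubling(start, 6.6, months_elapsed=12)
        print(f"from {start} h: {d:.2f} doublings, one every {m:.1f} months")
```

With those inputs, the growth works out to roughly 2.7 to 3.7 doublings in a year, or one doubling every three to four and a half months, so "doubling every iteration" is a fair shorthand only if new models ship at about that pace.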

Josh: Yeah, it's unbelievable, and almost intimidating, how capable and competent it is, even for someone who is a novice at writing code. It's not about writing code; it's about being able to generate whatever you want. In a way, it abstracts the code away and allows you to just speak English and get what you want, in a way that you understand, and it will help walk you through it.

Josh: One of the things that I love about Claude Code in particular is plan mode. If you leave a lot of things out of your prompt, it'll keep asking you additional questions to understand what you want. And one of the most fascinating things that I read about GPT-5.3 Codex in particular is, like you mentioned in the intro, that it helps build itself.

Josh: And I don't think that can be overstated, because this is the first model in the history of OpenAI that has helped with the building and construction of itself.

Self-Improving AI Models

Josh: And what happens as that starts to ramp up, right? If you think of each model iteration as a flywheel, what is the constraint? There are two constraints. The first is the speed at which developers can actually build it, create the tests for it, and make sure that it's safe and ready to deploy. The second is the hardware that's required to actually train the model.

Josh: What we're seeing with Codex and Opus, which I really believe was kind of Sonnet, is incremental improvement. Now, for incremental improvements that don't require an entirely new training run, the real constraint is the actual software and what you can squeeze out of it.

Josh: And when you have a model that's helping you build this software, one that can think for 6, 12, 24 hours at a time, even longer, it creates this self-fulfilling loop, right, where the models use the new models to make the future models stronger, more powerful, and better. I thought that was a really interesting thing to note: this is the first self-propagating model, in that it ran a lot of the tests for itself and introduced new code that made itself better.

Josh: And as we continue to see that, you can start to imagine that exponential progress line going pretty close to vertical, and things getting really good, really quickly.

Ejaaz: I think what most people listening to this might ask is, well, what was different before? Previously, models would just work in a very analog mode.

Ejaaz: You would point it at a problem, and it would understand what the problem was and then solve it. But it lacked the awareness and wider context of what the overall vision and goal was, and the ability to figure things out for itself. You always had to handhold it. But now it can understand what it's trying to do, look internally, and say, huh, I made that mistake because of this error in my code, so I'm going to rewrite my code, and then I'll be better at it.

Ejaaz: It functions similarly to a human. Now, I actually saw a great analogy. I forgot who wrote it, but it's fantastic. Imagine yourself standing on a sidewalk, right? And a Bugatti Veyron drives past you at, let's say, 200 miles an hour. You'll be like, wow, that's kind of fast.

Ejaaz: And then two minutes later, another Bugatti drives past you at 300 miles an hour. You'll be like, wow, that's kind of fast. But you wouldn't really notice the 100-mile-an-hour difference, right? If you were strapped into the car, though, you would notice it is significantly faster. And that's how software engineers feel right now.

Ejaaz: Now, if you're someone who doesn't code all the time, you're not necessarily going to grasp these impacts, but it's really important for those of you listening to understand that this is massively impactful and will change the way a lot of things are done today.

Automating Complex Tasks

Ejaaz: I mean, just take a look at this, right? This is a direct quote from someone who is building at a major tech company, Rakuten. The quote says, "Claude Opus 4.6 autonomously closed 13 issues and assigned 12 issues to the right team members in a single day, managing a 50-person organization across six repositories." Josh, do you know who else is responsible for doing that?

Ejaaz: An entire team of product managers who each get paid a quarter of a million dollars in compensation.

Josh: Minimum, per year at least, yeah.

Ejaaz: Their jobs are automated now.

Josh: Well, one of the earlier moments in which I realized this was pretty profound was Claude Cowork. They said they built it with just, like, four people over the course of 10 days, and it was 100% built by the then-current model of Claude, which was Opus 4.5 at the time. The amount of leverage from these tools is so high, but it cuts both ways. If you can design and develop a product in 10 days, then another company can probably do it in five.

Josh: It starts to lower the competitive threshold for these companies to catch up, and it starts to raise the bar of what is possible. If you could build something that profound in 10 days, what can you build over the course of six months? Can you really build something fantastic that has a moat, that actually delivers on the total power you have by leveraging this AI?

The Competitive Landscape

Josh: It's going to be interesting to see, because what we're finding, even with the Codex and Opus dual launch, is that these companies are right next to each other. If one publishes something profound, or something that attracts a lot of users, the others are just a few days and a few prompts away from copying it, and that's a pretty difficult thing to compete against on the software front.

Ejaaz: Well, that's why, if we look at the stock market over the last couple of days, it's down trillions of dollars, and I'm not exaggerating. If you look at Microsoft over the last two weeks, the stock is down 20%. It's trading like a meme stock, which is just insane.

Ejaaz: And the reason is that a lot of investors are anticipating that these models, specifically Opus 4.6 and Codex 5.3, will create in a couple of seconds the tools that SaaS companies worth billions of dollars have spent their entire lives building, just as you described.

Ejaaz: Now, the counterargument to this, Josh, and Jensen Huang actually went live at a conference and made this point, is: if you're an AI agent or AI model that is capable of building these tools, why would you rebuild the tool every single time you perform a function? Surely you would just access the best tool and use it.

Ejaaz: So there's a bit more nuance: AI models aren't just going to recreate your entire software stack if you're a Fortune 500 company. That doesn't make any sense; there are a bunch of tools that are hyper-optimized to do that. But what they will do is connect all of these tools and silos in a much more effective way. And maybe that requires rebuilding parts of them.

Ejaaz: Maybe it requires connecting them in different ways, but not rebuilding the entire tools. And whatever operating system that ends up becoming will be the stickiest and most valuable company ever.

Investor Perspectives on AI

Ejaaz: Now, that could be Salesforce, or it could be someone completely different, a startup that we haven't even heard of. I think that's really important to understand, but people are experimenting. And if you look at this graph right here, which may not look insane to some, but is insane to me at least, 4% of daily GitHub commits are now from Claude Code. Two months ago, I think, that was 5% of what it is today.

Ejaaz: So the ascent has just been insane. These companies are adopting it, and they are using it.

Josh: Yeah, the number is just going to keep going up, and there's no reason why it wouldn't. It's such a testament. One, to the speed: it feels like we're strapped into that car and now we're flying.

Josh: Two, to an outsider it might not look like it, but it certainly feels like that on the inside, and I think a lot of people are starting to notice this and get a little nervous about it too. Look at this example on the screen right now. This is a prompt to GPT-5.3 Codex, which basically created an entire Minecraft clone in a single prompt. It looks awesome, it works really fast, and it was super lightweight.

Josh: And the author says, "I also tried it on Opus 4.6, but for some reason it got stuck." But you can build anything that you want very quickly, and very cheaply as well. What Opus 5.3... no, I'm getting them all mixed up. What GPT-5.3 Codex offered is double the rate limits for the next couple of months. So you actually have the freedom, on their $20-a-month plan, to go and build whatever you want.

Ejaaz: Can I maybe deliver a hot take, Josh?

Josh: Yeah, what have you got?

Ejaaz: I think the most exciting part about these model releases isn't the models themselves. Largely, I think the models are similar in capabilities. They score around the same on coding benchmarks, and they can roughly do the same things: they can spin up a bunch of agents and orchestrate them.

Ejaaz: The bigger picture, which I think a lot of people missed, is that both companies, Anthropic and OpenAI, are at war with each other. They're trying to build and own the operating system for work, which isn't just a model; it's a software suite. This week alone, OpenAI didn't just release this new model.

Major Updates from OpenAI

Ejaaz: They released the Codex app, a desktop Mac app that's kind of like a command-line interface and makes the coding experience way better. And they also launched an enterprise platform called Frontier, which allows Fortune 500 companies to take this magical model, give it to non-coders, and let them do magical things.

Ejaaz: All of these products together create a very sticky experience, where it starts to make sense for software engineers and non-software engineers alike to use them. And that stickiness results in billion-dollar contracts, right? Anthropic has done the same thing over the last two weeks.

Ejaaz: They released Claude Cowork, they released agent teams this week, and then they released this new model. They're going after the same thing, so it makes sense why they're releasing Super Bowl ads that are kind of shitting on each other now. And so the point is, if they can own this operating system, this future of work, they will basically be the most valuable company.

Ejaaz: And I think it's going to be winner-takes-most.

Josh: I have to interrupt you here. We have some developments on the prompt that we've been working on, our AI stock war room.

Ejaaz: Let's go.

Josh: I'm going to share it on the screen right now.

Real-Time Quality Assurance Testing

Josh: So currently what it's doing is it's asking to do some quality assurance testing. Josh: So you'll see it actually used a it's taking over control of my browser and Josh: it's asking to make prompts on the screen. So you can see all of this that you're Josh: seeing right here is generated live, and it's doing an actual real-time debug Josh: of the product that it made.

Josh: It's clicking around, it's resizing things, it's going through the links, Josh: and it's running real quality assurance testing on the actual product. Josh: It's really amazing to see.

Josh: All of this was just built: all these visual charts, and they're all accurate. Right now we're looking at NVIDIA. We have a chart, and I'm not going to mess with it because it's doing real-time manipulation for its quality assurance checks, but it's actually clicking through, making sure the stats are accurate and making sure all the widgets work. And look, it has these amazing graphs already. It has sentiment analysis: 85 percent of people are bullish on NVIDIA.

Josh: It has recent signals from the news. It has a risk assessment matrix showing things like export controls and chip controls. It has revenue and earnings for every single quarter, charted, and competitive moats. It has sector comparisons. This is unbelievable. And it generated all of this from a single prompt. I just find it really funny that we can actually watch it work in real time.

Josh: So you'll see in this prompt, it's clicking through and taking screenshots of what it's seeing. Then it's digesting, analyzing, and understanding what it made, what it messed up, and what it still has left to finish. And it generated all of this in real time as we're recording this episode. So fascinating.

Ejaaz: Wow, it reminds me of some of the research platforms at the companies I used to work at. They would pay, I'm not joking, millions of dollars a year to get access to these types of platforms that would give them analysis like what you're showing on the screen right now. And you just built it from scratch.

Josh: From scratch, and look, it's doing this.

Josh: I'm not even touching my keyboard. I just searched for Apple, and if I go over to the prompt, it's taking screenshots of Apple. It says, "Apple dashboard looking great. Let me scroll to see the new three-column button row layout," and it's checking the button rows. It's really unbelievable. We have the investment thesis, the bull case, the bear case, catalysts and timelines.

Josh: It has WWDC built in, and it has the iPhone 18 launch set up for September. It's so cool. It's absolutely unbelievable. And now this is a real tool that I'll be able to use: type in whatever stock I want to look at and actually get some analysis on it. Now, I'll go over to Codex over here, and it looks like Codex is taking its sweet time.

Josh: It's still at zero out of six tasks completed, so it might take a little while for us to get a visual on that. But it's just amazing to watch Claude Code and Opus 4.6 do quality assurance testing live, in real time, by taking over my browser and running it for itself. I just think this is amazing.

Ejaaz: It's magic. Something I just noticed in your Opus chatbot screen: when it's going through its thinking, it seems to have spun up a few different agents, or instances of itself, to pull this off.

Ejaaz: If you scroll up, I saw a few prompts that suggested that's what it was doing, which I think underscores a very important point about what both of these models can do: they can spin up multiple versions of the same model and task each one with different things to run in parallel.

Ejaaz: What this means is you can get a really complicated product, like what you're seeing on the screen right now, in a matter of minutes because it's running in parallel. So imagine having a bunch of computer science geniuses that you can duplicate immediately and run for a fraction of the cost: just the cost of electricity, the cost of inference. And now you start to see why all these NVIDIA chips are worth so much.
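Ejaaz's point about spinning up copies of a model and running them in parallel is, at bottom, a fan-out/fan-in pattern. The sketch below is a hypothetical Python illustration, not how Claude Code or Codex actually orchestrate their agents: `run_subagent` is a made-up stand-in that sleeps instead of calling a model, and the task names and durations are invented. It only shows why parallel sub-tasks finish in roughly the time of the slowest one rather than the sum of all of them.

```python
import asyncio
import time


# Hypothetical stand-in for a sub-agent: it only sleeps to simulate work.
# A real agent would call a model API here; nothing in this sketch reflects
# how Claude Code or Codex actually schedule their agents.
async def run_subagent(task: str, seconds: float) -> str:
    await asyncio.sleep(seconds)
    return f"done: {task}"


async def orchestrate(tasks: dict[str, float]) -> list[str]:
    # Fan out: start every sub-task at once. Fan in: gather waits for all of
    # them and returns results in the same order the tasks were passed.
    return await asyncio.gather(
        *(run_subagent(name, secs) for name, secs in tasks.items())
    )


def main() -> tuple[list[str], float]:
    # Invented sub-tasks for a dashboard build, each taking 0.2 s.
    tasks = {"price chart": 0.2, "sentiment": 0.2, "risk matrix": 0.2, "earnings": 0.2}
    start = time.perf_counter()
    results = asyncio.run(orchestrate(tasks))
    elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == "__main__":
    results, elapsed = main()
    print(results)
    print(f"elapsed: {elapsed:.2f}s")
```

Run one after another, the four 0.2-second tasks would take about 0.8 seconds; gathered, they should finish in a little over 0.2 seconds, which is the intuition behind "a bunch of geniuses you can duplicate immediately."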

Ejaaz: Because you want to do cool stuff like this. This is insane.

Josh: It's actually incredible. Okay, so now I want to test it on Tesla. I'm going to choose Tesla and see if it can actually do it in a non-controlled environment.

Ejaaz: This UI is so cool.

Josh: It's very pretty. What the hell? This looks great. Okay, so here we have Tesla. It has the charts. We're going to click through the charts. It has the one-week chart, the one-month chart, the three-month chart. That looks fairly accurate. It has the price-to-earnings ratio, the 52-week high, the 52-week low. So it looks like at one point it was trading at $488, and now it's trading at $389. The bull case for Tesla: Robotaxi and FSD licensing could unlock $500 billion in revenue by 2030. It has the Robotaxi service launch in Austin that it's preparing for.

Josh: And let's see the sector comparison. It's comparing it to Rivian, Baidu, Toyota, and Ford. It has the competitive moat, where it says Tesla is strongest in brand power, IP and patents, and cost advantages. You can see the revenue and the estimated earnings per share. Sentiment is much worse on Tesla than it was on Apple; it's at 52% right now. And as for the risk assessment, valuation, competition, and execution are all rated very high risk.

Josh: And that's probably an accurate assessment, although I'm not sure competition is really a problem. Execution is certainly going to be an issue. But it's just amazing to see how well it does. And it even gives a verdict. The AI verdict on Tesla is a hold: "Tesla's optionality is enormous, but current valuations already price in multiple moonshots. Execution on Robotaxi will be the key catalyst." That sounds about right.

Josh: And it's amazing that we built this with a single prompt, without any oversight from me. And it works. It actually works. It's really unbelievable how capable these things are.

Creating a Stock Dashboard

Josh: And now I have a dashboard where, anytime I want to make a decision, I can type in the ticker and get all this optionality. It even has menus that work. Look at this: profit margins, P/E ratios, market cap. Wow, pretty unbelievable.

Ejaaz: It's a reactive, real-time Bloomberg terminal for the modern age.

Josh: There's another feature here that looks like you could compare stocks. Let's see if this actually works. If I type in, let's say, Apple's ticker and hit go, will that compare the two? Now, it looks like that doesn't work very well. Oh my god, but it has moving average lines and everything. This is pretty robust.

Ejaaz: I know, it's like a trader's and investor's dream. Just crazy.

Ejaaz: Kind of a side note on this, but Tesla's down and everyone's kind of bearish on this company, even though they're rumored to be merging and so on. The point is, there's an asymmetry between what the market is seeing and what these inventors and builders are seeing. These AI labs have created what they define as pretty much a low form of AGI.

Ejaaz: You literally have an AI model that is building the next version of itself. That, by description, is a super genius, and it's only limited by energy and compute, right? And then investors are looking at this and saying, huh, Amazon and Google are about to spend a combined $500 billion worth of CapEx this year.

Ejaaz: Kind of bearish, that's a lot of money. So there's a real investment opportunity here in understanding the gap between what the market sees and what these things can actually do. And that might lead to a lot of opportunities to invest.

Ejaaz: I don't know, but I know that I'm buying Tesla today and a bunch of Google stock.

Josh: Yeah, I mean, look at this Google valuation. One, this chart looks absolutely gorgeous. But two, the AI verdict is a buy. Even the AI thinks Google is a buy: "Alphabet offers the best value in mega-cap tech: dominant AI capabilities, diversified growth, and a cheap valuation if the search moat holds."

Ejaaz: Yeah, give me the week. Give me the week.

Josh: Let's see the weekly chart here. Do you want some moving average lines as well? Because we could drop those in.

Ejaaz: Please, let's see. Let's see. I'm actually super... yeah, look, see, it's had a slight dip. Markets are so reactive. Crazy.

Josh: Yeah, and I think, to the point about the CapEx, markets are viewing that as a scary, high-risk statement.

Josh: But while that's true, I also think it's a testament to the fact that scaling laws are going to work, and the largest companies in the world are betting on them continuing to work. The shared consensus among all of these large-cap companies to spend record CapEx this year is a testament to the fact that things are only going to go faster. They believe that the more money they put in, the more output they will get.

Josh: And they're going to continue to put their foot on the gas. So I think any question anyone had about whether these scaling laws could continue to hold up, and whether we could stay on the path to whatever AGI looks like and beyond, was answered this week through these earnings reports. And the overwhelming answer is yes. It is likely that this is going to happen, and everyone is betting their entire company on it.

Ejaaz: I think we've done a great job, if I may pat us on the back virtually, Josh, of showing what these models are capable of. And remember, it's been less than 48 hours since these models came out. In fact, I think it's been about 36 hours. So if any of you are interested in trying these out, I cannot urge you enough: go out and try these things.

Ejaaz: Try to solve a problem you're facing at work, or code up a hobby or a project in your leisure time in a matter of seconds. It's so, so easy. And it'll put you at an advantage in understanding how these tools work and why they're changing the world around us: why some stocks are dumping and why some stocks are pumping.

Ejaaz: But yes, go demo it. Let us know what you end up building. Josh and I are trying to give you more live demos in a lot of the episodes we put out. And with every model release and feature that drops, we're going to be trying and testing these things so we can show you exactly what they can do, their benefits and disadvantages, and what's real and what's really not.

Josh: Yeah. And I can't stress this enough. The best way to stay on top of things, the best way to feel like you're not being left behind, is just to use the tools as they come out and to understand them and what makes them different. With a single subscription to ChatGPT or Claude, you can access tools just like this and build stuff just like this. This wasn't an incredibly difficult technical challenge.

Josh: You just ask it for what you want and ask it to help you, and it will walk you through the process and build whatever you want. So the most important thing for anyone listening is to train that muscle and get familiar with these tools and skills, so that you're able to leverage them to your advantage, however they best fit in your life. And that's kind of what we wanted to share with you.

Josh: It's simple. You download the app, you log into your account, and you're on your way. It's really not as difficult as a lot of people make it seem. And this beautiful dashboard is a testament to that.

Josh: Okay, so Ejaaz, it also looks like our Codex output has finished. On the screen we have Opus, which we saw, and which is a really lovely dashboard, but Codex now has its own version that we can quickly compare. Maybe we'll go to our favorite, Google: we'll type Google in, click Analyze, and see how this compares. I find it funny how they've converged on the same type of design style. But yeah, oh, okay, whoa, this is interesting. This is different. It has the moving averages. Okay, yeah, so it has the charts.

Ejaaz: Is that accurate?

Josh: It has the P/E ratio. Yeah, that's what I was looking at. Let's go to that one-week chart and see. I have some questions about these. It looks pretty right.

Ejaaz: Okay. That looks very wrong.

Josh: Yeah, the one-year chart I'm a little confused about. Let's compare it to Claude here.

Josh: Let's go to Google and we'll analyze that. Well, I think we can look at the rest. It looks like it emulated pretty well. It has the verdict. It has the same stats. The risk assessment matrix is good, but you can see that some of the text is hard to read because it's black on black. But nonetheless, pretty interesting: they both succeeded.

Ejaaz: Yeah, I mean, as we said before, these models are about equally capable. Maybe it's just the way you prompt something, or the way some of these things work, but largely they achieve the same goal at the same quality. And listen, we're talking about minor discrepancies here. I can't wait to see what we'll build with this. This is insane.

Josh: It's amazing. Both of these were one-shot prompts; I didn't touch anything.

Josh: And here we are. I do think the Google one-year chart is wrong; I think Claude got that one right. But overall, both succeeded in the mission. Both look great, and both are just excellent models.

Conclusion and Future Insights

Ejaaz: Amazing. Okay, well, that's it. Wherever you're listening to this, whether it's on YouTube and you're watching our lovely faces, or on Spotify, Apple Music, or wherever you listen to us, please subscribe, give us a rating, and leave us some comments.

Ejaaz: We love your feedback, and we respond to pretty much every single comment, because we're trying to figure out how to make this show better and bring you the content you deserve and want. Turn on notifications, because we're releasing more and more videos every week on the hottest topics as they come out. We also have the sickest newsletter ever, where one of us will either write an essay or give you the top five highlights of the week.

Ejaaz: So if you don't want to watch any of these videos, you can just read and digest that, and you'll know everything you need to know in AI and frontier tech. Thank you for listening, and we'll see you on the next one.

Josh: See you in the next one. Peace.

Transcript source: Provided by creator in RSS feed