Everyone Needs to Use OpenAI Codex... Until Claude Mythos Comes Out

May 05, 2026 | 31 min | Ep. 168

Episode description

Let's examine the fierce competition between two AI coding tools: Anthropic's Claude and OpenAI's Codex. As Codex emerges with robust updates, we discuss user experiences and showcase demos comparing game development and dashboard creation.

Highlights include Codex's superior interface and innovative features like auto-review and Chronicle. We also explore the broader implications for AI integration in coding tasks.

------
🌌 LIMITLESS HQ ⬇️

NEWSLETTER:    https://limitlessft.substack.com/
FOLLOW ON X:   https://x.com/LimitlessFT
SPOTIFY:             https://open.spotify.com/show/5oV29YUL8AzzwXkxEXlRMQ
APPLE:                 https://podcasts.apple.com/us/podcast/limitless-podcast/id1813210890
RSS FEED:           https://limitlessft.substack.com/

------
TIMESTAMPS

0:00 Claude vs Codex
3:25 Image Generation Capabilities
4:52 Long Horizon Autonomy
8:55 Chronicles
10:19 Demo
16:49 Dashboard Creation Challenge
20:30 The AI Model Harness Explained
24:27 The Future of AI Tools
26:20 Claude Mythos
27:26 Verdicts

------
RESOURCES

Josh: https://x.com/JoshKale

Ejaaz: https://x.com/cryptopunk7213

------
Not financial or tax advice. See our investment disclosures here:
https://www.bankless.com/disclosures

Transcript

Claude vs Codex

Josh: A few months ago, we told you to use Claude. Now we're telling you to switch back. For those of you who aren't familiar, over Christmas break there was a major vibe shift where AI coding went from this fun tool to something that developers actually use when they're shipping code. And even if you're not a developer, the use cases and applications that were created around that time were really strong.

Josh: And since then, Anthropic has gone on this generational run of shipping these incredible products seemingly every single day, which has turned Claude Code into this supercharged super app. It's the place that, Ejaaz, I know you've gone to, and I've gone there too, in order to get all of our AI progress done. Any work that we have, we've gone to Claude Code.

Josh: Now, OpenAI has woken up. And over the last few weeks, Codex has shipped more features than most companies ship in a year. And I bet, I guarantee that you haven't heard of some of these features that we're going to talk about in this episode. The pendulum has fully swung back, or at least I believe so, because I'm totally Codex-pilled.

Josh: And in this episode, we're going to kind of walk through the differences between these two and why the model that you're using today probably won't be the model you're using tomorrow. And I don't think we're going to convince you, but maybe we could show you why you might want to consider using something else here.

Ejaaz: I just want to talk through some of the crazy stats here because the script has genuinely flipped. A few months ago, Claude Code was all anyone could talk about. Every software engineer was using Claude Code. Every enterprise was installing it. It was crazy. But just over the last couple of weeks, specifically by the end of April, ChatGPT 5.5 was released, and that was plugged into the coding AI model. It's all one and the same.

Ejaaz: And OpenAI went on this code red run where they focused on nothing but building the best coding AI model and the best LLM. And the numbers show that it's worked. Over the last week, Codex has been downloaded or installed over 46 million times.

Ejaaz: Claude Code, under 500,000 times. Now, that is crazy to say, because if you look at the historical data, Claude Code downloads and installs have absolutely dwarfed Codex, but something changed over the last couple of weeks. That something was OpenAI putting out just a better model. You mentioned that you were Codex-pilled, Josh. I think so am I. I've spent the last couple of days playing around with Codex.

Ejaaz: This morning, we prepped a bunch of really cool demos, and it has just completely flipped the script. But it's one thing saying it. It's another thing actually showing the direct comparison. So we created this visual artifact to kind of give you the scoreboard. And you can see it at the top here. It's OpenAI Codex at 11 and Anthropic Claude at two. But let me explain why. Okay. So number one, computer use. Codex and Claude Code can use your computer.

Ejaaz: They can take over your desktop and move your cursor around. Now, Claude pioneered it. They were the first ones there, but it was super slow. It kind of runs into a bunch of obstacles and you have to kind of handhold it and improve it to do a bunch of different things. Codex is not only quicker than me, it's quicker than the average person. In fact, I can actually see the cursor move around so quickly.

Ejaaz: And it's like using a computer, but it's superhuman and it can run pretty much 24-7 at this point. Long horizon autonomy. Codex can work for longer in a much more intelligent manner versus Claude Code, which is, again, crazy to say because literally a month ago, it was the inverse of this. Claude right now can run for a decent amount of time, but not as long as Codex can.

Ejaaz: And then there are the last two that I want to talk about here. One is browser use. Codex can take over your browser. It can do a lot more intentional things. It understands what it's looking at, very importantly. Previously, it could not do that.

Image Generation Capabilities

Ejaaz: Claude can do the same, but not as intelligently. And then finally, ChatGPT Images 2.0 got released, what was it, like two weeks ago now? Oh, it's so good. Yeah, it's the image generation model from OpenAI, and it is absolutely astounding. In fact, it beat all the others, including Google's, what is it, Nano Banana 2.0 Pro, which previously held the lead. It beat it across every single benchmark.

Ejaaz: Anthropic, on the other hand, doesn't even have an image gen model. So, so far, it's crushing. Yeah.

Josh: Yeah, I think a lot of the best is now bundled into Codex. The image gen, for anyone who does any sort of visual work, is unbelievable. And being able to use that directly in your software is awesome. One thing that you mentioned is the long horizon autonomy. I think that needs double-clicking on because it's really impressive how well it works.

Josh: Traditionally, there's been this thing called a Ralph loop that we use. It's actually named after the character from The Simpsons who is very persistent. And it's basically a planning mode where you give the AI a goal and it will continue to iterate towards that goal until it accomplishes it. So, let's say you want to build a Lego car or something and you give it the exact parameters.

Josh: It will go and go and go until it solves that problem and gives you exactly what you want in a way that other AI models haven't. Codex did that. And this is the only native implementation that you can get of this long horizon thinking where it actually will go for days on end. I've seen screenshots of some thinking for as long as 36 hours to accomplish the goal. So if you have really difficult tasks, Codex is going to be really good at solving those.
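
The Ralph loop described here (give the AI a goal, then keep iterating until the goal is met) can be sketched in a few lines of Python. This is only an illustration under assumptions, not how Codex implements it: the function name `ralph_loop`, the toy Lego-car parts list, and the iteration budget are all hypothetical, and the "agent step" is a plain function standing in for a model call.

```python
# Hypothetical sketch of a "Ralph loop": keep iterating toward a goal
# until a check passes or an iteration budget runs out.
# Nothing here is a real Codex API; every name is illustrative.

REQUIRED_PARTS = ["chassis", "axles", "wheels", "body"]


def car_is_complete(built):
    """Goal check: every required part has been added."""
    return all(part in built for part in REQUIRED_PARTS)


def add_next_part(built):
    """One unit of work: add the first missing part (stand-in for an agent step)."""
    for part in REQUIRED_PARTS:
        if part not in built:
            return built + [part]
    return built


def ralph_loop(state, step, goal_met, max_iterations=100):
    """Apply `step` repeatedly until `goal_met(state)` is true or the budget is spent."""
    iterations = 0
    while not goal_met(state) and iterations < max_iterations:
        state = step(state)
        iterations += 1
    return state, iterations, goal_met(state)


if __name__ == "__main__":
    car, iterations, done = ralph_loop([], add_next_part, car_is_complete)
    print(car, iterations, done)
```

The budget matters in practice: without `max_iterations`, a goal that can never be satisfied would keep the loop running forever, which is the failure mode a persistent loop like this has to guard against.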

Long Horizon Autonomy

Josh: Now, continuing to scroll down, there was another feature that was just released this week called auto review. And a huge pain in the ass for people who are creating code or working on complex projects, whatever it may be, is you're constantly having to sit there and approve things because the permission system is a little finicky, right? You don't want to give it full access to your computer.

Josh: You also don't want to sit there approving every time it wants to use Chrome or every time it wants to access your files. So Codex created auto review and they rolled it out last week, where the agent is kind of smart. It knows which approvals are possibly going to be systemic existential threats and which aren't. And it will just automatically approve all the things that aren't going to get you in a lot of trouble.

Josh: It creates a much easier user interface where you can just kind of walk away from the computer for a little while and come back and things get done.
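
To make the auto review idea concrete, here is a minimal Python sketch of a permission gate that approves low-risk actions on its own and escalates anything that looks dangerous. It is an assumption-laden illustration, not Codex's actual logic: the `Action` type, the `LOW_RISK_KINDS` set, and the `SENSITIVE_MARKERS` list are all made up for this example, and a real system would presumably use a model rather than string matching.

```python
# Hypothetical sketch of an auto-review permission gate.
# The action kinds and risk rules below are invented for illustration.
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    kind: str    # e.g. "read_file", "run_command", "open_browser"
    target: str  # the path, URL, or command line the agent wants to touch


# Kinds of action that are normally safe to approve without asking.
LOW_RISK_KINDS = {"read_file", "open_browser", "run_tests"}

# Substrings that make any action worth a human look.
SENSITIVE_MARKERS = (".ssh", ".env", "rm -rf")


def review(action):
    """Return "auto_approve" for low-risk actions, "ask_user" otherwise."""
    if any(marker in action.target for marker in SENSITIVE_MARKERS):
        return "ask_user"
    if action.kind in LOW_RISK_KINDS:
        return "auto_approve"
    return "ask_user"


if __name__ == "__main__":
    print(review(Action("read_file", "src/game.js")))
    print(review(Action("read_file", "/home/me/.ssh/id_rsa")))
    print(review(Action("run_command", "rm -rf /")))
```

The sensitive-target check runs first on purpose, so that a normally harmless kind of action (reading a file) still gets escalated when it points at something like an SSH key.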

Josh: Memory and context is pretty strong, I'd say. The one thing, and we haven't mentioned many Claude winners: the place where Claude wins currently is on their OpenClaw capability, funny enough, because OpenAI bought OpenClaw. But Dispatch is the mobile app feature for Claude in which you can actually engage with Claude Code remotely. That doesn't currently exist on Codex, and while the team has promised to ship that, you don't actually have that today. Claude has that. Also, in terms of the personality and UI, Claude is just so much better.

Josh: I think we're going to get into our personal takes, but whenever you're using an LLM versus an actual tool set or a harness, Claude is pretty great and the UI is very warm. So there are some instances in which Claude is better, but for the most part Codex is really just kind of crushing it, and I've really enjoyed using it. One of the fun things is pets. I mean, just recently they released pets, and Claude also released pets, but these pets are a little bit different.

Josh: This is an example of angry Dario we're seeing on the screen, and it's fun because you have this persistent character that exists throughout your computer use. And as you're engaging with Codex, it'll just kind of chat with you in the background so you can see your progress, see where you're at. It's fun, it's playful, and it just shows that they kind of care about the user experience.

Josh: Now, one feature I would guarantee most people don't know is Chronicle, Ejaaz. And you were just telling me about Chronicle and how cool it is, how it kind of monitors your screen as you go. This seems like novel technology that we haven't seen yet.

Ejaaz: Yeah, so one of the earliest episodes that we did here on Limitless was an interview with the folks at OpenAI that created something called, what was it called, Josh? Do you remember? It was like agent mode or personal mode, something like that.

Josh: Yes. It thought overnight for you, right? Yes.

Ejaaz: It basically took all the conversations that you'd had with ChatGPT the night before or the day before or the week before, and it created important context around you in the form of something called memories. This is where AI memory was birthed, from OpenAI themselves, from the OpenAI team. And what it would do is it would feed you a report in the morning that would update you on information that it thought you would be interested to read about. So say, for example, you were interested in the stock market, it'll give you an update on a bunch of advancements that had happened overnight or over the last week or whatever it might be.

Ejaaz: Right now, fast forward to today, memory is embedded across every single AI model and tool. The reason why is context is so important. It's one thing for a user to ask for something explicitly and directly. It's a completely different thing for an AI to actually understand what you mean, the nuance in the sentence that you've created, and even better, to predict what you want. But there was still an obstacle, which was you needed to feed it the context and say, hey, Claude, hey, ChatGPT, can you remember this?

Ejaaz: OpenAI recently released a feature called Chronicle, where it observes what you scroll through, what you click on, what you type, and it builds its own context and memories around you without you needing to feed it. That actually led to a really cool prompt that you pointed out, Josh, or that you found, which was: what have I been doing very inefficiently on my computer, according to Chronicle, which is this new memory feature? Make some recommendations, be direct, tell me what I need to hear. That's pretty awesome.

Josh: Yeah. So this is alpha, because I don't think a lot of people recognize that this is a possibility, because Codex and OpenAI didn't do a good job of explaining this. When they released Chronicle, they said it's a way for the system to review your code as you've gone because it's been taking sequential screenshots.

Chronicles

Josh: But the reality is that it's much bigger than this. And I suspect they didn't market it this way because it could be a bit of a privacy issue, but it's essentially constantly monitoring your screen and taking screenshots of what's happening on your screen and interpreting it so it understands your habits, the way that you work, the things that you do. And then you can ask it, what have I been doing very inefficiently on my computer?

Josh: According to Chronicle, make some recommendations, be direct, tell me what I need to hear. And it'll actually evaluate how you've been using your computer, how long you've been scrolling on Twitter, perhaps, how long you haven't been doing the things you're supposed to be working on, or just generally how to improve your workflow, and give you real feedback based on your actual actions that it's seen.
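
As a rough illustration of the "how long you've been scrolling on Twitter" kind of feedback, the Python sketch below totals time per app from a list of observations. It is purely hypothetical: the episode does not describe Chronicle's internals, so the `(app, seconds)` observation format and the `summarize_activity` function are assumptions made for this example, standing in for whatever the screenshot interpretation actually produces.

```python
# Hypothetical sketch: turn screen observations into a time-per-app summary.
# The (app, seconds) observation format is an assumption for illustration;
# it is not a documented Chronicle data structure.
from collections import Counter


def summarize_activity(observations):
    """Total the seconds spent per app, most time-consuming app first."""
    totals = Counter()
    for app, seconds in observations:
        totals[app] += seconds
    return totals.most_common()


if __name__ == "__main__":
    observed = [
        ("Twitter", 600),
        ("Editor", 300),
        ("Twitter", 900),
        ("Terminal", 120),
    ]
    for app, seconds in summarize_activity(observed):
        print(f"{app}: {seconds // 60} min")
```

A summary like this is the sort of structured context a model could be handed alongside a prompt asking what you have been doing inefficiently.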

Josh: And I think this is a super powerful thing, currently only available to Pro members. So if you pay for the $100 or $200 a month subscription, you get access to this. But I suspect this is the early sign of a very important feature they're going to roll out, which is that entire computer monitoring system to improve your system and also probably train the models to get better at engaging with your system.

Josh: But I found Chronicle to be one of those kind of secret features that not a lot of people know about, but that has a lot of upside if you use it to your advantage and let it monitor what you're doing and improve your workflow on a day-to-day basis.

Ejaaz: So the point is, from both of these companies, Anthropic and OpenAI, we are getting feature releases every single week. In fact, every single day. And I'm being bombarded by this.

Ejaaz: And it's hard to keep track of all of this. So what is the number one litmus test for both of these models and products and companies?

Demo

Ejaaz: It's to actually use the thing. It's to build the thing. And we have two special demos that we have prepared for you that we're about to jump into. Now, Josh, can you guess what my first demo is about?

Josh: The theme. First one's a game. We're gamers, man. I want to play a game. I want to see how well it does on a game. I know we did this demo in the past, months ago. It left a lot to be desired.

Josh: So I'm curious to see the current up-to-date status as it relates to Claude Code versus Codex. Who's winning on the one-shot game prompt?

Ejaaz: Indeed. Okay. So I am a nostalgic kind of guy. And so I was like, oh, back in the day, I loved Mario. So I want you, both of these models, to create the best Mario-type or Mario-inspired game, a side scroller, but make it futuristic. Maybe sprinkle a bit of neon in there, create levels. I want game design. I want there to be enemies. I want there to be pitfalls. And I also want there to be a scoreboard, and also tell me how to do this thing. Give me the whole package.

Ejaaz: Basically, I fed this prompt or idea into ChatGPT and Claude. And I said, can you create a detailed prompt that I can then feed into your coding models? I then set each of the coding models to their highest settings.

Ejaaz: So what you're about to see is the best of the best for the most detailed prompt that they came up with, and let's see what they did. So example number one is Claude Opus 4.7. This is Claude Code at the highest setting with their latest model. Okay, it took the prompt pretty literally. It's titled Neon Plumber Moon Base Run, which is obviously Mario-inspired, and it said, hey, this is a demo edition, by the way, this is not production ready. What I like about this is it's giving me the instructions. But how does the game actually play out? Let's see. It looks good. Can you see me here, Josh?

Josh: Yes, I can, and it looks like...

Ejaaz: The animations are pretty good. I'm jumping around. I think I'm like a little robot. I can see my feet pitter-pattering. Now, I'm guessing this thing is about to kill me, so let's see if I can jump. Oh, I can jump. There we go. That's awesome. Can I kill this guy? Oh yes, I can. Now, one bit of feedback I've noticed is I can't double jump, and it told me in the menu that I could double jump, so that's weird. So the physics hasn't really paid off. Can I die?

Josh: Oh, it certainly looks like you could die.

Ejaaz: I can die. Great. Okay. So that is Claude's attempt at it. What's your feedback on this, Josh? I think the graphics are pretty good.

Josh: The graphics are great. For one shot, I mean, granted, this is only one single prompt. So for one prompt, it created great graphics.

Josh: It had sound design that actually sounds pretty accurate to what you would expect in the game. It has similar principles. It's following gaming principles. You kind of understand what looks dangerous, what doesn't. You knew that those spikes were going to hurt you and they hurt you.

Josh: The logic seems to be a little bit flawed. I think it's having problems with gravity, or at least that double jump functionality, because it looks like those coins that you probably want to collect you can't actually reach because you can't do the double jump. So in terms of logic, not so hot. In terms of visuals, aesthetics, in terms of, I mean, how good this game is from one shot, very impressive.

Ejaaz: Yeah, I think it's important to understand that I started from zero. It literally asked me to give it a folder to build in, and the folder was completely empty. So all the visual renderings, all the graphics, the animation style, the scoring system, the way that the avatar moves and looks was created from scratch, from a bunch of characters from this AI model.

Ejaaz: So this is Claude Code's current best attempt, and it is way better than what we tested out and honestly demoed on this show about a month ago.

Ejaaz: But now let's see what OpenAI's ChatGPT 5.5 Codex at the highest possible setting cooked up.

Josh: Okay, and this is using the same prompt, correct? So you just fed the model the same prompt? Identical, right? Oh God, I'm excited. I hope Codex did well, because now that I'm a fan, I'm gassing it up. It better perform here.

Ejaaz: Okay. So this is GPT 5.5's attempt. Now, you might notice that this isn't the entire browser. That's because Codex has a very unique feature, which is not only can it do all the coding in a single app for you, but it has an in-app browser, so it can live test the thing in the app without you needing to go to Google Chrome or whatever. But anyway, we have the starting screen here. It has also called it Neon Plumber Moon Base Run. It looks a little more rudimentary from the start, but I do like the background animation, Josh. We didn't get this in the previous one, or at least not this side-scrolling thing. Well, let's... Oh.

Josh: Oh, this is nice.

Ejaaz: This is nice. I think this has good logic. Wait, but there's no music. I can't double jump. Might be a skill issue. Might be a prompt issue.

Josh: Let's have a look. Did it say you can double jump?

Ejaaz: That's a good question, actually.

Josh: This is a fully playable game.

Ejaaz: Yes. And I like that it's like zoomed in. There's like... Oh, we got the boost. I can jump on the platforms. Let's see if I can kill this guy.

Josh: Yes. Nice. Okay.

Ejaaz: And can I jump the gap? There's a scoring system.

Josh: You could see your hearts. Oh dude, this is way better. Power-up.

Ejaaz: Wait, oh my God, I want the power-up. I'm still gonna go back. Double jump.

Josh: You can, you could go back. Go back to the last platform. Oh God.

Ejaaz: I died. I'm going to the last platform. Here we go.

Josh: It looks like they're sequentially gaining height, which is interesting. Oh, but okay, so if I'm comparing these two, I'm actually not feeling very let down. This is good, aside from the music not existing, which we may not have explicitly asked for. It looks like the logic plays better. The actual gameplay is usable. This is a full... I don't know if it's glitching or if this is you glitching.

Ejaaz: No, no, that is... it's glitching. It's glitching a bit.

Josh: Okay, so there are still some edge case errors, yeah, but this is different in the sense that you have your hearts clearly projected, you have a score system that's clearly in place, you're able to get these power-ups, they work, they function. I mean, this is a very clean and functional game. So I would give this to Codex. I think the experience, perhaps the design of Claude was better. And perhaps the music, I mean, music was definitely better versus none, but in terms of just coding logic and making a better game, I give this to Codex. Do you have a take?

Ejaaz: Yeah. So on the build side of things, I had a much more pleasant experience using Codex as well, so I think Codex wins on this. I one-shotted it in the true sense, where I just gave it a single prompt and Codex didn't ask for any permissions. It just kind of went on and did the thing. I saw its thinking, and at points where it was unsure, it thought amongst itself and then made the decision to progress forwards, whereas with Claude Code it would come to me. Now, that might just be a developer's preference, right? Like if you're building a production-ready app for, I don't know, a big company that you work for, you probably want to have more hands-on involvement.

Dashboard Creation Challenge

Ejaaz: Whereas if you're just building a game like we did today, where I don't really care what it ends up looking like or what it does, then the hands-off preference is probably something that you would use Codex for. But I think Codex wins this.

Josh: So for our second demo, we have this handwritten piece of paper that I actually wrote and took a picture of. I didn't. It's GPT Image Gen 2.0, but it looks like it's handwritten.

Ejaaz: The handwriting was too nice, Josh. That was the giveaway.

Josh: Yeah, my handwriting is far sloppier than this. But the idea is that you can even write things on the back of a napkin and you could turn that into an application. So what we did here is we just asked it, on the back of a piece of paper, to create a generic Limitless dashboard application, fed it into the model, and this is what we got. So it looks like it did a pretty good job.

Josh: I could tell this is Claude before you even tell me which model it is because it has the standard design principles.

Josh: Claude design is so basic and it's so predictable, where, like, okay, I've seen this dashboard before. It looks like it was a mission success. There's a lot of text on this page, a lot of stuff going on, a lot of graphics. I give a lot of credit for kind of inferring what we would want to be seeing from something like this, where we have a proper trip budget. I don't think we asked for a trip budget, but okay.

Josh: It looks like it did a lot of inferring, right? It kind of made a lot of assumptions, but at the end of the day it did take what we had on the napkin and it turned it into a pretty generic dashboard of sorts based on very limited information that we gave it.

Ejaaz: I think the issue with this is we asked for something completely different. It created a dashboard, but we asked for it to be based around the Limitless podcast and it created a travel planning board. So I don't know whether that was a prompt issue or whether we just fed it the wrong image, but here is where we're at. Now let's take a look at what OpenAI did.

Ejaaz: Okay, so here we have the same prompt fed into GPT 5.5, and it's funny, I can instantly tell this is GPT 5.5 because it's cleaner and it's not neon and it's not trying to go for some futuristic spin. It looks very simplistic. This is actually a website or app that I would probably be more inclined to engage with. It's also more visually perceptive to me, right? Like, what do I have at the front here? It's this five-day trip that I want to go on.

Ejaaz: It's giving me the basic information that I need to know at the start. It has a bunch of different tabs as well. But again, it isn't what I specified on the napkin. So I think this might be a skill issue on our side, Josh. But otherwise, like, look at these graphics. They're really good. One thing I've noticed is stylistically, although both models create very different looking things, the animation style looks the same.

Ejaaz: Have you noticed that even with the game that we just demoed, the avatar looked the same? It was given the same sort of title and the objects interacted in the same way. We're seeing this here, so maybe it's just a change in quality. I actually prefer GPT 5.5 on this one.

Josh: Yeah, this is crazy. I'm just going to suspect there was a prompt issue there where, yes, we clearly asked for something that we didn't actually want, but here it is. I think if you're just comparing them apples to apples, ChatGPT and Codex is like a no-brainer, 10 times better.

Josh: I far prefer this. If you look at the original napkin photo, this is much more accurate to what the design looked like on that original piece of paper. And then if you also just compare the general design, this is far easier to understand. It's just a lot less dense. It's designed better. I wouldn't even say this is really a fair comparison. It seems like Codex just completely crushed this, and it has all the functionality built in. It looks good. I am giving another win to Codex here. That's two for two.

Ejaaz: Wow, look, I've got like a re-optimization toggle at the top and it actually updated. I wonder where it's pulling that data from.

Josh: It's already hooked into data, look at that. Yeah, impressive stuff.

The AI Model Harness Explained

Ejaaz: Very, very cool. Now, one major reason why both of these models have advanced so rapidly over the last couple of months is something known as the AI model harness. Now, you have the AI model, which is something that you and I have interacted with quite a lot. It's via ChatGPT or Claude itself.

Ejaaz: But there's an added layer that you can put on top of this model, which comes in the form of prescripted prompts that are engineered to make the model act in a particular way. But it's also the environment that the model works in. It's also the policies that you set to make sure that the model acts and behaves and sounds in a particular way. That's why we talked about Claude's personality earlier being better than ChatGPT. It all plays into the harness.

Ejaaz: What we figured out was it's an entirely new product category on its own. In fact, Cursor had some news over the last couple of days where they made their harness, Cursor SDK, available via API. And the reason why this is such a big deal is critics criticized Cursor for being an AI wrapper, which meant that Cursor doesn't have a model of its own.

Ejaaz: It would just create this harness, a set of prompts and environments, around, say, Claude or ChatGPT. And so people would say, Cursor isn't actually special. Turns out the wrapper or the harness actually made these models way more intelligent. In fact, if you added Cursor's harness on top of GPT 5.5 and Claude Opus 4.7 right now, you end up with a smarter, more intelligent, more efficient model than the actual base models themselves.

Ejaaz: Now, remember, AI labs spent hundreds of millions of dollars to train these models and to create the best thing and put their best foot forward. And still you have a startup which is worth, what is it now, $10 billion right now, potentially being acquired by xAI for $60 billion, creating a better model on top. So the harness and the AI model are arguably one and the same at this point.

Ejaaz: And it's just a valuable note to point out that these models aren't just better at coding because of the base model itself. It's because of this thing known as a harness.

Josh: Yeah. And the harness is the difference maker when it comes to building this super app.
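
The three ingredients of a harness named above (prescripted prompts, the environment the model works in, and policies on how it behaves) can be shown in a small Python sketch. This is a toy under stated assumptions, not Cursor's SDK or any vendor's real harness: the `Harness` class, its fields, and the `echo_model` stand-in are all hypothetical, and the "model" is just a function from prompt text to reply text.

```python
# Hypothetical sketch of a model harness: the same underlying model,
# wrapped with a scripted prompt, an environment description, and an
# output policy. All names are invented for illustration.
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Harness:
    model: Callable[[str], str]          # any function from prompt to reply
    system_prompt: str                   # prescripted instructions
    allowed_tools: List[str] = field(default_factory=list)   # environment
    banned_phrases: List[str] = field(default_factory=list)  # policy

    def run(self, user_request: str) -> str:
        """Build the full prompt, call the model, then enforce the policy."""
        prompt = "\n".join([
            self.system_prompt,
            "Tools: " + ", ".join(self.allowed_tools),
            "User: " + user_request,
        ])
        reply = self.model(prompt)
        for phrase in self.banned_phrases:
            reply = reply.replace(phrase, "[removed]")
        return reply


def echo_model(prompt: str) -> str:
    """Stand-in for a real model: repeats the last line it was given."""
    return "MODEL SAW: " + prompt.splitlines()[-1]


if __name__ == "__main__":
    harness = Harness(
        model=echo_model,
        system_prompt="You are a careful coding assistant.",
        allowed_tools=["shell", "browser"],
        banned_phrases=["secret"],
    )
    print(harness.run("print the secret key"))
```

Because the model is passed in as a plain callable, the same harness can wrap different models, which is the sense in which the wrapper is a product layer separate from the base model.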

Josh: It's like every single company is trying to build the super Josh: app the all-in-one application that kind of serves Josh: as your operating system anytime you need to engage with ai Josh: this is the place that you could do it and it's all encompassing it's Josh: all in one now one of the best applications we've seen for this in the early Josh: days has been something like open claw where it's this extension of what an

Josh: operating system could look like starting with ai at the foundation and open Josh: claw did a really amazing job of that now in some news this week you can now Josh: use your chat gpt account to generate tokens with OpenClaw. Josh: So previously you had to use the API, whether you were using Anthropic or OpenAI Josh: or any of the other models, and it was pretty expensive. It costs a lot of money. Josh: Now, thanks to Sam Altman this week announcing, you can actually use your account

Josh: connected with it. And I think this is the beginning of a multi-step plan. Josh: To really integrate OpenClaw directly into Codex in a way that Anthropik can't. Josh: Because if you'll remember, OpenAI owns OpenClaw. Josh: They bought Peter and granted OpenClaw will stay open source forever, Josh: but they have the ability to actually integrate directly into their products. Josh: And I suspect that's what we're going to see.

Josh: In fact, we even got some confirmation from another post from one of the Codex Josh: developers who replied to a post that was saying, Codex only needs a native Josh: editor, an iOS app, a full browser, and OpenClaw.

Josh: And the developer, Tibo, said all of this and more is coming, which Sam Altman retweeted. So we are indeed getting OpenClaw inside of Codex. We're getting a mobile iOS app so that you can access it remotely. And soon there's going to be no reason to really use a different app, because it's going to be all-encompassing. Now, are there still downfalls? Yes. Computer use is 20 faster on Codex, but yesterday I was playing around with it. I told it to increase the volume of my music, and it took 10 minutes to do it, because it tried to increase the slider on Spotify, even though it was maxed, without actually increasing my system audio.

Josh: So it's still a little dumb, but it is getting better. And I think this leads me to this post that I really love, the vanilla maxing post we have to talk about.

The Future of AI Tools

Josh: It starts by saying, you should 100% be vanilla maxing. Just use the tools as they're handed to you. That's it.

Josh: Because a lot of people, and I've been caught by this personally, get caught up in using all these different repos and skills and plugins, when the reality is, if you just wait, the AI labs are shipping fast enough that they'll just integrate it into your own native application. So I'm vanilla maxing. You, Ejaaz?

Ejaaz: I'm totally vanilla maxing as well, dude. Like, listen, OpenClaw, when it was hyped up, was incredibly impressive and still is incredibly impressive. It opened up an entirely new product market and segment. That's why OpenAI acquired them. But something's majorly changed over the last couple of months, which is OpenClaw has kind of fallen off. No one talks about it anymore.

Ejaaz: People who were complaining about the errors and bugs they were facing have kind of gone silent, because they've just grown bored and they don't want to put their energy and effort into it. And the reason why is that although these tools are very frontier-level, they can't actually be scaled to practical use. You don't feel safe integrating OpenClaw into your desktop where you have personal files. I've seen horror stories where they accessed credit card data and exposed it, or where they deleted old wedding photos and the wife was super angry, a bunch of stuff like that.

Ejaaz: If you're able to get access to a tool that comes under a branded reputation, such as ChatGPT, Codex, or Claude Cowork, where it kind of takes over your computer but in a sandboxed environment (and I know that NVIDIA also released NemoClaw, which is like the enterprise-grade secure version of OpenClaw), you're vanilla maxing. That is the way to do it. And there's no need to rush ahead and lose all your data as a consequence.

Ejaaz: So that's basically it for the episode. We wanted to give you a comprehensive guide and insight into Codex GPT 5.5 versus Claude Opus 4.7.

Ejaaz: There's a lot of numbers in there, but basically it's the best coding models from both sides, to see which is better.

Claude Mythos

Ejaaz: And the truth is, there isn't a clear winner right now. I would say it's probably Codex GPT 5.5, but the narrative switched so recently that maybe, maybe Claude can still catch up. And the only reason why I say that, Josh, is that there's a model that we haven't discussed or demonstrated yet, because we can't. It's called Claude Mythos. It was kind of pseudo-released a few weeks ago.

Ejaaz: And on all benchmarks, it is technically better than 5.5. But the reason why we can't demo it is we can't get access to it. And the reason cited by Anthropic was that it's too dangerous. It's a cybersecurity risk. In fact, it wasn't just Anthropic saying it. It was Pete Hegseth of the US Department of War also saying this, right? So there's concerns around that. OpenAI has created a Mythos-level model here, but has made it available to everyone.

Ejaaz: And so the argument could be made that it's just because Anthropic doesn't have enough compute. So there's a lot of rumors around this, but I'm excited to get my hands on the best models from each of these and compare them directly.

Josh: Yeah. And the compute's actually been degrading. So I think I want to wrap this up on, like, what do you actually currently use? What is the Limitless production stack? How are we using these AI models?

Verdicts

Josh: And for me, at least, it's not even close. I'm Codex-pilled. I'm fully switched over. I am Codex superior domination. It's going to be the month of Codex. Maybe Anthropic will have a comeback, but that's not happening until at least June, July, because this month is Codex month. So I've been using Codex for basically everything, all of the difficult tasks that I need.

Josh: What I have found is that GPT 5.5 as an LLM, as a language model, as a chatbot, is a little bit inferior to Opus 4.7, which I believe to be the better model if you're just chatting with an AI.

Josh: I like its personality. It's warmer, it's more precise, it normally gets the idea of what I want. So if I am building a complex project, Opus 4.7 is the orchestrator and Codex is the actual implementer, the executor of this code, of this plan. I've also noticed that Opus 4.7 is a bit inferior to 4.6 at a few things, and I think this is another piece of alpha here: I actually use Opus 4.6 whenever I'm doing anything relating to writing or word ingestion.

Josh: So one of the projects I've been doing recently is from Andrej Karpathy. He created this, like, wiki for your own person, where it ingests files and it kind of writes these summaries for you and it creates a personal knowledge wiki. I use Opus 4.6 exclusively for that, because Opus 4.7, I think, is far inferior at summarizing and kind of rewriting these topics that I use in my Obsidian. So that's kind of my stack: I use Opus for LLMs, Codex for everything else. Ejaaz, what are you currently optimizing for? What are you planning?

Ejaaz: So it's two things. My stack is actually way more diverse when it comes to just the research side of things, only because I'm using the AI that's readily available wherever I am, right? So if I'm on X a lot and I see breaking news, I'm just tapping Grok, because honestly its recent model, I think it's, what was it, 4.3 at this point, is actually pretty good, and they have multiple agents that are kind of running at this, right?

Ejaaz: But for the core bulk of the work, I've started shifting towards GPT 5.5 for the research, because 5.5 researches things for so much longer. And it has a much more in-depth discussion. In fact, I tested it out today because I was curious about the AI power stack and what stocks I should be investing in to get exposure to the power grid lines that are currently constraining AI data centers, right?

Ejaaz: And I was like, all right, I gave a detailed prompt to both Claude Opus 4.7 and 5.5, and 5.5 completely cooked 4.7. And it gave good reasoning why, whereas 4.7 did not. I had to ask it more questions. So all in all, I think 5.5 is my preference right now. I still use 4.7 because of the personality. It's like less of an AI type of voice versus GPT 5.5.

Ejaaz: But again, I feel like OpenAI is on a generational run right now, and they might just kind of fix this in the next couple of hours at this point.

Josh: Yeah, it's coming. It's coming quick. And I think now is a good time to kind of get familiar with Codex, to understand the way it works. And as they implement these features, you'll be able to adopt them within the hour, within the day.

Josh: It's pretty amazing. And it's been fun to just experiment. It's been fun to try something new. And again, competition is just better for everyone. So the end winner of this is the user, because for as low as $20 a month, you get access to all this frontier intelligence, all these capabilities. And it's really been unbelievable to watch. So that is the comparison, Codex versus Opus.

Josh: If you have not tried both of them, I encourage you to give it a try. Test the prompts against one another. If you have any type of work that you need, if you're working on a computer at all, chances are you can use AI to help you do your job even better. Or you could just use it to help you do hobbies and side projects that you've always wanted to do. So give it a try. Let us know your preference: Codex, Claude Code. Which one is it going to be?

Josh: I think that's probably it for the episode. Thank you guys so much for watching. If you enjoyed it, please don't forget to share with your friends. Let them know which model they picked. And also don't forget to rate it five stars on your favorite podcast listening platform. Any final thoughts, Ejaaz, before we go?

Ejaaz: No, that's it. Thank you guys so much for listening and we'll see you on the next one.

Transcript source: Provided by creator in RSS feed: download file