¶ Claude vs Codex
Josh: A few months ago, we told you to use Claude. Now we're telling you to switch Josh: back because for those of you who aren't familiar, well, over Christmas break, Josh: there was a major vibe shift where AI coding went from this like fun tool to Josh: things that developers actually use when they're shipping code. Josh: And even if you're not a developer, the amount of use cases and applications Josh: that were created around that time were really strong.
Josh: And since then, Anthropic has gone on this generational run of shipping these Josh: incredible products seemingly every single day that has turned Claude Code into Josh: this supercharged super app that is the place that, Ejaaz, Josh: I know you've gone to, I've gone there too, in order to get all of our AI progress done. Josh: Any work that we have, we've gone to Claude Code.
Josh: Now, OpenAI has woken up. And over the last few weeks, Codex has shipped more Josh: features than most companies ship in a year. Josh: And I bet, I guarantee that you haven't heard of some of these features that Josh: we're going to talk about in this episode. Josh: The pendulum has fully swung back, or at least I believe, because I'm totally Codex-pilled.
Josh: And in this episode, we're going to kind of walk through the differences between Josh: these two and why the model that you're using today probably won't be the model you're using tomorrow. Josh: And I don't think we're going to convince you, but maybe we could show you why Josh: you might want to consider using something else here. Ejaaz: I just want to talk through some of the crazy stats here because the script
Ejaaz: has genuinely flipped. A few months ago, Claude Code was all anyone could talk about. Ejaaz: And every software engineer was using Claude Code. Every enterprise was installing it. It was crazy. Ejaaz: But just over the last couple of weeks, specifically by the end of April, Ejaaz: ChatGPT 5.5 was released, and that was plugged into the coding AI model. Ejaaz: It's all one and the same.
Ejaaz: And OpenAI went on this code red run where they focus on nothing but building Ejaaz: the best coding AI model and the best LLM. Ejaaz: And the numbers show that it's worked. Ejaaz: Over the last week, Codex has been downloaded over or installed over 46 million times.
Ejaaz: Claude Code, under 500,000 times. Now, that is crazy to say, because if you look Ejaaz: at the historical data, Claude Code downloads and installs have absolutely dwarfed Ejaaz: Codex, but something changed over the last couple of weeks. Ejaaz: That something was OpenAI putting out just a better model. You mentioned that Ejaaz: you were Codex-pilled, Josh. Ejaaz: I think so am I. I've spent the last couple of days playing around with Codex.
Ejaaz: This morning, we prepped a bunch of really cool demos, and it has just completely flipped the script. Ejaaz: But it's one thing saying it. It's another thing actually showing the direct Ejaaz: comparison. So we created this visual artifact to kind of Ejaaz: give you the scoreboard. And you can see it at the top here. Ejaaz: It's OpenAI Codex at 11 and Anthropic Claude at two. But let me explain why. Ejaaz: Okay. So number one, computer use. Ejaaz: Codex and Claude Code can use your computer.
Ejaaz: It can take over your desktop and it can like move your cursor around. Ejaaz: Now, Claude pioneered it. They were the first ones there, but it was super slow. Ejaaz: It kind of runs into a bunch of obstacles and you have to kind of like handhold Ejaaz: it and approve it to do a bunch of different things. Ejaaz: Codex is not only quicker than me, it's quicker than the average person. Ejaaz: In fact, I can actually see the cursor move around so quickly.
Ejaaz: And it's like using a computer, but it's a superhuman and it can run pretty much 24-7 at this point. Ejaaz: Long horizon autonomy. Codex can work for longer in a much more intelligent Ejaaz: manner versus Claude Code, which is, again, crazy to say because literally a Ejaaz: month ago, it was the inverse of this. Ejaaz: Claude right now can run for a decent number of times or amount of time, Ejaaz: but not as long as Codex can.
Ejaaz: And then the last two that I want to talk about here is browser use. Ejaaz: So Codex can take over your browser. It can do a lot more intentional things. Ejaaz: It understands what it's looking at, very importantly. Previously, it could not do that.
¶ Image Generation Capabilities
Ejaaz: Claude can do the same, but not as intelligently. And then finally, Ejaaz: ChatGPT Images 2.0 got released, what was it, like two weeks ago now? Oh, it's so good. Ejaaz: Yeah, it's the image generation model from OpenAI, and it is absolutely astounding. Ejaaz: In fact, it beat all the other predecessors, including Google's, Ejaaz: what is it, Nano Banana 2.0 Pro, which previously held the lead. Ejaaz: It beat it across every single benchmark.
Ejaaz: Anthropic, on the other hand, doesn't even have an image gen model. Ejaaz: So, so far, it's crushing. Yeah. Josh: Yeah, I think a lot of the best is now bundled into Codex. The image gen for Josh: anyone who does any sort of visual work is unbelievable. And being able to use Josh: that directly in your software is awesome. Josh: One thing that you mentioned is the long horizon autonomy. I think that needs Josh: double clicking on because it's really impressive how well it works.
Josh: Traditionally, there's been this thing called a Ralph loop that we use. Josh: It's actually named after the character from The Simpsons who is very persistent. Josh: And it's basically a planning mode where you give the AI a goal and it will Josh: continue to iterate towards that goal until it accomplishes it. Josh: So like, let's say you want to build a Lego car or something and you give it the exact parameters.
Josh: It will go and go and go until it solves that problem and gives you exactly Josh: what you want in a way that other AI models haven't. Josh: Codex did that. And this is the only native implementation that you can get Josh: of this long horizon thinking where it actually will go for days on end. Josh: I've seen screenshots of some thinking for as long as 36 hours to accomplish the goal. Josh: So if you have really difficult tasks, Codex is going to be really good at solving those.
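The loop Josh describes can be sketched in a few lines of Python. This is a toy illustration only, not how Codex implements it: `REQUIRED_PARTS`, `goal_met`, `agent_step`, and `ralph_loop` are hypothetical names, and the "agent" here simply adds one missing Lego part per iteration. The point it shows is the shape of the technique: check the goal, take a step, repeat until the goal is met or a budget runs out.

```python
from typing import Callable, List, Tuple

# Hypothetical goal spec for the Lego car example from the episode.
REQUIRED_PARTS = ["chassis", "wheels", "body"]


def goal_met(built: List[str]) -> bool:
    """The goal is met once every required part has been built."""
    return all(part in built for part in REQUIRED_PARTS)


def agent_step(built: List[str]) -> List[str]:
    """Stand-in for one agent iteration: add the first missing part."""
    for part in REQUIRED_PARTS:
        if part not in built:
            return built + [part]
    return built


def ralph_loop(
    state: List[str],
    is_done: Callable[[List[str]], bool],
    step: Callable[[List[str]], List[str]],
    max_iters: int = 50,
) -> Tuple[List[str], int]:
    """Keep iterating toward the goal until it is met or the budget is spent."""
    iterations = 0
    while not is_done(state) and iterations < max_iters:
        state = step(state)
        iterations += 1
    return state, iterations


final_state, used = ralph_loop([], goal_met, agent_step)
print(final_state, used)
```

The `max_iters` cap is an assumption added for safety; a real long-running agent would need some equivalent budget (time, tokens, or steps) so that an unreachable goal does not loop forever.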
¶ Long Horizon Autonomy
Josh: Now, continuing to scroll down, there was another feature that was just released Josh: this week called auto review. Josh: And a huge pain in the ass for people who are creating code for working on complex Josh: projects, whatever it may be, is you're constantly having to sit there and approve Josh: things because the permission system is a little finicky, right? Josh: You don't want to give it full access to your computer.
Josh: You also don't want to sit there approving every time it wants to use Chrome Josh: or every time it wants to access your files. Josh: So Codex created auto review and they rolled it out last week where the agent is kind of smart. Josh: It knows which things are going to possibly be systemic existential threats Josh: and which approvals aren't. Josh: And it will just automatically approve all the things that aren't going to get you in a lot of trouble.
Josh: It creates a much easier user interface where you can just kind of walk away Josh: from the computer for a little while and come back and things get done.
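As a rough illustration of the idea behind auto review, here is a minimal Python sketch that approves low-risk actions automatically and escalates everything else to the user. It is an assumption-heavy stand-in: the hosts describe the agent itself judging risk, whereas this sketch uses a fixed allowlist, and `Action`, `LOW_RISK_KINDS`, and `auto_review` are hypothetical names, not part of any Codex API.

```python
from dataclasses import dataclass

# Hypothetical allowlist of action kinds considered safe to auto-approve.
LOW_RISK_KINDS = {"read_file", "run_tests", "open_browser"}


@dataclass(frozen=True)
class Action:
    kind: str    # what the agent wants to do
    target: str  # what it wants to do it to


def auto_review(action: Action) -> str:
    """Approve low-risk actions; send anything else to the user."""
    if action.kind in LOW_RISK_KINDS:
        return "approve"
    return "ask_user"


print(auto_review(Action("read_file", "notes.txt")))
print(auto_review(Action("delete_file", "photos/")))
```

Defaulting unknown actions to `ask_user` is the conservative choice: a new or unrecognized kind of action is never silently approved.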
Josh: Memory and context is pretty strong, I'd say. Josh: The one thing, and we haven't mentioned many Claude winners: the Josh: place where Claude wins currently is on their OpenClaw capability, funny Josh: enough, because OpenAI bought OpenClaw. But Dispatch is the mobile app feature Josh: for Claude in which you can actually engage with Claude Code remotely. That doesn't Josh: currently exist on Codex, and while the team has promised to ship that, you don't
Josh: actually have that today. Claude has that. Also, in terms of the personality Josh: and UI, Claude is just so much better. I think we're going to get into our
Josh: personal takes, but whenever you're using an LLM versus an actual Josh: tool set or a harness, Claude is pretty great and the Josh: UI is very warm. So there are some Josh: instances in which Claude is better, but for the most part Codex is Josh: really just kind of crushing it, and I've really enjoyed using it. One of the Josh: fun things is pets. I mean, just recently they released pets, and Claude also released
Josh: pets, but these pets are a little bit different. This is an example of Angry Dario Josh: we're seeing on the screen, and it's fun because you have this persistent character Josh: that exists throughout your computer use. Josh: And as you're engaging with Codex, it'll just kind of chat with you in the background Josh: so you can see your progress, see where you're at. Josh: It's fun, it's playful, and it just shows that they kind of care about the user experience.
Josh: Now, one feature I would guarantee most people don't know is Chronicle, Ejaaz. Josh: And you were just telling me about Chronicle and how cool it is, Josh: how it kind of monitors your screen as you go. This seems like novel technology Josh: that we haven't seen yet. Ejaaz: Yeah, so one of the earliest episodes that we did here on Limitless was an interview Ejaaz: with the folks at OpenAI that created... Ejaaz: Something called, what was it called, Josh? Do you remember?
Ejaaz: It was like agent mode or personal mode, something like that. Josh: Yes. It thought overnight for you, right? Yes. Ejaaz: It basically took all the conversations that you'd had with ChatGPT the night Ejaaz: before or the day before or the week before, and it created important context Ejaaz: around you in the form of something called memories. Ejaaz: This is where AI memory was birthed from OpenAI themselves, from the OpenAI
Ejaaz: team. And what it would do is it would feed you a report in the morning that Ejaaz: would update you on information that it thought you would be interested to read about. Ejaaz: So say, for example, you were interested in the stock market, Ejaaz: it'll give you an update on a bunch of advancements that had happened overnight Ejaaz: or over the last week or whatever it might be. Ejaaz: Right now, fast forward today, memory is embedded across every single AI model
Ejaaz: and tool. The reason why is context is so important. Ejaaz: It's one thing a user asking for something explicitly and directly. Ejaaz: It's a complete other thing for an AI to actually understand what you mean, Ejaaz: the nuance in the sentence that you've created, and even better, Ejaaz: to predict what you want. Ejaaz: But there was still an obstacle, which was you needed to feed it the context Ejaaz: and say, hey, Claude, hey, ChatGPT, can you remember this?
Ejaaz: OpenAI recently released a feature called Chronicle, where it observes what you scroll through, Ejaaz: what you click on, what you type, and it builds its own context and memories Ejaaz: around you without you needing to feed it, which actually led to a really cool Ejaaz: prompt that you pointed out, Josh, or that you found, which was, Ejaaz: what have I been doing very inefficiently on my computer, according to Chronicle,
Ejaaz: which is this new memory feature, make some recommendations, Ejaaz: be direct, tell me what I need to hear. That's pretty awesome. Josh: Yeah. So this is alpha because I don't think a lot of people recognize that Josh: this is a possibility because Codex and OpenAI didn't do a good job of explaining this. Josh: When they released Chronicle, they said it's a way for the system to review your Josh: code as you go because it's been taking sequential screenshots.
¶ Chronicle
Josh: But the reality is that it's much bigger than this. Josh: And I suspect they didn't market it this way because it could be a bit of a privacy issue, Josh: but it's essentially constantly monitoring your screen and taking screenshots Josh: of what's happening on your screen and interpreting it so it understands your Josh: habits, the way that you work, the things that you do. Josh: And then you can ask it, what have I been doing very inefficiently on my computer?
Josh: According to Chronicle, make some recommendations, be direct, Josh: tell me what I need to hear. Josh: And it'll actually evaluate how you've been using your computer, Josh: how long you've been scrolling on Twitter, perhaps, how long you haven't been Josh: doing the things you're supposed to be working on, or just generally how to Josh: improve your workflow and give you real feedback based on your actual actions that it's seen.
Josh: And I think this is a super powerful thing currently only available to pro members. Josh: So if you pay for the $100, $200 a month subscription, you get access to this. Josh: But I suspect this is the early signs of a very important feature they're going Josh: to roll out, which is that entire computer monitoring system to improve your Josh: system and also probably train the models to get better at engaging with your system.
Josh: But I found Chronicle to be one of those kind of secret features that not a Josh: lot of people know about, but has a lot of upside if you use it to your advantage Josh: and let it monitor what you're doing and improve your workflow on a day-to-day basis. Ejaaz: So the point is, from both of these companies, Anthropic and OpenAI, Ejaaz: we are getting feature releases every single week. In fact, every single day. Ejaaz: And it's becoming, I'm being bombarded by this.
Ejaaz: And it's hard to keep track of all of this. So what is the number one litmus Ejaaz: test for both of these models and products and companies?
¶ Demo
Ejaaz: It's to actually use the thing. It's to build the thing. Ejaaz: And we have two special demos that we have prepared for you that we're about Ejaaz: to jump into. Now, Josh, can you guess what my first demo is about? Josh: The theme. First one's a game. We're gamers, man. I want to play a game. Josh: I want to see how well it does on a game. Josh: I know we did this demo in the past months ago. It left a lot to be desired.
Josh: So I'm curious to see the current up-to-date status as it relates to Claude Code Josh: versus Codex. Who's winning on the one-shot game prompt? Ejaaz: Indeed. Okay. So I am a nostalgic kind of guy. And so I was like, Ejaaz: oh, back in the day, I loved Mario. Ejaaz: So I want you, both of these models, to create the best Mario type or inspired Ejaaz: game, a side scroller, but make it futuristic. Ejaaz: Maybe add a little bit of neon, sprinkle a bit of neon in there,
Ejaaz: create levels. I want game design. I want there to be enemies. Ejaaz: I want there to be pitfalls. Ejaaz: And I also want there to be a scoreboard and also tell me how to do this thing. Ejaaz: I want, give me the whole package. Ejaaz: Basically, I fed this prompt or idea into ChatGPT and Claude. Ejaaz: And I said, can you create a detailed prompt that I can then feed into your coding models? Ejaaz: I then set each of the coding models to their highest settings.
Ejaaz: So what you're about to see is the best of the best for the most Ejaaz: detailed prompt that they came up with, and let's see what they Ejaaz: did. So step number one, or example number Ejaaz: one, is Claude Opus 4.7. So Ejaaz: this is Claude Code at the highest setting with their latest model. Ejaaz: Okay, it took the prompt pretty literally. It titled this Neon Plumber Moon Ejaaz: Base Run, which is obviously Mario inspired, and it said, hey, this is a demo edition,
Ejaaz: by the way, this is not production ready. What I like about this is it's giving Ejaaz: me the instructions, but how does the game actually play out? Let's see. It looks Ejaaz: good. Can you see me here, Josh? Josh: I can, yes, I can, and it looks like...
Ejaaz: The animations are pretty good. I'm jumping around. I think Ejaaz: I'm like a little robot. I can see my feet pitter-pattering. Now, Ejaaz: I'm guessing this thing is about to kill me, so let's see if I can jump. Oh, I Ejaaz: can jump, there we go. That's awesome. Can I kill this guy? Oh yes, I Ejaaz: can. Now, one bit of feedback I've noticed is I can't double jump, and it told Ejaaz: me in the menu that I could double
Ejaaz: jump, so that's weird. So the physics hasn't really paid off. Can I die? Josh: Oh, it certainly looks like you could die. Ejaaz: I can die. Great. Okay. So that is Claude's attempt at it. What's your feedback Ejaaz: on this, Josh? I think the graphics are pretty good. Josh: The graphics are great. For one shot, I mean, granted, this is only one single Josh: prompt. So for one prompt, it created great graphics.
Josh: It had sound design that actually sounds pretty accurate to what you would expect in the game. Josh: It has similar principles. It's following gaming principles. Josh: You kind of understand what looks dangerous, what doesn't. Josh: You knew that those spikes were going to hurt you and they hurt you.
Josh: The logic seems to be a little bit flawed. I think it's having problems with gravity, or at least that Josh: double jump functionality, because it looks like those coins that you probably Josh: want to collect you can't actually reach, because you can't do the double jump. Josh: So in terms of logic, not so hot. In terms of visuals, aesthetics, in terms of, I Josh: mean, how good this game is from one shot, very impressive. Yeah.
Ejaaz: I think it's important to understand that I started from zero. It literally asked Ejaaz: me to give it a folder to build in and the folder was completely empty. Ejaaz: So all the visual renderings, all the graphics, the animation style, Ejaaz: the scoring system, the way that the avatar moves and looks were created from Ejaaz: scratch, from a bunch of characters, by this AI model.
Ejaaz: So this is Claude Code's current best attempt and it is way better than what Ejaaz: we tested out and honestly demoed on this show about a month ago.
Ejaaz: But now let's see what OpenAI's ChatGPT 5.5 Codex at the highest possible setting cooked up. Josh: Okay, and this is using the same prompt, correct? So you just fed the model the Josh: same prompt. Identical, right? Oh god, I'm excited. I hope Codex did Josh: well, because now that I'm a fan, I'm gassing it up. It better perform here. Okay.
Ejaaz: So this is GPT 5.5's attempt. Now, you Ejaaz: might notice that this isn't the entire browser. That's because Ejaaz: Codex has a very unique feature, which is not only Ejaaz: can it do all the coding in a single app for you, but it Ejaaz: has an in-app browser, so it can Ejaaz: live test the thing in the app without you needing to go to Google Chrome or Ejaaz: whatever. But anyway, we have the starting screen here. It has also called it Neo
Ejaaz: Neon Plumber Moon Base Run. It looks a little more rudimentary from the start, Ejaaz: but I do like the background animation, Josh. We didn't get this in the previous Ejaaz: one, or at least not this side-scrolling thing. Well, let's... Ejaaz: Oh. Josh: Oh, this is nice. Ejaaz: This is nice. I think this has good logic. Ejaaz: Wait, but there is no music. There's no music. I can't double jump. Ejaaz: Might be a skill issue. Might be a prompt issue.
Josh: Let's have a look. Did it say you can double jump? Ejaaz: That's a good question, actually. Josh: This is a fully playable game. Ejaaz: Yes. And I like that it's like zoomed in. There's like... Oh, Ejaaz: we got the boost. I can jump on the platforms. Let's see if I can kill this guy. Josh: Yes. Nice. Okay. Ejaaz: And can I jump the gap? There's a scoring system. Josh: You could see your hearts. Oh dude, this is way better. Power
Ejaaz: up. Wait, oh my god, I want the power-up. I'm still gonna go back. Double jump? Josh: You can, you could go back. Go back to the last platform. Oh god.
Ejaaz: I died. I'm going, I'm going to the last platform. Here Josh: we go. It looks like they're sequentially gaining height, which is interesting. Josh: Oh, but okay, so if I'm comparing these two, I'm actually not feeling very Josh: let down. This is good, aside from the music not existing, which we may not have Josh: explicitly asked for. It looks like the logic plays better. The actual gameplay Josh: is usable. This is a full... I don't know if it's glitching or if this is you glitching. No, no.
Ejaaz: That is, it's glitching. It's glitching a bit. Josh: Okay, so there are still some edge case errors, yeah, but Josh: this is different in the sense that you have your hearts clearly projected, you Josh: have a score system that's clearly in place, you're able to get these power-ups, Josh: they work, they function. I mean, this is a very clean and functional game, so I Josh: would give this to Codex. I think the experience, perhaps the design of Claude, Josh: was better. And perhaps the music,
Josh: I mean, music was definitely better versus none, but Claude, Josh: in terms of just, or Codex, in terms of just coding logic and making a better Josh: game, I give this to Codex. Do you have a take? Ejaaz: Yeah. So on the build side of things, I had a much more pleasant experience
Ejaaz: using Codex as well, so I think Codex wins Ejaaz: on this. I one-shotted it in the true sense, Ejaaz: where I just gave it a single prompt and Codex didn't ask for Ejaaz: any permissions. It just kind of went on and did the thing. I saw Ejaaz: its thinking, and at points where it was unsure, it thought amongst itself Ejaaz: and then made the decision to progress forwards, whereas with Claude Code it would Ejaaz: come to me. Now, that might just be a developer or engineer's preference, right? Like,
Ejaaz: if you're building a production ready app for like, I don't know, Ejaaz: a big company that you work for, you probably want to have more hands-on involvement.
¶ Dashboard Creation Challenge
Ejaaz: Whereas if you're just building a game like we did today, where I don't really Ejaaz: care what it ends up looking like or what it does, then the hands-off preference Ejaaz: is probably something that you would use Codex for. But I think Codex wins this. Josh: So for our second demo, we have this handwritten piece of paper that I actually Josh: wrote and took a picture of. Josh: I didn't. It's GPT Image Gen 2.0, but it looks like it's handwritten. Josh: The handwriting was too nice.
Ejaaz: Josh. That was the giveaway. Josh: Yeah, my handwriting is far sloppier than this. But the idea is that you can Josh: even write things on the back of a napkin and you could turn that into an application. Josh: So what we did here is we just asked for it to create a generic limitless dashboard Josh: application on the back of a piece of paper, fed it into the model, and this is what we got. Josh: So it looks like it did a pretty good job.
Josh: I could tell this is Claude before you even tell me which model it is because Josh: it has the standard design principles.
Josh: Claude design is so basic and Josh: it's so predictable, where, like, okay, I've seen this Josh: dashboard before. It looks like it was a mission success. There's a Josh: lot of text on this page, a lot of stuff going on, a lot Josh: of graphics. I give a lot of credit for kind of inferring what Josh: we would want to be seeing from something like this, where we have a Josh: proper trip budget. I don't think we asked for a trip budget, but okay. I think
Josh: it did a lot of inferring, right? Like, it kind of made a Josh: lot of assumptions, but at the end of the day it did take what we had on the Josh: napkin and it turned it into a pretty generic dashboard of sorts based on very Josh: limited information that we gave it.
Ejaaz: I think the issue with this is we asked for something Ejaaz: completely different. It created a dashboard, but Ejaaz: we asked for it to be based around the Limitless podcast, and Ejaaz: it created a travel planning board. So I don't know Ejaaz: whether that was a prompt issue or whether we just fed Ejaaz: it the wrong image, but here we go, here is where we're Ejaaz: at. Now let's take a look at what OpenAI did. Okay, so here we have the same
Ejaaz: prompt fed into GPT 5.5, and it's funny, I can instantly tell this is GPT 5.5 Ejaaz: because it's cleaner and it's not neon and it's not trying to go for some futuristic spin. Ejaaz: It looks very simplistic. This is actually a website or app that I would probably Ejaaz: be more inclined to engage with. Ejaaz: It's also more visually perceptive to me, right? Ejaaz: Like, what do I have at the front here? It's this five-day trip that I want to go on.
Ejaaz: It's giving me the basic information that I need to know at the start. Ejaaz: It has a bunch of different tabs as well. Ejaaz: But again, it isn't what I specified on the napkin. So I think this might be Ejaaz: a skill issue on our side, Josh. But otherwise, like, look at these graphics. Ejaaz: They're like really good. One thing I've noticed is stylistically, Ejaaz: although both models create very different looking things, the animation style looks the same.
Ejaaz: Have you noticed that even with the game previously that we just demoed, Ejaaz: the avatar looked the same.
Ejaaz: It was given the same sort of title and the objects interacted in the same way. Ejaaz: We're seeing this here, so maybe it's just a change in quality. I actually prefer GPT 5.5 on this one. Josh: Yeah, this is crazy. I'm just going to suspect Josh: there was a prompt issue there, where, yes, we clearly asked for something Josh: that we didn't actually want, but here it is. I think if you're just comparing Josh: them apples to apples, ChatGPT and Codex is like a no-brainer, 10 times better.
Josh: I far prefer this. If you look at the original napkin photo, this is much more Josh: accurate to what the design looked like on that original piece of paper. Josh: And then if you also just compare the general design, this is far easier to understand. Josh: It's just a lot less dense. It's designed better. I wouldn't even say this is really
Josh: a fair comparison. It seems like Codex just completely crushed this, and Josh: it has all the functionality built in. It looks good. I am giving another win Josh: to Codex here. That's two for two. Ejaaz: Wow, look, I've got like a re-optimization toggle at the top and it actually Ejaaz: updated. I wonder where it's pulling that data from. It's already hooked Josh: into data, look at that. Yeah, impressive stuff.
¶ The AI Model Harness Explained
Ejaaz: Very, very cool. Now, one major reason why both of these models have advanced so Ejaaz: rapidly over the last couple of months is something known as the AI model harness. Ejaaz: Now, you have the AI model, which is something that you and I have interacted with quite a lot. Ejaaz: It's via ChatGPT or Claude itself.
Ejaaz: But there's an added layer that you can put on top of this model, Ejaaz: which comes in the form of prescripted prompts that are engineered to make the Ejaaz: model act in a particular way. Ejaaz: But it's also the environment that the model works in. Ejaaz: It's also the policies that you set to make sure that the model acts and behaves Ejaaz: and sounds in a particular way. Ejaaz: That's why we talked about Claude's
Ejaaz: personality earlier being better than ChatGPT. It all plays into the harness. Ejaaz: What we figured out was it's an entirely new product category on its own. Ejaaz: In fact, Cursor had some news over the last couple of days where they made their Ejaaz: harness, Cursor SDK, available via API. Ejaaz: And the reason why this is such a big deal is critics criticized Cursor for Ejaaz: being an AI wrapper, which meant that Cursor doesn't have a model of its own.
Ejaaz: It would just create this harness, a set of prompts and environments around, say, Claude or ChatGPT. Ejaaz: And so people would say, Cursor isn't actually special. Turns out the wrapper Ejaaz: or the harness actually made these models way more intelligent. Ejaaz: In fact, if you added Cursor's harness on top of GPT 5.5 and Claude Opus 4.7 Ejaaz: right now, you end up with a smarter, more intelligent, more efficient model Ejaaz: than the actual base models themselves.
Ejaaz: Now, remember, AI labs spent hundreds of millions of dollars to train these Ejaaz: models and to create the best thing and put their best foot forward. Ejaaz: And still you have a startup which is worth, what is it now, Ejaaz: $10 billion right now, potentially being acquired by xAI for $60 billion, Ejaaz: creating a better model on top. Ejaaz: So the harness and the AI model are arguably one and the same at this point.
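To make the harness idea concrete, here is a minimal Python sketch of the three pieces Ejaaz lists: a pre-scripted prompt, an environment, and a policy, all wrapped around a model. Everything in it is hypothetical and for illustration only: `Harness`, `echo_model`, and the word-redaction policy are invented names and behavior, and the "model" is a stub function rather than a real LLM or any vendor's SDK.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


def echo_model(prompt: str) -> str:
    """Stub model: upper-cases the last line of the prompt it receives."""
    return prompt.splitlines()[-1].upper()


@dataclass
class Harness:
    model: Callable[[str], str]  # the underlying model
    system_prompt: str           # pre-scripted prompt that steers behavior
    banned_words: List[str] = field(default_factory=list)       # policy
    environment: Dict[str, str] = field(default_factory=dict)   # context the model works in

    def run(self, user_input: str) -> str:
        # Assemble the full prompt: instructions, then environment, then the user turn.
        env = "; ".join(f"{k}={v}" for k, v in sorted(self.environment.items()))
        prompt = f"{self.system_prompt}\n[env] {env}\n[user] {user_input}"
        reply = self.model(prompt)
        # Apply the policy to the model's output before returning it.
        for word in self.banned_words:
            reply = reply.replace(word, "[redacted]")
        return reply


harness = Harness(echo_model, "Be concise.", ["SECRET"], {"cwd": "/tmp"})
print(harness.run("tell me the secret"))
```

The same `echo_model` gives different results under different harnesses, which is the point being made in the episode: the wrapper changes what the user experiences without changing the model underneath.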
Ejaaz: And it's just a valuable moat to point out that these models aren't just better Ejaaz: at coding because of the base model itself. It's because of this thing known as a harness. Josh: Yeah. And the harness is the difference maker when it comes to building this super app.
Josh: It's like every single company is trying to build the super Josh: app, the all-in-one application that kind of serves Josh: as your operating system. Anytime you need to engage with AI, Josh: this is the place that you could do it, and it's all-encompassing, it's Josh: all in one. Now, one of the best applications we've seen for this in the early Josh: days has been something like OpenClaw, where it's this extension of what an
Josh: operating system could look like, starting with AI at the foundation, and Josh: OpenClaw did a really amazing job of that. Now, in some news this week, you can now Josh: use your ChatGPT account to generate tokens with OpenClaw. Josh: So previously you had to use the API, whether you were using Anthropic or OpenAI Josh: or any of the other models, and it was pretty expensive. It costs a lot of money. Josh: Now, thanks to Sam Altman this week announcing, you can actually use your account
Josh: connected with it. And I think this is the beginning of a multi-step plan Josh: to really integrate OpenClaw directly into Codex in a way that Anthropic can't. Josh: Because if you'll remember, OpenAI owns OpenClaw. Josh: They bought Peter, and granted, OpenClaw will stay open source forever, Josh: but they have the ability to actually integrate directly into their products. Josh: And I suspect that's what we're going to see.
Josh: In fact, we even got some confirmation from another post from one of the Codex Josh: developers who replied to a post that was saying, Codex only needs a native Josh: editor, an iOS app, a full browser, and OpenClaw.
Josh: And the developer, Tibo, said all of this and more Josh: is coming, which Sam Altman retweeted. So we are Josh: indeed getting OpenClaw inside of Codex, we're getting a Josh: mobile iOS app so that you can access it remotely, and soon Josh: there's going to be no reason to really use a different app, Josh: because it's going to be all-encompassing. Now, are there still downsides? Yes. Josh: Computer use is 20 faster on Codex, but yesterday I was playing around with it. I
Josh: told it to increase the volume of my music and it took 10 minutes to do it because Josh: it tried to increase the slider on Spotify, even though it was maxed, without actually Josh: increasing my system audio. So it's still a little dumb, but it is getting better. Josh: And I think this leads me to this post that I really love, the vanilla maxing Josh: post we have to talk about.
¶ The Future of AI Tools
Josh: Which starts by saying, you should 100% be vanilla maxing. Just use the tools Josh: as they're handed to you. That's it.
Josh: Because a lot of people, and I've found this personally, and in fact, Josh: I've been caught by this personally, get caught up in using Josh: all these different repos and these skills and these plugins, Josh: when the reality is, if you just wait, the AI labs are shipping fast enough, Josh: they'll just integrate it into your own native application. Josh: So I'm vanilla maxing, Ejaaz.
Ejaaz: I'm totally vanilla maxing as well, dude. Like, listen, OpenClaw, Ejaaz: when it was hyped up, was incredibly impressive and still is incredibly impressive. Ejaaz: It opened up an entirely new product market and segment. That's why OpenAI acquired them. Ejaaz: But something's majorly changed over the last couple of months, Ejaaz: which is OpenClaw has kind of fallen off. No one talks about it anymore.
Ejaaz: People who were complaining about the errors and bugs that they were facing have Ejaaz: kind of gone silent because they've just grown bored and they don't want to Ejaaz: put their energy and effort into it. Ejaaz: And the reason why is that although these tools are very frontier level, Ejaaz: they can't actually be scaled to a practical use. Ejaaz: You don't feel safe integrating OpenClaw into your desktop where you have personal
Ejaaz: files. I've seen horror stories where they accessed credit card data and exposed Ejaaz: that, or where they deleted old wedding photos and the wife was super angry, Ejaaz: a bunch of stuff like that. Ejaaz: If you are able to get access to a tool that comes under a branded Ejaaz: reputation, such as ChatGPT, Codex, or Claude Cowork, where it kind of Ejaaz: takes over your computer, but in a sandboxed environment,
Ejaaz: and I know that NVIDIA also released NemoClaw, which is like the enterprise-grade Ejaaz: secure version of OpenClaw, Ejaaz: you're vanilla maxing. That is the way to do it. And there's no need to rush Ejaaz: ahead and lose all your data as a consequence. So that's basically it for the episode. Ejaaz: We wanted to give you a comprehensive guide and insight into Codex GPT 5.5 versus Claude Opus 4.7.
Ejaaz: There's a lot of numbers in there, but basically the best coding models from Ejaaz: both sides to see which is better.
¶ Claude Mythos
Ejaaz: And the truth is, there isn't a clear winner right now. I would say it's probably Ejaaz: Codex GPT 5.5, but the narrative switched so recently that maybe, Ejaaz: maybe Claude can still catch up. And the only reason why I say that, Ejaaz: Josh, is that Ejaaz: there's a model that we haven't discussed or demonstrated yet, because we can't. Ejaaz: It's called Claude Mythos. Ejaaz: It was kind of pseudo-released a few weeks ago.
Ejaaz: And on all benchmarks, it is technically better than 5.5. Ejaaz: But the reason why we can't demo it is we can't get access to it. Ejaaz: And the reason cited by Anthropic was that it's too dangerous. Ejaaz: It's a cybersecurity risk. In fact, it wasn't just Anthropic saying it. Ejaaz: It was Pete Hegseth of the US Department of War also saying this, Ejaaz: right? So there's concerns around that. Ejaaz: OpenAI has created a Mythos-level model here, but has made it available
Ejaaz: to everyone. And so the argument could be made that it's just because Anthropic Ejaaz: doesn't have enough compute. Ejaaz: So there's a lot of rumors around this, but I'm excited to get my hands on the Ejaaz: best models from each of these and compare them directly. Josh: Yeah. And the compute's actually been degrading. So I think I want to wrap this Josh: up on like, what do you actually currently use? Josh: What is the limitless production stack? How are we using these AI models?
¶ Verdicts
Josh: And for me, at least, it's not even close. I'm Codex-pilled. Josh: I'm fully switched over. I am Codex superior domination. It's going to be the month of Codex. Josh: Maybe Anthropic will have a comeback, but that's not happening until at least Josh: June, July, because this month is Codex month. Josh: So I've been using Codex for basically everything, all of the difficult tasks that I need.
Josh: What I have found is that GPT 5.5 as an LLM, as a language model, Josh: as a chatbot is a little bit inferior to Opus 4.7, which I believe to be the Josh: better model if you're just chatting with an AI.
Josh: I like its personality. It's warmer, it's more precise, it normally Josh: gets the idea of what I want. So if I am building a complex Josh: project, Opus 4.7 is the orchestrator and Josh: Codex is the actual implementer, the executor of this code, of this plan. I've Josh: also noticed that Opus 4.7 is a bit inferior to 4.6 at a few things, and I think Josh: this is another piece of alpha here. I actually use Opus 4.6 whenever I'm doing
Josh: anything relating to writing or word ingestion. So one of the projects I've been doing recently
Josh: is from Andrej Karpathy. He created this, like, wiki for Josh: yourself where it ingests files and it kind of writes Josh: these summaries for you and it creates a personal knowledge wiki. I use Opus Josh: 4.6 exclusively for that, because Opus 4.7, I think, is far inferior at summarizing Josh: and kind of rewriting these topics that I use in my Obsidian. So that's kind Josh: of my stack. I use Opus for LLMs, Codex for everything else. Ejaaz, what are you Josh: currently optimizing for? What are you planning?
Ejaaz: So it's two things. Ejaaz: My stack is actually way more diverse when Ejaaz: it comes to just, like, the research side of things, only because I'm Ejaaz: using the AI that's readily available wherever I am, Ejaaz: right? So if I'm on X a lot and I see breaking news, I'm Ejaaz: just tapping Grok, because honestly the recent model, I think it's, like, what Ejaaz: was it, 4.3 at this point, is actually pretty good, and they have multiple agents
Ejaaz: that are kind of, like, running at this, right? But for the core bulk of the work, Ejaaz: I've started shifting towards GPT 5.5 for the research, because 5.5 researches Ejaaz: things for so much longer. And it has a much more in-depth discussion. Ejaaz: In fact, I tested it out today because I was curious about the AI power stack Ejaaz: and what stocks I should be investing in to get exposure to the power grid lines Ejaaz: that are currently constraining AI data centers, right?
Ejaaz: And I was like, all right, I gave a detailed prompt to both Claude Opus 4.7 Ejaaz: and 5.5, and 5.5 completely cooked 4.7. Ejaaz: And it gave good reasoning why, whereas 4.7 did not. Ejaaz: I had to ask it more questions. So all in all, I think 5.5 is my preference right Ejaaz: now. I still use 4.7 because of the personality. Ejaaz: It's like less of an AI type of voice versus GPT 5.5.
Ejaaz: But again, I feel like OpenAI is on a generational run right now, Ejaaz: and they might just kind of fix this in the next couple of hours at this point. Josh: Yeah, it's coming. It's coming quick. And I think now is a good time to kind Josh: of get familiar with Codex to understand the way it works. Josh: And as they implement these features, you'll be able to adopt them within the hour, within the day.
Josh: It's pretty amazing. And it's been fun to just experiment. It's been fun to try something new. Josh: And it's, again, competition is just better for everyone. So the end winner Josh: of this is the user, because for as low as $20 a month, you get access to all Josh: this frontier intelligence, all these capabilities. Josh: And it's just, it's really been unbelievable to watch. So that is the comparison, Codex versus Opus.
Josh: If you have not tried both of them, I encourage you to give them a try. Josh: Test the prompts against one another. If you have any type of work that you Josh: need, if you're working on a computer at all, chances are you can use AI to Josh: help you do your job even better. Josh: Or you could just use it to help you do hobbies and side projects that you've Josh: always wanted to do. So give it a try. Josh: Let us know your preference, Codex or Claude Code. Which one is it going to be?
Josh: I think that's probably it for the episode. Thank you guys so much for watching. Josh: If you enjoyed it, please don't forget to share with your friends. Josh: Let them know which model you picked. And also don't forget to rate it five Josh: stars on your favorite podcast listening platform. Any final thoughts, Ejaaz, before we go? Ejaaz: No, that's it. Thank you guys so much for listening and we'll see you on the next one.
