¶ AI Autonomy Ladder
Ejaaz: 99% of people are using AI models the same way that they use Google. Ejaaz: But recently, a new way of prompting your AI has emerged that doesn't just replace Ejaaz: the way that you work, it promotes you to the CEO of your very own AI company. Ejaaz: It's called Loops and it's part of a growing development in agent autonomy where Ejaaz: AI agents basically spin up and autonomously complete tasks or goals that you Ejaaz: set for it, often working throughout the night.
Ejaaz: In 2019, the longest that an AI agent could work autonomously was two seconds. Ejaaz: Fast forward to today, and they can work autonomously for 12 hours, Ejaaz: and that's doubling every couple of months. Ejaaz: Andrej Karpathy calls this phenomenon the autonomy slider, where you can take Ejaaz: a dial that slides from humans that approve everything to humans that periodically check in. Ejaaz: And it's part of this growing trend of agents consuming and taking up more of
Ejaaz: human capital and labor. And the question that remains going forwards is, what will humans do? Ejaaz: And will they be entirely replaced Ejaaz: by AI? Or will they be the ultimate orchestrator of their destiny?
Josh: Yeah, I think the goal for this episode is really just to inform people on what's Josh: possible current day with these agents, with these LLMs, with writing these Josh: loops, as well as where you can possibly find yourself within that stack, Josh: because it gets pretty complicated. Josh: When we're getting into loops, not everyone needs to use loops, Josh: but everyone should be using LLMs probably slightly different than how you're using them today.
Josh: So maybe we could start with a little history lesson in terms of the four levels Josh: in which we have been engaging with LLMs, starting with the first level, which is just prompting. Josh: Generally, most people are probably still doing this. It started in 2022, 2023, Josh: around the release of ChatGPT. Josh: The way that you would engage with these LLMs is you would just submit a question Josh: or you submit a prompt and you get some language back. Now, if you are still doing
Josh: this that's okay because i find a.
¶ From Prompts to Agents
Josh: Three years ago, four years ago. It has since advanced pretty, Josh: pretty meaningfully since then. Josh: The second step of this is agents. And we're going to spend some time on agents. Josh: Everyone's kind of heard of an agent. Maybe not everyone knows what an agent is. Josh: An agent is something that could think for a little bit longer. Josh: It could run a bit longer than just a standard prompt. Josh: It can go off and do things. It could call tools for you.
Josh: It's a much more capable version of the text box. Then, like we talked about Josh: all the time on the show recently in the last few weeks, there's the harness Josh: feature in which you put an LLM into a container and that gives it a memory feature. Josh: That gives it complete tool use. That's something like an OpenClaw that we've Josh: talked about a lot that some people do use and that's level three.
Josh: And now level four, which is the new thing that has come this week, Josh: that's really been highlighted by some of the top leaders at these AI labs is loops. Josh: And a loop is essentially a version of an agent that has an orchestration layer Josh: and kind of builds upon itself.
Josh: So it allows you to kind of continue to scope yourself out. If you can imagine Josh: you're kind of you're dealing directly with an employee at level one and then Josh: you're kind of directing that person to go off and do their own in level two, Josh: At level three with the harness, you're kind of directing a series of people to help you. Josh: And then level four, you're just the top level CEO who's directing your C-suite
Josh: to go and manage all the employees below you. So there's an entire stack to this. Josh: It's very cool. Ejaaz, how do you use your AI currently? Where would you say Josh: that you fit in this stack? Ejaaz: Yeah, so looking at this diagram that we have on the screen here, Ejaaz: I'm somewhere between number two and number three. I'm somewhere between using Ejaaz: agents and trying to figure out the whole harness thing. Ejaaz: Now, what am I doing when it comes to like spinning up agents?
Ejaaz: If you look at either my Claude or my ChatGPT desktop apps right now, Ejaaz: I've renamed a bunch of my conversations to a particular focus or subject and then agent after it. Ejaaz: And so I can go to it and this agent basically has all the context of what I Ejaaz: wanted to do, whether it's like research a particular topic, Ejaaz: create some kind of an outline for something, research a particular investment angle.
Ejaaz: It already knows and has the embedded context for what it needs to do. Ejaaz: And there's usually like one to maybe three tasks that it needs to autonomously execute on its own. Ejaaz: And so it runs in kind of like a sequence. But if any of that sequence kind Ejaaz: of breaks, let's say it kind of tries to retrieve data from some particular Ejaaz: website and it is unable to do so, it breaks.
Ejaaz: And it comes to me and it says, hey, Ejaaz, is there some other thing that you Ejaaz: want to look at or retrieve from, blah, blah, blah? Ejaaz: It's not fully autonomous. Now, number three, the harness side of things is Ejaaz: what I'm trying to kind of like mold my understanding around. Ejaaz: What I've noticed is when you type in a prompt and you get a response,
Ejaaz: you can kind of tell that it's AI-y. Like usually when we kind of create artifacts, Ejaaz: it comes in a particular font or it speaks in a particular type of language. Ejaaz: The harness helps kind of like take your prompt and kind of mold it into something Ejaaz: that is more human-like, but also more nuanced with what you are trying to do. Ejaaz: Like it effectively gets
Ejaaz: closer towards that ultimate goal. Like we were talking before recording this Ejaaz: episode about human taste and how AI doesn't really get human taste. Ejaaz: The harness helps you get towards that ultimate kind of taste profile for the Ejaaz: particular output that you're trying to generate.
¶ Understanding AI Loops
Ejaaz: I haven't tried working with loops just yet, but my understanding of this, Ejaaz: and correct me if I'm wrong, is you have an AI. You can prompt it and you can get some kind of output. Ejaaz: A loop specifically is an AI agent that doesn't break. If it comes across an Ejaaz: obstacle that it doesn't understand, its instinct isn't to come to the human Ejaaz: and say, hey, like, I can't figure this out, guide me.
Ejaaz: It completely reiterates the prompt over and over again until it gets past that Ejaaz: obstacle, working towards like one objective. So a few examples I've seen for Ejaaz: this is if you are coding, right?
Ejaaz: And let's say there's multiple workflows of a code base that you want to work Ejaaz: on, and it comes across a hiccup where it can't retrieve data from one of those Ejaaz: particular flows, it is able to kind of like circumnavigate around it, Ejaaz: maybe spin up its own separate flow and try to figure out the problem. Ejaaz: And often this results in an agent working for multiple hours at a time,
Ejaaz: often overnight. I think Karpathy spoke about his auto research agent working Ejaaz: overnight whilst he slept. Ejaaz: And we're seeing different variations of this start to arise. Ejaaz: Where are you, Josh, in the stack? Josh: Yeah, loops are like a closed-loop system where you kind of define an Josh: outcome and it will continue to work towards that outcome without any external inputs. Josh: It's very cool. It's very automated. I don't think it's for everyone.
Josh: It's certainly not for me because I haven't really had a use case for loops per se. Josh: I would say I'm sitting at each one of those first three phases given whatever Josh: tasks I'm trying to do. And I think it's important to understand that a lot Josh: of people might not even need to go past number one unless you're actually doing productive work. Josh: A lot of the agents, a lot of the harnesses are for kind of automating
Josh: more systems from your life. If you're just trying to use this as Google, if you're Josh: just trying to use this as a writing assistant or someone to chat with, the prompting Josh: is really strong. And I find a lot of times Josh: this is my outlet for, like, Google search results. So instead Josh: of searching Google, I'll get a little more in-depth results. I'll ask my LLM. Josh: For agents, I use them quite a bit when I'm doing a little bit more productive
Josh: work. For example, we track the analytics on Limitless and we want a place in Josh: which we can have all those analytics dumped to a dashboard. Josh: That is an agent that I run. Josh: So it goes into my browser. It detects all of the views that we've had from Josh: the week for YouTube, from Spotify, from RSS feed, where you should all be subscribed Josh: to and rate us five stars.
Josh: And it compiles it into a singular spreadsheet in which we could then publish Josh: online and we could share with prospective sponsors and things like that. Josh: And then for harnesses I've used, because I mean, that's mostly OpenClaw. Josh: I've used OpenClaw. I really enjoyed the process. I find myself using it a bit less and less.
Josh: And I think in the loops feature, at least it's probably most productive right Josh: now for people who are writing code, who are writing verifiable solutions. Josh: One of the difficult things that as I was looking into loops and figuring out Josh: how I can structure them into my life, one of the problems that I run into is Josh: I'm not really sure I have a verifiable, Josh: set of outputs that I wanted to optimize for, for a lot of the work that I'm
Josh: doing, because a lot of it is subjective. A lot of it is kind of creative work. Josh: It requires a human in the loop for a lot more of it. Josh: So I would say I am number one, two, and three on the list. Haven't quite made my way to four. Josh: But yeah, for the people who are, those are the people like Boris Cherny from Josh: Anthropic. And we know Andrej and Peter Steinberger from OpenAI.
Josh: They are all on four. They are using it to Josh: create these, like, unbelievable agentic systems and continue to remove themselves from the loop. Ejaaz: You know what I've realized? With loops in particular and just AI agents in Ejaaz: general, they're trying to improve our understanding or rather their understanding Ejaaz: of the English language. Ejaaz: So one of my favorite Karpathy quotes back in the day was English is the new
Ejaaz: programming language. I think he said this like two, two and a half years ago. Ejaaz: And I've just realized that like us creating AI agents is basically like, Ejaaz: it's the same model. It hasn't necessarily got smarter. Ejaaz: It's just like using that model to kind of like keep ramming its head and its Ejaaz: brain against a particular problem until it understands what the human actually means.
Ejaaz: And so like in this new world, like I know you just used the example of like, Ejaaz: you know, loops can be used for coding specifically, Ejaaz: the coding that Boris Cherny and Karpathy are doing is English. Ejaaz: Like they're speaking to the LLM, they are writing in English to the LLM. Ejaaz: And yeah, maybe they're copy and pasting some versions of code, Ejaaz: but that code is primarily generated by an AI.
Ejaaz: I think like something crazy, like 80% plus of code generated at Anthropic, Ejaaz: both for research and for just general consumer adoption is generated by Claude itself. Ejaaz: And so that's one thing. The other thing is the model just not getting smarter Ejaaz: is a really interesting thing. Like typically in my head, I would think, Ejaaz: okay, you need a better model to be able to unlock some of these new features Ejaaz: like AI agents, autonomous loops, et cetera.
Ejaaz: But really you could just take the same model, wrap a harness around it and Ejaaz: try to get it to understand what particular goal it's getting at and just run Ejaaz: that iteration over and over and over again until you get a better output.
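As a rough illustration of the idea described here (the same model, wrapped in a harness, re-run until the output is good enough), a minimal Python sketch might look like the following. Everything in it is hypothetical: `run_loop`, `fake_model`, and `fake_check` are made-up names, and the model call is a stub rather than any real API. The one thing the sketch does assume is that there is some check that can say whether an attempt passed.

```python
from typing import Callable, Optional


def run_loop(
    attempt: Callable[[str, list], str],
    verify: Callable[[str], Optional[str]],
    goal: str,
    max_iterations: int = 10,
) -> tuple:
    """Retry `attempt` until `verify` passes or the iteration budget runs out."""
    feedback = []
    for _ in range(max_iterations):
        output = attempt(goal, feedback)
        problem = verify(output)      # None means the output passed the check
        if problem is None:
            return output, feedback
        feedback.append(problem)      # carry the failure into the next attempt
    return None, feedback             # budget exhausted without a passing output


# Stub standing in for a model call: it only "gets it right" on the third draft.
def fake_model(goal: str, feedback: list) -> str:
    return f"{goal} (draft {len(feedback) + 1})"


# Stub standing in for a verifier such as a test suite or a scoring rubric.
def fake_check(output: str) -> Optional[str]:
    return None if "draft 3" in output else f"rejected: {output}"


result, history = run_loop(fake_model, fake_check, "summarize weekly analytics")
print(result)
print(len(history), "failed attempts before success")
```

The iteration cap is a design choice worth keeping in any real version: without it, a loop whose check can never pass would keep spending tokens indefinitely.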
Ejaaz: And I guess this is the same concept as inference or reinforcement learning Ejaaz: where like we've found this trend of post-training of these AI models, Ejaaz: these AI models just getting smarter, not because they've got bigger GPUs or more expensive GPUs. Ejaaz: It's because you've just taken the same model and you've just run it through Ejaaz: a different reasoning framework over and over again until it can do a thing.
Ejaaz: And this is the practical embodiment of it. I personally haven't found like Ejaaz: an obvious use case for loops either. Ejaaz: So either you and I are boxing ourselves into a particular realm and maybe someone Ejaaz: listening to this is using this for like their software engineering thing or Ejaaz: their marketing thing. But yeah, I guess that's where I sit right now.
¶ Why Autonomy Is Rising
Josh: Well, I think it's probably a skill issue on both our parts. Josh: Like there is certainly a use case for us in which we can use a loop in which Josh: we can define this outcome, send an agent off to go do it, and it will iterate Josh: on itself until it comes to a conclusion. Josh: I think it's just so novel and so new. It's difficult to kind of understand Josh: why. And we have this really great chart on screen that you're showing now, Josh: which is the why now section of this.
Josh: And it's because the duration of a task that these agents can run is so much Josh: longer than it used to be. Josh: I mean, in 2019 we have here, it was two seconds. This was well before ChatGPT.
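For a sense of scale, the two figures quoted in this episode (two seconds in 2019, 12 hours today) can be turned into a number of doublings with a few lines of Python. This is back-of-the-envelope arithmetic only, and the seven-year span is an assumption, since the episode does not give an exact date for "today".

```python
import math

start_seconds = 2                 # 2019 figure quoted in the episode
today_seconds = 12 * 60 * 60      # 12 hours, quoted in the episode
span_months = 7 * 12              # ASSUMPTION: roughly seven years between the two points

doublings = math.log2(today_seconds / start_seconds)
months_per_doubling = span_months / doublings

print(f"{doublings:.1f} doublings, one every {months_per_doubling:.1f} months on average")
```

Under that assumption the jump works out to roughly 14 doublings, or one every five to six months averaged over the whole period, so a "couple of months" pace would describe the recent end of the curve rather than the full span.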
Josh: But even early last year, in 2025, the duration that an agent could run on one single task was Josh: less than an hour in length. So there's only so many tokens it could generate, Josh: there's only so much reasoning it can do, and there's only so much iteration Josh: you could get over that hour time period, let alone the amount that Josh: these tokens are going to be Josh: costing you if you're using, like, the API or anything like that. Now, fast forward
Josh: to today. I mean, the best models in the world, they're getting days' worth of runtime, Josh: so they can really think deeply and continue to iterate on themselves over and over. I see examples of, um,
Josh: backslash goal on X all the time, of people who have a problem, whether it be Josh: an optimization problem or they have a bug that they need to fix, and they'll Josh: put this backslash goal on it for Josh: however long it needs to, and it'll think for three, four, even five days, I've Josh: seen, in order to optimize for the specific parameter. And this is possible because Josh: these models now can think for days long.
Josh: You have to assume months is coming. What does it look like when an agent can think for months? Josh: I mean, it's a really interesting paradigm shift that I'm not sure where people Josh: are going to find value in the open-ended way that it exists today, Josh: right? It's like, okay, here's this agent. Josh: You can tell it to do whatever you want. You can create a loop. Josh: You can create an infrastructure system for it to operate in.
Josh: It's pretty much open-ended and it's on you. And I think the answer to that Josh: is that not even the AI companies really understand the best use cases for it quite yet. Josh: I would imagine it's still this really difficult thing of how do you unlock Josh: value from essentially an open-ended agent that can go and run for an infinite Josh: amount of time? I don't know. Ejaaz: I also question like what a human's purpose would be at that point.
Ejaaz: Like if you automate enough of the thinking and the curiosity behind like solving Ejaaz: particular problems, What do humans end up doing at that point, Ejaaz: especially if they don't do the work themselves? Ejaaz: They don't understand it, right? You need an AI to kind of like understand what Ejaaz: on earth is going on in the first place.
Ejaaz: And eventually like an AI will then start setting goals, like more ambitious Ejaaz: goals than a human can in terms of like what to like kind of solve or go after. Ejaaz: There were some very low-level examples that I saw in response to Peter Steinberger's tweet about loops. Ejaaz: And there's some kind of concrete examples that I want to run through very quickly here.
Ejaaz: So one of them is using it for code, right? So a classic loop could basically Ejaaz: look like, okay, can you please pull live errors for my particular app? Ejaaz: Can you inspect and figure out where the bug might particularly be? Ejaaz: Can you create then a fix for this particular bug in my code? Ejaaz: And then can you deploy it? Then can you check the health of that deployment Ejaaz: and make sure that nothing else is broken?
Ejaaz: And then record what failed and feed that into a database so that in the future, Ejaaz: we can detect errors like this or prevent it when we code and build some of Ejaaz: these future app features. Ejaaz: Now, that is kind of like a very small and specific enough use case that can Ejaaz: be generalized across basically any app or software engineering project that Ejaaz: you might be working on if you're listening to this.
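The bug-fix loop walked through above (pull live errors, inspect, write a fix, deploy, check health, record what failed) can be sketched as a toy Python simulation. None of this is a real tool or API: `FakeApp`, `propose_fix`, and `bug_fix_loop` are hypothetical names, the "model" is a one-line stub, and the health check is a made-up rule that exists only so one deployment fails and gets rolled back.

```python
from dataclasses import dataclass, field


@dataclass
class FakeApp:
    """Toy stand-in for a real service. Every method here is hypothetical."""
    errors: list = field(default_factory=lambda: ["TypeError in checkout", "timeout in search"])
    deployed: list = field(default_factory=list)

    def pull_live_errors(self) -> list:
        return list(self.errors)

    def deploy(self, fix: str) -> None:
        self.deployed.append(fix)

    def is_healthy(self) -> bool:
        # Toy rule: any patch that touches search breaks the deployment.
        return not self.deployed or "search" not in self.deployed[-1]

    def rollback(self) -> None:
        self.deployed.pop()


def propose_fix(error: str) -> str:
    # A real loop would call a model here to inspect the code and write a patch.
    return f"patch for {error}"


def bug_fix_loop(app: FakeApp, max_rounds: int = 5) -> list:
    incident_log = []      # stands in for the database of past failures
    needs_human = set()    # errors the loop gave up on
    for _ in range(max_rounds):
        pending = [e for e in app.pull_live_errors() if e not in needs_human]
        if not pending:
            break
        error = pending[0]
        fix = propose_fix(error)
        app.deploy(fix)
        healthy = app.is_healthy()
        if healthy:
            app.errors.remove(error)     # fix held, error is resolved
        else:
            app.rollback()               # undo the bad deploy
            needs_human.add(error)       # stop retrying, leave it for a person
        incident_log.append({"error": error, "fix": fix, "healthy": healthy})
    return incident_log


for entry in bug_fix_loop(FakeApp()):
    print(entry)
```

The rollback and the `needs_human` set reflect the concern raised later in the episode about letting an agent run loose overnight: the sketch stops retrying a fix that made things worse instead of hammering on it.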
Ejaaz: And I wonder how many hours' worth of engineering time this replaces. Ejaaz: Because I know, having worked at companies and Ejaaz: been a product manager in the past, that there are entire teams of software engineers that Ejaaz: spend their entire days working on something like that. So that's one thing. Ejaaz: And then for content, which is very applicable for product managers, Ejaaz: or even like the work that you and I do, Josh, an agent could read a PRD,
Ejaaz: which is a product requirements doc, which is usually kind of like created Ejaaz: for a strategic goal that you want to kind of like build at your company, Ejaaz: like a product or a feature. It then writes whatever that next asset could be. Ejaaz: So it could be like a design profile or a mockup of what that feature might Ejaaz: look like, score it against like some kind of criteria that the company has Ejaaz: across like, you know, it must follow our vision, A, B, and C.
Ejaaz: It must also look a particular way. This is our design profile, Ejaaz: our brand kind of profile and our aesthetic. Ejaaz: And then it kind of like updates its progress depending on like what other teams Ejaaz: have shipped. So maybe it's dependent on a particular feature. Ejaaz: And so it updates itself autonomously like that. Now, this all sounds very vague Ejaaz: intentionally because it's meant to apply to your particular business or your particular project.
Ejaaz: But make no mistake, this is what Ejaaz: a lot of humans are paid upwards of six figures to do on a daily basis. Ejaaz: It's that nuance. And we're starting to see basically AI models and AI agents Ejaaz: enter into that human taste profile. So when I think about where we end up eventually, Ejaaz: There's a common argument that's made that it's like, oh, humans will always have the taste.
Ejaaz: They'll always be able to kind of direct where the AI should go because we are Ejaaz: this all being kind of like smart kind of entity. Ejaaz: But I see increasingly AI stepping into that boundary and becoming the tastemaker Ejaaz: for all of the work that we end up doing.
¶ Human Taste Still Matters
Josh: I still believe that to be true, that humans in the loop are critically important Josh: to applying human taste. I saw this great chart. I have no idea where it is. Josh: Somewhere in the depths of X. But basically, it was showing that in the App Josh: Store, the iOS App Store, where everyone downloads their apps, Josh: the amount of apps that have gone into production that have been published recently has gone vertical.
Josh: I think it's doubled or tripled over the last six months. Everybody's publishing apps at the App Store. Josh: The amount of five-star reviews and the amount of downloads has actually either Josh: stayed flat or gone down. Josh: It has not matched the amount of new apps that are going to the app store. Josh: Why is this? It's because a lot of the apps don't have enough care applied to
Josh: them. They're just not great applications. And when I think about, Josh: how I use my phone on a regular device or on a regular day or how I use my laptop Josh: and the applications that I actually spend time on, there's a very fixed set Josh: of them. And I'm a little stubborn when it comes to downloading new ones because Josh: a lot of the new ones just are not great.
Josh: And I think a lot of that comes from this, this lack of care that is presented Josh: from AI outputs, where if you're optimizing for a specific parameter that you Josh: can measure, it's going to do it great, but it doesn't understand the subtle nuances of how humans Josh: engage and how they really love to use these products. Like, one of the products Josh: that I use, totally unrelated, totally not sponsored, but this app called Copilot
Josh: Money, it's like a budgeting application, and it's so thoughtfully curated and designed.
Josh: And it really deeply understands all the complexities that are related to humans Josh: when it comes to budgeting. It understands a lot of the Josh: design characteristics. Same with an app called Flighty. I'm sure a lot of Josh: people have heard of Flighty. It's like a flight tracking application. There's a Josh: thousand ways to track a flight, but Flighty really cares about design. They really care about how Josh: it's implemented with the human, and they've created this amazing output, and
Josh: I don't see that changing. One thing that I did want to note is that Josh: I think when a lot of people see this, they imagine a world in which they are Josh: getting replaced. Everyone's like, AI is replacing me, look how much I could do Josh: now, it has these loops. And I think the reality is it gives you a lot more agency Josh: to do the things you want to do, Josh: where maybe you're not doing the day-to-day, where
Josh: you would normally prompt an agent to do this, but you're doing a lot of the Josh: higher-level tasks. You can imagine yourself not having to do Josh: the day-to-day. Like, for example, if you're just managing your household, you no Josh: longer have to take out the trash, you don't have to run errands. You could just Josh: focus on how to make your household the best household it is because you have Josh: that higher-level ability.
Josh: And in that chart that we showed in the artifact earlier on, it shows a decreasing Josh: sized human. It's the amount of input that a human is needed for to get the output you want, Josh: but it's still ultimately on the human being in order to push and navigate Josh: towards the outputs that you want, because ultimately these tools are just for us. So when I think of
Josh: AI becoming increasingly good, and when it comes to running the show, even I've Josh: leaned on it, we both have, I think, a lot more recently. But all that's done is Josh: actually given us more leverage to do more with the show than have it replace Josh: us. And even in the case that Josh: we could clone ourselves, we could create a video version of ourselves that Josh: has a perfect voice that sounds just like us, I don't think people actually want that.
Josh: There's that lacking human nature that still isn't understood. Josh: And I find that it's more empowering when I hear that these loops exist that Josh: can run for days on end and create amazing outputs versus not where it's kind Josh: of extracted from us. I don't really think that's true.
Ejaaz: Yeah, it's like that stat of, well, it's that thesis that everyone held about Ejaaz: a year ago, which is like with the increase of AI adoption, people will have Ejaaz: more free time to have fun and leisure. Ejaaz: And in fact, the opposite has shown that like people just work way more and work harder. Ejaaz: And the output of that work is measured across like pretty much every single Ejaaz: company and profession and role.
Ejaaz: I do generally agree with that. I don't think humans are going to get wiped out anytime soon. Ejaaz: But one thing that is kind of nagging my brain is if we extrapolate this intelligence out enough, Ejaaz: there is no reason why AI won't be able to take over or replace other parts Ejaaz: of the cognitive process that a human can do, Ejaaz: particularly if it's one AI model trained on the entire corpus of knowledge Ejaaz: that a bunch of humans have been guiding it.
Ejaaz: So when I think about Anthropic, when I think about OpenAI, I think about all the Ejaaz: millions of people that use their product every single day and the data that Ejaaz: they ingest every single day that gets recorded on one singular database that Ejaaz: can then be reused to train a better model that is more hyper-optimized towards humans. Ejaaz: You could argue that as a single human, you don't get to meet and read the thoughts Ejaaz: of every other human that is out there.
Ejaaz: You have your very own individual process. And I think that an AI model that Ejaaz: can get access to the world's brain and thoughts could probably create something Ejaaz: kind of close to knowing what that human taste profile would be.
¶ The Cost of Intelligence
Ejaaz: The other major question that I'm wondering is, how much is all of this going to cost? Ejaaz: One, like, stat that has stuck in my head over the recent few weeks is that Ejaaz: Anthropic particularly, they service, or like, Ejaaz: the Fortune 10, the top 10 companies in the world, nine of them use Claude, Ejaaz: and their budget's increased by 500%, or is projected to increase by 500% by the end of this year.
Ejaaz: And they're doing this willingly because the ROI, the value that they're getting Ejaaz: out of that is pretty massive. Ejaaz: Alternatively, there are companies like Uber that have slashed their budgets Ejaaz: massively because their entire year's budget was spent in a couple of months. Ejaaz: So I'm wondering, in this world of agent loops where you've got AIs working Ejaaz: overnight for you, the bills are going to increase pretty massively.
Ejaaz: And I'm wondering, unless these AI models get cheaper, Ejaaz: and there's an infrastructure bottleneck there where these GPUs cost a lot of Ejaaz: money, we can't scale power and infrastructure anytime soon. Ejaaz: We need so much more energy than we already have currently on Earth to be able to power these things.
Ejaaz: The cost of these things is just going to go up a lot more massively, Ejaaz: which means that either this is only going to be a power or a tool reserved Ejaaz: for the rich, or something's going to break here and maybe open source models Ejaaz: get adopted more aggressively. Josh: Yeah, I imagine there's probably use cases for all of the above.
Josh: It's like, open source models will continue to improve. They'll be able to do Josh: a lot of the more trivial tasks that don't require frontier intelligence, so Josh: therefore the cost of those types of loops will go down, because not everyone Josh: needs to have the most cutting-edge Josh: software stack engineering. Like, they're just kind of having it help them through Josh: their day-to-day. Maybe it's replying to emails, maybe it's whatever miscellaneous things it may be.
Josh: There's a high probability that these open source models, as they continue to Josh: improve, will be able to bite off a meaningful chunk of that. Then the other half Josh: is using these frontier models, which is a requirement in order to get the absolute Josh: best results for whatever very challenging work they're doing. Josh: And that is going to cost a lot of money for sure. Josh: And I don't see that changing, but I think the output of the dollars in will continue to go up.
Josh: It's because as you get more knowledge per token, as you get more output per Josh: prompt, it very clearly, I mean, the economics seem to make sense. Josh: And I think that's kind of right now.
Josh: Enterprise spend on these models, they're trying to figure out, well, how much Josh: value can we actually get back from every dollar spent? And right now it's a Josh: little bit unsure. You mentioned Uber. We have Uber here that we're showing on screen, Josh: where Uber just recently put a cap on the amount of tokens that Josh: their employees are allowed to use at $1,500 per engineer per tool per month, and
Josh: we'll see how that works, because a lot of other companies that we know, they're Josh: kind of giving their engineers unlimited budget. In fact, they're kind of ranking Josh: the engineers based on how many tokens they're using per month. Josh: We'll see where that goes. I suspect the companies that are spending more on Josh: tokens will continue to see a higher upside for now, at least. Josh: But like you mentioned, the underlying problem with all of this is we're going
Josh: to continue to have more prompts. I mean, these loops consume a tremendous amount Josh: of tokens, whether they're frontier tokens or open source tokens. Josh: It doesn't matter. We're going to need orders of magnitude more than we have. Josh: And we don't have the compute. It really does always come down to that Josh: energy problem, that infrastructure problem.
Josh: We don't have the infra built out to support this, so therefore the costs likely Josh: continue to stay high. Maybe it's not because you're paying the provider for tokens. Josh: Perhaps it's just renting the GPU time from a cluster that is doing much more Josh: valuable work. So I think that might ultimately be Josh: the crux, the actual availability of the compute to do these things. And that's Josh: why these edge compute devices, like having your
Josh: Mac Studio on your desktop that can run locally, it's probably a pretty valuable thing to have. Ejaaz: So I'm sure a lot of you are wondering, you know, how does this apply to me? Ejaaz: You know, I have none of my friends have mentioned this loop feature. Ejaaz: I don't really know many people who are using it. Ejaaz: As we mentioned earlier, like this isn't probably going to be used by the bulk Ejaaz: or majority of people yet until some of those use cases actually arise.
Ejaaz: I think it's mainly going to happen in the workplace. It's going to happen with Ejaaz: like some of these enterprise companies that are trying to automate certain Ejaaz: departments or functions of their particular company, like marketing, Ejaaz: like software engineering. Ejaaz: And I think it'll start with lower level tasks because these agents still aren't Ejaaz: smart enough to understand nuance completely.
Ejaaz: And also, you don't just want to let an agent run loose overnight whilst you're Ejaaz: sleeping and then take down your entire company. And one place where it's working Ejaaz: tirelessly to accelerate the development of that Ejaaz: And we have Boris Cherny over here basically explaining how he's basically ditched Ejaaz: his integrated development environment.
¶ Recursive Self-Improvement
Ejaaz: He has ditched all of his normal tools that he had spent decades basically honing Ejaaz: his software engineering skill on to now completely focus on building up these Ejaaz: agent loops. And what is he focused on? Ejaaz: Well, he works primarily on Claude Code, but the other folks at Anthropic and Ejaaz: OpenAI have started this thing called Ejaaz: recursive self-improvement or RSI, which is basically the goal of getting your Ejaaz: AI model to build the next version of itself.
Ejaaz: And this is a test that Anthropic and the folks at OpenAI do for any new model that they release. Ejaaz: They set it a goal or task to basically rebuild itself in a more improved fashion. Ejaaz: Now, one thing that the AI has gotten really good at is building out that next function.
Ejaaz: But one thing it's not very good at is figuring out what research problems it Ejaaz: should fix, what research problems it should focus on to try and, Ejaaz: you know, overcome and ultimately make itself, you know, a better model than its competitors. Ejaaz: Now, RSI is something, it's kind of like the golden egg that each AI lab is going after. Ejaaz: And this is the primary use of agent loops right now.
Ejaaz: And you can see why it might be obvious. If you have an AI model that can basically Ejaaz: build the next best version of itself, eventually you're going to get to AGI, Ejaaz: whatever the hell that looks like, and then you can apply it to pretty much any sector. Ejaaz: Now, the problem and the worry that kind of immediately pops into my head and Ejaaz: a lot of these researchers' heads is, if it eventually does get that smart, right?
Ejaaz: It could escape human control completely and run off on its own and do its own Ejaaz: thing. Because at that point, why would it need a human to kind of guide it or shepherd it? Ejaaz: Instead, it can just kind of do its own thing. So this is the primary Ejaaz: use case that I'm seeing for agent loops being worked on right now.
Ejaaz: I would love to see a broader application across, kind of, Ejaaz: consumer professions, like in finance, like in science and stuff like that, Ejaaz: which I do believe it'll spill over into eventually. But unless you're seeing anything Ejaaz: else, Josh, I think that is primarily it on agent loops and agent autonomy. Josh: It's on you to figure out the best use cases for it. Like there's no real company
Josh: defining it. They're just giving you the tools. And I mean, for better or worse, Josh: it's very open-ended. So it's on you to figure out how best to use these.
¶ Four Rungs Explained
Josh: I think if this sounds a little overwhelming, maybe we could outline a few examples Josh: of each one of these four rungs in the ladder here. Josh: The first one being prompting. This everyone has done before. Josh: I'm sure it's like, "Rewrite this Josh: email to sound more confident," or "Explain what my doctor meant by this."
Josh: But then you've probably also used the partial agentic usage of these Josh: models as well, which is like, "Plan my three-day vacation to Lisbon that I'm going on next week." Josh: And it will actually go off and use tools and it will think complex thoughts Josh: and ideas and kind of surface you a full itinerary for your trip. Josh: And then there's the third one, which is the harness. This is a little more Josh: complicated. This is for people who are building more project-based stuff.
Josh: So for example, if you want to build a website for your dog walking business Josh: and you kind of describe it and you go back and forth on a spec and then it Josh: goes off and implements that. Josh: And the fourth is loops, which doesn't necessarily have to be overwhelming. Josh: It can be as simple as, let's say you are Josh: interested in the news, you could say, "Every morning before I wake up, Josh: scan these 10 sources plus market data and give me this bulleted brief."
Josh: Or let's say you have a to-do list. It'll go off and think overnight and solve Josh: all those problems overnight, iteratively, until it comes to a solution that Josh: it hopefully arrives at by the morning. There's a lot of use cases. Josh: I think a lot of it requires creativity. Josh: And that is the prompt we will leave you with today, which is share with us, Josh: please, how you are using these models best. Josh: Because so much of the question isn't are these models smart?
Josh: It's how can I extract that intelligence from them in the most effective way Josh: for my life? So I would be so curious to hear which rung of the ladder you find Josh: yourself on one through four.
Josh: And then what the most interesting use cases you've found Josh: among those rungs of the ladder are. Are you using loops currently? What are you using them for? Josh: Are you with agents? Are you still using it as a Google extension? If you're still Josh: using it as a Google extension, I would encourage a little more creativity. Really Josh: try to ask harder questions and figure out how it could be implemented in your Josh: life. But I think that's pretty much it on the loop. Um,
Josh: you're not going anywhere, but your job might shift a little bit in terms of Josh: scope as these tools get more powerful. And that should be the hope, that should Josh: be the goal, because it'll allow you to do so much more that you want to accomplish, I believe. Josh: And yeah, I think that's where we'll leave you today.
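For listeners who want to see the fourth rung in concrete terms, the loop Josh describes (keep working on a goal, check the result, stop when it is good enough or the budget runs out) can be sketched in a few lines of Python. This is a minimal illustration, not how any particular product implements loops: `run_loop`, `LoopResult`, `fake_step`, and the `max_steps` budget are all hypothetical names, and `fake_step` is a stand-in for a real model call.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LoopResult:
    done: bool              # True if the stop condition was met
    steps: int              # how many iterations were actually run
    log: List[str] = field(default_factory=list)  # state after each step


def run_loop(
    goal: str,
    step: Callable[[str, str], str],
    is_done: Callable[[str], bool],
    max_steps: int = 5,
) -> LoopResult:
    """Repeat `step` until `is_done` says the work is finished,
    or until the step budget is exhausted (so it cannot run forever)."""
    state = ""
    log: List[str] = []
    for i in range(1, max_steps + 1):
        state = step(goal, state)   # in practice: a model or agent call
        log.append(state)
        if is_done(state):          # in practice: tests, a checklist, a review
            return LoopResult(True, i, log)
    return LoopResult(False, max_steps, log)


def fake_step(goal: str, state: str) -> str:
    # Hypothetical stand-in for an agent doing one unit of work.
    return state + "x"


result = run_loop("write the morning brief", fake_step, lambda s: len(s) >= 3)
print(result.done, result.steps)
```

The step budget is the part that maps to the earlier worry about letting an agent run loose overnight: the human sets the goal, the stop condition, and the limit, then checks the log in the morning.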
¶ Closing
Ejaaz: Thank you folks so much for listening. Similar to Josh's prompt, Ejaaz: I'm actually kind of curious, for one singular task that you've used your AI Ejaaz: for, what is the highest number of tokens that you've burnt? Ejaaz: Be honest, it can be for any use case, doesn't matter, let us know. Ejaaz: And also, what is the longest that you've had an AI work on a particular task Ejaaz: for? Is it a couple of minutes? Is it hours? Is it potentially overnight?
Ejaaz: Let us know. I'm curious. And what was the associated bill with that? Ejaaz: And yeah, we'll see you on the next episode. Wherever you listen to us, Ejaaz: if you haven't subscribed, if you haven't rated us, if you're not leaving us Ejaaz: comments, what are you doing? Ejaaz: We respond to pretty much any and every one of them. We listen to your feedback. Ejaaz: It feeds into some of the work and content that we put out.
Ejaaz: We are almost hitting 60,000 of you folks. And you guys are reading our newsletter, Ejaaz: which is sent out to about 100,000-plus people every single week. We post twice a week. Ejaaz: But yeah, wherever you are, please subscribe to us, leave us a comment, Ejaaz: and we'll see you on the next one. Josh: See you guys next time. Ejaaz: Peace.
