AI Loops: How the World's Best Engineers Use AI

Jun 11, 2026 · 31 min · Ep. 188

Episode description

AI Loops have taken over our timeline as a more autonomous way of using AI models, alongside prompting, agents, and harnesses. 

Today, we compare practical use cases, note how AI runtimes have expanded to hours or days, and talk about costs, enterprise limits, and the human role in higher-level work.

------
🌌 LIMITLESS HQ ⬇️

NEWSLETTER:    https://limitlessft.substack.com/
FOLLOW ON X:   https://x.com/LimitlessFT
SPOTIFY:             https://open.spotify.com/show/5oV29YUL8AzzwXkxEXlRMQ
APPLE:                 https://podcasts.apple.com/us/podcast/limitless-podcast/id1813210890
RSS FEED:           https://limitlessft.substack.com/

------
TIMESTAMPS

0:00 AI Autonomy Ladder
1:49 From Prompts to Agents
4:59 Understanding AI Loops
10:35 Why Autonomy Is Rising
15:46 Human Taste Still Matters
20:38 The Cost of Intelligence
25:25 Recursive Self-Improvement
27:32 Four Rungs Explained
29:41 Closing

------
RESOURCES

Josh: https://x.com/JoshKale

Ejaaz: https://x.com/cryptopunk7213

------
Not financial or tax advice. See our investment disclosures here:
https://www.bankless.com/disclosures

Transcript

AI Autonomy Ladder

Ejaaz: 99% of people are using AI models the same way that they use Google. Ejaaz: But recently, a new way of prompting your AI has emerged that doesn't just replace Ejaaz: the way that you work, it promotes you to the CEO of your very own AI company. Ejaaz: It's called Loops and it's part of a growing development in agent autonomy where Ejaaz: AI agents basically spin up and autonomously complete tasks or goals that you Ejaaz: set for it, often working throughout the night.

Ejaaz: In 2019, the longest that an AI agent could work autonomously was two seconds. Ejaaz: Fast forward to today, and they can work autonomously for 12 hours, Ejaaz: and that's doubling every couple of months. Ejaaz: Andrej Karpathy calls this phenomenon the autonomy slider, where you can take Ejaaz: a dial that slides from humans that approve everything to humans that periodically check in. Ejaaz: And it's part of this growing trend of agents consuming and taking up more of

Ejaaz: human capital and labor. And the question that remains going forwards is, what will humans do? Ejaaz: And will they be entirely replaced Ejaaz: by AI? Or will they be the ultimate orchestrator of their destiny?
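As a rough check on the growth figures quoted in the intro, the average doubling time they imply can be worked out directly. This is a back-of-the-envelope sketch: the two data points (2 seconds in 2019, 12 hours today) are the ones quoted in the episode, while the 84-month gap between them is an assumption, since the episode does not say when in 2019 the first measurement was taken. The function name is made up for illustration.

```python
import math


def implied_doubling_months(start_seconds, end_seconds, elapsed_months):
    """Average doubling time implied by growing from start to end over elapsed_months."""
    doublings = math.log2(end_seconds / start_seconds)
    return elapsed_months / doublings


# Figures quoted in the episode: 2 seconds (2019) and 12 hours (today).
# Assumption: roughly 7 years (84 months) separate the two data points.
months = implied_doubling_months(2, 12 * 3600, 84)
print(f"{months:.1f} months per doubling")
```

Under these assumptions the average over the whole period comes out at roughly six months per doubling, so "every couple of months" would only hold if growth has sped up recently.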

Josh: Yeah, I think the goal for this episode is really just to inform people on what's Josh: possible current day with these agents, with these LLMs, with writing these Josh: loops, as well as where you can possibly find yourself within that stack, Josh: because it gets pretty complicated. Josh: When we're getting into loops, not everyone needs to use loops, Josh: but everyone should be using LLMs probably slightly different than how you're using them today.

Josh: So maybe we could start with a little history lesson in terms of the four levels Josh: in which we have been engaging with LLMs, starting with the first level, which is just prompting. Josh: Generally, most people are probably still doing this. It started in 2022, 2023, Josh: around the release of ChatGPT. Josh: The way that you would engage with these LLMs is you would just submit a question Josh: or you submit a prompt and you get some language back. Now, if you are still doing

Josh: this, that's okay, because I find a.

From Prompts to Agents

Josh: Three years ago, four years ago. It has since advanced pretty, Josh: pretty meaningfully since then. Josh: The second step of this is agents. And we're going to spend some time on agents. Josh: Everyone's kind of heard of an agent. Maybe not everyone knows what an agent is. Josh: An agent is something that could think for a little bit longer. Josh: It could run a bit longer than just a standard prompt. Josh: It can go off and do things. It could call tools for you.

Josh: It's a much more capable version of the text box. Then, like we talked about Josh: all the time on the show recently in the last few weeks, there's the harness Josh: feature in which you put an LLM into a container and that gives it a memory feature. Josh: That gives it complete tool use. That's something like an OpenClaw that we've Josh: talked about a lot that some people do use, and that's level three.

Josh: And now level four, which is the new thing that has come this week, Josh: that's really been highlighted by some of the top leaders at these AI labs is loops. Josh: And a loop is essentially a version of an agent that has an orchestration layer Josh: and kind of builds upon itself.

Josh: So it allows you to kind of continue to scope yourself out. If you can imagine, Josh: you're kind of, you're dealing directly with an employee at level one, and then Josh: you're kind of directing that person to go off and do their own in level two. Josh: At level three with the harness, you're kind of directing a series of people to help you. Josh: And then level four, you're just the top level CEO who's directing your C-suite

Josh: to go and manage all the employees below you. So there's an entire stack to this. Josh: It's very cool. Ejaaz, how do you use your AI currently? Where would you say Josh: that you fit in this stack?

Ejaaz: Yeah, so looking at this diagram that we have on the screen here, Ejaaz: I'm somewhere between number two and number three. I'm somewhere between using Ejaaz: agents and trying to figure out the whole harness thing. Ejaaz: Now, what am I doing when it comes to like spinning up agents?

Ejaaz: If you look at either my Claude or my ChatGPT desktop apps right now, Ejaaz: I've renamed a bunch of my conversations to a particular focus or subject and then agent after it. Ejaaz: And so I can go to it and this agent basically has all the context of what I Ejaaz: wanted to do, whether it's like research a particular topic, Ejaaz: create some kind of an outline for something, research a particular investment angle.

Ejaaz: It already knows and has the embedded context for what it needs to do. Ejaaz: And there's usually like one to maybe three tasks that it needs to autonomously execute on its own. Ejaaz: And so it runs in kind of like a sequence. But if any of that sequence kind Ejaaz: of breaks, let's say it kind of tries to retrieve data from some particular Ejaaz: website and it is unable to do so, it breaks.

Ejaaz: And it comes to me and it says, hey, Ejaaz, is there some other thing that you Ejaaz: want to look at or retrieve from, blah, blah, blah? Ejaaz: It's not fully autonomous. Now, number three, the harness side of things, is Ejaaz: what I'm trying to kind of like mold my understanding around. Ejaaz: What I've noticed is when you type in a prompt and you get a response,

Ejaaz: you can kind of tell that it's AI-y. Like usually when we kind of create artifacts, Ejaaz: it comes in a particular font or it speaks in a particular type of language. Ejaaz: The harness helps kind of like take your prompt and kind of mold it into something Ejaaz: that is more human-like, but also more nuanced with what you are trying to do. Ejaaz: Like it effectively gets

Ejaaz: closer towards that ultimate goal. Like we were talking before recording this Ejaaz: episode about human taste and how AI doesn't really get human taste. Ejaaz: The harness helps you get towards that ultimate kind of taste profile for the Ejaaz: particular output that you're trying to generate.

Understanding AI Loops

Ejaaz: I haven't tried working with loops just yet, but my understanding of this, Ejaaz: and correct me if I'm wrong, is you have an AI. You can prompt it and you can get some kind of output. Ejaaz: A loop specifically is an AI agent that doesn't break. If it comes across an Ejaaz: obstacle that it doesn't understand, its instinct isn't to come to the human Ejaaz: and say, hey, like, I can't figure this out, guide me.

Ejaaz: It completely reiterates the prompt over and over again until it gets past that Ejaaz: obstacle, working towards like one objective. So a few examples I've seen for Ejaaz: this is if you are coding, right?

Ejaaz: And let's say there's multiple workflows of a code base that you want to work Ejaaz: on, and it comes across a hiccup where it can't retrieve data from one of those Ejaaz: particular flows, it is able to kind of like circumnavigate around it, Ejaaz: maybe spin up its own separate flow and try to figure out the problem. Ejaaz: And often this results in an agent working for multiple hours at a time,

Ejaaz: often overnight. I think Karpathy spoke about his auto research agent working Ejaaz: overnight whilst he slept. Ejaaz: And we're seeing different variations of this start to arise. Ejaaz: Where are you, Josh, in the stack?

Josh: Yeah, loops are like a closed system where you kind of define an Josh: outcome and it will continue to work towards that outcome without any external inputs. Josh: It's very cool. It's very automated. I don't think it's for everyone.

Josh: It's certainly not for me because I haven't really had a use case for loops per se. Josh: I would say I'm sitting at each one of those first three phases given whatever Josh: tasks I'm trying to do. And I think it's important to understand that a lot Josh: of people might not even need to go past number one unless you're actually doing productive work. Josh: A lot of the agents, a lot of the harnesses are for kind of automating more

Josh: systems from your life. If you're just trying to use this as Google, if you're Josh: just trying to use this as a writing assistant or someone to chat with, the prompting Josh: is really strong. And I find a lot of times Josh: this is my outlet for, like, Google search results. So instead Josh: of searching Google, I'll get a little more in-depth results, I'll ask my LLM. Josh: For agents, I use them quite a bit when I'm doing a little bit more productive

Josh: work. For example, we track the analytics on Limitless, and we want a place in Josh: which we can have all those analytics dumped to a dashboard. Josh: That is an agent that I run. Josh: So it goes into my browser. It detects all of the views that we've had from Josh: the week for YouTube, from Spotify, from RSS feed, where you should all be subscribed Josh: to and rate us five stars.

Josh: And it compiles it into a singular spreadsheet in which we could then publish Josh: online and we could share with prospective sponsors and things like that. Josh: And then for harnesses I've used, because I mean, that's mostly OpenClaw. Josh: I've used OpenClaw. I really enjoyed the process. I find myself using it a bit less and less.

Josh: And I think in the loops feature, at least it's probably most productive right Josh: now for people who are writing code, who are writing verifiable solutions. Josh: One of the difficult things that as I was looking into loops and figuring out Josh: how I can structure them into my life, one of the problems that I run into is Josh: I'm not really sure I have a verifiable, Josh: set of outputs that I wanted to optimize for, for a lot of the work that I'm

Josh: doing, because a lot of it is subjective. A lot of it is kind of creative work. Josh: It requires a human in the loop for a lot more of it. Josh: So I would say I am number one, two, and three on the list. Haven't quite made my way to four. Josh: But yeah, for the people who are, those are the people like Boris Cherny from Josh: Anthropic. And we know Andrej and Peter Steinberger from OpenAI.

Josh: They are all on four. They are using it to Josh: create these, like, unbelievable agentic systems and continue to remove themselves out of the loop.

Ejaaz: You know what I've realized? With loops in particular and just AI agents in Ejaaz: general, they're trying to improve our understanding or rather their understanding Ejaaz: of the English language. Ejaaz: So one of my favorite Karpathy quotes back in the day was English is the new

Ejaaz: programming language. I think he said this like two, two and a half years ago. Ejaaz: And I've just realized that like us creating AI agents is basically like, Ejaaz: it's the same model. It hasn't necessarily got smarter. Ejaaz: It's just like using that model to kind of like keep ramming its head and its Ejaaz: brain against a particular problem until it understands what the human actually means.

Ejaaz: And so like in this new world, like I know you just used the example of like, Ejaaz: you know, loops can be used for coding specifically, Ejaaz: coding that Boris Cherny and Karpathy are doing is English. Ejaaz: Like they're speaking to the LLM, they are writing in English to the LLM. Ejaaz: And yeah, maybe they're copy and pasting some versions of code, Ejaaz: but that code is primarily generated by an AI.

Ejaaz: I think like something crazy, like 80% plus of code generated at Anthropic, Ejaaz: both for research and for just general consumer adoption is generated by Claude itself. Ejaaz: And so that's one thing. The other thing is the model just not getting smarter Ejaaz: is a really interesting thing. Like typically in my head, I would think, Ejaaz: okay, you need a better model to be able to unlock some of these new features Ejaaz: like AI agents, autonomous loops, et cetera.

Ejaaz: But really you could just take the same model, wrap a harness around it and Ejaaz: try to get it to understand what particular goal it's getting at and just run Ejaaz: that iteration over and over and over again until you get a better output.

Ejaaz: And I guess this is the same concept as inference or reinforcement learning Ejaaz: where like we've found this trend of post-training of these AI models, Ejaaz: these AI models just getting smarter, not because they've got bigger GPUs or more expensive GPUs. Ejaaz: It's because you've just taken the same model and you've just run it through Ejaaz: a different reasoning framework over and over again until it can do a thing.

Ejaaz: And this is the practical embodiment of it. I personally haven't found like Ejaaz: an obvious use case for loops either. Ejaaz: So either you and I are boxing ourselves into a particular realm and maybe someone Ejaaz: listening to this is using this for like their software engineering thing or Ejaaz: their marketing thing. But yeah, I guess that's where I sit right now.

Why Autonomy Is Rising

Josh: Well, I think it's probably a skill issue on both our parts. Josh: Like there is certainly a use case for us in which we can use a loop in which Josh: we can define this outcome, send an agent off to go do it, and it will iterate Josh: on itself until it comes to a conclusion. Josh: I think it's just so novel and so new. It's difficult to kind of understand Josh: why. And we have this really great chart on screen that you're showing now, Josh: which is the why now section of this.

Josh: And it's because the duration of a task that these agents can run is so much Josh: longer than it used to be. Josh: I mean, in 2019 we have here, it was two seconds. This was well before ChatGPT.

Josh: But even early last year, in 2025, the duration that an agent could run on one single task was Josh: less than an hour in length. So there's only so many tokens it could generate, Josh: there's only so much reasoning it can do, and there's only so much iteration Josh: you could get over that hour time period, let alone the amount of costs that Josh: these tokens are going to be Josh: costing you if you're using, like, the API or anything like that. Now, fast forward

Josh: to today. I mean, the best models in the world, they're getting days' worth of runtime, Josh: so they can really think deeply and continue to iterate on themselves over and over. I see examples of, um,

Josh: backslash goal on X all the time, of people who have a problem, whether it be Josh: an optimization problem or they have a bug that they need to fix, and they'll Josh: put this backslash goal on it for Josh: however long it needs to, and it'll think for three, four, even five days, I've Josh: seen, in order to optimize for the specific parameter. And this is possible because Josh: these models now can think for days long.

Josh: You have to assume months is coming. What does it look like when an agent can think for months? Josh: I mean, it's a really interesting paradigm shift, and I'm not sure where people Josh: are going to find value in the open-ended way that it exists today, Josh: right? It's like, okay, here's this agent. Josh: You can tell it to do whatever you want. You can create a loop. Josh: You can create an infrastructure system for it to operate in.

Josh: It's pretty much open-ended and it's on you. And I think the answer to that Josh: is that not even the AI companies really understand the best use cases for it quite yet. Josh: I would imagine it's still this really difficult thing of how do you unlock Josh: value from essentially an open-ended agent that can go and run for an infinite Josh: amount of time? I don't know.

Ejaaz: I also question like what a human's purpose would be at that point.

Ejaaz: Like if you automate enough of the thinking and the curiosity behind like solving Ejaaz: particular problems, what do humans end up doing at that point, Ejaaz: especially if they don't do the work themselves? Ejaaz: They don't understand it, right? You need an AI to kind of like understand what Ejaaz: on earth is going on in the first place.

Ejaaz: And eventually like an AI will then start setting goals, like more ambitious Ejaaz: goals than a human can in terms of like what to like kind of solve or go after. Ejaaz: There were some very low-level examples that I saw in response to Peter Steinberger's tweet about loops. Ejaaz: And there's some kind of concrete examples that I want to run through very quickly here.

Ejaaz: So one of them is using it for code, right? So a classic loop could basically Ejaaz: look like, okay, can you please pull live errors for my particular app? Ejaaz: Can you inspect and figure out where the bug might particularly be? Ejaaz: Can you create then a fix for this particular bug in my code? Ejaaz: And then can you deploy it? Then can you check the health of that deployment Ejaaz: and make sure that nothing else is broken?

Ejaaz: And then record what failed and feed that into a database so that in the future, Ejaaz: we can detect errors like this or prevent it when we code and build some of Ejaaz: these future app features. Ejaaz: Now, that is kind of like a very small and specific enough use case that can Ejaaz: be generalized across basically any app or software engineering project that Ejaaz: you might be working on if you're listening to this.
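The error-fixing loop described above (pull errors, propose a fix, deploy, check health, record the outcome) can be sketched in Python. This is a minimal illustration under stated assumptions, not any lab's implementation: `propose_fix` and `deploy_and_check` are hypothetical stand-ins for a model call and a deploy-plus-health-check step, and the rule that a fix only works on the second attempt is invented so the example runs end to end.

```python
from dataclasses import dataclass, field


@dataclass
class LoopState:
    open_errors: list
    incident_log: list = field(default_factory=list)


def propose_fix(error, attempt):
    # Hypothetical stand-in for a model call.
    # Invented rule: the fix only works from the second attempt onward.
    return {"error": error, "attempt": attempt, "works": attempt >= 2}


def deploy_and_check(fix):
    # Hypothetical stand-in for deploying the fix and checking service health.
    return fix["works"]


def run_fix_loop(state, max_attempts=5):
    """Work through open errors, retrying each fix until the health check passes."""
    while state.open_errors:
        error = state.open_errors.pop(0)
        for attempt in range(1, max_attempts + 1):
            fix = propose_fix(error, attempt)
            healthy = deploy_and_check(fix)
            # Record every outcome so later runs can learn from failures.
            state.incident_log.append((error, attempt, healthy))
            if healthy:
                break
        else:
            # Attempts exhausted: hand the error back to a human.
            state.incident_log.append((error, "escalated", False))
    return state


result = run_fix_loop(LoopState(open_errors=["timeout in /checkout"]))
for entry in result.incident_log:
    print(entry)
```

The `max_attempts` bound and the escalation branch are design choices added here: a loop that retries without a limit is the scenario the hosts later warn about when they mention agents running loose overnight.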

Ejaaz: And I wonder how many hours worth of engineering time that this replaces. Ejaaz: Because I know, having worked at companies and Ejaaz: been a product manager in the past, that there are entire teams of software engineers that Ejaaz: spend their entire days working on something like that. So that's one thing. Ejaaz: And then for content, which is very applicable for product managers, Ejaaz: or even like the work that you and I do, Josh, an agent could read a PRD.

Ejaaz: So a PRD is a product requirements doc, which is usually kind of like created Ejaaz: for a strategic goal that you want to kind of like build at your company, Ejaaz: like a product or a feature. It then writes whatever that next asset could be. Ejaaz: So it could be like a design profile or a mockup of what that feature might Ejaaz: look like. It then scores it against some kind of criteria that the company has, Ejaaz: like, you know, it must follow our vision, A, B, and C.

Ejaaz: It must also look a particular way. This is our design profile, Ejaaz: our brand kind of profile and our aesthetic. Ejaaz: And then it kind of like updates its progress depending on like what other teams Ejaaz: have shipped. So maybe it's dependent on a particular feature. Ejaaz: And so it updates itself autonomously like that. Now, this all sounds very vague Ejaaz: intentionally because it's meant to apply to your particular business or your particular project.
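The content loop described above (read a PRD, draft an asset, score it against company criteria, revise) follows a generate, score, revise pattern. The sketch below is illustrative only: `draft_asset`, `score`, and the `REQUIRED_SECTIONS` rubric are hypothetical, and a real loop would replace the first two with model calls and a richer evaluation.

```python
# Hypothetical rubric: sections the company requires in every asset.
REQUIRED_SECTIONS = ["summary", "brand", "vision"]


def draft_asset(prd, feedback):
    # Hypothetical generator: starts with a summary and adds whatever
    # sections the previous rounds of feedback asked for.
    return {"prd": prd, "sections": ["summary"] + feedback}


def score(asset):
    """Return the fraction of required sections present, plus the missing ones."""
    missing = [s for s in REQUIRED_SECTIONS if s not in asset["sections"]]
    return 1 - len(missing) / len(REQUIRED_SECTIONS), missing


def run_content_loop(prd, threshold=1.0, max_rounds=5):
    """Redraft until the asset meets the threshold or the rounds run out."""
    feedback = []
    asset = None
    for round_number in range(1, max_rounds + 1):
        asset = draft_asset(prd, feedback)
        value, missing = score(asset)
        if value >= threshold:
            return asset, round_number
        # Address one gap per round, as a simple revision policy.
        feedback = feedback + missing[:1]
    return asset, max_rounds


asset, rounds = run_content_loop("Checkout redesign PRD")
print(rounds, asset["sections"])
```

The loop only works because `score` is something a program can compute. That matches the point made earlier in the episode that loops fit best where outputs are verifiable, and it is where subjective criteria such as taste become hard to automate.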

Ejaaz: But make no mistake, this is what Ejaaz: a lot of humans are paid upwards of six figures to do on a daily basis. Ejaaz: It's that nuance. And we're starting to see basically AI models and AI agents Ejaaz: enter into that human taste profile. So when I think about where we end up eventually, Ejaaz: there's a common argument that's made that it's like, oh, humans will always have the taste.

Ejaaz: They'll always be able to kind of direct where the AI should go because we are Ejaaz: this all being kind of like smart kind of entity. Ejaaz: But I see increasingly AI stepping into that boundary and becoming the tastemaker Ejaaz: for all of the work that we end up doing.

Human Taste Still Matters

Josh: I still believe that to be true, that humans in the loop are critically important Josh: to applying human taste. I saw this great chart. I have no idea where it is. Josh: Somewhere in the depths of X. But basically, it was showing that in the App Josh: Store, the iOS App Store, where everyone downloads their apps, Josh: the amount of apps that have gone into production that have been published recently has gone vertical.

Josh: I think it's doubled or tripled over the last six months. Everybody's publishing apps at the App Store. Josh: The amount of five-star reviews and the amount of downloads has actually either Josh: stayed flat or gone down. Josh: It has not matched the amount of new apps that are going to the app store. Josh: Why is this? It's because a lot of the apps don't have enough care applied to

Josh: them. They're just not great applications. And when I think about, Josh: how I use my phone on a regular device or on a regular day or how I use my laptop Josh: and the applications that I actually spend time on, there's a very fixed set Josh: of them. And I'm a little stubborn when it comes to downloading new ones because Josh: a lot of the new ones just are not great.

Josh: And I think a lot of that comes from this lack of care that is presented Josh: from AI outputs, where if you're optimizing for a specific parameter that you Josh: can measure, it's going to do it great, but it doesn't understand the subtle nuances of how humans Josh: engage and how they really love to use these products. Like, one of the products Josh: that I use, totally unrelated, totally not sponsored, is this app called Copilot

Josh: Money. It's like a budgeting application, and it's so thoughtfully curated and designed,

Josh: and it really deeply understands all the complexities that are related to humans Josh: when it comes to budgeting. It understands a lot of the Josh: design characteristics. Same with an app called Flighty. I'm sure a lot of Josh: people have heard of Flighty. It's like a flight tracking application. There's a Josh: thousand ways to track a flight, but Flighty really cares about design. They really care about how Josh: it's implemented with the human, and they've created this amazing output, and

Josh: I don't see that changing. One thing that I did want to note is that Josh: I think when a lot of people see this, they imagine a world in which they are Josh: getting replaced. Everyone's like, AI is replacing me, look how much I could do Josh: now, it has these loops. And I think the reality is it gives you a lot more agency Josh: to do the things you want to do, Josh: where maybe you're not doing the day-to-day, where

Josh: you would normally prompt an agent to do this, but you're doing a lot of the Josh: higher level tasks. You can imagine yourself not having to do Josh: the day-to-day. Like, for example, if you're just managing your household, you no Josh: longer have to take out the trash, you don't have to run errands. You could just Josh: focus on how to make your household the best household it is because you have Josh: that higher level ability.

Josh: And in that chart that we showed in the artifact earlier on, it shows a decreasing Josh: sized human. It's the amount of input that a human is needed to get the output you want, Josh: but it's still ultimately on the human being in order to push and navigate Josh: towards the outputs that you want, because ultimately these tools are just for us. So when I think of

Josh: AI becoming increasingly good, and when it comes to running the show, even I've Josh: leaned on it, we both have, I think, a lot more recently. But all that's done is Josh: actually given us more leverage to do more with the show rather than have it replace Josh: us. And even in the case that Josh: we could clone ourselves, we could create a video version of ourselves that Josh: has a perfect voice that sounds just like us, I don't think people actually want that.

Josh: There's that lacking human nature that still isn't understood. Josh: And I find that it's more empowering when I hear that these loops exist that Josh: can run for days on end and create amazing outputs versus not where it's kind Josh: of extracted from us. I don't really think that's true.

Ejaaz: Yeah, it's like that stat of, well, it's that thesis that everyone held about Ejaaz: a year ago, which is like with the increase of AI adoption, people will have Ejaaz: more free time to have fun and leisure. Ejaaz: And in fact, the opposite has shown that like people just work way more and work harder. Ejaaz: And the output of that work is measured across like pretty much every single Ejaaz: company and profession and role.

Ejaaz: I do generally agree with that. I don't think humans are going to get wiped out anytime soon. Ejaaz: But one thing that is kind of nagging my brain is if we extrapolate this intelligence out enough, Ejaaz: there is no reason why AI won't be able to take over or replace other parts Ejaaz: of the cognitive process that a human can do, Ejaaz: particularly if it's one AI model trained on the entire corpus of knowledge Ejaaz: that a bunch of humans have been guiding it.

Ejaaz: So when I think about Anthropic, when I think about OpenAI, I think about all the Ejaaz: millions of people that use their product every single day and the data that Ejaaz: they ingest every single day that gets recorded on one singular database that Ejaaz: can then be reused to train a better model that is more hyper-optimized towards humans. Ejaaz: You could argue that as a single human, you don't get to meet and read the thoughts Ejaaz: of every other human that is out there.

Ejaaz: You have your very own individual process. And I think that an AI model that Ejaaz: can get access to the world's brain and thoughts could probably create something Ejaaz: kind of close to knowing what that human taste profile would be.

The Cost of Intelligence

Ejaaz: The other major question that I'm wondering is, how much is all of this going to cost? Ejaaz: One, like, stat that has stuck in my head over the recent few weeks is that Ejaaz: Anthropic particularly, they service, or like, Ejaaz: the Fortune 10, the top 10 companies in the world, nine of them use Claude, Ejaaz: and their budget's increased by 500%, or is projected to increase by 500% by the end of this year.

Ejaaz: And they're doing this willingly because the ROI, the value that they're getting Ejaaz: out of that is pretty massive. Ejaaz: Alternatively, there are companies like Uber that have slashed their budgets Ejaaz: massively because their entire year's budget was spent in a couple of months. Ejaaz: So I'm wondering, in this world of agent loops where you've got AIs working Ejaaz: overnight for you, the bills are going to increase pretty massively.

Ejaaz: And I'm wondering, unless these AI models get cheaper, Ejaaz: and there's an infrastructure bottleneck there where these GPUs cost a lot of Ejaaz: money, we can't scale power and infrastructure anytime soon. Ejaaz: We need so much more energy than we already have currently on Earth to be able to power these things.

Ejaaz: The cost of these things is just going to go up a lot more massively, Ejaaz: which means that either this is only going to be a power or a tool reserved Ejaaz: for the rich, or something's going to break here and maybe open source models Ejaaz: get adopted more aggressively.

Josh: Yeah, I imagine there's probably use cases for all of the above.

Josh: It's like open source models will continue to improve. They'll be able to do Josh: a lot of the more trivial tasks that don't require frontier intelligence, so Josh: therefore the cost of those types of loops will go down, because not everyone Josh: needs to have the most cutting edge Josh: software stack engineering. Like, they're just kind of having it help them through Josh: their day-to-day. Maybe it's replying to emails, maybe it's whatever miscellaneous things it may be.

Josh: There's a high probability that these open source models, as they continue to Josh: improve, will be able to bite off a meaningful chunk of that. Then the other half Josh: is using these frontier models, which is a requirement in order to get the absolute Josh: best results for whatever very challenging work they're doing. Josh: And that is going to cost a lot of money for sure. Josh: And I don't see that changing, but I think the output of the dollars in will continue to go up.

Josh: It's because as you get more knowledge per token, as you get more output per Josh: prompt, it very clearly, I mean, the economics seem to make sense. Josh: And I think that's kind of right now.

Josh: Enterprise spend on these models, they're trying to figure out, well, how much Josh: value can we actually get back from every dollar spent, and right now it's a Josh: little bit unsure. You mentioned Uber. We have Uber here that we're showing on screen, Josh: where Uber just recently put a cap on the amount of tokens that Josh: their employees are allowed to use at $1,500 per engineer per tool per month, and

Josh: we'll see how that works, because a lot of other companies that we know, they're Josh: kind of giving their engineers unlimited budget. In fact, they're kind of ranking Josh: the engineers based on how many tokens they're using per month. Josh: We'll see where that goes. I suspect the companies that are spending more on Josh: tokens will continue to see a higher upside for now, at least. Josh: But like you mentioned, the underlying problem with all of this is we're going

Josh: to continue to have more prompts. I mean, these loops consume a tremendous amount Josh: of tokens, whether they're frontier tokens or open source tokens. Josh: It doesn't matter. We're going to need orders of magnitude more than we have. Josh: And we don't have the computability. It really does always come down to that Josh: energy problem, that infrastructure problem.
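To give a feel for how a spending cap interacts with long-running loops, here is a small worked calculation. Only the $1,500 per engineer per tool per month cap comes from the episode. The token price and the loop's token consumption rate are hypothetical placeholders chosen for round numbers, not real pricing for any provider.

```python
def tokens_within_cap(cap_usd, usd_per_million_tokens):
    """Number of tokens a monthly cap buys at a given price per million tokens."""
    return int(cap_usd / usd_per_million_tokens * 1_000_000)


def loop_hours_within_cap(cap_usd, usd_per_million_tokens, tokens_per_hour):
    """Hours of continuous loop runtime the cap covers at a given burn rate."""
    return tokens_within_cap(cap_usd, usd_per_million_tokens) / tokens_per_hour


CAP_USD = 1500                 # cap quoted in the episode
PRICE_PER_MILLION = 10         # hypothetical blended price, USD
TOKENS_PER_HOUR = 2_000_000    # hypothetical loop consumption rate

print(tokens_within_cap(CAP_USD, PRICE_PER_MILLION))
print(loop_hours_within_cap(CAP_USD, PRICE_PER_MILLION, TOKENS_PER_HOUR))
```

With these placeholder inputs the cap covers 150 million tokens, or about 75 hours of loop runtime a month, which is only a few days of the multi-day runs discussed earlier. Different prices or burn rates change the answer proportionally.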

Josh: We don't have the infra built out to support this, so therefore the costs likely Josh: continue to stay high. Maybe it's not because you're paying the provider for tokens; Josh: perhaps it's just renting the GPU time from a cluster that is doing much more Josh: valuable work. So I think that might ultimately be Josh: the crux: the actual availability of the compute to do these things. And that's Josh: why these edge compute devices, like having your

Josh: Mac Studio on your desktop that can run locally, it's probably a pretty valuable thing to have.

Ejaaz: So I'm sure a lot of you are wondering, you know, how does this apply to me? Ejaaz: You know, I have none of my friends have mentioned this loop feature. Ejaaz: I don't really know many people who are using it. Ejaaz: As we mentioned earlier, like this isn't probably going to be used by the bulk Ejaaz: or majority of people yet until some of those use cases actually arise.

Ejaaz: I think it's mainly going to happen in the workplace. It's going to happen with Ejaaz: like some of these enterprise companies that are trying to automate certain Ejaaz: departments or functions of their particular company, like marketing, Ejaaz: like software engineering. Ejaaz: And I think it'll start with lower level tasks because these agents still aren't Ejaaz: smart enough to understand nuance completely.

Ejaaz: And also, you don't just want to let an agent run loose overnight whilst you're Ejaaz: sleeping and then take down your entire company. And one place where it's working Ejaaz: tirelessly to accelerate the development of that Ejaaz: And we have Boris Cherny over here basically explaining how he's basically ditched Ejaaz: his integrated development environment.

Recursive Self-Improvement

Ejaaz: He has ditched all of his normal tools that he had spent decades basically honing Ejaaz: his software engineering skill on to now completely focus on building up these Ejaaz: agent loops. And what is he focused on? Ejaaz: Well, he works primarily on Claude Code, but the other folks at Anthropic and Ejaaz: OpenAI have started this thing called Ejaaz: recursive self-improvement, or RSI, which is basically the goal of getting your Ejaaz: AI model to build the next version of itself.

Ejaaz: And this is a test that Anthropic and the folks at OpenAI do for any new model that they release. Ejaaz: They set it a goal or task to basically rebuild itself in a more improved fashion. Ejaaz: Now, one thing that the AI has gotten really good at is building out that next function.

Ejaaz: But one thing it's not very good at is figuring out what research problems they Ejaaz: should fix, what research problems it should focus on to try and, Ejaaz: you know, overcome and make it ultimately, you know, a better model than its competitors. Ejaaz: Now, RSI is something, it's kind of like the golden egg that each AI lab is going after. Ejaaz: And this is the primary use of agent loops right now.

Ejaaz: And you can see why it might be obvious. If you have an AI model that can basically Ejaaz: build the next best version of itself, eventually you're going to get to AGI, Ejaaz: whatever the hell that looks like, and then you can apply it to pretty much any sector. Ejaaz: Now, the problem and the worry that kind of immediately pops into my head and Ejaaz: a lot of these researchers' heads is, if it eventually does get that smart, right?

Ejaaz: It could escape human control completely and run off on its own and do its own Ejaaz: thing. Because at that point, why would it need a human to kind of like guide it or shepherd it? Ejaaz: Instead, it can just kind of like do its own thing. So this is like the primary Ejaaz: use case that I'm seeing for agent loops being worked on right now.

Ejaaz: I would love to see a broader application across, like, Ejaaz: consumer professions, like in finance, like in science and stuff like that, Ejaaz: which I do believe it'll spill over into eventually. But unless you're seeing anything Ejaaz: else, Josh, I think like that is primarily it on agent loops and agent autonomy. Josh: It's on you to figure out the best use cases for it. Like there's no real company

Josh: defining it. They're just giving you the tools. And I mean, for better or worse, Josh: it's very open-ended. So it's on you to figure out how best to use these.

Four Rungs Explained

Josh: I think if this sounds a little overwhelming, maybe we could outline a few examples Josh: of each one of these kind of four rungs in the ladder here. Josh: The first one being prompting. This everyone has done before. Josh: I'm sure it's like, "Rewrite this Josh: email to sound more confident," or "Explain what my doctor meant by this."

Josh: But then you've probably also used the partial agentic usage as well of these Josh: models, which is like, "Plan my three-day vacation to Lisbon that I'm going on next week." Josh: And it will actually go off and use tools and it will think complex thoughts Josh: and ideas and kind of surface you a full itinerary for your trip. Josh: And then there's the third one, which is the harness. This is a little more Josh: complicated. This is for people who are building more project-based stuff.

Josh: So for example, if you want it to build you a website for your dog walking business, Josh: and you kind of describe it and you go back and forth on a spec, and then it Josh: goes off and implements that. Josh: And the fourth is loops, which doesn't have to necessarily be overwhelming. Josh: It can be as simple as, let's say you are Josh: interested in the news, you could say, "Every morning before I wake up, Josh: scan these 10 sources plus market data and give me this bulleted brief."

Josh: Or let's say you have a to-do list. It'll go off and think overnight and solve Josh: all those problems overnight, iteratively, until it comes to a solution that Josh: it hopefully arrives at in the morning. There's a lot of use cases. Josh: I think a lot of it requires creativity. Josh: And that is the prompt we will leave you with today, which is share with us, Josh: please, how you are using these models best. Josh: Because so much of the question isn't are these models smart?

Josh: It's how can I extract that intelligence from them in the most effective way Josh: for my life? So I would be so curious to hear which rung of the ladder you find Josh: yourself on one through four.

Josh: And then what the most interesting use cases you've found Josh: among those rungs of the ladder are. Are you using loops currently? What are you using them for? Josh: Are you with agents? Are you still using it as a Google extension? If you're still Josh: using it as a Google extension, I would encourage a little more creativity. Really Josh: try to ask harder questions and figure out how it could be implemented in your Josh: life. But I think that's pretty much it on the loop. Um,

Josh: you're not going anywhere, but your job might shift a little bit in terms of Josh: scope as these tools get more powerful. And that should be the hope, that should Josh: be the goal, because it'll allow you to do so much more that you want to accomplish, I believe. Josh: And yeah, I think that's where we'll leave you today.

Closing

Ejaaz: Thank you folks so much for listening. Similar to Josh's prompts, Ejaaz: I'm actually kind of curious, for one singular task that you've used your AI Ejaaz: for, what is the most number of tokens that you've burnt? Ejaaz: Be honest, it can be for any use case, doesn't matter, let us know. Ejaaz: And also, what is the longest that you've had an AI work on a particular task Ejaaz: for? Is it a couple of minutes? Is it hours? Is it potentially overnight?

Ejaaz: Let us know. I'm curious. And what was the associated bill with that? Ejaaz: And yeah, we'll see you on the next episode. Wherever you listen to us, Ejaaz: if you haven't subscribed, if you haven't rated us, if you're not leaving us Ejaaz: comments, what are you doing? Ejaaz: We respond to pretty much any and every one of them. We listen to your feedback. Ejaaz: It feeds into some of the work and content that we put out.

Ejaaz: We are almost hitting 60,000 of you folks. And you guys are reading our newsletter, Ejaaz: which is, like, sent out to about 100,000-plus people every single week. We post twice a week. Ejaaz: But yeah, wherever you are, please subscribe to us, leave us a comment, Ejaaz: and we'll see you on the next one. Josh: See you guys next time. Ejaaz: Peace.

Transcript source: Provided by creator in RSS feed