AI workflow tool brainstorm session

Feb 27, 2026 · 33 min · Ep. 571

Summary

Caleb Porzio brainstorms a robust AI workflow tool called OTAT, or "One Thing At A Time," to tackle intricate coding challenges. He details a 14-step process, from bug reproduction and analysis to solution implementation and critique, using AI for each isolated task. The discussion highlights the necessity of this meticulous approach over single-shot AI prompts for confidently solving complex, systemic problems in software development.

Episode description

The podcaster did not provide a description for this episode.

Transcript

Intro / Opening

Okay, we're back. Um, you know, I meant to crack the Coke Zero in front of the mic, but I just realized that's long gone, and all that, you know, fire I had, I gotta rekindle, because, uh, they decided to mow the lawn. The nerve of living in a place where they mow the lawn for you.

Core AI Workflow Principles

Um, so let's talk about we're gonna build a tool. That's what we're gonna do, because that's what you do with AI, is you build a tool. Everybody needs their own harness, right? You need your own AI harness. That's what we need. I'm gonna build my own. Get ready. Alright? Pens out. These are the core values. This is the only core value. One thing at a time. That's it. That's it. That's all that matters here. One thing at a time.

You ask AI to do one thing at a time. In a fresh context. Not that stale context that you keep piling new things onto. No, one thing at a time. Okay? Also known as OTAT. Uh, so the other value that I have, but I don't think this is gonna be, like, part of the harness, I think it's just a value that we need to recognize, is: make the decision easy, then make the easy decision.

Folks, you've heard "make the change easy, then make the easy change." I believe it was George Washington who said it first and Kent Beck who said it second. But I say to you, good people of the podcast: make the decision easy, then make the easy decision. What do I mean by that? I mean a lot of this AI stuff is making decisions. I don't know if it's that or it's just my job is making a lot of decisions. Maybe it's all of life is just making it.

That's what it feels like. And yeah, a lot of dis it's decisions and tasks, you know? But AI can do the tasks now. So it's mostly decision making that feels like the thing that I have to do now. And uh I hate decision making. It's very stressful to me. But what I realized is I don't hate decisions, or really, what it is is like.

A difficult decision is just uh a decision with not enough knowledge, you know? Knowledge gathered makes decisions easy. So you have a hard decision about, you know, should you go to Yale or community college? You know, there's a lot now, that's a bad example. You have a decision like should I I don't have any decisions off the top of my head, but you get what I'm saying is should we support this feature?

Or whatever. And then it's like, oh, I don't know. But what are all the questions you have underneath the hood? You know? Oh, is this what people want? Oh, is this what other things have done? Oh, is this gonna bite us in the future? Blah, blah, blah. What are the implications of this? How hard would it be to do? These are all questions. And if you had all of them answered and you had an Oracle who was just sitting there and you said, Hey, um

Do uh my competitors have this feature? Uh let me see. Pip uh yeah, they all do. And uh is this gonna bite us in the future? Uh no it won't. And then you go and it should is this gonna take a lot of work? Uh no. Um, do you know how to do it? Yeah. Uh okay. Let's do it. You know?

That's how the decision becomes easy, because you have knowledge gathered. So that's one of my core principles here that I keep trying to apply: anytime I feel like a decision is difficult, I go, how can I gather more knowledge so that the decision becomes easy? Okay. But I don't think that has anything to do with the tool. I think it's just how I will design my prompts for the tool. We're gonna build a tool. It's gonna be called OTAT. One thing at a time.

AI Bug Reproduction and Analysis

And how does this work? Well, I don't know yet, but let me describe to you what I did yesterday for this Alpine pull request, and then we'll kind of work our way from there, okay? So I want uh sorry, this alpine bug. It's an issue. Okay, so it's a GitHub discussion. All right. So what I did is I opened a new folder. I think this is just kind of how it has to be, is these like sessions, you know? So let's let's get our pen up.

A session is a folder. So you have a directory, okay, for your session. Maybe they're just numbered. Session one, okay? And let's say this folder gets created in, so, your root /tmp, /tmp/session-1, so we don't have to worry about cleaning it up. So you create a new folder, you cd into that directory, and you run claude --dangerously-skip-permissions. That's a very important part of this: we can't be bothered to approve anything. So it has to be a dangerously-skip-permissions situation. And if you're scared of that, then I guess you have to run it in some containerized, safer environment.

And so for this Alpine thing, I created this folder, I cd'd into it, I opened Claude and I said, uh, what was the first thing I said? Okay, I pasted the description and I said, your job is to reproduce this issue.
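The "a session is a folder" idea can be sketched in a few lines of Python. This is only an illustration, not the tool from the episode: `create_session_dir` and `agent_command` are hypothetical names, and the only thing taken from the episode is the `claude --dangerously-skip-permissions` command itself.

```python
import tempfile
from pathlib import Path


def create_session_dir(root: Path, prefix: str = "session-") -> Path:
    """Create and return a fresh, uniquely named session directory under root."""
    root.mkdir(parents=True, exist_ok=True)
    return Path(tempfile.mkdtemp(prefix=prefix, dir=root))


def agent_command() -> list[str]:
    """Command line for one agent session, using the flag named in the episode."""
    return ["claude", "--dangerously-skip-permissions"]


if __name__ == "__main__":
    # Throwaway location: the system temp directory (for example /tmp).
    session = create_session_dir(Path(tempfile.gettempdir()))
    print(f"cd {session} && {' '.join(agent_command())}")
```

Using `tempfile.mkdtemp` instead of hand-numbered folders avoids name collisions when several sessions are created at once; numbered names would work just as well for a single user.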

And, you know, feel free to, like, create a new folder called reproduction. Um, yeah, that's what I said. Create a new folder called reproduction. Uh, do whatever you need to do in there, you know, open up a web server, use Playwright, just get it to reproduce. Make sure that you can reliably reproduce this in a web browser.

And it did that and it was like sure, here you go. And then I said, uh and then write like a markdown file that says like how to reproduce a problem given, you know, like write a guide so that somebody could just look at this markdown file, open up this thing in the URL bar, do these actions and see it fail, right?

Okay, so that was, like, the first step: get it to reproduce. Second step, I was like, okay, now I need a failing test. So I said, open up a new folder called repo, clone the Alpine repo, check out a new branch, and write a failing test for it. Okay. So now the groundwork is laid. Um, because that's my acceptance criteria. If I get this test to pass, then we're good. I don't know if I needed to have it write the failing test right away or not. I'm not sure.

Um but then I went through a a series of things. So like the first one was All right, so we have the problem. So now I need it written up. I need this problem like written up well. So like what is the root problem? So I go, You have a reproduction and a failing test, you know.

Just dig in and use all your tentacles to figure out what's going on under the hood that causes this problem, and write a document that explains exactly what the problem is, and don't write solution stuff. It did write solution stuff. So then I added, this is a key point, I added a new step after. I already told you that. So, and each one of these, I'm literally just running claude --dangerously-skip-permissions.

I am writing out a prompt, but I'm actually writing it in a markdown file and then copying and pasting it in, because I'm just building out the thing. But in reality, this will be automatic. Like, you'll write the prompts in markdown files and they will be run in sequence, whatever. So then I have a new, fresh, like, kill that Claude instance, open the next one and say, hey, look at the, you know, problem.md file and strip out any solution talk. Great.
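The "prompts in markdown files, run in sequence, each in a fresh instance" idea might look roughly like the sketch below. It is an assumption-laden illustration: `run_in_sequence` and `claude_runner` are hypothetical names, and `claude_runner` assumes the CLI's non-interactive `-p` mode, which the episode does not mention. The runner is passed in as a function so the loop can be tried without any agent installed.

```python
import subprocess
from pathlib import Path
from typing import Callable

# A runner takes the prompt text and the session directory to run it in.
Runner = Callable[[str, Path], None]


def claude_runner(prompt: str, cwd: Path) -> None:
    """Run one prompt in a brand-new process (assumes non-interactive -p mode)."""
    subprocess.run(
        ["claude", "-p", prompt, "--dangerously-skip-permissions"],
        cwd=cwd,
        check=True,
    )


def run_in_sequence(prompt_files: list[Path], session_dir: Path, run: Runner) -> list[str]:
    """Feed each prompt file to its own fresh session; return the names run, in order."""
    done = []
    for prompt_file in prompt_files:
        # Nothing carries over between steps except the files left on disk.
        run(prompt_file.read_text(), session_dir)
        done.append(prompt_file.name)
    return done
```

Because every step starts a new process, the only shared state is whatever earlier steps wrote into the session directory, which is the fresh-context property the workflow depends on.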

Solution Brainstorming and Decision Making

Um, so then I keep going, and then what was the next one? 'Cause I'm just gonna explain this workflow to you, because I think it was really good. Um, so let me even see where we were. cd in here? Yeah. Okay.

Yeah, so the first one was reproduced, the second was a failing test, the third one was create that problem, the fourth one was strip the solution from that problem. The fifth one was write down all the potential solutions that you can think of. And this is one where I think I went wrong because I asked it to do

multiple things. Well, I asked it to do one thing and I explained: think of a bunch of solutions and write them down. I gave it examples of, like, upstream solutions, lazy, low-effort but still correct solutions, um, high-effort, pure, correct, robust solutions, things like that. Um, and then I asked it to pick two solutions: one that would be a low-effort patch that would work, and one that would be, if we had all the time in the world to make the world as perfect as possible, what would that be? And then make a decision based off... well, I didn't ask it to make a decision. I don't think I... yeah, I just asked it to list out all the potential solutions. So that worked, but I am skeptical here if I broke the rule of OTAT, where I should have said, um, write out one potential solution to this problem.

And then put it in a loop and had it do it maybe ten times, or tell it, like, if you feel like everything's already been covered here, then don't do anything. Um, but otherwise, like, yeah. Because I think if the LLM is trying to think of too many... I think this is the exact type of thing I'm trying to avoid. I should have put this in a loop so that it created one potential solution at a time.

The tricky part is, when does that loop end, you know? Um, but I think that would have been an important modification of this. So now I have a file with all the potential solutions. Then I go to number six, which is question. So it's like, look at, and I tell it which context to look at. So I go, like, look at the problem, look at the potential solutions, and I want you to come up with a list of questions that, if answered, would make this decision really, really easy.

So this could be performance questions, it could be breaking change questions, could be community demand questions, whatever. Use GitHub, go out, look for things, pull down code from other repositories, write tests, run tests. Um but don't don't actually answer these questions. Just write down the question. So that was done. And then I have another phase called answer. And this one I did put in a loop. So I said, you know, this prompt is like open up the questions.md file, pick one question.

Get to the bottom of the answer empirically, like actually run code to dis to determine this answer, and then write your answer in there and then kill yourself. And then we go and we do it again and again and again until there's no more left. Okay. Then, so I look at all of these uh these question and answers in this file, but some of them are still hinting at solutions that like, oh, well, because we learned this, we know that this is the correct solution.
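The answer loop described here (one fresh session per question, repeated until none are left) could be sketched as follows. The file layout is an assumption made purely for illustration: each question is a `## ` heading followed by a placeholder line `Answer: TODO`, and the names `next_open_question` and `answer_loop` are hypothetical. The `max_rounds` cap is one possible answer to the "when does that loop end" worry: stop when nothing is open, or after a fixed number of tries.

```python
from pathlib import Path
from typing import Callable, Optional

UNANSWERED = "Answer: TODO"  # hypothetical placeholder marking an open question


def next_open_question(text: str) -> Optional[str]:
    """Return the first question whose answer is still the placeholder, if any."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("## ") and i + 1 < len(lines) and lines[i + 1] == UNANSWERED:
            return line[3:]
    return None


def answer_loop(
    questions_file: Path,
    run_session: Callable[[Path], None],
    max_rounds: int = 50,
) -> int:
    """Start one fresh session per open question; return how many sessions ran."""
    rounds = 0
    while rounds < max_rounds and next_open_question(questions_file.read_text()) is not None:
        run_session(questions_file)  # each session answers exactly one question, then exits
        rounds += 1
    return rounds
```

The same loop shape would fit the "one potential solution at a time" variant, with the stop condition being a session that decides everything is already covered and changes nothing.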

So I have another strip solution step. So number eight is strip solution language from all of the answers. Okay, so then number nine is evaluate the the right uh evaluate the right answer. So look at all the potential solutions, look at everything we learned from the questions and answers, and now tell me the low effort solution and the high effort solution and then what you think we should do and come up with a single decision.

So then it does and it it was good. And so it's like, okay, now we need to go and implement this.

AI Implementation and Critique Process

But we can do more, we can gather more knowledge before we even implement it. So I have another step called layup. This is where it's like, okay, look at the proposed solution. Now gather every bit of knowledge you can to lay up the implementation so that it's just like

Really, really easy, you know, for the next... or is it alley-oop? What's the difference? Let's get to the bottom of this here quick. Alley-oop: an alley-oop is a high-flying basketball play where one player lobs the ball near the rim and the teammate jumps to catch and dunk it. Now a layup. I know what a layup is, but, like, people use it, let's say, layup metaphor. The layup metaphor commonly signifies an easy, guaranteed, or high-percentage opportunity for success.

Um, so I'm using them wrong. So it shouldn't be the layup. It should be alley-oop, also known as the alley-oop metaphor. Um, fine. We're alley-ooping the implementation: gather, look at other repositories, look at everything for, like, algorithms that we're gonna need, test cases we're gonna need to write, things that can help us.

extra context that can help us. Just do anything you can to like have this list of things available so that when somebody be be right there with your pile of things so that when the surgeon actually doing the fix says scalpel, you just hand it. Right. So then then step eleven is is do the surgery. It's implement. So step eleven is really simple. It's like just using all this context from before.

Go ahead and implement the solution. Then twelve is an important step because it does the whole solution, but There's problems, like there just always are. Like I looked at what it did. It didn't have enough tests for a certain thing. I forgot. There were a few things where I was like, eh. So then step twelve is critique the implementation. So then another step that goes

What is this implementation missing? What code could be cleaned up? What things could be renamed? What things might be left over? You know, all that stuff. It makes a list. Then step thirteen is implement the critique. And then 14 is pull request, where it creates a pull request, writes a description, um, and that's it. Okay, there you go. That's how to create a pull request. And this sounds insane.
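The fourteen steps walked through above can be written down as an ordered list and turned into numbered prompt file names. The short step names below are my own labels for what was described, not names from the episode, and the `NN_name.md` pattern is just one possible convention.

```python
# Hypothetical short names for the fourteen steps, in the order described.
STEPS = [
    "reproduce",                     # 1: reproduce the issue and write a guide
    "failing-test",                  # 2: clone the repo, branch, write a failing test
    "problem",                       # 3: write up the root problem
    "strip-solutions-from-problem",  # 4: remove solution talk from the write-up
    "potential-solutions",           # 5: list potential solutions
    "questions",                     # 6: questions that would make the decision easy
    "answer",                        # 7: answer one question per session, in a loop
    "strip-solutions-from-answers",  # 8: remove solution talk from the answers
    "decide",                        # 9: low-effort vs. high-effort, pick one
    "alley-oop",                     # 10: gather everything the implementer will need
    "implement",                     # 11: do the surgery
    "critique",                      # 12: list what the implementation is missing
    "implement-critique",            # 13: apply the critique
    "pull-request",                  # 14: open the pull request with a description
]


def prompt_filenames(steps: list[str]) -> list[str]:
    """Number the steps from 1 and zero-pad so plain alphabetical sorting keeps the order."""
    return [f"{index:02d}_{name}.md" for index, name in enumerate(steps, start=1)]


if __name__ == "__main__":
    for filename in prompt_filenames(STEPS):
        print(filename)
```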

And this is crazy amounts of tokens and context and stuff for a single pull request. But this is honestly the only way that I'm gonna feel confident in these AIs. And so in this case, it did produce something that was good. And when I had AI just one-shot it, that was bad. So I have one data point of evidence that says this is the way to handle a problem in your application.

Deep Dive: Alpine X-Data Bug

A lot of times, again, it's all in, like, how complicated your system is. Of course, there might be simple problems, or a bug here and there where it's an obvious fix. But some systems have deep, deep, deep systemic problems, where changes to them have unbelievable ramifications and they have to be thought out really, really deeply. This actual one, just for fun, is x-data in Alpine, where you say, like, count equals one or whatever. If Livewire, morphdom, or, um, HTMX, a lot of HTMX people hit this, if you try to change that in your HTML to, like, count two, you know, just in the HTML, it does change the element on the page, but it doesn't change the actual data under the hood. So if you have something else reacting to count, it's not gonna update to count two because you changed the attribute. And so that's sort of a bug, 'cause we do recognize attribute changes in other circumstances, but you can imagine the ramifications here.

The one-shot solution was like, oh, we'll just tear down x-data and init the whole tree back up when it changes. It's like, no, that's a bad solution. And I have a lot of reasons why, because this other process I did explored that and found all sorts of problems with it.

Designing the OTAT Workflow Tool

Um, the actual solution is very different. Okay. So how do we turn this into a tool? Because I think this is what I want. I think I want a tool where a core part of this is this series of markdown files. So I think that's maybe what it is: you have workflows, and all they are is a series of markdown files, right? And they start with numbers. So, like, number one, underscore, something, right?

Yeah. Okay. Yep. And this is in a workflow called, you know, in this case, review PR. Okay. So when this workflow is run, when you run this with our script or whatever our tool is, so we say, like, run workflow, I don't know, run.sh, I want it to create an empty directory, cd into it, and loop these prompts with Claude. Right in the folder, right?
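One detail hidden in "they start with numbers" is ordering: sorted as plain text, `10_...` comes before `2_...`. A minimal sketch of discovering the prompts in a workflow folder and ordering them by their numeric prefix, assuming the `number_name.md` convention described above (`ordered_prompts` is a hypothetical helper name):

```python
import re
from pathlib import Path


def ordered_prompts(workflow_dir: Path) -> list[Path]:
    """Return the numbered .md prompts in workflow_dir, sorted by their numeric prefix."""
    numbered = []
    for path in workflow_dir.glob("*.md"):
        match = re.match(r"(\d+)_", path.name)
        if match:  # files without a numeric prefix are not treated as steps
            numbered.append((int(match.group(1)), path.name, path))
    return [path for _, _, path in sorted(numbered)]
```

Comparing the prefix as an integer means `2_failing_test.md` runs before `10_alley_oop.md` even without zero-padding.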

That's that's it, right? Is it as simple as that? How is this novel? Um, I don't know. And maybe you could have these folders of prompt lists. Um Hmm. Yeah, I'm just thinking. I'm trying to think here. So maybe you have this folder of prompt lists.

Anywhere you want, and you can run this. It could even be a global shell script, and it will always just create that temporary directory and all that stuff. I really like all of it being in a throwaway directory and not, like, doing it in the current directory that you're working in. Um, maybe I won't like that eventually, but it just feels really good to be able to just, like, spawn a task.

Um, so yeah. I don't know. I mean, I guess that's the simplest version, what I just described. But there's, like, some management involved, you know? Like, you want... what do you want? You want, like, feedback, you know. You wanna know if this thing's stalled. You wanna have timeouts, you know, for these scripts, so that it doesn't just go on forever. Um, you want timeouts and you want some feedback of, like, how it's doing with these steps.
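Timeouts and basic feedback per step can be sketched with Python's standard `subprocess` module. The three status strings and the name `run_step` are illustrative choices, not part of any existing tool; the command is whatever launches one step.

```python
import subprocess
import sys


def run_step(command: list[str], timeout_seconds: float) -> str:
    """Run one step; return 'ok', 'failed', or 'timeout'."""
    try:
        # subprocess.run kills the child process if the timeout expires.
        result = subprocess.run(command, timeout=timeout_seconds, capture_output=True)
    except subprocess.TimeoutExpired:
        return "timeout"
    return "ok" if result.returncode == 0 else "failed"


if __name__ == "__main__":
    # Stand-in for an agent call: a tiny Python child process.
    print("step 1:", run_step([sys.executable, "-c", "pass"], timeout_seconds=30))
```

A stalled step then shows up as "timeout" instead of hanging the whole workflow, and the caller can decide whether to retry or stop.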

And then I think you might also want, you know, parallelization so that some of these steps can be done in parallel. And I don't know the best way to do that. There's like I could think of a few ways to do it. Are we doing good so far?

Tool Architecture and Interface Concepts

Is that good so far? Yeah, it feels pretty good. It does feel pretty good, I would say, right? I don't know, where do you store these things? Are they in one global folder called, like, workflows? And I would have workflows for, like, every different project that I have. Um, or are they in, you know, in repositories? So, like, does Livewire have a .workflows or something, or prompts, maybe it's just a prompts directory, and then each one is a folder. That, yeah. I mean, maybe...

Maybe it could be, if you point it at a file, you could point it at a single markdown file and it will create the folder and do its thing. Um, I think that is one really important piece, the cre... I don't... it seems like I'm stuck on it, but I kinda am. It's like, it has to create a folder. Um, because I want it to have its own... I don't want it to be influenced by the rest of the world, and I want it to be able to just, like, oh, use npm and install some package and scour through its code base, or create an entire Laravel app inside of that folder that can reproduce a bug, and symlink even another directory from a sibling folder or something, you know?

Um, there's a question of logs, so, like, having, you know, output files of, like, the logs of these things. So maybe the directory that gets created has some of this. Okay. So you call run.sh and you point it to one of these workflow folders. So run.sh, workflow directory. Okay. And that's pretty much it. You just hit enter, then it creates a temporary directory. It cds into it, and it has a script where it's looping through these Claude sessions. It's piping output into logs.

Like, it's streaming the JSON into log files so you can tail it, you know, and see that it's still working and going fine. Um, and then, yeah. Is there anything more than that? I don't know. I don't think so. Right? I think that's kind of the gist. Um, yeah. I don't know. It'd be nice, in a perfect world, there's, like, some sort of interface where we can spawn many of them at the same time and it can manage all of them at the same time. Um, yeah. Which does feel pretty good.
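Piping each session's output into a log file that can be tailed might be sketched like this. `run_logged` is a hypothetical helper; the command is left to the caller. With Claude Code the command could be something like `claude -p "<prompt>" --output-format stream-json --verbose`, but treat those flags as an assumption to check against the installed version.

```python
import subprocess
import sys
from pathlib import Path


def run_logged(command: list[str], cwd: Path, log_file: Path) -> int:
    """Run command in cwd, appending stdout and stderr to log_file as they are produced."""
    with log_file.open("ab") as log:
        # The child writes straight to the file, so `tail -f` shows progress live.
        process = subprocess.Popen(command, cwd=cwd, stdout=log, stderr=subprocess.STDOUT)
        return process.wait()


if __name__ == "__main__":
    here = Path.cwd()
    # Stand-in for an agent call: a child process that prints one line.
    code = run_logged([sys.executable, "-c", "print('working')"], here, here / "step.log")
    print("exit code:", code)
```

Opening the log in append mode lets several steps share one file, or each step can be given its own file by passing a different `log_file`.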

Yeah. Like is that the dream tool? I don't know. It seems pretty dreamy. Like you run a command and you can just like search through your workflows and execute them, like skills kind of things. Um And then it like you know, has one running and then you can execute another one and maybe you can go into those sessions um

if you want or you can resume certain parts of other sessions or I don't know, you know? Yeah, I don't know. I don't know. Is this enough? Do we just start with this and see what happens?

Advanced Workflow Management and Parallelization

We start with this and see what happens. Maybe that's what we do. One kind of tricky bit about it is like every step, every markdown prompt that I have.

They're very short, because I'm adopting this OTAT principle that I just invented and named OTAT and is definitely gonna catch on. Um, this OTAT principle makes these prompts really simple, but at the beginning I always say, look at, like, this markdown file and this markdown file, and those are specific file names that previous prompts have told the LLM to create and fill in. You know.

So that feels like something that could be systematized a little bit. Like, every prompt has a corresponding output file, you know? And then maybe there could be a coordinating file for each of these workflows. So it's like a workflow.md or something. Yeah. A workflow.md that basically lists out the order of them, when they should be sequenced, when they should be parallel,
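One way to systematize "every prompt has a corresponding output file" is to let each step declare which earlier outputs it reads and have the tool generate the "look at this file" preamble. This is a sketch of that idea only; the `Step` class, its fields, and the wording of the generated lines are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    name: str    # for example "questions"
    prompt: str  # the one thing this step asks for
    reads: list[str] = field(default_factory=list)  # names of earlier steps to read

    @property
    def output_file(self) -> str:
        """Each step writes to a file named after itself."""
        return f"{self.name}.md"


def render_prompt(step: Step) -> str:
    """Wrap the step's prompt with the files to read first and the file to write."""
    lines = [f"Read {name}.md before you start." for name in step.reads]
    lines.append(step.prompt)
    lines.append(f"Write your result to {step.output_file}.")
    return "\n".join(lines)


if __name__ == "__main__":
    step = Step(
        name="questions",
        prompt="List questions that, if answered, would make the decision easy.",
        reads=["problem", "potential-solutions"],
    )
    print(render_prompt(step))
```

With this shape the hand-written prompt stays a single instruction, and the file names stop being something each prompt author has to remember.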

what files they should go into. Maybe it's a chain, you know? That'd be kinda interesting, wouldn't it? Hmm. Hmm. Would it be interesting if it was like pipes, you know? Like you pipe one one thing out into another. Maybe each oh wouldn't that oh okay, I don't know. Every prompt. Outputs the next prompt. That's kind of interesting. I think it's a stupid idea, but it's fun to think about. Like, hmm.

Yeah. I mean, I guess that's kind of the goal, right? But no. Or maybe the workflow file, maybe the workflow file has all of the prompts in it. And that's, like, the coordinator session. Let's just go to coordinator agents for a second. So maybe there's this coordinator agent, okay, that consumes this workflow, and it knows to basically, like, split these out. It knows to generate

Hm. Maybe it wraps each prompt in like some extra decorating stuff so that those prompts output prompts with more context or something. I don't know. It's probably stupid. Um The coordinator thing could be interesting, but it's Like it might just be better to just have a simple series of tasks. Um Yeah. So I don't know. I don't know. The answering the questions being done in parallel and exploring potential solutions being done in parallel would be

Pretty cool. Like, I would really like that. How does that work? So if we're doing this simple kind of route, a loop, it's going one after another. When it gets to something like potential solutions, it would be great if it could spawn them all in parallel and they could each answer their own questions. So is that an agent being spawned and going, break these questions out into X amount of prompts, spawn child sessions?

So is it an agent doing the spawning of the parallel children, or is it the bash script spawning the parallel thing? Yeah. I think it has to be... I think an agent would be best. Um, I don't know. You know what I'm saying? Maybe we can worry about that later. Maybe I get a simple tool going, and I could start with a test workflow that's just, like, say hello. Say hi back.
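For comparison, here is a sketch of the other option raised here, where the script rather than an agent spawns the parallel children. It uses Python's standard `ThreadPoolExecutor`; `run_parallel` is a hypothetical name and `run_session` stands for whatever function launches one child session and returns its result.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_parallel(
    prompts: list[str],
    run_session: Callable[[str], str],
    max_workers: int = 4,
) -> list[str]:
    """Run one child session per prompt concurrently; results keep the prompt order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_session, prompts))


if __name__ == "__main__":
    questions = ["Is it a breaking change?", "Is it slow?"]
    # Stand-in for a real session: just echo the question back.
    print(run_parallel(questions, lambda question: f"answered: {question}"))
```

Threads are enough here because each worker would mostly wait on an external process; `max_workers` caps how many sessions run at once.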

Ask what your name is, you know, and just test it with that and see if I can get it going. Um but yeah, so like what is the like kind of back to the experience of using this thing? Is it just

Finalizing Tool and Future Challenges

Is it a global script and you point it to a workflow directory and that's that? That feels kinda right to me. You know? I feel like that's right. I do And maybe there's you could have like a workflow.md file that has context for every single task that's like you are in this repository, you know. And maybe that's something that gets mutated over time. Oh, maybe would that be interesting if the prompt files, those workflow files that you have, if they get copied?

Into that new temporary folder, and then they modify themselves. So like you pipe the prompt. So, like, you know, the first one, let's say, reproduce the problem. It has its own prompt. And then when it's finished, it writes its own results to its own file. So then something else can review the results of that step. Seems kinda interesting, right? Maybe? I don't know. Is it unnecessary token usage if the prompt is too big? But the prompt's not going to be that big.

Hmm, I kinda like that idea. So it copies all the markdown files and like that's how context can be can build up and then there should be I think one markdown file that's like context that every step shares. Maybe that's what this whole thing is. Maybe each step is building up a single context file that every subsequent step is going to look at. That's interesting. So it's called context.md.

And it just stores like each each prompt is like, okay, do this thing now write to the context file like context.md. Write um, you know, a description of the problem that you found, whatever. And then the next step is like, okay, now write, you know. All the potential solutions maybe? Because some it's like the tricky part with that is some some of these things you might want to give context to the next. Or a one specific thing but not another one. I'm trying to think of like

I don't know. But maybe this is just the place to start. Maybe we just start with this. So again, oh yeah, we gotta come up with something, like, crystal clear to start with. We have a command, and what is this called? Is it called minions OTAT? Is it called... I don't know. Uh, I don't know what it's called. It's called do workflow stupid Toonst. Yeah. I don't know. It's called something. Right now it's called run.sh.

Okay. So you fire this run.sh, you pass it a folder that has a series of markdown files in it that are numbered, and it will go through number by number and abide by those commands. And that'll be in a separate directory. They may write to a shared context file, and then when they're finished, they're finished, I guess. And that's the end of that.
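The shared context file could be as simple as a markdown file that each step appends a section to. A minimal sketch, assuming one `## ` heading per step (the heading format and the name `append_context` are illustrative; only the file name context.md comes from the episode):

```python
from pathlib import Path


def append_context(context_file: Path, step_name: str, body: str) -> None:
    """Append one step's findings to the shared context file under its own heading."""
    with context_file.open("a", encoding="utf-8") as handle:
        handle.write(f"## {step_name}\n\n{body.strip()}\n\n")


if __name__ == "__main__":
    context = Path("context.md")
    append_context(context, "Problem", "Description of the problem goes here.")
    print(context.read_text())
```

Appending rather than rewriting keeps earlier steps' sections intact, though it does not solve the concern about some context being meant for one later step and not another.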

Yeah. I don't know. And then of course we do need a manager and stuff for it. Um like there needs to be some global place, right, that like knows all of the workflows being run, I would think. Maybe maybe it's maybe it just scans the temporary directory for like folders with a certain prefix. Could be interesting. And then maybe they're just given, you know, session numbers. But I don't know. Maybe it's there's a work the workflow has a name. So Name.

So, all right, name and then ID. And that's a folder. And now this process can just scan the directories in that temporary directory to see which ones of those folders exist. And then I suppose Yeah. How would it know the status of one being like Done or not. That's where it's like, yeah, you kinda want like a global, like a management file, like a JSON file, but that type of stuff, yeah. No, I want the source of truth to be like the existence of the folder, you know? Hmm.
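Treating the existence of the folder as the source of truth means the manager only has to scan the temporary directory. A sketch under stated assumptions: run folders are named `otat-<workflow name>-<id>`, where the `otat-` prefix and the helper `list_runs` are hypothetical; the episode only says "a certain prefix", a name, and an ID.

```python
from pathlib import Path

PREFIX = "otat-"  # hypothetical prefix marking folders created by the tool


def list_runs(tmp_root: Path) -> list[tuple[str, str]]:
    """Return (workflow name, run id) for every run folder found under tmp_root."""
    runs = []
    for path in sorted(tmp_root.iterdir()):
        if path.is_dir() and path.name.startswith(PREFIX):
            # Split on the last hyphen so workflow names may contain hyphens.
            name, _, run_id = path.name[len(PREFIX):].rpartition("-")
            if name and run_id:
                runs.append((name, run_id))
    return runs
```

Status (done or not) is deliberately left out, since that question is still open at this point in the episode.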

I don't know. I mean it could change its own folder name when it's done complete or something. Uh I don't know. Maybe I just start with this and see if I can uh this is where I get stuck. Like I'll build this. And then I'll like use it and be like, oh, I mean this is pretty useful, but I won't go the extra mile to like think through these little details that would make it like really, really clean. And I think one of the holes in this.

Questions, right? Decisions become easy when you gather more knowledge. The questions are: how to track workflow runs? Because I don't always want them cleaned up when they're done, you know?

Because I find myself like, okay, now I'll go review its pull request that it submitted. Oh, but I want something changed. So I'll CD into that directory. This is the really nice part about it, is I could CD into that directory, run claude resume, and basically hop back into any session from any one of those runs. And I do like that, so I don't really want them cleaned up until I tell it to clean up or until I like mark it as complete or cleaned up clean upable. Um

Okay, so that's a question. And another one is what? That's how to track a workflow, how to clean one up. Um Parallelization, you know, how to control that. It's like the simpler the better. I don't want to just invent like all of this ceremony around it, but I I feel like I do need some conventions here, you know? For like

Yeah. Which things are parallel and not I'm I might just have to start getting into it, you know? That's what I'm gonna do. I'm gonna go and like build a proof of concept. I'll be seeing ya. Thanks. I I don't know if this was interesting to you at all. I'll be seeing ya. Yebba debba doo. Stop it.

This transcript was generated by Metacast using AI and may contain inaccuracies. Learn more about transcripts.