Welcome to the wild world of trial and error entrepreneurship. It's FounderQuest time! So Josh, are you ready for Christmas? Is your shopping all done? Yeah, I think I got most of it done, actually. I've got like one stop to make, and it's probably the one I should have done first, which is See's Candies. Oh yes. We'll see how that goes. I always save that for like the last week before Christmas. Yeah. But so far, you know, I've always managed to make it work. Awesome.
We had a new See's Candies open up right around the corner from our house, so that's kind of dangerous. Yeah. And so I have all of my packages shipped to our box at the UPS Store. I use that for, you know, Honeybadger stuff, I use it for my other business, and I use it for personal things I don't want going to my house, like Christmas presents. And I thought I had it all perfectly planned out,
all the packages showing up within a couple-day window so I could just make one trip, because it gets crazy at the UPS Store at Christmas time. So I went there yesterday, and somehow I miscounted the number of packages I was expecting. And I think within five minutes of my leaving the store, another package showed up. So I had to go back a second time. That's hilarious timing, though. So you got the email. Okay. And then you went back. Nice.
All right. Well, hopefully that's it for you. Yeah. Yeah. I think I have everything now, but now that you mention it, maybe a trip to the See's store would be a good idea. Yeah, it usually is. Yeah. My mail story is, our mailbox got taken out while I was at RubyConf. We have this ancient, heavy-duty, custom steel welded mailbox on the side of this
fairly busy road. I think it's probably been there since the house was built, in 1965. And I have no idea how it got thrown basically 12 feet down a hill, with the concrete shoe ripped out of the ground, by the way. The best I can tell is a truck hit it and decided to just drive off or something, with a huge dent in the front of it, because that mailbox is enough to destroy most vehicles. So I'm really confused about what happened. But anyway,
we were without any mail delivery for a number of weeks, right as things were starting to ramp up into the holiday season. So I was having to drive out to pick up the mail once a week and walk away with an armful of packages, basically. Right. We finally got it fixed last weekend. Nice. That's pretty impressive. It sounds substantial. Yeah. It's the kind of mailbox that I think someone put in
as revenge, maybe for their previous teenagers taking a bat to the old mailbox. It's the kind of mailbox some old-timer welds himself and puts in there to, yeah, basically destroy anything that runs into it. So I guess it lasted long enough, but it didn't last forever. That's awesome. So did you get like a fancy kind of mailbox, like one supporting a sports team or something like that? Or maybe a big flying fish? A fish would be cool. Um, no.
So there's some planned road construction that's going to affect our property frontage significantly next year, so I opted for the cheapest Lowe's special, or actually I think my handyman bought it off of Amazon. So it's like the cheapest mailbox you can find. But the plan is, my wife wants to reuse the old one. It was sturdy enough that there's no dent, there's not a
scratch on it, except for the rust. So we want to reuse it at some point, but I didn't want to go to the trouble of putting new concrete in. I mean, the thing weighs probably 250 pounds. Yeah. It's currently stored on the side of my shed, and we'll deal with it once the construction's done. And in the meantime, hopefully no teenagers want to pull shenanigans with the cheap mailbox, because it's not going to stand up to anything. That's hilarious.
Well, I'm glad the mailbox survived to tell the tale. Yeah, it'll return. So, I feel like we've both been using AI, or whatever LLM tools, for a while now, like everyone else. But I feel like maybe you've been using them in your code editor, or for code generation, a little bit more than me. Probably because you've been writing a lot more code than I have the past few years, to be honest. I'm stuck over here in marketing land, writing blog posts and stuff.
Not with AI. So I feel like you've been using AI code generation tools a little bit more than I have. I think we've both been using them along with everyone else. But yeah, in fact, Kevin, I think, was the first among us to really start diving into these tools. He was a big fan early on of diving into ChatGPT and asking it for tips about what he was working on. And I didn't love the idea, because early on, when it
first came out, I was like the old man shaking his fist at the cloud. I was like, I don't want AI stuff. But as he kept raving about it, I was like, okay, I guess I should give this a shot. So I hopped into ChatGPT and started asking things, like, I was interested in doing something in Golang.
And Go is not my first language. It's not the one I spend a lot of my time in, but we do have some Go code around here, so from time to time we do some Go projects. And I was curious about making a change to one of our Go projects. I wanted to use an in-memory cache
for doing something to speed up some stuff. Right now we go to Redis a lot for one of our processes, and we wanted to find a way to get that stuff into the process itself instead of having to go to Redis. And I didn't know what the landscape looked like in Go for in-memory caches. Like, I know
if you've got a Rails app, the Rails cache is right there. It's handy. So anyway, I asked ChatGPT, hey, tell me what options there are in Golang for doing an in-memory cache, and it has to support TTLs, time-limited keys.
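(For context, this is the core idea behind the kind of library being discussed: a minimal, hand-rolled sketch of an in-memory cache with per-key TTLs in Go. Real libraries add background cleanup, eviction policies, and sharding; the names here don't refer to any actual package.)

```go
// A minimal in-memory cache with per-key TTLs.
package cache

import (
	"sync"
	"time"
)

type entry struct {
	value     any
	expiresAt time.Time
}

type TTLCache struct {
	mu    sync.RWMutex
	items map[string]entry
}

func New() *TTLCache {
	return &TTLCache{items: make(map[string]entry)}
}

// Set stores a value that expires after ttl.
func (c *TTLCache) Set(key string, value any, ttl time.Duration) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.items[key] = entry{value: value, expiresAt: time.Now().Add(ttl)}
}

// Get returns the value if it is present and not yet expired.
func (c *TTLCache) Get(key string) (any, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	e, ok := c.items[key]
	if !ok || time.Now().After(e.expiresAt) {
		return nil, false
	}
	return e.value, true
}
```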
And it gave me four options, with a little blurb about what each one was and what its strengths and weaknesses were. And then I dove down: okay, well, I'm interested in this kind of scenario, which of these four would be the best one to check out first? And it gave me a recommendation based on, you know, its API and so on. And that was super handy. It saved me a morning's worth of research, because it's like, okay, there are
four great candidates. And then I narrowed it down to one, and I checked out that one and, yeah, I loved it. This will work. It was super helpful for doing that kind of research, but I never really got into actually having it write code right away. I was just asking questions. And then I got the idea: well, if it's got a bunch of
good suggestions, maybe I should just have it write some code for me. So I said, okay, well, I want to do this little task, and I already knew how I would do it.
But it would just be kind of tedious to write this code, just not terribly exciting code. So I asked it to write me some code. I think it was Ruby this time. And it gave me some code, and I'm like, yeah, that's good, I can use that as a starting point. Maybe it was like 90% of the way there, but I did some edits, and yeah, it was kind of cool. It just saved me a lot of time that way. Yeah. So I'm curious, before the code generation, how did you find it compared to
what you used to do? Like, go to Google, do a search, do some research, and go from there. How did it save you time, and how did the experience compare to your old way of doing things? Yeah, I think the best thing was that it showed me
I could give a vague, general description of what I wanted. Like, show me some caching libraries for Go. I knew I wanted that, but I didn't know what was available. And it knows, right? Because it's indexed the entire web or whatever. So I think previously I would have gone to Google and typed in
"go caching libraries," and I would have probably found those same four suggestions, and I would have gone through them one by one: okay, does this one do the thing that I want? How does the API look on this one? Instead, it was able to shorten that research time from, I don't know, an hour or two down to five or ten minutes. Yeah.
Yeah, that makes sense. Yeah, I've found that it can be good for when you don't really know what you're talking about, or what you should be searching for. Because it's basically a pattern matching engine, as far as I know. I think that's how they work: they're trained on other content, and they basically regurgitate it in one way or another. I've even found it a useful starting place for trying to figure out what
specifically I should be searching for if I'm using a traditional search engine or something like that. Because with a super new concept, I don't always know the language, or what terms I should be looking for if I want to get to the deeper side of whatever technology it is. I can use it to jump ahead a little bit
without having to go read a bunch of stuff. In the past, I might have gone and read a whole book, or a bunch of articles, to get myself into the headspace of, okay, now I'm starting to understand these concepts and terms, and now I have the context I need to go out and actually do further research into this. So I've found that in some cases it helps me get a head start on that sort of process.
Yeah, yeah, I've found it really does a good job of surfacing those things you don't know, getting you to that next level of knowledge about a thing. Like, you might ask, okay, what is the deal with distributed systems, right? And it can give you some text, and you're like, oh, I should go research the CAP theorem, or whatever, based on what it gives you back. Yeah.
So you started using it for code generation a little bit more. I think I remember specifically that first time, when you had it basically write some Go tests, or something like that. And it was just a pretty
cool experience to have it knock out some simple boilerplate code like that for you. Yeah, I was early on to the Copilot beta with GitHub, and I remember I was in VS Code one day, and I had just written some stuff in our Rails app, and I had written a test,
because we do a lot of unit testing in our Rails app. So I had written a test, and then the autocomplete suggestion in my editor was three more tests. And they were exactly what I would have written. And so I was like,
tab complete, yes, thank you very much. And they were all legit. And, you know, it's like, great, I just saved myself, I don't know, five minutes of typing. But hey, it's five minutes of typing I didn't have to do, right? And it was just boring stuff, testing this condition and testing that condition.
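(The boilerplate in question looks something like this: a sketch of a Go table-driven test, the kind of pattern that is regular enough for an assistant to plausibly autocomplete each new case. The function and cases are made up for illustration.)

```go
// A sketch of the repetitive test boilerplate an assistant can tab-complete.
// IsValid and its cases are hypothetical, just to show the pattern.
package version

import (
	"regexp"
	"testing"
)

var semverRe = regexp.MustCompile(`^\d+\.\d+\.\d+$`)

// IsValid is a stand-in function under test.
func IsValid(s string) bool { return semverRe.MatchString(s) }

func TestIsValid(t *testing.T) {
	cases := []struct {
		name, input string
		want        bool
	}{
		{"empty string", "", false},
		{"full version", "1.2.3", true},
		{"missing patch", "1.2", false},
		{"non-numeric", "a.b.c", false},
	}
	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			if got := IsValid(tc.input); got != tc.want {
				t.Errorf("IsValid(%q) = %v, want %v", tc.input, got, tc.want)
			}
		})
	}
}
```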
So that, I think, really opened my eyes. Like, hey, this could be kind of cool. I still didn't really trust it much. You know, I still read every line, and I'm like, I don't know. But the thing that's really nice about using the AI tools when it comes to code is that, you know, when you're done,
does it work or not, right? It's provable. Does this thing do what it's supposed to do, or does it not? Versus, I don't know, asking AI why you have some skin condition. I'm not a dermatologist. I have no expertise, right? So I have no idea if it's completely lying to me, making stuff up. Yeah, it can't go reproduce the scientific research, or redo the study, in quite the way that it can actually run the code, or you can run the code,
to verify the results. I think I agree with that for the simple cases. I'll probably get into this a bit later, but in some of the more complex or complicated bits of code, I think that's where it starts to get a little hairy, in terms of getting yourself stuck or into the weeds a little bit.
But the way I look at it is, it can save a lot of time in the same way as snippet generation plugins: closing method definitions, commenting out big blocks of text or code, that sort of thing. These are common patterns that everyone needs to do quickly, and it's easy to build something that does them for you with a keyboard shortcut. LLMs are a little bit fuzzier, but
they can do a lot of the same thing. It's like very smart autocomplete, basically, where if you're writing some Go code, and it's similar to Go code that all the other Go developers have written, and they wrote these tests, then probably you're going to want to write these tests too. And it can fill in some of the specifics, if it's simple enough, to customize that to your specific context,
which is neat. Yeah, I found that the very smart autocomplete was where I found the most benefit for quite a while. I would just tab-complete that, thanks, it was right there. But until recently, I didn't really start a new thing
with AI from scratch, or really ask it to do a whole lot. I wasn't doing the chat-based kind of thing, which has gotten popular more recently with the Cursor editor, where you have this chat interface right in the editor. And I think Copilot added this over time as well, where now you can talk to the LLM in the editor. And I didn't really play with that much until I started seeing people on Bluesky talking about Windsurf,
which is yet another editor that comes along and competes with Cursor, basically built around the notion that you have a conversation with the AI and let it do the driving, basically, while you supervise and accept or reject the code suggestions it makes.
So I was seeing people talk about Windsurf a lot, and I was like, okay, I'll try this out. And I didn't want to try it out in a big existing project. I figured it might, I don't know, just have too much friction with my brain. Like, I already know what I want to do, and here's something doing it for me, kind of thing. So what I
did was, well, I want to deploy this new ZooKeeper cluster to our infrastructure, and I know I want to use Terraform. And I know exactly how I would do this if I was just doing it from scratch, but let me see what the AI can do. So this was my first use case for Windsurf. I opened up a brand-new empty folder, I'm like, all right, and I described what I wanted: a ZooKeeper cluster, and it's got to have an auto scaling group,
blah, blah, blah. And my prompt is, I don't know, two or three paragraphs long, because I know exactly what I want. And it just starts generating files. Okay, here's a variables file. And here's the main file. And here's, I don't know, three or four different files. And I was like, wow, I'm watching it type in the windows of each tab.
And I started reading through it, and I'm like, this is basically what I would have written. So now it's gone from saving me five minutes of typing to saving me an hour of typing, right? Because the Terraform syntax is very
well documented. It knows exactly what it needs to write to give you an auto scaling group, for example, and an EC2 instance, and blah, blah, blah. So I didn't have to go, oh, what was the syntax for this particular thing that I haven't typed in a month or whatever? It just did it all. And now I've got like five
files in my editor, five open tabs. Yes, accept all that. That's awesome. And of course, then I went on from there: oh, now I want this, and now I want that. It's this really cool interactive experience. It's pretty fun. Yeah. Yeah. That's really cool. You got me checking out Windsurf recently too, and I've been experimenting with it, and I had some similar results. I noticed,
the way you describe that, it reminds me a little bit of the old way to do it. I would have gone and searched for examples of what I wanted to do, and maybe, if it was specific enough, I would have landed on a starter someone had made, some kind of project starter. You used to have Rails Kits, for example, which were basically a starter kit for
doing a specific thing that a lot of people want to do. So I might have found a starter repo that had some examples of, what do you name the variables file, and where should it go, and all these basic things that you will know once you're familiar with Terraform or whatever you're doing. Or maybe you already do know, but it's just a lot of work to go and set all that up.
Again, the LLM seems to be doing that on the fly, but also customizing it to the specific things you prompt it with. And after I had that Terraform success, I was like, okay, well, that was a blank folder. Let me try an actual existing configuration, and maybe add something to it. And speaking of the context that you mentioned, it can pull in context from other files in the project, ones you reference specifically or that are just there, and then it starts creating this new stuff
that kind of matches your old stuff, using existing variables and things like that. It just blew my mind. Yeah. The thing that really impressed me about Windsurf, and I think Cursor does this as well, they both can do multi-file edits and that sort of thing, but Windsurf can also do other operations in the workspace, and it uses additional tools to inform its workflows. So as part of the steps it generates, if you ask it to do something really complicated,
it might go and run a terminal command to grep for some files that it thinks might exist, and then use the output of that command to figure out where the files are that it needs to go and read, to make an edit, or to include in its extra context or something. It can do additional operations, like exploring your actual workspace, to inform the actual code changes it's trying to make for you,
which I thought was interesting. I don't know, do you know if it can actually go out and read documentation on the internet? Or is it all local? I don't think it can actually go fetch a URL, can it? Well, actually, so we had the hack week that we often do near the end of the year, where we just
pick a topic or some kind of fun code we want to write. And this year, Kevin and I decided we wanted to do a hack week where we built a command line interface for Honeybadger. Because, like, you can use curl to send a deploy notification to our API, and we provide a CLI with the honeybadger gem, but we don't have a CLI for every language. We don't have one for the Go library or the Python library. So we thought, well, what if we built one
in Go, and then we could deploy that to every client, right? Because you could just grab the binary. And so I decided to try out Windsurf to start off this project. And I told it what I wanted. I started with the deploy command, and I said, based on the API endpoint documented at the URL for our documentation for deploys, give me a Go command line
tool that will record a deployment to that API. And it actually went and fetched, as far as I can tell, it went and fetched that page and saw what variables it needed to send, what the API key name was, the header name. And it looked at what the URL structure was for deploys. And then it figured it out, right? Okay, I need to create this kind of payload, and I'm expecting this kind of response back. And it basically worked. It was pretty wild. That's pretty cool.
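(A rough sketch of what a deploy subcommand like that might look like in Go, using Cobra, the CLI library that comes up below. The endpoint URL, header name, payload fields, and flag names are placeholders for illustration, not Honeybadger's actual API.)

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"os"

	"github.com/spf13/cobra"
)

func main() {
	var apiKey, revision, env string

	deployCmd := &cobra.Command{
		Use:   "deploy",
		Short: "Record a deployment",
		RunE: func(cmd *cobra.Command, args []string) error {
			// Build the JSON payload (field names are hypothetical).
			payload, err := json.Marshal(map[string]string{
				"revision":    revision,
				"environment": env,
			})
			if err != nil {
				return err
			}
			req, err := http.NewRequest(http.MethodPost,
				"https://api.example.com/v1/deploys", // placeholder endpoint
				bytes.NewReader(payload))
			if err != nil {
				return err
			}
			req.Header.Set("Content-Type", "application/json")
			req.Header.Set("X-API-Key", apiKey) // placeholder header name
			resp, err := http.DefaultClient.Do(req)
			if err != nil {
				return err
			}
			defer resp.Body.Close()
			if resp.StatusCode >= 300 {
				return fmt.Errorf("deploy failed: %s", resp.Status)
			}
			fmt.Println("deploy recorded")
			return nil
		},
	}
	deployCmd.Flags().StringVar(&apiKey, "api-key", "", "project API key")
	deployCmd.Flags().StringVar(&revision, "revision", "", "VCS revision being deployed")
	deployCmd.Flags().StringVar(&env, "environment", "production", "deploy environment")

	rootCmd := &cobra.Command{Use: "hb"}
	rootCmd.AddCommand(deployCmd)
	if err := rootCmd.Execute(); err != nil {
		os.Exit(1)
	}
}
```

(Something you would invoke as, say, hb deploy --api-key KEY --revision abc123.)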
I'm curious now, because it's been implied to me in the past, when I've been using it, that it had gone and checked something. I don't know if you're familiar with Perplexity. It's like a search engine mixed with ChatGPT, basically, where you give it a prompt, and it improves your search query based on your prompt, then it goes and runs the search queries across Google and probably some other search engines, brings the results back,
summarizes them, and then answers your query based on that. And aside from all the ethical and information-economy problems, what a disaster it is for content creators and all that, I have found it pretty useful. Basically, again, it's doing what I would do if I had to go and read the whole first page of Google results, which,
let's be honest, is mostly AI slop in the first place. So, basically, figuring out which results have some sort of information that's useful to me, and then pulling that out. But I wonder if these tools can do something along those lines, where it goes and actually does the Googling for you in the background, and pulls the information you're looking for directly into the context of what you're doing. That's an interesting idea.
Obviously fraught with potential problems. But yeah, I've never seen Windsurf justify what it's doing. Like, it could have given me the Terraform documentation URLs for the things it was doing in that project, but it never did. I don't know. Yeah. So that's what I was getting at. What I didn't mention is that Perplexity includes the sources. That's the key part. So when it makes a statement, like some sort of, I mean,
I hate to say fact-based claim, but it is. You know, let's be honest, people are using this to look up facts, which is a terrible idea, by the way. And even if you use it as a shorthand, you need to go and verify the receipts yourself. But
Perplexity at least gives you inline receipts, links out to where it's getting the information, where you can click through, and then you can actually make sure that, at least for the page it's reading, it's accurately summarizing that page. Which, by the way, might also be total bullshit, but you'd need to go several levels deep to verify what you're getting.
But it does kind of let you back into that, versus having to do the entire process yourself. So, Windsurf doesn't give me that. It might say, okay, I'm going to go look this up, I'm going to check the docs, but, you know,
I have very low trust in these systems, and I know they will tell you they're reading the docs when they're not, when they're just using whatever their training data is. So that's why I'm a little skeptical still. I guess I need to go look at Windsurf's docs a little bit more or something, and see what it's actually doing, or capable of.
Yeah. Well, that is a concern for me. Where are these code things coming from? Where are these ideas it's getting? For example, back to that Go CLI. I have not written a CLI with Go before, and so I've never done command line parsing in Go.
And so I have no idea, right? But it just did it, because I told it what arguments I wanted to give. And so it understood that it needed to parse command line arguments, and it brought in the Cobra dependency, and it brought in the Viper dependency, which I'd never heard of before. But these are apparently popular in the Go world, because Kevin
was reviewing my code, well, not my code, it was mostly Claude's code, right, because I was using the Claude engine. But anyway, he's reviewing the code, and he said, oh yeah, I like that it chose to use Cobra and Viper, because, I think he said, the GitHub CLI uses those libraries as well. And so
that immediately made me wonder: okay, so where was Claude copying this code from? Was it copying it from the GitHub CLI? Are there copyright issues? Yeah, I've wondered the same thing before. Because, obviously, it's not also copying the license. Right. Exactly. Yeah. When I did go to slap the MIT license on the new repo I built, I'm like, well, is this really copyrighted by me? Yeah.
I think that's still an open question. There are a lot of legal things that are still being decided, as far as I know, in terms of, yeah, how that works. So yeah, open question. But the coolness factor, you just can't deny it. After probably 30 minutes, I went from zero to having a working CLI that I was able to put out on GitHub, and it had tests, because
I asked it to write tests, and so it wrote them. And of course, it didn't get everything right the first time. It got some things wrong. And you talked about how Windsurf can actually run other commands. It'll prompt you to run the Go tests, right, as soon as it writes them. And now, with a new update, you can even make it run the tests automatically.
You can say, hey, that's a safe command, you can just go ahead and run that. And what it'll do when it runs the tests, if they fail, it actually interprets, looks at the output,
and tries to figure out why the tests failed. And then it's like, oh, I see, I messed up this. And it goes and edits the test, and then it runs the tests again, until they pass. So it's that loop that you just don't have to do, because it's doing it for you. Literally, there was one time where I set it on a task, and I just
tabbed away and went and did something else, because I knew it was going to take a couple of minutes for it to figure it out. I came back, and it had figured it out, right? It's just wild. So when it figures something like that out, what's your next step? Do you just commit and be like, yep, call it a day? Heck yeah. But there was a time or two when it got into this doom loop, where it couldn't figure it out, and it would go to edit the test, and the test would fail, and it'd go and try
a different edit, and, you know, you can tell it's just spinning its wheels. And so I'm like, all right, stop, let me get the keyboard back from you here for a second. Let me do it. It makes me feel like I'm working with a junior developer. I'm sitting side by side, and that junior developer is at the keyboard, and I'm telling him, okay, go do this.
He types it, or does the research or whatever, and comes back to me with something where it's like, okay, that kind of works, but the design maybe isn't the best, maybe you should do this. And I tell that to Windsurf, and it's like, oh, good idea. You know, it's kind of conversant like that. And then it comes back with something better.
So yeah, it's awesome. Yeah. I've noticed that the simpler the task, the better it is at getting it right, to the point where I can quickly, like you said, check the work like a junior developer's, maybe provide some minor feedback, and have it go do it. It really does cut that loop down, and obviously cuts a person out of the loop as well, which we might talk about at some point. But I think I'm a little bit more skeptical of more complicated
tasks, you know, when you start to get out of the realm of a junior developer. I think that's when you start to get into trouble, potentially. Especially if I'm not a subject matter expert. If it's in a Ruby or Rails app, I can work at a much higher level with these tools, I feel, because I'm going to know a lot quicker if it's getting it horribly wrong, or if it's really knocking it out of the park. But if it's in something that I'm unfamiliar with,
what it does looks really good. And if you don't have the expertise to know whether it's actually good or not, I think it's very easy to have it just do a bunch of things, trusting it to the level of, this thing actually knows what it's doing. Which, by the way, it doesn't. It's just doing what
it thinks other people would do, based on whatever corpus of information it's trained on. So it's very good at being confident, we know that. And so if you're not in a position to do the code review, if you're not the person you would assign to review an actual person writing that code, you probably shouldn't be code reviewing your AI assistant. That's how I look at it.
But I think there's a risk of getting into the weeds, where people start to believe these things actually know something that they don't, or know more than them. And they really don't. They really need that review step. And as I've tried using Windsurf and some other tools, I've used Cursor and some of the others recently,
I think my one complaint is that my day-to-day feels much more like just reviewing junior developer pull requests. Basically, that's the job when you let these things drive, which isn't extremely fun to me on its own. So I'm still thinking through that part of it. But where I really appreciate them is when they're assistive, when they make me faster at what I'm doing.
And I think there are actual cases where that's true, where you can actually get into kind of a flow, where these things are actually making me move faster, accelerating my ability to code. That feels great. It's a lot of fun at that point, when you don't have to think about the boilerplate that would take a couple of minutes to type out, and you can just be in the flow of actually writing the logic
of the application or something like that. Yeah, I totally get what you're saying. It can suck the joy out of the development work, because if all you're doing is reviewing what the AI is doing, then that can be boring. But like you just said, for the boring parts, it is fantastic. After I leveled up in my trust of Windsurf,
and oh, by the way, I signed up for a paid account within two hours after I did that Terraform stuff, because it was so freaking cool. Anyway, I ran out of my AI credits. But recently, this past week, I decided to trust it a little bit more and use it for a little project inside our main Rails app that I've been working on. We're doing a pricing update, which should come out in a few weeks, and part of the pricing update is making some changes in our UI
where you actually pick your plan. And I'm not a fan of working with UI stuff. I'm a backend kind of person. I don't really enjoy writing JavaScript, and I don't love spending my time in HTML and CSS and that stuff. I can do it, but I don't like it. And so I had Windsurf do the UI for me. And the first step was, hey, I've got to
update this page to look like what our designer gave us. So I literally put in a screenshot (this was ChatGPT, actually, a couple of weeks ago when I did this part) of what our designer did, and I'm like, okay, give me a Bootstrap version of this thing.
And it did. It gave me the HTML, and I'm like, bam, drop that in the project. And then this week, with Windsurf, I'm like, okay, I've got to make this real now. First it was just a mock-up, and then it was HTML. Now I've got to actually make the select boxes change the amounts and things like that. And so
I asked Windsurf to do it. I'm like, all right, when I change these values, I want... and then I gave it some YAML in the project repo. I'm like, here's my new pricing, just make it work. And obviously the prompt was longer than that, but it did a fairly good job of generating a bunch of JavaScript and the right HTML structure. And in a session of about an hour or so, we worked through all the little corner cases.
Things that I had forgotten about, like, oh, okay, I see why you built that, but it's not quite right, because it should be like this. And at one point, I even changed my data structure, because it had written some code in Ruby that was shoving the JSON into the view. And it had written three or so methods in Ruby to transform what I had in YAML into something that
the JavaScript it had written could use. And I'm like, I know this is bogus. Let's just change the YAML, right, so the JavaScript can use it directly. And so it did that for me. I'm like, this is awesome. So it wasn't just a junior developer at that point. It was almost like a co-developer. But I had deep knowledge of this particular domain, our app, and so I could intelligently understand what it was doing and, yeah, prompt it better.
It's also a great reason to use a boring framework like Bootstrap, because there are so many Bootstrap examples and starter kits and themes out there that these things, I'm sure, have been trained on. So they are
actually very good at generating things like Bootstrap, or, like you said, Terraform configs, config languages, all that kind of stuff. Anything that has an abundance of examples, it's going to be better at, I think, because there's just more input for it.
I know Kevin's going to hate this, because he's a bit more of a fan of bespoke front-end CSS and all that. But the more you use established frameworks, which obviously draw boxes around what you can do, the better these tools are able to assist you with these mundane tasks. And I don't know if that's a good thing or a bad thing, to be honest, but I guess we're all going to find out.
But as you said, Mike, if there are a lot of good examples out there, it's great. And I think that's one reason why it's really good at generating Go code: it all looks the same. There's typically one way to do something in Go, right, versus Ruby, which is, you know, more open-ended. And Go has a very large standard library, and a lot of people just use that, so there aren't as many dependencies. I mean, there are dependencies, but I think people try to limit them, which I'm sure helps. Yep.
But you found that maybe with Elixir, doing some interesting stuff there, maybe it's not as great of an assistant. Well, I was going to say, I think the key to using these tools is to know when to use them, to know when the task is actually achievable by an LLM and when it's not. And I think the junior developer test is probably the right way to think about it. But it's easy to get yourself into trouble and waste a bunch of time if you don't think about that up front.
So I've found myself, as I'm using these things, because they are so legitimately impressive, wanting to ask it everything. And sometimes it even gets it right and saves you a bunch of time. But I've found that the more complex the thing I'm trying to get it to do, the more likely it is to take me down a rabbit hole and waste time. The other day, I was updating our Elixir package for Honeybadger, which is our client library package, and
I noticed that we weren't testing on the latest version of Elixir, which is 1.17. We were a few versions behind in our CI. So I created a PR to just
bump the GitHub Actions versions that we were testing against. And of course, that surfaced a couple of test failures. Our tests were apparently not passing on the latest version of Elixir, which is not great by itself, but I was compelled to go and investigate. I haven't touched Elixir in probably a year or two, so I'm a little outdated in my Elixir knowledge. So since I was testing Windsurf, I was like, okay, well, let's see what it can do
to solve this problem. So I went and looked at the tests. The two tests that were failing were in our logger implementation, which basically listens for various events in Elixir and Erlang, pulls out some data, and reports it to Honeybadger. So if there's an error in the log, for example, it'll report the error to Honeybadger with some metadata included. And the test failure was that, like, Elixir has
pattern matching: if you pass an argument to a specific function, it will only match that function if the argument matches the actual structure of the data. So you can actually declare what the shape of the argument should look like for this function to be a valid match. And it was failing on Elixir 1.17, but it was passing on Elixir 1.15, or whatever the older version was. So I copied the test output, and I just basically asked, it was Claude
that Windsurf was using, I asked it, why does this test pass on Elixir 1.15 but fail on 1.17? And so it went and did its analysis, and it found the test files. I had basically just run windsurf in the terminal to open the workspace, and asked it. So it finds the test file, it finds the source file, it reads them both. It spoke very confidently about the differences between Elixir 1.15 and 1.17,
and kind of explained a little bit of the issue: 1.17 changed the structure of the data that these functions are handling. And then it went and actually suggested a change, and made the change. It didn't run the tests for me, but it gave me the diff to review, and I accepted it.
It fixed the test. And I'm coming at this fresh. I even had to go ask regular Claude to remind me what some of the data types in Elixir are, mind you, because if I'm not working with something daily, I just basically let that information go. So I'm pretty amazed at this point that it actually fixed this very complicated-looking test, because to me, it's like something changed in the core of the language
between these events, because these are core Elixir events that we're relying on. It was an error logger. It's the metadata in the error logger events that are coming from, I think, GenServer, one of the Elixir BEAM things. And so to me, that's a very deep issue. And so I was like, cool. And I think I sent the screenshot to you on Slack, like, oh my God, look at what Windsurf did. So this turned into a pretty long conversation
with Claude in the Windsurf chat panel. And I was like, wouldn't it be cool if I could basically copy and paste this entire thing, for context, into a commit message? Because if I'm actually going to use any of this code, I want to have a trail of what the decision process was, in case it's wrong and we find out we made a horrible mistake.
So I was like, well, maybe I could just ask it to summarize this conversation and write a commit message. And so it did. It wrote like a three-paragraph summary of the conversation, which was accurate. It accurately summarized the conversation we had.
So I ended up committing that, by the way. I actually like that pattern of having it summarize what it just did; I think that's useful. But I would never just commit it verbatim as a commit message; I think it's always important to say when something was generated, because your future self will thank you for that, I have a feeling. But anyway, I got to the point of looking at it as an actual PR, supporting
Elixir 1.17, basically. But something just didn't feel right. And as I was reviewing more, spending more time just looking at the changes this thing had actually made, I started to wrap my head around the problem more. Because I hadn't started with the actual debugging that would have led to the solution; I'd just let this thing do the thinking for me. And then I had to back myself into
understanding what the problem was. And because this was a particularly complicated problem, it took me a while. But as I'm looking at it, I'm like, this fix is basically loosening the pattern matching constraints, being more accepting in the data that can, basically, activate this function. So of course that's going to cause fewer inconsistencies
across small changes in the data structure, right, because it's going to accept broader types of events. But, you know, was there a reason we were being so strict in the first place? And the more I looked at that, the more I realized. I went back and read through the transcript of what it was saying it was doing, and that sort of thing. As it was confidently narrating, it even said, I'm going to check the Elixir release notes
and verify what the difference was. But then, mysteriously, it didn't actually say what the difference was. It just went right into the fix, which was just loosening the pattern matching constraint. And I realized this thing was basically regurgitating the prompt that I initially gave it. I told it there was a difference between these two versions, and I told it where to look. And obviously it can figure out, oh, if I am more accepting in the types of data this function takes,
well, you know, I can actually fix this. But it did not do the job of debugging the root cause, and it did not actually know what the bug was,
because it doesn't have the ability to actually do that. It doesn't know anything by itself. So at that point, I'm like, okay, I'm going to revert all of this, and I'm going to start from the beginning. And I went and did some good old-fashioned puts debugging, where I actually inspected what the actual events were, the data that was being passed into this function.
I then diffed that data between Elixir 1.15 and 1.17, and I realized that there was an extra list item added to one of the lists that we were pattern matching on. So the fix was really just that we needed some additional... I forget what it's called in Elixir now.
I should probably go ask Claude, but you can have multiple functions with the same name that match on different shapes. I think it's function clauses. So basically, I needed to add two more function clauses that support the event for Elixir 1.17, while keeping the ones that still match for 1.15. So the fix was actually much more specific, and much simpler,
to be honest. And on top of that, I was able to document it: okay, this is the diff, this is what changed. I still want to go and do a little more research to figure out what actually changed that added this list item. I might need to go ask someone with much more knowledge of the release notes
to figure out what was added, because I know they made changes to their error handling and some of their error logging, so I assume it's in there. But it didn't say, we added this specific element to this specific list in this event. But that said, I was able to document, this is what it looks like on this version, this is what it looks like on the new version, and commit that to the
project, basically, so that in the future, if anyone comes back to this, it's very clear what changed, what our solution was, and what the reasoning was. And, yeah, if we had gone with the Claude version, we would have ended up with very subtle bugs, things that would have just broken, I think. And it would have been very hard to trace back if we hadn't explicitly said this was written by an AI. For a future developer, you know,
not only did it waste my time, but I suspect it would have wasted hours of time for someone else coming back to it in the future. So I think that's a cautionary tale of how it can go wrong when you use these tools
for a problem that they're just not suited to solve. Yeah, I think it goes back to your point about how it can get to the point where all you're doing is reviewing PRs, right? That's where you caught this. It's like, okay, this looks good as a PR, and then as you're reviewing the PR, wait a minute, I'm not so sure about this. And you use your senior-level expertise: okay, there's something that's not right here, let me go and dig in a bit more. And, oh, yep, I can see
how this would have introduced some subtle bugs. And yes, it's a fix, but it's a problematic fix, and here's why. Yeah. Yeah. It's as if you had a junior developer go and maybe spend days working on the problem and still come back with an uninformed fix. It seems reasonable: oh, I'll just make the constraints wider, and I can fix it that way.
But yeah, that was not the answer in this case. And all the issues, the list of reasons you would come up with for why that's a bad idea, were left out. And in my case, I wouldn't have put a junior developer on this issue in the first place, because I know I need someone who understands the context and nuance of this specific library, and who also has that senior developer gut
feeling of, is this a good idea or not, and should I spend days going down this path in the first place? So it condensed that a little bit for me. But I would never have done that in the first place, so it ended up wasting probably an hour of my time, when I should have realized this is not a good fit for this tool, and I should have started with the traditional
debugging and used my senior developer skills. And maybe I can use the LLM to assist me in remembering what a tuple is or something like that, because I know what it is, I've just stashed it away somewhere, and I need my memory refreshed. Yeah. So if the question is, is AI going to replace all programmers, the answer today is no, right? No.
You would not want to replace all your senior developers with junior developers and let them just commit whatever seems like a good idea at the time, right? There's still a benefit to having some expertise behind the wheel and reviewing things. And over time, maybe they'll get better. Maybe they won't. I think we're in the early days
of the AI world, especially with LLMs, and we don't know exactly where they're going to end up. It might turn out that this is it, right? This is as good as this particular approach is going to get, and we'll have to try a different thing with AI. Or maybe the opposite: maybe it'll be super advanced in six months. Who knows? But
I think for now, you can't say, well, I'm a non-technical founder, and I want the AI to build me a SaaS app, right? No, you're not going to have a good time with that. You're going to have some problems. But by the same token, if you are a senior developer and you're not
investigating these tools, you're probably cutting yourself off from some benefits you could have. You could gain some time back in your life, for sure, having the AI help you with things that are lower level and just don't require a whole lot of brain work. Yeah. Yeah. I think understanding how they work and knowing when to use them is the key. And I will say, I think I'm generally a little bit more skeptical than you are when it comes to these tools. But, I don't know, just
based on what I understand of how they actually work, I'm skeptical that it's going to get much better. From what I've seen, and especially using this Windsurf example, I have seen the tooling improve drastically. Like, six months ago or whatever, I wasn't even aware that there was multi-file editing. And I know I've been late to the game, I know Cursor existed longer than that, but I do feel like people have been discovering the multi-file editing and
more advanced workflow stuff recently. And while, yes, it is cool, I don't see the underlying technology advancing at quite the same rate as it was in the beginning. I mean, obviously, I think Claude has gotten better. But from the start, I think it's always been junior-level problems that these things have been useful for.
The way the technology works just seems like it's limited to that kind of use case, where if you don't actually know enough to review the work that it does, you're going to end up with problems. Maybe I'll be wrong in the future, but I can't imagine a future where that flips. So far, it hasn't flipped. People keep saying it's going to flip.
But yeah, if anything, I think there's a big risk of people starting to rely on this, starting to think that they can trust the expert output of these things, without having the ability to review it or verify it,
because, I think, we could have fixed this bug, the Elixir bug, that way, and then we could have fixed the next one, and the next one, and the next one, and we would have ended up with a library that is just a bunch of guesses at various things, with no history, no shared knowledge that anyone on the team has. Whereas if you
put someone on it, and they went and worked on that issue for a week or something, even if it's a junior developer, at least that junior developer exists and has that context in their head now, and can contribute it back to the team in the future.
So far, with these things, maybe their memories will increase a little bit, and that would be useful. But, you know, you lose all of that decision-making process. Not that you had it in the first place, because these things are black boxes. You really don't know what it's doing, and no one knows what it's doing. And so that's why, the way I see it, you need to be a level above it, at least, at all times. Yeah. Agreed.
But I definitely think that people should check them out if you haven't yet. And you don't even have to switch editors now. Just this week, GitHub announced that Copilot has been updated. Now it's free to everyone, so you can use it, you don't even have to pay. And VS Code
now has the ability to edit multiple files and stuff, like these new editors have done. So I don't know if GitHub will end up kneecapping Cursor and Windsurf, or if they'll do something else that's interesting. But yeah, definitely check it out and have some fun with it. Throw a new project
at it and see how it does. Play with it for a bit. You might become like me and say, hey, I'm going to use it for all my stuff. Or you may become like Josh and be like, I'm never going to use that again, because it sucks. Yeah. And I'd love to hear from some junior developers who are using these tools, if they are. Because I think that's still an open question for me. I think these tools could be useful, because everyone's at a different level of learning. So
there are junior developers out there; you've got to start somewhere. I think these tools could be useful to junior developers in helping them learn faster as well. We just need to be teaching, from the start, when to use them. And as long as people are using them at the right time and not trusting them too much, I think they could be a useful tool. So I guess if there are any juniors listening out there, I'd love to hear what your experiences are
with AI. And maybe we can talk about that more in the future. Sounds cool. Cool. Well, this has been a good chat, a long chat. Anyway, this has been FounderQuest. You can find us at FounderQuestPodcast.com. And yeah, let us know what you think of AI. FounderQuest is a weekly podcast by the founders of Honeybadger: zero-instrumentation, 360-degree coverage of errors, outages, and service degradations for your web apps.
If you have a web app, you need it. Available at honeybadger.io. Want more from the founders? Go to founderquestpodcast.com. That's one word, where you can access our huge back catalog of episodes. FounderQuest is available on iTunes, Spotify, and other purveyors of fine podcasts. We'll see you next week.