So one of the things about vibe coding is that we only talked about day one, not the implications and the complexity that come after. We could do a vibe coding part two, which is just us talking about brownfield versus greenfield. What's your goal? What's the difference between doing vibe coding as a hobby versus vibe coding for the enterprise versus vibe coding for a startup? Let's do it right now.
episodes about how much time I spent not paying attention to you. That's true. Turnabout's fair play. So in our last episode, we vibe coded without realizing it, right? And this is the idea that one can write code with AI and not really worry about the results or the correctness, but rather about whether it works. We wanted to make a fractal shader for Windows Terminal with minimal, if not zero, experience with any of those things. And we achieved it in less than 15 minutes.
It's fun. How long do you think it would have taken you to do that in the real world? It would have taken me four to six hours, and I would have ended up spending eight. I think you're probably right. It would have taken me about that same amount of time. Because I would need to... first, I would chop up that problem. So the problem was: make a fractal shader using the HLSL language, and then figure out how to do it in Windows Terminal. HLSL is the high-level shader language that you use to write programmable shaders in DirectX. It's a proprietary language, but it's very well known. There's also GLSL. Gamer people, people who write games and stuff like that, make shaders all day; they could just type that stuff from scratch. But we wanted to do a fractal, so I would go and find some fractal-generating math,
translate it into HLSL, or find an existing chunk. Then there was the Windows Terminal scaffolding that needed to go around it: Windows Terminal-specific #defines and stuff like that. And Windows Terminal doesn't give you a lot of debugging options. It just goes, nope, that failed to compile, and it gives you a line number. So then I would have probably built a test harness or found some DirectX test harness. It would have been tedious.
Were you surprised how fast we were able to get that working? No. In fact, that's why I suggested just giving it the file and telling it to do this. I thought there was an 80% chance it would work the first time. Did you think it needed the previous file? We worked from one that had no content, and the one we started from was not anything close to what we wanted. Maybe not, but I thought the more information you can give it to get it on track, the more likely you are to succeed.
So we could have said, just go create one from scratch. I think I would have estimated the chance of success lower; not zero, but not the 80 to 90 percent that we got. And I actually expected it to get the fractal design wrong. Or I half expected it to just show a static fractal, not a moving, shifting one, or to use a fixed set of colors instead of the dynamic colors it ended up using.
It actually achieved everything that I would have hoped for in one shot. And to be clear, our four to six hours of work would have been like: okay, I've got a basic fractal. Now let me make it zoom in and out and pan. How do you do that? Now how do you make it change colors? I imagine that's the process I would have gone through.
Now, we edited it just a little bit, because I picked the wrong model before. You said pick Claude 3.7, and I picked Claude 3.7 Thinking, and it did not work on the first try. Why do you think that was? I think... actually, I'm not sure. Because most of the model makers tell you that the thinking models are better at coding. So theoretically, it should have done a better job.
And maybe we just got unlucky on that one and lucky on the non-thinking prompt, because that's another aspect of this: it's completely non-deterministic. Who's to say that when we happened on the non-thinking prompt and it achieved everything we wanted 100% the first time, that wasn't a 10% likelihood given that same prompt? If we ran it nine more times, it might fail in some way on every one of those, or not do everything it did in that one. You can't know. And that's actually one of the aspects of this. What some people say when you vibe code is: ask the model multiple times to do it,
you know, get a comparison, and pick the one that vibes the most. And then token limits are also a consideration, right? That's true. We might have hit a GitHub or Claude model token limit. And actually, this leads us into a discussion of what vibe coding is good for and what it's not.
Because vibe coding works when you're giving it a greenfield, like you said: hey, let's just ask it without giving it the file. But a lot of coding for real projects isn't starting from scratch. It's taking this existing thing and updating it, or being inspired by it like we did when we gave it the file. But what if the project is more complicated?
Suppose we were working with some shader we wanted to tweak, and it had 10 files, each of 1,000 lines of code, because it was really complicated, and we wanted to tweak parts of it. Giving it the whole thing and saying, go tweak these parts of it: A, you probably won't be able to fit it all in the context.
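A quick way to see that scale problem is to count tokens. Here is a minimal sketch using the tiktoken tokenizer; the encoding name and the representative line of shader code are assumptions for illustration, not anything from our actual project:

```python
# Rough token count for a hypothetical 10-file, 1,000-lines-each project.
import tiktoken

# cl100k_base is the encoding used by several OpenAI models; pick the
# one that matches whatever model you're actually targeting.
enc = tiktoken.get_encoding("cl100k_base")

one_file = "    float4 color = lerp(colorA, colorB, t); // representative line\n" * 1000
project = one_file * 10  # ten files of ~1,000 lines each

print(f"~{len(enc.encode(project)):,} tokens")
# Easily past 100,000 tokens before you add the instructions, the
# conversation history, and room for the model's own output.
```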
So what about this? Meta in particular really loves their big contexts, and they're saying, yeah, a million tokens of context, we can go this far, we can go this big. Is that a solution? Are unlimited context windows the solution? Do you just feed your entire brownfield enterprise app into a giant context window? I don't think that's the answer either, because these models have limitations with large contexts. We talked about this in our Responsible AI talk.
There's this needle-in-the-haystack problem, which is: can the model really reason over these very large contexts? And by reason over, in the context of code, I mean: can it look at the code and understand that these lines are the ones relevant to what it's being asked to do, versus those lines over there? Especially working over a large project: these functions are related to each other in indirect ways, and to modify this top-level function, I need to go four levels deep down this path, modify this thing, and return a parameter all the way back up. That kind of stuff requires really understanding the full context.
There's reasoning, and things you need to be aware of when you're using those models for reasoning. We talked about the Eureka benchmarks that Microsoft has come up with, which stress the capabilities of frontier models. The benchmarks are generally designed to not be fully achievable, or so that the complexity can be increased to test where the limits of the models are. And one of them is a needle-in-a-haystack type test, where you give the model a big chunk of text.
There are two pieces of information that are related to each other, and they can be far apart from each other, or you can place them close together. Like: Mary was sitting in her living room. Mary's living room had a marble floor. Those are the two pieces of information. If you put them in a large context of other text, and in the middle of it you place those two together and ask, what was the floor Mary was sitting on made of?
even the simplest models that can handle the context will say marble; Mary was standing on a marble floor, because the two facts are close to each other. But separate them, put one at the beginning and one at the end of a very large context... That was my question. And now it's got to connect them to answer: what was the floor that Mary was standing on made of?
And even these frontier models will oftentimes fail at context window sizes that are much, much smaller than a million tokens or 128,000 tokens. Just at 32K tokens, or even 8K tokens, they can fail at that test. And if they can fail at that test in language, they'll fail at that test in coding as well. Right, right, right.
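Here is a minimal sketch of running that marble-floor test yourself, assuming the openai v1 Python client; the model name and the amount of filler are placeholders you'd tune to stress a given context size:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

FACT_A = "Mary was sitting in her living room."
FACT_B = "Mary's living room had a marble floor."
QUESTION = "What was the floor Mary was sitting on made of?"

def haystack_prompt(filler_paragraphs: int) -> str:
    # Generic filler pushes the two related facts far apart:
    # one fact at the very beginning, one at the very end.
    filler = "\n\n".join(
        f"Paragraph {i}: nothing relevant to Mary happens here."
        for i in range(filler_paragraphs)
    )
    return f"{FACT_A}\n\n{filler}\n\n{FACT_B}\n\n{QUESTION}"

resp = client.chat.completions.create(
    model="gpt-4o",  # placeholder; swap in the model under test
    messages=[{"role": "user", "content": haystack_prompt(2000)}],
)
print(resp.choices[0].message.content)  # a pass mentions "marble"
```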
I remember there was a lot about the distance between the tokens, but isn't there also a recency bias towards things that appear later in the context window? There was a paper that came out, I think last year, called Lost in the Middle. It was specifically talking about these long-context needle-in-the-haystack tests, and what they found is that models tend to ignore information in the middle of the context; they don't explicitly attend to it the way they do information at the beginning or the end of the context. Interesting. Okay. One of the things I really liked about the way we were doing it, and I think it's a feature of GitHub Copilot, was that it was starting to call out pieces of text that it found that it attributed to other humans.
And it said, this was on Stack Overflow in this question, and that was in this chunk of stuff over at GitHub, and then it showed me the license. And we were fortunate in our vibe coding that we found licenses that were compatible with our goal.
Early criticisms of Copilot had claimed that it was out there in the world randomly grabbing stuff, treating anything it could see as fair game... Well, actually, we debunked that right away. We did. It's only public repos with appropriate licenses. It's only public repos, but the license still matters. Yeah, and it's licenses that permit copying like this. Exactly. So we found code that allowed us to achieve our goal from a couple of different folks, and we were able to ask Claude to generate a license, excuse me, a README file that attributed those people, said, hey, good job,
and then linked to the place where it found the code. And that made me feel better about... Yeah. I mean, just from a, hey, people are getting credit where it's due. And maybe you can also go look and see the context of this code if you're digging deeper. I want to mention, too, in addition to debunking the myth that Microsoft and GitHub were just going and grabbing any old code, regardless of license or whether it's public or private:
Microsoft will indemnify anybody who gets involved in a copyright dispute over content that was generated by GitHub Copilot. Yeah, I've had GitHub Copilot a couple of times start writing and then go, nope, this looks too much like existing code.
It doesn't want to copy-paste. Yeah. Right. It'll synthesize. So let me ask you this. We asked how we could have done this before: I would have looked up the math and looked up the HLSL. There's a middle ground between learning it completely (and I don't know if I want to burn a Saturday learning HLSL) and vibe coding, which achieved it in 15 minutes. That middle ground is Franken-coding,
which is Googling around, opening up 42 tabs, and pasting it all together. We might have been able to do it in two or three hours, instead of four to six or less than 15 minutes, but we still wouldn't fully understand how it worked. And then when we were done, we would close the browser triumphantly, throwing the 42 tabs into the abyss, and we also would have achieved the same goal. How often do you do that kind of coding? A lot. And actually, that's Stack Overflow coding, right? Yeah, yeah.
Even at companies like Microsoft, it's just like, oh, I found this on Stack Overflow. Slap. Exactly. Directly, directly into production. Yeah. So that's not vibe coding. What is that? I called it Franken-coding. You've got to do better than that. No, man. All right, all right, I'll come up with something. Actually, I'll have AI come up with a better name. Are you going to be the one that invents that? Yeah.
I mean, maybe it's just called Stack Overflow coding. Everybody knows what that means. Yeah, Stack Overflow coding is probably it. It's not as cool as vibe coding, but yes. Okay, so then... vibe coding has a place when you want to get things done, when you want to prototype quickly. I don't think I would want the FAA vibe coding. I don't think I want enterprises vibe coding. No. In fact, enterprises probably don't want their coders vibe coding either,
as much as they want the productivity benefits of it. And like I said, we talked about some of these things in one of our podcasts a couple of episodes ago. Yep, the systems-thinking one. Yep. But let's just start with some of the basics here. What we did was vibe code a V1. That's a good point. And it was a relatively simple project, A, so it stayed within the complexity limits of current AI.
We created a V1, and for real projects, like enterprise projects, it's not just a V1. It's a V1 that needs to be iterated into V1.1, V1.2, V2.0 as requirements change. And maybe the specification gets clearer, the requirements get clearer, as you start to use the V1. Actually, in an enterprise this would have been a V-point-whatever, a V2 or whatever.
And then at that point, you go to the AI and say, well, actually, the fractals need to be like this, and you need to move them at this rate and this speed, and this is how they need to zoom in and out, and when the user says something, stop it, freeze it, you know, things like that. I'm just making up enhancements here. But the models have been shown to struggle with brownfield: taking existing code bases and knowing how to modify them. In my own personal experience,
again, ignoring the scale problems we talked about, assuming it fits in the context: I use as much AI-assisted coding as possible, not purely vibe coding, but AI-assisted coding. And if you give it a big chunk of code and say, I need to change the code so that it does these things, meaning modifications or enhancements, many times it just loses track of state, and part of the code is updated, part of it's not. It forgets to update a function.
Or it actually refactors something and leaves out functionality that's unrelated but still important, and so I end up with a function that's missing things that were there before. And you end up in this loop with the model: wait a minute, it's broken now. Oh, I know what happened. I think everybody that does AI coding knows the: oh, I see, I'm sorry, and here is the fix. Then you're like, oh, okay, awesome, it fixed it. And you drop that in, run it, and it's not fixed. It says that it fixed the thing you pointed out, and it literally did not. And then you're like, wait a minute, you didn't fix it. And it's like, oh, I see what happened, you're right. It's so funny. I wish I understood what it is about the model that makes it go: oh, right, you're right, I should have seen that, I'm so silly. Yeah. And then you end up in this frustrating loop. But the point I'm trying to make is that you get down this vibe coding route and
you end up in a situation where the model just can't figure out how to do it. As much as you prompt it, as much as you tell it, hey, you didn't look at it, no, it's not fixed, here's what's going on, it says, oh, I get it, and it tries, and it thinks it's fixing it, and it's not. Yeah. At that point, what do you do? This isn't a hobby project in this scenario. What's funny is that
you've used the thing to generate greenfield stuff, and it instantly becomes brownfield. You've inherited a crappy code base from a mediocre programmer. That's right, exactly. And now you're responsible for maintaining and upgrading it, and nobody gives a crap that you used AI and now don't understand it. Your job is to go deliver feature X, and you'd better figure out how to do it. At some point, that might require you to actually go spend that four to six hours to learn,
whatever, HCL, HLS... yes, whatever. HLSL. There was a statistic Google came out with recently that I thought was not a good statistic; it was something like: 25% of our code is AI-generated. And I'm like, okay, well, first, lines of code is a lie; lines of code is just a number. But also, does that mean committed, in-production code? Or does that just mean code that's up for review?
There's so much to be said in that. Well, there's also, take a look at the Copilot autocompletes. Yeah, I don't count those as code I wrote, because I didn't... Yeah, but that's one thing you can measure: how much code is getting generated versus how much is being accepted. Yeah. Or, I mean, even if you count what's accepted, the autocompletes are like, oh, it completed this for loop that's obvious. Right, right, right. That's assisted code. That's not purely, hey, it wrote the whole thing. When you say it's generated by AI, people imagine, oh, I just told it, create this cool program with all these bells and whistles, and the AI generated 25% of it. No, it did it incrementally, character by character.
I do wonder how far away we are from that scene I love so much in Star Trek IV, when Scotty was like, computer... computer! You know what I mean? Well, again, we talked about this one. I don't think we're anywhere close to that because of the context limits. And by the way, when it comes to real enterprise code, you know, this vibe coding thing... Yeah. High-level sh... high-level shader language. High-level shader language. Yeah. Say it five times fast. Exactly.
For that, we wouldn't have cared (it didn't do this, but we wouldn't have cared) if it imported some packages and said, I'm going to import these packages to help me do the fractals. Yeah, we'd have been like, cool, import them. But if you're in an enterprise, you do care about what packages it's importing. Yep. Because those packages could have vulnerabilities in them.
100%. Or it could be the monstrous, you know, I-do-fractals-and-50-billion-other-things package. Yep. And you really just don't want to carry all that baggage with you, when there's another package that just does fractals and does them awesome. So the AI is making decisions like that, decisions that you're now accountable for.
And then the other one, which I've experienced even with ChatGPT and OpenAI, relates to training cutoffs and how familiar the model is with package and API versions. You say, hey, go use this API, the OpenAI chat API, to call a model, and it ends up using a version of the OpenAI API that's last year's and has been deprecated. And it's because, A, the model either doesn't know about the new API, because it was trained prior to the new API showing up,
or it's been trained on so many examples of the V1 API that it says, oh, I'm going to use V1 because I know V1 really well, when it should be using V2, because that's the new API that has the security features, but it doesn't know V2 that well, or doesn't know it at all.
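A concrete sketch of that kind of version drift, using the OpenAI Python library as the example; the model name is a placeholder:

```python
# The pre-1.0 openai Python library used a call that no longer exists:
#
#   import openai
#   openai.ChatCompletion.create(model=..., messages=[...])
#
# The current (v1+) client looks like this instead:
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
resp = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[{"role": "user", "content": "Say hello."}],
)
print(resp.choices[0].message.content)

# A model trained mostly on pre-1.0 examples will happily emit the
# deprecated form above, which fails against the current library.
```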
Well, that's context. We talk about context windows, but we don't talk about the context of working at an enterprise for 20 years, knowing where all the bodies are buried and why this API did what it did. The model can only see the text that you feed it, not the prior art and the campfire stories that are told in large enterprises. So anyway, I believe that, given the architectures of these models, and given the limitations on context windows, which are inherent in transformer models, unless something dramatically changes... And even then, you've got to worry about: how do I make sure the model knows the current versions of the APIs,
the correct versions of the APIs, which packages I'm allowed to import, which ones my company has approved, versus using some random open source thing that I'm now accountable for. All of those things, to me, mean that AI is not going to completely take over, and you need people that know what they're doing to own and maintain the code bases. So maybe there's less expert programming,
because, you know, the line-of-business web front-end app can be more guardrailed, but for more complex projects, you've got to see the big picture. But I think agents are also going to make things easier. If I had, for example, an agent where the success metric of that fractal thing could be the agent looking at it, I could let it iterate and go to lunch. Yeah, that'll help too. That'll help a lot.
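A minimal sketch of that go-to-lunch loop; the generate_shader and evaluate_shader helpers here are hypothetical stubs standing in for an LLM call and a compile-and-judge step, not any real API:

```python
def generate_shader(goal: str, feedback: str) -> str:
    # Stand-in for an LLM call that takes the goal plus prior feedback.
    raise NotImplementedError("wire up your model call here")

def evaluate_shader(candidate: str) -> tuple[bool, str]:
    # Stand-in for compiling the shader and scoring the rendered output
    # against whatever success metric you've defined.
    raise NotImplementedError("wire up your success metric here")

def vibe_agent_loop(goal: str, max_iterations: int = 10) -> str | None:
    feedback = ""
    for _ in range(max_iterations):
        candidate = generate_shader(goal, feedback)
        ok, feedback = evaluate_shader(candidate)
        if ok:
            return candidate  # success metric met; lunch was worth it
    return None  # never converged; a human has to take over
```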
There's one other thing I didn't mention: specification. You and I had some idea in our heads of what we thought we wanted. Some idea. Yeah, we just said, hey, let's see what it does. Oh, that looks good. That was our specification, which was mostly vibes. Yeah. And you know that when an enterprise comes and says, we want an app that does X, Y, and Z,
you need to fully spec the thing, or at least spec it like 80%, and then 20% is going to emerge as you start to write it. AI doesn't alleviate you of having to give it the spec, which is work by itself, and it doesn't alleviate you from having that 20% of the spec you didn't articulate become
brownfield requirements on top of what the AI generates as V1 or whatever. There's a whole bunch of complexity that I think people just wave their hands over: oh, AI will just fix it. I need to understand what the path there is, and I just don't see one that goes from, hey, make the enterprise app that does FUBAR, to out comes this complex app with
the right packages and the right API versions and good performance and good security, an app that can be evolved and maintained without humans who understand it. It's just not going to happen. Yeah. I'm finding that the altitude changes so rapidly I'm getting the bends. Going from low-level stuff to high-level stuff, low-level stuff, high-level stuff: that's very, very challenging.
Cool. All right. That said, what would you say is the percentage of your code that you have AI generate for you? 25, 30%. Yeah. The part that I'm getting better at is the refactoring. I was thinking about this with the HLSL. I wouldn't just say generically, refactor this and make it better. I would say, pull out these four variables, because I want to make those user-changeable. So I wouldn't ask it for a four-step process; I would give it the four steps.
Because with four steps, I know I want to get from point A to point D through points B and C. Why would you do that? It could be that I'm not willing to let go of the steering wheel yet. It could be that I don't trust it, because I've asked it to do stuff that was unclear before. Both myself and David Fowler give ridiculously long prompts. Fowler is well known for giving five-paragraph prompts to get a one-paragraph answer, and I'm like, dude, you could just type it, and he's like, no.
That works for him, and I'm the same way; my coding prompts are very verbose. My coding prompts are short. Well, most of my coding is not enterprise coding. Yeah, I do a lot of web apps. I think you're poking around in different kinds of apps than I am. Python and AI, yeah. Well, like that chatbot you all did recently with the temperature thing for me. Yeah, that was largely written by AI. Right, vibe coding. Yeah. Actually, we're going to show that one, I think,
next episode or the one after. Yeah, let's do a whole episode on that one, because that's my favorite demo, and the fact that they removed temperature and log probs from OpenAI's completion stuff kind of ruined my favorite go-to demo for teaching AI, and now I can do it with yours. Yeah. Cool. All right. Well, thanks a lot. This is vibe coding in practice, in production. Production vibe coding. Implications of vibe coding. Implications of vibe coding. We'll figure it out.
Cool. Once again, if you are one of the three people who made it this far into the depths of the podcast, you can help us by sharing the show and reviewing it. It cannot be overstated how much we appreciate your comments, particularly from folks who comment on YouTube; you're very kind. So please do like and subscribe. Smash that bell so that Mark Russinovich can become the Mr. Beast of vibe coding.