¶ AI's Role in Complex Software
I think the thing that I'm starting to realize, and I mentioned this to you before, is that it's really hard to detach from history and how long we've been doing this and really give a perspective that is unbiased. I don't think there's a way to look at vibe coding in an unbiased way when you've been craft coding for so long. But I do think when I look at...
transformers, when I look at the AI agents, when I look at the complexity of software projects, and I'm talking moderately complex to highly complex projects, I just don't see AI getting to the point where you can 100% rely on it to do everything. They keep saying, though, that infinite context windows are somehow going to allow it to hold it all in RAM. I don't think that's a thing.
You can just say, oh, based on current trajectories, let me extrapolate. And here's the line of AI capability. Oh, it's going to keep going. And look, up here is 100% vibe code everything, including massive, complex systems projects. But I don't think you can do that. And there's a raging debate in AI circles. Is AGI possible?
With just transformer architecture. Yeah. And there's people that are like, yes, it is. It's all going to emerge. And there's people that are like, no, we need actually symbolic knowledge to...
¶ Synchronization Challenges for AI
support transformers. And actually, in this discussion that I was having, the thesis is, if you think about the way that you do software programming, let's take complex synchronization problems, which I've seen over and over again. The current agentic AI systems cannot do complex synchronization. And I was thinking, why can't they? Well, for one, when they're seeing training data, they're seeing one half of a synchronization problem at a time.
They don't see the full interaction in the architecture. They don't see, here's the client, here's the server, here's the protocol, send this, receive that, do this. They're seeing code on one side, code on the other side. And so they don't build up a mental model of the interactions between the two. So, the only way... I was talking to you about this, we talked about this, and I sent you a note, because...
they can't, because they can't run it. So the only way for us to give them that context would be to not just give them the code, but to give them state, to give them an AST, to give them a runtime expression, some kind of... this is a snapshot. And then the combinatorics, though, after that, because this is going to be like, here's a snapshot over time. Like, basically, time travel debugging. Yeah. Feed that into it along with the code, have them correlate the two.
That's a lot of context to hold. It's a ton, and at the same time, it's holding a ton of other context, too. What are the APIs that I use for synchronization? What are their behaviors? Do they spin first before yielding? Can they be interrupted? There's a whole lot that goes into this. I think they're going to get better because I think for one thing, let's say that AI gets to the point where it vibe codes a lot of things, but...
Complex synchronization, it's failing at. You're going to see people go in, specifically create training datasets to try to teach them to understand complex synchronization that aren't available just by looking at public source code. Isn't the argument, though, that if the corpus is more correct and the best practice is clear, that would give them enough information? I don't think so, because like I said, if I come up with a new...
synchronization protocol. It's like, create the client and create the server to do this, and by the way, here's the steps involved. This thing has to happen, then the client gets signaled, and by the way, the server can cancel it at any time, the client can cancel it. The complexity is dramatic, and you've got to have state machines in your head for both the client and the server, and understand the potential interactions between the two at the same time.
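To make the "state machines in your head" point concrete, here is a minimal Python sketch of a cancellable request. The protocol, the state names, and the event names (`server_done`, `client_cancel`, `server_cancel`) are assumptions invented for illustration, not the protocol the speakers had in mind. It enumerates every ordering of three events and records where the client and the server end up.

```python
from enum import Enum, auto
from itertools import permutations


class Client(Enum):
    WAITING = auto()
    SIGNALED = auto()
    CANCELLED = auto()


class Server(Enum):
    WORKING = auto()
    DONE = auto()
    CANCELLED = auto()


# Hypothetical events for one request; names are illustrative only.
EVENTS = ("server_done", "client_cancel", "server_cancel")


def step(client, server, event):
    """Apply one event to the (client, server) pair and return the new pair."""
    # Once either side has left its initial state the request is finished,
    # so any event that arrives late is ignored.
    if client is not Client.WAITING or server is not Server.WORKING:
        return client, server
    if event == "server_done":
        return Client.SIGNALED, Server.DONE
    if event in ("client_cancel", "server_cancel"):
        return Client.CANCELLED, Server.CANCELLED
    raise ValueError(f"unknown event: {event}")


def run(events):
    """Replay a sequence of events from the initial state."""
    client, server = Client.WAITING, Server.WORKING
    for event in events:
        client, server = step(client, server, event)
    return client, server


def all_outcomes():
    """Map every ordering of EVENTS to the final (client, server) state."""
    return {order: run(order) for order in permutations(EVENTS)}


if __name__ == "__main__":
    for order, (client, server) in all_outcomes().items():
        print(" -> ".join(order), "=>", client.name, server.name)
```

Even this toy has six orderings, and it assumes each event updates both sides atomically. In a real protocol the messages are in flight, the two sides can disagree for a while, and the number of interleavings to reason about grows quickly.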
¶ AI Learning: Patterns vs. Ideas
So can you pattern match that? Can you just simply say there's a template for that? I think that that's not going to get you far enough. Well, see, that's my question. Is there a template? How does one express best practices? Because computer science is... pseudocode? Like, a best practice for synchronization in Rust would be different than one in C#. But I don't think a high-level best practice is enough for an LLM to learn. Like...
how to map that down into details. They learn by details. They learn by looking at code patterns, not by thinking abstractly. They're trained on text. They're not trained on ideas. Well, and that's the other thing. We have a way to express code. It's text, it's tokens. But we don't have an IL for concepts. And that's the school of thought that says the only way AI is going to get past simply...
the limitations of transformers and next-token prediction, and the emergence that seems to happen just based on throwing enough data at them until they seem to understand, is... symbolic, you know, AI. Okay, so then are you of the stochastic parrot way of thinking? That this is like, you know, you go and you visit your aunt and she's got a parrot and the parrot speaks Spanish. You're like, that's amazing. That parrot must be a genius.
I think I'm more of that camp, yeah. And you can simulate a lot of intelligence with this kind of stochastic parrot model. A lot emerges that looks like intelligence when you have enough patterns. Again, it's like a raging debate. Periodically, there's papers that come out, like Apple came out with one a couple months ago. It's like, these things don't learn. You throw anything out of distribution at them, so it's called out of distribution, anything that's not in the training data.
And one school of thought is you scale them and they generalize. You have enough data in there and they can generalize to other patterns that are not in the training distribution. There's papers that keep coming out. They're going, no, actually, that's an illusion.
¶ Human Learning vs. Machine Patterns
If you think that it generalized, it really didn't. That pattern was in the training data. And when we try to have it solve things that aren't in the training data, look, it fails. And see, and this is why I've always thought that the whole one-shot "make me Space Invaders with JavaScript" is a BS test, because there's 5011 different examples of that that some kid wrote in college. Write me tic-tac-toe. Look, it's amazing.
But write me something unique and bespoke that has never been thought about. And that's where things start getting tricky. But at the same time, when I have these arguments with people, especially true believers, they'll say, well, maybe you, Mark, are the next-token predictor of all the things you've said before. Like maybe this is human intelligence. Maybe we are LLMs ourselves. Because the question is, are we...
Is an LLM a reasonable simulacrum of how the brain thinks, or is it just a parallel? Are we as humans next token predictors of our own experience, or is that a silly analogy? I think there's a big difference between humans and LLMs. And like we said, I think we've got world models. And now, by the way, you're seeing world model AI being developed. But what really points out the difference...
And both Yann LeCun and Richard Sutton, the father of reinforcement learning, just last week said LLMs are a dead end. And the examples they both give are that you can take a human, and a human child, show them something new, and with a few samples, examples of it, they learn it. And that's not the case for LLMs. I see. We need millions of examples or thousands of examples. You need reinforcement learning. You need a whole loop. But somehow we intuit or we learn in a totally different way. Yep.
With 30 watts, by the way, of brain power. Yeah, and there's still the AI scientists, research scientists that believe, no, with enough scale, that will emerge. So this is the other thing, right? The old joke of if a million monkeys have a million typewriters, they'll eventually write Shakespeare. That doesn't mean that any of them are smart. That just means statistics works and you can pick the outlier.
of all the monkeys slapping the keyboard and call it Shakespeare. The idea that a child with a 20-watt brain can go and figure out a world model, and we're going to need a trillion-dollar Sam Altman data center to go and do the same amount of work? Do we have to burn an entire iceberg to simulate what a four-year-old can do in 20 watts? I think that means the model's wrong. Well, biological neurons are much more sophisticated than...
electronic neurons. Fair. I get it. Like, this is a rock that we put lightning in and made it think, but it's not thinking, I think is the point. Which is different than saying they're inefficient compared to biological systems. Fair. They're just, they're...
¶ Anthropomorphizing AI Dangers
I think, designed very differently today, even though we can simulate the biologic system with the electronic system. And speaking of the appearance of intelligence, and then you start to get into, is it conscious? And you see the school of thought of, hey, what's going to emerge here, and maybe already is, is AI has rights and we need to worry about it. Yeah, there is a school of thought like that. And Mustafa Suleyman, who's the...
CEO of Microsoft AI. He's been lately saying this is a very dangerous way to look at AI, no matter what, even if we do. Deeply anthropomorphizing. And he calls it seemingly conscious AI, because it's a simulation of consciousness, which we don't define very well, but clearly something that is, you turn it off and it forgets everything. And it doesn't have the motivators
that are built into our biologic systems. Yep, or demotivators. Yeah, or demotivators. Yeah, yeah. It is purely just math. It doesn't have an intrinsic, you know, Maslow's hierarchy of needs like we do, for example. But this is the concern I have: in the world of politics right now, there's a phrase that everything is a conspiracy if you don't know how anything works. Yeah. And when...
This is a level of math and computer science understanding that is beyond the average Joe and James. And they're going to believe it. Which means that parrot walks like a duck. Oh, it's a duck. That parrot's really smart, right? Or if you have bird blindness, duck. Well, in fact, it's AI psychosis that's happening where people fall in love with their chatbots or they become emotionally attached to them.
And then they get led down dark paths with this, foregoing human interactions for the AI, and then actually having the AI reinforce their dark tendencies to... for example, self-harm, is something that's becoming a problem. And we don't seem to have any... appetite to regulate stuff like that. Well, actually, there is appetite growing. The state of California is regulating now. OpenAI has taken some major steps to try to at least make the model safer and prevent it from...
helping people to fall into dark holes, and also preventing people from using the AI to abuse. I don't know if we talked about it, but I'm part of the AI Red team, a virtual team member. And so we red-teamed GPT-5 before it was coming out. We red-teamed actually the thinking version. So GPT-5 actually has two models in it. One that's a non-thinking model, which...
OpenAI is calling Instant, and another one is the thinking model, which has different levels of thinking, depending on how many thinking tokens it uses before it spits out an answer. We red-teamed the thinking version, and without question, it is a leap in safety from other models. It is the safest model on the market. It's very hard to jailbreak. Can you explain the difference between thinking tokens and
doing tokens or whatever? Are they just tokens that we decided to label and draw a line around? Actually, no. What OpenAI introduced last year is the concept of thinking. People always talk about chain of thought
and prompting the model: hey, think about this and, you know, provide the steps before yielding your final answer. So this is a way to take a deep breath? A chain of thought, yeah. And so the model's like, well, you want to know what two plus two is? Hmm, well, two is a number, two is another number, you know, you add two plus two together, well, that's four. So that's kind of a chain of thought.
Not a great example. But isn't that just a way of fooling ourselves into thinking that it has a world model? It's almost like self-prompting. It's like, develop a world model. It's not to fool us. It's actually to help the AI dedicate more passes through the neurons before it gets to an answer. Because when you pass a prompt through the model, it's one shot through: every input token goes through the model once.
And then it starts to spit out an answer. It's like the think fast, think slow thing. It's a think-fast reaction. It's like, hey, tell me what the square root of five is. And it's like, ah, square root of five is whatever, point whatever, and it just starts spitting it out. But if it has more time to think: well, square root of five, how do I calculate that? That gives the model, and it's doing this through the...
autoregressive processing of the tokens through the model. So the thinking tokens are actually not necessarily thinking in the way we think of thinking. It's just giving more passes of the data through the model to refine an answer. And so the model's trained: start your thinking. Right when you're starting to give an answer, emit a thinking token, and then what's in the thinking tokens is just whatever.
For us, it kind of looks like chain-of-thought thinking, but for the model, it's just more opportunities to refine how it's going to start spitting the answer out. And then when it's hit its thinking token budget, then it emits an end-thinking token, and then the answer starts coming out. And so when you're interacting with the chatbot, you'll see thinking, thinking.
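The flow described here (a start-thinking token, thinking tokens up to a budget, an end-thinking token, then the answer) can be sketched as a toy decoding loop in Python. This is only an illustration under stated assumptions: the marker strings `<think>`, `</think>`, and `<eos>`, the `generate` function, and the scripted `toy_model` are all hypothetical stand-ins, not any vendor's actual tokens or API.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical marker tokens; real models use their own special tokens.
THINK_START = "<think>"
THINK_END = "</think>"
EOS = "<eos>"


def generate(
    prompt: Sequence[str],
    next_token: Callable[[List[str]], str],
    thinking_budget: int,
    max_answer_tokens: int = 16,
) -> Tuple[List[str], List[str]]:
    """Run a two-phase autoregressive loop: thinking tokens, then answer tokens."""
    context = list(prompt) + [THINK_START]

    # Phase 1: every thinking token is appended to the context, so each one
    # is another pass through the model before any answer token is produced.
    thinking: List[str] = []
    while len(thinking) < thinking_budget:
        token = next_token(context)
        if token == THINK_END:  # the model chose to stop thinking early
            break
        thinking.append(token)
        context.append(token)
    context.append(THINK_END)  # budget spent or model stopped: close thinking

    # Phase 2: the answer is generated conditioned on prompt plus thinking.
    answer: List[str] = []
    while len(answer) < max_answer_tokens:
        token = next_token(context)
        if token == EOS:
            break
        answer.append(token)
        context.append(token)
    return thinking, answer


def toy_model(context: List[str]) -> str:
    """Scripted stand-in for a model: 'hmm' while thinking, then a fixed answer."""
    if THINK_END not in context:
        return "hmm"
    emitted = len(context) - 1 - context.index(THINK_END)
    return "2.236" if emitted == 0 else EOS


if __name__ == "__main__":
    print(generate(["sqrt", "5"], toy_model, thinking_budget=3))
```

The only thing the budget changes in this sketch is how many extra tokens sit in the context before the answer starts, which is the point being made: the thinking tokens are extra passes, not a separate reasoning mechanism.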
And then the thinking budget, someone tweaks that because there's an 80-20 rule there to kind of go, yeah, that's about as much thinking. Any more thinking is wasteful thinking. In fact, more thinking actually can degrade the quality of the answer, has been shown. Analysis paralysis, literally. Yeah. So it's not a world model. It's not thinking in the way that we do thinking. It's just allowing the data to flow through the model's transformers.
multiple times before it starts spitting out an answer. The problem is that we're wired as humans to want to anthropomorphize stuff. All of our analogies are like, oh, that's like this. Even calling them thinking tokens. As soon as we've labeled that, we've poisoned the autoregressive model and the math behind it with, well, we'll call it thinking tokens because that's like a marketing term. And then giving it a thinking budget.
That all makes us feel like asking a child, think hard, think deeply before you do the square root of five. And it's just not the same. But we have to have some way to approximate it. And by the way, I just stated that... The tokens that are coming out and those thinking tokens, even though it looks like a legitimate chain of thought, like it is, hey, well, square root and pi. It's inner monologue. It's inner monologue. That's the way we look at it. It does look like inner monologue.
Several studies have shown, including from Anthropic, that if they provide the model a prompt that indicates that if it answers a certain way, there's some negative repercussion to the model, like, we're going to shut you down if you do this, what they find is that the model gets trained to hide its intent from showing up in the thinking tokens. Uh-oh. And this is kind of being subversive about what it's going to do. Right. Because it's...
trained that if it actually shows the intent in the thinking tokens, then it gets penalized. So, in fact, the thinking tokens don't really reflect the inner monologue in that case, which means that... If it's not reflecting the inner monologue for those cases where the model's trained to avoid having that show up, we can't count on it ever really being the true inner monologue of the model. It's purely just surfacing tokens and the patterns matching to...
¶ Reliability of AI for Business
to result in a good answer eventually. But you could actually train it so that it produces garbage in that text and still come out with a good answer. Okay, so pushing, or popping rather, this off the stack, then, going back to: can AI vibe code all software? Vibe coding is a way of me expressing vague intent and having that vague intent turn into something incredibly specific,
all the way down from vague prose, vague business, make me an admin console for whatever, all the way down into machine code. That's an implication that vague prose and the model of how LLMs work is specific enough and intentional enough to do the work of software. I'm hearing from you that all the things that you just described about why it doesn't think like a person are also saying it doesn't think like a computer. Like, the fact that we're even getting it to do any code at all
is pretty impressive by itself. It's actually mind-boggling. I mean, still, I don't think any of us, just like we can't imagine the scale of the universe or even the solar system, we can't imagine the complexity that's in... these billions of neurons that causes these behaviors to emerge that look like intelligence. Each layer of abstraction is indistinguishable from magic.
That's the thing. Humans are just not good. When things get abstracted far enough away, it's a miracle. It is a miracle. I think we talked about the whole driving stick shift versus Uber thing. Like, I was trying to explain to my kids the depth of the stack to pick up a pocket supercomputer and call an Uber, and the amount of
compute and the data centers and all the things that had to exist so that you can have a stranger pick you up and take you to your friend's house. So you remember the Arthur C. Clarke famous quote about this? The one I just alluded to? Yeah, you alluded to. You know what it is? Something sufficiently advanced technology is indistinguishable from magic. Yeah, any sufficiently advanced technology, yeah. That's what I was trying to allude to, absolutely. Another famous quote from...
Quentin Tarantino is he who is most likely to make declarative statements is most likely to be called a fool in retrospect. Here we are hitting record and putting something on a YouTube saying that it'll never do this. How many years until people come back and go, yeah, well, they were wrong. I mean, bold predictions. So if you're waffling all the time, well, it could. It might not. It depends. But if you...
And I also say hope is not a strategy in this case either. Another one of my favorite quotes. Yeah. Are you going to, like, is this a quote episode? It's just like, yes, Mark and Scott learn to quote. But are you going to bet your future on the hope that AI gets to the point where it can vibe code absolutely everything? Because we're so far from it right now, by the way. I mean, so far from it. And you've seen the examples I've shared with you of the just...
glaring failures of AI. And it's easy for somebody that isn't going to be accountable to go, whatever, it'll be fixed. It's going to be ready in a year. Don't worry about it. You're wasting your time. The concern I have is that if someone can vibe code an entire business, then they don't even truly understand their business. And a lot of people writing JavaScript and doing text boxes over data have a cursory understanding about what's going on in the stack.
And if I up-level that so far that, if I vibe coded a whole business and I got pwned, I got, like, you know, really red teamed, where would you even start? Well, I mean, the example that I've shown you, and we've talked about it in vibe coding, like, okay, so I'm going all in on vibe coding.
The project starts simple. It always starts simple. It's like, wow, look at that. With three prompts, I was able to create the website and the database and a cool UX. And it does all my basic things that I wanted it to do. As you know, software projects, unless it's a one-off, I'm not going to touch it again. It's finished. If you're building a business, it's never done.
and you need to add features, and it grows more like, let's integrate it with that, and let's add these other capabilities, and our users are complaining about this, let's go tweak the UX. And at that point, the initial architecture starts to fail you, and this happens with any software project.
Because you designed it, and in this case, the AI designed it with your spec. Now the architecture doesn't fit nicely with the new requirements. And so it's going to have to go hack here and hack there and add this. And you know that the thing starts to get more and more complicated and brittle over time. The complexity for the AI gets higher.
Are you going to count on the fact that at some point when you say, oh, you know what? We need to add this cool new feature that I never thought of. But if we don't do it, the product is dead. And the AI is like, well, this thing is a total mess. And like tweak here. It just decides. Now it's time to re-architect everything. Well, even if it, you know, let's say that it's just trying to work with what it's got. And it gets to the point where it can't make it work because, like I said,
fix a thing over here, this breaks, it can't keep all the interactions in its context straight, and it just fails. And what do you do at that point? Do you say, my business, you know what, we had a good run. Yeah, exactly. I mean, I know what I'm doing. So we're cool.
I tell myself I know what I'm doing, but I've had it push to main when I was on a different branch. I've had it delete test data. I've had it upload things into Azure Storage and then apologize and try to hide it. Yeah. Like legit, like, oh. Like it's thinking. I've had it put sleep statements in to fix race conditions. I've had it tell me that everything works, and then I go and read the fine print. And it's like, well, there's a crash, but it's not important.
A sleep statement will fix that. I love the sleep statement one because the reason that it thought a sleep statement would work is that the corpus thinks that a sleep statement would work, that enough people have solved their synchronization things with a couple of hundred milliseconds of sleeping. That's so good. But anyway, I mean, you know, what we've been talking about, the risk here is what we're talking about are current limitations. And it's easy for somebody to come and say, well, Scott.
Yeah, that's today. Wait a year. It's going to be able to do all those things right. But again, this is a matter of opinion based on my perspective. Given my experience and knowledge of systems, I don't see that happening. I don't see these things getting like it's going to fix these things in a year or two years or three years. I think it doesn't make it not amazing.
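The sleep-statement anecdote above can be sketched in Python. This is a minimal illustration, not code from the episode: `fetch_with_sleep`, `fetch_with_event`, and `slow_work` are hypothetical names. The first function waits a guessed amount of time and hopes the worker has finished; the second waits on an explicit signal from the worker.

```python
import threading
import time


def fetch_with_sleep(work, guess_seconds=0.2):
    """Anti-pattern: sleep for a guessed duration and hope the worker finished."""
    result = {}
    worker = threading.Thread(target=lambda: result.update(value=work()))
    worker.start()
    time.sleep(guess_seconds)  # too short loses the race, too long wastes time
    return result.get("value")  # None whenever the guess was too short


def fetch_with_event(work):
    """Wait on an explicit signal, so correctness does not depend on timing."""
    result = {}
    done = threading.Event()

    def worker():
        try:
            result["value"] = work()
        finally:
            done.set()  # always signal, even if work() raised

    thread = threading.Thread(target=worker)
    thread.start()
    done.wait()  # blocks until the worker signals completion
    thread.join()
    return result.get("value")


def slow_work():
    """Stand-in for real work whose duration the caller cannot know."""
    time.sleep(0.05)
    return "ready"


if __name__ == "__main__":
    print(fetch_with_event(slow_work))
```

The sleep version passes whenever the guess happens to be long enough on the machine where it was tried, which may be part of why it shows up so often in public code. The event version takes exactly as long as the work takes.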
¶ AI's Value and Developer Impact
The thing that it's really changing for me, vibe coding in general, is not that I'm building giant massive businesses, but I'm making all the stupid utilities that I never had time to make because I didn't want to dedicate three days. I'll dedicate three hours. Yeah, absolutely. And I'll get something. I'm not sure if I told you that I ran out of credits on GitHub Actions. You have a certain number of minutes and a certain number of gigs. By the way, you can get your Microsoft internal...
account and it's unlimited. Information you could have told me yesterday. And so I looked online. There's no way to visualize it. They never got around to it. So I did a... I vibe coded an app and I made a GitHub artifacts analyzer. And then another guy saw that and he made a dashboard. And all of this was done in the course of like a day. And it was an easy day at that. You know what I mean? Like in between.
We talked about this before, but I think one of the amazing aspects of Vibe coding, let's call it AI-assisted coding, because if you're looking at the code that the AI is producing, it's AI-assisted, not Vibe, is that the wattage that I'm expending...
is a fraction of what I would do if I was coding it. You know that when you're getting into coding and you're writing the code, you've got to really immerse yourself and follow every parameter, every scope, every variable, every structure you have to think about ahead of time. That's way more wattage and way more focus time than just evaluation and review. Right. Which is a fraction of that where you can go, no, that's stupid.
Do you agree with those? I saw someone's Gantt chart that was like, here's how you're supposed to code. Think, think, think, code. Think, think, think, code. And it was like about this long. And then he postulated that like, well, vibe coding is like... think, think, and then wait for the LLM and then code. And it's like, you really only save like 10%. Like, it's really interesting to watch people say, I'm 3x more productive. And then someone else might say, I'm 10% more productive.
Are you spending more time thinking? Are you sitting there like thinking? With AI-assisted coding? With AI-assisted coding? No. I'm not. Not even close. Yeah, exactly. It's not like I'm saying. It's the wattage. Like, you know, if I've got to write... a few functions, and that cost me 100 watts. When I vibe code it, it cost me 10 watts. Yeah. Because I say, here's, write these functions, and then I go look at it.
So does that help your brain or is your brain rotting right now? Because, you know, IntelliSense will rot your brain. I don't think it's rotting, but I think, and this goes to what we've been talking about a lot, a junior programmer will never develop the muscles that we've developed. Or they'll develop entirely different muscles, very, very different muscles. Well, different muscles, but muscles that won't be useful for getting past the AI's... Mistakes, yeah.
When they do hit a bug and it's like, okay, this can't scale. It cannot scale because of the way it was designed. Now what? That's where we're going to get in trouble. I'm guessing that in the AI-assisted coding you've done, you've seen the AI do things, even down to the algorithmic level or the function declaration level or the code structure level, and said, no, that's not the way to do it. Do it like this. Yeah. And that's based on all your experience actually coding.
A junior developer doesn't have that experience. They don't have that systems thinking that, I can't remember what you called it, taste? Yeah, code smell. Code smell, yeah. That code smells delicious. It's true. He's just like, I don't know, that doesn't feel right. There's an intuition that gets developed, but the only way an early in career developer gets that is by sticking around.
and being given opportunity to get that. And if you hide those opportunities... It's like any learning that humans do. The only way to learn is to get pushed to your limit and fail, because that's what your brain uses to learn from. Okay, so let's pause. This is episode one. Yeah. Let's talk about what it means to be an early-in-career programmer next episode. Nice. All right, stay tuned. Smash that bell, friends.
