
Building Pi, and what makes self-modifying software so fascinating

Apr 29, 2026, 1 hr 33 min

Summary

Join Mario Zechner, creator of the minimalist AI coding agent Pi, and Armin Ronacher, creator of Flask, for a deep dive into the fascinating and sometimes challenging world of AI in software development. They discuss the origins and philosophy behind Pi's self-modifying capabilities, learnings from 30+ engineering teams adopting AI, and the critical downsides of over-automation, including declining code quality and the erosion of open source maintainability. The conversation highlights the essential role of human judgment, the hidden costs of complexity, and the need for developers to "slow the f down" to prioritize quality and thoughtful integration of AI tools.

Episode description

Brought to You By:

Statsig – The unified platform for flags, analytics, experiments, and more.

Sonar – The makers of SonarQube, the industry standard for automated code review

WorkOS – Everything you need to make your app enterprise ready.

Mario Zechner is the creator of Pi, a minimalist, self-modifying AI coding agent, that is the foundation upon which OpenClaw (created by Peter Steinberger) is built. Meanwhile, Armin Ronacher is the creator of Flask, and a longtime user of Pi. The pair are also friends.

I sat down with Mario and Armin for the latest episode of the Pragmatic Engineer Podcast for an interesting conversation about AI and their reservations about it – even though both are heavily invested in building AI-powered tools.

Mario explains why he built Pi, and gives his take on why it has become so popular. Armin walks us through how he uses AI tools, including building a game with Pi, and why he always puts human judgment firmly at the heart of his approach.

We cover the risks of over-automation, the limits of agentic workflows, and why strong engineers with informed judgment still matter. We also get into the challenges of working with code written by non-engineers, and whether open source can withstand a tidal wave of agent-generated code.

Timestamps

(00:00) Intro

(07:30) How Mario, Armin, and Peter Steinberger met

(15:15) How 30 dev teams use AI agents: learnings

(21:50) The importance of judgment

(24:26) Challenges when non-engineers write code

(28:30) Downsides of over-automation

(32:18) Pi

(48:09) OpenClaw + Pi

(50:54) “Clankers”

(57:32) Open source and AI

(1:00:22) Complexity as the enemy

(1:02:50) Building an AI-native startup

(1:11:52) “Slow the F down”

(1:16:40) MCPs vs. CLI

(1:25:03) Predictions and staying up to date

The Pragmatic Engineer deepdives relevant for this episode:

The impact of AI on software engineers in 2026: key trends

Cycles of disruption in the tech industry

The AI engineering stack

The creator of OpenClaw: "I ship code that I don't read"

What is inference engineering? Deepdive

Production and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email podcast@pragmaticengineer.com.



Get full access to The Pragmatic Engineer at newsletter.pragmaticengineer.com/subscribe

Transcript

Intro

What if I told you that one of the most influential AI coding agents of twenty twenty six was built by a single developer in Austria who got frustrated with existing AI coding agents? This is Pi, a minimalist self-modifiable coding agent, which has quietly become the engine behind the wildly popular personal AI assistant, OpenClaw. Mario Zechner is the creator of Pi,

and joining him today is Armin Ronacher, the creator of Flask, and now an early adopter of and contributor to Pi. In today's episode we cover the backstory of Pi and why self-modifying software is much easier to do with AI agents, what Armin learned interviewing 30+ engineering teams about how AI agents are changing how they work, why software quality feels like it's trending down, the case against MCP, why CLIs are becoming so popular, and much more.

If you want to hear from two very grounded voices in the industry honestly talk about what's working and what isn't, and why we need to slow down as an industry, this episode is for you. This episode is presented by Statsig, the unified platform for flags, analytics, experiments, and more. This episode is brought to you by WorkOS. Engineers love to build.

Today's episode will be a great example of this. We'll get into why and how Pi was built from the ground up. But when you're shipping a product, some problems are better solved with trusted infrastructure built for scale. Enterprise features like SAML, Directory Sync, and Audit Logs are some of those.

WorkOS gives you APIs to add them in days, not months. Ship faster without reinventing the wheel. And now let's get into the episode. Mario and Armin, it's so good to have you here on the podcast. Thanks for having us. Thank you. So as a kickoff, Mario, how did you get into tech and eventually into building AI stuff? Oh well, that's a long story. How much time do we have? Yeah. So I'm a kid of the nineties, actually.

And I got my first PC in nineteen ninety six, and the trigger for that was that I loved computer games. We were kind of working poor, so we couldn't afford any of the Game Boy and NES, Super NES stuff. But I had an uncle who had an Amiga five hundred. And I would go to his place every second day and just play games there. And eventually my parents told me, if you work, you can save up and buy yourself a computer. And in reality my dad would do, um, what's it called?

Schwarzarbeit. Well, you're not necessarily paying your taxes on it. So he would do his normal job, and after his normal job he would go fix cars and work at construction sites. Yeah, it's very common in Europe. Like, I know everyone's at that. And after two or three years or so they just said, it's time, and took me to a computer shop in the nearby big city and bought me a four eighty six, and that's how it started, basically. A Pentium four eighty six?

Yeah, an Intel four eighty six DX, forty megahertz, with turbo button. And that's where I started. And I've always been into games a lot, which also led to graphics programming. And through sheer luck I got a job while I was studying at university, at an applied science organization

that was doing NLP stuff, machine learning, applied machine learning, basically taking research results and trying to stuff them into industry applications. And that's where I learned the ropes of machine learning. That was all before deep learning became a thing. And I actually quit that kinda domain in twenty ten, eleven-ish, because I joined a startup in San Francisco.

And then later I came back and joined another startup with two friends in Sweden, where we did an ahead-of-time compiler for Java bytecode to iOS, and that got sold. And since then I have a little bit more time. And I've always kept up with machine learning stuff because obviously it's super interesting. And yeah, and then GPT happened, and that's the story. And here we are. And then Armin, where were your roots?

So my roots are definitely not working poor, because my parents run an architectural office where they kinda adopted computers for CAD drawing. My first computers were old computers that they recycled. So my first computer, even though I'm younger, was a three eighty six. So sorry for you. And so basically none of the computers that I ever had were capable of playing computer games properly. Um, because one, they used Windows NT, which at the time didn't

uh do anything. So you had to sort of like build your way through it. And like the only way in actually could actually get them to run was because before it didn't know yet how to get the Windows ninety five or like Windows three eleven. Um that was like before it booted into either one of those you could

put it into DOS, like three old DOS games at a time when you could already get better stuff. But because it was sort of this kind of thing, I started toying around with QuickBasic a lot, um, with Turbo Pascal. I bought a bunch of books on that. And that was my roots of learning how these things work. And I wasn't ever really good at this, but I found it really interesting. Like this idea of like... No, for sure. We call it a Tiefstapler.

No, I swear to you, when I started dabbling with this, I just really sucked. But over time, if you keep doing that, you get better. And then in two thousand two or three, I used to use Delphi a lot, because it's a visual version of Turbo Pascal.

And in two thousand two or two thousand three, someone uh also showed me because I I I've I've got this idea like I wanna use Linux and then um I Delphi didn't work on Linux and then I found Python. And through that I started doing some Python programming and there was uh Ubuntu just came out in two thousand four.

And that was a venture backed vehicle, but it they created all this like local communities. So that was like Ubuntu Association. So I together with a bunch of friends we started the German Ubuntu Foundation. Uh not that foundation, association.

Um, and we ran this online community called Ubuntu Users for four or five years. And because Ubuntu was popular, the community grew and then the scaling problems came. So that's how I got into web development. Um, and then for building this, I wanted to build

a templating engine, a web library, all of this. And then eventually I bundled that together and made this Flask framework, which got very popular and even nowadays still is a thing that clankers like to spit out. That's hilarious. Um, but yeah, I left it, and then in twenty thirteen, fourteen or so... so I worked on computer games for a couple of years in London, but then afterwards I went back to open source and I worked on Sentry for ten years.

And then left in April last year. To try something. Yeah. So both of you are originally from Austria. In fact you right now live in Austria as well, right? You were doing games, you were working at Sentry, you also did games before. And then the third person who's not in the room but was on this podcast just before is Peter Steinberger, also from Austria. Where did the two of you meet? Where did the three of you meet?

Because uh I I've I've I've recently seen a bunch of photos, especially before OpenClaw and and Pi started, you hanging out, uh the three of you experimenting, playing with AI. I think the two of us met on on the internet, right? On Reddit. It depends because I I definitely met you once when I was at university. Yeah. So but you didn't recognize me at the time and I was useless. Um but yeah, we we sort of abstractly met on the internet. But eventually we met up in Vienna. Um

How Mario, Armin, and Peter Steinberger met

We were screaming a lot at each other, but uh on the internet. But uh in a in a very cute kind of way, in a very non confrontational kind of way. And even though we might not think alike in all areas of of of our lives, uh It was uh a cultured exchange, I would say. So that was nice. Uh and Peter, I Like six degrees of Peter Steinberger basically. Um I was working at an office in my town.

And the company that gave me free office space in exchange for being like a mentor to the CEO had some kind of business dealings with Peter's company, PSPDFKit. Um, and he eventually came to the office in Graz. And I think that's where we met the first time. And then also the same year we met at a conference in Istanbul and just hung out for an entire night, and that's basically where it all started.

Nice. And then how did the both of you go from being skeptical about AI when these tools came out? I think again both of you have to at at that point and by twenty twenty two You've been doing a decade plus of building complex software in different domains. What was your first reaction to it? And then eventually how did you kind of kind of come across to the side of like, well, this thing is actually really interesting? So for me it was I think in twenty twenty two.

I think Copilot, um, GitHub Copilot came out before GPT. Yes, in twenty twenty one. Yeah. And through my previous startup stuff, I was working with Nat Friedman and Miguel de Icaza from Xamarin, because they acquired the... With Xamarin. Yeah, they acquired the company I talked about earlier, the Java compiler thing. I knew Nat Friedman from our early startup stuff, and he eventually moved to GitHub.

And then he was in my DMs in twenty twenty two, I think, and asked if I wanted to have access to GitHub Copilot, the tab tab tab autocomplete thingy. And I was like, I don't really care, I don't think this is going anywhere. And he's like, no man, it's the future. Gotta try it, it's the future. So I tried it and it was absolutely horrible. So, ha ha ha. But yeah, when GPT came out, and especially when they started providing API access, I did a lot of

projects, just figuring out what works and what doesn't work, not necessarily in the coding space. But eventually once they had tool calling, that's when they became very interesting, or function calling as OpenAI called it back then. Um, but it took until, I would say, twenty twenty four, end of twenty four, October or so, for that to actually be useful. And that's where the coding agents also became kinda interesting.

And then in twenty twenty five, the Claude Code team came out with Claude Code. And that introduced agentic search. So basically just give the agent a way to plow through your file system and read all your files, and that made the whole difference, actually. Like all the things that came before, like Cursor with indexing and any AST-based stuff and all of that, that just went away. And I know that the CEO of Chroma is probably mad at me for saying this.

That was the difference. It wasn't like a dense and sparse search thing that the agent could go through. It was just: give it access to your files. That was it for me. That's where it clicked for me. I think my path was kind of similar. Um, because I think Copilot came out quite a bit earlier.

But I know that um there was a program at GitHub that gave you early access to Copilot at the time. Um I think it was like this maintainers group or something where it still was in. I got the feeling for Copilot that this will actually be really interesting. Um, but not in any way in which it is now. Because I felt like, Oh, I am in open source for such a long time and now they're doing like training on open source data. It's like there is something

At the very least, this will be controversial. I mean, I didn't think about it being productive. I felt like, oh, this is going to be a controversial thing, as in, like, training on open source data. And I remember I was trying to probe it, like, really... Whether there's Flask in there? No, no, I was trying to probe it really adversarially. So one of the things that I probed on is: will it retell GPL code?

And I remember at one point I got it to, um, spit out the, uh... But I also found that you can sort of tab in a certain way, and then it would continue putting license text on top of it. It was completely wrong. So it came from an open source GPL drop of Doom originally, I think.

Um, and so it would have been GPL code if it would have done that. But it actually attributed an MIT license from a random dude. And I was like, oh, Mr. Copilot, that's the wrong thing. And that tweet at the time got really, really popular, and then people sort of started sharing with me, because I was at the time not really exposed to how much actual AI progress was being made in those labs.

Yeah. Like, I didn't come from this AI space or ML space. So I learned about it at university and, like, oh, there's AI winter and then nothing happens. But through this tweet and some other things, I recognized that there was something there. Like, there's actually CEOs in certain companies who are convinced this will take off. And that's how I started, like,

paying attention to it, and I was essentially trying all kinds of stuff with the API, like, can you do bug fixing things? So I got really interested in it, but it didn't at all feel like the world is going to change until, um, Claude Code. And you also changed your stance on the whole, oh my god, this is spitting out open source code, it memorized it. So, because my shtick for many years now has been that I really

I want people to share stuff. I think human progress comes from building on top of each other. And I'm a huge supporter of the fact that in the US you basically take knowledge from one company to another company, that they don't have non-competes. Like, I like this pirate kind of approach. Sharing. Yeah, spread knowledge.

Yeah, and so like I I was like my optimal version is like copyrights don't exist in a way, or like very, very like a limited kind of version of this. I was like, I really didn't care that it spits out GPL code and doesn't attribute it. Like I was like, oh, maybe this will just completely destroy

copyrights and like I for me that was like oh this is I uh like if if that's the outcome of it like I'm I'm fine with it. So it was but it was it was an interesting kind of thing in the beginning that it sort of like

it sort of creates this license violation. Like, I want to see what chaos will emerge from it. And so far I think mostly what has emerged from it is a strong belief now that the system in place for copyright in the US has some assumptions about how it's supposed to work.

And we're all kinda ignoring that right now because we wanna create the mess first and then regulate it, probably, because at least in theory, a lot of the things that we're producing right now are probably by historic readings of the copyright... Interpretation. Yeah.

How 30 dev teams use AI agents: learnings

That's an interesting one. But speaking of jumping to today, an interesting thing that you did recently,

we talked about it just before, as part of your new startup, is building things on top of agents. And you talked to about 30 different engineering teams, saying, hey, how are you using agents inside of your company, inside of your team? What did you learn, from large companies to startups? I think a bunch of learnings are entirely unsurprising: whenever people had vacation, there was more time spent on trying these tools.

And just to be clear, you talked with folks at the likes of Meta, startups... Yeah. Like a bunch of different people. Right. So a bunch of different people from, like, different European dinosaurs, like... Yeah, yeah.

Well, I mean, the European dinosaur would be someone like Siemens. Yeah. Or I also talked to two companies which are sort of in a critical space. And what I mean by adoption happening when people have vacation is that when your CEO or your tech lead comes and says, you gotta use Cursor now, you gotta use Claude Code now,

you actually don't get it, in a way. Because you need to actually spend some time on it. It's like a two to three week kind of thing until it really clicks for you. And so I always felt like, with the people that I knew, I had a lot of free time. I left the company in April, and until October I was like, I can dive into this. And I was like, how does nobody get this? It's like catnip.

It was crazy catnip. I didn't sleep much, all of this. But what happened within the companies, seemingly, is that when there was, like, Thanksgiving, and for the Europeans a lot of it was over summer, and then Christmas, a lot of people sort of... and they also get free credits during those times. And so more and more people get... You mean the AI companies often give you...

Yeah, generally. More and more people went into this, and especially after Christmas, I would guess in more than half the companies I talked to, after Christmas it really exploded. And it exploded in sort of all the ways you would expect, where all of a sudden the quality drops.

And and and and it doesn't necessarily drop because like people want to make worse code, but because it actually takes some effort to to stay within this. And we we have seen this In the startup ecosystem already in the summer last year, like if you if you pay attention to like the the YC startup.

A lot of them, some of them, have their stuff on GitHub, or had it for some period of time on GitHub, and you can look at it, and at the time, because of plan.md files checked in and everything attributed to Claude... So that vibe coding kind of thing, for prototypes and whatever, was already out there to see. But then gradually a smaller version of this has been code bases with a little bit of

vibe slop on top. And an interesting part of this was how engineering teams and companies are now responding to that. Um, with all kinds of different findings, but a lot of it has been challenges reviewing PRs. They're getting larger and larger, and they're becoming more psychological. And engineers specifically are having a hard time keeping up with the longer PRs, and they're more frequent.

Yeah, and also a lot of the code in those PRs is how an engineer wouldn't do it, because as an engineer you sort of get a really bad feeling committing certain code because you think of your future self.

And the agent really does not care. This is I I will retell this story over and over, but like I I worked uh for an Xbox One game at the time, um, right around the Xbox One launch. So that was like a fixed day, it has to release on that day. So I worked on um Uh the Halo and Master S.

And uh there was a game where you had like a matchmaking component and you had to like store this thing in whatever. And and it was like it was an all hands on deck kind of situation where people had to go in and unslop the human made slop that was the matchmaker.

And it was a system with way too many states. We called it an emergent state machine because it was like sixteen bools on one massive thing. And in theory there were only six valid states. But in reality, it was a dramatic explosion of possible states. And that's how agentic code feels.
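To make the numbers in this story concrete, here is a minimal Python sketch. It is not the actual matchmaker: the state names and the transition table are hypothetical and chosen only for illustration. It contrasts how many combinations sixteen independent booleans can represent with an explicit six-member state enum that rejects every transition not listed.

```python
from enum import Enum, auto

# Flag count taken from the story: sixteen independent booleans.
FLAG_COUNT = 16
# Every combination of flags is representable, valid or not.
representable_states = 2 ** FLAG_COUNT


class MatchState(Enum):
    """Hypothetical explicit states; the names are illustrative only."""
    IDLE = auto()
    SEARCHING = auto()
    LOBBY = auto()
    STARTING = auto()
    IN_GAME = auto()
    FAILED = auto()


# Hypothetical transition table: any move not listed here is rejected.
ALLOWED = {
    MatchState.IDLE: {MatchState.SEARCHING},
    MatchState.SEARCHING: {MatchState.LOBBY, MatchState.FAILED, MatchState.IDLE},
    MatchState.LOBBY: {MatchState.STARTING, MatchState.IDLE},
    MatchState.STARTING: {MatchState.IN_GAME, MatchState.FAILED},
    MatchState.IN_GAME: {MatchState.IDLE},
    MatchState.FAILED: {MatchState.IDLE},
}


def transition(current: MatchState, target: MatchState) -> MatchState:
    """Return the new state, or raise if the move is not in the table."""
    if target not in ALLOWED[current]:
        raise ValueError(f"invalid transition: {current.name} -> {target.name}")
    return target
```

With the enum, an impossible combination cannot be constructed at all, whereas with sixteen flags most of the 65,536 representable combinations are states nobody designed for.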

Like where it really should only be a very clearly defined system, but in all reality, they're like, oh, the config doesn't load, let's catch it and load the default config. So instead of actually failing, it now recovers. But now your code is way more complex than it should be, because instead of failing properly, it is now recovering and entering these many more failure states.
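The config example can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names, the JSON format, and the `DEFAULT_CONFIG` contents are hypothetical and not taken from any real codebase discussed in the episode. The first function shows the catch-and-recover pattern described above, the second the fail-fast alternative.

```python
import json
from pathlib import Path

# Hypothetical defaults, for illustration only.
DEFAULT_CONFIG = {"retries": 3}


def load_config_with_fallback(path: Path) -> dict:
    """The pattern described above: swallow the error and carry on with defaults.

    The caller can no longer tell a healthy config from a missing or broken
    one, so every later code path has to work in both situations.
    """
    try:
        return json.loads(path.read_text())
    except (OSError, json.JSONDecodeError):
        return dict(DEFAULT_CONFIG)


def load_config_strict(path: Path) -> dict:
    """Fail-fast alternative: a missing or malformed file raises an exception."""
    return json.loads(path.read_text())
```

The strict version is shorter and leaves one outcome to reason about after a successful call. The fallback version keeps the program running, at the cost of an extra silent state that the rest of the system has to tolerate.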

And that makes it much harder to work with this code, because you can also not really ask the agent to refactor it, because it's like, oh yeah, this could be possible, so we need to maintain this invariant. I think it's kinda even worse than what you described about your human-made complex system.

Because there are moments of brilliance in agents where they spit out perfectly fine, simple code, exactly the amount and type of code you needed for that specific thing. And you as the steering engineer are looking at that and are like, wow,

I can just sit back and not care because it's obviously doing the thing. And two minutes later you have another agent running in this window and it spits out the worst horrible garbage. But you might not notice, because now you have fallen into automation bias and think your agent is doing the job well.

Do you think this might be a bit of a human bias? Because, you know, typically when onboarding a new engineer, you have a new joiner, a new grad, you review their code, and if it's terrible code, you will review the next one thoroughly, until they get to the point that, oh, they write the code that I do. And it typically takes, you know, six months or a year, something like that. But then, you know, I can trust this person.

Yes, but you don't have anything like that with agents. Agents don't learn. You can put as much stuff in the AGENTS.md or build a memory system, but that's not the same type of learning as a human does. Obviously humans are fallible as well, no matter what, but they have some capability of learning. And retaining that learning. Yes, and they also feel pain. And I think that's one of the defining things about humans.

It kind of ties back to what you said: eventually, if the pain gets too big, you as a human are incentivized to fix the cause of your pain. And in the code base, the cause is usually terrible interfaces, terrible complexity that you want to get rid of because you can no longer maintain that system. Isn't this why, you know, senior engineers are always in demand? Because the CEO sees a senior engineer as, like, they just get it done.

But in reality, as a senior engineer, or most senior engineers who are effective, they've had battle scars. They've been burnt. They felt the pain. And they saw what happened when they let tech debt spiral. So they now make all these decisions that they know will help avoid that. And of course, through this, progress goes faster.

I personally think and your mileage may vary, but uh a good engineer is an engineer that says no a lot and I don't need this a lot. Mm-hmm. Because that keeps complexity down. If you're using agents

the exact opposite happens. You say, yes, I want this and that and this, and I want this and I want this, because I don't have to type it myself. I don't have to think about it. I just give the little machine a prompt and it will spit out something that kind of looks like the thing I wanted. Good enough. And that's where all the problems start.

The importance of judgment

And one thing that I also think is that good engineering is all about knowing the trade-offs that you have to make. And sometimes the right solution is actually one where, if you were to sit at university and learn about it, you'd kinda learn that you shouldn't be doing it this way. I think Cal Henderson had this once where he said you do the dumbest solution first until it doesn't work anymore.

Because the the actual problem is there's so much stuff that you need to do that if you actually do the right solution, the correct solutions, all of this, it is it you're creating the kind of complexity that kills you at scale. And the engineer learns that, but also like if you if if you don't have that battle scar, it's actually very hard for you to argue correctly because it is it is this learning process that gives you the authority.

To then convince other engineers in the engineering org that you should be doing it this way. That is part of it that you learn that. But the other thing is also that. The agents give you now world knowledge access. And one of the other things that I learned through interviewing engineering teams now is that the senior person says no, knowing something. And then forty eight hours later.

The junior comes by and said, like, I talked to the agent and I already had this inkling, but now you have all the evidence of why we shouldn't be doing it this way. Because like previously you really didn't have that ready-made access to Someone who can tell your senior off.

Yeah. This creates other stresses now that were previously... like, not every team has that because they have a really... Like people going to the doctor with a ChatGPT printout and saying, this is what the machine said, you better do that.

Is it fair to say that, based on what you're seeing and talking about, we might face a thing where it's harder for experienced engineers to say no, in spite of the product manager or a junior engineer saying... It's worse, because the product manager comes in and sends pull requests and automatically ships them. Yeah, that's another thing: non-engineers participating in engineering processes is a thing now.

Ask Armin how that works. Ask him how how does it work? Well it's hard because if because on the one hand, like it's well intended, right? If someone who's like What what what is your experience? Is this uh your your your company talking with other other people?

So first of all, we have a little bit of this ourselves. We're small, and so my co-founder for instance sometimes sends a pull request on the website. I talked to people that have that at scale, where the marketing team all of a sudden does stuff on a website

Challenges when non-engineers write code

and the sales team creates ever more elaborate sales demos that sort of end up on a GitHub org. And one of the funniest ones was where the sales demo built a feature that didn't exist, but nobody noticed. Right. So this is all new, right? Because previously none of that happened. But I think it's empowering. It is empowering, there's a good thing to it too.

If your entire org, if everybody in your org can participate in in in in the creation of software, uh in some form, right? Previously people couldn't do that. Like you had a designer who could figure something out in Figma. But they might not be able to kinda put it into a a clickable dummy demo, whatever. Or you might have a PM who who wants to try out a feature without kinda wasting time of an engineer. Now you can do that.

The problem is that people are now so focused on "everybody can do everything now" that they forget that you still need a process to kinda guardrail all of that. And the integration part is the hard thing. It's like, Peter gave this idea of the prompt request, and I'm actually really warming up to this idea. Like, once you've demonstrated it, I no longer need your code.

And just to recap the prompt request was him saying that he doesn't like to get pull requests and said he would rather see the prompt because he will run the prompt or he will tweak it and it will generate it in the style that For me it's less about like I wanna see the prompt as a like what is it supposed to be doing?

And now that we understand because like actually in many ways I think like the interesting part is like often you don't really fully know what you wanted to do in the first place. And so like the act of creating clarifies what you really wanted to do. And so like that part is highly valuable. Often the approach and the code that comes out of it is not what an engineer would

sufficient seniority would have done. So it's not like I want your prompt so that I can re-clank my clanker so that it does it slightly better. But more like now that we know what we wanted to build, it's probably faster for me to start. Yeah, and I I also kinda disagree with Peter on not just need your prompt. I actually value seeing a terrible implementation or something.

Like, if I get a pull request, and most of the pull requests we get on the Pi repository are made by agents without a lot of human touch, let's say, then I immediately know, okay, this is gonna be garbage. But it's valuable garbage, because someone has put in at least a minimum amount of thought instructing their agent to create this pull request.

And I get to see what a shitty implementation of what they wanted to build looks like. And I don't need to waste my own time on trying that out. Somebody else tried it out already: the naive, dumb "agent, do the thing, make no mistakes" version. And that saves me time. I'm not saying I like pull requests by agents, because they're terrible and I auto-close them now, but they have value. It's not just a prompt. It's on an exponential, right?

But I think we're gonna find out way earlier than in previous cycles that this is a bad idea. What I think is gonna be interesting, and I don't know the answer to this, but I read this fascinating retelling of the British industrial revolution and how it changed the textile industry. Industrial revolution, yeah. Yeah, and so the general thesis of that article was like

every time something at the head of the pipeline got optimized, it created an incentive downstream of the whole thing to create something. Right. So like in the beginning, like if you can weave the thing faster, then eventually you need to have

that can be woven at faster speeds, and then eventually everything sort of turns the bottleneck all the way down. And ultimately the biggest bottleneck in the entire thing turned out to be what I think is actually the next bottleneck we're hitting in engineering, which is like

At one point you made a shirt, and if someone didn't like the shirt, they went back to the person that made it and that person fixed it up for them. And so the actual thing was, if the shirt is bad, nobody cares anymore who destroyed the shirt in the process. You're just going to get a new one.

Downsides of over-automation

Right, the responsibility actually went from anyone in this chain to the entire factory as a whole, which doesn't have to carry responsibility anymore because we've commoditized the whole thing so much that you don't have to do this. And if you take the engineering approach to it, a pretty significant part of

running a company and running a service is running it reliably. And so you have these postmortems on incidents to figure out what went wrong in the process. I could fix the shirt. Yeah, and the thing is, we are all running on this idea that every engineer who is in this creation process ultimately carries some responsibility.

And that we're going to that person, not to blame that person, but to figure out what went wrong here. And so if the machine now produces stuff at ten times the speed, the responsibility thing does not scale in the same way, because a machine cannot be responsible. And I don't actually know if there is a future where you can abstract away human failure so much in how we run engineering

that the entire company no longer cares about who signed off on a pull request or something like that, that we automated it in the same way, I think, as we are sort of automating T-shirt creation. I just don't yet see that. But... So here's the thing, I think one thing we software engineers or IT people underestimate is just how freaking complex the world is.

And how much human squishiness is in each little nook and cranny and corner, right? So we're thinking, "Oh, we were now able to automate that thing. Now we can automate everything, every bit of knowledge work." But we as software engineers are so bad at becoming domain experts that we don't see all the non-machine parts that go into a workflow.

And we are running into the same fallacy here again. We we are seeing models doing incredible things. I'm not disputing that. Like this is for me this is like wow. Basically all my research in the two thousands is now null and void because Transformers can do all the things.

But we are overextending that to everything, like we always do in software. Like we did in EdTech. Yeah, we have tablets in classrooms now. Sure. Now it's solved. Education is solved because we now have computers. Well, in fact I've heard, I don't know which country it was, but they're now rolling back. Yeah. Sweden, they're taking the tablets out of the classroom.

Yes, turns out if you do some scientific investigations into the didactics and effects on pupils, if you just throw a bunch of tablets into a classroom, close it and hope for the best, turns out the best is terrible. So yeah, for me I think the biggest takeaway in the past two to three years is that the hype is terrible, because it dehumanizes everything. And I want to not be part of that circus.

Well, speaking of not wanting to be part of the circus, let's talk about Pi, which is a very popular... Let me get my clown nose. ...and also minimalist coding agent. Can we start with the backstory of why you decided to build Pi at a time when there were already agent harnesses around, right? Because they were suboptimal.

Yeah, sure. So I was a believer in Claude Code, just because they kind of created that whole genre through the invention of agentic search. I mean, invention. There were precursors to that, and shoulders of giants and so on, but they were the first that packaged it up in a really compelling package. And at the time it fit my workflow really well. It was simple, it was predictable, sans the LLM. But everything around the LLM was kind of nice and tidy and easy to understand.

So you were a happy user of Claude Code, right? I was super happy, I was proselytizing it. But eventually the team started dogfooding and getting more and more tokens, I guess, and kind of increased velocity and team size, and with that came more features and much, much, much more bugs.

And I personally like simple tools that are stable, that I can rely on, even if they have non-deterministic parts, but all the deterministic parts should be as stable as possible. And that was just not the experience with Claude Code around summer 2025. Mm-hmm. So I kind of soured on that real hard.

So they take away your control of the context. They would inject stuff behind your back, which is bad. And then your workflows that used to work stop working because there's now a system reminder, that you don't even see in the UI, that will modify the behavior of the model. They would also do this to the system prompt. I reverse engineered... I mean, I wouldn't call opening an obfuscated JavaScript file and deobfuscating it reverse engineering,

coming from a more low-level background, but I reverse engineered Claude Code during the summer of 2025 and built a little service where I can track the progression or evolution of the system prompt and tool definitions in Claude Code. And every release, it was messing with stuff.

cchistory.mariozechner.at, if you want to see that. And yeah, that just messed with my workflows and I don't appreciate that. If I commit to a development tool, I want it to be a stable, reliable thing, like a hammer. I don't want my hammer to break in

a different spot every day. Yeah. That's terrible. So that's what happened with Claude. But again, I'm not roasting the team. Some of them are really nice people I got to know on the internet. They're just dogfooding and that's perfectly fine. We need somebody who goes the full-velocity kind of way. But I don't want to work with a tool like that, because I can't get work done with that.

Sounds like "move fast and break things" was not for you. And then I looked into alternatives, and Amp and Droid came out around that time, I think. Pretty early in 2025. I don't remember... Amp was early. I think they sort of spun off from the same experience, because I think Amp was around when Claude Code came out. Mm. Yeah, in any case I looked into those harnesses and they were super good. They were just super expensive as well.

Because none of them could use what made Claude Code enticing on top of it being a cool tool: the subscription. And that works in an enterprise setting where you're paying by token anyway, but it doesn't work for the small tinkerer in the garage. While I'm not a small tinkerer in a garage in a financial sense anymore, I still kind of relate to that community and I would like to use my subscription with something, so I looked into open source alternatives and found OpenCode.

But while that kind of vibes with my OSS roots, it too did stuff to the context behind my back that I didn't appreciate: pruning tool results after a certain amount of tool result token output, or asking an LSP server after every single edit the model makes if there is an error. Yes, there will be an error, because the model isn't done yet with its work, so the code doesn't compile, so the LSP server will... So, like, reaching out to LSP, the language...

Language Server Protocol server. Yes. So when you go into VS Code and you type some TypeScript, you have some error diagnostics at the bottom, and that comes from an LSP server for TypeScript. And OpenCode runs an LSP server on your behalf in the background and feeds the model with diagnostics from that server on every edit.

But we as programmers, how do we work, right? We go into one or more files, we edit line after line after line, and only then look at the errors that resulted from that. In OpenCode's case, or in the case of other harnesses that also support LSP, the model calls an edit tool to change lines.

And they would inject the diagnostics after every edit call. And that's just not smart, because now you're confusing the model with "you have an error, you have an error, you have an error", and the model is like, "Yeah, I know, I know, I'm not done yet." Oh. Yeah, it's not great. Anyway, TL;DR, OpenCode wasn't for me either. Also, I had to fork it to modify it, which I don't think should be necessary. So then I just thought, how hard can it be? I built my own little thing.
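To make the point about diagnostics timing concrete, here is a minimal sketch of the alternative Mario describes: remember which files were edited and only ask for diagnostics once the model has finished a batch of edits. Everything here is an assumption for illustration. The `Diagnostic` type, the `CheckFn` callback and the `DeferredDiagnostics` class are hypothetical and are not the API of OpenCode, Pi, or any LSP client.

```typescript
// Hypothetical shapes for illustration; not taken from any real harness.
type Diagnostic = { file: string; line: number; message: string };
type CheckFn = (files: string[]) => Diagnostic[];

class DeferredDiagnostics {
  private touched = new Set<string>();
  private check: CheckFn;

  constructor(check: CheckFn) {
    this.check = check;
  }

  // Called for every edit tool call. Nothing is fed back to the model here,
  // so half-finished code does not produce "you have an error" noise.
  recordEdit(file: string): void {
    this.touched.add(file);
  }

  // Called once, when the model signals it is done with its batch of edits.
  flush(): Diagnostic[] {
    const files = Array.from(this.touched).sort();
    this.touched.clear();
    return files.length === 0 ? [] : this.check(files);
  }
}
```

In this sketch the harness would call `recordEdit` from its edit tool and `flush` at the end of the model's turn, which mirrors how a programmer edits several lines first and only then looks at the errors.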

And your own little thing is pretty minimalistic. What does it use? What are the basics of Pi? The basics of Pi are my own abstraction over all the LLM provider APIs, because I didn't like the Vercel SDK for various reasons. Armin eventually wrote a blog post about that as well. It's obviously good to use. Lots of people use it. It just didn't fit my old-man sense of abstraction.

This is the beauty of software, and especially open source. You can build your own, always. Yeah, and now with agents you can even do it faster and produce terrible, complex software. No, so I built an abstraction for that, then I built a little abstraction for a generalized agent loop with tool calling and streaming, all of that.

I built a bespoke little TUI that doesn't flicker, or not a lot, and then I tied it all together into a coding agent that looks like Claude Code or Codex or whatever you have. That's it. And the extensibility comes from the fact that this minimal core has so many hook points that you can basically hook into with a simple TypeScript module

that gets loaded into the same Node process, and that allows you to do things like provide the LLM with custom tools, do your own compaction implementation, or fully revamp the TUI itself. You can modify everything in the TUI. So if you have a special... The terminal UI?

Yes, exactly. If you want the TUI to behave differently for a specific workflow you have, say you're a non-techie, you can change the TUI to become whatever you need as a non-techie. And I have a couple of non-techie friends that did that. Because they don't need to know how to build this, they can just ask Pi to build it and Pi will modify itself. Oh, so this is the thing, right? So you can ask Pi to modify itself because of the extension points, and it can write code that extends itself.
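As a rough illustration of the "minimal core plus hook points" idea, the sketch below shows a tiny core that extensions can register tools and prompt hooks with. This is not Pi's actual extension API: `ExtensionHost`, `MinimalCore`, `registerTool`, `onBeforePrompt` and `wordCountExtension` are all hypothetical names chosen for the example.

```typescript
// Hypothetical extension surface, for illustration only; not Pi's real API.
type Tool = { name: string; description: string; run: (input: string) => string };
type PromptHook = (prompt: string) => string;

interface ExtensionHost {
  registerTool(tool: Tool): void;
  onBeforePrompt(hook: PromptHook): void;
}

class MinimalCore implements ExtensionHost {
  readonly tools = new Map<string, Tool>();
  private promptHooks: PromptHook[] = [];

  registerTool(tool: Tool): void {
    this.tools.set(tool.name, tool);
  }

  onBeforePrompt(hook: PromptHook): void {
    this.promptHooks.push(hook);
  }

  // Runs every registered hook, in registration order, before a prompt is sent.
  preparePrompt(prompt: string): string {
    return this.promptHooks.reduce((current, hook) => hook(current), prompt);
  }
}

// An "extension" here is just a function the core calls with the host object.
function wordCountExtension(host: ExtensionHost): void {
  host.registerTool({
    name: "word_count",
    description: "Counts whitespace-separated words",
    run: (input) => String(input.trim().split(/\s+/).filter(Boolean).length),
  });
  host.onBeforePrompt((prompt) => prompt.trim());
}
```

Because an extension is plain code against a small interface, an agent that can write files can also write new extensions, which is the self-modification loop described above.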

And it's trivial, but it's a big unlock. Is this what you meant when you said that for OpenCode you needed to fork it to modify it? It doesn't have this... It does have a plugin system, but there were not a lot of extension points and it was very rigid. I think they changed it recently. I think it's much more open now. I haven't kept up with it, so I don't know. Pi has this very minimalistic thing: as I understand it, the tools it has are read,

write, edit, bash. That's all you need. And then you can actually start to make it your own. Okay, so what are examples that people would have? Pi doesn't have MCP; people just ask Pi to build MCP support into Pi. Pi doesn't have a plan mode; Armin goes, "My plan mode must be fantastic, bespoke, and super." But he had like five implementations of a plan mode until he realized plan mode is entirely useless.

Other people just like messing with the UI and making it their own, like a different visual style of the editor box where you enter your prompt, trivial stuff, more cosmetic stuff. Other people have rejiggered it for a full-blown RL environment for open-weights models, where they use Pi as the agent that is part of the RL execution environment. You can do anything, really.

What drew me to it, beyond actually using the library abstraction, was in fact the custom tools part. One moment for me was over Christmas. Again, like many people, I had some time and I tried to build other things, and Peter was talking to me in November about how he's like

vibing without looking at code, more or less. I don't know exactly how he said it. But he can do this now. So, okay, I want to build a thing where I don't look at the code. I wanted it to not look like slop. So I wanted a version of it where afterwards, even though I don't really look at the code, it should look like what I would have written. And I wanted to make a game. And so then I basically

started the whole experience with just basic Pi. I was like, we want to build a game, but actually before we build a game, I want you to set up the code base in a way that you can validate the changes that you're making, but also I can see them. Like a two-pronged kind of approach. I wanted to be in the loop, but also have the agent be able to validate itself. And what sort of emerged out of that was...

Well, first of all, it builds itself some debugging tools into the game so you can make screenshots and run a simulation and sort of dump out state and read it again. But also Pi can show images in the TUI. And I added a bunch of... I talked with the clanker to figure out what would be interesting things to do, and we ended up having like a...

All the screenshots I can tab through quickly in the UI. Pi also has this great feature where I can revert to an earlier state in the conversation and then branch within the conversation to build a bunch of stuff around that. Because these sessions, especially with screenshots, became very token-inefficient very quickly. Actually, one of the other things that Pi was rather quickly rather good at was having a lot of screenshots in it.

Because OpenClaw people had a lot of screenshots in their chats and OpenClaw is using Pi. Yeah. But it felt really magical for me to actually treat the problem as "I don't know what the right way of engineering here is."

But very clearly part of it is that I should be in the loop so we can figure out how to do that specifically for the problem at hand. And it turned out that web projects and computer games and some of the other things I tried are kind of different. But very many of them sort of come down to a similar thing, where

the agent now interacts with my program and should do it in the most optimal way. And I want to interact with it in conjunction with it interacting with the program. And the entire experience should be as little confusing as possible, both to me as a human and to the agent. And I found it very, very fascinating just to see how that emerges, where your tool all of a sudden looks and feels different when you launch it in this program than when you launch it in the other program.

I really like this point Armin made just a few seconds ago, that AI works best when the engineer stays in the loop and the system can actually validate what changed. And this is a great time to mention our season sponsor, Sonar. AI can now generate code faster than you can verify it. Sonar, the makers of SonarQube, sees this leading to a serious gap in verification.

With the rise of coding agents autonomously writing code, verification is no longer nice to have. While the latest coding models are extremely intelligent, they also are error prone, and they don't fully understand your code base and your context or your objectives. This is why verification must be mandatory in agentic workflows. SonarQube provides a zero-trust, multi-layered approach to code verification that is consistent and repeatable.

It analyzes semantics, syntax, data flows, and architectural boundaries at agent speed, acting as a critical trust and verification layer before any code reaches production. Covering forty-plus languages and seventy-five hundred issue types, SonarQube is the most comprehensive code verification platform available.

And with easy integration via MCP, CLI, and hooks, it fits right into your existing AI tool chain. Let agents move fast and have SonarQube as the independent, multilayered verification for safe, reliable, and auditable agentic development. Head to sonarsource.com/pragmatic to start verifying your agentic workflow today. I'd also like to talk about our presenting sponsor, Statsig. Statsig builds a unified platform that enables both experimentation and continuous shipping.

Built-in experimentation means that every rollout automatically becomes a learning opportunity, with proper statistical analysis showing you exactly how features impact your metrics. Feature flags let you ship continuously with confidence. And because it's all in one platform with the same product data, teams across your organization can collaborate and make data-driven decisions. To learn more, head to statsig.com/pragmatic.

With this, let's get back to the episode and to the topic of general versus purpose-made tools. I mean, I spent a lot of my youth on construction sites to earn money, and you don't use a hammer for all your problems at the construction site. You have a screwdriver, you have your hammer, you have your drill, you have whatever. And I think in engineering it's kind of the same. I'm not using the same tool for every task I do as an engineer. So now if I use an agent,

I don't want a general agent for every task per se. I want a specialized thing where I know the performance will be top notch for that specific task, because we built the harness in a way that the agent can be most effective at this task, just because of the way the harness is constructed. And that's what I wanted to enable with Pi. That said,

I'm probably the person that has the least amount of modifications in Pi. I have like two extensions that I use and they're trivial. They're basically just: if you see a URL that looks like a GitHub issue or pull request, pull down the details via the GitHub API and display a small widget on top of the editor that gives me the issue title, the author account and a link to the... That's basically all I do. Well, but it might work for you as a minimalist.
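The first half of such an extension, spotting GitHub issue and pull request URLs in text, can be sketched in a few lines. This is an illustrative guess at the detection step only, not Mario's actual extension: the `GitHubRef` type and `findGitHubRefs` function are hypothetical names, and the GitHub API call and the widget rendering are left out.

```typescript
// Hypothetical helper for illustration; the real extension is not shown in the episode.
type GitHubRef = { owner: string; repo: string; kind: "issue" | "pull"; number: number };

function findGitHubRefs(text: string): GitHubRef[] {
  // Matches URLs of the form https://github.com/<owner>/<repo>/issues/<n> or .../pull/<n>.
  const pattern = /https?:\/\/github\.com\/([\w.-]+)\/([\w.-]+)\/(issues|pull)\/(\d+)/g;
  const refs: GitHubRef[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    refs.push({
      owner: match[1],
      repo: match[2],
      kind: match[3] === "issues" ? "issue" : "pull",
      number: Number(match[4]),
    });
  }
  return refs;
}
```

Each returned reference carries enough information (owner, repository, number) to look up the title and author in a later step.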

Yeah, I mean, that's how it works on the Pi Mono repository, because I might have two or three sessions open in which I process an issue or pull request. That way I remember what the session was about. But it sounds like you also made your Pi specifically for working on the Pi Mono repo, and if you went back to building games, you'd probably...

I never thought of the fact that you might want a different harness for a different task. I guess we just kind of assume that, for most developers, you work on your main thing at work. You might have a side project and just experiment with whatever. But I wonder if this is a new thing that we could never have. We could never have custom tools for a project. That just sounds crazy, you know.

Here's the thing, my intuition is this: I think where we are going is software that modifies itself on behalf of the user's wishes and needs. And the agents can do that now if you give them enough rope to modify. And I think Pi is my first foray into this kind of self-modifiable, malleable thing, just for the coding agent sector, but I think this actually can be extended to all kinds of knowledge work.

So I agree. For specific tasks within the broader set of knowledge work, obviously, dehumanization and so on, you know. But yeah, the next plan here is actually to have an alternative user interface to the TUI, because the TUI is obviously limited. And the best

alternative stack is obviously the web, because it works everywhere and can do anything. So once I have that built out, then it really becomes interesting, because then you're not limited anymore to the line-based rendering of a terminal. Now you can do really, really interesting stuff. And so yeah, we'll see how that works out. And one reason that I learned about Pi, before I knew that it was this minimalist interface, is how OpenClaw is using Pi. How did that come about?

In October I started building out Pi and Peter started building out Warelay, his little WhatsApp assistant, so to speak. Oh, that's how it started. Yeah. And he was in search of an agentic core he could reuse or copy. I think it started out by him taking Pi and

cloning it and calling it Tau and then modifying it, but eventually he got tired of having to maintain that. So he just said, "I'm gonna use your stuff," and that's how it ended up being. Pi wouldn't have compaction if it weren't for OpenClaw. I specifically built that because Peter was crying in the chat, "I need compaction." Okay, you get compaction. But I'm gonna tell all my users: don't use compaction. It's bad for you.

OpenClaw + Pi

Yeah, but that's I guess the beauty of building on top of open software, one another. Right. I mean, it has pros and cons, yes. I now get to enjoy all the OpenClaw instances that think bugs in OpenClaw are actually Pi bugs. So they autonomously send me a gazillion issues and pull requests, probably without their users even knowing, and I get to deal with that in my open source project. So that's a negative side effect.

Well so you're you're really on the receiving end of of this, I guess. I mean just just like OpenClaw itself is, uh which is much more exposed to this problem. I mean there are tens of thousands of issues now and there's no way they can get a good uh grip on that. But but how are you dealing with the fact that you now have OpenClaw, just AI autonomously opening uh things on your repo as as a maintainer? And do you build tools to battle this and try to close them out?

I built a tool for the OpenClaw ones which embeds issues and pull requests into a 3D space so I can see the clusters of similar things that agents would have sent to the repository. And then I can bulk select things and close them out. Oh really? So you actually have a 3D visualization? For context, I think it's less crazy now, but end of December to, I think, mid-February...
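The grouping idea behind such a triage tool can be sketched without the 3D view. The following is only an assumption about how one might group similar items, not a description of Mario's tool: it takes pre-computed embedding vectors (how they are produced is out of scope here) and greedily puts each item into the first cluster whose seed vector is similar enough by cosine similarity. `cosine`, `groupSimilar` and the threshold value are hypothetical.

```typescript
// Illustrative sketch: greedy grouping of items by cosine similarity of their embeddings.
type Embedded = { id: number; vector: number[] };

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Each item joins the first cluster whose seed (its first member) is at least
// `threshold` similar; otherwise it starts a new cluster. Returns ids per cluster.
function groupSimilar(items: Embedded[], threshold: number): number[][] {
  const clusters: { seed: number[]; ids: number[] }[] = [];
  for (const item of items) {
    const home = clusters.find((c) => cosine(c.seed, item.vector) >= threshold);
    if (home) {
      home.ids.push(item.id);
    } else {
      clusters.push({ seed: item.vector, ids: [item.id] });
    }
  }
  return clusters.map((c) => c.ids);
}
```

A maintainer could then review one representative per cluster and bulk-close the rest, which is the workflow described in the conversation.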

I mean, it was exploding obviously, but this explosion almost directly translated to... I was on this repo refreshing pull requests and the number went up. Yeah. Yeah. We actually tried to contribute and help out Peter a little bit, but I immediately gave up. I didn't know how to do anything useful there. I was looking at this and I was like, this is a type of software engineering I'm just not used to.

I would fix two things and spend an hour on them, and then five minutes after I committed and pushed it, some clanker comes along and just reverts my fixes. And this is not how I... Can we talk about the name Clanker? Oh sure. So, Clone Wars? Star Wars? I actually never watched it, but kids of friends of mine watched it a lot while we were visiting them, so I kind of got the lore through osmosis.

And there is an army of robots, and the Jedi would call them clankers, or people would call them clankers, because when they move they clank, clank, clank, and yeah. That's the origin of that. Yeah. So an AI, a droid...

Yeah, exactly. Yeah. But coming back to how you deal with the influx of agentic pull requests and issues: I just auto-close every pull request. Human, agent, doesn't matter. What I do is, if I haven't had contact with you previously, my GitHub workflow knows about this, because if I had, your account name is in a file in my Git repository. So if you're not in there and you send me a pull request, your pull request gets auto-closed. Mm-hmm.

"Clankers"

And then my little workflow posts a comment on your pull request that says, "Hey, thanks so much for contributing, really appreciate it. Could you please open an issue in a human voice, no longer than one screen's worth of text?" And if I like it, I type "looks good to me", and then that account name gets put into the file. And the next time they send a pull request, they pass. And it turns out agents don't see the comment my GitHub workflow posts underneath the pull request.
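The gate described here boils down to a small decision function plus an allowlist update. The sketch below models only that logic; it is not Mario's actual GitHub workflow. The file format (one account name per line, `#` for comments), the exact approval phrase matching, and all function names are assumptions made for illustration.

```typescript
// Hypothetical model of the allowlist gate; not the real workflow from the Pi repository.
type GateDecision = { action: "pass" } | { action: "close"; comment: string };

const CLOSE_COMMENT =
  "Thanks so much for contributing! Could you please open an issue in a human voice, " +
  "no longer than one screen's worth of text?";

// Assumed file format: one account name per line, blank lines and '#' comments ignored.
function parseApprovedFile(contents: string): Set<string> {
  const names = contents
    .split("\n")
    .map((line) => line.trim().toLowerCase())
    .filter((line) => line.length > 0 && !line.startsWith("#"));
  return new Set(names);
}

// Pull requests from accounts that are not on the list get closed with a comment.
function gatePullRequest(author: string, approved: Set<string>): GateDecision {
  return approved.has(author.toLowerCase())
    ? { action: "pass" }
    : { action: "close", comment: CLOSE_COMMENT };
}

// When the maintainer replies "looks good to me" on an issue, its author is added.
function applyMaintainerComment(issueAuthor: string, body: string, approved: Set<string>): Set<string> {
  const next = new Set(approved);
  if (body.trim().toLowerCase() === "looks good to me") {
    next.add(issueAuthor.toLowerCase());
  }
  return next;
}
```

The effect is a deliberate bottleneck: a first-time contributor has to read the comment and write a short issue before any pull request from that account is looked at.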

So this is a great filter for filtering out agents and keeping the humans safe, more or less, from... This is interesting. I wonder if this might be an unavoidable future, where we just need a way to separate: is this coming from a human with an intent, or an AI? I don't necessarily care. If it were actually a good PR, then if it came from a machine, it's actually

fine-ish. I think what's interesting in Pi, and OpenClaw even more so, is that it accumulates pull requests where actually there was no intentionality behind it at all. And so the person that dispatched the machine didn't actually care that much. Or didn't even know about it. Or didn't even know about it. And I've done open source for many years, and there was also...

There was a big difference with someone who's sending a pull request or an issue and was like, "Hey, please fix this," but actually didn't care enough to even reply to questions anymore. This is not uncommon. And then you don't actually have to fix that.

But you have to close it out because like maybe it's it's still useful input, but like it clearly that person wasn't caring enough. And with the pull requests, it's even worse now because they come in so quickly that many of them cannot be merged anyways without manual resolution of the conflict. And there's a there's a lack of back pressure mechanism. Because even I as a human, if I see there's like

500 pull requests open. I was like, I probably will not contribute to this thing now because at worst I will make it worse. Yep. And and I I think previously in open source you had the people who would just send issues and be very entitled and say you're the worst person on the planet if you don't fix my little issue. But that's fine, that can be handled. And pull requests were kind of special because it needed a human to invest quite a bit of time to produce them.

You don't have that anymore. You just have people going, "Oh, this should be easy. Agent, please do a thing, make no mistakes, send it to this repository," and that's just not going to happen. So basically what we need are bottlenecks. I don't necessarily need human verification, or verification that you're human. I just need a bottleneck that allows me to process the amount of

incoming things as a human. Because in order for Pi to not deteriorate into a pile of garbage, I still believe that it needs me and other capable people reviewing at least the important code. And for that I need bottlenecks, because otherwise I can't deal with... It's the second law of thermodynamics, right? Everything degrades towards chaos, and you have to put extra energy in to keep it away from this outcome. And we don't see and feel

the pain of the code base anymore if we stop looking at it. And people don't feel the pain or like they feel no restraint anymore and and it's it's Th the issues are also interesting because on the one hand it is something great about someone doing an investigation and sending you a description of that. That can be good and can be bad, but it looks

very similar. And it takes quite a bit of energy to tell apart a good and a bad AI-generated issue request. And unfortunately, most of them are not great. But some of them are actually good. And that's also kind of weird. All of it is weird. I really don't know what the future of open source is in many ways, because

a lot of open source really worked because people piled onto hard problems, and so they congregated around them and said, "Now we need to have a good database. So we're going to put all this energy into building a good database." The value of open source came from: there are some hard problems and we're going to throw our energy together and try to figure out how to solve them.

And now it feels like open source is all about throwing stuff up. What really makes me so mad is that a lot of agentic engineering right now is building more stuff for agentic engineering. So it's an ouroboros, or whatever you call it.

And I see this tweet and it's like, "Oh, I solved problem XYZ and here is my solution for it." And you click on this thing and it's 48 hours old. That person probably never used the thing that they built. I would like to suggest to the viewership to look at Armin's GitHub account over the last year and what happened there. Yeah, I built a lot of stuff, but I don't then go on Twitter and say, "Hey, I solved the problem."

Right. I have a shit ton of vibe slop on my GitHub account and I wish I could mark it differently, because maybe there's some utility in it. But unless that code base is actually still going to be there a year, a year and a half from now and someone is still using it, the utility of that is actually not validated in a way. And there are so many markers and metrics you can look at now for GitHub that really demonstrate this explosive growth.

But if you were to then maybe find some other number to see how many of the things that are being created are actually turning into really fundamental pieces that can sustain open source communities, that can actually deliver this value that scales... Amazingly, we haven't actually created many vibe-engineered projects that have become that. Good night.

I like how you mentioned energy and how open source always worked. If we just think pre-AI, again, let's say Linux, the most successful or most widely used open source project. It has both an energy and a structure. You know, people come in with intent that they want to add something. They have a process where it goes through. There's human trust at every level. There's a little pyramid. And in the end, it all goes back. A change request goes up one level. And in the end, Linus does the cut.

But there's a lot of energy, there's a lot of intent. There's a lot of humans and it was always about human energy. And now we suddenly have this AI which, it's just tokens. Right now, who knows how much they're subsidized or not, or it's just machines doing it, and suddenly they create plausible things that look like human energy and it's hard to differentiate, and it suddenly just like throws this wrench.

Open source and AI

I don't think a lot has changed for open source. Okay. Um, no, yes, uh, but that's just a number. As you said, the amount of actually useful and maintained projects has probably not changed a lot. So you're saying that the ones that were there, they're so useful, I mean

Not even the ones that were there. I mean there's a specific rate of new open source projects that survive longer than two weeks. Mm-hmm. That's always been the case, right? Mm-hmm. So now we just have more projects that die after two days than before. But we still have the same amount of projects that will have a long term viability just because there are humans that actually care to maintain the thing over a long time, build a community of humans that support the entire thing.

Build an ecosystem around the entire open source project. You're saying you're not a believer in tumults. No, I mean...

Good job, Meta, buying that up. Super useful. Um, no, but I think at the end of the day we are kind of freaking out when we don't actually need to, because apart from the fact that I personally can now generate code at the speed of light, for me building an open source project, and that entails not just the code, but the community around it, the spirit around it, the ecosystem around it, nothing changed.

Um, what changed is mechanical parts. I need the bottlenecks to deal with the influx of exponentially growing agent pull requests, whatever. GitHub itself is under immense pressure because now it's not just humans hammering their infra, it's now billions or millions of OpenClaw instances hammering their infra. Yeah. Everybody complains about GitHub going down. I actually think they're doing a pretty good job. Like that's a lot of traffic that's coming their way.

Basically Christmas. It's basically OpenClaw. So yeah, I would be a little bit more optimistic. We're just in the messing-around-and-finding-out stage at the moment, and everybody wants tokens to be a KPI, just like lines of code used to be a KPI. We've seen this.

Speaking of things that don't change and messing around and finding out, you wrote a tweet, or you wrote somewhere, that your biggest enemy is complexity. It's also your agent's biggest enemy. Can we talk about that? Very simple. If I have a six hundred lines of code code base, and my agent can at best be effective up to a context window size of around two hundred thousand tokens, how much of the code can the agent see? A third, right?

Right. Um, if you manage to get all the relevant code for a task into that context window, you're probably okay. Although that is a separate problem, an information retrieval problem, which is not solved and which agentic search also doesn't solve. That is: are you sure that the agent finds all the relevant code it needs to find to fulfill a thing? That's also where all the garbage code comes from, because it doesn't see all the things.

Complexity as the enemy

In this case, let's assume the best case, information retrieval is solved, everything fits into the context, agent does a good job. That's not the reality we're living in because now the agents spit out so much code that they themselves cannot possibly read into their context on a new task anymore. You know what what I mean? Mm-hmm. Yep. They develop their own context window. Yeah.

Exactly. The complexity they add is their own worst enemy because eventually the code base will be so big and so complicated and so interconnected that the agent has absolutely no way on a technical level to ingest all the context it needs to do the new task. And I would like to point out that the agent has learned all of this garbage from the internet and from us, because on the internet there's all our old code. While there are some pearls, there's also a lot of swine.

Um, because we have a gazillion GitHub projects from the olden days where we just tried out things. And because instances like Linux or any other really well maintained and well written open source project are minuscule compared to all the rest of the garbage. And a machine learning model will kind of converge towards, well, simplified, the mean, right? And what is the mean then? It's not the handful, comparatively, of excellently engineered projects.

It's all the garbage on the internet, all the cargo culting, all the trend-of-the-day kinda stuff, and that's what we get when we let the agent do all the things for us. Yeah. So we have this problem of the things are getting more complex, which slows agents down, which will in fact impact quality, which we were just talking about. But Armin, now that you're building your own startup.

Well, you too, you're building your startup now. Uh, how are you, and you're working with agents, right? And they will have these things. How are you dealing with generating code, building products, balancing quality, tech debt, complexity? Oh, we're dealing with that? We're coping, we're not dealing.

I don't know if I wrote this in the blog. I definitely have it on my slides for for the for a conference here. It was Um I enjoyed the time from from April to about October immensely because it felt like I can do so much, but also like there was no heightened expectation. Like the world has not yet gotten used to this idea that everything has to now also move at ten times the speed.

Building an AI-native startup

And there was a moment of time where I felt like, we worked on this VibeTunnel thing in the beginning and it felt so much fun because like I have time now to play with the kids and I just prompted a little bit on my phone and like it felt... VibeTunnel was where you could set up with your phone talking with your... Yeah, it was just like a... More wasn't strategic.

Terminal basically. Yeah. And it's not that we did much with it, but like it had this happy vibe and like I know that I spent too much time on the computer, but I didn't feel any pressure. But now it's like we're collectively feeling like everything has to ship faster, it has to iterate faster. Like the baseline that we want to achieve in terms of fidelity and everything has to be higher. And so now it feels very stressful.

E even in your own startup. To some degree you cannot you can be the most stoic person in the world and it's still going to get at you in a way that I'm slowly learning to m work with my own emotions in a way on on like on dealing with this. But it's I I find it very, very hard in a way to

Because like I was used to things working in a certain way and I knew how I do some stuff, and then I fell a little bit too much in the trap of like giving in to the machine and actually doing things in a way that I normally wouldn't have done things. Regret. It's definitely agentic regret. Agentic regret.

And so like, quite frankly the answer is like I feel like now, with a little bit of power of hindsight, I learned some things that I wish I would have learned probably in November. Well, I mean like a lot of it is really the recognition that there is no back channel to me or to any other engineer, when under normal circumstances there was a back channel. There was this feeling of like things are

not quite right in the code base. Like there was this, now the change is harder, and like you sort of see then the complexity of the pull request getting higher, but like if you rubber-stamp it, then like what's the back channel there? And so like this mechanism, this back pressure, this friction in the code base, you don't feel when you work with the agent.

I think there's a way to kind of measure it. Like if I scan through my sessions on a project from start to current date, I think the frequency of curse words increased. Because the agent starts messing up more because it itself cannot deal with the complexity it added to the project. And I would be actually really interested in whether this is measurable, because I feel it in most of my projects now. That occurs a lot more.
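A minimal sketch of how that measurement could look, assuming a project's sessions are available as plain-text transcripts in chronological order. The word list, the function names, and the sample sessions are all hypothetical and only illustrate the idea; this is not a feature of Pi.

```python
import re

# Hypothetical word list; extend to taste.
CURSE_WORDS = {"damn", "crap", "wtf"}


def curse_rate(session_text: str) -> float:
    """Curse words per 1,000 words in one session transcript."""
    words = re.findall(r"[a-z']+", session_text.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in CURSE_WORDS)
    return 1000 * hits / len(words)


def rates_over_time(sessions: list[str]) -> list[float]:
    """Sessions are assumed to be ordered from project start to today."""
    return [curse_rate(text) for text in sessions]


# Made-up sessions, oldest first.
sessions = [
    "please add an export button",
    "why is the export broken again damn",
    "wtf this crap worked yesterday",
]
rates = rates_over_time(sessions)
```

A rising series in `rates` would be consistent with the hunch in the conversation, though it would not prove that code base complexity is the cause.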

But you mentioned friction in the software. You didn't say tech debt, you didn't say complexity. What is this friction? Because I don't remember us talking about this pre-AI at all. So I found this ironically kinda funny, and it's kinda sad, but so I will not name any names but

Uh, there was what I assumed was an incident related at least in parts to agentic engineering at a company where they shipped out a configuration change that ultimately resulted in a security issue. And look, things happen. But the link that I saw on this

had the social preview of that company's tagline and the tagline was ship without friction. And that really gave me pause because like, I know as an engineer, like we used to talk about like you gotta get rid of all the things in the way so that you feel happy shipping stuff. But there always were changes where you really wanted to think. It's like, do you wanna drop the database? Like, do you wanna merge this migration which might take a table lock

that could potentially take you down. Right. It's like there's these moments every once in a while where you were really supposed to think, and people created checklists or people created like mechanical gates where you would have to confirm something. Like there's certain things that we used to put in, particularly if you run a SaaS company, you put stuff in to slow things down or

And in in some of the best engineering teams, in order to mature a service, you have to define an SLO. You have to define um like ex yeah, expectations and like if if your service is supposed to be critical, but like there's some other stuff that unlocks on this sort of tree of

of requirements that you have and and and like and a lot of engineers feel like, Oh, this is also this bureaucracy. But like the reality is like if you do this correctly, then it saves you time and it like it makes you happier. You're not waking up at three o'clock in the morning. Like all of this is useful.

It's like friction injected to deliberately slow things down. I guess the easiest example: in any decent sized company, you have services tiered based on criticality. The highest tier software now needs to have, let's say, two or three code reviews or an approval from a director to do a configuration change, which again all slows down, but it's kind of like we know.
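The tiered gate described here can be sketched as a small policy check. This is only an illustration under stated assumptions: the tier numbers, the review counts, and the names `ChangeRequest` and `may_ship` are hypothetical and do not come from any particular company's tooling.

```python
from dataclasses import dataclass

# Hypothetical policy: a lower tier number means a more critical service.
REQUIRED_REVIEWS = {1: 3, 2: 2, 3: 1}


@dataclass
class ChangeRequest:
    service_tier: int
    reviews: int
    director_approved: bool = False


def may_ship(change: ChangeRequest) -> bool:
    """Return True if the change has cleared the friction for its tier."""
    needed = REQUIRED_REVIEWS.get(change.service_tier, 1)
    if change.service_tier == 1:
        # Highest tier: enough code reviews or an explicit director sign-off.
        return change.reviews >= needed or change.director_approved
    return change.reviews >= needed
```

The point of such a gate is that the cost of passing it is visible up front, so whoever proposes the change has a reason to ask whether it is worth it.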

But th this is on purpose. Like by adding this friction, we want you to think, do I want to push through this friction in terms of time invested or effort or having to justify things, et cetera? It makes you think about do I really want to add this to the code base if I know that the end effect will be that it has to go through this entire chain of order. Um so we we're coming back to saying no To avoid pain going through that process.

And and then taking on the pain when you know that you you have the convictions you you have the the backing, you have the confidence as well, right? Like so typically when it's a high friction thing, let's say a tier one service or a highest tier service where a director has to sign off. When you're a new joiner on the first day and you don't know the context, you probably know that that's a pretty large ask and you'll probably

socialize, get buy-in from an experienced engineer to say like, oh, this is the right thing, you'll go with them, right? Back to human dynamic. A little bit. I think the thing is like there's a very delicate balance in the whole thing. Because like you don't want the friction to be just an accident of having created bad developer experience, right? But some things look the same. Like

But they were deliberate, but they maybe were not sufficiently documented. But there's this feeling now like you'll get rid of all the friction so that the agent can be very autonomous, so that you can run many of them simultaneously. A lot of it comes from that agents are actually rather slow. And the only real time saving that you get from it is parallelism. And so somewhere there is this trap. I feel like I have a little bit more

experience now in managing the trap, but I don't have the solution for that either. And I will not say there is an example code base where I felt really, really great about the stuff that I built, except for pre-existing libraries from before agentic days, where I still feel a strong emotional attachment to them and I'm much more careful about doing them than any of the code that we have otherwise. And Pi, uh, to which I don't have like... Oh no, there's... Yeah.

Still no write access. Um, there's a lot of slop in Pi, but I try to avoid it in the bits and pieces where I know that's important code. Like we have an HTML export functionality where it takes the current session and just spits out an HTML file that you can then host on GitHub and whatever.

I have not looked at a single line of code for that function. I don't care if it's broken if it looks right when it comes out. But then there is the agent loop itself or the extension loading mechanism and all of that stuff, and that's important, and the way I deal with ensuring that that has, or at least trying to ensure that it has, high quality is I refactor mercilessly, because that pulls me into the code base.

I need to understand what I want to change structurally, not just line per line and syntactically or whatever. I need to understand what's going on to do a good refactor. And I'm doing that every now and then like I'm doing now at the moment, prompted by wanting to add a new feature that's currently not possible with the current architecture.

Being in the code is the one thing that keeps the code base quality high and the complexity low. But that's against the industry wisdom of burning as many tokens as possible, token maxing, basically. Yeah. That's an interesting one happening. But you just recently wrote on the same theme a blog post called We All Need to Slow the F Down. Can we rehash some of the thinking and what triggered you to, let's just put it out there?

Okay. So basically it's this: okay, your agent can spit out ten times more code a day than you can. But it also means it spits out ten times more boo-boos, errors. Even if it has half your error rate, then okay, it's not ten times more, it's five times more. It's still more than you would spit out. So the rate of deterioration in your codebase has now increased. And now go dark factory. Now take a hundred agents that do this to your codebase. What's the end result of that?
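The arithmetic in this argument can be written out as a small sketch. The baseline output and error rate below are placeholder numbers chosen for illustration, not measurements; only the ratios matter.

```python
def errors_per_day(loc_per_day: float, errors_per_loc: float) -> float:
    """Expected number of errors introduced per day."""
    return loc_per_day * errors_per_loc


# Placeholder baseline for one human developer (assumed values).
HUMAN_LOC = 1_000
HUMAN_ERROR_RATE = 0.01  # errors per line of code

human = errors_per_day(HUMAN_LOC, HUMAN_ERROR_RATE)

# One agent producing ten times the code at the same error rate.
agent_same_rate = errors_per_day(10 * HUMAN_LOC, HUMAN_ERROR_RATE)

# One agent producing ten times the code at half the error rate.
agent_half_rate = errors_per_day(10 * HUMAN_LOC, HUMAN_ERROR_RATE / 2)

# A hundred such agents working on the same code base.
hundred_agents = 100 * agent_half_rate

ratios = {
    "one agent, same error rate": agent_same_rate / human,
    "one agent, half error rate": agent_half_rate / human,
    "hundred agents, half error rate": hundred_agents / human,
}
```

Under these assumptions the ratios come out at roughly ten, five, and five hundred times the human baseline, which is the "very simple math" the argument rests on.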

So that's the first problem, right? And you need some way to review all of that code that now gets generated, to fix all the boo-boos. But you can't as a human, because as a human you're used to spitting out 1.5k LOC a day and that's about the limit that you can actually

"Slow the F down"

review well, right? If your agent spits out ten times that, no chance you can review that. And not all of the code by the agent might be important, like the HTML export thing, right? But even if the agent spits out three to five K a day, you have no way of reviewing that in any meaningful sense. And then if you do the armies?

Yeah. I mean And then the armies, this is interesting. So you call it the dark factory. The idea being that tens or hundreds or thousands of agents you give them a spec. they go and they break it up, they they organize themselves, they like the mayor and all all that jazz. They have the qual the QA agent, they have the you know, you you give them roles, you give them context and then

You give them enormous amounts of tokens and spend. And the idea is, or the hope is, that, oop, your software will be done in... Definitely something's gonna be done. First your purse and then if... Yeah.

Uh no, yeah, sure. More power to the people that make that work. I can't make it work. And the reason I think I can't make it work is because I still care about the quality of of my product. And I don't care if it's built by by hand or by agent. I just want the quality to be good, both in terms of How easy it is to maintain it and add new stuff to it on a developer side and on the user side.

All the companies claiming that all of the code is now written by by agents, yes we know. Quality is garbage. We feel it in our bones when we use your product. It's garbage. Um, so I don't want that. And yeah, basically I I think people need to turn around and say, Hey, what what are we even doing here? Um we have these wonderful machines now that can take away so much pain from us by doing the stuff we hate doing and doing that really well.

Why don't we start by freeing up some more time to work on the interesting bits and delegating the stuff we know they can do to them, at large, like across the entire organization. Find all the things that annoy the out of you and have the agents automate that for you. And then you suddenly have time to think about what do we actually want to build? What do our users need?

And if we decide to build a thing, then we can pull in the agents again and say, and we're going to polish the sh out of that. Because now we have the time and the means and the tools to do an excellent job. But that's not how we are working. We build an army of agents and install Beads and make a big spec

that hopefully will result in something basic. But here's the thing, we talked about where did the agents learn their knowledge from, right? The internet. So garbage to mediocre. Now if you write a spec, what's the best possible spec you can have? The best possible spec is, well, you define exactly how it should work, you give it test cases. The best possible spec is the software itself. Oh, I see what you mean. Yes. Okay.

Okay, you write a spec that's not the software itself. So that means there's a lot of blanks that need filling in. Yes. How do you think the agent is gonna fill those blanks in? Most likely from stuff it's seen in its training data. Yeah. And we already identified what the quality of that training data is, right? Garbage to mediocre. Well, and even before AI, don't forget, like

Stack Overflow had a really big criticism because there was this thing of like, Well, you Control-C Control-V from Stack Overflow, and oftentimes there will be some answers where the first answer was either not correct or not correct in many cases. Regex for email was a good one. You googled regex for email, first page was Stack Overflow, everyone just copied the first solution. And I think underneath, number three said it missed a bunch of cases.

But here's the thing though, I'm not saying agents or humans are better. They are clearly not. But agents also don't solve that problem. And if you then let not just one agent, that's already ten times more productive than you, do the thing that it's bad at and that you as a human are bad at, but a hundred of those, what do you think is the outcome? It's just very simple math. Let's talk about another controversial topic, MCP versus CLI. Oh my god.

It's coming up and you know right now I'm hearing a lot of people really going for CLI is the future, and I think I'm sitting with two of them. But also MCPs are really popular inside of large companies, especially when you talk with a bunch of people working at large companies. It seems MCPs have found a real product market fit inside of larger enterprises. Despite what people might think, I don't actually hate MCP quite as much. Oh wait, do we have it on recording?

Yeah, no, we don't deal in absolutes. We're no Sith. So my fundamental challenge with MCP is that, I think, first of all the spec is very complex for it, but this is just generally how specs happen to be. So it's a bit the CORBA of its time. So there's an inherent complexity in it. But if you were to say like, okay, so what is it really doing at the end of the day?

MCPs vs. CLI

it's authentication and it's sort of invoking some stuff, and even though theoretically there's structured responses, MCP for the most part is: run some stuff, put stuff back in the context and then work with it. So it fills your context very quickly. And Cloudflare has this code mode MCP, which I think in principle I really like. I have an MCP, um, for testing which

is a JavaScript interpreter that gives me access to the Google API. And between an MCP like this and a skill, there's not a huge difference because the skill also needs to be in a system prompt that defines it. But the agents are just very, very, very, very good at running code. And MCP is not quite running code. It's basically RAG.

It's like input in and do some stuff and maybe some state transition that the model also doesn't see. But it is in that sense just, it's a hard problem to solve, but it does solve auth, it solves a whole bunch of things. Um, I want it to work. I just still don't get it to work. Like I wish it could work and

My suspicion is still the glue has to be code execution. But because MCP servers are largely not defined in a way that the model actually understands them, I haven't found ways to compose MCP tools reliably. I found ways to make the MCP itself be composable, by having the MCP be one tool, run code, but I haven't found ways to then orchestrate larger ones. I want it to work. And I think it has found its niche and I don't think it's going to go away.

I think it's just a victim of its own success really. When the whole thing started, I think it was in October twenty twenty four, it was more or less a solution to get external services into a consumer facing chat app. Yeah. Connect your emails, connect your OneDrive, connect your whatever. And then the IDEs also took it over because it was convenient. Yeah. The Cursors, the Windsurfs. Yeah. But I think the origin was basically the consumer side, not the developer side.

And I think that's a totally great use case. I don't want my mom having to mess around with code generation or whatever to invoke some API or call some API and so on. So perfectly fine use case. And then developers had also picked it up and thought, Oh, this is a great way to provide tools to my LLM. Tools as in, in the system prompt somewhere there is: if you wanna call this tool, provide this JSON payload, and you get this thing back, right?

And that kind of felt right at the time, because if you read Anthropic's documentation, um, they would say our models can deal with about thirty to forty tools in the context. And even that wasn't... twelve, twenty, breakdown, but doesn't matter. Um, but there was still like a, yeah, this can work if you kinda keep it small and contained and very specific to your use case. And then people started building MCP servers that would just basically map an entire OpenAPI spec into a gazillion tools.

Yep. And that's where it all fell apart. So that's the first problem. Very bad MCP servers from big corporations that thought, we need this now. What's the fastest thing we can build? I just push the OpenAPI spec of our APIs through this thing and make it an MCP. That's garbage.

The second problem is that it's inherently non-composable. Um, if you want to combine the MCP tool outputs of two different servers, they need to go through the context. The model itself needs to do the data transformation, the, yeah, the composition of the thing. And compare this with a CLI: it's a pipe. Right? Exactly. The model only sees the end result and it is super free in how it

massages that data. And that's also the idea behind code mode. Basically, it's a hack. It's basically: okay, we now have MCP, we know it doesn't work for this specific use case. We will have multiple sources of data and you want to combine them, but don't pull that through the context.

So let's build code mode. And code mode is basically: we take all the MCP servers, we expose them as functions in TypeScript, and then the model can actually just write some code that calls the MCP servers and then does the composition in the code. It's like, how many interactions do we want here?
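The difference between composing through the context and composing in code can be sketched roughly as follows. This is not Cloudflare's implementation (which, as described above, exposes tools as TypeScript functions); it is a Python illustration, and the two tool functions and their data are hypothetical stand-ins for MCP servers.

```python
import json


# Hypothetical stand-ins for two tool servers; real ones would make network calls.
def list_open_issues() -> list[dict]:
    return [{"id": i, "assignee": f"user{i % 3}"} for i in range(300)]


def list_team_members() -> list[dict]:
    return [
        {"login": "user0", "team": "core"},
        {"login": "user1", "team": "docs"},
        {"login": "user2", "team": "core"},
    ]


def via_context() -> int:
    """Tool-call style: both raw outputs are placed in the model's context.

    Returns the number of characters the model would have to read.
    """
    return len(json.dumps(list_open_issues())) + len(json.dumps(list_team_members()))


def via_code() -> tuple[dict[str, int], int]:
    """Code-execution style: join the data in code, return only the end result."""
    team_of = {member["login"]: member["team"] for member in list_team_members()}
    counts: dict[str, int] = {}
    for issue in list_open_issues():
        team = team_of[issue["assignee"]]
        counts[team] = counts.get(team, 0) + 1
    return counts, len(json.dumps(counts))


counts, result_size = via_code()
raw_size = via_context()
```

Here `result_size` is a small fraction of `raw_size`, which mirrors the pipe analogy: the intermediate data never passes through the model, only the final answer does.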

We can just let the model write the code. We don't need the MCP server. And then the third part is, David from Sentry is a big proponent of MCP because of the auth thing. And honestly that's again for me super valid. But the model itself kinda doesn't make sense anymore. I think that there's a world for MCP two. Which is ironically maybe based more on, I said there's a company called Stainless, which basically generates SDKs out of an OpenAPI spec.

And I'm really warming up to the idea of like maybe it is an MCP that's entirely based on auth plus, uh, like, uh, I agree with you, libraries, or like directly HTTP requests against OpenAPI specs, because if you compose it together there.

And I think like one of the things that's also kind of underappreciated, and the sort of SCC, I think if you see Pi do its stuff, because it's kinda transparent about the tool calls that it does, it's kind of magical at times how creative agents get at large output.

Like for instance Pi, when it runs a program in bash and it produces too many lines of code, it actually only reads, I don't know what the cutoff is, but it reads the first couple and it's like, oh, if you want the rest of the file, it's 20 megabytes large and it's in this file. And then the agent is like, oh, 20 megabytes, that's too much. I'm going to grep on the file.
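The behavior described here can be sketched like this. It is not Pi's actual code: the cutoff value, the function name `truncate_output`, and the wording of the notice are assumptions made for illustration, since the conversation does not give the real ones.

```python
import tempfile

MAX_LINES = 50  # hypothetical cutoff; the real value is not stated in the conversation


def truncate_output(output: str, max_lines: int = MAX_LINES) -> str:
    """Return tool output unchanged if short, else a head plus a pointer to a file."""
    lines = output.splitlines()
    if len(lines) <= max_lines:
        return output
    # Keep the full output on disk so the agent can grep or page through it later.
    with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as handle:
        handle.write(output)
        path = handle.name
    head = "\n".join(lines[:max_lines])
    return (
        f"{head}\n"
        f"[output truncated: {len(lines)} lines, {len(output)} characters; "
        f"full output saved to {path}]"
    )
```

The notice gives the agent the size and location of the full output, so it can decide for itself whether to read more or search the file instead.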

Right. And they get really ingenious in how they are interacting with it. And like an MCP takes that away. The question is like how would you define MCP in a way where it wouldn't take that away? Where it still has all of that magic and capability, and I don't really know the answer because I think it's hard, but auth needs solving and composability needs solving and

And I think there's a bright future of that kind of stuff. And also like what Mario said, if coding agents wouldn't have become so popular, then the idea of code generation and code running for non-code-related problems probably wouldn't have taken off quite as much. Um, but like the most capable personal agents, OpenClaw being a good example of it, they're just coding agents

hidden from you. And then that just naturally some random person who is not a programmer is going to say, how am I going to do this? And the model doesn't say like install this MCP. The model says like, okay, I can write a Python script that does it. And so you naturally have this in the sort of the crazy space, you have the adoption of of more code execution and in the compliant enterprise space, you you don't have that.

And I personally don't think that models are going anywhere else other than code generation going forward for any kind of agentic task. I think that's mostly a function of there being a lot of training data for code generation. And code generation being a very easy means to control computers. So I don't see a different paradigm there coming out of the model labs anytime soon.

So I think taking that as the assumption where the future is going, we just need to figure out how to make code generation kind of work within an enterprise setting with auth and all of the other enterprisey things that entails. So so let's do a fun Trying to predict a year out, which is hard, but in in twenty twenty seven, knowing some of these basics, just again from first principles.

where do you think these coding agents might be and the software engineering workflow might be? You know, basically this is just like again, speculation. We know we cannot predict the future, but Where do you think that there'll be a lot of focus in the coming year and we might in an optimistic case see some results in in tools and how we work and what's working, what's not working? I have no idea.

I honestly have no idea. I could make up something that's probably not gonna happen. I think the self-malleability thing is obviously something I believe in. I think we will see more of that. Self-mutable software. Yeah. Yeah. Including the tools themselves with which we build the software. And I think that will expand, not only to the tech sector but also to non-tech applications of agentic, uh

Predictions and staying up to date

Is it dog years where you times seven? Is that how it works? So that's basically the model I have right now of how this stuff works. It's like when you ask me what's going to be in a year, it's like seven years. Right. And to me that makes it incredibly hard to have any sort of predictions about the future because like it's still not one year. Maybe now it's one year from people starting to use Claude Code, but it feels like it is much, much longer. Much

more more time behind. And more time has passed. And and I think like right now the the closest that I can imagine is going to be like we we we know that code execution and code generation and like this harness thing around it. This is

this is going to be it because reinforcement learning gets more of that data. And my my my strong hypothesis is that as more and more people are starting to wake up to this, you can do interesting things with agents, there will be a societal recognition also of how much more dependent you are on basically two companies. And I think we'll have a conversation about that part.

Uh w we should have a conversation about that part, particularly as Europeans, um, because we don't really have these labs over here. And so I hope we have that conversation. But like my my best guess is that we'll wake up to the fact that we are now I mean, engineering teams already now telling me that they have code bases that they think they couldn't maintain anymore without a machine. My my guess is that one of those companies will be public and all of

And it will be expensive. And I think that might actually dominate, or at least become a conversation that's much bigger than the question of are you using Pi or using Claude Code or something like this. I also see, we've seen this with, was it Mrs.? The new Claude model. Oh no, SPOT. The new GPT model. They will only give this to select partners. So now we are seeing a split in who can get the best intelligence.

Or the perceived best intelligence. Yeah. That'll be interesting dynamics. So both of you are working on popular AI tools. You're building a startup where of course you're using AI, and it's also around agents. How do you both keep up to date? I've just seen things, and it's not as easy to get me on a hype train as it used to be. But that comes with age. It's definitely easier not being in San Francisco, because I think that would just

drive me crazy. I hear so many things from my peers over there, and it's just like, yeah, I'm not gonna go to San Francisco. So having a peaceful environment around you where it's not all about tech might be helpful. Yeah. It helps just going outside, climbing trees, going ice skating, and then looking back at what you did just half an hour ago and being like, why would I do that? That's just stupid.

I mean, to the detriment maybe of people that are trying to stay in contact with me, I got very good at muting notifications, not reading emails, and that has in part become necessary, I think, over the last year or so. But it actually turns out that the passage of time sometimes clarifies stuff a lot, because if it's really necessary it's going to reach you again. I have an unhealthy Twitter addiction, which I'm not particularly proud of.

But in terms of a source of interesting things, that is still a thing. But I try to now consume it in the form of: if it's really, really important, it will stay in the discourse for quite a while, and I just wait it out. And if it's there until like three weeks after it originally happened, then yeah, there's probably something to it.

And I don't need this three-week head start necessarily. But honestly, it's really hard. It is really hard to deal with this because there's a genuine excitement in it. And I feel like my more than twenty years of experience in that space of software engineering tells me a lot of stuff.

But at the same time, it hits you in certain ways where you felt like there will be grounding and there will be something to build on and a strong foundation. And now it feels like, well, seemingly everybody else doesn't care about that foundation anymore. So maybe you don't need the foundation. And for quite a while, it sort of works. And that is sort of weird. And

I kind of feel like since we've been funemployed in 2025 when all this started, we had like a head start. I see all the excitement the two of us and Peter had in April last year. Nobody else at the time shared that excitement that much. And then the Christmas break came. And now everybody else has that excitement that we had in April, right? So now they are in their groove, now they are catnipping themselves to

immeasurable amounts of lost sleep and terrible code bases. And I think it will self-correct because it's not sustainable. Yeah. We did see this as well. I did a deep dive in The Pragmatic Engineer in early March, when a lot of people who were very excited in January about the new models and what they can do went all in at work or on side projects. In about two months' time a lot of them were like

Hang on. It introduced all this complexity. It has these things. I'm not going as fast as I thought I would be, etc. So I guess there's just an actual thing with anything new, right? A job, anything: you have a honeymoon period where you've got the blinders on, which you should, by the way. And then you start to realize and maybe overcorrect. But there's a natural thing where, in general, it just takes time to see the outcome of your decisions. Yeah.

So I'm not worried about all the "dark factory" and "software is dead" and "SaaS is dead" and all that. I generally believe this is just part of the hype machine and that it will self-correct. Yeah. As closing, what's a book that you would recommend and why? Code by Petzold. Classic. I just love it. It's just such a great read. It's also for non-techies, and it's the first thing I recommend if anybody asks me, what's your job? I'm pointing at that and psych.

It has much less to do with computers than you think. And I recently read Breakneck, which I unfortunately forgot the author of. It sort of goes a little bit into an exploration of how China works and how maybe Europe and the US are different. And I found it at least thought-provoking. Well, Mario and Armin, thanks a lot for this conversation. It was great to have it in person. Thanks for having us.

This was a really fun conversation, thanks to Mario and Armin. The idea of self-modifiable software really grew on me. Mario said how Pi doesn't have MCP support, plan mode, and many other features that devs would want from it, but you can build them into its own code. So far, it's working. Pi is popular because it modifies itself. I wonder if and when this concept of self-modifying software thanks to AI will spread outside of just dev tools.

I also liked how we talked about the observation that agents don't feel pain, but humans do. When a code base gets too complex, the human engineer feels the issues this creates. And this tech debt is what pushes refactors and rewrites. But agents simply do not do this. They just keep adding to the complexity. And in a code base where devs regularly feel the pain of the code base and do something about it, the quality will probably also be better.

And finally, the MCP versus CLI discussion. This was a good one. MCP is more about offering tools for AI through context. And CLIs allow piping one tool after the other. Both Mario and Armin are more fans of the CLI, but in all fairness, MCP has its use cases, for example, inside larger companies. The right tool for the right job.

Do check out the show notes below for related Pragmatic Engineer deep dives that go even deeper into related topics. If you've enjoyed the podcast, please do subscribe on your favorite podcast platform and on YouTube. A special thank you if you also leave a rating for the show. Thanks and see you in the next video.

This transcript was generated by Metacast using AI and may contain inaccuracies.