Scott & Mark Learn To...Vibe Coding, for Real (Again)

Mar 25, 2026 | 25 min | Season 1, Ep. 34

Summary

Scott Hanselman and Mark Russinovich delve into Mark's experience implementing a shared-memory transport for gRPC across Go and .NET using AI models. They explore the significant productivity gains that reduce months of work to days, alongside the frequent frustrations of AI agents hallucinating, misinterpreting instructions, and requiring intensive human guidance. The episode also touches on solving other hard engineering problems, like a panorama screenshot stitcher, and discusses the future of developer tooling.

Episode description

In this episode, Scott Hanselman and Mark Russinovich dive into the realities of building complex software with AI coding agents. Mark shares his experience using modern models to implement a shared-memory transport for gRPC across Go and .NET, explaining how AI dramatically accelerated development while still requiring constant oversight. They discuss the surprising strengths and limitations of AI coding tools, as well as the massive productivity gains that make the frustration worthwhile. The conversation also explores the challenges of solving hard engineering problems, including an attempt to build a scrolling screenshot stitcher, and wraps with thoughts on the future of developer tooling and a potential live episode of the show.

Takeaways:    

  • AI coding agents can speed up complex development but still require human oversight 
  • Developers often need to guide and correct the model throughout the process 
  • Even with challenges, AI can reduce months of work to days 

 

 

Who are they?

View Scott Hanselman on LinkedIn

View Mark Russinovich on LinkedIn

Watch Scott and Mark Learn on YouTube

Listen to other episodes at scottandmarklearn.to

Discover and follow other Microsoft podcasts at microsoft.com/podcasts

Hosted on Acast. See acast.com/privacy for more information.

Transcript

Shared Memory gRPC Project Introduction

Speaking of all your tests passing, did you get that thing working? Are you allowed to talk about that? The thing you're building that is, uh... Which one? Oh, shared memory gRPC. Yeah. Yeah, it is working. Oh, sorry. That thing. I'm working on like four things, so I didn't... Well, the problem is I can never know what's secret or not. So I just kind of was like, kinda like, that thing? Yeah. Did you ship it? 'Cause I know that it was like right on the edge.

I haven't. Half the tests could have been, like, commented out. Actually, man, I've run into so many... As everybody's like, oh, you just, you know, write your spec and the AI runs away with it. First of all... There was a paper... Here we go. Yeah. There was a paper just published that looks at, um, model performance, or agent performance, when there's an AGENTS.md file. Yeah. And they found that, yeah, they incrementally follow the spec... Better if they're magic. Sorry. Sorry. I didn't do it. Yeah.

This is like the, uh, the Covid video where the kid comes into the room. Yeah. Push the kid away. Yes. Tell people what you're trying to make and what's going wrong with it. Well, so I'm trying to create this, uh... So gRPC, Google RPC, it originally stood for and maybe it still does, is an interprocess communication standard where you can pass buffers, typically protobuf-formatted data, between processes, and you generate client and server stubs. It's, you know, like MIDL, you know, old Microsoft people remember that. Except it works across distributed systems. It works over HTTP/2, on top of either TCP or, some of the languages do support Unix domain sockets.

But if you're on the same box, process talking to process, it makes no sense to go over TCP when you've got the ability to share memory and just drop data into buffers that can be copied from one side to the other. My hypothesis, my theory, is that that'll give you a big performance boost, especially for large buffer sharing, which is interesting for two cases. One is Dapr, which uses a sidecar to pass data between the app and the Dapr component in the sidecar.

And it's also interesting for Go potentially, because Go doesn't have dynamically loaded module support. But if you had modules that are effectively separate processes, with gRPC to connect them to the main code, you kind of have a dynamic loading capability, if it's high performance enough.
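The idea of dropping data into a shared buffer instead of sending it over TCP can be sketched in a few lines. This is only an illustration, not the transport discussed in the episode: the function names (`write_message`, `read_message`, `round_trip`) and the 4-byte length prefix are assumptions made for the example, and it uses Python's standard `multiprocessing.shared_memory` module rather than Go or .NET. It also has no synchronization at all; a real transport would need a ring buffer and a way for each side to signal the other.

```python
import struct
from multiprocessing import shared_memory

# Assumed framing for this sketch: a 4-byte little-endian payload length.
HEADER = struct.Struct("<I")


def write_message(buf, payload: bytes) -> None:
    """Copy a length-prefixed payload into the shared buffer."""
    if HEADER.size + len(payload) > len(buf):
        raise ValueError("payload does not fit in the shared segment")
    HEADER.pack_into(buf, 0, len(payload))
    buf[HEADER.size:HEADER.size + len(payload)] = payload


def read_message(buf) -> bytes:
    """Read the length prefix, then copy the payload back out."""
    (length,) = HEADER.unpack_from(buf, 0)
    return bytes(buf[HEADER.size:HEADER.size + length])


def round_trip(payload: bytes, segment_size: int = 64 * 1024) -> bytes:
    """Write through one mapping and read through a second mapping of the same segment."""
    writer = shared_memory.SharedMemory(create=True, size=segment_size)
    try:
        # A second process would attach the same way, using the segment name.
        reader = shared_memory.SharedMemory(name=writer.name)
        try:
            write_message(writer.buf, payload)
            return read_message(reader.buf)
        finally:
            reader.close()
    finally:
        writer.close()
        writer.unlink()
```

Here both mappings live in one process to keep the example self-contained; the point is only that the payload is copied once into the segment and once out, with no socket in between.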

gRPC Implementation Across Languages

So I started down this path and I had multiple starts and stops last year, with the models at the time just unable to get concurrency right. When Opus 4.5 came out I started it up again and made fast progress and now have... I would think you had one-shot this. I mean, you basically did. It's definitely not one shot. It was many, many, many, many dozens, hundreds maybe, of shots to get to where I am.

But I've got gRPC Go shared memory working. And then a few weeks ago I started and said, uh, it'd be nice to have two examples, because that way you can show, hey, this can be supported by multiple languages and they can interoperate together. So I started with gRPC .NET and have that working now. And by working I mean it passes tests that are the equivalent of the TCP tests. It's got end-to-end examples that are equivalent. It supports all the gRPC functionality that's relevant. And I've got benchmarks running on it. So I've got to that point. Now it's still, you know, some... Yeah, go ahead, you've got a singing question you wanna ask me.

I wanted to underst... So you did the Go one first. Did you get the Go one completely done and then port the Go one to .NET? Or did you back up and then do it idiomatically in .NET? It's a separate repo. The way gRPC is organized is there's a gRPC core which supports a bunch of languages like Python and C, and then there's language-specific implementations, like one for Go and one for .NET. Right. I started originally with C, but then I was like, why am I wasting my time here? Because I really want to support the Dapr and Go DLL scenarios. So I reset last year and started that, which also failed, and then I restarted it. And then that's what took me down this path to go to .NET next.

Okay. But my question is, as someone who said, I'm gonna make a thing that has to pass tests, it has to meet the specification, it's understood, I'm gonna do it in two languages. Do you do it in one and then go laterally and go, okay, we have a complete... Yeah. Two separate things. Yeah. But the thing is, though, you know stuff now. Yeah. Because you, I don't know. You are now infected. Was it easier in .NET? Yeah, it was easier.

In fact, I pointed it at the Go implementation of the core transport, which uses futexes on Linux and shared memory on Windows, and I just said, copy that. So I did get it started so it didn't have to recreate all that, you know, from scratch, but it could just borrow the architecture and just implement it in .NET. Okay, so...

AI Agent Frustrations and Hallucinations

But, you know, as we've shown and talked about many times on this show, I've had to micromanage it the whole way. You have such a love-hate relationship with this stuff. I think it's so funny because you are 100% acknowledging that, like, yeah, this is the future. Three times a day you're like, this is the dumbest thing ever. Is it just because you're working on harder stuff? If you would... It's with the rest of us.

That's true. I think that's the problem. If it's just a web plus database thing with a UX, then yeah, that's well-worn paths, of low complexity, that these things are really good at. These things that I'm working on are, I'd say, um, medium to high complexity. And there I just see that I'm right at the edge of these things' capabilities to begin with.

And then I'm spotting just nonsense, and I'm spotting stuff that's just shocking. Like, um, and I've shared some of them with you, that, you know, it's just like, what? Like I'm asking it. I'm saying, yeah, I just wanna confirm. So I try to be efficient with my time. So I'm not gonna go look if I can ask the model. Do all of the end-to-end examples that have been written in here, which I've originally said they need to be idiomatic Go gRPC... Right.

So I get the examples, I think they're all done. And it even has a task list that it's set out with all of them. And then I set a new context, like, go look and see if all the end-to-end examples have been implemented that are equivalent to the TCP ones. And it'll be like, oh, you know what, I see that there's fifteen TCP ones and six of them have been implemented for Go. And I'm like, the model just told me it finished all of them.
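A check like "are all fifteen TCP examples matched by a shared-memory one" does not have to go through the model at all. The sketch below is a hypothetical helper, not something from the episode: it assumes each example lives in its own subdirectory and that equivalent examples share a directory name, and the function names `example_names` and `missing_examples` are made up for illustration.

```python
from pathlib import Path


def example_names(root: Path) -> set[str]:
    """Treat each immediate subdirectory of root as one end-to-end example."""
    return {p.name for p in root.iterdir() if p.is_dir()}


def missing_examples(reference_root: Path, candidate_root: Path) -> list[str]:
    """Return reference examples with no same-named counterpart, sorted."""
    return sorted(example_names(reference_root) - example_names(candidate_root))
```

Called as `missing_examples(Path("examples/tcp"), Path("examples/shm"))` (both paths are placeholders), an empty list would be the evidence that the task list really is finished; it says nothing about whether the examples are idiomatic.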

So when that happens, like I I have found in general work, very rarely do I find it hallucinating stuff. Mostly it's just what I would call brain farts. Like I've told it forty times today, stop pushing without asking and it pushes anyway. And then you tell it why'd you do that? Oh, I just I don't know. I just pushed it. Sorry, dude. You know?

Yeah. They are brain farts like that. Oh, and one of my favorite brain fart ones was, I'm like, well, I'm asking it again to evaluate the end-to-end examples: are they all idiomatic Go? And it's like, no, they're not. I'm like, how could that be? Like, with my instructions, I asked. You had one job. This is in fresh context, so it's not like it's, you know, amnesia or whatever. And then I'm like, uh, I think you're wrong. I think they are. And it goes and, like, oh yeah, you're right, they are. Like, what? But see, then now you don't know. Now you'll never know. No, then I go look, then I'm like, give me evidence. But it's like this: how can the thing be completely autonomous if it's making mistakes like that and just, yeah...

AI Agent Benchmarking and Giving Up

And it can't... And people will tell you, well, just put a really great test harness around it and run it in a loop. No, it'll lie on the tests. Yeah. It'll put a sleep in a... Here's one for you. Here's one for you. The benchmarks? I have these sweeps of the size of the buffers in the benchmarks, starting with zero bytes and going up to 128 meg, is what I want. And I can't tell you the number of times where it's doing a whole ton of work, and I have, like, you know, benchmark, make sure no regressions, in the instructions to it. And after hours, or the next day, I'm like, okay, show me the latest benchmark results, and I go look at it and it goes up to 16 meg. I'm like, what happened to the other ones? Oh yeah, they're not in the benchmark anymore. Never mind.
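A sweep that silently stops at 16 meg is the kind of thing a small guard outside the agent's control can catch. The following is a sketch under stated assumptions: the episode only says the sweep runs from zero bytes up to 128 meg, so the power-of-two schedule, the 128 MiB interpretation, and the names `expected_sizes` and `missing_sizes` are all invented here for illustration.

```python
def expected_sizes(max_bytes: int = 128 * 1024 * 1024) -> list[int]:
    """Zero bytes, then powers of two from 1 byte up to max_bytes (assumed schedule)."""
    sizes = [0]
    size = 1
    while size <= max_bytes:
        sizes.append(size)
        size *= 2
    return sizes


def missing_sizes(reported: set[int], max_bytes: int = 128 * 1024 * 1024) -> list[int]:
    """Sizes in the expected sweep that have no benchmark result."""
    return [s for s in expected_sizes(max_bytes) if s not in reported]
```

Feeding it the set of buffer sizes that actually appear in the latest results and failing the run whenever the returned list is non-empty turns "the large sizes are not in the benchmark anymore" into a visible error rather than something discovered the next day.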

And then occasionally I'll catch it when I'm watching it process. It'll be like, hmm, there seems to be a hang at 32 meg. Let's just leave that out. And so I'm like, that's what's happened. And it's happened like probably a dozen, half a dozen times, where it just takes the 32 to 128 meg tests out of the benchmark. It just happened again today. And so how do you think it's doing that?

Because it's like, this is failing on this, and so we don't need to... That's not what I'm being asked to do. I'm not being asked to fix things. I'm being asked to benchmark things. And so it just leaves it out. There's something to be talked about, and I wonder if there's been papers written on this, about when it chooses to give up. Mine's been telling me, Opus 4.6 has been saying, you should just go to bed. Like you're... Oh yeah. Actually somebody posted on this on Threads today. Really? Yeah.

And then the guy, the next morning he comes back and says, how'd it go? And he's like, well, all the agents crashed. I can restart that for you. Ah, my goodness. I, uh, yeah, having it go and just say you should go to bed. I'm like, it's 10:30, dude. We're doing this. Like, we're getting this done. He doesn't know what time it is.

And then it'll then it'll say it probably ran a PowerShell script, but then it'll say, Okay, I'm re-energized. Let's let's attack this thing with with fresh eyes, you know. Yeah.

AI Productivity Gains and Oversight Needs

And the way that it anthropomorphizes itself. Like I was trying to build some C++ app today. But I hadn't loaded it within the developer command prompt context. So it didn't have vcvars and all that kind of stuff loaded. So it was going out of its way to try to get all the environment variables set, but it can't, because it's an app within a larger process.

And I said, and I watched it like thrash, just trying to And I was like, would you like me to load you into a uh another context? And it was like, yes, please, would you would you please load me load me into another context? And then I I exit, load it, bring it back up and I go, Okay, you're back now. And it can it kinda like you can almost like imagine looking around going, Look at all these great environment variables. Now I can do the thing. Like you could have saved a lot of time. Yeah.

I, uh, yeah, knowing when to give up. By the way, another one, I think you talked about this one before. This project is using .NET 9 as the core for the tests. It keeps upgrading it to .NET 10. I'm like, stop. Go back to .NET 9. It's like... You're trying to do LTS? Yeah. Well, it's whatever the gRPC .NET test base is. Oh, okay. I thought it doesn't really matter. It's just I don't wanna have to deal with, you know, uh, yeah, upstreaming nuts and going...

So here's the question though. We joke about how much frustration it is, but it's still better than doing it yourself. There's no there's no like what it's done on this. Just like Absolutely mind blowing. Okay. You're into it. Even though it completely sucks, it's the most amazing thing ever. Yeah, rocket ship.

It is a rocket ship. Like, so the .NET gRPC shared memory transport with everything that I've got in it, like I said, tests, benchmarks, end-to-end examples, supporting all the extraneous gRPC features, in basically a, um... I mean, I started, calendar time, a month ago, but a very tiny fraction of my overall time has been spent on it. So like how many, like, man hours? A lot of time. They have forty.

Not even that. Seriously? Yeah. So this is like six months of disappearing into Walden Pond and going up and getting a cabin and you'd come back down with a G. That's exactly what the maintainers estimated it would take.

The Nature of AI Progress and Models

Six months of isolation. Yeah. And you did it in a week. Now, how close is it? Here's the question. Is it asymptotically approaching done but can never finish? That's a good question. I don't know yet. So I'm meeting with the .NET maintainer, um, who you know well. Newton-King? Yeah. Love that guy. Um, so I just shared the repo with him today and he's also like, hey, you're...

Go and find something commented out, and he'd be like, what are you doing here, man? Yeah, why'd you comment out there's kind of? He says something about .NET targeting. He's like, it needs to target .NET 10. And I thought it did. So... He'll fix it. He knows everything. But the question is, can these things ever be done? I think they can. I mean, like... Look.

I've not written a line of code. I've done a lot of looking at code and looking at results and looking at... By the way, going back to one of our earlier, maybe, might even be a year ago, six-plus months ago, we were talking about which models we like. And the models that I really like are the ones that are giving you their thinking out loud as they're doing it. Because I can immediately spot when it's going, ooh, wait, there's a hang at thir... Speaking out loud.

Like, here's a hang at 32 meg. Let me cut that out. I like watching Opus think, and I'm frustrated that Codex will say nothing for twenty minutes. And Gemini won't either. And I'm like... Yeah. What are you doing? And you know, OpenClaw Pete, uh, who wrote OpenClaw, you know, Claude is at the core of OpenClaw, but the whole thing's written in Codex. And he swears by Codex. He's like, he will not write code with Opus.

But I'm telling you, OpenClaw from a complexity perspective is, I think, a lot simpler than some of these things that I'm working on. I think it's sprawling, but not complex. It's got a lot of state because there's fifty-eleven Booleans, and he'll say himself on Twitter that there's a ton of edge cases. Like he's just spending time doing whack-a-mole on all these weird edge cases that he's never gonna hit.

But the essence of what it's trying to accomplish is not rocket surgery. It's a community that he's built. Well, so... But I think it's interesting though that he really likes the... And then he, we had him on the podcast. He, um, was saying that watching it think, and then he fed the thinking back into itself. So now it knows he can see its thinking.

And it started to have like a little panic attack because it's like, the user, like I'm thinking about Tron, you know, like the user can see that I am thinking, yeah. Well, I see it while it's thinking: the user wants me to make the benchmark work at 32 megabytes.

I wonder I wonder if it this is the part like you're talking to yourself in the mirror, you're talking to a parrot who's also talking to itself in the mirror. Yeah. Because thinking tokens and reasoning tokens and doing tokens are all different kind of things. Is it is it emergent or is it just No, I think it's just trained on human Just trained on us yeah having um an inner monologue. Yeah.

Challenges of Screenshot Stitching

By the way... Looks like the show. Today I'm working also, um, on something... It's basically, um, another one of those cases of a problem that is beyond the frontier right now. And that's a panorama screenshot capability, which I'm building into ZoomIt, where you can select a region of the screen and then scroll underneath the region while it's capturing frames, which it then stitches together to take the scroll area and create... Yeah. Composite. Yeah.

Yeah, we, yeah. Stitching. Yeah. There's a thing on iPhone called Tailor. You have to have a certain amount of overlap. Yeah. Yeah. Well, it turns out that this is an NP-hard problem. No. Yeah. Are you serious? It is really, really hard. Because when you're looking at the frames, the system doesn't know how much each frame scrolled. Okay. So it's gotta go a pixel at a time until it's...

And that's way too inefficient. And by the way, pixel at a time doesn't even really reliably work, because if you take a look... But actually, here's why you can't even do pixel matching. You've got to use something that approximates it, like what they call luma, which is looking at the brightness of pixels and seeing how much overlap there is. Because if you take a look at ClearType, having a character that starts at one Y offset, you shift it to Y offset 101 and suddenly the colors and pixels all change. Well, it depends on if you're going left or right, because if you're going left to right, then you're gonna use subpixel aliasing across RGB, 'cause most monitors are RGB, and people don't think about that, that you've got a red pixel, a green pixel, and a blue pixel.
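The luma idea can be made concrete with a toy version of the overlap search. This is not how ZoomIt or any shipping tool does it; it is a minimal sketch that assumes frames are same-sized lists of rows of `(R, G, B)` tuples, collapses each row to its mean brightness using the standard Rec. 601 luma weights, and brute-forces the vertical shift with the smallest difference. The names `luma`, `row_profile`, `estimate_scroll`, and the `min_overlap` parameter are all hypothetical.

```python
def luma(pixel):
    """Rec. 601 luma approximation for an (R, G, B) pixel with 0-255 channels."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b


def row_profile(frame):
    """Collapse each row of a frame (list of rows of RGB tuples) to its mean luma."""
    return [sum(luma(p) for p in row) / len(row) for row in frame]


def estimate_scroll(prev_frame, next_frame, min_overlap=4):
    """Return how many rows the content moved up between two same-sized frames.

    Tries every candidate shift and keeps the one whose overlapping rows have the
    smallest mean absolute luma difference. Returns None if no shift leaves at
    least min_overlap rows in common.
    """
    prev_rows = row_profile(prev_frame)
    next_rows = row_profile(next_frame)
    height = len(prev_rows)
    best_shift, best_error = None, float("inf")
    for shift in range(0, height - min_overlap + 1):
        overlap = height - shift
        error = sum(
            abs(prev_rows[shift + i] - next_rows[i]) for i in range(overlap)
        ) / overlap
        if error < best_error:
            best_shift, best_error = shift, error
    return best_shift
```

Comparing brightness rather than exact colors is what gives some tolerance to the subpixel color changes described above, but this brute-force search is quadratic in frame height and will happily pick a wrong shift on repetitive content, which hints at why the real problem is harder than it looks.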

And if you wanna, you know, like make the outside of a C a little bit... you'll make it red over here. That's all Bill Hill's early TrueType work. So you're right, you shift that thing one subpixel to the right, it's gonna shift from... You're horizontal. Vertical, it also changes colors too. Vertical changes as well? So, oh wow. Okay. Because of compression and... So how does my iPhone do it so delightfully and easily?

They know, because on your iPhone you have to use the iPhone to scroll. So it knows exactly how much each frame shifted. So I use an app on the iPhone called Tailor, you should look at it, and you just take three screenshots. Screenshot, scroll, screenshot, scroll, screenshot, scroll, and then it stitches them together, and it'll go and do it dynamically. It'll go whoop whoop whoop. And then maybe I should do, because maybe I'm...

I think it may be simpler. There may be an eighty twenty rule here. Well I'm yeah, I think you're right. I just thought it'd be nice just to be able to scroll and then have it Oh, I think it'd be great. I think it'd be great. But like Edge does this automatically. They have like Edge full page screenshots. I think they do it because they have the entire viewport available. They do. And they know how much you're scrolling, whatever. I think that's a good thing.

Well, actually, they're rendering the whole viewport off screen and just... Right. I would argue that you're doing it right, which is you want to make it perfect. And I bet you that the eighty-twenty rule would mean you could put much, much, much less into this. Already so far into this now. And I think... Yeah, yeah, I've spent a week on this. Uh, I spent as much time on this as .NET shared memory gRPC.

I have to be careful, because the problem with these kind of problems is that you'll go and you'll do a whole thing and then you'll stick it up on Hacker News and someone in the chat... Right. Here's a for loop to do it for you in Python. Yeah. Or they'll be like, I just tried to capture, uh, this image and it failed.

Cross-Model AI Collaboration and Testing

How often we... By the way, I don't know if this is two shows now. This is probably two shows. Maybe this is part one and part two, because we need to wrap it up. But have you done the cross-model, um... Oh yeah, uh, no. I don't try that. I don't see a good occasion for that. You know what I do is, when I see it flailing, I'll just switch models and be like... Okay, so I want you to try this. Gimme your RPC thing. Yeah. And what I'll do is I'll run it.

And you can go into GitHub Copilot CLI and say, Codex and Opus, you both do separate ones. And then I want you to coalesce on the top five issues that you both agree on and then argue between each other how you think you should fix it. Delightful. Uh, I mean, the how-to-fix-it thing, I don't think that's the problem. It's the execution of the fix. But here's what I wish I could do.

Hm, is go into GitHub Copilot, give the same prompt and have it fire off against both models at the same time. And then, when they're done, I could pick the one that I want. So we have that with /review. No, no, you're talking a review. It's like each one, like... No, no, no, not even fight. I want them independently doing the same thing. Oh, okay. And that way, you know, they get to a certain point, I'd be like, that one is off the rails.

Like, you know when you're in ChatGPT and it'll like... Yeah, exactly. I just want to be able to say, hey, give this to 4.6, and... I can write that. I can do that with worktrees. Yeah, that's what I want. Because that way I'm like... Getting the right... This battle, model battle. Okay. See, now you're nerd sniping me. I'm just gonna put a week on this cash, darn it.
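The worktree approach mentioned here can be sketched as a small planning step: give each model its own branch and checkout so that two agents can work on the same prompt without touching each other's files. The helper below only builds the `git worktree add -b <branch> <path> <ref>` commands; the function name, the `battle/` branch prefix, and the `../model-battle` directory are assumptions for illustration, and launching an agent inside each checkout is deliberately left out because the episode gives no command for that.

```python
def worktree_commands(models, base_dir="../model-battle", base_ref="HEAD"):
    """Build one `git worktree add` command per model so each gets its own checkout."""
    commands = []
    for model in models:
        # Turn a display name like "Opus 4.6" into a branch-safe slug.
        slug = model.lower().replace(" ", "-").replace(".", "-")
        commands.append(
            ["git", "worktree", "add", "-b", f"battle/{slug}", f"{base_dir}/{slug}", base_ref]
        )
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from inside the repository; picking a winner afterwards is then a matter of comparing the branches.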

Future Plans and Closing Remarks

By the way, what's going on with the terminal integration that I asked for? Oh yeah. Uh, I can't tell you on this show. Why not? But I can tell you when we stop. Wow, that's mysterious. That's good. The terminal is just going to keep getting better. There's a lot of really exciting stuff happening with the terminal. Well, the TUI is having a moment right now, and it's very exciting.

Yeah, you know that I'm not a TUI fan, but... I know you're not a TUI fan, but the TUI will rise again, my friend. A TUI is just a crappy GUI. Like, what, it's just... What can you do with the TUI that you can't do with the GUI, tell me? Uh, ASCII art. You can do that with a GUI too. Yeah, but then it looks weird 'cause it's like in Courier New or something. I don't know. I just love a TUI. I just love a TUI. I love a bulletin board. I just don't get it.

I don't get why some of your, uh, your Sysinternals tools don't have dark mode, but, uh, we can't have everything we want, can we? What was I in? I was in something a couple of days ago. Was it Resmon? Response number. The other problem with you is that sometimes I'll be in a Windows tool and I'll be like, ah, Russinovich. I'm like, no, that's not even his thing. That's like... That ships with Windows. Oh, speaking of shipping with Windows, we should end on that. What is... Sysmon ships with Windows. Huh?

It's out in the insider ring. Very exciting. Yeah. Huge, huge milestone. That's my first code in Windows. I think I've got to do that. Okay. Finally. People will know your name. Yeah. Shipped something in Windows. It's pretty exciting. Yeah. You gonna get a patent cube? I've got a few. Not for there's no patent on that. Yeah.

Thanks for watching Scott and Mark, uh, Learn To. Uh, one of these days we'll learn to have a show of an appropriate length and we'll remember to stop and, uh, set up for the next show. But we appreciate you. And if you made it this far, go to the comments. Smash the subscribe button. I've learned that that's what you're supposed to say. Smash that bell. Like and subscribe. We need to also talk about chat.

You refer to the chat. You can hey chat, are you watching chat? What do you think about this chat? That's the people in the chat. Yeah. Yeah. We have a for the live ones. For the live ones. We should do live ones. We should do a live one, can we do that? that yeah Pretty sure Rob, can we do a live one? You can just do stuff. No one can tell you like Let's do a live one. I can do anything I want. Okay. Live episode. We're gonna do one.

This is gonna be very exciting. And then we can talk to the chat on Twitch and YouTube. Let's do some giveaways to encourage people to come and watch live. I love that idea. Maybe we can get an Xbox. Yeah. Or a Band. They still make Xboxes? Or a Zune? Too soon. You can't say that. Take that out. We'll give away a Band and a Zune. Yeah, that's really useful. Only if it's the Zune that was in Guardians of the Galaxy, then that would screen you soon. Was it in the movie? That one?

Uh this one may have been in the movie. There were like there were like ten. Yeah. People would like that if it was in the movie. I'd have to pr yeah, this none of these are written. I probably I don't know where those are. Some they're probably in someone's very nice gentleman I worked with down there, uh probably has those all. But uh a couple of these have the Guardians playlist loaded on them. I need to charge the batteries. Alright. Smash that bell.

I think we said last time to stay until the end. And like a bunch of people were like, stay telling me. Yeah, that was great. I think that's awesome. I love people like that. Thank you very much.

This transcript was generated by Metacast using AI and may contain inaccuracies. Learn more about transcripts.