#473 A clean room rewrite?

Mar 16, 2026 · 46 min · Ep. 473

Episode description

Topics covered in this episode:
Watch on YouTube

About the show

Sponsored by us! Support our work through:

Connect with the hosts

Join us on YouTube at pythonbytes.fm/live to be part of the audience. Usually Monday at 10am PT. Older video versions available there too.

Finally, if you want an artisanal, hand-crafted digest of every week of the show notes in email form? Add your name and email to our friends of the show list, we'll never share it.

Michael #1: chardet, AI, and licensing

  • Thanks Ian Lessing
  • Wow, where to start?
  • A bit of legal precedent research.
  • Chardet dispute shows how AI will kill software licensing, argues Bruce Perens on the Register
  • Also see this GitHub issue.
  • Dan Blanchard, maintainer of a Python character encoding detection library called chardet, released a new version of the library under a new software license. (LGPL → MIT)
  • Dan is allowed to make this change because v7 is a complete “clean room” rewrite using AI
  • BTW, v7 is WAY better:
    • The result is a 48x increase in detection speed for a project that lives in the hot loops of many projects. That will lead to noticeable performance increases for literally millions of users (the package gets ~130M downloads per month).
    • It paves a path towards inclusion in the standard library (assuming they don’t institute policies against using AI tools).
    • Thread-safe detect() and detect_all() with no measurable overhead; scales on free-threaded Python 3.13t+
  • An individual claiming to be Mark Pilgrim, the original creator of the library, opened an issue in the project's GitHub repo arguing that Blanchard had no right to change the software license, citing the LGPL requirement that the license remain unchanged.
  • A 'complete rewrite' is irrelevant, since they had ample exposure to the originally licensed code (i.e. this is not a 'clean room' implementation).
  • Blanchard disagreed, citing how version 7.0.0 and 6.0.0 compare when subjected to JPlag, a library for detecting plagiarism.
  • Blanchard told The Register he had wanted to get chardet added to the Python standard library for more than a decade since it’s a core dependency to most Python projects.
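
To make "character encoding detection" concrete, here is a minimal sketch of why it is always a best-effort guess. This is not chardet's algorithm: `guess_encoding` and `CANDIDATES` are hypothetical names for illustration, and the approach (try strict decoders in order) is far cruder than what a real detector does.

```python
from typing import Optional

# Hypothetical candidate list, ordered from strictest to most permissive.
CANDIDATES = ("ascii", "utf-8", "latin-1")

def guess_encoding(data: bytes, candidates=CANDIDATES) -> Optional[str]:
    """Return the first candidate that decodes `data` without error.

    Illustration only: this is NOT how chardet works internally.
    """
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None

print(guess_encoding(b"hello"))                        # pure ASCII bytes
print(guess_encoding("naïve café".encode("utf-8")))    # valid UTF-8
print(guess_encoding("naïve".encode("latin-1")))       # invalid as UTF-8
```

Because latin-1 accepts every possible byte sequence, "decodes without error" proves very little. That is why a real detector has to weigh evidence and report a confidence rather than a certainty.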

Brian #2: refined-github

  • Suggested by Matthias Schöttle
  • A browser plugin that improves the GitHub experience
  • A sampling
    • Adds a build/CI status icon next to the repo’s name.
    • Adds a link back to the PR that ran the workflow.
    • Enables tab and shift tab for indentation in comment fields.
    • Auto-resizes comment fields to fit their content and no longer show scroll bars.
    • Highlights the most useful comment in issues.
    • Changes the default sort order of issues/PRs to Recently updated.
  • But really, it’s a huge list of improvements

Michael #3: pgdog: PostgreSQL connection pooler, load balancer and database sharder

  • PgDog is a proxy for scaling PostgreSQL.
  • It supports connection pooling, load balancing queries and sharding entire databases.
  • Written in Rust, PgDog is fast, secure and can manage thousands of connections on commodity hardware.
  • Features
    • PgDog is an application layer load balancer for PostgreSQL
    • Health Checks: PgDog maintains a real-time list of healthy hosts. When a database fails a health check, it's removed from the active rotation and queries are re-routed to other replicas
    • Single Endpoint: PgDog can detect writes (e.g. INSERT, UPDATE, CREATE TABLE, etc.) and send them to the primary, leaving the replicas to serve reads
    • Failover: PgDog monitors Postgres replication state and can automatically redirect writes to a different database if a replica is promoted
    • Sharding: PgDog is able to manage databases with multiple shards
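
The single-endpoint and health-check ideas above can be sketched in a few lines. This is only an illustration of the concept, not PgDog's implementation: the `Router` class, the host names, and the first-keyword check are all assumptions, and a real proxy would need to actually parse SQL.

```python
import itertools

# Statements treated as writes in this sketch. A keyword check like this
# misses cases such as SELECT ... FOR UPDATE or writes inside CTEs.
WRITE_KEYWORDS = {"INSERT", "UPDATE", "DELETE", "CREATE", "ALTER", "DROP", "TRUNCATE"}

class Router:
    """Hypothetical read/write splitter with round-robin reads."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.healthy = list(replicas)      # replicas currently in rotation
        self._counter = itertools.count()  # advances once per routed read

    def mark_unhealthy(self, host):
        """Remove a replica from rotation after a failed health check."""
        if host in self.healthy:
            self.healthy.remove(host)

    def route(self, sql):
        """Pick the host that should run this statement."""
        words = sql.split(None, 1)
        keyword = words[0].upper() if words else ""
        if keyword in WRITE_KEYWORDS or not self.healthy:
            return self.primary
        return self.healthy[next(self._counter) % len(self.healthy)]

router = Router("primary", ["replica-1", "replica-2"])
print(router.route("INSERT INTO visits VALUES (1)"))
print(router.route("SELECT * FROM visits"))
print(router.route("SELECT * FROM visits"))
router.mark_unhealthy("replica-1")
print(router.route("SELECT * FROM visits"))
```

The application only ever talks to the router, which is what lets replicas come and go without the client code changing.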

Brian #4: Agentic Engineering Patterns

Extras

Brian:

Michael:

Joke: Ergonomic keyboard

Also pretty good and related:

Links

Transcript

Hey, everybody. Hey, Michael. It's great to be back. So we kick it off. Let's kick it off. Let's do it. Hello, and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds. This is episode 473, recorded March 16th, 2026. And I am Brian Okken. And I'm Michael Kennedy. And as often lately, this episode is sponsored by you and us.

For everybody that supports the show through Patreon or through mostly through a lot of our offerings, like the courses at Talk Python Training and PythonTest.com and books. We've got lovely books coming along. I might bring up a book later in the show, but we'll talk about it. Anyway, thanks a lot for everybody supporting us.

It keeps us going. And also thank you to everybody that sends in topic ideas, either by going to pythonbytes.fm and submitting something through the contact form, or by going ahead and sending it to us on socials. So we're on Bluesky and on Mastodon, and all those links are in the show notes, or at pythonbytes.fm. You also can watch the show if you'd like, either real time or after the fact. You can join us at pythonbytes.fm/live and be part of the audience.

And one of the fun things about that is while we're recording this, you can add comments, and we might comment back or highlight your comment. That's fun. Anyway, the last thing I want to bring up is that you don't have to take any notes while we're talking because all the stuff is in the show notes, with links.

But if you'd like that delivered right to your inbox, plus a little background information, some extra stuff, especially helpful for if we're covering a topic that you're slightly thinking about, but maybe not, we'll send you some extra information. And you just sign up to be a friend of the show at pythonbytes.fm and say, join the newsletter. So with that, what do we have to start with? Well, I've got a doozy, as they say. A doozy. And this one comes to us from Ian Lessing.

So thank you for sending it in, to your point about sending us ideas for the show. This one somehow missed my radar, but shouldn't have. So, yeah, it's a big one. So this, you probably have seen chardet, "char-det" as in character detection. Maybe it's "care-det". I don't know. I'm always, you know, sidebar. It's always funny to think about abbreviations like "lib" or "libe". You know, if you pronounce it L-I-B, I think it goes "lib". But if it's an abbreviation of library, shouldn't it be "libe"? I don't know.

So anyway. No, because that's weird. I know it is weird. So, chardet is a library that I believe was originally created by Mark Pilgrim, but now is maintained by Dan Blanchard. So I think this alone makes it interesting because there's something happening with this project. And a lot of people are pushing back, saying the maintainer can't make a change to it, but they're the maintainer. Like, I don't want you to, but you can't. So here's the headline.

Chardet dispute shows how AI will kill software licensing, argues Bruce Perens. And the subtitle, this comes from The Register: Alarm bells are ringing in the open source community, but commercial licensing is also at risk. So, told you it's a doozy. What is going on? So earlier this week, Dan Blanchard, and I want to point out, the maintainer of the library, released a new version of the library under a new software license. It was LGPL, and he released it under MIT.

In doing so, he may have killed copyleft because, well, MIT is a do-whatever-you-want-just-don't-sue-me-about-it, you know what I mean? License, which almost all the stuff that I do is MIT as well. I just want people to just, I'd rather have people just have access to do whatever. Like, I don't want people to just, hey, could I get an exception to use this for this thing like that? Like, nothing I'm doing is that important, you know?

Yeah, and I think of the MIT stuff as, like, a good license if you don't care if people use it in their commercial product. Yes. So. And look, if it's okay, I was not familiar with what this is. I still don't quite get. What does chardet do? So it is a character detection library that's used by millions of projects. Let me see. It has 130 million downloads a month. So it's used by a lot of things. Okay. Character encoding detector. Yes, yeah, exactly. Like, UTF, I think, you know.

Is this UTF or Unicode or is it, how do I have bytes, what am I going to do, you know? Yeah, okay. So it takes a guess, right? Right. You're getting bytes and you're not really sure. It's like, wasn't declared or whatever. So previously this was an LGPL project. Dan Blanchard wanted two things from what I can read between the lines. Putting words into their mouth. One, wanted to dramatically improve this library to make it better. Check. You'll see he did that.

Two, wanted to set the stage such that this can just be part of Python. Like, could this just be part of the standard library? So previously, like, there's this move to say we should have less in Python. And I agree. But detecting whether or not something is Unicode or ASCII or whatever, maybe that does belong in the library. Anyway, that was the goal. It's like, could we put it in there? Well, LGPL says no. It would change the license of Python, I believe. Right?

So as long as it's a GPL-based license, you can't move this library into the standard library. I don't know if the core developers or even if Dan is a core developer was interested in this. But that was one of the goals, right? So no problem. We're going to change it. Well, an individual claiming to be Mark Pilgrim, because you can't verify people on the internet for sure. The original creator of the library.

So it's a little bit like Flask where Armin Ronacher created it and then now David Lord maintains it. And David Lord gets to do whatever he wants with it. It's his project now as the maintainer. But I'm sure Armin still has influence over the community's opinion if he were to take a strong position one way or the other, right? Neither of them chimed into this as far as I know. I don't think. Maybe Armin did. I can't remember. But there's a lot of chat about it.

I still think it's kind of like giving somebody a puppy and then telling them where they have to take it, what vet they have to take it to. Yes, exactly. Yeah, exactly. I'm leaning on the side of Dan Blanchard here, just setting the stage. I have a slight, there's a lot of complexity in this. So I'm not like totally just saying this is how it is. But let's keep setting the stage. So Mark says, you can't do that. You can't change an LPGL. I believe that's the typo.

LGPL license requirement. It requires that the license remain unchanged. Licensed code, when modified, must be released under the same GPL license. But I get that when somebody gets it from the source, they make a change. It must be released under the same license. As the owner of the project, I thought you could change the license on new code. I don't know. It's your software, effectively. If you want to change the license of it, I don't know.

This is a little bit of shaky ground here to say that you can't change the license. As the owner of the license, you know what I mean? Anyone else in the world should not, they have to follow what I just read. But as the owner of the license, is that true? Well, so here's what Dan did. Dan said, here's what, I'm going to create a new better version.

I'm going to rewrite this entire project from scratch, not using any of its source code, and re-release it into the same package channel as the old one. Okay? Now, one of the problems under that is, as the maintainer, he's deeply familiar with how it works. And one of the challenges is, if you know how it works, your idea, it's like hard to do a fresh from scratch rewrite if it's burned into your mind, how it works, you know what I mean?

So what he did is he just gave the specification to Claude. And said, Claude, write this so that the tests pass. And Claude wrote it. And it wrote it extremely differently. There's a plagiarism detection algorithm. So it's probably more for English, but whatever. It said it is only 1% similar to version 6. Version 7 is only 1% similar to version 6. So that means it's pretty different. Dan also said it's like structurally the files are not named the same. They're not organized the same.

It is basically not at all the same thing. Nothing. The only thing that's that 1% is like argparse structure and stuff because you have the same arguments, you know? And so they believe there's nothing here. This is a new project. This is what gets the MIT license. Now, to be clear, this is a mega improvement. It results in a 48 times improvement in detection speed. It now supports multi-threading for Python 3.13t. So you can do free-threaded Python and it supports that.
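
To give a feel for what a "percent similar" number means, here is a rough sketch using Python's standard `difflib`. This is a stand-in technique, not JPlag's algorithm, and the two code strings are made-up snippets rather than real chardet source. The `tokens` and `similarity` helpers are hypothetical names.

```python
import difflib
import re

def tokens(source: str) -> list[str]:
    """Split source text into words/numbers and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", source)

def similarity(a: str, b: str) -> float:
    """Return a 0.0 to 1.0 similarity ratio between two token streams."""
    matcher = difflib.SequenceMatcher(None, tokens(a), tokens(b), autojunk=False)
    return matcher.ratio()

# Invented examples: same public function name and argument, different bodies.
old_version = (
    "def detect(data):\n"
    "    prober = Prober()\n"
    "    prober.feed(data)\n"
    "    return prober.close()\n"
)
new_version = (
    "def detect(data):\n"
    "    scores = score_all_encodings(data)\n"
    "    return max(scores, key=scores.get)\n"
)

print(f"old vs old: {similarity(old_version, old_version):.0%}")
print(f"old vs new: {similarity(old_version, new_version):.0%}")
```

Even two unrelated implementations overlap a little when they share a function signature, which is the same point made about the shared argparse structure.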

There's a lot of benefits to this new version. So I don't think anyone is saying you've messed up the library. It's like clearly a better library. It's only this we hate AI or AI is theft or like there's a lot of these different angles that are like focusing in like a laser onto this change. You know what I mean? Yeah. So Dan says, I was just trying to accomplish these goals with the tools and time I had available. I'd never been paid to work on this and I have a full-time job, you know?

Software licensing and the laws around it haven't been tested a lot in this new world of AI-assisted development. As a long-time open source developer, I'm also curious how this is going to shake out. Yeah. But somewhere it says, yeah, after maintaining this library for years, I've wanted to make these improvements, but I couldn't. Claude gave me the ability to do this in roughly five days. Right? So I think this is also really interesting. But why change the license?

Because he wants to put it into Python. Oh, okay. Yeah. Or maybe he just wants people to be able to more freely use it and he just doesn't care about copyleft. I don't really know. But I believe the article implies that he wants to put it into Python, and LGPL makes that not possible. And Armin Ronacher actually did post about this, saying that he welcomed the license change and he's wanted it for years.

That's kind of what I would expect that he wanted to say as well. Now, there's an issue that has been created that version 7 presents unacceptable legal risk to users due to the copyright controversy. There's so much. Why? So much. There is so much going on here. I don't know. Because to me, the license only goes from more restrictive to more permissive. And if it turns out that it's the old version, you know, you're back to where you started.

So I don't know. I'm telling you, there's a lot of... This may be the bigger issue: there's issue 327 filed by, I think, Mark Pilgrim. Hi, I'm Mark Pilgrim. The title is No Right to Relicense This Project. And it's absolutely... Toxic is the word. So I don't know. It's very interesting here. I mean, version 6 is still there. People can just keep using that or fork it if they want, if they still want the old license.

I think what's happening is this is becoming a lightning rod for the debate of licensing intersecting with agentic AI. I mean, how many people actually care that much about character detection? You know what I mean? I mean, it's a utility. Apparently a lot of people. I know a lot of... It's extreme. This is like... There are just pages of stuff to go through on all of this. Like page after page. It's crazy. So... Okay. I understand the sensitivity of it. I forgot my popcorn. I know. I know.

And I get it that AI ingesting the world's work and then turning that into automation. I'm not even sure where that sits legally. At the time, it felt like a lot of theft. I'm not sure if it's a good trade-off or not. I don't care how that relates to this project, though. Okay. So let me add in one more detail. Like I said, there's a lot going on here. We'll wrap this up pretty soon, but it's a super interesting discussion. I think... So one of the reasons that said...

He said, well, I did it with Claude Code. They said, well, it doesn't matter. Claude Code trained on GitHub. Therefore, it trained on the original source code of chardet. Therefore, it's not a clean room reimplementation. So I asked... But I don't know if that matters because I can leave one company, go work at another company. As long as I don't take the source code, if I just take what I remember and do similar work, I'm allowed to do that.

Yes, you're a human being that gets to interact in the world. Yeah, exactly. It's not like, well, I saw a picture once and it was of a tree and it was copyrighted. So I can never, ever create a picture of a tree again because I looked at it, right? Yeah. That's why I said I'm on the side of... I feel like I'm on Dan's side here. So I... Let me look. So I wrote a little... I didn't write. I asked Claude for some research on like, well, what is the legal precedent for this? Here's the situation.

At least in the US, are there rules, rulings that have come down previously? So I put a little document up for people to look at, but it says the closest precedent is this Thomson Reuters versus Ross Intelligence, where somebody... I can't remember who. They took a bunch of Westlaw headnotes for legal advice and then did their own custom AI training on it, and built an exact tool for legal research. That turned out to be a violation. But the existing... This is interesting.

The existing copyright framework requires two things to prove infringement. Access to the original work. Check. Claude did have access to the original work. And substantial similarity in output. No. Not even close. Not even 1%. I know it's at 1.3%, but that was like structure of argparse. You know what I mean? That's argparse structure. That's not chardet structure, effectively. So I think this strongly fails. Those are the two criteria that you have to have to prove similarity.

Hmm. Hmm. But there's other stuff. The emerging judicial consensus is developing that training a general purpose AI model is highly transformative, and therefore is fair use. But there were some specific examples where it wasn't. The copyright... The U.S. Copyright Office's position is that using copyrighted materials for AI model development may constitute prima facie infringement.

And what's really crazy, Brian, is if things like that said, no, this is copyright infringement, like, what happens to everything created by AI? Period. You know what I mean? And I don't know how that's going to shake out, but it's so far down. I mean, let's take, like, the extreme case. And you go, well, you know what? All the current models have been trained on licensed stuff. So let's just, like, not really just start over. It's going to cost a ton of money to retrain a model.

But do it right. Yeah, that's true. Only train on the stuff that's available license-wise. Right. You could just look and say, is it a GPL license? We're not training on that. Is it an MIT license? It's on. You know what I mean? Yeah. There's probably plenty of information still out there to build out your models. Anyway, it's pretty wild. I think people can have a look. I would certainly say the folks who took the time to comment are very much against this. There's a lot of toxicity.

And Dan, support going out to you just mentally because I've been on the receiving end of these types of things. And they're not fun. But I'm – I kind of – I think Dan has a point here. However, this could all be solved if he just said, okay, version 7 is chardet 2, new project. And just put like a strong warning. Like we will never change chardet 1 ever again except for security patches. And then all the things that depend upon it go, fine. We'll just take this one.

Like I want 48 times faster and multi-threaded sounds better to me. Let's just do that. Yeah, and if like we push it too hard though, one option is he just stops maintaining it and doesn't transfer maintainership to anybody else. Yeah. And we don't want that either. So, yeah. Yeah. I certainly – I think this debate has far, far outgrown character encoding concerns. It's its own special lightning rod, like I said. Yeah. All right. Over to you.

Can we talk – I got just a small tool that I – this is a small tool suggested by Matthias. Well, it's not a small tool, but it's quick to cover. Refined GitHub. And this is awesome. I didn't know about this. So, this is a web page or website? A web browser plug-in. Browser extension. Browser extension. Thank you. That does some cool stuff if you work with GitHub a lot. And I, you know, looking through this, I'm like, what's wrong with GitHub right now? Well, there's a bunch of stuff.

The highlights, there's some highlights at the top. Makes white space characters visible. That's cool. So, I mean, that's cool enough to get this, but there's a lot more coming. Tells you whether you're looking at the latest version of the repository, if there's any unreleased commits. That's kind of neat. The – shows how far behind a PR head branch is. Tells you its base commit. There's a bunch of stuff here.

I'm going to highlight down to some of the stuff that – one of the nice things, there's lots of features. But they put fire beside things that you might care about. Like, adds a build CI status icon next to the repo name. Love that. Adds a link back to the PR that ran the workflow. Oh, that's cool. The – this one, I installed it just for this one feature. Enables tab and shift tab for indentation in comment fields.

Because, you know, if you're in a web browser, you hit tab, it goes to the next field. I just want to put a tab in the field. Anyway. Yeah. So – For Python people, it might not matter that much. But if you're doing C++ or something, you don't want to mix spaces and tabs. Well, I still hit tab. I just – I just expect it to add four spaces. But anyway. Let's see. Auto-resizes the comment field. Add reaction – Adds reaction avatars showing who reacted to a comment. That's interesting.

The other one that I want to highlight just to – because I think it's cool. It highlights the most useful comment in an issue. So it'll – you know, if there's a lot of people talking about a comment or whatever, it'll, you know, highlight that. So, you know, just scroll around. And actually, I haven't really noticed. I've turned this on. And it just sort of stays out of the way. There's just more features and more – it's just a nicer experience. So, yeah. Kudos to them.

Yeah, this is an absolutely mega-sized browser extension. And what's notable about it is it doesn't dramatically shift. It doesn't dramatically change how much – oh, hold on. My dog is going crazy. One second, Brian. I'm sorry. She snuck in there. We'll have to edit that out, huh? So what's notable about this is it's – you wouldn't look at your UI and know anything is different. But there's like 100 little changes, right? Yeah. So, yeah. Anyway, I'm always nervous to install browser extensions.

I have maybe five or six that I really love from places I trust. But go to the top. See how many stars this has. 30,000. Yeah, you know, at that level, I think it's all right. It's probably totally trustworthy, right? So let's – yeah, you know, I think it's good. I think it's good. I would probably install it. I've got to look and see if it will inspire me. But – Yeah, I don't know. I'll play with it for a while, see if my entire computer blows up.

Yeah, if your computer gets – It's also been around for a while. It looks like nine years or so. Wow. Really? No kidding. Well, at least in the front top, there's the – the editor config is nine years ago. So at least there's some commits from nine years ago. Yeah, yeah. Exactly. I would imagine it is. Yeah, that's very wild. Awesome. Okay. Let's move on to talk about databases, and in particular, Postgres.

So this project I want to talk about, I want to feature everyone across, and I think it's been around for a little while, but it's called PgDog. Okay? And what it is, is it's a performance-enhancing layer for Postgres. So if you're using – maybe you're using MySQL – not MySQL – using SQLite in dev, but then in production, you're using Postgres, right? Something like that. And it's starting to outgrow its performance. Okay?

So either it needs better uptime, the database is getting too large or something like that. Postgres doesn't have certain features like connection pooling and other stuff that could be better high performance. Right? So you don't have to reconnect as much. This thing handles a whole bunch of those. So if we go down here to their repo, it, by the way, has 4,000 stars, and its age is a year, two years, it looks like. Last year is probably its most recent things.

So there's been other projects like this as well. For example, PgBouncer is a friend, a colleague, a software, I guess, another thing that does the same thing. So what this is, is it's a proxy for scaling Postgres. And it does connection pooling, load balancing for queries, and it does sharding of databases, which sounds bad, but is actually a potentially good thing. So you just create a TOML file to set it up, and then off it goes.

I got a bunch of notes here for all these little things that kind of spread around. So let me look here. So for starters, it's a load balancer across Postgres. So you can run Postgres in a replica network configuration. So I can have a Postgres database, but then I can have, let's say, four other Postgres databases that are all copies of that same data, and they stay in sync. Okay? And from a read perspective, you could read from all five of them if they all have the same data, right?

And that basically 5Xs your database query performance, right? Okay. Just by simply sending them to different machines with exactly the same data in the same database, yeah? But the problem is the consistency, right? So it knows which one is the primary database, and it can do writing to that and make sure that it propagates to the others before it tells you that it's committed, which is kind of the magic of replicas.

Because if you write to it and then immediately do another read, but it happens to have gone to, this time it's round robin to a different database server, that's bad, because it might not be there, right? Like, I saved the database, I queried, and it wasn't there, and I went to test why it wasn't there, then it was there. I don't get it, what is going on with the world, right? So you want it to definitely manage that kind of stuff.

It also does health checks, and you've got this read primary replica configuration that I'm talking about. If one of them goes down, it will just take it out of its rotation, and if it's the primary one, it'll pick another primary, I believe.

So it has a single endpoint behavior, which I talked about, so it understands the Postgres structure, like the, basically, T-SQL, and so it updates, it knows if it sees an update or insert or create table and things like that, and sends that to the primary, and then leaves the other ones chilling to do their thing. It has the failover I talked about, and it has sharding, which is really cool, and it does a bunch of stuff to manage and keep that in sync.

You can even have different sets, different clusters of database, and say keep this one in sync with that one. So, for example, imagine you've got an e-commerce site, and it's starting to go too slow. People do a request for, I don't know, let me give you an example that probably resonates more with people, a health provider database.

I don't know about yours, but whenever I go to figure out something with my next doctor appointment or something, it's like the page slowly loads in, and then it spins, it says checking records, checking records, checking records, and like five seconds later, chunk will come in and more of it, and like what is going on with this? Why is this so slow, you know?

And there's probably just some huge database with a bunch of insane joins and weird queries and stuff just to tell me that my appointment is at 10 o'clock. So what you could do is you could say, okay, your health record ID is going to be the shard key, and we're going to have 20 different servers, right, running our cloud setup. And for that, we're going to somehow determine which database it goes.

So maybe we're going to say, take the hash of the health ID and use the first two letters to figure out which database it actually goes in. So like AA through B, whatever, right, goes to the first database server, and the second, the third, the fourth, and so on. So when you do a query, you say, I want the thing for this user. It just goes, okay, great. Well, that means I only query that one server. Instead of trying to query the 100 million records, you query, what did I say, 25s?

You query 4 million, which is way, way faster, right, on any given server. So that's a really cool aspect and one of its main features is the sharding capability. Okay. So pretty neat. Pretty neat. Well, but if you're really trying to find out like a health information, it might, the hash might be the problem. Stop doing hash, man. I don't know why these systems are so bad. They're so bad. Bad joke. Sorry. Yes. Yeah, that's true. I get it now. I get it. All right. Awesome. Over to you.
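
The shard-key idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not PgDog's actual sharding function: `shard_for`, `NUM_SHARDS`, and the record ID format are hypothetical, and it maps the hash to a shard with a modulo rather than by its first two letters.

```python
import hashlib

NUM_SHARDS = 20  # the "20 different servers" from the example

def shard_for(record_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a health record ID to a shard index in range(num_shards).

    Uses a stable hash (SHA-256) so every process agrees on the mapping;
    Python's built-in hash() is randomized per process for strings.
    """
    digest = hashlib.sha256(record_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# Hypothetical record IDs, for illustration only.
for record_id in ("patient-0001", "patient-0002", "patient-0003"):
    print(record_id, "-> shard", shard_for(record_id))

# Spread evenly, 100 million records over 20 shards is 5 million per shard.
print(100_000_000 // NUM_SHARDS)  # 5000000
```

A query that includes the shard key only has to touch one server, which is where the speedup in the example comes from.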

Okay. Well, I, this is partly a public service announcement. Maybe. This is, I want to cover Simon Willison. So we know Simon Willison's been playing with AI and agents and stuff since like, since they came out or something. And I appreciate all of Simon's work. And I've been watching here and there and just like, learning from him and not having to do all the experimentation that he's doing, but he's really great at explaining it.

So what I want to, he's got this sort of book-like thing together that we're going to link to called agentic engineering patterns. And this is a series of blog posts, but they're fairly concise and short and it's really good writing as well. There, and I think anybody, especially, well, it might be useful for really everybody, but especially people with teams, it'd be good to make sure that everybody's kind of on a good. I think there's the, the information here is right. Good for everybody.

So there's principles, getting started, like some intro on how agents work, and testing and QA. There's just three posts about that, which I love. Understanding code, using agents to walk through code and stuff. I didn't even notice these when I was looking at this the other day: an appendix of prompts I use, that might be interesting, but also annotated prompts, a GIF animation tool using WebAssembly and Gifsicle.

That might be fun, but maybe not appropriate for everybody. But the one that I love right here is anti-patterns. So in principles, there's some anti-patterns. Well, everything in the principles, definitely go read. Uh, Writing code is cheap now. What is agentic engineering? Hoard things you know how to do.

Basically, um, not basic, but like keeping track of, for instance, making tools and doing snippets, doing little tools, having those available, not only for you to remember, but you can also tell an agent, hey, um, I already kind of solved this over here in this project. So use that, but apply it to this other project here. Super cool idea. Um, and also, uh, AI should help us produce better code.

So if you're having AI produce your code, I think it should be better code than you would produce by yourself. Not worse. Uh, I don't like this notion of people not reading their code at all. Um, and I think that's going to blow up on us. And, uh, especially if you're working in teams, there's a bunch of anti-patterns to watch out for. And the top one is about inflicting unreviewed code on your collaborators.

This anti-pattern is common and deeply frustrating, both in open source and, I'm dealing with it at work myself. Don't file pull requests with code you haven't reviewed yourself. I'm tired of reviewing reams and reams and reams of code that I know nobody actually read. So why did they expect me to read it? So anyway, a great resource here. I love the cheat sheet on red, green, refactor; that's pretty great also.

I wanted to highlight that since testing is kind of my thing, right? The phrase "use red/green TDD" is a pleasingly succinct way to get better results out of your coding agent. So you can tell your coding agent to do this and it will know to write a test first and then make changes until it's green. What's interesting is normally we think of TDD as red, green, refactor. The refactor part,

that's when you need to get involved. So you can have the agent do the red, green part, which is come up with a test that describes what you want to do, then write code until you have code that does that. Now you go and review that code and you can talk to the agent. You don't have to necessarily change it yourself. You can talk to the agent and go, this part of the code is weird. Can we change it to a different pattern? Or is there some way to clean this up?
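As a minimal sketch of the red/green loop described here (the `slugify` helper and its test are hypothetical, made up purely for illustration and not taken from the episode or from Simon's posts), the test comes first and the implementation is only as large as the test demands:

```python
# Red: the test is written first and fails until slugify() exists and behaves.
def test_slugify_collapses_spaces_and_lowercases():
    assert slugify("  Hello   World ") == "hello-world"


# Green: the smallest implementation that makes the test pass.
def slugify(text: str) -> str:
    # split() with no arguments drops leading, trailing, and repeated whitespace
    return "-".join(text.lower().split())
```

The refactor step is deliberately left out: that is the part where a person reviews the result and asks the agent for changes.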

And I've had really good results with that actually, to just say, kind of good, but this part, why did we do that? And it's surprising to me to have the agent come back and say, oh yeah, that's weird, and change it to what I would expect. Like, why didn't it just do that in the first place? But it doesn't. Maybe it will in the future, and the future might be next week. Who knows? But for right now, these are great engineering patterns, great things to watch.

So thanks Simon. And I trust him to like, keep these up to date. So anyway, yeah, this looks super interesting. I definitely want to check it out. I've already spread this around to, to work. And especially the people that have sent me code reviews that I'm like, you didn't read this. I know you didn't. So. I think that's part of the pushback as well as like people are lazy or they don't know what they're doing. And they just, here's 2000 lines of code that fixes what I was asking for.

You're like, no, go away. Yeah. Where they spend some time. You're like, actually, can you narrow this down to a 10 line change? This is all I want. Please don't go do other things. Like just help me understand this and why this needs to change. And then, you know, I think we're still learning how all this stuff works. And there are engineering practices, but it's so, the stakes are so low for getting started. You know, normally you're like, okay, we're going to set up our build tool chain.

And then we're going to learn the language and the syntax and the structures and the keywords. And now it's just like, I'll just use regular English to just tell it stuff. And it'll probably figure it out. Right. That gives the sense that I don't need to learn this as a skill, but you do. Yeah. I also think that we're getting a lot of advice about how to utilize agents from startups, and startups have a different field. Startups are greenfield.

For the most part, they're writing new code. Whereas a lot of software jobs are maintaining existing code bases that have been around for decades, possibly, or at least years. And, and you can't just not care what goes into it. Yeah. It's a, you've been handed this thing that is making your company money. You can't make it worse just because the agent decided to rewrite everything. So, yeah. Yeah. Anyway, for sure. Well, do we have any extras? I got a few extras. Why don't you go first?

Okay. A couple ones. The first one comes from John Hagen. Thanks for mentioning this, because I almost made this a top-level story, but there's not much to say about it other than this is awesome. Upgrading Python versions with uv. So we know that to get any new features from uv, you have to say uv self update. I think, is that right? I think self update. Yeah, uv self update.

Yep. But after you've done that, now you can say uv python upgrade. You can give it a specific one. So for instance, if you say uv python upgrade 3.12, it updates 3.12 to the latest dot release, which is cool. But if you leave that off, which is what I do, it just looks at all of the Python versions that you have installed on your computer through uv and updates them all to the most recent bug fix release.

And, like, why not? We should be doing that all the time. I'm going to set this up as a cron job or something. I don't know. So that's cool. And, yeah, thanks, uv, for making things easier once again. Awesome job. I've already incorporated it into my little updater scripts that I run periodically. Next is also something that was suggested by a reader. And I understand the New York Times Magazine is behind a paywall.
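A minimal sketch of such a periodic updater, assuming only the two commands named above (`uv self update` and `uv python upgrade`); the function names and the dry-run default are illustrative choices, not anything from the episode:

```python
import shlex
import subprocess


def build_uv_upgrade_commands(version=None):
    """Return the uv commands mentioned in the episode, in the order they would run."""
    upgrade = ["uv", "python", "upgrade"]
    if version is not None:
        upgrade.append(version)  # e.g. "3.12"; omit to upgrade every uv-managed Python
    return [["uv", "self", "update"], upgrade]


def run_commands(commands, dry_run=True):
    """Print each command; only execute it when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            # check=True stops the script if uv reports an error
            subprocess.run(argv, check=True)
```

Calling `run_commands(build_uv_upgrade_commands(), dry_run=False)` from a scheduled job would perform the actual upgrade; the default only prints what would run.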

But for some reason I was able to read this fine. I don't know, I do have a New York Times newspaper subscription, so maybe that's it. Anyway, "Coding After Coders: The End of Computer Programming as We Know It." It's not just about whether or not AI is the end of coding jobs. We don't think it is, and the conclusion here is that it's not, but it's also about more than that.

It's talking about basically some different changes. And I believe it also talked about the difference in efficiency improvement between greenfield and legacy code. A lot of startups say they're a hundred times faster, but Amazon has said it's on average 10% faster. That's not nothing.

You should still get excited about 10% faster, but don't expect people maintaining your old code to be a hundred times faster. The reason why it was passed to me was because there's this great line, if I can find it: "pushing code that fails pytest is unacceptable and embarrassing."

Apparently this is an instruction that somebody has in their markdown files to instruct Claude to always run pytest and be embarrassed if it doesn't. I love it. This is good. But anyway, those are my extras. Actually, I think this is a well-written article for somebody that doesn't understand this stuff. Apparently the author has been covering the tech world for a while. So, nice. And also pytest got into the New York Times.

Yeah, that's pretty cool. Hey, what extras have you got for us? All right. Well, I've got a few. Let's start with Talk Python Training, per request from one of the users. They said, hey, it would be really great if, when I log into my account, I could have more information. So I updated things for the people who have accounts there. If you go log into your account, it will show you all the courses you are actively learning. I have 48 of them. I haven't finished a bunch of them.

People might be like, Michael, you have courses on the website. You haven't finished by the time it gets to the website. I've watched the videos two to three times. I don't have to watch them a fourth time in sequence and like have the system record me watching them. So no, they're not all done, but it'll show you things like the ones you're working on. And how far are you through and when did you last watch it? And when did you start?

Apparently this is things like if you're submitting this as a training evidence for your employer, knowing when you started, when you finished and so on and, or whatever, how far you are. There's also a whole bunch. It shows you completed ones. I'm going to be generating certificates for people.

It's easy enough for me to make PDF downloads, but I want to make stuff that you could, say, post to your LinkedIn profile as an accomplishment, you know, like "I've done the FastAPI course at Talk Python" as part of your LinkedIn record. In other places, you can put those kinds of things too. So it's not as simple as just a PDF, but hopefully stuff like that comes. Anyway, this was fun to build. I think it looks really neat.

I think it's, especially if somebody is buying the bundle, they have access to a ton of courses and they might not remember like what course was I taking last month? Exactly. You didn't even buy it, but you took it. Then you forgot which one you're doing. This totally solves that problem. Yeah. Yeah, exactly. Cool. Yeah. That's what the request was like. I know I took a course. I don't remember which one I was working on in which order would, you know, help me get back to that.

And then of course, when you're in a course, it has a resume button. So you just click that to resume where you left off, but it doesn't have a cross-cutting resume. You know what I mean? I do like how you split it up so that, like you said, if I took the whole course and there's something I want to go back and review, I can just look through and go and watch those videos. It's labeled well. So. Yeah. Thanks a bunch. All right.

I talked about using latency to increase security for supply chain stuff, right? Like, hey, if I do a uv pip update or upgrade sort of thing, or similarly with sync and add and so on, just doing like an exclude-newer, or whatever, give it seven days or a week or however you do it. There's this article by Andrew Nesbitt that says package managers need to chill. And right at the top, we have "this post requested by Seth Larson," the security guy at the PSF. So yeah.

Anyway, it talks about all the different, how you make your dependency manager tool chill. Like uv has an exclude newer, which I've been using. And it's mostly awesome, except for when there's a vulnerability that appears in one and you get a notification that you've got to fix it. But it just came out the fix. So you don't want to exclude it. But in general, it's, it's, I think a better thing than, than not. Why? Like, remind me why, why would you want to exclude newer stuff?

Because for popular packages, if somebody uploads a virus inside the package, like they take over the build chain or they phish the person who created it. Like, let's take chardet. They phish Dan, they get access to his GitHub, and they install a subtle thing that downloads some rootkit or whatever, an info stealer, to your account. That usually gets found within the first couple days.

And if you're always just going update, update, update, give me the latest, give me the latest, you know, the chances that you hit that are pretty high, right? Because they won't get found in the first hour. Even if it's found in the first hour, will people be able to react and communicate within the first hour to deal with it? You know, but if you just say, give it a week, like probably most of the popular ones, if there was something wrong, it would have been found out by then. But okay.
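A minimal sketch of the "give it a week" idea, assuming uv's `--exclude-newer` option, which accepts an RFC 3339 timestamp; the helper names and the seven-day default are illustrative, not from the article or the episode:

```python
from datetime import datetime, timedelta, timezone


def cooldown_cutoff(now, days=7):
    """RFC 3339 timestamp for `days` before `now`, expressed in UTC."""
    cutoff = now.astimezone(timezone.utc) - timedelta(days=days)
    return cutoff.strftime("%Y-%m-%dT%H:%M:%SZ")


def uv_sync_with_cooldown(now, days=7):
    """argv for a uv sync that ignores packages uploaded after the cutoff."""
    return ["uv", "sync", "--exclude-newer", cooldown_cutoff(now, days)]
```

The resulting list can be handed to `subprocess.run`. A freshly published security fix falls inside the cooldown window as well, so it needs a deliberate exception.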

What if it got found out and got fixed and the week boundary is there? Would I then pull in the week-old one that has the bug, or do they remove it completely? If there was a virus, they remove it from PyPI. Yeah. Okay. It's not even there, even if somebody picks an old one. Exactly. I knew that, I was just sort of playing along. Yeah. Exactly. So basically just more people singing the same message, but this is nicely cross-technology. Are you in .NET?

Are you in Ruby? Are you in JavaScript? Here's how you make it chill. Okay. So back to AI. Real quick. Paul Everett and I did this video debate, although it was not, not that much of a debate, but it was more of a conversation, but in kind of debate format about will AI kill open source, not the licensing part of it, but just will it make open source unnecessary? Will it just stop using open source and so on?

We don't think so, but we had a really nice chat and did a little quick writeup, but mostly the writeup links to the video. So check out the video. Okay. I also did a writeup called "Always activate the venv: a shell script." I talked about this before, I believe. This is not the thing. This is the lead-in to the thing I want to talk about. So as I change directories around my computer with just the terminal, it automatically finds and activates virtual environments.

But this was a thing in direnv. They said, well, we can't do this. What if somebody maliciously commits a virus called venv into the repo, and it runs the activate script? What if that activate script is malicious? You know, that kind of thing. So with some nice feedback from Scott H, I made a much more secure version that whitelists them.

And if it's not whitelisted, it says, hey, do you really trust this thing or do you not? Because you might just open up a folder and go, oh my gosh, there was a virtual environment somewhere and it activated and ran something that I didn't know was going to happen. Right. So all that, I think, is super polished and really nice. And I'm loving it. So here's the news.
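The trust check described here can be sketched roughly as follows. This is not Michael's shell script: it is a Python illustration of the idea, and the function names, the directory names searched, and the in-memory allowlist are all assumptions. The only outside fact relied on is that environments created by the standard `venv` module contain a `pyvenv.cfg` file.

```python
from pathlib import Path


def find_venv(start, names=(".venv", "venv")):
    """Walk up from `start` and return the first virtual environment directory found."""
    start = Path(start).resolve()
    for directory in (start, *start.parents):
        for name in names:
            candidate = directory / name
            # pyvenv.cfg marks a directory created by the venv module
            if (candidate / "pyvenv.cfg").is_file():
                return candidate
    return None


def is_trusted(venv, allowlist):
    """True only if the resolved venv path was explicitly approved earlier."""
    return str(Path(venv).resolve()) in allowlist


def decide(start, allowlist):
    """Return ("activate" | "ask" | "none", venv_path_or_None)."""
    venv = find_venv(start)
    if venv is None:
        return "none", None
    return ("activate" if is_trusted(venv, allowlist) else "ask"), venv
```

An "ask" result is where a shell hook would prompt before sourcing the activate script, then record the path in the allowlist if the user agrees.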

Viraj Kenwande or Kenwande said, I wrote the antidote for ZSH plugin management and I ran across Michael's security-aware virtual environment activator script, which was pretty awesome. So this is now a Z shell plugin, or Oh My Zsh, just a Z shell plugin. zsh-safe-venv-auto is what it's called, which I thought was pretty awesome. That's pretty cool. Yeah. All right. That's it for my extras. Cool. We each got a joke, right? Yeah, I took mine down though.

So I'm going to have to rely on you to bring it into your thing. All right, I'll find it. No worries. So this one is so good, and it follows this AI theme that we've been going with. Remember the Stack Overflow keyboard? This is exactly the same vibe as the Stack Overflow keyboard. The Stack Overflow keyboard was like the coder's keyboard, and it just had a Control and a C and a V, for the joke of just copying and pasting from Stack Overflow.

Yeah. Well, if you've done anything with Claude Code, it often asks permissions to make changes and it says, do you want to allow this once? Allow this always? Or do you want to reject this change? And so it's the super fancy Apple looking keyboard that just says allow once, always allow or reject. So this is funny on its own. You all have to check out the picture. I put it in the show notes. It may or may not show up in your podcast player. I don't know.

Maybe I'll just make it the poster art. But also there's two too many buttons. I think all you need is allow always. I know. Well, let's review the comments because, oh my gosh, they're so good. There's 223 comments. Yeah, exactly. Issue says, waste of two buttons. A truly productive one should only have allow all. Did it get somebody? This is like, remember the joke last week that was like, so you're new to sarcasm?

The person that looks like an AI generated image. Yeah, exactly. It didn't. Obviously. And there's the secret button dangerously skip permissions. Somebody added it to their stream deck for real, that it actually allows it. Yeah. So Matt says too, too many buttons. But if we go down, oh my gosh, there's, um, there's, there's, this is the one, the actual one, Brian, there's, there's a used version.

It says "update after day one," and it shows the same picture, but the allow always is like cracked and smashed. It's just been hit, like, brutally. Just used. Yes. Oh, this is really good. And so someone says, you've got to be safe. And they create a little Rube Goldberg machine that just automates hitting allow once. But forever. Yeah. Just a little bobber thing that just hits it all the time. That's funny. Yeah.

Devin says, no, no, no. You know, Claude Code will say, I've got to ask you a few questions. Here's three options. Do you want one, two, or three? Or sometimes there's four, or you've got to choose other. So there's one that has a second row that says one, two, three, four, other. I would actually use that. I know. It's so good. Another one has the three, allow once, allow always, reject, and it has a microphone button to dictate to it. These are so good. You've got to look at the comments. That's funny. Anyway, that's good.

Yeah. Yeah. That's my joke. But my favorite one of it is where it's like crushed with like after day one. Yeah. So, well, I'll, uh, let me try to get mine up. Let's see. I can pull up for you if you don't have it yet. Okay. Yeah. Just go ahead and pull that link up or something. Or the picture. Uh, so, uh, this was submitted with something else submitted by Paul Cutler, uh, has some news about AI too. Um, uh, are you getting it? You want me to, I get it? It's just slow.

Okay. It's just slow loading. There we go. So, Paul Cutler sent this over on Mastodon: today it was mandated at work that we install Claude Code because, as they said, it has built-in PowerPoint creation capabilities. What a reason. FML. Yeah. Because you know what's coming next: hour-long meetings with lots of PowerPoint. You know, I thought this was super funny at first, but also it drives me kind of nuts. Because, you know, I'm a coder.

So if I have to write a PowerPoint presentation, it's unusual. So this probably is a good idea, that I could save some time and not waste time on creating PowerPoint. So, yeah, no kidding. Well, actually, it's a pretty neat integration. It's not just that it knows how to do PowerPoint. If you open up the Claude desktop app, the same one that does Cowork, it has a little what's-new button, and you click it and it says install into PowerPoint.

And it actually adds a Claude section inside of your PowerPoint presentation. Why? So you could highlight a picture and say, could we get a different picture for this? Or highlight the text and say, could you animate this in from the left? Oh, like not while you're presenting though? No, it's during the building time. Okay. Yeah. That makes more sense. You've got, like, the format picture tab, the animation tab, and then you've got Claude now.

Yeah. It's actually pretty neat, as opposed to just "read my PowerPoint file and do this." You know, sorry, Paul, I want it to be during the presentation. So when you're presenting you go, hey, Claude, did anybody in the audience stop at Starbucks before they got here? I forgot what I'm talking about. Claude, please tell the audience what this means. Yeah. Anyway. Yeah. Awesome. Well, fun talking with you as always.

And I don't know if we need to change the name of the podcast to, like, Claude Bytes. I don't think so, but I mean, honestly, it's a good point. As a meta comment for the audience out there, it's really challenging to cover this stuff because so much of the energy in software development and tech in general is in AI. Yeah. But we obviously realize that there's plenty of stuff that's not really AI at all. At the same time, it's transforming the industry.

Like basically like the web, when the web came around and it's like, well, now we have the web, but we don't talk about it because it's, you know, I don't know. It's, it's tough. It's a balance. Also, I just am aware that there's people that care about Python, but also they have to care about this right now, whether they want to or not. So it's something I'm willing to cover as well. So, yeah. Yeah. And it's just, and it's wild. And may we live in interesting times. Bye. Later, Brian.

Bye. Bye.

Transcript source: Provided by creator in RSS feed: download file