
181: Memory Management

May 12, 2025 · 1 hr 46 min · Ep. 181
Listen in podcast apps: Metacast · Spotify · YouTube · RSS

Episode description

Intro topic: Video Game Prices

News/Links:


Book of the Show


Patreon Plug https://www.patreon.com/programmingthrowdown?ty=h


Tool of the Show


Topic: Memory Management

  • Motivation
    • Avoid thrashing / crashes
    • Allocate resources efficiently
    • Keep high uptime
  • Where
    • OS Level
      • Heap management
      • Virtual Memory
    • Language/Compiler Level
      • C++
      • Garbage collection
      • Ownership
  • Tools
    • Instrumentation
      • Export to Datadog / Grafana
    • Python: psutil & tracemalloc (see the sketch after this outline)
    • Valgrind
  • What to do when your program uses too much memory?
    • Reduce data sizes
      • Compression
      • References
      • Lazy initialization
      • Generators & Back Pressure
    • Ring buffers
    • Arena allocators
    • Disk based caching
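A minimal sketch of the Python tooling from the outline above (psutil for the process-level number the OS sees, tracemalloc for per-line attribution); the workload and print format here are illustrative, not from the episode:

```python
import tracemalloc
import psutil

tracemalloc.start()

# Stand-in for the workload you suspect of holding too much memory.
data = [str(i) * 100 for i in range(100_000)]

# Process-level view: resident set size, the number the OS kills you over.
rss_mb = psutil.Process().memory_info().rss / 1e6
print(f"RSS: {rss_mb:.1f} MB")

# Allocation-level view: which lines of Python hold the most memory.
snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics("lineno")[:5]:
    print(stat)
```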
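And a sketch of the ring-buffer idea from the mitigation list: keep only the last N items so memory stays bounded no matter how long the process runs. In Python, `collections.deque` with `maxlen` gives you this behavior directly:

```python
from collections import deque

# A ring buffer: once full, each append silently evicts the oldest entry,
# so memory is capped at maxlen items regardless of uptime.
recent_events = deque(maxlen=1000)

for event_id in range(1_000_000):
    recent_events.append(event_id)

print(len(recent_events))   # 1000
print(recent_events[0])     # 999000, the oldest event still retained
```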


★ Support this podcast on Patreon ★

Transcript

00:15 - Jason Gauci (Host) Programming Throwdown, Episode 181: Memory Management. Take it away, Patrick.

00:26 - Patrick Wheeler (Host) Welcome to another episode of Programming Throwdown. My topic, I guess it's the opening rant. I actually didn't talk to you about this before we recorded, so this is live on air: video game prices. Everyone's a different kind of video gamer, or not a video gamer, or casual, whatever, and I get that. But it's really interesting, because when we're recording this is right around the launch of the Switch 2. We have to say that now, because ten years from now people will be like, what are they talking about? But the Switch 2 is coming out, and there's been a lot of chatter online, and from people I see at work, about the price of the system, which is probably reasonable, but the prices of the games were considered more expensive, sort of 80 to 90 US dollars.

01:18 - Jason Gauci (Host) Just for my edification, what's the price of the system? Because I'm not in the loop on this.

01:22 - Patrick Wheeler (Host) Oh, I don't have it in front of me.

01:24 - Jason Gauci (Host) Roughly, yeah.

01:25 - Patrick Wheeler (Host) I think $400.

01:27 - Jason Gauci (Host) Okay, and what was the Switch 1? Was it cheaper than that?

01:31 - Patrick Wheeler (Host) I think like $50 cheaper. Okay, it looks like it might be $450, and I think the original Switch was $400. I don't know, because I just opted out, which is part of the rant.

01:45 So some people are saying, oh, this is inflation. Even back when there were Super Nintendos, I remember going to a Kmart, and Super Nintendo games were crazy expensive, like $70, $80. So people are like, well, we're just whining, and if you look at the number of hours you get compared to, say, the price of a movie ticket, then it's a really good value, or compared to the price of going to an amusement park or something. You can make these value propositions.

02:17 And what I realized is that price anchoring, price economics, these things are just so complicated, even in such a small corner of the world, that you can end up with very different logical reasons why the price is good or why the price is not good, each consistent for that individual, and it just leads to outcomes all over the place. Of course I'm going to buy this, and here's my rationalization; or of course I'm not going to buy this, and here's a made-up rationalization. And then other people are legitimately on the fence and trying to decide. So I have my opinion on which side this comes down on, and my logic, but I wanted to hear from you: do you think, in general, the launch price of video games being sort of 70, 80, 90 dollars is fine and worth it, or is that way too expensive?

03:15 - Jason Gauci (Host) So here's my take on it. I think the difference between a video game and a movie is that the movie has a higher lower bound. In other words, a movie can only suck so much.
03:35 - Patrick Wheeler (Host) Oh, that's a bold statement.

03:38 - Jason Gauci (Host) I mean, okay, I'll put it to you this way: have you ever walked out of a theater halfway through a movie?

03:44 - Patrick Wheeler (Host) Personally, no, but I see very few movies.

03:47 - Jason Gauci (Host) Yeah, that's true, I see very few movies as well. But in my lifetime I've been to a movie theater, let's say, a hundred times, probably more than that, and I have never walked out of a movie. I've watched every movie start to finish. But video games are actually the opposite: very few video games do I really sink the full amount of time into.

04:12 There are some video games where the media kit looks really good and the trailer looks really good, and I play it and it's okay. It's not terrible, I don't return it or anything, but it just doesn't hold me, and so I let it go. So you kind of invest in the whole experience more than in a single game. Take your favorite game you sunk time into. In my case, what's the game I played the most? Maybe Rocket League or RimWorld or something. Let's say I put a thousand hours into Rocket League, and I paid 20 bucks or whatever (now it's free to play, but I paid whatever I paid when it came out). Well, yeah, I totally got my money's worth, but that doesn't count all the games that I bought that I wasn't really that interested in. And so if every game costs $80, now you have this paradox: if I hadn't known I was going to be really into Rocket League, I wouldn't have spent $80 on it, and so it just wouldn't have happened. I think that's the real challenge. So, for example, take Madden Football. Madden is a known entity, and there's a cohort of people who are going to sink a thousand hours into every Madden, every single one, and so, yeah, they should probably pay like a hundred bucks. But that's a small part of the audience, I think.

05:59 - Patrick Wheeler (Host) I wanted to challenge you: really, a thousand hours, when it comes out every year? And then I realized, no, actually, you're probably right. There are probably people who play 20 hours every week.

06:12 - Jason Gauci (Host) Yeah, and even if we knock it down to 200, you can still say, okay, 200 hours, you should pay a hundred bucks for that. The problem is, you just don't know.

06:24 - Patrick Wheeler (Host) So I think I'm on board with your reasoning. I will say, to stick up for the other side, that not every game will be that expensive, so you can still have games that are priced less. But as a prototypical example, I hear you. For me, the issue is realizing two things. Specifically, Nintendo prices seem to stay up on their games. Like you said with Madden: if you buy last year's Madden, you wouldn't pay anywhere near $80. You'd pay like $30, $20, $10. It's perishable.

07:01 They need to play it.
Yeah, but there are other games, similarly, that will launch and then just discount over time, slowly and assuredly going down. Although, famously, Factorio does the opposite: the price only goes up. Good for them for breaking trends. I think someone did a reverse sale once, where stuff got more expensive for a day. No, maybe that was Cards Against Humanity.

07:25 But for me, Nintendo prices seem to come in high and stay high. And when I do end up playing them, I generally find them to be very high quality, which mitigates the risk you're describing, where you end up not putting the hours into a game. And I don't spend that many hours a week in general, so a game will either increase the number of hours I spend in a week or just fit into the budget I already have. But the contrast is, if you take something like playing on your computer, or a Steam Deck, or even one of the Android gaming handhelds, and you look at the prices there, picking stuff up on deals, indie games, or just games that are a year or two old, prices seem to be so much lower.

08:15 So this fact that the same game released on iOS, Android, Nintendo Switch, and PC, even though it's exactly the same game, has such greatly different prices sort of kills it for me to be on the platform with the biggest premium. Why do I need to be on Nintendo? It's not the most powerful hardware, and it's the highest price. If you're going to buy the same library of games, you're in general going to pay the most on the Nintendo platform. It kind of crushes me. I love what they're doing, and it appeals to me as a casual gamer, but that premium really builds up if you're going to buy 10, 15, 20 games.

08:59 - Jason Gauci (Host) Yeah, that makes sense. It's kind of like Disney World, where it's, what, $50 to park now? The prices are just getting so out of control. It is a known quantity: you know your kids are going to have a higher lower bound, a higher minimum amount of fun, than if they go to Six Flags or something, but it's so much more expensive. Oh, one game that you recommended as a tool of the show a while back, and I played through it, Techtonica, has a very interesting story according to the dev team. This is the story in their eyes, so take it with a grain of salt. But they basically said they did one of these deals:

09:47 You know how Epic and other platforms will have a game free for a week? The Microsoft Xbox store has this, and so they did the free-game-for-a-week thing, and, according to the devs, Microsoft paid them a million dollars to make Techtonica free for some period of time. They took the million dollars, but so many people got it for free, way more than they anticipated, that it gutted their sales, and when they tried to recover, they couldn't.
Basically, the free week was so good that they saturated the market, and so they went out of business, and Techtonica is actually defunct.

I didn't know this.

10:38 Yeah. I'm not going to spoil the ending narrative or anything, but the ending is terrible. Technically, the game just falls apart at the end; it's just sort of unfinished.

10:52 - Patrick Wheeler (Host) I actually made it pretty far, but I never finished it. I know what the ending was supposed to be around, but I never got there.

10:59 - Jason Gauci (Host) I didn't finish it either, but I got close enough to have effectively seen the ending. It just trails off. It's kind of like losing your voice while you're giving a speech: it just kind of whimpers out, because they couldn't finish it. And so, to your point, so many people sunk so much time into it, but there was no way to continue to charge. That's one thing a lot of these games do: they charge for, like, a hat, which does nothing, it's just cosmetic. I don't know if it's psychological or how it really works. There are even single-player games where you can buy a hat that nobody's going to see but you. But if you can find some way to get people to pay once they're invested in the game, I think that might be the way to unlock the value.

11:55 - Patrick Wheeler (Host) Okay, we're way off on other stuff, but the opposite example that I've been trying to figure out (and I guess I haven't actually gone and looked) is No Man's Sky. This was hyped up beyond anything. It released, and people were so frustrated that they had pre-ordered the game and it wasn't as good as promised. And then, for what, six, seven, I don't even know how many years, a long time, they've basically continued to drop massive DLCs, massive updates. I tried playing it a while back; it's really complicated, and I never really got into it. But they've just continued to completely change the game, and as far as I know, they've never charged for any of those DLCs. They continue to pump out huge content, and they don't have any of the in-app-purchase stuff going on. So I don't know if they just made enough money off of the pre-orders to basically coast it in and do one for humanity, but it's fascinating.

12:46 - Jason Gauci (Host) Yeah, it is really interesting.

12:52 - Patrick Wheeler (Host) All right, that was our opening rant. Time for news.

Keeping on the game theme, and talking about an over-30-year-old game, Super Mario Bros: there's a YouTube video by Abyssoft (I think I'm saying it right) called "Step one: jump in the lava." This video, and I've talked about a few of these, about speed running, tool-assisted speed runs, that kind of stuff, is the story of something that happened fairly recently in the community. Someone noticed, on one of the Super Mario levels, that they jumped into the lava in one of the Bowser castles and their game crashed. They posted about it and just kind of moved on.
No one really thought too much about it, and then eventually people picked up on it. There was this really weird condition where a certain set of things has to happen in a certain way, in the details of how enemies are loaded and how the blocks and sprites get placed, and out of all the levels where it even potentially could happen, there's only this one level with the right setup, and it would be very, very unlikely that you could trigger it. But this person happened to trigger a freak accident. People could explain why, and then they realized: wait a minute, there's an opportunity for arbitrary code execution here. You can make the game execute a jump instruction in a way a human never could, from a timing or even input standpoint. They were able to put certain values in the place that would get read and cause the game to jump straight to the ending, and that counts as completing the game for any-percent speed runs.

14:38 And then there was the challenge of getting humans to do it: would it be possible? Is there a setup where a human could physically enter the inputs to make it happen? This is not the only example of arbitrary code execution in these video games, but it's just crazy to me that somebody stumbled upon it after all these years, how much detail people went into to debug the super old assembly code to understand why it was happening, and then to realize you could set it up such that the game technically crashes, but it crashes in a way that sends you where you want, into a valid ending of the video game, and plays the ending credits.

15:20 So it's one of those awesome, definitely-not-useful-to-your-life-in-any-way, but just fascinating watches. For some reason these keep popping up in my feed every so often, even though I've never attempted a speed run. I don't really watch speed running, and I don't keep up on it, but... have you seen the Final Fantasy 2 one?

15:41 - Jason Gauci (Host) No, is it a...?

15:42 - Patrick Wheeler (Host) A Summoning Salt video. I watch those to go to sleep.

15:45 - Jason Gauci (Host) It's amazing, it's very meditative. What is that one?

15:50 - Patrick Wheeler (Host) What is the... oh, go ahead.

15:51 - Jason Gauci (Host) Yeah, so this one. Okay, there's a spell called Exit in most of these Final Fantasy games. If you're in a cave or a dungeon or someplace dangerous, or even a town, I think, you can cast Exit and it will take you to the overland. So if you entered the town from the right and press Exit, you'll end up on the overland one square to the right. Makes sense. And there's Warp, which is the same thing, but just for one level.

16:23 So imagine you walk down a staircase: you're in level one of the mines. You go down another staircase: you're in level two of the mines. When you cast Warp, it puts you back in level one of the mines, at the staircase to level two. See what I'm saying? Yep. Okay, so there's no stack, so you can't warp and then warp again until you're out. You cast Warp, and if you try to cast it again, I think it just doesn't work. But anyway:
16:54 So for each level, they need to keep track of the level you came from, or they need to put a sentinel value there to tell you that you can't warp. But there's one room in a town, in one of these villages, where they forgot to put the sentinel value. So when you warp from there, you go to some arbitrary place in memory, and because it's all wrong, but completely deterministic...

17:26 When you cast Warp in this one room in this palace, you end up in essentially random memory, and every sprite is randomly generated, so it's total chaos, but it's completely deterministic. And so the speedrun is: get to this castle as fast as you can, cast Warp, and then walk in this very specific pattern until you get the end credits.

That's so good. Yeah, I feel like in another life this would be my hobby.

Oh, totally. I could see myself spending a whole lifetime just figuring this out.

18:03 I mean, it's very satisfying. And that is a good segue into what I actually did spend most of my life on, which is reinforcement learning. This is a really interesting reinforcement learning paper, and I'll give a bit of background. I remember talking to Trevor Blackwell and some of the Y Combinator founders, this is pre-pandemic, 2018 or something, and saying that reinforcement learning really needs, as I said at the time, a BERT moment. This is pre-GPT, pre all that stuff. Basically, BERT at the time was kind of like GPT: you could embed different sentences, you could finish sentences, it was multilingual, and it just felt like a very universal model that could finish almost any sentence. And I feel like reinforcement learning needs something like that, where you're not training totally from scratch. When you start training, let's say, a robot arm or something, you would start with a model that already understands: if I swing this arm really fast, that's dangerous. That concept could just be there on the first epoch; it would just understand that high acceleration is dangerous. Now, of course you could hard-code that, and that's what people do, but that's just one example. It'd be cool if there was a model that, at a high level, understood smoothness and efficiency and those things.

19:51 And here we are almost 10 years later, and the community is still kind of marching towards that. This is the latest paper, and it's a really big advancement in this direction. It's a very simple paper to read. They paid for a domain name for the research paper, which I think is a really clever way to spend ten dollars and boost yourself if you're a PhD student; it's a nice hack. You can see some cool examples, and they have nice videos.

20:24 The paper is relatively easy to read, and if you get stuck there are citations, obviously, so you can go back without spending a whole bunch of time on this. They basically are able to build world models. So I'll explain:
I think we talked about this in the RL podcast.

20:46 World models are really tough, because most of the time you're just predicting the next word, or predicting something like, should I show this ad to somebody? There's always a relatively small space, a yes/no question. But in this case you have to predict what the future of the world looks like, so there are a lot of things you have to predict all at the same time, and that makes it extremely difficult. A really interesting paper called Dreamer came out, and then Dreamer V2 and V3, and this is an incremental improvement on Dreamer, but the increments are getting big, and this one in particular I think is really inspiring. So give it a read if you're into this area, or if you want to get into this area it's a good way to get started, and I'm really excited to see where it all goes.

21:53 - Patrick Wheeler (Host) So the world model isn't something I'd really thought that much about. But the world model isn't something that a programmer writes; it's something that is learned. I guess that's the part that always makes "model" a bit confusing to me. It's not a set of physics equations. It's, in the same way as everything else, a way of updating the values in the tensors, or whatever, from frame to frame of the planning, in a way that reflects the progression of the world.

22:31 - Jason Gauci (Host) Yeah, exactly. Take chess, for example. Chess, you could argue, doesn't need a world model, because the game is so easy to simulate. But you still have this problem: if you don't have a world model, how do you rewind time? In the case of chess, maybe you store the state of the board, or you store the moves you made, and they're all reversible. But for a lot of games it's pretty difficult to rewind time and ask what-if questions. If you want to ask a thousand what-if questions, you have to be able to ask the first one, then rewind back and ask the second one.

Sure, sure.

23:18 And look at Mario; we talked about Mario earlier. If you wanted a reinforcement learning algorithm to play Mario and you want to do counterfactuals, you want to plan, you're going to have to say: okay, what happens if I jump? Oh, if I jump, I hit the flaming rotating thing and I die. So let's go back and try something else. How do you go back? Now you have to implement save states, and that can be really expensive; it's not really practical. If you have a world model, then your state is just an input to a neural net. It's just a vector, and vectors are super easy to store and retrieve. So even for chess and Mario and these games, a world model makes planning practical.

24:07 - Patrick Wheeler (Host) Got it, got it. And so that's why: physics is extremely lightweight, but most simulators and emulators are just too heavyweight, so you'd build a world model regardless. All right, you said the paper was easy to read, so I'm going to take that as a challenge, and I will attempt to read the paper and see if that's true.
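A toy sketch of the cheap-rewind point Jason is making: when the world is a learned model, a state is just a vector you can copy, so each counterfactual is one function call. The little linear model below is a stand-in for a trained network, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a learned world model: next_state = f(state, action).
# In Dreamer-style agents this would be a trained latent dynamics network.
W_s = rng.normal(size=(8, 8)) * 0.1
W_a = rng.normal(size=(8, 2)) * 0.1

def world_model_step(state, action):
    return np.tanh(W_s @ state + W_a @ action)

def value(state):
    return float(state.sum())  # toy stand-in for a learned value function

state = rng.normal(size=8)
candidate_actions = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([-1.0, 0.0])]

# "Rewinding time" is just reusing the same state vector; no emulator
# save states, no replaying moves. Each what-if starts from `state` again.
best_action = max(candidate_actions, key=lambda a: value(world_model_step(state, a)))
print("best action:", best_action)
```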
24:27 - Jason Gauci (Host) Yeah. I'm debating whether or not to tell this story, but the primary author was a visiting lecturer at a company I worked at, and he's super nice, a really nice guy. His name is Xiaolong Wang, but his auto-generated email was XLWang, and I thought that he had picked that email. So I was like, oh, your email is really funny; I thought it was clever and very funny. And he's like, what are you talking about?

25:13 I was like, never mind, I'm an idiot. But if you're out there, I didn't mean that as an insult or anything. I thought it was hilarious. It's still very funny. But, you know, these things happen.

25:33 - Patrick Wheeler (Host) We're on to a hard pivot. Clever code is probably...

What kind of pivot? A clever one? Jason, you're in a mood today, man.

All right. Clever code is probably the worst code you could write.

25:45 This is from a blog, Engineer's Codex; the link will be in the show notes, like always. It's one of those posts where they're making a great point, and they have some well-sourced comics that are applicable. They're talking about their personal experience: you show up as a new coder and you're like, I'm going to be so smart, I'm going to write the fewest characters, the fewest taps on my keyboard. And then, as you, I don't want to say become an old programmer, but now it's: how do I write literally the dumbest thing that will get the job done? And it's okay if it takes a lot of lines, within reason.

26:24 You ever see those running jokes on X or Reddit where someone asks how to find the prime numbers from one to a hundred, and the answer is a giant if statement, with all the prime numbers between one and a hundred hardcoded as if statements? Okay, so there's a balance there. But yeah, these very terse, very compact things... we've talked about this on the show, so I'm not going to belabor it. But they went beyond just observing that clever code is bad, which isn't always enough: sometimes you find something that's bad and that doesn't yield the good thing, it just says this thing is bad and you've got to move somewhere else in the design space and try again. In this case they point out, and I think it's a good call, that doing almost the opposite of the clever code is almost always a really good decision, and it's harder than you think. That means writing code that works, is effective, and is understandable by the audience, and, most importantly, understandable by you in five years, when you've forgotten everything about that code.

27:35 There are certain pieces of the code base that I just dread going into, because I know that, yes, it works, but it wasn't done with clarity of thought. Sometimes that's because you were trying to grow it organically; sometimes you didn't really know what you were doing until you got done with it. But the places where you knew where you were going, and you followed your guiding principles about how the code should be modularized and segmented, are much less problematic to go into.
And it's one of those things where, again, it's sort of the opposite of that. Writing really compact lines, putting everything in one big function, makes it seem like, wow, this is really good code, look how dense it is. But think about the opposite: the places where I know there's one really big function in the code base that just does a bunch of stuff. No one wants to touch it, but you occasionally have to go in and add yet more, and it's like, yeah, this really needs to get broken apart, and everyone dodges doing it because unpacking it is such a heavy burden.

28:40 - Jason Gauci (Host) Yeah, totally. I'm thinking about some code I saw one time that used templates to auto-generate classes. You needed, like, a float handler and an int handler and a bar handler, so someone said, I'm going to make a template and instantiate it 20 times for these 20 different handlers. And then someone hits Ctrl+F, tries to find the function, and they can't, because it's auto-generated. You were clever and you saved maybe three hours of development time, but you added a thousand hours of debugging time.

29:30 Yeah, I think this, along with memory management and instrumentation, is really what separates the wheat from the chaff. So yeah, this is cool. I'm going to give it a read.
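A tiny illustration of the post's point (my example, not the blog's): both versions below find the same primes, but only one of them is still readable in five years.

```python
# "Clever": one dense line that takes a minute to decode.
primes = [n for n in range(2, 100) if all(n % d for d in range(2, int(n**0.5) + 1))]

# "Dumb": longer, but every step says what it means.
def is_prime(n):
    """Return True if n is prime, checking divisors up to sqrt(n)."""
    if n < 2:
        return False
    for divisor in range(2, int(n ** 0.5) + 1):
        if n % divisor == 0:
            return False
    return True

primes_readable = [n for n in range(2, 100) if is_prime(n)]
assert primes == primes_readable  # same result, very different maintenance cost
```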
31:44 Um, and so dia is open source and they claim to be comparable with, you know, all these other kind of uh, you know, leading uh systems. So, and the models also aren't that big, like people are used to seeing gpt or even the text to image models that are really big and they barely fit on your GPU. You have to jump through all sorts of hoops. But because this is text-to-audio, the models comfortably fit on commodity hardware, and so I haven't tried this one in particular, but I'm going to try it later today and I think we're very close to getting to the point where if you ever just needed something to say something, that you could just do it and it just is a total commodity and works great. 32:37 - Patrick Wheeler (Host) How does it handle not getting a problem with copying celebrity voices? So they have that with image generation or video generation. Someone's pointing out you, if you say words that are like copyright, like I want a picture of mickey mouse, right, and. But if you say I want a famous animated cartoon rodent, right, like there's only one of those in the training data and so it'll you know, sort of like there's all these kind of loopholes and workarounds. If you want, like a specific person to you know, can you achieve it by describing them? Or do they also put the same kind of censoring safeguards in to try to catch it late that? Oh, no, no, no, this sounds too close to a known celebrity. 33:21 - Jason Gauci (Host) Like we can't do this voice yeah, I mean, what I know is uh, I went on 11 labs and I said basically, uh, it's my friend's birthday and I wanted the unreal tournament announcer remember this game unreal tournament yeah I wanted the unreal tournament announcer to wish him a happy birthday. 33:38 So I went on 11 labs and I said give me the unreal tournament announcer voice. And yeah, as you said, it came back and said you know I can't do copyrighted stuff etc. Um, and then I tried like a few different things. I tried like give me a video game announcer voice from a, from a uh, you know, a first person shooter video game. Yeah, I tried to be kind of generic and it caught me every time. So what I ended up doing is I ended up having to use a different tool. It turns out there's a website you can go to, or someone else has an Unreal Tournament voice and uncensored yeah. 34:14 - Patrick Wheeler (Host) Well, that's what I was going to ask, because you could just do like, give it a voice sample and, even if you don't do like voice cloning, say, can you describe this voice for me in in excruciating detail and then have it, you know, give all of the idiosyncrasies of the pronunciations and all of the you know timber and tone and all of that, and then feed that in and say like, okay, now I want these words in this description. But, like you said, I there feels like there's probably still some other mechanism where they work hard to censor, which is always crazy because it has nothing to do with the actual like quality of the outputs, it's strictly just for censoring purposes. 34:52 - Jason Gauci (Host) Yeah, I mean, I doubt that 11 labs censors or even open ai. I don't think they actually go back and look at your picture and then say, whoops, we messed up, or do that? 35:04 - Patrick Wheeler (Host) No, so they definitely do for images. 
Because this happened: I was trying out the new ChatGPT image generator, which is really good, by the way, and takes a different approach than a lot of the other diffusion stuff. I had it generate something from a picture of my family in a very famous style of puppets. If you've tried the new ChatGPT one, it goes top to bottom, so it starts rendering top to bottom, and it got half the image done, and I thought, whoa, this is really good. It perfectly kept the background, used a famous female pig, a famous male frog...

35:46 So we're going to lose our sponsor? No, no, we don't have sponsors; our Patreons are bailing on us. ...and, you know, a famous drummer. And it was happening, and I was like, this is great, I wish I had taken screenshots, the skill is amazing. And then, as soon as it got part of the way through, it realized it was generating copyrighted content, and it bounced. It was like, oh, I can't do that.

36:13 - Jason Gauci (Host) Wow, that is wild. I don't know if ElevenLabs does that. To be honest, I would have said emphatically no, but now I'm not so sure. They do support voice cloning. I feel like you could probably put a low-pass filter and some other tricks on a voice to keep it from getting picked up, to circumvent all the fingerprinting.

36:38 This Dia also supports voice cloning, and it's open source, so I think in this case there's really nothing preventing you from getting a bunch of Unreal Tournament voice samples, running them through Dia, generating a voice, and then having an Unreal Tournament announcer say whatever it wants to your friends. So yeah, that's a really interesting idea, though, the whole post hoc censorship; maybe you could take a screenshot quickly, or who knows. But I think there's another debate around the open source versus closed source situation. These open source models are getting really good, and it might come to the point where... there are a lot of things like this; compilers are an example. Way back in the day, Patrick and I used a compiler called Green Hills. Patrick, I don't know if you remember the Green Hills compiler.

37:47 - Patrick Wheeler (Host) Yes, yeah.

37:50 - Jason Gauci (Host) That's an example where you could get GCC, open source, and for 90% of people it's good enough, but some people need Green Hills. Honestly, this is out of my area of expertise, so I wouldn't really know what's going on there, but suffice it to say our company bought the Green Hills compiler. And so it might get to the point where Dia and other open source models are good for, like, 95% of us, and you just use OpenAI or one of these other models when that five percent makes it worth it. That might be where we end up with all of these.

38:29 - Patrick Wheeler (Host) Yeah. We're mixing the news, and we can move on, but I saw OpenAI saying they're going to release an actual open source model soon. So I'm curious where the money will be made. Inference? Test-time compute? Is it going to be the models? I think it's going to be all over the place.
I'm fascinated by running it at home, but it feels like... I see people buying rigs and doing video generation at home, but then it's so hard to keep up with. People did that when they were doing cryptocurrency mining: they'd get the latest miners, and for some people that paid off. I'm not sure there's a payoff here in the same way, if you go out and buy hardware to do all this stuff at home.

39:10 If you're a business, I feel like renting is still better, even if you're choosing to be on an open source stack for whatever reason: renting at least the hardware, and then using the models and stuff on top. Because it moves so much, you can end up being in the wrong place, and sort of behind, and really wanting something different than what you have.

39:37 - Jason Gauci (Host) Yeah, I totally agree. I see no point in buying this stuff outright when you can rent even an EC2 instance from Amazon for something like 90 cents an hour. You'd have to run it for so many hours to justify buying it yourself, with the electricity and the depreciation and all that. All right, our book of the show. Patrick, what is your book of the show?

40:09 - Patrick Wheeler (Host) All right, so mine's a bit out of left field, and by no means am I an exercising person. I've been trying to be diligent about the amount of exercise I do over the last few years, and going to the gym. Well, I have gym equipment in one of my rooms. "I'm going to go lift heavy objects around for a few minutes" is what I say when I'm going to go do it: I'm going to go move heavy objects up and down for no reason. But I'm trying not to have the myriad of health problems that can creep up from sitting in a chair in front of your computer all day.

40:47 And there's a guy on YouTube, Jeff Nippard, who does mostly lighthearted content about the gym. He's a very large guy in terms of muscle mass (I am not that way at all), but he takes a let's-use-a-science-backed-approach angle, rather than just go-until-you-puke, or if-you're-not-hurting-you're-not-doing-it-hard-enough. Generally good; there are several good YouTube channels, I think, for watching this stuff. And although these guys may spend 30 minutes or an hour in the gym every single day, he'll also point out there are ways to spend 30 minutes once or twice a week if you're really efficient and combine exercises in certain ways. Anyway, he put together an actual physical book. He had exercise routines and stuff before, and it's always a bit sketchy for me when YouTubers say, buy my ebook, buy my PDF. I'm trying to think if I've ever done that; maybe once or twice.

41:58 Anyway, he had a book on Amazon, an actual hardback book, reasonably affordable, called The Muscle Ladder.
The analogy is kind of weird, but it talks about principles that are the sides of the ladder, and then rungs on the ladder. It also includes a bunch of workout routines, and pictures, which is very useful for me, because sometimes you read, oh, you're supposed to do (again, it's not my background) a Romanian deadlift or a sumo squat, and I don't know what that is. I vaguely know what the word "squat" means. So then you're going on YouTube trying to find a video to watch, and the person's like, hi, I'm so-and-so, subscribe, and five minutes in you're like, okay, just show me the thing.

42:39 But he has nice pictures for each of the exercises. So I find it a very useful book as I'm trying to find a routine that works for me. People will be like, that's a weird pick, Patrick, go back to the sci-fi. Yeah, you're probably right, I probably should go back to the sci-fi and fantasy picks. But it's a book that I have actually been reading.

43:02 - Jason Gauci (Host) So what have you learned? How have you changed your routine? Are you lifting heavier things? Are you in the heavy-lifting room longer? Or what?

43:10 - Patrick Wheeler (Host) I think it's that finding which exercises pair well together isn't obvious to me, so I'd tend to just go and do whatever exercises I knew. But when you read these, there are often a lot of variations on an exercise to target slightly different parts. For me it was: I'm going to squat. Why am I going to squat? Well, I want stronger legs. Okay, but that's very crude.

43:38 What part of your leg? What muscle, what muscle group, which compound movement? This kind of squat targets your glutes and your hamstrings in this way; this one does it in a different way. So it's about learning alternatives to an exercise, so that after a couple of weeks of doing it one way, it's not just "keep doing that one and lift heavier" (which is a philosophy, and kind of what I had been doing: the same thing over and over, slightly heavier or slightly more), but actually throwing in variations every few weeks where the muscles you're working in complement are slightly different. So personally, that's one way this kind of stuff has helped me. And it also keeps it from being super boring; for me, it's really boring to just go in there and do exactly the same thing every time.

44:26 - Jason Gauci (Host) Yeah, that makes sense. That is super cool. Okay, my book of the show needs a little bit of an intro, because this is one of those ones that could easily degenerate into us getting all sorts of hate mail. We do, by the way, get hate mail. We don't have sponsors, so no one can pull out of the show, but we have gotten hate mail before. The reason I got to this book is because I was playing Crusader Kings 3. Have you ever played this game, Patrick?

44:58 - Patrick Wheeler (Host) I own this game (I bought it as part of a Humble Bundle) and I would love to play it. It seems fascinating. I'm nervous; I haven't started it.

45:04 - Jason Gauci (Host) It's one of these games where you're like, wait a minute, it's 4 a.m., what just happened? And it's Paradox, right? Yeah, all these Paradox games.
45:14 It's actually to the point now where, if it's a Paradox game, you basically need a mod that will speed the game up. Maybe this is a conspiracy theory, but I feel like they purposely don't include the fastest setting on these games because they want you to spend more time. Like this game: I feel like you could play it at a much faster speed than the fastest setting and still be productive. And Stellaris, same thing; I feel like I'm always on the fastest possible speed.

45:50 - Patrick Wheeler (Host) Sorry, before we go on: which one of those is the best? If I'm going to start, I want to play one of these Paradox games. I have a few. Which is easiest?

45:57 - Jason Gauci (Host) What I would say is, you should pick based on the theme. Do you want to be a medieval king, or do you want to be a space empire? That's really the decision you have to make. So, Stellaris and Crusader Kings, I basically play both of these games as if they're turn-based: I want the speed to be infinitely fast until something interesting happens, and then I want to pause. So what I really want is a turn-based game, not these real-time games. But the story and the theme and the setting are so good that it makes up for what I think is actually not-that-great gameplay. The story is amazing, very emergent; I love it. And so I was playing Crusader Kings, and I got to thinking about war and going to war. Why do people go to war? Why do people fight each other? Because Crusader Kings is a very deep game. It's not just, I'm going to get to the point where my army is bigger than the army next to me and then kill them, like you would in StarCraft or something. In Crusader Kings there's all this diplomacy and all this stuff.

47:12 And so I started thinking about why these people even go to war, and why these people would fight for you. That led me down a rabbit hole where I ended up reading this book called The Metaphysics of War, which talks about, basically, how people historically were motivated to go into battle. Imagine there's a battle, and you have a hundred people and the enemy has a thousand people, and you're probably going to die. How do people actually fight those battles? And people did obviously run away and everything, too.

47:52 The book talks about how, in medieval times, around the era of Crusader Kings, they would talk to these crusaders and say: look, if you die in battle, it's actually even better than winning, because you get this special place in the afterlife, whereas if you had won the battle and then died of old age, you'd end up in a different part of paradise or whatever. And, all of this (this is what I was worried about with the hate mail), we're not making any moral claims; we're not saying that's right or wrong or anything. But just understanding what the mentality was at the time was really interesting to me. And so, it's a pretty deep book.
48:43 It's something where you have to want to know the answers to these kinds of questions, and you have to put your historian hat on a bit. But I found it interesting; I thought it was kind of cool. I don't think I'm going to be reading any more books on it. I feel like I've covered what I wanted to know, so I'm probably going to move on to a new topic. But I thought it was kind of cool, and I just finished it yesterday.

49:10 - Patrick Wheeler (Host) Yeah, this is a tough topic, and, without making moral statements, in general it's tough, but it doesn't seem to be going away. War comes and goes, and I guess I consider myself lucky that I haven't had to think too much about it. Yeah, maybe if I played Crusader Kings, I would have thought more about it.

49:39 - Jason Gauci (Host) I mean, playing a video game and then being like, I think I'm going to read a book... I feel like this might be peak Jason here. It might be the nerdiest thing I've ever done. But yeah, maybe I should play a video game like Tetris or something.

49:57 - Patrick Wheeler (Host) All right, time for tool of the show. All right, Patrick, what's your tool? No, it's a game. All right, all right, a hybrid tool of the show.

50:12 So Jason alluded to this earlier, as far as money-making schemes in games. So I'll say the game, and then I'll give my caveat. It's the Pokemon Trading Card Game Pocket. I've been playing this with my kids. To be clear, neither me nor my kids have spent any money on this game. It is free to play, but it is one of those games that has timers, where you have to check back in every so often, and there's this thing that's built in that I want to talk about. Funnily enough, my pick last time was also a Pokemon game, for Nintendo Switch, although an older one.

50:49 Now, this is the trading card game, and in general I've gone through phases where I've collected cards. I've never been super into it, but off and on it's intrigued me, and I've really enjoyed this game. I feel like it's reasonably generous, speaking as someone who doesn't normally play free-to-play games, or at least not this style. Each day you can open a pack of cards and add them to your collection, and it does all those things that feed into that certain part of your brain: completing the collection, you got a new card, you got a rare card, building it up. But it has been interesting to explain to my kids. Let's say there are 50 cards to collect, and you open a pack that has five cards in it, and all the cards are equally likely. People think: I opened 10 packs, so I have 50 unique cards. No, that's not how it works.

51:47 If you actually look at it, to get 50 unique cards, where cards occur with an even distribution, you need a lot more than 10 packs. Because of that last card. Even if they're not juicing it, even if they're not skewing the rarity, even if they're not cheating you, that last card (and you don't know which one it'll be; it'll be different for everyone) means that, just statistically, the time to a complete deck for some people, like 10% of people, is going to be an enormously large number of pack openings (see the sketch after this exchange). So really, if you're going to play it, you have to have the attitude that it kind of doesn't matter. But I will say they have a competitive game. The game is not super strategic; you can just kind of play around. They have single-player battle events that come up, and you can play against the computer, which is kind of interesting. So it fits in that slot where, for a few minutes each day, you can open it up and do your thing.

52:45 People do the same with Duolingo and their Duolingo streaks: oh, every day for 200 days I've been practicing this foreign language. It's not clear they actually know the foreign language, but it's that certain thing of doing the same thing over and over, and streaks. So if that at all appeals to you, they have iOS and Android versions. Again, I don't spend any money on it. If you're prone to wanting to spend money to complete something, or to have the best, or to have the ultra-special rare cards that you see online, probably steer clear. But in general, that's not my particular vice, so I can play it, put it down, and enjoy it for the enjoyable time it is, without spending money. That makes sense.
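Patrick's last-card math is the classic coupon collector problem. Here's a quick sketch of it, using his numbers (a 50-card set, five cards per pack, uniform rarity; all assumptions taken from his example):

```python
import random
from math import fsum

N_CARDS = 50
PACK_SIZE = 5

# Expected single-card draws to see all N cards is N * H_N (the harmonic
# number), so roughly N * H_N / PACK_SIZE packs if packs are 5 uniform draws.
harmonic = fsum(1 / k for k in range(1, N_CARDS + 1))
print(f"expected packs: {N_CARDS * harmonic / PACK_SIZE:.1f}")  # ~45, not 10

def packs_to_complete() -> int:
    seen, packs = set(), 0
    while len(seen) < N_CARDS:
        seen.update(random.randrange(N_CARDS) for _ in range(PACK_SIZE))
        packs += 1
    return packs

runs = sorted(packs_to_complete() for _ in range(10_000))
print("median packs:", runs[len(runs) // 2])
print("90th percentile:", runs[int(len(runs) * 0.9)])  # the unlucky tail
```

Even with perfectly fair odds, the expected cost is around 45 packs, and the unlucky tail runs well past that, which is exactly Patrick's 10%-of-people point.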
53:32 - Jason Gauci (Host) I saw this crazy stat related to what you were just saying. If you shuffle a deck of cards and then look at that permutation, there's a 99-point-so-many-nines percent chance that nobody has ever seen that order before.

53:51 - Patrick Wheeler (Host) Yeah. If you perfectly randomly shuffle a deck of cards (which is actually really hard to do), you're probably the only person in all of time that has ever had that permutation.
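The deck claim holds because 52! is astronomically large. A quick back-of-the-envelope check (the population-of-shufflers figure is my own illustrative assumption, not from the episode):

```python
from math import factorial

orderings = factorial(52)
print(f"{orderings:.3e}")  # about 8.066e67 distinct deck orders

# Generous upper bound: 10 billion people each shuffling once per second
# for 10,000 years still explores a vanishing fraction of the space.
shuffles = 10_000_000_000 * (60 * 60 * 24 * 365) * 10_000
print(f"fraction explored: {shuffles / orderings:.1e}")  # ~3.9e-47
```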
54:12 - Jason Gauci (Host) Yeah, it's so wild. You know, everyone knows GPT and all these things, but there's another category of small language models. Imagine you're making a video game. Say you're making Crusader Kings 27, and you actually want an LLM to generate the dialogue. You would want that to run on people's computers. So there's this question of when we'll get to the point where you could deploy a desktop app with an LLM and be confident that a vast majority of your market could actually play it. It's kind of like Crysis or Cyberpunk:

54:59 When those games came out, they were virtually unplayable, because the graphics were just so intense; you had to have such an intense graphics card. But now a commodity computer, one you could buy from Best Buy, can play Cyberpunk at Ultra. And we're marching towards that.

55:16 Phi-4 is... the Phi models are just extremely, extremely high quality, and I can't stress this enough. I've been playing with Phi-4 a lot this week, and it is right up there. Benchmarks are just so easy to game, et cetera, so benchmarks say whatever they say; I don't even know. But when I use Phi-4, I find it's comparable with all the best models, and it's very small.

I think it still requires a GPU, although there is something called BitNet, also from Microsoft, which shrinks these models down so they can run on the CPU, and of course then it starts to get into somewhat degraded performance. But, as I said, this is a march. We're not there yet as a community, as an industry, but we're getting there, and it's really cool to follow along and see just how powerful some of these small models have become.

56:17 So this model can run comfortably on a GPU at your home, and it does all the multimodal stuff. You can give it audio and say, transcribe this. You can give it a picture and say, do OCR on this, and it will blow away almost any OCR software. I compared Phi-4 to Tesseract, to EasyOCR, to TrOCR (which is also transformer-based), and Phi-4 just blew all of them away; it wasn't even close. It's just amazing that you can spin this up on your own computer. So we're at the point now where, if you're willing to limit your audience to people who have NVIDIA GPUs, you can roll out LLMs as part of your desktop software, which I think is pretty amazing. And I think it's only a matter of time before that pie grows to cover everyone with a GPU, and then everyone with a computer. We're heading there, and it's exciting to watch.

57:31 - Patrick Wheeler (Host) I'm curious, from a pricing and cost standpoint, whether there'll end up being some blend, where you can run one of these Phi-4s locally, embedded in your software, installed like the equivalent of a DLL on your computer that various things can reference, but with the self-awareness that, like you said, the OCR is pretty good, but you see edge cases where these models produce coherent but wrong text. If the text is just way too blurry, the model will just assume what was probably there and spit it out. So you'd want the ability to recognize, "I'm beyond my abilities, go to the cloud and ask a bigger model," and come back: a sort of blended approach.

58:24 I still haven't seen much of that. It's not exactly what people call hallucination, but it's related: the self-awareness of how well the model thinks it did. It just seems to always assume it's doing really well.

58:40 - Jason Gauci (Host) You know what I would do there, if I were tasked with that today? These models have a temperature parameter. I would crank up the temperature and run the same model 10 or 20 times to get an ensemble of answers, and then, if they don't agree, I would go to the bigger model.

59:01 - Patrick Wheeler (Host) Yeah, that's interesting. Or, I wonder, for some of the OCR stuff specifically, whether eventually we get to an approach where, either in the model or in the pipeline, you project the text back: read the text, project it back onto the image, and check, is there agreement here or not? You kind of ask it to run itself backwards. And yeah, that's one of those things that people don't realize.
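A sketch of the escalation idea Jason describes: sample the small model several times at high temperature, and only pay for the big model when the answers disagree. The `small_model` and `big_model` functions are hypothetical stand-ins, not any real API:

```python
import random
from collections import Counter

def small_model(prompt: str, temperature: float) -> str:
    # Placeholder: swap in a real local inference call here.
    return random.choice(["42", "42", "42", "41"])

def big_model(prompt: str) -> str:
    # Placeholder: swap in a real cloud model call here.
    return "42"

def answer_with_escalation(prompt: str, n: int = 10, min_agreement: float = 0.8) -> str:
    # High temperature makes the samples diverse, so agreement across the
    # ensemble acts as a crude confidence signal for the small model.
    samples = [small_model(prompt, temperature=1.0) for _ in range(n)]
    best, count = Counter(samples).most_common(1)[0]
    if count / n >= min_agreement:
        return best  # the cheap local model is confident enough
    return big_model(prompt)  # disagreement, so escalate to the big model

print(answer_with_escalation("What is 6 * 7?"))
```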
59:25 - Jason Gauci (Host) Verifying actually works, though. Some people say the model that generated the bad answer clearly can't verify it, but it actually can. Think about it in terms of watering your garden: if you turned on all the zones at the same time, it wouldn't really work, or it would work at very low pressure, so you wouldn't get the same kind of watering. But if you're asking the model to verify an answer it has already generated, that's like only watering one zone. It's easier to verify than to generate the answer in the first place, so it'll actually do a much better job.

01:00:09 - Patrick Wheeler (Host) Yes, and I've been working through this; I want to make sure we move on so we don't run too long, but I have been starting to use some of the AI coding tools, and one of the things I realized is that when you interact with another person, you normally have some mental model of how good they are at what you're asking them to do. Can I give them a high-level, complex task and trust that they've thought about it? Let's talk about the plan, then you go work, then come back to me and we'll do a check-in,

01:00:50 you know, from time to time. But we don't treat the LLMs that way; we just give them arbitrarily complex tasks and let them go. So, to the thing you're mentioning, people are pointing out that it can often be helpful to say: first, what I'm looking for is a plan for how we could accomplish this, and let's get that plan really, really good. Then, once the plan is good, let's execute on it, but step by step: I only want you to do the first part of the plan, then only the second part, and so on. By doing it that way you end up getting more points of interaction, but also better quality, even if you just auto-approve everything it says and skip the interaction.

01:01:39 I don't know; maybe we don't anthropomorphize the LLMs enough, or maybe too much. I'm not sure where on the spectrum we are. And of course they're always changing and getting better, so whatever you were doing before doesn't always work in the future.

01:01:51 - Jason Gauci (Host) Yeah, we should definitely do a show on Copilot. That's a show that desperately needs to happen. How to talk to your computer; how to train your chatbot.

01:02:01 - Patrick Wheeler (Host) Are we eventually going to have to have communication-skills classes on the appropriate ways to interact with your chatbot?

01:02:12 - Jason Gauci (Host) Oh, interesting. Those communication classes in gen ed that I thought were useless are actually useful: I can talk to my computer with them.

01:02:22 - Patrick Wheeler (Host) Okay, all right, let's just hard pivot again. There it is, I did it again, just for you. All right: memory management.

01:02:28 - Jason Gauci (Host) Memory management. So I suggested this topic. I feel like Patrick is going to know a lot more than I am, but I'll start by talking about why I suggested it. We were having an issue at work where we were just blowing up on memory and the pod would die.
This is not an uncommon thing: you run out of memory, and then the OS just starts machine-gunning your program. And it made me realize, as one of the more senior engineers or developers at the company, that this is a skill that's not very accessible. There aren't a lot of people who know what to do when things run out of memory.

01:03:20 A lot of people might just start randomly turning stuff off, or you might have some intuition, but what's the principled approach? Imagine you walk into a code base; you're at McKinsey or Deloitte or something, and yes, this is the job from hell I'm describing, but your task is just walking into some random company's code base and reducing its memory footprint. How do you go about that in a principled way, and how do you maintain the right knowledge about your system so it never gets to the point where it's blowing up? That's the motivator, and I think we'll dive into a bunch of different things you can do, both to catch and detect these issues and to mitigate them.

01:04:17 - Patrick Wheeler (Host) You've got a better explanation of the topic than I did. I misinterpreted it to mean just broadly all the things about memory management, because to me, as a C++ programmer, memory management is something that haunts your dreams at night. But no, I think we're going to dabble across both; they are interconnected. And I think your statement is interesting; I hadn't really thought about it that way. When something's running too slow, the ability to make it go faster is maybe more at hand for people. You study that in school, right? Big O notation, algorithms.

01:04:56 So generally people have some experience with data structures or algorithmic approaches to improving their code, and there are levels past that, but people can kind of see them because they're near to them: optimizing assembly code generation, SIMD, GPU offloading, multithreading. I feel like people are exposed to those a lot more, and like you said, the memory tooling is underused

01:05:34 for a lot of folks. And in general, most people don't think about memory because it's sort of a pass/fail thing. You run your code, and if it finishes, you know roughly how long it took, intuitively, because you were waiting for it; but you don't really know how much memory it used. I hadn't really thought about it, but I guess you're right: if your program busts out of its memory size and you don't know of an obvious thing you were doing that you could simply stop doing, then what do you do next?

01:06:12 - Jason Gauci (Host) Yeah. So maybe you should start by explaining to people how memory works in a program. That'd be a good way to kick it off.

01:06:21 - Patrick Wheeler (Host) So you go to Best Buy and you buy the kind of DDR RAM that your motherboard and CPU support.
I mean, even that part is complex, right? All the memory technologies. And "memory" can also mean the L1 and L2 caches as well as RAM; most of the time we're talking about RAM. Then, as Jason mentioned, if you're in a multi-user situation where you're submitting something on Kubernetes or Docker or whatever, up into a cloud, you have a hard memory limit, because they want you to play nice with the other systems. And so I like to motivate my memory...

01:07:04 Okay, all right, we're looking at meme pictures while I'm trying to talk now. And who said we shouldn't Twitch stream?

01:07:09 - Finale (Host) Okay, it was me, it was me.

01:07:18 - Patrick Wheeler (Host) All right. So the idea of memory is this: there's a certain amount of stuff that fits into CPU registers, the little bits of variables. But once you start having strings and arrays of objects, those aren't going to fit in the registers, the very small, very efficient storage your CPU operates on. So they may be stored out in the L1 or L2 cache, but ultimately that cache gets filled in from your system memory, your RAM. And RAM is relatively slow to access, though way faster than going out to disk, even an SSD. So hard drives are pretty slow, RAM is a lot faster, then cache, on down to registers. Fitting in memory can be about staying within the cache size, because you get a big speedup from that. But what Jason is talking about is more your program's total memory consumption.

If you're like me, one of the first times you bump into this is when you write a recursive function and screw up the terminating condition. Every time the program recurses into the function, it pushes all of the function's state onto the stack, and it just keeps pushing and pushing until the stack grows too big and your program crashes. The other way you can run into it is as you're reading data in: if you are deriving a lot of data and just keep adding to it, you will crash. How you handle that can vary a lot.

Ultimately, memory management happens at several levels. Depending on the language, the compiler and the runtime are tracking your objects, and unless you're in C or C++, the runtime is generally trying to free up as much memory as it can as it goes along, as you stop using it, so that the longer your program runs, it doesn't just consume arbitrarily more memory over time. In C or C++ you need to do that yourself, or some people handle it with shared pointers and unique pointers, the so-called smart pointers, to help them out. But in languages like Java, or maybe Python, you get garbage collection: the runtime tries to notice, hey, no one's referring to this bit of memory anymore, so I'm going to go clean it up and organize things.
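A tiny illustration of the runaway-recursion case. In CPython the interpreter guards its own stack and raises RecursionError instead of hard-crashing the process, but the failure mode is the same idea:

```python
import sys

def countdown(n):
    # Bug: no terminating condition, so every call pushes another
    # stack frame and the stack grows without bound.
    return countdown(n - 1)

def countdown_fixed(n):
    if n <= 0:            # the terminating condition stops the growth
        return 0
    return countdown_fixed(n - 1)

try:
    countdown(10)
except RecursionError:
    # CPython cuts this off at a configurable frame limit rather than
    # letting the process blow its real stack.
    print("blew the stack at the frame limit:", sys.getrecursionlimit())

print(countdown_fixed(10))  # prints 0
```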
01:10:01 In Rust, which is becoming more important, that is done through ownership, so it's more efficient but still largely done for you; there's careful tracking of who owns each piece of memory and of when it's no longer being used. But past that level, there's ultimately getting the memory. When you allocate memory, either through an explicit allocation call or just by asking for something that has to be created dynamically (not a fixed-size object), that's a call to your operating system. Your language has a way of asking the operating system, hey, I need some memory, and the operating system tries to give you blocks of memory with space in them; when you free them, it tries to keep everything coherent and together, across all the programs running.

So the first thing people do is open their task manager or activity monitor and look at the operating system's record of how much memory your program is taking. Ultimately, like Jason was saying, his pod got killed. What that means is that the operating system, the hypervisor, is monitoring memory consumption by how much the program has asked for, and when the program hits its limit, the OS's job is to protect itself and all the other programs running, so it terminates your program; then it can go, okay, now I can reclaim all of that memory. So the first course of action is monitoring what the operating system reports about how much memory is being used. That's probably the most obvious one, but walking through the steps, that's where you're going to find it.

01:11:56 Similarly, when your job finishes, look at a log of what's called the high watermark. Your program was hopefully borrowing memory and then giving it back as it wasn't needed anymore, so you may use a terabyte of memory over the course of your program running; but if you were only ever holding a gigabyte at a time, freeing it and getting another gigabyte, freeing it and getting another gigabyte (we can talk about better ways of doing that later), you could go through a full terabyte while the high watermark, the largest amount of memory you ever held at once, was only one gigabyte.

01:12:25 And the way to know that is to run something alongside your job that monitors what the operating system is doing (there are other, more built-in tools for this too), basically tracking that high watermark and then saying: if my pod has 16 gigabytes, I don't want to get above 13 or 14. Because if I'm getting up past that, then unless I know very clearly what the max is going to be, it'd be very easy for a spike to bump me over the 16 gigabytes and get me force-killed. So if you don't want that, you monitor those numbers. The easiest way to tell how much your program is using, in general, is looking at the operating system.

01:13:20 - Jason Gauci (Host) Yeah, that makes sense.
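One cheap way to log that high watermark, at least on Unix, is the standard-library resource module. A sketch, with the 16 GiB pod and 13 GiB budget numbers taken from the example above:

```python
import resource  # standard library, Unix only
import sys

def peak_rss_mib() -> float:
    # ru_maxrss is the process's high watermark (peak resident set size):
    # kilobytes on Linux, bytes on macOS, so normalize to MiB.
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return peak / 2**20 if sys.platform == "darwin" else peak / 2**10

blob = bytearray(200 * 1024 * 1024)  # hold ~200 MiB so the watermark moves
print(f"peak RSS so far: {peak_rss_mib():.0f} MiB")

BUDGET_MIB = 13 * 1024  # stay well under a 16 GiB pod limit, as discussed
if peak_rss_mib() > BUDGET_MIB:
    print("warning: getting close to the pod limit")
```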
And as a desktop user, sometimes you've seen this: some program goes off the rails, uses all of your RAM, and then you find you can't even move the mouse; everything is deadlocked, you're hosed, and you have to hit the hard reset. On AWS they don't want that. Amazon doesn't want your program to cause the entire machine to lock up and potentially ruin things for other people's programs on that same machine, programs that don't even know you exist. So they protect themselves really aggressively.

01:14:05 The way it works when you're running in the cloud is you say, I promise you, Mr. Cloud, I'm only going to use 10 gigs of RAM. They constantly monitor you, but they can't actually prevent you from using 11 gigs; I guess because of the history of compute and languages and everything, they can't actually stop you. So what they do instead is, if you go over the 10-gigabyte mark for more than a few seconds, they just kill your entire pod. That's their way of ultimately protecting themselves. There are even situations, and this doesn't happen very often, where, say, you and two other programs are sharing a physical machine, and those other two programs both go above their watermark and cause the whole machine to start running out of memory. It might just kill all three pods, even though it wasn't your fault. So it becomes really important to have tight bounds around your memory.

01:15:38 - Patrick Wheeler (Host) On the desktop, in a way that doesn't quite replicate the server case, you can sometimes bump into an issue where, like you said, your whole computer slows down. Although I feel modern operating systems have gotten better at protecting themselves from that, what they will do instead, because they want to try to let your program finish, is start telling you that you have more memory, when that memory is actually coming off of disk. They say, hey, here's a new chunk of memory, and they take your old memory and put it out to disk. It's called paging: they page your memory out to disk and give you new memory to write to.

01:16:16 Then you go try to look something up at an address corresponding to what got written out to disk, and the OS goes, okay, hang on: I'm going to put your current page on disk and bring that one back in.

01:16:27 That happens transparently to you, except your program gets paused waiting for the swap. You'll hear that called thrashing: the OS is writing these pages in and out of your disk, and you have to wait, so your program's runtime gets dramatically longer. And the OS does this because you may have something like your browser open that's taking lots and lots of RAM, but you're not using it right now; you're doing development. So you actually want the operating system to be able to take all of that browser memory and put it out to disk. Until you go click on the browser again, it just hangs out, chilling on disk, and then it pops right back up. It's a little slow.
So if you ever switch back to an app when you're using a lot of RAM and it's sluggish at first, this is what you're experiencing. The operating system is trying to let everyone have their cake and eat it too, and this is the trade-off it makes.

01:17:35 - Jason Gauci (Host) Yeah, that totally makes sense. I think you hit the nail on the head: the first thing to realize is that you need to pay attention to the high watermark, the average memory consumption, and the variance. When you run in the cloud, you can choose a minimum amount of memory to reserve and an upper bound. The way this is implemented is that you're guaranteed your minimum. Say you ask for four to ten gigs: you're guaranteed four gigs, but beyond that it depends on the other people sharing the machine. If you go over four, the scheduler checks what everyone else on the machine is doing, and if they're not using all the remaining RAM, it gives you another gig. Your process continues, you're at five gigs now and paying for five gigs, and you can go six, seven, eight, all the way up to ten. It's not exactly that granular, but you get my point. Here's the problem: if you're at five gigs and you go over, and the scheduler says, oh, I can't give you six because there's just nothing left on this physical machine, well, guess what it's going to do? It's going to kill your pod. So for a lot of web services, people actually set the minimum and the maximum to the same number. That way you never have to worry about growing: you statically get 10 gigs, you can use zero to ten of that at your leisure, and you never get killed because you couldn't grow. In a way, that's just moving the problem somewhere else, because you can obviously still get killed for going over the max. It all gets pretty complicated, which is why staying on top of it is an extremely important skill as an engineer.

01:19:52 I'll talk about what I do on the Python side, and then I'll let Patrick jump into the C and C++ side. On Python, there are two tools you really should become familiar with. One is called psutil, and despite the fact that when I hear "ps" and "psutil" I immediately think of Linux, it has full Windows support; it's totally cross-platform.

01:20:20 It will tell you your process's CPU usage, the machine's CPU usage, memory usage, et cetera. If you're running in the cloud, it will give you the usage of your pod: the machine could have 128 gigs of RAM, but if your pod only has 10 gigs, that's what's going to show up in psutil. So you can imagine taking the output of psutil and writing it to Datadog, or writing it to a Postgres database somewhere that you go and look at. This is all very common.

01:21:02 Another tool that's really important is tracemalloc. psutil gives you a snapshot by asking the kernel for information, so it's relatively cheap, maybe even free.
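A minimal sketch of that kind of snapshot loop with psutil. The episode doesn't pin down a specific export client, so printing stands in for shipping the numbers to Datadog or Postgres:

```python
import psutil

proc = psutil.Process()  # the current process

# These are cheap kernel queries, not instrumentation.
rss_mib = proc.memory_info().rss / 2**20  # resident set size, in MiB
machine = psutil.virtual_memory()         # machine-wide view

stats = {
    "proc.rss_mib": round(rss_mib, 1),
    "machine.total_mib": machine.total // 2**20,
    "machine.percent_used": machine.percent,
    "proc.cpu_percent": proc.cpu_percent(interval=0.1),
}
# In a real service you'd ship this dict to Datadog/Grafana/Prometheus
# on a timer; printing stands in for that here.
print(stats)
```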
tracemalloc is not cheap. What tracemalloc does is: you say start tracing, and it keeps track of all the memory allocations, building a ledger of them; then you say stop tracing, and it freezes that ledger, presumably converting it into data structures that are easier to manipulate. Then you can go through and get the high watermark and a bunch of other interesting statistics, like which files generated the most memory consumption. That's extremely useful. You can also filter it down: I want the files that generated the most memory usage, but I don't need torch in the report, because clearly PyTorch generated 99% of the memory if I'm running an LLM, and that's not helpful to me. So you can filter it and so on.

01:22:15 Both of those are amazing tools. Every time I start a project, I set up some way of running them easily, some kind of util library with a context manager so that I can do tracemalloc and psutil, and that's pretty much all you need to get most of the way there. The other thing I would add, which I think is universally applicable, is to have a place to send that information. At the company I work at now we use Datadog. In the past I've worked at bigger companies that had bespoke solutions, and I used Prometheus a lot at another company. There are a lot of different options, but just having a way to collect and aggregate that information is really useful.

01:23:09 - Patrick Wheeler (Host) I haven't actually used either of those, but like you're saying, it is important to make sure you have a good way, at least a special mode, where logs come out with priority, so that even if your pod gets killed, you know what was happening right before. That can be overlooked, because those last few seconds, or second, or milliseconds are exactly the thing you need to see. So sometimes you need to be careful that you're not waiting on a buffer flush and losing the insight you needed from the last moments.

Another tool, which I assume can run on servers but which we mostly run on desktops, is Valgrind. Similar to what Jason is describing, running Valgrind will give you a lot of insight into how much memory you used, but also other memory-adjacent things beyond just size: the total number of allocations, for instance. Every time your program does an allocation there's a cost to be paid, and every time you free there's a cost to be paid. So the more you allocate and free (like my earlier example of going a gigabyte at a time through a terabyte), the more you pay a cost you wouldn't pay if you didn't do as many transactions.

01:24:37 Sometimes the OS also needs to do optimization and cleanup, collecting memory back so it can hand it out to other programs, and Valgrind will help track all of that.

01:24:49 But it will also do things like tracking which pieces of memory you have and haven't initialized, and which ones belong to your program. So if your program tries to read (we were talking about arbitrary-code-execution exploits earlier in the show as one of the news items), what happens is you try to read a piece of memory you weren't supposed to.
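Circling back to the Python side for a second, here is roughly what that tracemalloc workflow looks like end to end; the torch filter mirrors Jason's example of excluding a framework that dominates the numbers:

```python
import tracemalloc

tracemalloc.start()  # begin recording allocations (this has real overhead)

data = [list(range(1000)) for _ in range(1000)]  # stand-in workload

current, peak = tracemalloc.get_traced_memory()  # bytes: now / high watermark
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

# Filter out allocations from files matching *torch* so the report
# focuses on code you can actually change.
snapshot = snapshot.filter_traces([tracemalloc.Filter(False, "*torch*")])

for stat in snapshot.statistics("filename")[:5]:
    print(stat)  # top files by total allocated size
print(f"current={current / 2**20:.1f} MiB, peak={peak / 2**20:.1f} MiB")
```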
Now, Valgrind can't always tell if you're reading from an array that happens to sit right next to your current array. But it can tell if you're reading from a location that's clearly not part of what you're supposed to touch, and it will report those to you, which can be very useful. I will say it's not the easiest tool to run, so in general I don't run it just because I'm bored and looking for something to do; but when you have a problem, being able to reach for these tools can be very valuable in tracking it down.

01:25:41 - Jason Gauci (Host) Yeah, that makes sense. I remember, and this is a long time ago, so this tool might not even exist anymore: I used this thing in C++ called AddressSanitizer. That was pretty good.

01:25:51 - Patrick Wheeler (Host) It still exists. There are compiler options where you can turn on address sanitization, and what that does is instrument your program so that if you try to read from memory you're not supposed to, it will alert you: hey, this thing here may read from uninitialized memory, or may be dereferencing a pointer that doesn't point to something valid. It catches those things for you. Valgrind does all that and more, but it's kind of expensive, and it does it after the fact; you don't compile with Valgrind.

01:26:28 So Valgrind operates on your already-built program, whereas address sanitization is a compiler flag you add, so it's part of your program's build.

01:26:42 - Jason Gauci (Host) Got it, okay, that makes sense. So, Patrick, what do you do? You have 10 gigs of RAM, you don't want to pay for 11, and your program is sometimes fine but sometimes hits 11 gigs and gets killed. What do we do?

01:26:59 - Patrick Wheeler (Host) Okay, with that tee-up, we'll first talk about reducing the size of the data you're using. One option can be to compress things. Now, compression doesn't always mean zipping it up; that would be counterintuitive here. But sometimes you can trade runtime for memory. Say I have a stock price: you could naively say, I'm going to store a double for every stock price at every tick, or every second, and then you can work out how many seconds and how many stock tickers you can track in your memory. One way to save space is to realize that storing a double for every price is overkill and ask, how do I do this in fixed point? Do stocks actually trade at some ridiculous 1e-minus-something precision, or can I use a known tick size? That's a form of compression: fixed-point math.

01:28:08 Similarly, you could use something like a delta encoding, where you store how much the signal moves by each step.

01:28:16 This always blew my mind when I was younger: I had a CD player that said "one-bit DAC" on it, and I knew enough to know that's really weird. How do you have a one-bit DAC, and why is that something they would boast about?
01:28:30 Well, I still to this day don't know why they boasted about it, but you can figure out what a one-bit DAC means. When it's playing the digital audio signal, at every timestep it gets either a one or a zero; that's all you get with one bit. If it's a one, they move the signal up and latch it; if it's a zero, they move it down. So if you wanted a steady signal, you would just do one, zero, one, zero, one, zero at equal weight, and it would bounce around at roughly the same value. So that's not exactly compression; it's more about using the minimal amount of data to represent what you actually need, even if it sometimes costs precision or runtime, because you have to convert it back.

01:29:45 Another thing is references to data. Say I store a bunch of web pages in memory for some reason, and web pages have all these repeated HTML tags in them. You could go through, trading a lot of processing speed for memory, and say: every time the string <p> appears, I'm going to represent it by a single value that's a pointer to one shared copy of the string <p>. Probably not the best example, but if you have strings repeated over and over in memory, you can make references to them and pull them out; string interning, essentially. Again, it's kind of compression and kind of referencing, and you can layer these things up so your total data size becomes a lot smaller, but when you need to process the data, you pay for it.

01:30:45 It's similar with derived data. The instinct is to compute it ahead of time so it's ready to use, but if you have vast amounts of data, that can be a problem. So you can do lazy initialization, which can help with other things as well. This is where you say: I'm not going to derive this data until I need it, and when I'm done using it in that moment, I let it go; if I need it again, I'll generate it again. You do this through all the pieces, and it requires code and data changes, but you're going through the stuff you hold in memory and saying, anything I'm deriving, I just re-derive every time I use it. If you think of techniques we've talked about on here before, like memoization and caching, those are great, but they trade increased memory for decreased runtime. You can flip that around and say the reverse: I'll trade more runtime to have less memory impact.

01:31:41 - Jason Gauci (Host) Yeah, that makes sense. Along those lines, another thing I see a lot, especially with Python developers, or actually mostly with AI developers in any language, is missing back pressure. I'll give you an example. Imagine you have one thread that loads data and another thread that does the ML. Doing the ML, the autograd and all of that, is slow, so the data-loading thread will be able to load data faster than the other thread can train the model.
01:32:14 Yeah, and so you get to the point, taking this to the limit, where you have the entire dataset, minus the little bit you've already trained on, sitting in memory, and by the time that first thread is done, everything is on fire. Because that thread ran away, it generated way too much data for the other thread to consume. The way this is handled in Python is through generators. And actually, I'd love to know how you would do this in C++, but on the Python side, there are functions where, instead of just returning a value (a generator can actually return a value too, though very few people do that; typically you'd just return None), you yield along the way. So you can have a for loop, and in that for loop you're yielding values. What that does is allow whoever is calling the function to consume each value right away, even though the function they called hasn't finished yet. So you can have a generator that yields data to be trained on. And this is getting way down into the guts of the language,

01:33:41 so I'm definitely no expert, but I think the way it works is: the training side says, give me the next value. The data side goes and gets a new value and calls yield, and then it waits. It says, I have new data I can yield, but no one's asked for it yet, so I'm just going to wait here. The training system can take however long it needs, and then it comes back and says, yield me something else, generate something else. In this way you have back pressure. Also, when I was building the SSH replacement, Eternal Terminal, it was a constant struggle to maintain back pressure so that at any time you could Ctrl-C and kill a program, even if it was trying to send you an infinite amount of output. On the C++ side, is there an equivalent of this, or how would you design a system like this?

01:34:46 - Patrick Wheeler (Host) Well, I think you unintentionally segued into my next point anyway, but that was perfect. What you're describing is a version of, I'll say, the more generic approach, which is: if you're in a multithreaded environment, multi-producer and multi-consumer, what you want is a fixed-size queue. I always think about back pressure slightly differently, as asking the producers to slow down, but it's the same thing. You have a pool of producers that are reading training data, downloading from URLs, whatever, and putting it into a queue. At some point that queue can fill up, and if it fills up, the producers are blocked, waiting to put the next item into the queue. The consumers pull data out of the queue, and as soon as they empty a slot, one of the producers gets to put its item in. You have to be careful with thread control, but with proper locking you can do that without much problem, and there are even lock-free data structures for it. In general, that would be very similar to what you're describing, just without the language keywords.
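A compact sketch of the fixed-size-queue version Patrick describes, trimmed down to one producer and one consumer; in Python, queue.Queue(maxsize=...) provides the blocking, and therefore the back pressure, for free:

```python
import queue
import threading

q = queue.Queue(maxsize=8)  # the fixed size IS the back pressure

def producer():
    for i in range(100):
        item = f"sample-{i}"  # stand-in for a downloaded/decoded sample
        q.put(item)           # blocks whenever the queue is full
    q.put(None)               # sentinel: no more data

def consumer():
    while True:
        item = q.get()
        if item is None:
            break
        _ = item.upper()      # stand-in for the slow training step

threading.Thread(target=producer).start()
consumer()
```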
There are ways to do the same kinds of things with keywords, but in general that's how I would describe it: a pool of threads producing and a set of threads consuming. To level that up one notch, a lot of these things can be done with something called a ring buffer.

01:36:21 Consider a naive implementation of the queue. Every time you download your training data (let's say it's images, so they're all roughly the same resolution, which is something you'd know before you started, so you generally know how big one can be), you say: I have a JPEG, it's 100 kilobytes. You ask the operating system, give me 100 kilobytes of memory. You wait to put it in the queue, put a pointer to that 100 kilobytes in the queue, the consumer takes the pointer out, pulls the image, and frees the memory. Then the next thread downloads the next image, 101 kilobytes, and gets a new allocation. So you're doing all this allocate, free, allocate, free, and as we talked about, there's an expense to that as part of memory management.

01:37:14 What a ring buffer says is: if you know the max size is going to be 200 kilobytes, or you just make it a lot bigger, say a megabyte, then I'm going to have a fixed count of, say, 20 slots. Each of the 20 slots in this ring (think about it as a ring) has its own memory. A thread pulls a slot from the ring and reads the image into that memory block; the block is consumed when its turn comes, and you keep head and tail pointers for where that is. So you get a limited number of slots, but you also get to reuse the memory blocks. Each new producer can't produce until it gets a memory block, but then it's refilling one:

01:38:00 it doesn't have to go to the operating system again, because your program already owns and holds that memory. That reduces the overhead of talking to the operating system; by accepting some limits in advance, you reduce that interaction. You still get a fixed-size queue and queue-like API behavior, but the implementation is such that you need to ask the queue for the memory you write into. This is very, very common in embedded systems, where you either want to or are required to completely avoid dynamic memory allocation. In that case, you allocate all the data for the ring to your program at startup, and you never allocate again: you read into those buffers, and you're effectively running your own little memory manager, your own little operating system, to keep track of it.

And I guess that's saving time because getting the memory from the OS is a lot more expensive than keeping the references?

01:39:03 Well, even if getting the memory were really, really cheap, you still have to call out to the operating system, and the operating system is in a different context and has to do work.
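Here's a toy Python version of that fixed-slot ring. The allocator pressure it avoids is much more visible in C or C++, but the shape is the same: a fixed pool of preallocated buffers that get recycled instead of freed. The slot size and count are the illustrative numbers from Patrick's example:

```python
import queue
import threading

SLOT_BYTES = 1024 * 1024   # assume we know a safe max item size (1 MB)
NUM_SLOTS = 20

free_slots = queue.Queue()
for _ in range(NUM_SLOTS):  # allocate every slot once, up front
    free_slots.put(bytearray(SLOT_BYTES))

ready = queue.Queue()       # filled slots waiting for the consumer

def producer(n_items):
    for i in range(n_items):
        buf = free_slots.get()  # blocks until a slot has been recycled
        buf[:5] = b"data!"      # "read the image into that memory block"
        ready.put(buf)
    ready.put(None)             # sentinel: done

def consumer():
    while (buf := ready.get()) is not None:
        _ = bytes(buf[:5])      # use the data
        free_slots.put(buf)     # recycle the slot: no new allocation

threading.Thread(target=producer, args=(100,)).start()
consumer()
```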
So, even in the best case, the system call has overhead, and you want to avoid it if you can. And then, like you said, if you were doing a billion image reads, that's a lot of allocation and deallocation, a lot of churn, every one of them slightly differently sized. So you're getting fragmentation, and the OS is trying to handle all of that, versus you managing it yourself. Yes, your program is doing more work, so I wouldn't reach for this at the beginning; but if you know you're at the point where you need to optimize, it's a great tool to reach for.

01:39:46 - Jason Gauci (Host) Yeah, and it also reduces your variance: because you've allocated everything up front, it caps the upper bound of what your program is going to use.

01:39:55 - Patrick Wheeler (Host) And your OS couldn't know that. In the example you were giving in Python, which is fine because it's fast to write, the OS can't know in advance how many consumers are going to be pulling from the generator at the same time, or how many generators there will be, and it would be very complicated to ask for that in this way. So this is the trade-off: you put more limits on your system, and in exchange you get better performance.

01:40:23 - Jason Gauci (Host) Yeah, that totally makes sense.

01:40:25 - Patrick Wheeler (Host) The next one is a little similar. It comes up if you've ever used protocol buffers, or any kind of message format where you're reading along and, as you process the data, you realize you need more memory to put things in. In a protocol buffer you have a tree of messages, and as you decode them you find new branches of the tree, so you need to get a little more memory, and a little more memory, and a little more, and at the start you don't know how much you're going to need. Reading a JPEG image is similar. If you want the array of pixel values at the resolution of your image, so you can feed it into a neural network or whatever, you know that output size; but as you're reading the JPEG, how fast that conversion happens varies as you go through the image, so without knowing the output size, the allocation would be dynamic the whole time. What you can do there is use something called an arena allocator. An arena allocator says: every time you ask for memory, you ask this special allocator, and all it does is keep growing within a bound; it never frees. You block out 10 megabytes, and as long as you stay under 10, it keeps handing you new memory from the 10-megabyte block. Even if you quote-unquote free stuff you were using earlier, the whole block stays with you, and at the very end you delete the whole thing all at once. So instead of a thousand tiny allocations, you have one large allocation, and that's also an advantage.

01:42:10 Distinct from all of that: let's say none of those strategies work, you tried them all. Your other option, which again is going to add runtime, is an on-disk cache, where you write things to disk that you know you aren't going to need for a while, similar to what the operating system might try to do for you.
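To make the arena idea concrete before the disk-cache thread continues, here is a minimal bump-allocator sketch, in Python for consistency with the other examples; real arenas usually live in C or C++ (protobuf's C++ library ships one as google::protobuf::Arena):

```python
class Arena:
    """Bump allocator: hand out slices of one big upfront block."""

    def __init__(self, size: int) -> None:
        self._block = bytearray(size)  # the one large allocation
        self._offset = 0

    def alloc(self, n: int) -> memoryview:
        if self._offset + n > len(self._block):
            raise MemoryError("arena exhausted")
        view = memoryview(self._block)[self._offset:self._offset + n]
        self._offset += n              # bump; individual frees don't exist
        return view

arena = Arena(10 * 1024 * 1024)        # block out 10 MB, like the example
node_a = arena.alloc(100)              # "a little more memory..."
node_b = arena.alloc(4096)             # "...and a little more memory"
node_a[:3] = b"abc"                    # decode message data into the slices
# When the whole tree is processed, drop the arena and every view into it:
del node_a, node_b, arena              # one teardown instead of many frees
```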
So you write things out to disk as you're reading them in. Say I'm processing something and deciding whether to keep it: if I'm going to throw away most of what I'm reading, then I write to disk only the ones I want to keep, and when I get to the later stage of my program, I read them back one by one. That way you don't have a growing list of all the images in memory before you've processed any of them.

01:43:03 - Jason Gauci (Host) And so, whether you're implementing your own disk cache or using a library that provides it for you, there are various levels of simplicity or sophistication you can go with there. On the Python side, there's a tool called shelve; there's actually an even better one that's not built into Python, called SqliteDict. These act like regular dictionaries. You wouldn't even know the difference; they implement all the dictionary APIs, or most of them, but under the hood the data is kept on disk. So if you're in a situation where latency is not that important but you have to store a ton of information, or you need it to persist from one run to another, and you don't want to go over your memory limit, you can use these tools. I think in C++ there's mmap, right, which memory-maps disk?

01:43:55 - Patrick Wheeler (Host) It does, yeah, but that's for a slightly different purpose.

01:43:59 - Jason Gauci (Host) Oh, okay. So what's the C++ equivalent if you need to dump objects to disk and bring them back later?

01:44:07 - Patrick Wheeler (Host) Yeah, I mean, you could use SQLite, but ultimately, unlike Python, there's not really serialization built in, right? It's generally not advised to just write your structure out as a raw block of memory, so you'd need some serialization library to help you with that.

01:44:23 - Jason Gauci (Host) Like protobuf or one of those things.

01:44:24 - Patrick Wheeler (Host) Yeah, protobuf or similar, exactly.

01:44:29 - Jason Gauci (Host) That makes sense, cool. So yeah, I think we did a great job covering a lot of this. If you have stories or other tools, don't hesitate to email us or post in the Discord. The Discord is actually even better, because there's an audience there; if you email us, we have to remember to go back to previous episodes, and if you're a longtime listener, you know we're terrible at that. So go to the Discord and post your ideas and thoughts there.

01:45:06 There's a community of folks and really interesting conversations happening over there, and it's exciting to see. Patrick and I do check it every now and then, and in general we really appreciate everyone's engagement, and also your support. We definitely appreciate your financial support through Patreon: you are our sponsor. There is no one paying us to say anything other than you folks, so we really appreciate it. We'll put a bookmark in this one and catch you all next time.
01:45:58 - Finale (Host) Music by Eric Barndollar. Programming Throwdown is distributed under a Creative Commons Attribution-ShareAlike 2.0 license. You're free to share, copy, distribute, and transmit the work, and to remix and adapt the work, but you must provide attribution to Patrick and I, and share alike in kind.