Time For Computers

Matt Godbolt

00:19

Hey, Ben.

Ben Rady

00:19

Hey, Matt.

Matt Godbolt

00:20

So I was giving a presentation at work the other day to a bunch of new hires, and, uh, one of the things that I asked them during the presentation is I show a bit of code, which is deliberately awful C++ code, just to sort of prove a point. And it's just a bit of string formatting. But I ask them, how long do you think this function will take to run? Just ballpark it, you know, order of magnitude, in fact, two orders of magnitude, where do you think it'll be? And it's amazing that in a room full of people, and there were some folks who are much more, um, experienced in the room too, the range was absolutely astronomical, from people saying hundreds of microseconds to people saying 10, 15, 20 milliseconds. You know, we were sort of playing Price Is Right rules, you know, who was nearest without going over. And, you know, one person got it about right. This particular bit of code was tens, low tens of microseconds. Right. Even though it was absolutely awful. And it struck me that we don't really internalize very well how fast computers are and what they're good at and what they're not good at. Even if we spend most of our waking lives thinking about it, it still takes us by surprise.

Ben Rady

01:39

Right, I have to think about it like very, I have to like do the math on it. I have no intuition. Right. Like looking at a piece of code. Like it's just, I have to like sit down and be like, okay, that's gonna be this and this is gonna be that, and I gotta probably look some things up on the internet. It's not intuitive at all to me.

Matt Godbolt

01:54

Absolutely. So, you know, trying to get a handle on how fast a computer is is tough. I mean, in fact, just thinking about it right now, one of the things that amuses me very much is when I plug my laptop into a docking station, like at work. And there are these two giant monitors plugged in with, you know, 32-bit color per pixel, however many, you know, 4K displays. And I plug the tiny thin little USB-C thing into my laptop, and somehow, miraculously, that amount of data is flowing out of my computer continuously to drive these screens. And again, it boggles my mind how much it is, but if I sat down and did the math, it's probably fairly reasonable. Right. Well, it has to be, 'cause it works. Right. We're looking at each other right now and there's not a problem. But developing intuition about these things is tricky, especially when computers have surprising edge cases.
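
Doing the math Matt mentions is straightforward. The sketch below is only a back-of-envelope estimate under assumptions that are not in the conversation: each 4K display is taken to be 3840 by 2160 pixels, refreshed 60 times a second, sent uncompressed, with blanking intervals and protocol overhead ignored. The function name is made up for illustration.

```python
# Back-of-envelope estimate of raw pixel data needed to drive two displays.
# Assumptions (not from the conversation): 3840x2160 panels, 60 Hz refresh,
# no compression, no blanking or protocol overhead.

def display_bytes_per_second(width, height, bytes_per_pixel, refresh_hz):
    """Raw pixel bytes needed per second for one display."""
    return width * height * bytes_per_pixel * refresh_hz

# 32-bit color is 4 bytes per pixel, as mentioned in the conversation.
per_monitor = display_bytes_per_second(3840, 2160, 4, 60)
total = 2 * per_monitor

if __name__ == "__main__":
    print(f"one monitor:  {per_monitor / 1e9:.2f} GB/s")
    print(f"two monitors: {total / 1e9:.2f} GB/s ({total * 8 / 1e9:.1f} Gbit/s)")
```

Under those assumptions the two screens need roughly 4 GB of pixel data every second, which is a lot in human terms but only about one byte per clock tick on a 3 GHz machine.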

Ben Rady

02:53

Yes, yes. And it's really easy to be off by many orders of magnitude.

Matt Godbolt

02:57

Exactly. Exactly. I mean, how fast is a modern PC? Like, let's say on the computer I'm on now, let's just say three gigahertz, which means every tick of the clock is a third of a nanosecond and just putting that.

Ben Rady

03:14

Stupid fast

Matt Godbolt

03:15

my golly. A third of a nanosecond. Now, I mean, you brought this up when we were talking about this before, but there's a nice way of thinking about what a nanosecond is.

Ben Rady

03:25

Yeah, yeah. So, so, so Grace Hopper, who is uh, obviously a famous woman in computing, has this great talk that she did many years ago talking about explaining, I think to generals why certain satellite communication wasn't gonna be possible in the way that they were thinking about it. And she had a piece of wire that was, you know, she sort of called a nanosecond long. Right. And that was like the amount of distance that light could travel in a nanosecond, and it was a little under a foot, I think like 11.8 inches, if I'm remembering correctly. She sort of held it up and be like, this is a nanosecond. Right. So your satellite is way up in space, many, many, many, many nanoseconds away. Right. And that she's obviously talking about like communication, but it's like so useful to when I am trying to intuit, when I'm trying to sort of break out of the, like I don't have an intuitive sense for how this works. I think back to that all the time of like, you know, about a foot is a nanosecond. Uh, and that is how fast light is moving. So if you're doing something

Matt Godbolt

04:28

That's the cosmic speed limit, nothing can go faster than that. So, it gives you like the, the baseline at least, like this is, if everything was perfect, that's as fast as you could go.

Ben Rady

04:37

Exactly. And I will many times sort of envision that wire that's a foot long and sort of like twist it around into a small shape and project it onto the chip of, like, a CPU and be like, there's a wire in there that is, as you say, a third of the length of a foot. Right. So only a few inches. And that's how long it takes for the light, essentially, to move around in there. Now it's not actually exactly like that, but it's just sort of like, my intuition sort of clicks in a little better when I think about it.

Matt Godbolt

05:08

When you think, and I mean, that obviously doesn't take into account the fact that electrons in wires go slower than light in a vacuum. They're not just moving in a sensible way, although there's arguments about how the propagation of, um, charge and voltage really works. But there's a transistor that takes some time to flip state and all that kind of stuff. So that's all the kind of physical reasons why, but as you say, it gives you at least something you can stare at and look at, and you look at your arm from your elbow to your wrist and go, yeah, there's a nanosecond there, roughly speaking. Right.

05:43

But first of all, well, a nanosecond, right? I dunno, you know, as engineers, we are probably more used to talking about time, or things that are nanoscale, nano something or other, but to put it into perspective, right? Like a million, sorry, a millisecond, which is like the sort of standard human "wow, that's fast", is a thousandth of a second. And, right, you know, you'll be hard pushed to do anything on a human scale that isn't indeed tens of milliseconds. So the way that I think about this is like when I used to work in video games, we always wanted to try and get things in under a frame, like the refresh of the TV screen that we were projecting to, which in the UK is 50 times a second; over here it's 60 times a second. So you need to completely recalculate the next, uh, viewpoint, taking into account all of the AI, the user's movement on the controls, anything else that's going on.

06:43

You're playing music and all that kind of stuff. And you have to be repainting the screen completely from scratch every 60th of a second, which is roughly 16 and a bit milliseconds. So that is sort of human scale time. Right? And so when people were guessing, would this string formatting routine take 20 or 30 milliseconds, even though it was egregiously bad? There's no way, well never say no way

07:34

So like when you're running non-trivial slices of code, you might be measuring things in terms of microseconds. Cuz that's, again, it's a sort of sensible, um, domain for most things. Um, but we're way out below what humans can perceive at that point. And then a thousand times smaller than a microsecond is a nanosecond. And now we're into the domain that the computer itself is actually ticking along at. And so we're all, that's just to kind of get our head round how many thousand times smaller than things that we know

08:36

Uh, wrapper, which Right, right. Is just what it is. Right. It's, yeah. So when we were talking about this earlier, it reminded me that I had, um, totally and utterly stolen an amazing, very short article by Jeff Dean, I believe, from Google, one of the original engineering folks at Google, where he had put out a list of, like, these are just the things that take computers time and here's how many nanoseconds or milliseconds they take. Mm-hmm. And I've got a spin on that where I try and equate it to human time so that you can develop a bit of an intuition about, like, well, what does this mean? So that when you go, oh yeah, the information wasn't in the cache, I had to go and get it from main memory. How long does that kind of, quote, really mean? Because I don't think any of us have a decent intuition about what, you know, oh, it's 230 nanoseconds. You're like, that's still so small, I can't get my head around it.

Ben Rady

09:34

Right.

Matt Godbolt

09:35

And so I was just gonna go through some of these with you and we can sort of talk about them as we go. So the first thing: computers can add numbers and exclusive-or numbers and do other sort of elementary arithmetic, not multiplication or division, but like adds, subtracts, XORs, ANDs, compares, usually in one cycle. Now there's a bit of pipelining going on here. I'm gonna draw a line over that. But that means that it takes one CPU cycle to add two numbers together, which is a third of a nanosecond, and that's already bonkers. But let's use that as a baseline. Right? And then, rather charitably, I'm gonna say that a human adding, say, a couple of few-digit numbers together takes one second. Like, someone who's good at it, right? If you're good at mental arithmetic and I say, what's 392 plus 4,164? Perhaps you can do that in your head in under a second. Right? So we're gonna use that as our scale for the rest of this conversation. And so, so

Ben Rady

10:29

A third of a nanosecond of computer time equals one second of human time.
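
The scale being set up here (one clock cycle of a 3 GHz machine stands for one second of human time) can be written down as a tiny conversion. This is only a sketch of that arithmetic; the 3 GHz clock is the round figure used in the conversation, and the names `CLOCK_HZ`, `CYCLE_NS`, and `human_seconds` are made up for illustration.

```python
# One CPU cycle of computer time is treated as one second of human time.
CLOCK_HZ = 3e9              # "let's just say three gigahertz"
CYCLE_NS = 1e9 / CLOCK_HZ   # about a third of a nanosecond per cycle

def human_seconds(nanoseconds):
    """Convert real nanoseconds into 'human' seconds on this scale."""
    return nanoseconds / CYCLE_NS

if __name__ == "__main__":
    examples = [
        ("one clock cycle", CYCLE_NS),
        ("one microsecond", 1_000),
        ("one millisecond", 1_000_000),
    ]
    for label, ns in examples:
        secs = human_seconds(ns)
        print(f"{label}: {secs:,.0f} human seconds (about {secs / 86_400:.2f} days)")
```

On this scale a microsecond is about 50 minutes of human time and a millisecond is roughly five weeks, which is why a guess of tens of milliseconds for a small function is so far off.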

Matt Godbolt

10:34

Of human time. Exactly. Right. Yep. So then let's sort of move up the hierarchy of the kind of elementary operations that a computer is doing under the hood all the time. The next obvious thing is multiplying, right? And again, this is hand-waving completely, of course, cuz it's different for every revision of computer. It's different for different architectures and stuff. But let's say on an x86, such as I'm on right now, it's anywhere between four and six cycles to do a multiply of two numbers together. These are integers, for what it's worth, for anyone who's really counting. And so that works out as 1.2 nanoseconds, thereabouts, 0.3 times four. Right. And in human time, that's four seconds. And again, now we're still in roughly the ballpark of that makes sense to me. I, you know, probably faster than I could multiply numbers together, but it's not a bad, um, approximation. I'm sure there are folks who can do, you know, three or four digit multiplication in their head, like that. So four seconds. So yeah. What about division? So what would be your instinct, your guesstimate of, uh, how long it would take a human, and we'll work backwards, to do, uh, a long division of two numbers?

Ben Rady

11:37

Uh, long division of two longs might take a while, but, um, yeah.

Matt Godbolt

11:47

Yes.

Ben Rady

11:48

Right. You haven't got it basically like memorized where it's like a hundred divided by 10, um. Then I, yeah. You know, many seconds, like you're, you're gonna maybe bust out some pencil and paper. Write it down like that. Yeah.

Matt Godbolt

12:00

And it turns out that intuition is about right. So, um, computers can't divide, uh, integers that much better than humans can, actually, with a big fat caveat that the latest round of Intel machines have somehow made it go a lot faster. But at least when I first wrote these slides, and for most CPUs that I'm used to dealing with on a day-to-day basis, integer division is anywhere between 30 and a hundred cycles. Which is a lot longer. So that's 10 to 33 nanoseconds, which in human time on this scale is 30 seconds to a minute and a half. So, sounds about right. You know, you're sketching on a bit of paper and that's how long it's gonna take you to divide things. And again, the reason it's such a wide range on the computer side is that it does depend on the numbers you gave it, unlike multiplication and addition and stuff like that, where it just does 32 bits' worth.

12:45

It's worth noting that just like long division, once you kind of get to the end, you're like, well, there's nothing else to divide, let's stop now. We might as well finish. We've got the answer now. There's no need to go through the other bit. So it varies. And, um, again, that's something I like to point out to people who are new to computers: unlike most of the other things in the chip, because the divider takes up a load of space, there is usually only one divider per, um, per CPU, and it's not pipelined. So some of the other things, although I'm saying it takes four or five cycles, um, there can still be multiple multiplications going on at once. You can have, you know, three or four multiplications going on at once, each at a different stage of the multiplication. It's just that there are four or five stages that the multiply goes through before it comes out the other end.

13:31

So it takes four cycles to get the answer. But, um, with a divide, if you're doing a division, it's taking you 30 seconds to a minute and a half. You can't be doing anything else at the same time. At least in that divider, you can't get another divide started. There's no way of breaking the work up. The reason I bring that up is because everyone's favorite data structure is a hash map, and almost every hash map, at least naive implementations, use modulus with how big the hash table is to find which slot. You know, you do your hash. You get like some giant number and you go, uh, mod 257, there's 257 entries in my hash table. Which slot am I gonna look in? I will just mod. And that's a division and it's long. It takes a while. You can only do one at a time. So there are actually some trade-offs. Um, you'll see some hash tables will actually not use a modulus, or they'll deliberately use a power-of-two-sized hash table. Even though it doesn't give the best distribution of hashes, it's probably worth spending more cycles on a better hash function and then using an AND to get you into your table than it is to rely on a divide, because it takes a minute.
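
The two ways of picking a slot can be sketched side by side. This is an illustration of the idea only, not any particular hash map's implementation; the function names are hypothetical, and Python will not show the speed difference, since the point in the conversation is about the divide instruction on the CPU.

```python
# Two ways of turning a hash value into a table slot.

def slot_by_modulo(hash_value, table_size):
    """Works for any table size, but costs an integer division."""
    return hash_value % table_size

def slot_by_mask(hash_value, table_size):
    """Needs a power-of-two table size; a single AND replaces the division."""
    if table_size <= 0 or table_size & (table_size - 1):
        raise ValueError("table_size must be a power of two")
    return hash_value & (table_size - 1)

if __name__ == "__main__":
    h = 123_456_789
    print(slot_by_modulo(h, 257))  # a 257-entry table needs the modulus
    print(slot_by_mask(h, 256))    # a 256-entry table can use the mask
    # For power-of-two sizes both approaches pick the same slot.
    assert slot_by_mask(h, 256) == slot_by_modulo(h, 256)
```

The mask only keeps the low bits of the hash, which is why the conversation pairs it with a better hash function: any weakness in those low bits shows up directly as collisions.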

Ben Rady

14:41

Yeah. Right. Yeah.

Matt Godbolt

14:43

So then one of the other things that CPUs do that we've talked about on this podcast before, or at least I love to talk about, so I know that we've brought it up more than once, is that, um, they try and get ahead of themselves. They look along the, um, stream of instructions and they try and do more than one thing at once. So they try to unlock parallelism by finding sets of instructions that can run together in parallel, even though they weren't necessarily written explicitly to be parallel. So this is not like threads, this is just a single stream of instructions. It's like, well, there's an add and there's a multiply, and the add and the multiply are distinct, let's do them together. But in order to do that, it needs to go beyond branches. So there will be some conditional branches in the flow of instructions.

15:27

And that would normally stop you if you were trying to get ahead of yourself. Cuz you're like, well, I don't know which way it's going to go until I get to it. So I guess I can't do any more work. But thankfully hardware engineers have gone, well, why don't we just make a guess, and then if we guess wrong, we'll undo the work that we did speculatively and, you know, we'll chalk it up to experience. But if we get it right, then hey, we're unlocking more parallelism. So branch misprediction is the name for when it has guessed wrong. Guessed incorrectly.

Ben Rady

15:58

Okay, Yeah.

Matt Godbolt

15:58

So that means that the pipeline has to flush, it has to refill and it has to do a bunch of extra work. Now, the average branch misprediction, depending on when it was noticed, can be anywhere between, uh, um, here I've got it in nanoseconds for reasons I can't remember, but anywhere between five and 30 nanoseconds. Which is between 15 seconds and a minute and a half. So it's almost as bad as one of those, uh, divides that we were talking about, which is really interesting.

16:26

What's that sorry?

Ben Rady

16:27

I said you can multiply quickly too.

Matt Godbolt

16:29

Yes. Right.

Ben Rady

17:03

Through. Yeah, yeah.

Matt Godbolt

17:03

At all. And you're like, oh crap, now I have to go redo all that stuff and I have to throw away the stuff that I was doing and that you lose a minute and a half clearing up the desk and kind of going back to it like, okay, step three

Ben Rady

17:13

I mean, how much human effort is wasted by trying to half ass two things at the same time?

Matt Godbolt

17:17

Well, there's that too.

Ben Rady

17:18

Probably a lot

Matt Godbolt

17:20

Exactly. So yeah, maybe, maybe it's a false, uh, a false comparison. Although actually no,

Ben Rady

17:25

I think it's a good comparison.

Matt Godbolt

17:27

I was trying to cook two. So we, um, we found a local, uh, place that sells, uh, pre-ground and pre, uh, made little packs of spices for Indian food, which is like my favorite thing to cook and I love cooking it myself, but, and I'll always support somebody who's, um, got a new little business. So we went and saw her and spoke to her, whatever. And so we, we bought every single one that she had. And then I was like, well, what are we gonna do about this? Uh, well I'm gonna cook two recipes in parallel, which definitely had exactly as you said, the sort of symptom of I was half assing two things rather than whole assing one thing. Yeah, right. It came out just fine actually, and it was delicious. That's good. But, um, um, that's more like multi-threading for what it's worth. Although there is only one CPU in this instance.

Ben Rady

18:12

Yeah, that's good point.

Matt Godbolt

18:15

So what else do computers do that take time?

Ben Rady

18:19

Uh, reading from cache.

Matt Godbolt

18:20

Exactly. Yes. So, well, reading from memory at all, right? You access variables all the time, right? Right. A variable's either on the stack or it's on the heap or it's wherever, but it's in memory. Um, and so we know that memory is slow, we're told; well, that's why we have these caches that are supposed to make it go faster. An access to level one cache is about the fastest thing you can get. So this is the tiny, tiny cache that's nearest the CPU. It's usually about 32K, which is, you know, absolutely ridiculously small. Although, you know, my first computer only had 32K of memory, so it's still quite big on that scale. But it takes three cycles to read from L1, which is like three seconds in human terms, right?

19:07

We've sort of said, so that seems, that's a bit like the piece of note paper that you're currently working on. Maybe this is a little bit slower than that. Yeah. Ben's holding up a sheet of paper in front of him. Right. If you just had to find an arbitrary bit in your flip notebook in front of you, two or three seconds sounds about right. I mean, again, slightly. So that's L1, and it's tiny. It's as small as a tiny notebook. Then, uh, L2, which is a bigger, further-away cache. Now, if we were thinking like L1 is the size of a notebook, this is like a ring binder or a set of ring binders that you've got, you know, on your shelving behind you. And so typical L2s are, um, hundreds of K, you know, 512K, maybe a meg-ish.

19:51

Actually I should check, I should check that. In fact, I'm gonna check that by running the command on my computer now because I've actually forgotten, which is super embarrassing and I'd hate to get it wrong. Um, so my L2 cache is, oh, 18 meg, I reckon, but I don't think that's right. All right. Well, anyway, megabytes of information local to this individual CPU. I bet you that was the sum of them all. So yeah, I should probably, anyway, so hundreds of K to low megabytes. Um, that's 10 cycles away, which is 10 seconds away. That seems a bit quick if it's ring binders on your shelf, for me. But, you know, maybe the analogy still stands. Level three is the final layer of the cache; that's the furthest away and it's shared with all of the other CPUs. So this is a bit like, I guess,

20:33

A bunch of folks sitting at desks and having like a centralized, uh, library of commonly used books in between them all. Right. That's gonna take you round about 40 cycles to get information. It varies there because it depends on whether it's in the part of the library that's actually physically close to you or not. That actually matters now. And so that's 40 cycles, so 40 seconds. Again, now that seems a bit fast for a library in my analogy, but really what we're getting to now is, like, what happens if the cache system fails and you actually have to go out to the real memory, you know, the things you literally slot into the motherboard when you're building your computer, and you have to actually get the data off of that.

Ben Rady

21:11

Right.

Matt Godbolt

21:11

We're talking like a hundred nanoseconds then, which is six minutes. So that's a trip down on the elevator to the, you know, the archives to then find that one book you want to get out, and then get back on the elevator back up to your, right, and then put it in the shared library. Mm-hmm.

Ben Rady

22:03

Mm-hmm.

Matt Godbolt

22:05

Um, just to sort of finish this off then before we go into more general stuff, but like, um, if you're talking about reading from peripherals, like the real genuine outside world as opposed to things that are literally soldered on your motherboard or very close to it. Um, reading something from an SSD, at least when I wrote these slides, which is a while ago, was, you know, 50 microseconds, which is like two days,

Ben Rady

22:29

Right? That's jumping orders of magnitude there.

Matt Godbolt

22:32

Yeah. We're off now to ordering from Amazon and waiting for it to come through, right? Yeah. This is that book we didn't have and Amazon, you know, it's got Prime delivery and it'll be through tomorrow. So that's, you know, when you need to read something from disc, obviously you try to make sure you have lots of things to read from disc, so you don't wait for any one particular piece. But if you are using an old-school spinning disc and it's not in the right place and has to seek to wherever your, uh, your information is stored on the disc drive, we're talking milliseconds. Now, this is another three orders of magnitude different from the microseconds. Uh, so one to 10 milliseconds on a good day, which is one to 12 months on this scale

Ben Rady

23:09

Matt Godbolt

23:10

So this is sending away to somewhere, you know, some, some, you know, some obscure company that has to custom make the thing and then it comes through and it says, you know, 60 days business days for it to come through, kind of feel to it. Right? That is what the kind of level we're talking about when you're reading from a regular old school hard disk.

Ben Rady

23:29

Right. I was gonna say it's like this is like, oh, I need this book. But the problem is, is it hasn't been written yet.

Matt Godbolt

23:37

Right? Yeah. That's more of that I guess so I mean yeah. Cuz even nowadays they can print stuff on demand, right? Like I've got a couple of books on my, my shelf here that are on demand printed, which surprised me. I like flick through and I looked at the end and went, wait a second. It says it was printed what, three days ago,

Ben Rady

23:52

Huh? Yeah. Yeah.

Matt Godbolt

23:54

Which is crazy. Well, how far things have come. So my analogy with books maybe is not exactly right, but, so then if we're gonna go from disc drives, which are physically inside the chassis or chassis, uh, of your computer, and we start talking about networking more generally, like the internet, uh, if you ping your switch neighbor, that is, the computer that is plugged in adjacent to you in the switch that you're both plugged into, we're talking hundreds of microseconds, which is about a week. So it takes a week to get to the closest thing that's not your actual computer. Now, obviously, maybe networking has gotten better since I wrote these things, and if you're using cool techniques, I'm sure you can go faster than that, but like, just as an order of magnitude thing, a week of time to go and get that thing from the network.

Ben Rady

24:40

Yeah.

Matt Godbolt

24:41

Yeah. So all of these things sort

Ben Rady

24:43

But so much faster than the spinning disc, right?

Matt Godbolt

24:46

Isn't that funny? Yeah. Yeah. It's actually faster. As long as the computer you're talking to has the response in its own RAM, then maybe you can get the answer back quicker, which is, you know, used by things like memcached and Redis for exactly that reason. When I ping google.com from my computer, it takes me just over a millisecond, which is a month and a half. So Googling it is not the answer. Pinging the other side of the Atlantic, back to my home country, if I ping bbc.co.uk it takes 90 milliseconds. So it's nine years. Uh, that's like going to Mars, right?
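
The figures mentioned so far can be run through the same scale in a few lines. This is a rough sketch, not Matt's slides: it assumes the 3 GHz clock from earlier, takes the latencies as quoted in the conversation, and uses made-up names (`humanize`, `human_scale`, `LATENCIES_NS`). The spoken equivalents are rounded, so the output will not match every quoted number exactly.

```python
# Translate the latencies quoted in the conversation into "human time",
# where one 3 GHz clock cycle (a third of a nanosecond) counts as one second.
CYCLE_NS = 1 / 3

def human_scale(nanoseconds):
    """Real nanoseconds -> human seconds on the one-cycle-per-second scale."""
    return nanoseconds / CYCLE_NS

def humanize(seconds):
    """Render a duration in the largest unit that fits."""
    units = [("years", 365.25 * 86_400), ("days", 86_400),
             ("hours", 3_600), ("minutes", 60)]
    for name, size in units:
        if seconds >= size:
            return f"{seconds / size:.1f} {name}"
    return f"{seconds:.0f} seconds"

# Latencies in nanoseconds, as quoted in the conversation.
LATENCIES_NS = {
    "L1 cache read (3 cycles)": 3 * CYCLE_NS,
    "L2 cache read (10 cycles)": 10 * CYCLE_NS,
    "L3 cache read (40 cycles)": 40 * CYCLE_NS,
    "main memory read": 100,
    "SSD read": 50_000,
    "spinning disc seek (10 ms)": 10_000_000,
    "ping google.com (1.2 ms)": 1_200_000,
    "ping bbc.co.uk (90 ms)": 90_000_000,
}

if __name__ == "__main__":
    for label, ns in LATENCIES_NS.items():
        print(f"{label:30s} {humanize(human_scale(ns))}")
```

Seen this way, the jump from main memory to an SSD is the jump from minutes to days, and a transatlantic ping is most of a decade.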

25:32

So when someone says, can you not just turn it off and on again, uh, if I have to turn it on and off again, especially as this particular machine takes so darn long to go through its POST, um, I've conservatively put that at five minutes, which is probably a bit high, but five minutes of turn your computer off and then it booting back up again and then you remembering to come back and log in and all the kind of things that you have to do cuz you wanted to go make a cup of tea, um, in human time, 32 millennia. So that is a civilization-ending event in CPU time. So just think of the computer, man, every time

Ben Rady

26:05

destroying civilizations every time you turn it off and turn it back on again.

Matt Godbolt

26:08

Right

Ben Rady

26:47

Don't kill all the robots inside. Don't kill the robots that make it work. So I was gonna say like, let's, let's, I wanna think a little more about that. Those sort of time scales when you were talking about like network access and, and, and disc access. So, so going across the pond over to the UK you were saying is 90 milliseconds, which is, did you say nine years if we're scaling this up into human time? Is that right?

Matt Godbolt

27:16

Get my slide back up is, uh, 90 milliseconds is yeah, nine years apparently. Again, someone will probably check my maths and find that I'm completely wrong here, but like I'm sure it's order of magnitude correct.

Ben Rady

27:23

Right. And then, like, just pinging Google, uh, which is probably hitting some edge server that's, you know, geographically located.

Matt Godbolt

27:31

It's almost certainly in the same place that, like, my provider is plugged into. Right. It's gonna be a PoP there or, yeah.

Ben Rady

27:37

Right. And how long was that again? That

Matt Godbolt

27:38

Was a month. 1.2 month and a half. Yeah, six weeks.

Ben Rady

27:44

Okay. Okay. And 1.2 milliseconds

Matt Godbolt

27:44

And I mean actually let's think about it. So the UK is 4,000 miles away. Ping is the round trip time. So it's 8,000 miles. Yeah. I'm gonna type into Google right now, 8,000 miles in feet and you can hear my appalling clicky keyboard and it is uh, oh god, 4.2 times 10 to the seven feet. Uh, so that's four point. Yeah.

Ben Rady

28:13

2 million or four 4.2 million.

Matt Godbolt

28:15

I do that. So that's the number of seconds. So we're gonna divide it by, uh, that's that many minutes, that's then that many hours. Okay. And then that many days, 335, oh that's not right. Uh, 488. So I dunno where I got nine. Well, I mean I know the number was right from the actual measured point of view, but according to my appalling maths here, just the light going across um, takes uh, 1.0 hang on a second. 1.30 no, sorry, I've just been, I've completely balls this up, haven't I? Yes. I've got several orders of magnitude that I need to work out first of all, um, uh, and I'm gonna, yeah, what's that 1.33? Uh, no, I've lost myself. This is daft. Right. I made a mistake in terms of that. So it's 4.2 times 10 to the seven feet, therefore it's that many nanoseconds away.

29:19

Okay. Right. And then let me do that. We are, gosh, so that's one E nine, so that's, yeah. Okay. That was a much easier thing. I dunno what I was doing with years and whatever. Um, according to this, it would take 42 milliseconds with this stupid approximation of one foot is one nanosecond. Mm-hmm.
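
The calculation Matt is working through on air comes out cleanly when kept in feet and nanoseconds. A minimal sketch, using the 4,000 mile distance and the one-foot-per-nanosecond rule of thumb from the conversation; the function name is made up, and light in a vacuum actually covers slightly less than a foot per nanosecond.

```python
# Lower bound on a transatlantic round trip using "one foot is one nanosecond".
FEET_PER_MILE = 5280
FEET_PER_NANOSECOND = 1.0  # rule of thumb; the true figure is a little under 1

def light_round_trip_ms(one_way_miles):
    """Milliseconds for light to go there and back at one foot per nanosecond."""
    round_trip_feet = 2 * one_way_miles * FEET_PER_MILE
    nanoseconds = round_trip_feet / FEET_PER_NANOSECOND
    return nanoseconds / 1e6  # nanoseconds to milliseconds

if __name__ == "__main__":
    print(f"{2 * 4000 * FEET_PER_MILE:,} feet round trip")
    print(f"{light_round_trip_ms(4000):.2f} ms at the speed of light")
```

So the 90 milliseconds measured for the ping is only about twice the physical floor of roughly 42 milliseconds.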

Ben Rady

29:59

So yeah, there was some division in there I think so. You know,

Matt Godbolt

30:01

It was, it did take me a minute and a half to get right. Onset. Yeah. Quite

Ben Rady

30:07

But, but this is, so that's interesting, because I think one of the sort of more surprising things for me in the last few years, uh, has been, and we have, uh, run into this at work actually, this sort of emergence of the network as an incredibly fast device for data access, on par with sort of local storage mechanisms. Um, especially when you sort of design your network to facilitate that kind of thing. Cause certainly, you know, when I was sort of generally thinking about these sort of order of magnitude times many years ago, naturally you'd be like, oh, we wanna avoid network access cuz it'll be much slower, so, you know, we're gonna cache this locally on disk. And even back then it was probably a spinning disc, but that was still faster. Right. And I feel like the tables have, uh, flipped a little bit if you sort of, uh, you know, take that into account with your network storage and the design of the network storage, where that network storage can be on par with, if not in some cases maybe even faster than, what you would be able to purchase for the same price, uh, stored locally.

31:19

Um, and then when you couple that with the fact that network storage has the benefit of being able to be accessed by many computers at the same time, then things get also very interesting. So this is like, obviously, the sort of very micro-optimization benchmark areas are one place where your preconceptions and your intuition can be wrong. But I think it can also be true at these sort of more macro, uh, levels where you're thinking about the design of whole systems and how they interact, where your intuition about what's fast and what's slow is maybe off by many orders of magnitude.

Matt Godbolt

31:50

That's a really good point. Yeah. I mean, it is definitely true that, um, yeah, accessing your neighbor's, uh, RAM or your neighbor's, uh, is faster than, or at least the same speed as, reading your own SSD, sort of in the same ballpark. And that could make a big difference, as you say, rather than filling all your servers with terabytes and terabytes of local storage. Then, you know, being smart about using, um, shared storage where maybe the, I mean, one thing that we glossed over in this, of course, is that that was reading from disk where it wasn't in the file system cache. So that's obviously a system-level, operating-system-level cache. Everything's a cache, right? It's just caches everywhere to make everything go faster. Um, and so very often if you've just written a file to disk and then you're just reading from it again, then you're not actually touching the spinning disc, so you get it from memory.

32:39

And so there's that, but obviously that blows up at some point if you run outta memory or the cache gets flushed to disc and other new things come in. Whereas in a shared storage environment, the cache could be on the shared node as well, in RAM. And I think, you know, certain devices that we've got access to have layers in themselves of, well, this stuff is all in fast RAM, this stuff is all in SSD, and it's all backed finally with actual big old fat spinning discs that can write the sort of journal of record for forever storage. And so the layers go through that, but it means that most of the time in a shared environment, it's probably faster to just keep asking for it off the network and have it streamed to you than it is to try and store it on your local disc and then get it back later. So that, as you say, affects the way you think about your systems design, which I've not thought of. That's great. Yeah.

Ben Rady

33:31

Especially when you start getting into the concerns around having to manage that local storage much more carefully. It's like, oh yeah, well, this is gonna be faster if we get it off a disk, but we only have like two terabytes of disk, and then when that fills up, we gotta swap it out, then we gotta refetch it. And then all these other sort of tradeoffs you make. It's like, well, if you're just going to, you know, degenerate into reading stuff from the network all the time because you don't have enough local storage to actually make it worthwhile, just cut the local storage outta the equation and read it from the network all the time. Right. Like...

Matt Godbolt

34:00

Yeah, it's, yeah, it's interesting. And then maybe concentrate any caching inside your application, and cache stuff in from the network in memory if you need that as well.
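As a rough sketch of the in-application caching idea mentioned here, the Python below wraps a network read in an in-memory LRU cache. The names `fetch_from_network` and `get_blob` are hypothetical and do not come from the conversation, and the "network" call is only a stand-in that builds a byte string. The one real API used is `functools.lru_cache` from the standard library.

```python
from functools import lru_cache


def fetch_from_network(key: str) -> bytes:
    """Hypothetical stand-in for a slow read from shared network storage."""
    # A real implementation would issue a network request here.
    return f"payload for {key}".encode()


@lru_cache(maxsize=1024)  # keep up to 1024 recently used blobs in memory
def get_blob(key: str) -> bytes:
    """Return the blob for `key`, hitting the network only on a cache miss."""
    return fetch_from_network(key)
```

Calling `get_blob("a")` twice should reach the stand-in network function only once; `get_blob.cache_info()` reports the hit and miss counts if you want to check.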

Ben Rady

34:11

Well, yeah. So this is a super interesting exercise though. I love, I love this sort of metaphor of like, you know, going to get something from your desk or going to the shared library or, you know,

Matt Godbolt

34:21

Yeah. In my head I've got this sort of mental plan to come up with a really good set of analogies that work this way, and when I dream about, um, retirement or whatever, I've got like, maybe we can get someone to do an animation of this. And, you know, elves and goblins is the real way I'd like to express this. I wanna make it interesting, like the little robots. That's the reason I brought that up. Those books were so important to me as to how I internalized how computers, quote, really work, that I think there's a new world where we can show: this is now how computers really work. And you can start at the top and say, well, this is just what computers do, adding and subtracting, whatever, haha.

34:59

And you can have your little goblin with this, uh, parchment paper writing out the answers to whatever, and you're like, oh, there you are. And then now we've got two goblins because we've got two CPUs or whatever, and how do they agree on this and whatever. And then you can start going further and further in and you're like, well, you know, the goblin's instructions come from the forest and depending on where you are in the forest, it might take you longer to go and get the forest and grow the thing. And you know, again, it's not really fully thought through or fully formed, but, you know, out-of-order execution can be done this way. And, uh, cache misses and all that stuff, I'm sure, in an interesting and entertaining way rather than the boring library analogy we used earlier, but...

35:46

Um, um, for example, we were looking at whether or not raising something to the power of two was the same as multiplying it with itself. And obviously we have a lot, you know, we work in finance, we have a lot of folks who are very mathematically focused. And of course those two things are absolutely equivalent, right? Raising something to the power of two is the same as multiplying it with itself, which is to say it's squaring the, the value and these values sometimes are matrices or giant arrays of numbers and things like that. So it's not as straightforward as literally a number. But if we were to just think about it in terms of a single number, like the computer, like I described before, the computer can do a multiply in a single cycle, oh sorry, four cycles, wasn't it? Four cycles.

36:32

Yeah. But raising to a power is not a primitive operation that it knows how to do. It has to be built out of code to do that operation. Just like, you know, taking the inverse tangent of something is a procedure; there's a program that runs. And so there's a vast, vast difference in the number of instructions that need to be executed to raise something to a power than there is to just multiply it with the circuit that does multiplication. But we can see they're equivalent, right? You can look at them and say, well, this is a power, but you've asked the computer to do two different things. Now, in some very optimized, compiled languages, there might be scope for the compiler to go, well, these things are equivalent. But typically, um, what the compiler sees at such a high level is a call to a function to raise something to a power. And unless it has special knowledge that this is really what a power is, um, it doesn't know that it could be replaced with, uh, multiplying by itself. Right. And so that was surprising to some folks. But, you know, um, I guess not to me. I'm looking at it going, gosh, those are very different operations that you're asking it to do, even though they're functionally equivalent.
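One way to test this intuition rather than trust it is to time both forms. The sketch below is an assumption-laden illustration, not the code discussed in the episode: it uses plain Python floats rather than arrays, and the function names and default arguments are made up. No timings are quoted because they depend on the interpreter version and the machine.

```python
import timeit


def square_with_pow(x):
    """Square via the general power operator."""
    return x ** 2


def square_with_mul(x):
    """Square via a single multiplication."""
    return x * x


def time_both(value=3.7, repeats=200_000):
    """Return (seconds for x ** 2, seconds for x * x) over `repeats` calls."""
    t_pow = timeit.timeit(lambda: square_with_pow(value), number=repeats)
    t_mul = timeit.timeit(lambda: square_with_mul(value), number=repeats)
    return t_pow, t_mul


if __name__ == "__main__":
    t_pow, t_mul = time_both()
    print(f"x ** 2: {t_pow:.4f}s   x * x: {t_mul:.4f}s")
```

Function-call overhead is included in both measurements, so the ratio is more informative than either absolute number.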

Ben Rady

37:42

Yeah. I mean there's a, there's a lot of things with the intersection of computers and mathematics where the computer and, and the pure math don't really agree at all. I mean, you know, obviously floating point operations are an example of that.
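A small illustration of the floating-point point, assuming standard IEEE 754 double precision (which is what Python's `float` uses on common platforms): identities that hold for real numbers, such as exact decimal sums and associativity of addition, do not always hold on the machine.

```python
import math

# In pure math 0.1 + 0.2 equals 0.3; in binary floating point it does not,
# because none of these values is exactly representable.
total = 0.1 + 0.2
exactly_equal = (total == 0.3)            # False
close_enough = math.isclose(total, 0.3)   # True

# Addition is associative for real numbers, but not for floats:
# rounding happens after each step, so grouping changes the result.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
is_associative_here = (left == right)     # False
```

This is why numeric code usually compares floats with a tolerance, as `math.isclose` does, rather than with `==`.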

Matt Godbolt

37:57

Well, there's that too. Yeah.

Ben Rady

37:59

Um, you know, uh, and we can go into that, but I feel like there's a lot of these kinds of areas where, if you're looking at things from a purely mathematical standpoint, um, it's different when you map it into computation, right? Um, and sometimes it's a limit of the way that computers are actually implemented now and the technology that we have. But I think there's also some, I could be wrong about this, but I feel like there's some things that are just like, no, no, no, this has nothing to do with the way that we tricked, uh, little bits of sand into thinking. This is just a fundamental limitation of computation, right? Like, you just can't do this in the same way that you can do it mathematically because the physical world just doesn't work that way. I'm trying to think of a good example of that though.

Matt Godbolt

38:45

And I was gonna ask you, but I was sort of thinking myself now what else there could be.

Ben Rady

38:50

I don't know, maybe I'm making stuff up, but yeah, I feel like I want to think about that

Matt Godbolt

38:56

Now we've gone quiet thinking about stuff, and this is not good podcast material, but we're both staring at each other with our fingers on our chins going...

Ben Rady

39:04

Hmm, yeah. This is like, we're nerd sniping each other with this. That's why. But yeah, I mean those things are, are, they can be, they can be very surprising. They can be very surprising.

Matt Godbolt

39:12

Right. And I think in our case as well, um, my, in fact, my sort of speaking career is based on the, you know, website that bears my name, um, and the kind of cool things that compilers can do. And that's what really gets me excited. And so my intuition is based around what compilers are able to do. What compilers often can do is exactly the kind of transformations that you might reasonably think of, maybe not specifically power, but things like that. Hey, look, you're doing this sequence of operations; there is a faster way of expressing it that has the exact same semantics and meaning to the CPU. So I'm gonna do some work to change it for you. And so often you don't have to worry about these things as a programmer, which is great, right?

39:53

I mean, ideally you wanna be able to express your intent at the highest level where you can achieve your goals. That seems to be a reasonable thing to want. Um, but in an interpreted language like Python, and this particular thing was in Python, there isn't an opportunity to have that sort of high-level view and go, oh, I noticed that you are always doing these two things together, or I can prove that this is a constant on this side and therefore I can do something different from what you said. That transformation is not really part of it. Now that's not to say that there isn't some clever code somewhere deep, deep, deep down in NumPy that's doing a comparison and going, oh, this is raising to the power of two, that's just multiply it.

40:31

But if it is doing it, it doesn't seem to be helping.
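To see what the interpreter is actually asked to do, one option is to look at the bytecode with the standard `dis` module. This is a sketch for CPython only, with hypothetical function names. The exact opcode names vary between CPython versions, so the code compares the two instruction sequences rather than assuming specific names.

```python
import dis


def square_with_pow(x):
    return x ** 2


def square_with_mul(x):
    return x * x


def opcodes(func):
    """Return the (opcode name, argument text) pairs CPython compiled for func."""
    return [(ins.opname, ins.argrepr) for ins in dis.get_instructions(func)]


if __name__ == "__main__":
    # Print both listings side by side; the power and multiply forms are
    # compiled as written, one is not rewritten into the other.
    dis.dis(square_with_pow)
    dis.dis(square_with_mul)
```

What happens inside the power operation at run time is a separate question that the bytecode listing does not answer; timing is still the way to find out.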

41:13

Then you keep squaring and multiplying, with either the thing you just squared or with the original value again, to get you up in log2 steps, which is all cool. But if you were to generally apply that all the time, um, it probably takes more time to work out the best way to do that than it would to just have done it the long way. So it's only when you have a compiler that can say, well look, I'm gonna do this once, now, right? You, poor program writer, uh, you get to suffer this time while I compile. But then everyone at run time benefits from the fact that I saw that this was actually this particular kind of multiplication, and in fact I can replace your multiplication with shifts and adds or whatever; you know, there is all that kind of stuff. So maybe there is a bit of, um, bias in my recent experience because of it being an interpreted language, which obviously has its own trade-offs. And one of 'em is that you can write it really quickly, and another one is that maybe you can, um, make a lot of cups of tea while it's doing its work.
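The square-and-multiply approach described here is usually called exponentiation by squaring. A minimal sketch for non-negative integer exponents follows; the function name is made up for illustration, and this is not claimed to be what any particular library does internally.

```python
def pow_by_squaring(base, exponent):
    """Compute base ** exponent using roughly log2(exponent) squarings.

    Works for any `base` that supports multiplication; `exponent` must be a
    non-negative integer.
    """
    if exponent < 0:
        raise ValueError("exponent must be non-negative")
    result = 1
    square = base  # holds base ** (2 ** k) for the current bit position k
    while exponent > 0:
        if exponent & 1:          # this bit is set: fold the current square in
            result *= square
        exponent >>= 1            # move to the next bit of the exponent
        if exponent:              # skip the final, unneeded squaring
            square *= square
    return result
```

For a fixed small exponent such as 2, the loop and bit tests cost more than the single multiply they replace, which matches the point that this only pays off when the decision can be made once, ahead of time.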

Ben Rady

42:11

Right, right. Saving, saving on, uh, the writing time at the expense of the running time and the coffee and the tea making time.

Matt Godbolt

42:20

Often that's the right trade off. Right? Yeah. You know, that's, I mean, certainly if you wanna write a little command line tool, then you wanna be writing something which is quick and easy and not necessarily hard to write. So there's always trade offs.

Ben Rady

42:32

Yeah. All right. Well, this has been a really fun adventure from, you know, a third of a Grace Hopper wire all the way up to, uh, rebooting your computer taking 32 millennia.

Matt Godbolt

43:03

Absolutely. And I'm gonna leave on a, uh, tweet that I saw from a friend, or not a tweet. It was a conference talk that he did. And halfway through he said, the first rule of profiling is that you are wrong. And I think that that's the intuition that everyone should take away from this: you're always wrong.

Ben Rady

43:19

Yes. Uhhuh, start with that. And you, uh,

Matt Godbolt

43:28

Cool. All right, my friend. Well,

Ben Rady

43:29

Right.

Matt Godbolt

43:30

Until next time.

Ben Rady

43:31

Until next time.

Transcript source: Provided by creator in RSS feed
