605: Goodbye World

⁠¶ Intro

Clips

00:00

Enhanced BPF is another hotness of Linux for the last couple of years. And when the patches were first added to Linux, the lead developer, Alexei Starovoitov, said this allows you to do crazy things. This is normally the words you don't tell Linus when you want him to accept patches into the kernel, but fortunately the patches were accepted. Enhanced BPF puts a virtual machine in the kernel that we can program from user space.

Chris

00:36

Hello, friends, and welcome back to your weekly Linux talk show. My name is Chris.

Wes

00:41

My name is Wes.

Brent

00:42

And my name is Brent.

Chris

00:44

Well, hello, gentlemen. Today, we're digging into a superpower that is inside all of our Linux kernels. We're going to talk about how eBPF works and how anyone can take advantage of it. Then we're going to round out the show with some great feedback, some picks, and a lot more. It is a special out-of-time episode. As you listen to this right now, we are at Planet Nix and SCaLE. So this is an episode in between.

01:08

You're going to get all our Planet Nix and SCaLE coverage coming soon, but we wanted to take a moment in between episodes and do something kind of fun and really dig in and get technical. But first, I want to say a big good morning to our friends at Tailscale. Tailscale.com slash unplugged. They are the easiest way to connect your devices and services to each other wherever they are. It is modern networking. It's a flat mesh network that is protected by...

Wes

01:35

WireGuard.

Chris

01:36

That's right. And it is so fast, so quick to get going, and it gives you superpowers. Not only do you get a flat mesh network across complex networks, so maybe you have multiple data centers or VPSs, or you've got mobile devices, or you've got double carrier grade NAT, it'll smooth all of that out. That's fantastic. But then there's also a whole suite of tools that make it really convenient to use.

01:59

Sort of like AirDrop for your entire Tailnet, including like your Android device and your Linux device, so you can send files around. They will manage your SSH keys through your Tailnet for you, so you can just log into all your individual devices. You don't have to manually copy keys around like an animal.

⁠¶ Not an Instagram Filter

02:12

And they also offer more advanced features, so you can set up ACLs and really manage the system and lock down only certain things to certain people. And when you try it now, when you go to tailscale.com slash unplugged, you get it for free up to 100 devices and three users. No credit card required. I mean, you can really cook with 100 devices, and then maybe you'll discover it's great to bring to work too.

02:33

Thousands of companies like Instacart, Hugging Face, Duolingo, Jupiter Broadcasting, and many others use Tailscale, and we love it. Try one for yourself. Go get yourself a little Tailscale right now. You're going to love the way it tastes, and you're going to love how easy it is to get going. If you've got five minutes, you'll probably get it running on three devices. I have no inbound ports on any of my firewalls. Tailscale.com slash unplugged.

02:58

Well, like I mentioned, we are at Planet Nix and SCaLE right now. But we did think it was sort of a perfect out-of-time episode because we really first got excited about eBPF at SCaLE back in 2019?

Wes

03:15

Yeah, a million years ago.

Chris

03:17

And it was a great presentation that really just brought home to us how powerful this was going to be.

Wes

03:25

Yeah, you heard from the man himself, Brendan Gregg: observability, performance, tracing guru. eBPF was still kind of new then. He's kind of well-known. You've maybe seen his famous video online using DTrace to show how hard drives don't like it when you yell at them. So, you know, he has deep insight into this area and, even in 2019 and earlier, was already getting excited about eBPF.

03:49

So it's kind of neat to look back now, as a whole giant marketplace of eBPF-based observability tools now exists.

Chris

03:57

And the name is sort of a misnomer, right? Because it sounds like a packet filter. And so you think, well, how? Okay, what is this, a firewall, guys? No, no, no. That's why we wanted to play that intro clip. It is so much more. It is really a VM inside the kernel that can run simple code that you can create and craft, and that is protected.

04:15

And so since this is a prerecord, we're not going to have your boosts this week, but we do want to know if you like this type of deep dive into this particular topic. So boost those in and we'll bank them for when we come back. But let's talk about the extended Berkeley Packet Filter.

Wes

04:28

Yeah. So it does do some networking stuff still to this day. But it did, as you say, start out as a packet filter, introduced in 1992 to efficiently filter network packets in BSD operating systems.

Chris

04:42

pfSense and other firewall products like OPNsense, you know, they've been using BPF as part of their core product. That's one of the reasons I was an early pfSense user.

Wes

04:51

And, you know, before too long, within the same decade, it made its way over to Linux in the form of tcpdump. And already you're seeing this thing, right, where you can kind of use user space to help better observe what's going on on your system. And so if you've ever written an expression to filter things or look at packets using tcpdump, well, you're using a language that then gets compiled down to BPF bytecode and executed.

Chris

05:14

That bytecode right there is kind of the magic, right? Because it turns out that this thing is essentially capable of running this bytecode. So it's not just a packet filter.

Wes

05:22

Yeah, and that's how the implementation works. As it started out, it was a very simple virtual machine. And don't think virtual machine like QEMU, necessarily, or like simulating a full computer.

05:32

The point is it's a very limited, restricted bytecode that can only do certain things relevant to, at first, filtering packets. And that lets you make sure it's not going to do anything crazy: it can't go into infinite loops, all kinds of other nice things. You can optimize it and then load it in, and when you run the program, it supplies the packet, and anything else that you need, as input. Then the machine executes, and ultimately that's how you tell, like,

05:56

do I accept the packet or do I drop the packet? But at first it was, you know, very limited. I think it had, like, two registers to use. Super limited thing. But eBPF, extended BPF, was introduced in Linux 3.18. This was, like, 2014. So BPF had been around for a while, and there'd been various developments, but eBPF really kicked things off on the Linux side of things. BPF hadn't, you know, caught up with the times in some ways. It was still 32-bit. It had some of those register limitations.

06:27

So they upgraded to 64-bit registers and added more instructions. They added the verifier to the kernel, which is a big part of it: it analyzes BPF programs to make sure they're safe, with no infinite loops and no invalid memory access. There's a checker process, right, because you're loading something from user space into the kernel, and that's a big security concern. So you want to make sure that you have that. And beyond security, operationally too, right? You

06:51

don't want it to be able to crash the kernel.
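A minimal sketch of the kind of restricted machine Wes is describing, written in Python purely for illustration. The instruction tuples and the `verify` and `run` helpers are assumptions made up for this sketch, not kernel APIs. The four-instruction program mirrors what `tcpdump -d ip` prints for an "IPv4 only" filter, and the sketch uses only the accumulator, not the second (index) register of classic BPF.

```python
from typing import List, Tuple

# Toy instruction forms (hypothetical encoding, modelled on `tcpdump -d` output):
#   ("ldh", offset)         load a 16-bit big-endian half-word from the packet into A
#   ("jeq", value, jt, jf)  if A == value go to instruction jt, otherwise go to jf
#   ("ret", n)              stop; n is the number of bytes to accept (0 means drop)
Insn = Tuple

# Equivalent of the bytecode `tcpdump -d ip` prints: accept IPv4, drop the rest.
IPV4_FILTER: List[Insn] = [
    ("ldh", 12),            # EtherType lives at byte offset 12 of an Ethernet frame
    ("jeq", 0x0800, 2, 3),  # 0x0800 is IPv4
    ("ret", 262144),        # accept
    ("ret", 0),             # drop
]


def verify(prog: List[Insn]) -> None:
    """Reject programs that could loop forever or run off the end."""
    if not prog or prog[-1][0] != "ret":
        raise ValueError("program must end with ret")
    for pc, insn in enumerate(prog):
        op = insn[0]
        if op == "jeq":
            for target in insn[2:]:
                # Only forward jumps are allowed, so execution must terminate.
                if not pc < target < len(prog):
                    raise ValueError(f"bad jump target at instruction {pc}")
        elif op not in ("ldh", "ret"):
            raise ValueError(f"unknown opcode at instruction {pc}")


def run(prog: List[Insn], packet: bytes) -> int:
    """Execute a verified program against one packet and return its verdict."""
    verify(prog)
    a = 0   # the accumulator register
    pc = 0  # program counter; it only ever moves forward
    while True:
        insn = prog[pc]
        op = insn[0]
        if op == "ldh":
            offset = insn[1]
            if offset + 2 > len(packet):
                return 0  # an out-of-bounds load rejects the packet
            a = int.from_bytes(packet[offset:offset + 2], "big")
            pc += 1
        elif op == "jeq":
            pc = insn[2] if a == insn[1] else insn[3]
        else:  # "ret"
            return insn[1]
```

Calling `run(IPV4_FILTER, frame)` on a raw Ethernet frame returns a non-zero byte count for IPv4 traffic and 0 for everything else. The forward-only jump rule in `verify` is the simplest version of the "no infinite loops" guarantee; the real eBPF verifier does far more than this.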

Chris

06:53

We actually started to see this stuff land, you know, progressively in Linux 3.18 and beyond. So this is actually, like you said, 2014. This has been landing for a while.

Wes

07:02

Yeah, definitely. And so that was kind of the raw stuff, and then 2018 and beyond, people started adding more tools, like BCC, the BPF Compiler Collection, and bpftrace, which we'll talk about too. The company Cilium and their products have a bunch of eBPF stuff for Kubernetes offerings. Plus we got better compiler support. There's something called CO-RE, or Compile Once, Run Everywhere, so compilers have better support

07:31

so that you can compile your BPF program and not have to worry as much, depending on what you're doing with it, about how compatible it'll be between the kernel where you compiled it and the kernel where you're running it.

Chris

07:41

The idea being like it should be compatible across multiple versions of Linux as long as they have this correct implementation.

Wes

07:48

Right. Okay. This has been so successful that newer versions of DTrace on Linux are basically just some extra user space stuff that uses eBPF and other kernel primitives under the hood.

Chris

08:03

Really? Yeah. Huh.

Wes

08:05

And eBPF was also ported to Windows.

Chris

08:07

Yeah, I heard about this. They're getting a lot of good stuff over there.

Wes

08:11

One important thing with the extended part is they were also able to make the instruction set more sympathetic to modern hardware, and they implemented just-in-time compilation as well. So that meant eBPF can be really fast. They've also made it so that there are relatively stable APIs, like we've kind of been talking about, as kernels change. You can also use eBPF to hook into internals. If you do that, there are no guarantees, right? But that's kind of on the tin.

08:37

But there are some stable interfaces that you can use, which is nice. So the other important point to know is there are various things it can do. There's XDP, which is eXpress Data Path, which intercepts packets basically at the earliest point that you can.

08:54

You have limitations on what you can do, but it can be super low level and performant. And so you can see this sometimes in, like, responding to DDoS attacks, where you're getting flooded with traffic that, you know, maybe you can identify in various ways that you can program in here. So you could essentially

Chris

09:10

Ideally catch it before it begins to overwhelm the system because you're catching it much further in like the driver stack.

Wes

09:16

Yeah, that's the idea. And it doesn't do as much as the very general, powerful, full-featured Linux kernel networking stack. You can kind of just be like, well, no, if it looks like this at all, just cut it off immediately.

Chris

09:26

Yeah, if I'm getting DDoSed, I don't want the network stack trying to figure out what to do with all this traffic because that's what's going to take me down.

Wes

09:31

Yeah, one way to say it is limited context but maximum performance.

Chris

09:35

I like that.

Wes

09:37

Okay, so then you can also do various types of kprobes.

Chris

09:40

I'm sorry?

Wes

09:40

Yeah, kprobes. And this is dynamic instrumentation. That's why it sounds quite probing. You can hook almost any kernel function, but there's no clear definition of what you're going to get. You have to go look at the function you're hooking into; it's all going to be dependent on that. But, you know, kernel internals, any kernel function, almost. I'm sure there are some limitations, but that's a lot of them.

Chris

10:03

I mean, that could be, you know, anything from keyboard input to network traffic to disk I/O. There are all kinds of things.

Wes

10:10

So that's where a lot of the power comes from, yeah. But that may or may not be stable. There's no guarantee about it being stable across kernel versions; it can change at any time. People can update the signatures. That's one of the things that's been happening in the Rust discussion: many kernel developers expect to be able to make a change like that, and because they have one big code base, they can update it everywhere and be able to do

10:29

a refactoring like that. But the other one is tracepoints. This is an important one. Tracepoints: predefined, stable instrumentation.

Chris

10:39

Ah.

Wes

10:40

Low overhead, but only available where explicitly added by kernel developers.

Chris

10:45

So, okay. So a tracepoint would be like a spot you hook in and start getting metrics or information out of. But the developer of the subsystem has to explicitly support that. Yeah.

Wes

10:56

You have to add that in, versus totally dynamic with a kprobe.

Chris

10:58

Okay.

Wes

10:59

But the upside is you get structured data specific to each tracepoint, right? So the tracepoint tells you what it is.

Chris

11:05

Yeah, yeah, okay.

Wes

11:06

And they're maintained across kernel versions, so you can kind of rely on them for more long-term use.
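To make the kprobe versus tracepoint distinction concrete, here is a hedged sketch using the BCC Python bindings that come up later in the episode. The function names `on_clone`, `on_exec`, and `trace` are made up for this example, and the choice of the clone syscall and the `sched:sched_process_exec` tracepoint is only illustrative. It assumes BCC is installed and that `trace()` is called as root.

```python
# The C part that BCC compiles to BPF bytecode. Both handlers just log a line.
PROGRAM = r"""
// Dynamic hook: runs on entry to whatever kernel function we attach it to.
int on_clone(void *ctx) {
    bpf_trace_printk("kprobe: clone called\n");
    return 0;
}

// Static hook: runs when a tracepoint the kernel developers defined fires.
int on_exec(void *ctx) {
    bpf_trace_printk("tracepoint: process exec\n");
    return 0;
}
"""


def trace() -> None:
    """Attach one kprobe and one tracepoint, then stream the kernel trace log."""
    # Imported here so the module can be read without BCC installed.
    from bcc import BPF

    b = BPF(text=PROGRAM)

    # kprobe: the kernel function name differs per architecture and version,
    # which is exactly the instability discussed above. BCC resolves it for us.
    b.attach_kprobe(event=b.get_syscall_fnname("clone"), fn_name="on_clone")

    # tracepoint: addressed by a fixed "category:name" that is kept stable.
    b.attach_tracepoint(tp="sched:sched_process_exec", fn_name="on_exec")

    # Print bpf_trace_printk output until interrupted with Ctrl-C.
    b.trace_print()
```

Nothing runs until `trace()` is called. The kprobe attaches to a function name that has to be looked up for the running kernel, while the tracepoint is addressed by a name that is meant to stay the same across versions.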

Chris

11:10

I have a feeling this might be relevant later. Okay, good to know.

Wes

11:14

Yeah, I think that's probably like the quick, high-level version of what eBPF is and kind of what it can do.

Chris

11:19

And it is so, so, so simple, but yet so powerful is what I love about it. And we have a couple of examples in the show, and I think some hands-on stuff that people could take away. And maybe we should start with BCC tools themselves.

Wes

11:36

Yeah, because we've both had a chance to play with those.

Chris

11:39

Yeah, I was, you know, I was joking around with Wes, like, it'd be kind of great to know where in the system, when I open up this directory that's a FUSE-mounted directory, the actual delay is happening and what part of the system is sitting around waiting. And through this process of trying to get to that, I came across tools that let me look at my disk I/O or let me look at the network traffic, and really, kind of using these different tools, putting them together,

12:06

you can actually start to get a really good picture of where the delay was happening in the system. And, you know, for me, it was like, this is going to be amazing. I had discovered by running it on my desktop, just when I was experimenting with this stuff, that, oh, yeah, I've still got this Errands app that we talked about on the show that I'm not currently using, like, today. But I guess when I close it, it doesn't actually close. It doesn't leave an icon.

12:31

I had no idea it was running. But using these tools, I started seeing this application that was hitting my disk every so often. And I'm like, I don't recognize this. And I discovered I actually had that running the entire time. And it was kind of useful.

Wes

12:44

Yeah, it kind of skips through a lot of the boundaries and other limitations that can sometimes pop up when you're trying to look at your system. So it can be surprisingly insightful. They do have a nice tutorial, which we'll link. They also, I like that they have a set of things to run first. Like, before you go to these tools, make sure you check all the regular system monitoring tools first, because, as we'll see as a theme, you know, your htops and btops and all kinds of things

13:10

kind of get a broad look, and you can see some specifics, whereas you can make broad eBPF programs, but by default they're going to be a lot more specific, where you're looking at, like, one thing.

Chris

13:21

Like disk.

Wes

13:21

Latency or something specific to the file system.

Chris

13:23

Or networking. Yeah, that's a good point. All right, so we could talk about some of the commands we tried. I know that filetop and tcptop we played around with. I did not get a chance to play around with ext4slower and xfsslower, but this is an interesting way you can use it too.

Wes

13:41

Uh-huh, yeah. It'll tell you when, you know, there are things happening in your file system that are causing latency above a threshold you can pass on the command line. Or they've got zfsdist, which traces ZFS reads, writes, opens, and fsyncs, and then summarizes the latency of all that in a histogram for you.

Chris

14:00

That's really cool.

Wes

14:01

Yeah, right? And then two really useful basic ones are execsnoop and opensnoop.

Chris

14:08

Yeah, opensnoop, isn't that kind of like, oh, there's like a Mac app that monitors traffic? Is that what it was? Opensnoop something else?

Wes

14:16

Yeah, no. So opensnoop, you're thinking of, there's a Linux one, I think, right? OpenSnitch? Yeah, just one thing.

Chris

14:22

Snitch. Snitch. That's, yes.

Wes

14:24

opensnoop tracks file opens.

Chris

14:26

Okay.

Wes

14:27

So you can run this and it hooks into the kernel via BPF. And then anything that's running on your system that opens files, it'll just print out in your console right in front of you.

Chris

14:36

I did play with this one. I did play with this opensnoop one. You're right. Yeah, that's really cool, because I remember you could launch particular things and then just watch all the crap that happens on your system.

Wes

14:44

Right. And especially, you know, you can filter it, you can have it look at specific processes, or you could, you know, grep the output or whatever. So you can filter on a busy system, but I think it's especially useful on a system you think should be, you know, relatively quiet, just as a way to see, like, what is actually happening in the background? And for short-lived processes that might be hard to see in a summary-type program but are still doing things, it'll still print in opensnoop.

15:10

That's where execsnoop does a similar thing, but for process execution. So again, for things where it's hard to see in a general tool, but you really want to see at a nitty-gritty level what is being exec'd on this box, execsnoop is handy for that.
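If you want to post-process what opensnoop prints rather than eyeball it, a small parser is enough. This is a sketch that assumes the default column layout `PID COMM FD ERR PATH`; the `OpenEvent` type and both helper functions are made up for this example, and the `opensnoop-bpfcc` binary name is how Ubuntu's bpfcc-tools package installs the tool, so treat it as an assumption on other distros.

```python
from typing import List, NamedTuple, Optional


class OpenEvent(NamedTuple):
    pid: int
    comm: str
    fd: int
    err: int
    path: str


def parse_line(line: str) -> Optional[OpenEvent]:
    """Parse one line of default opensnoop output; return None for headers or junk.

    Limitation: a process name (COMM) containing spaces would be split wrongly.
    """
    # maxsplit=4 keeps paths that contain spaces in one piece.
    parts = line.split(None, 4)
    if len(parts) != 5 or not parts[0].isdigit():
        return None
    pid, comm, fd, err, path = parts
    try:
        return OpenEvent(int(pid), comm, int(fd), int(err), path.rstrip("\n"))
    except ValueError:
        return None


def opensnoop_command(pid: int, binary: str = "opensnoop-bpfcc") -> List[str]:
    """Build the argv for watching a single process with opensnoop's -p filter."""
    return ["sudo", binary, "-p", str(pid)]
```

The argv from `opensnoop_command` can be handed to `subprocess.Popen`, with each output line fed through `parse_line` to filter on paths or error codes.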

Chris

15:25

I have a sense execsnoop has come up actually in real life for you as a handy tool.

Wes

15:31

Yeah.

Chris

15:31

Do you want to share a war story with us?

Wes

15:32

I do. So a few years ago, I was adminning a VPS box that I wasn't totally in charge of, but had some, you know, administration over. And things were seemingly broken. I'd noticed some messed up statistics first, right? The metrics were a little odd. And I went on and I started just kind of doing regular updates and poking around at the system. And I noticed that I wasn't able to actually update the initramfs, right? It was an Ubuntu box. I had everything going.

16:01

But there was some like dynamic library linking problem that was happening after I started digging into the output and tried rebuilding it way too many times. And I'm like, what, why? Why can't I get it?

Chris

16:13

Yeah, well, I can't get my initramfs updated.

Wes

16:15

The kernel's here. Everything else in the update's fine.

Chris

16:18

But you can't boot in that new kernel if you can't get that updated.

Wes

16:20

Yeah. So this was maybe closer to when I really started playing with these tools a little bit more seriously and having them installed on more boxes. And so they were already available. And I used opensnoop because I wanted to see exactly what was being opened and touched by the update-initramfs process.

Chris

16:39

Right.

Wes

16:40

And then that led me to look at some weird file paths. And then I ran execsnoop, and that showed similar file paths running on the system.

Chris

16:51

Oh, uh-oh.

Wes

16:53

And then I was able to figure out that there was, in fact, a crypto miner running on the box.

Brent

16:57

Oh, no.

Wes

16:58

But it had loaded a custom kernel module to hide itself from tools like ps and top and htop.

Brent

17:05

That's pretty slick.

Wes

17:06

But not from execsnoop.

Chris

17:08

Clever. So that's why it was breaking when you were trying to update the initramfs.

Wes

17:12

Yes. Something it did to the system. I never quite figured out exactly what it had changed, but it had messed with some of the files on the system in ways that meant that the linking was no longer cleanly happening.

Chris

17:21

Yeah. So that's a good way to stack those two tools. You know, opensnoop and execsnoop are really a couple of nice tools you can stack together.

Wes

17:31

And there's a whole bunch of the, this is part of BCC, the BPF compiler collection, which includes a whole bunch of other stuff. But these are the tools that they ship by default, which leverages the framework they've built to implement these via eBPF. And so there's like, yeah, all kinds of stuff, file system specific stuff. They've got things for, you know, looking at network connections, TCP connections. They've got stuff for monitoring databases. It's broad.

Chris

17:55

I don't know if it's necessarily best practice, but you could say that with these tools, you could be fairly confident that you had cleaned up the crypto miner and the infection. You know, because you can actually watch at a much more intimate level what's happening at the system.

Wes

18:10

Definitely useful. Yeah.

Chris

18:12

I'm not saying it's probably a best practice, you know, but probably just wipe the box. Yeah, but it is sort of nice if for some reason you don't have that option, you can use these tools to kind of confirm that stuff isn't coming back after a reboot.

Wes

18:23

Right. And, you know, depending on the situation, you can see exactly where they're hooking and modify them, because a lot of this is implemented via a combo of C, because that's the part that gets compiled, and it's a limited subset of C that does the BPF stuff and gets converted into a loadable thing for the kernel,

18:40

but there's a bunch of Python utilities around it to wrap it. So you could fork that, copy it, modify it if you need even more, or to try to observe specific things once you've identified a particular problem or threat.

Chris

18:54

That's really going deep. 1password.com slash unplugged. That's the number 1, password.com slash unplugged, all lowercase. And you're going to want to go there because this is something that, if I'd had it when I still worked in IT, I think it would have sustained me for many, many more years. The reality is your end users don't, and I mean without exception, work on only company-owned devices, applications, and services.

19:22

Maybe you get lucky and they mostly do, but I don't find that to be the reality today. And so the next question becomes, how do you actually keep your company's data safe when it's sitting on all of these unmanaged apps and devices? Well, that's where 1Password has an answer. It's extended access management. 1Password extended access management helps you secure every sign-on for every app on every device because it solves the problems that your traditional IAMs and MDMs just can't touch.

19:52

And it's the first security solution that brings unmanaged devices and applications and identities, like even vendor, under your control. It ensures that every credential is strong and protected, every device is known and healthy, and every app is visible. This is some powerful stuff, and it's available for companies that use Okta or Microsoft Entra, and it's in beta for Google Workspace customers too.

20:17

1Password changed the game for password management, and now they're taking everything they've learned there and expanding it to the login level and the application level. You know what a difference it makes when people have proper password management. Now let's have proper login management and authorization. 1Password also has regular third-party audits and the industry's largest bug bounty. They exceed the standards set by others.

20:40

They really do. So go secure every app, every device, and every identity. Even the unmanaged ones. You just go to 1password.com slash unplugged. All lowercase. That's 1password.com.

Brent

20:52

Slash unplugged. Now, Wes, it sounds like BCC is an abstraction over eBPF. What's going on under the hood here?

Wes

21:02

Yeah, so there are tools that you can just run, like we've been talking about, but how did those tools come to be? Well, that's where some of those abstractions and a framework come in that allow basically embedding C code directly within Python scripts. And BCC's tools also sort of handle making sure you've got the whole toolchain available. So it uses LLVM and Clang to compile things. It handles verification of stuff.

21:28

It handles loading it into the kernel for you and attaching it to the right hooks, basically through like Python method calls instead of you having to run the right commands in the bash shell. So you kind of get like a unified approach where you can write kernel level programs without necessarily having to deal with like all the other stuff.

Chris

21:44

You know, I was thinking, gosh, this really seems like going way beyond what my skill set would be able to manage. But then I thought, Python code, I bet an LLM would get me 90% of the way there these days, and then I could probably just finish it off.

Wes

21:58

And there's a fair amount of existing tools that you can either feed into an LLM and ask about, or modify, or try and, you know, hack on yourself. And so here's a simple example that we can play with. It uses XDP. There's a simple C program that has a function called xdp_drop_all, and that's going to attach to the XDP hook. And so we get a little data summary of the packet info, which we're not going to care about.

22:22

All we're going to do is return XDP_DROP, which is a magic value that tells the kernel, hey, just drop this.

Chris

22:29

Just drop all.

Wes

22:29

So it's going to drop everything. And then in the Python, that's it C-wise. That's all we do. And then in the Python world, there's a little setup to make, hey, I want a new BPF thing or whatever. And then we tell it to load in our little blob of C. And then we attach it to the interface we want. That's two lines of Python. And then it prints a nice little message and starts working.
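A sketch of what that script might look like with the BCC Python bindings. This is a reconstruction from the description, not the exact script used on the show. The C function name `xdp_drop_all` comes from the episode; the Python function `drop_all` and its `device` parameter are made up for this example. It assumes BCC is installed and the function is called as root.

```python
import time

# The entire C side: ignore the packet and tell the kernel to drop it.
PROGRAM = r"""
#include <uapi/linux/bpf.h>

int xdp_drop_all(struct xdp_md *ctx) {
    return XDP_DROP;
}
"""


def drop_all(device: str) -> None:
    """Drop every packet arriving on `device` until interrupted with Ctrl-C."""
    # Imported here so the module can be read without BCC installed.
    from bcc import BPF

    b = BPF(text=PROGRAM)                      # compile the C with Clang/LLVM
    fn = b.load_func("xdp_drop_all", BPF.XDP)  # verify and load as an XDP program
    b.attach_xdp(device, fn, 0)                # hook it onto the interface

    print(f"Dropping all packets on {device}; press Ctrl-C to stop")
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        pass
    finally:
        # Detach so traffic resumes, as in the demo when Wes hits Ctrl-C.
        b.remove_xdp(device, 0)
```

Calling `drop_all("eth0")` (the interface name is a placeholder) will also cut off any SSH session that arrives over that interface, so have console access ready before trying it.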

Chris

22:49

And what it should do is drop all network traffic?

Wes

22:53

Yeah, so we specified a specific interface.

Chris

22:56

Yeah, so you say on this interface, just drop everything.

Wes

22:58

And that's because we attached it to that specific interface.

Chris

23:01

And this is like a quick kill switch. So what we're going to do as a test here is I've set up a ping with an audible sound so we can see when we're getting a result. So we can see we're pinging the box right now. And then at some point, Wes is going to kill. He's actually going to run that kill script, and it's going to drop all traffic. and we'll hear it drop off. Whenever you're ready, Mr. Payne.

Wes

23:20

Three, two, one.

Chris

23:24

There it goes. Yeah, it takes almost no time at all. It happens almost immediately.

Wes

23:30

Now, and of course, in my regular SSH session, I can't stop it, but I do have a sneaky console here, so we should be able to take a peek.

Brent

23:39

Wes always has a sneaky console.

Chris

23:41

All right, I'm leaving the ping going, so if it resumes.

Wes

23:43

We should hear it. I'm hitting Control-C now.

Chris

23:45

Okay, out there it goes. Look at that.

Wes

23:49

Yeah. So like that wasn't that much work. This is just an Ubuntu 24.04 box. So like, you know, all the stuff you needed was in the repos already.

Chris

23:58

I'm thinking, you know, like that's just a it's it's not a total practical example, but it's a quick example of the power you have there and you're executing that from user space.

Wes

24:08

Yeah, you do need, you know, root permissions to be able to load it into the kernel.

Chris

24:12

But then you're executing things inside the kernel that just immediately cutting off traffic to that interface.

Wes

24:17

Yes. And I didn't have to build a per-kernel kernel module with the right headers and, like, have to worry about messing up my implementation in a way that's going to crash the box.

Chris

24:26

You didn't even have to create, like, an iptables rule that rejects traffic, because that would actually be much further down in the processing stack if you're using something like iptables. Yeah, that's cool. That's fun.

Wes

24:37

So in reality, you'd want, right, you would do some sort of filtering where you would look at the data structure you get from XDP to figure out like, oh, is this one I want to block or let it keep going through processing? But yeah, this is your basic. It's not hello world. It's goodbye world.
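As a rough illustration of that filtering step, here is a user-space Python model of the decision a real XDP program would make in C. The `decide` function and the "drop ICMP over IPv4, pass everything else" policy are assumptions chosen to match the ping demo; the two return values are the kernel's `XDP_DROP` and `XDP_PASS` constants.

```python
import struct

# Verdict values from the kernel's enum xdp_action.
XDP_DROP = 1
XDP_PASS = 2

ETH_HEADER_LEN = 14      # destination MAC (6) + source MAC (6) + EtherType (2)
IPV4_MIN_HEADER_LEN = 20
ETH_P_IP = 0x0800        # EtherType for IPv4
IPPROTO_ICMP = 1


def decide(frame: bytes) -> int:
    """Return XDP_DROP for ICMP-over-IPv4 frames and XDP_PASS for anything else."""
    # Bounds checks come first. In real XDP C the verifier insists on the
    # equivalent comparison against data_end before any packet read.
    if len(frame) < ETH_HEADER_LEN:
        return XDP_PASS

    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETH_P_IP:
        return XDP_PASS

    if len(frame) < ETH_HEADER_LEN + IPV4_MIN_HEADER_LEN:
        return XDP_PASS

    # The protocol field is the tenth byte (offset 9) of the IPv4 header.
    protocol = frame[ETH_HEADER_LEN + 9]
    return XDP_DROP if protocol == IPPROTO_ICMP else XDP_PASS
```

With a policy like this attached instead of drop-all, the ping from the demo would still go quiet while the SSH session stayed up.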

Chris

24:51

Yeah. It's goodbye world. Exactly.

⁠¶ Following the Breadcrumbs

Wes

24:52

But the main point to show that was that this BCC framework is one way if you want to start developing custom tools. But you can get even more ad hoc than that.

Chris

25:01

All right.

Wes

25:02

Because there's bpftrace.

Chris

25:03

Yeah. I want to talk about this.

Wes

25:05

And this is basically sort of the closest thing to DTrace for Linux if you don't want to use the Oracle tool, which I believe Gentoo now has.

Chris

25:13

Oh, all right. So bpftrace is sort of like what I know of DTrace, this ultimate tool when you're debugging or trying to figure out where your system has gone sideways. Again, like I was talking about, like, why is this one FUSE directory taking so long to open? And so I imagine this is a similar type of...

Wes

25:28

Yeah, one way to think about it is you basically get a nice, targeted little scripting language to write tracing programs that use the tracepoints in the kernel.

Chris

25:39

Wow okay all right.

Wes

25:41

And there are some other options. There's a program called ply that I haven't tried, but it also looks nice. But bpftrace is quite popular. So, we talked about it, right: there's XDP, there's the kprobe stuff, and then there are the tracepoints, which are the easy, stable ones, because they're defined and they give you a known data structure. So you can do bpftrace -l, or the BCC tools have tplist, which also just lists all the tracepoints. It also shows you can do other

26:06

user space stuff, but we're not going to talk about that today.

Chris

26:09

So tplist, I guess that makes it a lot easier, if tplist is going to give you all of the kernel tracepoints, so you know what you have to work with. That's really useful.

Wes

26:17

Right. And it tells you the shape of them, too. You get the sort of data structure.

Chris

26:21

Okay.

Wes

26:21

Um... Let me pull it up here.

Chris

26:26

He's got it there on the machine.

Wes

26:28

Yeah, so here's some stuff about block_dirty_buffer. And it tells you that you get a dev device, you get a sector and a size. And so you know those are the things you can work with. And then it has a name, and you just tell it that you want to trace that thing. So this came up because, you know, we talked a little bit about BCC having file-system-specific tools. Like, oh, I want to watch for slow operations from Btrfs, say, right?

Chris

26:51

Right.

Wes

26:52

Well, they haven't yet. I suppose I should step up, or someone should step up and add these, but there isn't yet a bcachefs version of these tools, right?

Chris

27:00

There we go. That'd be cool.

Wes

27:01

Obviously, probably what you want to do is just, you know, fork the existing Btrfs one and modify it and make it work right for bcachefs.

Chris

27:08

Sure.

Wes

27:08

But if you want to be ad hoc about it, bpftrace can handle this too. So I ran tplist and then grepped that for things that said bcachefs.
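The same list-and-grep step can be scripted. This sketch goes through `bpftrace -l` rather than tplist, since both list tracepoints; the two helper functions are made up for this example, and it assumes bpftrace is installed and that you can run it with sudo.

```python
import subprocess
from typing import Iterable, List


def tracepoints_for(lines: Iterable[str], subsystem: str) -> List[str]:
    """Keep only `tracepoint:<subsystem>:<event>` entries from a bpftrace -l listing."""
    prefix = f"tracepoint:{subsystem}:"
    return sorted(
        line.strip() for line in lines if line.strip().startswith(prefix)
    )


def list_tracepoints(subsystem: str) -> List[str]:
    """Ask bpftrace for every tracepoint, then filter to one subsystem."""
    result = subprocess.run(
        ["sudo", "bpftrace", "-l", "tracepoint:*"],
        capture_output=True,
        text=True,
        check=True,
    )
    return tracepoints_for(result.stdout.splitlines(), subsystem)
```

On a machine with bcachefs support in the kernel, `list_tracepoints("bcachefs")` should return the same kind of list Wes has on screen. Adding `-v` to a `bpftrace -l` query also prints each tracepoint's fields.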

Chris

27:18

So you got all of the kernel interfaces, tracepoints, I should say, regarding bcachefs.

Wes

27:25

So here's just a list of some of what the tracepoints look like.

Chris

27:29

I see. Okay. So now, okay, so what I'm seeing here on Wes's screen is like an output, bcachefs colon, and I can get data update, rebalance, extent. There are a lot of different options here. Essentially, just what is the file system up to?

Wes

27:41

And so these would all be things that Kent or team have put in explicitly so that there's a way to like easily and with low overhead watch these things.

Chris

27:51

So since these trace points already exist, you can use... BCC to write your own kind of monitoring.

Wes

28:00

Yeah, or bpftrace, right.

Chris

28:02

Yeah, thank you.

Wes

28:03

Um, there's a lot of terms. There's a lot of BPF terms. So here's one that stood out: copygc_wait. And bcachefs is a copy-on-write file system, and it does this bucket-based allocation, and so it has what's called a copying garbage collector. So as you're copying files, it'll handle moving things around so that it can then get good bucket allocation and defragment things on the fly, that kind of thing. So copygc_wait is a trace point that tells you when bcachefs is waiting for copy GC

28:33

to complete. So it can be a reason that your file system is being slow or not responding, especially if you're moving or copying files around. So you can see on your screen just a little tiny script: sudo bpftrace -e, and then you pass it a little string. We tell it we want to use the trace point bcachefs:copygc_wait, and then we have a little script here where one of the inputs is the device, and then it has a little stuff that extracts the

29:01

major and minor number from that. You don't have to do that. And then all it does is print what device we're waiting on and how much we're waiting. Yeah, and so it's, you know, I don't know, 10 lines of code.
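The major/minor extraction mentioned here is plain bit arithmetic. A minimal Python sketch follows, assuming the device value uses the kernel-internal encoding (minor number in the low 20 bits, major number above that). The function names and the `wait_amount` parameter are hypothetical stand-ins, not the script shown on the show.

```python
MINORBITS = 20                      # kernel-internal dev_t: low 20 bits hold the minor
MINORMASK = (1 << MINORBITS) - 1    # mask selecting those low 20 bits


def split_dev(dev):
    """Split a kernel-internal device number into (major, minor)."""
    return dev >> MINORBITS, dev & MINORMASK


def format_wait(dev, wait_amount):
    """Build a line like the one the script prints for each event.

    `wait_amount` is a hypothetical field name; its unit is not assumed here.
    """
    major, minor = split_dev(dev)
    return f"dev {major}:{minor} waited {wait_amount}"


# Example: a device with major 259 and minor 3.
print(format_wait((259 << MINORBITS) | 3, 4096))
```

In a bpftrace one-liner the same shift and mask would be applied to the tracepoint's device argument before printing.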

Chris

29:12

And you get a nice kind of structured output. It's easy, human-readable. It tells you the total flushes, the total buffers flushed. You get, you know, the total output, whatever the stat is you're watching.

Wes

29:22

And then, so on my system, I'm just doing it on, you know, one bcachefs disk, so I knew which disk it would be. It was just kind of interesting to see. Like, I went around and I dd'd some big files and copied them around, and I could see, like, oh yeah, right, the operation completes, and then immediately the print tells me that, oh, we waited that much.

Chris

29:38

And the practical use here is, you know, in a RAID array, it might be useful to discover that one of your disks is significantly underperforming the others. It could be indicative of a larger problem. It could be indicative of why you're having performance issues.

Wes

29:51

So it turns out there's actually probably a fair number of situations where single trace points, just on their own, might be something you want to look at, right? So other ones for BcacheFS: write buffer flush. That's an important event. Or journal writes. You might want to know stuff about how the journal works.

30:07

But you can also use the scripting language, and the fact that eBPF supports basic data structures like maps and histograms, to make more complicated combined programs. So I had an LLM. I fed it the output of that tplist stuff, which told it all the available traces and what the structures looked like, so it could write the right code. That's clever. And bpftrace makes it easy to have stuff that runs

30:34

at an interval, right? So you can set up all the traces, and then every five seconds, it can print out a little summary of what it's done. And so each time it traces, it can update a variable, and then it can calculate latencies or deltas.

30:45

So this is a quick one that it did that looks at that write buffer flush trace point, and it samples it every five seconds, and then it tells you how many flushes happened, how many buffers were flushed, how many were skipped, how many were fast path flushes, and then the total amount of data that was flushed. So just like a quick way to get, you know, an accurate little every-five-second printout.
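The accumulate-then-report pattern described here (update counters on each event, print a summary on a timer, then start over) can be sketched in plain Python. This is a model of the idea only, not the generated script: the class name, method names, and field names are hypothetical, and the real tracepoint's fields may differ.

```python
FIELDS = ("flushes", "buffers_flushed", "buffers_skipped", "fast_path", "bytes")


class IntervalSummary:
    """Accumulate per-event counters and hand back one summary per interval."""

    def __init__(self):
        self.totals = dict.fromkeys(FIELDS, 0)

    def record(self, flushed, skipped, fast_path, size):
        """Called once per flush event (the tracepoint handler's job)."""
        self.totals["flushes"] += 1
        self.totals["buffers_flushed"] += flushed
        self.totals["buffers_skipped"] += skipped
        self.totals["fast_path"] += 1 if fast_path else 0
        self.totals["bytes"] += size

    def report_and_reset(self):
        """Called on the timer (every five seconds on the show)."""
        snapshot = self.totals
        self.totals = dict.fromkeys(FIELDS, 0)  # fresh counters for the next cycle
        return snapshot


summary = IntervalSummary()
summary.record(flushed=10, skipped=2, fast_path=True, size=4096)
summary.record(flushed=5, skipped=0, fast_path=False, size=2048)
print(summary.report_and_reset())
```

Resetting after each report is what turns running totals into per-interval figures, so each printout describes only the last window.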

Chris

31:07

And what's great is like, as far as I know, this is really putting no measurable strain on the system.

Wes

31:13

It's one of the more performant ways to do it. Yeah.

Chris

31:14

So you can really get deep insights without some of the overhead you sometimes get by that kind of monitoring.

Wes

31:20

So then to kind of further stress just how far this whole having an AI help me out would go, I had an idea. I did ultimately tone it back a bit, but basically like, what if I wanted to monitor some of the mesh network traffic that was going on? There's a lot of options for that, but can this do this too? And so it built me a little script that you give it an interface name, like tailscale0, say, right?

31:42

And then it'll do a similar thing where every five seconds it'll just look at that interface, and it'll count sent packets, sent bytes, received packets, received bytes, and it'll do a send latency histogram for you on the sent packets. Yeah.

Chris

31:57

So you actually get, I mean, it's not a GUI, but it's a bar graph on the command line.

Wes

32:03

Yeah, right. And so it's the combination of the built-ins in eBPF and then the built-ins in bpftrace, which has some of this stuff to do nice histogram display and stuff from their print command. You know, this is a little bit longer. Some of it's kind of because there's a print statement for each thing we're printing. It's like five different things we're measuring, right? So it takes up some space on the screen. But, like, it's a single page of code here, so it's not crazy

32:28

to start trying to understand it.

Chris

32:29

Right. I've never even seen some of this stuff, but I could understand it. Like, the trace points, they all make sense. It's just real plain English, really. It's really easy. And then you have the print and the time interval.

Wes

32:40

And so under the hood, it is tracing something called net_dev_start_xmit.

Chris

32:45

Okay.

Wes

32:46

So then it filters, it gets data, it filters on interface name from that data, and then it has a little counter it's keeping, so it increments the counter. And then it has a thread ID that you can use, and so it gets the current timestamp in nanoseconds and stores that in a map based on its thread ID.

33:04

And then it also does a trace point for sent packets with net_dev_xmit, and then it grabs the start time, if it exists, for its thread ID, and then it can compute how long it took to send from that. And that's where it can use the histogram stuff to print out a nice little thing, now that it's tracking latency. And then it has another trace point to use, netif_receive_skb, which tracks incoming packets. And then it just has a little bit of code here

33:31

to kind of tie it together every five seconds and do the printout, and then it clears all the counters so that it can do the next cycle.
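The latency bookkeeping walked through above (store a start timestamp keyed by thread ID, look it up when the send completes, and bucket the difference) can be modeled in a few lines of Python. This is a sketch of the logic only, with hypothetical class and method names; in the actual script this work happens inside the kernel via bpftrace maps, and power-of-two bucketing is used here as an assumption about how the histogram groups values.

```python
from collections import Counter


def log2_bucket(value):
    """Return the lower bound of the power-of-two bucket containing value."""
    if value <= 0:
        return 0
    return 1 << (value.bit_length() - 1)


class SendLatencyTracker:
    """Track send latency for one interface, keyed by thread ID."""

    def __init__(self, ifname):
        self.ifname = ifname
        self.start_ns = {}      # thread ID -> timestamp when the send started
        self.sent_packets = 0
        self.hist = Counter()   # bucket lower bound -> number of sends

    def on_start_xmit(self, tid, ifname, now_ns):
        """Handler for the 'transmit is starting' event."""
        if ifname != self.ifname:
            return              # filter on interface name
        self.sent_packets += 1
        self.start_ns[tid] = now_ns

    def on_xmit(self, tid, now_ns):
        """Handler for the 'packet was sent' event."""
        start = self.start_ns.pop(tid, None)
        if start is None:
            return              # no matching start for this thread
        self.hist[log2_bucket(now_ns - start)] += 1


tracker = SendLatencyTracker("tailscale0")
tracker.on_start_xmit(tid=1, ifname="tailscale0", now_ns=1_000)
tracker.on_xmit(tid=1, now_ns=2_500)
print(tracker.sent_packets, dict(tracker.hist))
```

Keying the map by thread ID is what lets the second handler find the start time recorded by the first one for the same in-flight send.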

Chris

33:39

That's a neat little magic, like, superpower, a pocket of power that's in the Linux kernel there for this.

Wes

33:44

Now, it does mean, right, like, you can't use it if you don't have trace points you care about. And so you have to start understanding some kernel internals and, like, what trace points might matter and what they mean. But if you have a specific problem or a mystery on your system that you're trying to look into and you're willing to try to chase down hypotheses using these tools, I think you could get pretty far.

Chris

34:04

Yeah, that's for sure. And then, you know, there are actual complete products out there that are dipping into this stuff. I was reading online that there's actually several Kubernetes products that are tapping into eBPF to do things under the hood. Also, Falco, which is a real-time security monitoring application you can run on Linux. It says, Falco uses eBPF to monitor system calls and network events directly in the kernel, enabling rapid detection of anomalous behavior with low latency.

34:33

This kernel-level approach is faster and less resource-intensive than the user space monitoring tools.

Wes

34:38

There's also user space tracing you can do. So if you set it up right, you can trace JVM events or Ruby things or Python stuff. And so you have products now, too, that will use things like OpenTelemetry or other tracing standards, and you can have a trace then that can have both the kernel side via the eBPF stuff and the user space stuff without necessarily having to do as much custom observability implementation in the code base because they can kind of do more dynamic.

35:07

And you might be able to then see stuff, right, where like, oh, there was a problem on the kernel layer, and then that's why we started seeing increased latency in the application layer. But you can use that. There's a lot of open source products or open core things, and there's just a lot of sort of primitives in one layer above primitives that are probably already on your kernel.

Chris

35:27

I wonder if it's possible people out there listening have already been using eBPF for a while and this is all old news to them. Who didn't tell us? Like, have you used it? In what applications, and what utility has it been for you? I mean, we've gone through a few examples here, but like Wes's story with the VPS and my story with tracking down like a background desktop application, there's just these also very practical day-to-day uses for this stuff.

35:52

that's nothing more than just using some of the pre-existing tools that you just run and tap into this, and you don't have to write any kind of scripting code.

Wes

35:58

Yeah, I mean, it's been around for long enough now that it's well-packaged in most distros.

Brent

36:03

Can you use eBPF to figure out why I have so many tabs open?

Wes

36:08

Well, maybe we can add a trace point to Firefox to keep track at least.

Chris

36:15

Jupiterbroadcasting.com slash river. River is the most trusted spot in the U.S. for individuals and businesses to buy, send, and receive Bitcoin, and they make it easy in three simple steps. Jupiterbroadcasting.com slash river.

Wes

36:30

Hey, I just got set up with them, and it was, in fact, very easy.

⁠¶ Feedback

Chris

36:33

I think their best low-key feature actually has to do with cash. Just as an aside, Bitcoin is dipping as we record, and, oh man, this was the moment. So they have a 3.8 percent interest cash savings account, and they pay the interest out in sats. So you put your cash in there, and it's FDIC insured, 3.8 percent interest, and then when the price dips on Bitcoin, you can use that balance to smash buy. And so you can essentially DCA with that 3.8 percent interest,

37:03

and then you can smash buy when the price dips. It's just brilliant. And River is a really great company. I had a call with their community management person and was really impressed with what they said. And if you're in Canada and you're looking for another trusted source to get sats, bitcoinwell.com slash Jupiter. There you go. That's our tip.

Brent

37:28

Well, since this week we are hoarding our boosts for next week's episode, we decided to do a little dive into the old mailbag. Zach sent us a little note here. I'm behind on listening to episodes. But the NixOS episode from last month, Linux Unplugged 598, was really interesting. I wanted to throw out some thoughts responding to a comment that was sent in about wanting a NixOS-like system, but being able to use the old traditional system admin tools.

37:58

For context, I've tried out NixOS, but quickly got to the point where I would need to dive in and figure out those flakes. At that point, it just became one more thing on that list to do. Since then, I've gotten into bootable containers. I've specifically been using the Universal Blue ecosystem with bootc, but have wanted to try out Vanilla OS and CommonArch that also use OCI images for delivery and customization.

38:22

Listening to that NixOS episode, I felt like many of those talking points that were given in favor of NixOS were also very in line with the benefits that I've seen in using OCI images to build out customized OS images.

38:37

I don't use any of the headliner distros like Aurora or Bluefin, although I have tried a few, and overall they were quite well put together. But I've taken their more low-level image builds and layered my own customizations on top of them. It's been a fantastic way to manage everything, as when I need a new server or desktop, all I have to do is install my image and everything's ready to go. I will add the caveat that there may be some learning if you haven't

39:05

already been familiar with building OCI containers, but I live in a world where professionally and for hobbies, that's what I'm doing. So that wasn't a huge burden for me to learn. Thanks for the show. Really have been enjoying it and wanted to send this little message in just in case someone else might find some use.

Wes

39:24

Well, cool. Thank you for sharing. We love success stories like this. I wonder if you have any of it on GitHub or anything too.

Chris

39:30

If people are curious. Good question. I really feel like this nails something that we've been feeling and chatting a lot about behind the scenes: the really, really brilliant thing that the Universal Blue folks have tapped into is this existing knowledge set around how OCI images work and how you can layer things. They've really tapped into the whole DevOps workflow that people live every day, and now you can take that skill set that you've learned

39:59

and you can use it to customize your Linux desktop. I mean, that is just really powerful. It's a different approach than Nix, right? Whereas Nix, you're building it from the ground up. You would use Nix to maybe generate those OCI images or something like that. But both, like he says, are essentially accomplishing similar goals.

Wes

40:16

Yeah, and a lot of the immutability side of things and the image-based approach, right? Like starting to think of your system as a cohesive whole that you deliver in those coherent units rather than ad hoc imperative updates.

Chris

40:31

Now, admittedly, I don't use them as much, but Brent, when I think of an image-based system, I don't necessarily think of flexibility.

Brent

40:38

Yeah, I was kind of wondering about this. I've had a few people tell me, oh, yeah, images is totally the way to go. Of course, we've been playing with Nix and NixOS at the same time, and the thing I keep tripping on with images is... You build them once and then you use them many times. But the way, Chris, I think you and I have been using at least NixOS on the desktop is your Nix config is like constantly evolving.

41:03

Therefore, the next time you go to install it somewhere else, it's always that new updated copy. And I would imagine one of the downsides with building an image is, well, you have that intermediate step. You have to go ahead and build your image every time you want that newest, freshest version. Am I understanding that correctly? Is that a proper downside here?

Wes

41:20

I know some of the Silverblue types, right, they're still using a lot of rpm-ostree, which has ways to apply things live. So it might depend a little bit on exactly what you're putting in those images, or maybe you have, like, a dynamic switch-image thing. But yeah, I do wonder about that.

Chris

41:35

And then you also got to remember, it depends on what you're doing. You know, it might just be maybe a lot of your applications are Flatpaks and you're installing those and updating those separately anyways.

Wes

41:43

Or you're using, you know, various containers or other things for your dynamic environments.

Chris

41:48

I'll take Liam's email here. He says, good day, gents. In the most recent LUP, you mentioned not having received much feedback about multi-monitor setup. So he sent us a picture of his. I've got a laptop screen at 1080 at 144 hertz. That's my primary display, you know, for things like Skype. Then I have two externals, a 32-inch 2K. I do think that's the sweet spot, he says, at 60 hertz. And then my rightmost monitor is in portrait mode.

42:10

That's where I can see more lines of code. All of this works without issue on XFCE. In case you do navigate to the image, yeah, that's a treadmill desk. I'm on my second one since the start of the pandemic.

Wes

42:22

This is a nice setup.

Chris

42:23

Listener since SciByte, Lunduke and LAS days, and early TechSNAP. Member since 2022, annual member since December. That's awesome. Thank you very much, Liam, and I like your setup. I am, as you boys know, a big fan of the vertical monitor for our show docs or, you know, like reviewing a config file or just having a terminal up there.

Wes

42:45

journald.

Chris

42:46

Yep. The vertical monitor is a productivity hack. If you can afford the luxury of, because you're not, it's not great. But one of the other things I will use the vertical monitor for is I stack two chats. Like if I have two different work chats or something, I'll stack them on the vertical monitor. Works really great.

Wes

43:03

Right. Yeah. It doesn't fit every activity. So it's sort of a less generally useful screen at times.

Chris

43:08

Yeah. Yeah. It is more limited, so it's more of a luxury. I do have to admit that.

Brent

43:12

I have a bit of a question here for us. Did we ever divulge what our current setups are? Chris, I know you've gone into details, but I don't think Wes and I have actually shared that.

Chris

43:22

Wes, I bet you kind of move around depending on, because you're mostly using laptops.

Wes

43:25

I do a lot of laptop. I've got a dual monitor set up in the sort of main office-y bit, and then I have just a single where I do laptop with one screen in my living room sort of desk.

Chris

43:38

You got a monitor at the living? That's nice.

Wes

43:40

That's for the like casual work where I want to do stuff, but with the TV on.

Chris

43:43

Yeah. Okay, Brent, what's your setup?

Brent

43:46

Well, I don't actually do dual monitor anymore. I am now doing the tri-monitor thing. Chris, I know you're like a quad monitor guy, but I did steal your little tip. You've been trying to get me to use this for years. I have same as Liam here on the right-hand side, a vertical monitor.

44:06

It's just an old monitor I've had forever. It doesn't matter, though, because it's just chats. Or, like, currently I have our episode recorder there, I have the window on which we're connected together, and then I also have the JB chat, even though we're not doing it live. For some reason it just makes me feel better, and it's perfect for that. And so I've got three monitors running, usually, for me, off a laptop, and got this big massive 20... well, I say massive, although Liam beat me

44:34

on this one, but it's this 27-inch, like, 4K Dell monitor, right, as my main monitor in front of me, and that's been a pretty sweet setup.

Wes

44:42

I'm on to him.

Chris

44:44

He asked...

Wes

44:45

...this question because he wanted to...

Chris

44:46

...answer. Exactly, and I'm here for it. Yeah, I love it. Um, you know, I know I mentioned this recently on the show, but I definitely have always been a multi-monitor person for many, many, many years now. But I have been enjoying the single, extra, extra ridiculously wide screen. It's actually a curved wide screen at home, and I use GNOME there as well, and I really like it. I mean, you can lay out three full windows side by side on the screen. It's just

45:14

really... you get a lot of room, and when you start getting that kind of room, you don't really need a second screen as much. And I will say it is a much easier way to do desktop Linux.

Wes

45:23

I do think that'll be in one of my future monitor purchases for sure.

Chris

45:27

If you go with, like, an Intel or an AMD GPU right now, maybe one day NVIDIA, but Intel and AMD and a single monitor, I promise you, your life will be easier than if you try to be a quad monitor guy. It's not an easy path.

Wes

45:38

But what if I want to chain DisplayPort together?

Chris

45:41

Yeah. Okay. Yeah. Oh, man. And, like, I don't know what my deal is because I've only been using computers for, like, 30-plus years. But, like, I still have major strugs trying to make sure that the correct monitor is the default, like, BIOS screen monitor.

Brent

45:59

Oh, gosh.

Chris

46:00

And I don't know, maybe this isn't how it works. Hand to the mixer, if it doesn't change. I get it set up just right, and then, like, a couple of days go by, and the next time I turn the machine on, like, a different monitor's lighting up. And I've gone through, and I thought I did the proper ordering, you know, in the plugs, and it changes. I swear that happens. I don't know, maybe I'm making it up.

Brent

46:23

You need to, like, use UUIDs for your monitor setup.

Chris

46:26

Yeah, well, it's at the BIOS level. Like, you know, props to Plasma: by the time I get to Plasma, it's always fine. And I've been actually having really good success with locking my screens. They go to sleep. They wake back up. Everything's fine. All the orientations are correct. And when it boots, it always nails it. So Plasma has been just doing great with my multi-monitor setup.

46:48

But my BIOS, right? Or, like, when the system's booting, the console where you're just seeing the output, like, that is a moving target for some reason. And I don't think that's technically how this works, but maybe it's me, you know, moving monitors around. I don't know.

Wes

47:05

You could take the next year off...

Chris

47:06

And just...

Wes

47:06

...rewrite your BIOS.

Chris

47:08

Oh, I thought maybe I'd do a study where I just empirically, like, write every move I make down. Okay, this one's in this plug, boot. Yeah, you write that, step one.

Brent

47:17

I think maybe you should study the cameras that you have in the studio, because every time I show up, I tend to shuffle all your cables around for your monitors.

Chris

47:23

That is true. That is true.

⁠¶ Pick

47:28

Okay, well, the pick this week has to be something that gets you up and running with BPF right away, right? Like, we don't want to sit here and talk about eBPF this whole episode and not give you a great tool to check out. And this week, it's Network Top. It helps you monitor traffic from your network using BPF.

Wes

47:46

Yeah, we should be clear. This one is classic BPF, but it's still handy. And that means it is broadly useful, too.

Chris

47:52

Yeah.

Wes

47:52

It's built in Rust.

Chris

47:54

Oh, it is.

Wes

47:54

Yeah, so it's super performant. It's easy to use. And it means you get to use, like, the tcpdump-style syntax in a nice little TUI. And so I sent you an example. What do you think of it?

Chris

48:09

Well, first of all, it's very readable, right? Because it is, like you said, you're getting charts in a terminal UI. And I mean, it's immediately understandable. I'm trying to figure out what you're doing here because I see ginormous, like you have almost no traffic and then it jumps up to almost 26.3 megabytes a second and it hovers there for about 40 seconds and then it drops down to absolutely nothing.

Wes

48:34

Yeah, so you see how there's an input section at the top, and then under that there's rules? Yeah, you'll see it says all on the left, and then to the right it says host and then a specific IP address.

Chris

48:45

So you're looking at all traffic from just this host.

Wes

48:47

Yes. Um, and so those are two separate rules. So basically it's looking at one interface and just watching it totally by default, and then you can add BPF rules that it'll add as additional things that it'll monitor at the same time, and then you can use the arrow keys to toggle between the views for different rules. Okay. And, um, there was no traffic because I SSH'd to another machine here at the studio.

49:12

We hadn't really been talking, right? I exchanged a few packets for that, but that was all. Um, and then I did a netcat back to my laptop, just reading from random. And so that was the big traffic spike. And then I killed it just to watch it all drop off. But, you know, you can do specific ports that you want to watch. You can do specific IPs, host names, TCP states, like, whatever you can do with tcpdump. Basically you can put those rules in here.

Chris

49:35

And you're getting a command line TUI, a terminal user interface. So it's really easy to understand. Like, these rules Wes is talking about are clearly delineated in the UI. It's really simple syntax.

Wes

49:47

So maybe there's something you're trying to hunt for. You find it here, and then you can go do the tcpdump to capture the traffic and do some more analysis.

Chris

49:53

And it is MIT licensed, so it is free to use. And then there is a bonus pick, bpftune.

Wes

50:03

Yeah, well, you kind of hinted at this a bit. And I happen to know you turned it on for at least one of your systems.

Chris

50:08

I have it running on my home workstation.

Wes

50:10

Despite the fact that this is indeed a GPL 2.0 with Linux Syscall Note licensed Oracle open source project.

Chris

50:18

bpftune aims to provide lightweight, always-on auto-tuning of system behavior. The key benefit it provides is that, by using BPF observability features, it continuously monitors and adjusts system behavior. Because we can observe the system behavior at a fine grain, we can then tune at a finer grain too. So like individual socket policies, individual device policies.

Wes

50:42

Yeah, right. So think of all those things sysctl can tweak. It's using BPF hooks to watch your system and then automatically go tweak those.

Chris

50:51

Yeah. Now I haven't really noticed a difference, but I've only had it on for a couple of days. But the idea is brilliant. It's just so brilliant. Like, here's the system. I'm monitoring. OK, now I'll just go make adjustments here to essentially, maybe, ease pressure on the system or something like that. I don't know if anybody has experience with this, because I just started, but it seems like a fantastic idea. bpftune. Chris will have a link to these in the show notes.

51:16

And in Nix, it's like just one, it's like, you know, service enable BPF auto tune. And then you just, you're good. There's more you can do, but that's, it's really simple. So I was like, all right, I'm just going to turn this on. And I just love the concept of the system auto tuning itself.

Wes

51:31

Yeah, and they kind of make the case here too, if you're doing the cattle not pets approach and how many of your systems ever really even have a human person who might be on it who could tune it. Okay, maybe you can hand-tune the database, but you're not going to hand-tune all your dynamic web workers.

Chris

51:46

This could be great for my Odroid.

Wes

51:48

But if some of them are longer lived, this thing can make sure they're running all right.

Chris

51:52

I should put this on my Odroid. I really should. Little Odroid's doing a lot of work and I never really check in on it. Just sitting there being a little yeoman about it.

⁠¶ Outro

51:59

Well, we will have links to this and everything else. There's a lot of links this week. Linuxunplugged.com slash 605. And remember, we want to know if you enjoy these deep dives. We can get into the weeds here a bit too much, but if this is the kind of stuff you like, let us know, because, honestly, for us as creators, it's a little scary to do topics like this. I know that sounds stupid, but it is. It's just a little scary.

52:24

So we always like to know what your thoughts are. We'll be back at our regular live time: Tuesday, as in Sunday. Sunday at 12 p.m. Pacific, 3 p.m. Eastern.

52:38

Now, if you want more show, remember our members get the full bootleg, which is clocking in at like an hour and 20 minutes or something right now, and this is a short one. And of course you can get details of that at linuxunplugged.com. Thank you so much for joining us on this week's episode of the Unplugged program. We just really appreciate your time for listening. If you want to share it with somebody, we always like that too. Word of mouth is the best advertising for a podcast.

53:04

We always appreciate that. Thank you so much for being here, and we'll see you right back here next Tuesday, as in Sunday, which isn't Tuesday at all.

Transcript source: Provided by creator in RSS feed: download file
