Welcome to our deep dive into the world of Linux system programming.
Yeah, we're going to be exploring the core of the Linux operating system, using Linux System Programming, Second Edition, from O'Reilly, as our guide.
So we're really getting down into the nuts and bolts.
We are. We're going to be uncovering the nitty-gritty of how things work at the lowest level.
Awesome.
And you know, it's not just about code, it's about understanding the fundamental building blocks that make Linux tick. Think system calls: those are the direct lines to the heart of the OS, the kernel, and the logic that drives it all.
So we're not just driving the car. We're actually seeing how the engine works.
Exactly. And here's a fun fact for you to kick things off. Are you ready?
I'm ready.
Okay. Linux actually uses way fewer system calls than something like Windows.
Really?
Yeah, we're talking a few hundred versus, reportedly, thousands.
Wow. I would have thought a complex system like Linux would need way more ways to talk to its kernel.
That's where the elegance of Linux comes in.
Okay.
It achieves a lot with a surprisingly streamlined set of tools.
That's pretty cool.
Yeah.
So before we get too deep into the weeds, though, let's take a step back and define something. What exactly is system programming?
Oh, that's a great question.
When I think about using a computer, I don't think about this level.
Right, you're not really meant to. You're using all the nice applications. System programming is kind of like being the architect of a building.
Okay.
So you're working with the raw materials, the foundation, the structural framework.
Not just decorating the rooms.
Exactly. You're building the load-bearing walls.
Ooh, I like that metaphor.
Yeah. So we're writing software that interacts directly with the core of the operating system, the kernel, and its closest partners, like the C library, known as glibc on Linux. It's the bedrock upon which all those user-friendly applications are built.
So if I'm writing a game or a web browser, that's like decorating the rooms. But system programming is making the rooms.
That's a great way to put it.
Okay, I like it. I'm with you.
And just like a good architect needs the right tools, so do system programmers. The GNU Compiler Collection, GCC, is one of our main tools in this world.
So that's our blueprint and our hammer.
Yes, it's our tool belt.
And let's not forget about system calls. These are the language we use to communicate with the kernel. Need to open a file, send data over the network, or even just allocate some memory? It all starts with a system call: you're directly asking the kernel to do something for you.
Those are the direct lines of communication we talked about.
And the book also spends a lot of time talking about files.
Yes it does.
Why are files so central to Linux? I was wondering that too, because I know we have them.
Well, this might surprise you, but in Linux, pretty much everything is represented as a file.
Wait a minute, what? Everything is a file?
We're talking devices, directories, even your terminal window.
Hold on. So my keyboard is a file? My hard drive is a file?
It is.
That's kind of blowing my mind a little bit.
It is. It's a beautifully unifying concept.
But how does the kernel keep track of all of that?
That's where inodes and file descriptors come in.
Okay, I'm intrigued, tell me more.
So, think of an inode as a file's unique ID card.
Okay.
It holds all the essential information: size, permissions, ownership, timestamps.
It's like the file's permanent record.
It's the file's permanent record. A file descriptor, on the other hand, is a temporary handle your program gets in order to interact with that file.
Okay, so the inode is the file's master record, and the descriptor is my program's key to access it.
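To make the inode/descriptor distinction concrete, here is a minimal sketch in Python, chosen because its os module is a thin wrapper over the same open, fstat and close system calls a C program would use. It opens one temporary file twice: the two descriptors differ, but both lead to the same inode number. The helper name open_twice is made up for illustration.

```python
import os
import tempfile

def open_twice(path):
    """Open one file twice; return both descriptors and the inode each refers to."""
    fd1 = os.open(path, os.O_RDONLY)   # first temporary handle
    fd2 = os.open(path, os.O_RDONLY)   # second, independent handle
    try:
        return fd1, fd2, os.fstat(fd1).st_ino, os.fstat(fd2).st_ino
    finally:
        os.close(fd1)                  # the descriptors are released...
        os.close(fd2)                  # ...but the inode lives on with the file

with tempfile.NamedTemporaryFile() as tmp:
    fd1, fd2, ino1, ino2 = open_twice(tmp.name)
    same_as_path = ino1 == os.stat(tmp.name).st_ino

print(fd1 != fd2, ino1 == ino2, same_as_path)  # True True True
```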
I like it. Now, all these files live in a file system. But unlike some other systems, where each drive has its own separate space, like your C: drive and your D: drive, Linux uses a unified namespace.
So it's all just one big, happy family?
Exactly. It's like having one giant filing cabinet for everything, and this makes life much easier for programs: they don't need to worry about where a file is physically located. Historically, this namespace was shared by everyone, but Linux has evolved.
It always does, it always does.
It's changed to support per-process namespaces.
Per-process namespaces? Now, what is that?
It means that each process can have its own customized view of the file system, if needed. It's like having your own personal filing cabinet within that larger one.
Okay, that's a cool feature. All right, so we've got files, and we've got this unified namespace. But what about processes?
Ah, yes, processes.
Those are those mysterious things that make our programs actually come to life.
You're exactly right. A process is essentially a running instance of your program. Imagine it as a little virtualized world with its own memory space, its own resources, and the illusion that it has the whole computer to itself.
So each program gets its own little sandbox to play in.
Yes, and it's unaware of the other kids playing in the other sandboxes. That isolation is key for stability and security.
Makes sense.
Now, a process goes through a whole life cycle. It's born, and it lives its life executing your code.
It has a purpose.
It has a purpose, and eventually it meets its end. Sometimes it leaves behind a temporary ghost known as a zombie process.
A zombie process that sounds a little ominous.
It does sound ominous, but don't worry, they're usually harmless.
Okay, good.
It just means that the process has finished running, but its parent hasn't yet acknowledged its demise by calling one of the wait system calls.
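Here is a minimal sketch of a zombie's short life, assuming Linux (it reads the process state from /proc). The parent uses waitid with WNOWAIT only so it can observe the exited but unreaped child; the final waitpid is the "final goodbye" that removes the zombie. The exit code 7 and the helper name observe_zombie are arbitrary choices for illustration.

```python
import os

def observe_zombie(exit_code=7):
    """Fork a child that exits at once; report its state before reaping, and its exit code."""
    pid = os.fork()
    if pid == 0:                         # child: do nothing and exit
        os._exit(exit_code)
    # Parent: wait for the child to exit, but WNOWAIT leaves it unreaped (a zombie).
    os.waitid(os.P_PID, pid, os.WEXITED | os.WNOWAIT)
    with open(f"/proc/{pid}/stat") as f:
        fields = f.read()
    state = fields[fields.rindex(")") + 2]   # state letter follows "(command) "
    # The "final goodbye": waitpid() collects the status and removes the zombie.
    _, status = os.waitpid(pid, 0)
    return state, os.WEXITSTATUS(status)

state, code = observe_zombie()
print(state, code)  # Z 7
```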
So it's still kind of hanging around.
Yeah, it's just waiting for that final goodbye.
Interesting.
But during its life, a process has certain permissions that dictate what it can and can't do. It receives signals, which are like interprocess messages, and it has to handle errors gracefully.
So every program needs some rules and some ways to communicate.
Exactly. It needs some etiquette, some manners. And what about reading and writing data?
That's a great question. How do programs actually do that?
Well, that's where input/output, or I/O, comes in: the art of reading and writing data. In Linux, this relies heavily on those system calls we talked about earlier.
Those direct lines to the kernel.
Exactly.
Awesome.
So imagine you're downloading a movie. The first thing you do is find the right URL.
I find it on my favorite streaming service.
Exactly, and that's like the open system call.
Okay.
Then you start receiving those data packets; that's read. And finally, you save those packets to your hard drive; that's write. When you're done, you close the connection; that's close.
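The download analogy maps onto four calls. This sketch uses Python's os.open, os.write, os.read and os.close, which wrap the system calls of the same names; the file name movie.bin and the helper write_then_read are made up. The write loop is there because write() is allowed to write fewer bytes than requested.

```python
import os
import tempfile

def write_then_read(path, data):
    """Store data with open/write/close, then fetch it back with open/read/close."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)   # open()
    try:
        view = memoryview(data)
        while view:                           # write() may write only part of it
            view = view[os.write(fd, view):]
    finally:
        os.close(fd)                          # close()

    fd = os.open(path, os.O_RDONLY)
    chunks = []
    try:
        while chunk := os.read(fd, 4096):     # read(): b"" means end of file
            chunks.append(chunk)
    finally:
        os.close(fd)
    return b"".join(chunks)

with tempfile.TemporaryDirectory() as d:
    result = write_then_read(os.path.join(d, "movie.bin"), b"frame-1 frame-2")

print(result)  # b'frame-1 frame-2'
```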
Okay.
That makes a lot more sense now. It's a nice little analogy.
Yeah, I like it.
So those are the basics, got it. What about this buffered I/O I've heard about?
Is that just a fancy name for the same thing?
Not exactly. Buffered I/O is all about performance.
Ooh, performance. We always like that.
We always like that. So remember, each system call has a bit of overhead.
Okay.
So if you're reading a file one byte at a time, you're making a lot of system calls.
That sounds inefficient.
It is. That's where buffering comes in.
So we're bundling those calls together?
You got it. The C standard I/O library, which you access through stdio.h, uses this technique. It provides functions like fopen, fread, fwrite, and fclose to work with files, but it reads and writes in chunks, which reduces the number of times it needs to talk directly to the kernel.
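Python's built-in io layer plays a role similar to C's stdio here (it is a separate implementation, not stdio itself). This sketch makes ten one-byte writes through a 4 KiB buffer and checks the file size on disk before and after flush(), showing that the bytes wait in user space until a single write() hands them to the kernel. The buffer size and file name are arbitrary.

```python
import os
import tempfile

def size_before_and_after_flush(path):
    """Make ten 1-byte writes through a user-space buffer; report on-disk size around flush()."""
    with open(path, "wb", buffering=4096) as f:   # buffered, much like a stdio FILE
        for byte in b"0123456789":
            f.write(bytes([byte]))                # no system call yet: bytes go to the buffer
        before = os.stat(path).st_size            # the kernel hasn't seen them
        f.flush()                                 # one write() hands all ten over
        after = os.stat(path).st_size
    return before, after

with tempfile.TemporaryDirectory() as d:
    before, after = size_before_and_after_flush(os.path.join(d, "log.txt"))

print(before, after)  # 0 10
```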
That makes sense. It's just being a little bit smarter about it.
Exactly. It's being efficient.
It's like ordering in bulk. I like it.
But what about when we have multiple threads accessing the same file?
Yeah, that sounds like it could get a little messy.
It could get messy. That's where thread safety comes in. When multiple threads access shared resources like files, you need mechanisms like locking to ensure that they don't corrupt data by trying to write at the same time.
Yeah, you don't want them stepping on each other's toes.
Exactly. It's like having a traffic light system.
Oh, I like that.
Yeah, to control access to a busy intersection.
To keep things flowing smoothly. Okay, so buffered I/O is great, but we need to be careful when multiple threads are involved.
We do, we do.
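A minimal sketch of the "traffic light", assuming each record is written in two pieces that must stay together. Holding a threading.Lock across both writes keeps records from interleaving; in C, flockfile() and funlockfile() serve the same purpose for a FILE stream. The thread and record counts and the file name are arbitrary.

```python
import os
import re
import tempfile
import threading

def write_records(path, n_threads=4, per_thread=200):
    """Several threads append two-part records to one file, one record at a time."""
    lock = threading.Lock()                       # the "traffic light"
    with open(path, "w") as f:
        def worker(tid):
            for i in range(per_thread):
                with lock:                        # hold it for the whole record
                    f.write(f"thread={tid} ")     # first half of the record
                    f.write(f"record={i}\n")      # second half must follow directly
        threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    with open(path) as f:
        return f.read().splitlines()

with tempfile.TemporaryDirectory() as d:
    lines = write_records(os.path.join(d, "shared.log"))

pattern = re.compile(r"thread=\d+ record=\d+")
print(len(lines), all(pattern.fullmatch(line) for line in lines))  # 800 True
```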
This is getting pretty deep. What about techniques that go beyond the basics of I/O? Now we're venturing into those advanced waters.
Get ready to level up your IO skills.
I'm ready. Let's go.
We're going to talk about techniques like scatter/gather I/O, event polling, and even memory mapping, each with their own superpowers.
Okay, I'm ready for those superpowers. All right, let's start with scatter/gather I/O. What kind of magic does that involve?
Imagine you have data coming from three different sources, and you need to write it all to a single file. Instead of making three separate write calls...
Yeah, that sounds inefficient.
It is. Scatter/gather I/O lets you do it in one shot, combining those data chunks into a single efficient operation.
So it's like optimizing a delivery route by combining multiple packages.
Exactly. Linux provides system calls like readv and writev, and they use a special structure called iovec to describe those vectors of buffers.
Got it.
That's what makes this magic happen.
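Python exposes these calls directly as os.readv and os.writev, building the iovec array for you under the hood. This sketch gathers three buffers into a file with one call, then scatters the file back into three buffers of matching sizes with another. The buffer contents and file name are illustrative.

```python
import os
import tempfile

def gather_then_scatter(path):
    """One writev() from three buffers, then one readv() back into three buffers."""
    parts = [b"header|", b"payload|", b"footer"]          # 7 + 8 + 6 bytes
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        written = os.writev(fd, parts)                    # gather: a single system call
        os.lseek(fd, 0, os.SEEK_SET)                      # rewind to read it back
        bufs = [bytearray(len(p)) for p in parts]         # same sizes as before
        n_read = os.readv(fd, bufs)                       # scatter: a single system call
        return written, n_read, [bytes(b) for b in bufs]
    finally:
        os.close(fd)

with tempfile.TemporaryDirectory() as d:
    written, n_read, pieces = gather_then_scatter(os.path.join(d, "out.bin"))

print(written, n_read, pieces)  # 21 21 [b'header|', b'payload|', b'footer']
```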
That's awesome.
But what if you're dealing with a lot of files and you need to be notified when something happens, like new data arriving?
Ooh, yeah, that sounds like a lot to keep track of.
That's where event polling comes in. It's like having a surveillance system for your files.
I like it.
So, the older system calls, select and poll, were early attempts at this, but they had limitations when you had a huge number of files to watch: every call has to hand the kernel the whole list of descriptors to scan.
Yeah, that makes sense.
Thankfully, Linux introduced the epoll facility, which is much more efficient at monitoring large numbers of files.
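Python's select.epoll object wraps epoll_create1, epoll_ctl and epoll_wait. This minimal sketch watches the read end of a pipe: polling before any data arrives returns nothing, and polling after a write reports the descriptor as readable (EPOLLIN). The pipe stands in for whatever files or sockets a real program would watch.

```python
import os
import select

def epoll_demo():
    r, w = os.pipe()
    ep = select.epoll()                       # epoll_create1()
    try:
        ep.register(r, select.EPOLLIN)        # epoll_ctl(ADD): tell me when r is readable
        before = ep.poll(0)                   # epoll_wait() without waiting: nothing yet
        os.write(w, b"new data")              # data arrives on the watched pipe
        after = ep.poll(1.0)                  # wait up to 1 s for readiness
        data = os.read(r, 100)
        return before, after, r, data
    finally:
        ep.close()
        os.close(r)
        os.close(w)

before, after, read_fd, data = epoll_demo()
print(before, after == [(read_fd, select.EPOLLIN)], data)  # [] True b'new data'
```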
So it's like upgrading from a security guard watching grainy CCTV footage to a high-tech system with AI-powered alerts.
That's a great analogy. And finally, we have memory mapping, which is where things get really interesting.
Okay, I'm intrigued. Tell me more.
Imagine you have a massive file, think a huge database or a high-resolution image. With memory mapping, you can map that entire file into your process's address space and treat it as if it were already loaded into memory. There's no need to read it in chunks.
So I have instant access?
You have direct access. The kernel brings in the pages you actually touch, on demand.
So why wouldn't we always do this?
It's a fantastic tool for performance, especially with large files. The system calls mmap and munmap are your keys to this kingdom.
I'm ready to rule.
But it's important to understand that memory mapping can work in different ways, which you choose with flags like MAP_SHARED and MAP_PRIVATE.
Interesting. What's the difference?
So with MAP_SHARED, if multiple processes map the same file, any changes one process makes are visible to the others and are carried through to the file. It's like a shared Google Doc. MAP_PRIVATE, on the other hand, gives each process a private copy-on-write view, so changes stay isolated and never reach the file.
So it's like working on my own personal copy. Okay, that makes sense.
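Here is a small sketch of the two flags using Python's mmap module, which calls mmap, msync and munmap underneath. Both mappings cover the same one-page file. The private mapping is written first, so it already holds its own copy-on-write page when the shared mapping later changes the file. The file name and byte patterns are made up.

```python
import mmap
import os
import tempfile

def shared_vs_private(path):
    """Write through a MAP_SHARED and a MAP_PRIVATE mapping of one file."""
    with open(path, "wb") as f:
        f.write(b"AAAA".ljust(mmap.PAGESIZE, b"\0"))   # one page of file data
    with open(path, "r+b") as f:
        rw = mmap.PROT_READ | mmap.PROT_WRITE
        shared = mmap.mmap(f.fileno(), 0, flags=mmap.MAP_SHARED, prot=rw)
        private = mmap.mmap(f.fileno(), 0, flags=mmap.MAP_PRIVATE, prot=rw)
        try:
            private[0:4] = b"PPPP"   # copy-on-write: this page is now our own copy
            shared[0:4] = b"SSSS"    # modifies the file's pages for everyone
            shared.flush()           # msync(): make sure the change reaches the file
            private_view = private[0:4]
        finally:
            shared.close()           # munmap()
            private.close()
    with open(path, "rb") as f:
        on_disk = f.read(4)
    return on_disk, private_view

with tempfile.TemporaryDirectory() as d:
    on_disk, private_view = shared_vs_private(os.path.join(d, "doc.bin"))

print(on_disk, private_view)  # b'SSSS' b'PPPP'
```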
Exactly. So memory mapping is incredibly powerful.
Wow, very cool.
But all this talk about memory makes me realize we haven't really talked about how Linux actually manages memory.
Yeah. That's a big one.
Yeah, it's a big one.
How does it ensure everything runs smoothly?
Well, memory management is crucial. It's the unsung hero of any operating system, making sure that each process gets its fair share of memory and that everything plays nicely together.
So it's like a master conductor.
It's orchestrating this complex symphony.
I like that.
So first, we need to understand the process address space. Remember those isolated sandboxes we talked about earlier? Well, each one is divided into pages, which are fixed-size chunks of memory. Not all pages are created equal, though. You have valid pages, which correspond to actual data in RAM or on disk, and you have invalid pages, which represent unallocated regions.
So accessing an invalid page is like falling into a trapdoor in the sandbox.
That's a great way to put it. And trying to access memory that's not yours is a big no-no. It usually results in a segmentation violation, a programmer's nightmare.
Oh, segmentation violation. I've heard that term.
It's not good.
I don't like that.
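Triggering a segmentation violation safely takes a separate process. This sketch, assuming Linux and CPython, forks a child that uses ctypes to read through address 0, which is normally left unmapped; the parent then sees that the child was killed by SIGSEGV. The core-dump limit is set to zero only to keep the demo tidy, and crash_child is a hypothetical helper name.

```python
import ctypes
import os
import resource
import signal

def crash_child():
    """Fork a child that reads an invalid address; return the signal that killed it."""
    pid = os.fork()
    if pid == 0:                                          # child
        resource.setrlimit(resource.RLIMIT_CORE, (0, 0))  # no core file, please
        ctypes.string_at(0)      # strlen() on address 0: invalid page -> SIGSEGV
        os._exit(0)              # not reached if the fault happens as expected
    _, status = os.waitpid(pid, 0)                        # parent: collect the child
    return os.WTERMSIG(status) if os.WIFSIGNALED(status) else None

killed_by = crash_child()
print(killed_by == signal.SIGSEGV)  # True
```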
But thankfully, we have the memory management unit, or MMU.
Okay, the MMU.
The MMU keeps things in check. It acts as a translator, converting the virtual addresses your program uses into the physical addresses where the data actually resides in RAM.
So the MMU is like a postal worker who knows the real address behind every po box.
Another perfect analogy.
I'm falling.
So the MMU also plays a vital role in memory protection, making sure that processes don't wander into memory territory that doesn't belong to them.
They stay in their own lanes.
They stay in their own lane.
Okay, but what about when a program needs to allocate memory on the fly?
Yeah, like when I'm adding things to my shopping cart online.
Exactly, and the website needs to store that information somewhere. That's where dynamic memory allocation comes in, and it's handled through a special region of memory called the heap. You use functions like malloc, calloc, realloc, and free to request and release memory from the heap as needed.
So the heap is like a self-storage facility, where programs can rent units of various sizes.
That's a great way to visualize it.
Okay, I'm picturing it.
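Python code doesn't normally call malloc itself, but ctypes can reach the C library's heap functions directly. This sketch, assuming Linux where CDLL(None) finds the already-loaded C library, rents a zeroed unit with calloc, a smaller one with malloc, grows it with realloc (the existing contents move along), and returns both with free. The sizes and the "apples" contents are arbitrary.

```python
import ctypes

libc = ctypes.CDLL(None)   # symbols of the C library already loaded into the process
libc.malloc.restype = ctypes.c_void_p
libc.malloc.argtypes = [ctypes.c_size_t]
libc.calloc.restype = ctypes.c_void_p
libc.calloc.argtypes = [ctypes.c_size_t, ctypes.c_size_t]
libc.realloc.restype = ctypes.c_void_p
libc.realloc.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
libc.free.restype = None
libc.free.argtypes = [ctypes.c_void_p]

def heap_demo():
    zeroed = libc.calloc(16, 1)                   # 16 bytes, guaranteed zero-filled
    cart = libc.malloc(8)                         # 8 bytes, contents unspecified
    try:
        if not zeroed or not cart:
            raise MemoryError("allocation failed")
        ctypes.memmove(cart, b"apples\0\0", 8)    # fill the rented unit
        bigger = libc.realloc(cart, 64)           # ask for a larger unit
        if not bigger:
            raise MemoryError("realloc failed")   # the old block is still valid here
        cart = bigger                             # the first 8 bytes came along
        return ctypes.string_at(zeroed, 16), ctypes.string_at(cart, 8)
    finally:
        libc.free(zeroed)                         # free(NULL) is a harmless no-op
        libc.free(cart)

zeros, kept = heap_demo()
print(zeros == bytes(16), kept)  # True b'apples\x00\x00'
```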
But managing the heap can get tricky. You can end up with something called fragmentation, where you have enough free space overall, but it's scattered around in small chunks.
So you can't fit that big couch in.
You can't fit that big couch in.
Even though you technically have enough space. That's frustrating.
It is frustrating. But Linux has some clever ways to deal with that. For larger chunks of memory, it can use something called anonymous memory mapping.
Anonymous memory mapping?
Yes, reserving a contiguous block of memory. It's like renting a whole warehouse unit.
Instead of trying to squeeze into a few small lockers.
Exactly, that makes sense.
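A sketch of an anonymous mapping via Python's mmap module: passing -1 instead of a file descriptor asks for memory that no file backs. glibc's malloc does something similar on its own for large requests, by default those above about 128 KiB, though that threshold adjusts at run time. The 4 MiB size here is arbitrary.

```python
import mmap

def anonymous_block(size=4 * 1024 * 1024):
    """Reserve one contiguous, zero-filled region that no file backs."""
    block = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE,
                      prot=mmap.PROT_READ | mmap.PROT_WRITE)   # fileno -1 => anonymous
    try:
        first = block[0]              # fresh anonymous pages read back as zeros
        block[size - 1] = 0xFF        # the far end belongs to the same region
        return len(block), first, block[size - 1]
    finally:
        block.close()                 # munmap(): the whole region goes back at once

length, first, last = anonymous_block()
print(length, first, last)  # 4194304 0 255
```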
Now, what about memory locking? When would we want to use that?
Memory locking is particularly useful when you have a critical piece of code that needs to access memory with absolutely no delays.
Of course. You wouldn't want the kernel to swap that memory out to disk.
No, that would be terrible. It'd be like pausing a movie right in the middle of an action scene. So memory locking, done through the mlock system call, ensures that those critical memory pages stay in RAM.
They're locked in.
They're locked in, which makes performance much more predictable.
So we're putting a do-not-disturb sign on those memory pages.
You got it, preventing the kernel from paging them out. It's a powerful tool for performance-sensitive applications, but it's important to use it judiciously, as overusing it can have drawbacks.
Of course, too much of a good thing.
Exactly, especially in multithreaded environments.
Right, where things are already complicated.
Exactly.
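Python has no built-in mlock, so this sketch calls the C library's mlock and munlock through ctypes, assuming Linux. An unprivileged process can only lock up to its RLIMIT_MEMLOCK limit, which is one reason to lock sparingly, so the sketch reports failures instead of assuming success. The 4 KiB buffer stands in for hypothetical critical data.

```python
import ctypes
import errno

libc = ctypes.CDLL(None, use_errno=True)        # the already-loaded C library
libc.mlock.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
libc.munlock.argtypes = [ctypes.c_void_p, ctypes.c_size_t]

def with_locked_buffer(size=4096):
    """Try to pin a buffer in RAM; return (mlock return value, errno)."""
    buf = ctypes.create_string_buffer(size)     # stand-in for critical data
    addr = ctypes.addressof(buf)
    rc = libc.mlock(addr, size)                 # ask the kernel to keep it resident
    if rc != 0:
        return rc, ctypes.get_errno()           # e.g. over RLIMIT_MEMLOCK
    try:
        buf.value = b"latency-critical state"   # work that must not wait on swap
    finally:
        libc.munlock(addr, size)                # unpin as soon as possible
    return rc, 0

rc, err = with_locked_buffer()
print("locked and unlocked" if rc == 0 else f"mlock failed: {errno.errorcode.get(err, err)}")
```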
This whole deep dive into memory management has been fascinating.
It is a fascinating topic.
It really is. I'm starting to realize just how much complexity is hidden beneath the surface of something as seemingly simple as running a program.
And we've only just begun. There's a whole universe of fascinating concepts and techniques to explore, but I'm glad you're starting to appreciate the elegance and power of Linux system programming.
Absolutely, I am. I'm already looking forward to uncovering more of these hidden gems as we continue our deep dive.
Awesome. Now that we've explored memory management, let's shift gears a bit and talk about something more tangible.
All right, I'm with you.
Files and directories.
Okay, I can definitely relate to those.
Yeah, we all have to deal with those.
I spend way too much time organizing files on my computer. So what kind of insights can system programming give us about files and directories?
Well, first of all, we need a way to gather information about a file. Think of it like a detective doing some reconnaissance. That's where the stat family of system calls comes in. They give you all sorts of details about a file: size, permissions, timestamps.
So it's like a file's fingerprint.
Exactly, it's a file's fingerprint. You can find out if it's a regular file or a directory, when it was last modified, who owns it, and even what permissions are set.
So who can read, write, or execute it?
Exactly.
Wow, that's a lot of information.
It's a treasure trove of data.
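A quick reconnaissance sketch: Python's os.stat wraps stat(), and the stat module decodes the st_mode field. The helper name describe and the sample file notes.txt are made up for illustration.

```python
import os
import stat
import tempfile
import time

def describe(path):
    """Gather the 'fingerprint' that stat() reports for a path."""
    st = os.stat(path)                       # follows symlinks; os.lstat would not
    if stat.S_ISDIR(st.st_mode):
        kind = "directory"
    elif stat.S_ISREG(st.st_mode):
        kind = "regular file"
    else:
        kind = "other"
    return {
        "type": kind,
        "size": st.st_size,                         # in bytes
        "permissions": stat.filemode(st.st_mode),   # e.g. "-rw-r--r--"
        "owner_uid": st.st_uid,
        "modified": time.ctime(st.st_mtime),
    }

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "notes.txt")
    with open(path, "w") as f:
        f.write("hello")
    info_file = describe(path)
    info_dir = describe(d)

print(info_file["type"], info_file["size"], info_dir["type"])  # regular file 5 directory
```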
So once we've done our investigation, how do we actually manipulate files, like changing permissions or ownership?
Well, Linux provides these handy system calls, like chmod and chown. chmod lets you tweak those permissions: who can do what with a file.
Got it.
chown, on the other hand, lets you change the owner and group, essentially deciding who's in control.
It's like being the administrator of your own little file kingdom.
Exactly. You get to decide who has access and what they're allowed to do.
I like it. I have the power.
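A minimal sketch of both calls. Changing a file's owner to another user requires privileges, so the chown here passes -1 to keep the owner and merely re-assigns the caller's own group, which an ordinary user is allowed to do. The helper name restrict and the 0o640 mode are illustrative choices.

```python
import os
import stat
import tempfile

def restrict(path):
    os.chmod(path, 0o640)             # owner: rw-, group: r--, others: ---
    # chown(): -1 leaves the owner unchanged; an unprivileged owner may only pick
    # a group it belongs to, so this sketch re-assigns the caller's own group.
    os.chown(path, -1, os.getgid())
    st = os.stat(path)
    return stat.filemode(st.st_mode), st.st_gid

with tempfile.NamedTemporaryFile() as tmp:
    mode, gid = restrict(tmp.name)

print(mode, gid == os.getgid())  # -rw-r----- True
```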
Now, sometimes you need to store extra information about a file, something beyond those basic attributes we get from stat. That's where extended attributes come in.
Extended attributes? That sounds interesting. What are we talking about here?
Think of them like custom labels you can attach to files.
Okay.
You can store things like author information, version numbers, even application-specific data.
So it's like adding notes to a file that only certain programs or users can see.
That's a good way to think about it; each attribute's namespace, like user or trusted, controls who can read and write it. And Linux provides system calls like getxattr, setxattr, and listxattr to manage these extended attributes.
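Python wraps these calls as os.setxattr, os.listxattr and os.getxattr (Linux only). The attribute name user.author and its value are hypothetical; the user. prefix selects the namespace that ordinary file owners may write. Some file systems don't support user attributes, so the sketch returns None in that case instead of failing.

```python
import errno
import os
import tempfile

def tag_file(path):
    """Attach, list, and read back a user-namespace extended attribute."""
    try:
        os.setxattr(path, "user.author", b"deep-dive")   # hypothetical label
        names = os.listxattr(path)                        # every attribute name
        value = os.getxattr(path, "user.author")
        return names, value
    except OSError as e:
        if e.errno == errno.ENOTSUP:   # this file system doesn't support user xattrs
            return None
        raise

with tempfile.NamedTemporaryFile() as tmp:
    result = tag_file(tmp.name)

print(result)
```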
Okay, so we've got our files all figured out, but what about directories?
Ah, yes, directories.
How do we navigate those and keep everything organized?
Well, first, we need to know where we are in this vast file system. That's where getcwd comes in handy.
getcwd?
It tells you the absolute path to your current working directory. Think of it like a GPS for the file system.
So no more getting lost in a maze of folders.
Exactly. You always know where you are.
I like it.
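A short tour of the directory calls, covering getcwd plus the mkdir and rmdir calls mentioned next, using Python's os.getcwd, os.mkdir, os.chdir and os.rmdir, which wrap the system calls of the same names. The directory name projects is made up, and the sketch always changes back to where it started.

```python
import os
import tempfile

def directory_tour(base):
    """Create a directory, step into it, step back out, and remove it."""
    start = os.getcwd()                  # absolute path of where we are now
    new_dir = os.path.join(base, "projects")
    os.mkdir(new_dir, 0o755)             # create it (the mode is filtered by umask)
    try:
        os.chdir(new_dir)                # move into it...
        here = os.getcwd()               # ...and ask the "GPS" again
    finally:
        os.chdir(start)                  # always return to where we began
    existed = os.path.isdir(new_dir)
    os.rmdir(new_dir)                    # succeeds only because it is empty
    return here, existed, os.path.exists(new_dir)

with tempfile.TemporaryDirectory() as base:
    here, existed, exists_after = directory_tour(base)

print(os.path.basename(here), existed, exists_after)  # projects True False
```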
And when it comes to actually creating and managing directories, we have mkdir to create new ones and rmdir to remove empty ones. Those are the classics. And of course, we can't forget about links.
Links, yes. Those are always a bit tricky for me.
They can be a little bit tricky.
I always get hard links and symbolic links confused. What's the difference again?
So think of a hard link like an alias for a file.
Okay.
It points directly to the file's inode.
Right, that unique ID.
That unique ID we talked about earlier. So it's like having multiple entryways leading to the same room.
Okay.
A symbolic link, on the other hand, is more like a signpost pointing to the actual file. It stores the path to the target file, so it can even point to files on different file systems.
So a hard link is like having multiple copies of the same key, while a symbolic link is like having a note that says the key is under the flower pot.
I love that analogy. That's perfect.
Okay. Good, I'm glad I got that one right.
Yeah. And Linux gives us system calls like link for hard links, symlink for symbolic links, and readlink to find out where a symbolic link is pointing.
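This sketch shows the key-versus-signpost difference with os.link, os.symlink and os.readlink. After the original name is removed, the hard link still reaches the data, because the inode survives while any name points to it, whereas the symbolic link is left dangling. The file names and the helper link_demo are illustrative.

```python
import os
import tempfile

def link_demo(d):
    """Compare a hard link and a symbolic link to the same file."""
    target = os.path.join(d, "room.txt")
    hard = os.path.join(d, "second-door.txt")
    soft = os.path.join(d, "signpost.txt")
    with open(target, "w") as f:
        f.write("the room")
    os.link(target, hard)        # hard link: a second name for the same inode
    os.symlink(target, soft)     # symbolic link: a small file that stores a path
    info = {
        "same_inode": os.stat(target).st_ino == os.stat(hard).st_ino,
        "link_count": os.stat(target).st_nlink,                  # two names now
        "points_to": os.readlink(soft),                          # the stored path
        "own_inode": os.lstat(soft).st_ino != os.stat(target).st_ino,
    }
    os.remove(target)            # drop the original name...
    with open(hard) as f:
        info["hard_still_works"] = f.read() == "the room"        # ...data survives
    info["symlink_dangling"] = not os.path.exists(soft)          # signpost to nowhere
    return info

with tempfile.TemporaryDirectory() as d:
    info = link_demo(d)

print(info)
```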
Very cool. This whole world of files and directories is starting to make a lot more sense now.
Good. I'm glad to hear that.
But I'm curious: what about when we want to copy or move files? Do we need to write a bunch of code to handle that ourselves?
Here's where things get really clever.
Okay, I'm intrigued.
Traditionally, Linux doesn't have a system call that simply copies a file for you.
It doesn't?
No, but it gives us the building blocks to do it ourselves.
Okay.
And it all comes back to that core concept: everything in Linux is a file, even devices.
Okay, I'm following along so far. But how do devices help us with copying files?
So, some devices act as data sources or sinks. For example, there's the null device, /dev/null, which is like a black hole that discards anything you write to it.
So it's like a digital trash can.
Exactly. And then there's the zero device, /dev/zero, which provides an endless stream of zeros.
An endless stream of zeros. That's useful for certain things.
I can see where this is going. Devices and regular files are read and written with exactly the same calls.
Exactly. So to copy a file, you open the source file for reading, open the destination file for writing, and then stream the data from one to the other with read and write.
Okay, that makes sense.
And moving a file within the same file system is as simple as renaming it with the rename system call, which changes its location in the directory tree without copying any data.
So it's all about leveraging those existing file I/O mechanisms in clever ways.
It is. It's very elegant.
It's amazing how much you can accomplish with those basic building blocks.
It is. It's the power of abstraction.
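A sketch of both ideas, assuming a Linux system with /dev/zero and /dev/null. copy_file, read_device and write_device are hypothetical helpers: the first streams data with read and write, os.rename does the "move", and the device files are opened, read, and written with exactly the same calls as regular files. A real copy tool would also carry over permissions and other metadata, which this sketch skips.

```python
import os
import tempfile

def copy_file(src, dst, bufsize=64 * 1024):
    """Hypothetical helper: copy src to dst with open/read/write/close."""
    copied = 0
    in_fd = os.open(src, os.O_RDONLY)
    try:
        out_fd = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            while chunk := os.read(in_fd, bufsize):   # b"" signals end of file
                view = memoryview(chunk)
                while view:                           # cope with partial writes
                    view = view[os.write(out_fd, view):]
                copied += len(chunk)
        finally:
            os.close(out_fd)
    finally:
        os.close(in_fd)
    return copied

def read_device(path, n):
    fd = os.open(path, os.O_RDONLY)      # a device opens like any other file
    try:
        return os.read(fd, n)
    finally:
        os.close(fd)

def write_device(path, data):
    fd = os.open(path, os.O_WRONLY)
    try:
        return os.write(fd, data)
    finally:
        os.close(fd)

zeros = read_device("/dev/zero", 8)               # an endless source of zero bytes
discarded = write_device("/dev/null", b"gone")    # a sink: reports success, keeps nothing

with tempfile.TemporaryDirectory() as d:
    src, dst, moved = (os.path.join(d, name) for name in ("a.bin", "b.bin", "c.bin"))
    with open(src, "wb") as f:
        f.write(b"x" * 100_000)
    copied = copy_file(src, dst)
    os.rename(dst, moved)                         # a "move" within one file system
    moved_ok = os.path.exists(moved) and not os.path.exists(dst)

print(copied, zeros == bytes(8), discarded, moved_ok)  # 100000 True 4 True
```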
Now, before we move on, there's one more powerful tool I want to introduce you to.
Okay, hit me.
inotify.
inotify. The name sounds familiar, but I can't quite place it. What is it used for?
So inotify is like a real-time surveillance system for your files and directories.
Okay.
Imagine you want to be notified whenever a file is modified or a new file is added to a directory. inotify can do that for you.
Wow, that sounds incredibly useful. I can imagine using that for all sorts of things like backing up files or keeping track of changes made by other users.
Exactly. It's very versatile.
Okay.
And it uses a file-descriptor-based interface, so you can integrate it seamlessly into your application.
Awesome.
You can add watches for specific files or directories and then receive events whenever those watches are triggered.
So it's like setting up tripwires around my files, alerting me to any changes.
That's a great analogy. And it can monitor a wide range of events, from file access and modification to creation, deletion, and renaming. Each event tells you what happened and, for a watched directory, the name of the file involved.
That's pretty powerful.
It's a powerful tool.
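Python's standard library has no inotify wrapper, so this sketch reaches the C library through ctypes, assuming Linux. The IN_* values are copied from <sys/inotify.h>. It watches a temporary directory, creates, modifies and deletes a file named hello.txt (an arbitrary name), then parses the raw struct inotify_event records (wd, mask, cookie, len, then the name) from a single read.

```python
import ctypes
import os
import struct
import tempfile

# Event bits, copied from <sys/inotify.h> on Linux.
IN_MODIFY = 0x00000002
IN_CREATE = 0x00000100
IN_DELETE = 0x00000200

libc = ctypes.CDLL(None, use_errno=True)
libc.inotify_init1.argtypes = [ctypes.c_int]
libc.inotify_add_watch.argtypes = [ctypes.c_int, ctypes.c_char_p, ctypes.c_uint32]

EVENT_HEADER = struct.Struct("iIII")     # int wd; uint32 mask, cookie, len

def watch_directory(d):
    """Watch d, touch a file inside it, and return (name, mask) for each event."""
    fd = libc.inotify_init1(0)           # the events arrive on this descriptor
    if fd < 0:
        raise OSError(ctypes.get_errno(), "inotify_init1 failed")
    try:
        wd = libc.inotify_add_watch(fd, os.fsencode(d), IN_CREATE | IN_MODIFY | IN_DELETE)
        if wd < 0:
            raise OSError(ctypes.get_errno(), "inotify_add_watch failed")
        path = os.path.join(d, "hello.txt")
        with open(path, "w") as f:       # IN_CREATE, then IN_MODIFY on the write
            f.write("hi")
        os.remove(path)                  # IN_DELETE
        buf = os.read(fd, 4096)          # every event queued so far
    finally:
        os.close(fd)
    events, offset = [], 0
    while offset < len(buf):
        _wd, mask, _cookie, name_len = EVENT_HEADER.unpack_from(buf, offset)
        offset += EVENT_HEADER.size
        name = buf[offset:offset + name_len].rstrip(b"\0").decode()   # NUL-padded
        offset += name_len
        events.append((name, mask))
    return events

with tempfile.TemporaryDirectory() as d:
    events = watch_directory(d)

print(events)
```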
So inotify sounds like a must-have tool for any serious system programmer.
I would agree with that.
This whole deep dive into file and directory management has been really eye opening.
It's a fascinating area.
I'm starting to realize just how much goes on behind the scenes to keep everything organized and running smoothly.
There's a lot happening under the hood.
There is so much complexity, but it's also really elegant.
It is, when you start to understand the principles behind it. Now, let's shift our focus to another crucial aspect of system programming: signals.
Signals. It sounds like we're about to enter the world of interprocess communication.
We are. We're going to explore how processes talk to each other.
Okay, So what exactly are signals and how do they work in Linux?
So, signals are one of the simplest ways processes talk to each other. Think of them like text messages: short, asynchronous notifications, sent either by another process or by the kernel itself.
So they're like little pings between processes letting them know what's going on.
That's a good way to put it.
Yeah.
Yeah. For example, let's say you're running a program in a terminal and you press Ctrl+C to stop it. What's happening is that a signal, SIGINT, is sent to that process, and by default SIGINT terminates it.
So that's what's going on behind the scenes when I hit Ctrl+C. I never realized it was a signal being sent.
It is, and the kernel is like the postal service delivering these signals to the right processes.
Okay, So what happens when a process receives a signal? Does it have to like constantly check for them like we check our phones for messages.
Not quite. Linux provides a mechanism for processes to register signal handlers, special functions that are executed when a specific signal arrives.
So it's like setting up a voicemail box for specific types of messages.
That's a great analogy. And just like with voicemail, a process can choose to handle a signal, ignore it, or let the default action happen.
And what's the default action? Hopefully it's not always terminating the process.
Well, it depends on the signal. Some signals are fatal by default, like the infamous SIGSEGV.
Oh yeah, I've heard of that one.
It indicates a segmentation fault, meaning your program tried to access memory it wasn't supposed to.
Ouch. That sounds painful.
It's not good. And in that case, terminating the process is probably the best course of action.
Right, stop it before it causes any more trouble.
Exactly, it prevents further damage. But for many signals, the default action is something less drastic, like stopping the process or simply ignoring the signal.
So there's a whole range of responses depending on the type of signal.
Exactly. It's a rich vocabulary.
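A sketch of the three choices (handle, ignore, default) using Python's signal module, which installs the underlying handler with the operating system; it must run in the main thread. One caveat: CPython runs the Python-level function a moment later, in the main thread, rather than inside the low-level handler itself. SIGUSR1 and SIGUSR2 are used because they're set aside for application use; SIGKILL and SIGSTOP could not be used at all, since they can be neither caught nor ignored.

```python
import os
import signal

received = []

def on_usr1(signum, frame):
    received.append(signum)          # keep handlers short: just note the arrival

def signal_dispositions():
    """Handle SIGUSR1, ignore SIGUSR2, then restore the previous dispositions."""
    old_usr1 = signal.signal(signal.SIGUSR1, on_usr1)         # 1) handle
    old_usr2 = signal.signal(signal.SIGUSR2, signal.SIG_IGN)  # 2) ignore
    try:
        os.kill(os.getpid(), signal.SIGUSR1)   # delivered: on_usr1 runs
        os.kill(os.getpid(), signal.SIGUSR2)   # discarded by the kernel
    finally:
        signal.signal(signal.SIGUSR1, old_usr1)   # 3) back to what it was,
        signal.signal(signal.SIGUSR2, old_usr2)   #    normally the default action
    return list(received)

got = signal_dispositions()
print(got == [signal.SIGUSR1])  # True
```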
This is pretty intricate. But what happens if a process is in the middle of something important and it doesn't want to be interrupted by signals?
Ah, good question. Linux has a solution for that too: signal masking.
Signal masking?
You can temporarily block specific signals from being delivered, essentially putting them on hold.
So it's like setting your phone to do-not-disturb mode.
Exactly, for when you need to focus.
Okay.
The sigprocmask system call is your tool for controlling which signals get through and which ones are blocked.
I like it. We have the power. So we can block signals, but how do we retrieve those pending signals once we're ready to deal with them?
There are a couple of options. One is to use the sigsuspend system call.
sigsuspend?
This allows you to temporarily unblock a set of signals while simultaneously suspending the process waiting for one of those signals to arrive.
So it's like saying, wake me up when one of these specific messages arrives.
Precisely. Another option is to use sigpending to peek at the set of pending signals without actually unblocking them. It's like checking your voicemail without listening to any messages.
Okay, so just seeing if anything's there. This whole signal-handling system is surprisingly sophisticated.
There's a lot of thought that goes into it.
I never realized how much thought goes into managing these interprocess messages.
Yeah, it's a crucial part of making sure that everything runs smoothly and that processes can cooperate effectively.
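A sketch of masking and collecting a pending signal, with two substitutions named up front: it uses pthread_sigmask, which in a single-threaded program has the same effect as sigprocmask, and since Python doesn't expose sigsuspend, it accepts the signal with sigwait instead, a different call that takes a blocked signal synchronously without running a handler. sigpending is used as described in the conversation.

```python
import signal
import threading

def block_then_collect(sig=signal.SIGUSR1):
    """Block sig, send it to ourselves, peek at it, then accept it synchronously."""
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {sig})   # "do not disturb"
    try:
        signal.pthread_kill(threading.get_ident(), sig)   # arrives, but is held back
        pending = sig in signal.sigpending()              # peek without consuming it
        accepted = signal.sigwait({sig})                  # take it off the queue now
        return pending, accepted
    finally:
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)   # restore the old mask

pending, accepted = block_then_collect()
print(pending, accepted == signal.SIGUSR1)  # True True
```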
Wow, this deep dive has been packed with information.
It has. We've covered a lot of ground.
From those fundamental concepts of files, processes, and signals to those advanced techniques we just touched upon.
It's been quite a journey.
It feels like we've uncovered a whole hidden layer of the operating system.
And that's the beauty of Linux system programming. There's always something new to discover, something deeper to explore. It's a journey that never really ends.
I'm definitely feeling that sense of wonder. This deep dive has not only deepened my understanding of Linux, but has also sparked a real desire to keep learning and experimenting.
That's the best outcome we could hope for, and to our listeners, we encourage you to continue your own explorations into the fascinating world of Linux system programming. There's a universe of knowledge waiting to be discovered, and who knows what amazing things you might create.
Well said. Until next time, happy coding, everyone!
