Welcome to the Deep Dive. Today we're really getting into the weeds with Windows system programming.
Yeah, this is part two, building on those fundamentals.
Exactly. We're thinking about our listener, the learner who's maybe bumped into terms like DLLs, memory management, security, maybe even performance analysis.
Right, and wants to get a clearer picture without drowning in super technical details.
Our goal is to give you, the learner, that clearer, more insightful understanding.
Think of it like getting the essential toolkit for these complex topics. We want those aha moments, not just a list of functions.
For sure. And we're drawing heavily from Windows 10 System Programming, Part 2. It's a great resource for digging into the advanced Windows API stuff.
It really tries to bridge that gap for programmers working with modern Windows, covering, well, everything from memory tricks to security layers.
So our mission today: extract the really crucial, the fascinating bits.
Exactly. Give you those key insights, that solid foundation. We'll hit advanced memory management, the whole world of DLLs,
the complexities of Windows security, and
the tools you need for debugging and diagnostics when things go sideways.
Okay, let's kick things off with memory management. We're going way beyond the basics here.
Definitely. Think raw control. First up is virtual memory allocation: VirtualAlloc and VirtualAllocEx.
What's the core difference there?
Well, VirtualAlloc is for your own process's memory. VirtualAllocEx, though, lets you allocate memory inside another process, assuming you have the permissions, of course.
Ah, okay. So that's key for things like debuggers, right? Reaching into another program's space.
Precisely. It's fundamental for those kinds of tools. But it also hints at more complex system interactions.
But that sounds potentially risky. How is that controlled?
Security is paramount. You need a handle to the target process with specific access rights, such as PROCESS_VM_OPERATION, before you can use VirtualAllocEx on it. We'll touch more on security later, but it's tightly controlled. That makes it a powerful tool, but not easily misused.
Okay, makes sense. Now, the book mentioned something specific about security with these functions: committed pages get zeroed out.
Yes, that's crucial. When Windows commits memory using VirtualAlloc or VirtualAllocEx, it wipes it clean, fills it with zeros.
Why do that extra step?
It prevents information leakage. You don't want one process accidentally seeing leftover data from whatever process used that physical memory before.
Think of it like wiping a whiteboard.
Got it. Unlike, say, malloc, which might just give you whatever was there.
Exactly. A clean slate every time.
Now, reserving versus committing memory. Why the two stages?
It's like planning versus building. Reserving is like staking out a claim in the virtual address space. You say, this range is mine, nobody else touch it.
Okay, prevents conflicts.
Right. Committing is when you actually tell the system, okay, I need physical storage, RAM or page file space, backing this part of my reserved space. You make it usable.
So you can reserve a big chunk, but only commit what you actually need right now.
Precisely. Reserve the whole plot, build houses one by one. Very efficient.
What about these MEM_RESET and MEM_RESET_UNDO flags? They sound a bit niche.
They are, yeah. MEM_RESET basically tells Windows, hey, I don't care about the contents of this committed memory anymore. Feel free to take the physical pages back if needed, but leave it allocated.
Sort of like marking it as discardable.
Kind of. And MEM_RESET_UNDO signals you might need it again, but there's no guarantee the old data is still there if the physical pages got reused. It's for specific optimization scenarios.
Okay. Now, that MicroExcel2 example in the book, using page-based allocation and exception handling. What's the takeaway there?
It's clever. VirtualAlloc works on pages, right, usually 4 KB chunks. So MicroExcel reserves a huge virtual address space for its grid.
But it doesn't actually commit most of it initially.
Exactly. Then, if you try to access a cell in an uncommitted part, it triggers an access violation exception.
Ah.
The application catches that specific exception, commits the required page using VirtualAlloc inside the exception handler, and then tells the CPU, okay, try that instruction again.
Wow. So it looks like a massive spreadsheet but only uses RAM for the bits you're actually using. Memory on demand.
Bingo. Super efficient for potentially huge data sets.
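[Editor's sketch] The reserve-then-commit-on-fault pattern described above can be modeled in a few lines of Python. This is a toy simulation, not the book's MicroExcel code and not real Windows API usage: SparseRegion, AccessViolation, and commit are hypothetical names, a dictionary stands in for committed pages, and a Python exception stands in for the hardware access violation.

```python
PAGE_SIZE = 4096  # typical small page size on x86/x64


class AccessViolation(Exception):
    """Stands in for the exception raised when touching an uncommitted page."""

    def __init__(self, address):
        super().__init__(f"access violation at {address:#x}")
        self.address = address


class SparseRegion:
    """Toy model of a reserved address range whose pages are committed on demand."""

    def __init__(self, reserved_bytes):
        self.reserved_bytes = reserved_bytes  # reserving costs no storage here
        self.pages = {}  # page index -> bytearray, committed pages only

    def commit(self, address):
        # Committed pages always start zero-filled.
        self.pages.setdefault(address // PAGE_SIZE, bytearray(PAGE_SIZE))

    def _raw_write(self, address, value):
        if not 0 <= address < self.reserved_bytes:
            raise IndexError("address outside the reserved range")
        page = self.pages.get(address // PAGE_SIZE)
        if page is None:
            raise AccessViolation(address)
        page[address % PAGE_SIZE] = value

    def write(self, address, value):
        try:
            self._raw_write(address, value)
        except AccessViolation as fault:
            self.commit(fault.address)       # the "exception handler" commits the page
            self._raw_write(address, value)  # then the access is retried


region = SparseRegion(reserved_bytes=1 << 30)  # "reserve" 1 GB
region.write(0, 1)
region.write(5, 2)                    # same page as address 0
region.write(10 * PAGE_SIZE + 7, 3)   # a different page
committed_bytes = len(region.pages) * PAGE_SIZE
```

Three writes into a 1 GB reservation leave only two pages committed, which is the whole point of the technique. In native code the handler would be a structured exception filter that calls VirtualAlloc with MEM_COMMIT and asks for the faulting instruction to be re-executed.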
Brilliant. Okay, let's shift gears slightly within memory management: working sets. What are those, exactly?
A process's working set is basically the set of its memory pages currently residing in physical RAM, stuff it can access without a page fault.
So the stuff it needs readily available for good performance.
Exactly. The memory manager tries to keep the actively used pages in the working set and juggles this across all running processes to keep the system responsive.
And if a page isn't used for a while, it might get kicked out of the working set. The book mentions soft page faults, though.
Right. If a page is removed from the working set but it's still somewhere else in RAM, accessing it causes a soft page fault. The system can quickly map it back in with minimal delay.
As opposed to a hard page fault.
Yeah, a hard fault happens when the page isn't in RAM at all; it's been paged out to disk. That requires reading it back from the disk, which is much, much slower.
Gotcha. Now, can you actually control the working set size? The book mentions SetProcessWorkingSetSize and EmptyWorkingSet.
You can influence it, yeah. SetProcessWorkingSetSize lets you suggest minimum and maximum sizes, useful if you really need certain things to stay in RAM.
And EmptyWorkingSet?
That's more aggressive. It has the same effect as calling SetProcessWorkingSetSize with both sizes set to -1: it tells the system to trim this process's working set way down. Maybe the process is going idle for a long time, so you free up RAM for others.
And tools like the working sets app let you see this stuff too.
Yeah, though you usually need admin rights to peek into other processes' working sets in detail.
Right, makes sense. Okay, next memory topic: heaps. Beyond the standard library malloc and free, what Windows specifics are there?
Well, Windows provides HeapWalk to look inside a heap of the current process, but just the current one.
What if you need to look at another process's heap, like for debugging?
Then you turn to the Tool Help library. Use functions like CreateToolhelp32Snapshot with the right flags, then Heap32ListFirst and Heap32ListNext to get the heap list for the target process, and Heap32First and Heap32Next to examine the blocks within each heap.
Okay, a bit more involved for external processes. The book also talks about heap coalescing and HeapCompact.
Right. Coalescing is when the heap manager automatically merges adjacent free blocks back together into bigger free blocks.
Helps fight fragmentation. Like tidying up the free space.
Exactly. Normally it's automatic, but you can disable it with a global flag, maybe for some micro-optimization. HeapCompact lets you force a coalescing operation if it's disabled, or just generally try to compact the heap.
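[Editor's sketch] Coalescing itself is a simple idea, and a short Python model shows it. This is only an illustration of merging adjacent free blocks: the (offset, size) representation and the coalesce function are assumptions made for the example, not how the Windows heap manager stores its free lists.

```python
def coalesce(free_blocks):
    """Merge free blocks that touch each other.

    free_blocks is a list of (offset, size) tuples in any order.
    Returns a new list sorted by offset with adjacent blocks merged.
    """
    merged = []
    for offset, size in sorted(free_blocks):
        if merged and merged[-1][0] + merged[-1][1] == offset:
            # This block starts exactly where the previous one ends: extend it.
            prev_offset, prev_size = merged[-1]
            merged[-1] = (prev_offset, prev_size + size)
        else:
            merged.append((offset, size))
    return merged


# Three 32-byte neighbours become one 96-byte block; the isolated block stays.
result = coalesce([(64, 32), (0, 32), (32, 32), (200, 16)])
```

After merging, a 96-byte request can be satisfied that none of the three original 32-byte blocks could serve, which is how coalescing fights fragmentation.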
Interesting trade-off there. Now, NUMA, non-uniform memory access. Sounds like it's mainly for big multiprocessor machines.
Absolutely. On systems with multiple CPU sockets and dedicated memory banks for each, access time can vary depending on whether a CPU is accessing its local memory or memory attached to another CPU. That's the non-uniform part.
So how does Windows help with that?
There are functions like GetNumaNodeProcessorMaskEx to figure out the system's NUMA topology, which processors belong to which node. And then, critically, functions like VirtualAllocExNuma or the newer VirtualAlloc2 let you specifically allocate memory on a preferred NUMA node.
So you can try and keep memory close to the CPU that's going to use it most.
Precisely. For high-performance computing or large database servers, NUMA-aware allocation can give a noticeable performance boost. It's about minimizing that cross-node latency.
Makes sense. Okay, last item in our memory deep dive: memory-mapped files, or MMFs. These sound quite powerful.
They really are. MapViewOfFile is the core function. You take a file mapping object, which might represent an actual file on disk or not, and map it directly into your process's virtual address space.
So accessing the file becomes like accessing RAM, essentially.
Yes. The OS handles the magic of loading data from the file into memory when needed and writing changes back. It's very efficient for file I/O.
And great for inter-process communication, right? IPC.
Absolutely. It's a classic high-performance IPC technique on a single machine. Multiple processes map the same file mapping object, and boom, they're sharing the same physical memory. No need to copy data back and forth through pipes or sockets.
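[Editor's sketch] Python's standard mmap module wraps the same operating system facility (on Windows it is built on file mapping objects and mapped views), so it gives a quick feel for the behavior. The sketch below maps one temporary file twice in a single process as a stand-in for two processes; the names view_a and view_b are made up for the example.

```python
import mmap
import os
import tempfile

SIZE = 4096

fd, path = tempfile.mkstemp()
try:
    os.write(fd, b"\0" * SIZE)  # a file must have a non-zero size to be mapped

    # Two independent views of the same file.
    view_a = mmap.mmap(fd, SIZE, access=mmap.ACCESS_WRITE)
    view_b = mmap.mmap(fd, SIZE, access=mmap.ACCESS_WRITE)

    view_a[0:5] = b"hello"          # a plain memory write, no write() call
    seen_by_b = bytes(view_b[0:5])  # the other view sees the same bytes

    view_a.flush()                  # ask the OS to write dirty pages back
    os.lseek(fd, 0, os.SEEK_SET)
    on_disk = os.read(fd, 5)        # ordinary file I/O sees the change too

    view_a.close()
    view_b.close()
finally:
    os.close(fd)
    os.remove(path)
```

Nothing is copied between the two views: both are windows onto the same pages, which is why the technique works so well for sharing data.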
Super efficient. The book mentions MapViewOfFileExNuma. I'm guessing that's the NUMA version?
You got it. It allows specifying a preferred NUMA node for the mapping, just like with VirtualAllocExNuma.
Now this is cool: anonymous MMFs. You can share memory without even having a file on disk. How does that work?
Right. Instead of being backed by a file, anonymous MMFs are backed by the system's page file. The key is you still get a handle to the mapping object. You can give the mapping a name, or get that handle into another process through inheritance or DuplicateHandle. A raw memory address is no use on its own, because each process maps the view at its own address.
How do you pass it?
Any standard IPC mechanism: window messages, named pipes, even just a shared variable in a DLL that both processes load. The second process then uses its handle, or opens the mapping by name, to map the same anonymous shared memory region into its own space.
That sounds incredibly simple for sharing data. Just pass a handle.
Essentially. It's a very elegant approach for certain kinds of IPC.
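[Editor's sketch] A portable way to try named, file-less shared memory is Python's multiprocessing.shared_memory module, which on Windows is implemented on top of a named, page-file-backed mapping. The sketch attaches a second time from the same process to keep it short; in real use the second attach would happen in another process that was told the name. The variable names owner and peer are hypothetical.

```python
from multiprocessing import shared_memory

# Create a 64-byte shared block; the system picks a unique name for it.
owner = shared_memory.SharedMemory(create=True, size=64)
try:
    owner.buf[:5] = b"hello"

    # A second party needs only the name to attach to the same memory.
    peer = shared_memory.SharedMemory(name=owner.name)
    try:
        seen_by_peer = bytes(peer.buf[:5])
    finally:
        peer.close()
finally:
    owner.close()
    owner.unlink()  # release the block once nobody needs it
```

What gets passed between the two parties is the name, not a pointer: each side maps the block wherever its own address space has room.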
And one last thing on MMFs: large pages. What's the benefit there?
Standard memory pages are small, typically 4 KB. Large pages are much bigger, maybe 2 MB or even 1 GB on some systems. Using large pages for memory mapping can reduce TLB pressure, that's the translation lookaside buffer cache, and potentially improve performance for applications accessing large amounts of mapped data.
But there are requirements.
Yes, you need the SeLockMemoryPrivilege, which is not granted by default, and the file mapping object itself must have been created with the SEC_LARGE_PAGES flag. So it's an optimization for specific scenarios.
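[Editor's sketch] The TLB argument is easy to check with arithmetic. The amount of memory a TLB can cover without a miss is roughly the number of entries times the page size. The entry count of 64 below is a made-up figure for illustration, not a claim about any particular CPU.

```python
KB = 1024
MB = 1024 * KB


def tlb_reach(entries, page_size):
    """Bytes of address space covered when every TLB entry maps one page."""
    return entries * page_size


ENTRIES = 64  # hypothetical TLB size, for illustration only

reach_small = tlb_reach(ENTRIES, 4 * KB)  # with 4 KB pages
reach_large = tlb_reach(ENTRIES, 2 * MB)  # with 2 MB large pages
ratio = reach_large // reach_small
```

With the same number of entries, 2 MB pages cover 512 times more memory than 4 KB pages, so a program sweeping through a large mapping takes far fewer TLB misses.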
Wow, okay, that covers a lot of ground on memory. Let's switch gears now to dynamic-link libraries, DLLs. Everyone knows they're about reusable code, but how does Windows actually handle them?
Right. DLLs are fundamental in Windows. The book gets into how they're linked, implicitly versus explicitly. How they get loaded is key.
Okay, implicit linking first. That's the common way, where the OS just loads the DLLs your program needs when it starts.
Pretty much. When your EXE loads, the Windows loader looks at its import table to see which DLLs and functions it needs. Then it has to find those DLLs.
Is there a specific search order? Where does it look?
There is. It's a defined path. First, it checks if the DLL is in the KnownDLLs list. These are core system DLLs, always loaded from the system directory, for security and performance. It helps prevent DLL hijacking.
Okay, KnownDLLs first. Then?
Then the directory the EXE itself is in. After that, the process's current working directory, then the system directory, System32, the Windows directory, and finally all the directories listed in the PATH environment variable. One caveat: with safe DLL search mode on, which is the default, the current directory is searched later, after the Windows directory.
That order seems important for deployment. What if it can't find a needed DLL?
Usually you get that dreaded error message box saying the DLL is missing, and the application just won't start. The process terminates. Figure 15-7 in the book shows a typical example.
Not very helpful sometimes. You mentioned KnownDLLs helping prevent hijacking.
Yeah. By forcing the load from the system directory for the DLLs listed in the registry, under the KnownDLLs key in HKLM, it stops malware potentially placing a malicious version of a core DLL in, say, the application directory to get loaded instead.
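[Editor's sketch] The search logic can be modeled as a lookup over an ordered list of directories, with KnownDLLs short-circuiting the search. This is a simplified Python model under stated assumptions: search_order and resolve_dll are hypothetical helpers, the directory names are invented, and real loader behavior has more inputs (the 16-bit system directory, manifests, redirection, and the SetDefaultDllDirectories settings are all left out).

```python
def search_order(app_dir, system_dir, windows_dir, current_dir, path_dirs,
                 safe_search_mode=True):
    """Directories in the order they would be tried (simplified)."""
    if safe_search_mode:  # the default: current directory is demoted
        return [app_dir, system_dir, windows_dir, current_dir, *path_dirs]
    return [app_dir, current_dir, system_dir, windows_dir, *path_dirs]


def resolve_dll(name, known_dlls, system_dir, order, files_on_disk):
    """Return the directory a DLL would load from, or None if not found."""
    name = name.lower()
    if name in known_dlls:
        return system_dir  # KnownDLLs never consult the search path
    for directory in order:
        if (directory.lower(), name) in files_on_disk:
            return directory
    return None


SYSTEM = r"C:\Windows\System32"
APP = r"C:\Apps\Demo"
order = search_order(APP, SYSTEM, r"C:\Windows", r"C:\Users\me", [r"C:\Tools"])
files_on_disk = {
    (APP.lower(), "kernel32.dll"),     # a planted copy next to the EXE
    (SYSTEM.lower(), "kernel32.dll"),
    (APP.lower(), "plugin.dll"),
}

protected = resolve_dll("KERNEL32.DLL", {"kernel32.dll"}, SYSTEM, order, files_on_disk)
hijacked = resolve_dll("kernel32.dll", set(), SYSTEM, order, files_on_disk)
plugin = resolve_dll("plugin.dll", {"kernel32.dll"}, SYSTEM, order, files_on_disk)
missing = resolve_dll("nothere.dll", {"kernel32.dll"}, SYSTEM, order, files_on_disk)
```

Comparing protected with hijacked shows what the KnownDLLs list buys: without it, the planted copy in the application directory would win because that directory is searched first.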
Smart. Okay, so that's implicit. What about explicit linking?
Explicit linking is where you, the programmer, take control. You use LoadLibrary to load a DLL specifically when you need it, and FreeLibrary to unload it later.
When would you do that?
It's perfect for things like plugins. You load the plugin DLL only when the user activates that feature. Or maybe a feature depends on a DLL that might not be installed: you can try LoadLibrary, and if it fails, you can handle it gracefully, maybe disable the feature instead of crashing.
Gives you more control and resilience. The book also mentions SetDefaultDllDirectories and AddDllDirectory. Customizing the search path?
Exactly. These let you modify how the loader searches for DLLs, especially for explicit loads via LoadLibrary. SetDefaultDllDirectories can restrict the search, for instance, only to the system directory or directories added with AddDllDirectory.
And AddDllDirectory adds a specific path to the search list?
Right. There are various flags you can use, like LOAD_LIBRARY_SEARCH_APPLICATION_DIR, LOAD_LIBRARY_SEARCH_USER_DIRS, and LOAD_LIBRARY_SEARCH_SYSTEM32. Table 15-1 lays them all out. It gives you finer control over dependencies, which can be important for security and avoiding conflicts.
Useful for complex setups. Now, DllMain. That's an entry point for a DLL?
Correct. It's a function the loader calls inside the DLL for various events: when the DLL is first loaded into a process, DLL_PROCESS_ATTACH; when it's unloaded, DLL_PROCESS_DETACH; and also potentially when threads are created, DLL_THREAD_ATTACH, or exit, DLL_THREAD_DETACH, within that process.
Why the thread notifications?
Some DLLs might need to do per-thread initialization or cleanup. But often they don't, so you can call DisableThreadLibraryCalls as an optimization to tell the loader, hey, don't bother calling my DllMain for thread events. Saves a bit of overhead.
Okay, but the book has a big warning about DllMain and something called the loader lock. Sounds serious.
It is very serious. The loader lock is an internal lock the system uses while loading or unloading DLLs, including when it calls DllMain. The danger is deadlock, if your DllMain code tries to do something that also requires the loader lock, like calling LoadLibrary itself or GetModuleHandle, or maybe waiting on another synchronization object that some other thread holds while it's waiting for the loader lock.
Everything grinds to a halt.
Exactly. Deadlock. Your process hangs. The strong advice is keep DllMain simple: do minimal initialization, and don't call functions that might load other DLLs or wait on complex synchronization. It's a notorious trap.
Definitely something to watch out for. Okay, let's talk about DLL injection and hooking. This sounds like it could be used for interesting purposes.
Interesting is one word for it. Yes, it can be used for legitimate tools like debuggers, monitoring utilities, accessibility aids, but also for malware. DLL injection is basically forcing another process to load a DLL it wasn't intending to load.
How is that typically done?
One common method uses CreateRemoteThread. You first need a handle to the target process with the right access, like PROCESS_VM_WRITE and PROCESS_CREATE_THREAD. Then you use VirtualAllocEx to allocate some memory in the target process.
And write the path to your DLL into that memory?
Right, using WriteProcessMemory. Then you call CreateRemoteThread, telling it to start a new thread in the target process, executing the LoadLibrary function and passing it the address of the DLL path you just wrote.
So you trick the other process into loading your code.
Essentially, yes.
The book also mentions SetWindowsHookEx. How does that fit in? Can it cause injection?
It can, yes. SetWindowsHookEx lets you install hooks to intercept system events: keyboard input, mouse messages, window messages, et cetera. If you install a global hook, or one that targets threads in another process...
The hook procedure itself needs to run in that other process's context.
Exactly. And for that to happen, the hook procedure must be located inside a DLL. When you install such a hook, Windows automatically injects that DLL into all the relevant target processes, so your hook code can run there.
Ah, so the hooking mechanism itself performs the injection.
Correct. The example in the book uses a WH_GETMESSAGE hook to snoop on keyboard input in Notepad, and the hook code has to be in an injected DLL.
Okay, then there's IAT hooking, import address table hooking. Sounds very low level.
It is, super low level. Every module, EXE or DLL, has an import address table that lists the functions it imports from other DLLs, along with the memory addresses where those functions actually live. IAT hooking involves finding the entry for a specific function in that table...
And overwriting the address.
Yes. You overwrite the address with the address of your own function, the hook function, so whenever the original code tries to call the imported function, it gets redirected to your code instead.
Like intercepting the call. The book uses GetSysColor as an example.
Right, replacing calls to the real GetSysColor with calls to a custom GetSysColorHooked. You have to parse the module's PE header structure to find the import descriptors, locate the target DLL, like user32.dll, find the function pointer, and then patch it.
But the book flags this as tricky.
Very tricky. First, the exact way you patch the address depends heavily on the CPU architecture: x86, x64, and ARM are all different. Second, you must do the patch atomically. If another thread tries to call the function while you're halfway through changing the address, crash.
Yikes.
Yeah. It requires careful synchronization and knowledge of assembly or low-level intrinsics. It's powerful, but complex and often fragile. Higher-level hooking mechanisms are usually preferred if they can do the job.
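[Editor's sketch] Stripped of the PE parsing, IAT hooking is "swap one entry in a table of function pointers and keep the old value". The Python model below shows only that idea. A dictionary stands in for the import address table, and get_sys_color, hook_import, and call_import are hypothetical stand-ins, not the book's code or any Windows API.

```python
def get_sys_color(index):
    """Stand-in for the real imported function; always reports white."""
    return 0x00FFFFFF


# Stand-in for a module's import address table: DLL name -> function -> target.
import_table = {"user32.dll": {"GetSysColor": get_sys_color}}


def call_import(dll, function, *args):
    """How the module's own code reaches an import: always through the table."""
    return import_table[dll][function](*args)


def hook_import(dll, function, replacement):
    """Overwrite one table entry and hand back the original target."""
    original = import_table[dll][function]
    import_table[dll][function] = replacement
    return original


before = call_import("user32.dll", "GetSysColor", 5)


def get_sys_color_hooked(index):
    # Call through to the saved original, then alter its result.
    return original_get_sys_color(index) ^ 0x00FFFFFF


original_get_sys_color = hook_import("user32.dll", "GetSysColor", get_sys_color_hooked)
after = call_import("user32.dll", "GetSysColor", 5)
```

The calling code never changes; only the table entry does. The hard parts the speakers mention (locating the entry in the PE structures, making the page writable, and writing the pointer atomically) have no equivalent in this model.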
Good to know. Okay, that wraps up DLLs. Let's move into our third major area, Windows security.
This is huge, absolutely massive, and crucial. The book breaks it down into key components, shown nicely in Figure 16-1: SIDs, tokens, access masks, privileges, security descriptors, UAC, integrity levels.
Let's start with SIDs, security identifiers. These are the unique IDs for users and groups.
And computers, services, basically any security principal. Every user account, every group gets a unique SID. The book shows using CreateWellKnownSid to get the SIDs for standard built-in accounts and groups, like Local System or Everyone.
And you can convert them to strings.
Yep. ConvertSidToStringSid gives you that familiar S-1-5-... format, and LookupAccountSid goes the other way: it finds the account name, like Administrators, for a given SID. It also tells you the SID type: user, group, et cetera.
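[Editor's sketch] The string form is a direct rendering of the binary SID layout: a revision byte, a sub-authority count, a 6-byte big-endian identifier authority, then the sub-authorities as 32-bit little-endian values. The Python function below is a hypothetical helper written for this illustration (on Windows you would call ConvertSidToStringSid); it ignores the rare case of very large authority values, which are displayed in hexadecimal.

```python
import struct


def sid_to_string(raw):
    """Render a binary SID as S-<revision>-<authority>-<sub-authorities...>."""
    revision = raw[0]
    sub_authority_count = raw[1]
    authority = int.from_bytes(raw[2:8], "big")
    sub_authorities = struct.unpack_from(f"<{sub_authority_count}I", raw, 8)
    return "-".join(["S", str(revision), str(authority), *map(str, sub_authorities)])


# Binary forms of three well-known SIDs.
local_system = sid_to_string(bytes.fromhex("010100000000000512000000"))
everyone = sid_to_string(bytes.fromhex("010100000000000100000000"))
administrators = sid_to_string(bytes.fromhex("01020000000000052000000020020000"))
```

The results are S-1-5-18 for Local System, S-1-1-0 for Everyone, and S-1-5-32-544 for the built-in Administrators group.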
Okay, next up: tokens. What's their role?
An access token is like a security passport for a process or thread. When you log in, the system creates a token holding your user SID, all the group SIDs you belong to, and a list of your privileges.
Privileges like being able to debug other processes.
Exactly, or shut down the system, or change the system time. These are distinct from general user rights. Most privileges are actually disabled by default in your token, even if you're an admin. You have to explicitly enable them if you need them. SeChangeNotifyPrivilege is one exception that's usually enabled. Process Explorer shows this clearly.
How do you check or change privileges in a token?
Programmatically, you use GetTokenInformation to query the token's contents, including privileges, and AdjustTokenPrivileges to enable, disable, or remove them. But you need the TOKEN_ADJUST_PRIVILEGES right on the token handle first.
And privileges have those weird names, like SeDebugPrivilege.
They do. You use LookupPrivilegeValue to get the internal identifier, the LUID, for a privilege name before you can use AdjustTokenPrivileges.
The book also talks about primary versus impersonation tokens.
Right. Your main process token is the primary token, but a thread, particularly in a server process, might temporarily impersonate a client. Using an impersonation token means the thread acts with the client's security context: their SIDs, their privileges.
Why would it do that?
So the server can access resources on behalf of the client using the client's permissions. If the client isn't allowed to access a file, the server thread impersonating them won't be either. It requires the SeImpersonatePrivilege. You use functions like ImpersonateLoggedOnUser, and then RevertToSelf to stop impersonating. Figure 16-10 illustrates this client-server scenario.
Very important for secure server design. Okay, access masks. These are the permissions themselves?
Yes, an access mask is just a bit mask, a 32-bit value, where each bit represents a specific permission, like read, write, delete, execute, etc., relevant to a particular type of object.
Are they always specific? I've seen GENERIC_READ.
Good point. There are generic rights: GENERIC_READ, GENERIC_WRITE, GENERIC_EXECUTE, GENERIC_ALL. The system maps these generic rights to a set of specific rights depending on the object type. GENERIC_READ on a file means something different than GENERIC_READ on a registry key. Figure 16-16 shows an example mapping.
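[Editor's sketch] The generic-to-specific translation is a small bit manipulation. The Python below mirrors the idea behind the Windows MapGenericMask function using the documented values of the file and registry key access constants; map_generic_mask and the two dictionaries are names invented for this example.

```python
GENERIC_READ = 0x80000000
GENERIC_WRITE = 0x40000000
GENERIC_EXECUTE = 0x20000000
GENERIC_ALL = 0x10000000

# What each generic right expands to for a given object type.
FILE_MAPPING = {
    GENERIC_READ: 0x00120089,     # FILE_GENERIC_READ
    GENERIC_WRITE: 0x00120116,    # FILE_GENERIC_WRITE
    GENERIC_EXECUTE: 0x001200A0,  # FILE_GENERIC_EXECUTE
    GENERIC_ALL: 0x001F01FF,      # FILE_ALL_ACCESS
}
KEY_MAPPING = {
    GENERIC_READ: 0x00020019,     # KEY_READ
    GENERIC_WRITE: 0x00020006,    # KEY_WRITE
    GENERIC_EXECUTE: 0x00020019,  # KEY_EXECUTE
    GENERIC_ALL: 0x000F003F,      # KEY_ALL_ACCESS
}


def map_generic_mask(mask, mapping):
    """Replace each generic bit in mask with the object's specific rights."""
    for generic_bit, specific_rights in mapping.items():
        if mask & generic_bit:
            mask = (mask & ~generic_bit) | specific_rights
    return mask


read_file = map_generic_mask(GENERIC_READ, FILE_MAPPING)
read_key = map_generic_mask(GENERIC_READ, KEY_MAPPING)
read_write_key = map_generic_mask(GENERIC_READ | GENERIC_WRITE, KEY_MAPPING)
```

The same GENERIC_READ request becomes 0x120089 for a file and 0x20019 for a registry key, which is the "means something different" point made above.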
And these rights are controlled by security descriptors.
Correct. Every securable object, file, registry key, process, thread, mutex, et cetera, has a security descriptor attached. This structure holds all the security info: the owner SID; the primary group SID, less used now; the SACL, the System Access Control List, for auditing; and crucially the DACL, the Discretionary Access Control List.
The DACL is the important one for permissions.
It is. The DACL contains a list of ACEs, access control entries. Each ACE specifies a SID, a user or group, and an access mask, stating whether that SID is allowed or denied those specific rights.
Order matters in the DACL?
Critically. The system evaluates ACEs in order. Typically, deny ACEs come first. The first ACE that matches the user or group and the requested permission determines the outcome, so an explicit deny will override a later allow for the same group.
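[Editor's sketch] The order dependence is easiest to see in a small model of the DACL walk. This Python function is a simplification written for illustration: ACEs are (type, SID, mask) tuples, the token is just a set of SID strings, the user SID is invented, and things like privileges, owner rights, inheritance, and integrity checks are ignored.

```python
READ = 0x1
WRITE = 0x2


def access_check(token_sids, dacl, desired):
    """Walk the ACEs in order and decide whether 'desired' rights are granted."""
    if dacl is None:
        return True   # a NULL DACL places no restrictions
    remaining = desired
    for ace_type, sid, mask in dacl:
        if sid not in token_sids:
            continue  # this ACE is about someone else
        if ace_type == "deny" and mask & remaining:
            return False          # a still-needed right is explicitly denied
        if ace_type == "allow":
            remaining &= ~mask    # these rights are now granted
            if remaining == 0:
                return True
    return False      # ran out of ACEs before everything was granted


USER = "S-1-5-21-1111-2222-3333-1001"  # hypothetical user SID
USERS_GROUP = "S-1-5-32-545"           # built-in Users group
token = {USER, USERS_GROUP}

deny_first = [("deny", USERS_GROUP, WRITE), ("allow", USER, READ | WRITE)]
allow_first = [("allow", USER, READ | WRITE), ("deny", USERS_GROUP, WRITE)]

write_with_deny_first = access_check(token, deny_first, WRITE)
write_with_allow_first = access_check(token, allow_first, WRITE)
read_with_deny_first = access_check(token, deny_first, READ)
```

The same two ACEs give opposite answers for a write request depending on their order, which is why tools that build DACLs place deny entries ahead of allow entries.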
How do you work with these security descriptors in code?
For named objects like files or registry keys, you use GetNamedSecurityInfo and SetNamedSecurityInfo. The book shows getting the owner of a file. Remember to free the returned security descriptor with LocalFree, and not to free the owner SID yourself, since it points inside that descriptor. For kernel objects like mutexes, it's GetKernelObjectSecurity and SetKernelObjectSecurity.
What are the default permissions?
Usually, unnamed kernel objects get a NULL DACL, meaning everyone has full access. Named objects usually get a default DACL based on the creator's token. And if you want the standard Windows security UI, there's the EditSecurity API, but it requires implementing a COM interface, ISecurityInformation, which is more involved.
Right. Okay, switching to UAC, User Account Control. Still relevant today?
Very much so. Its goal was to get people out of the habit of running as admin all the time. Even if you are an administrator, UAC means most things run with a standard user token by default.
Which has fewer privileges than the full administrator token.
Exactly. To use the full admin token, a process needs to be elevated.
And the standard way to trigger elevation is?
Using ShellExecute or ShellExecuteEx with the runas verb. That tells the system this needs admin rights. The AppInfo service then usually prompts the user with that consent dialog, consent.exe. If approved, the process starts with the full admin token. Figure 16-20 shows this flow.
What about UAC virtualization? That sounds like a compatibility thing.
It is. It's mainly for older 32-bit apps that weren't written with UAC in mind and try to write to protected locations like Program Files or HKLM in the registry.
What does virtualization do?
It transparently redirects those writes to per-user locations under Users\<username>\AppData\Local\VirtualStore. The app thinks it wrote to the protected location, but it actually wrote to the user's private virtualized copy. Keeps the app working without needing admin rights.
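[Editor's sketch] The file redirection amounts to re-rooting a path under the user's VirtualStore folder. The Python below models only that path rewrite. The function name, the user name alice, and the list of protected roots are assumptions made for the example; the real feature also covers parts of the registry and applies only to processes that have virtualization enabled.

```python
import ntpath  # Windows path rules, usable on any platform


def virtual_store_path(path, local_app_data, protected_roots):
    """Return where a write to 'path' would land under file virtualization."""
    normalized = ntpath.normpath(path)
    for root in protected_roots:
        prefix = ntpath.normpath(root).lower() + "\\"
        if normalized.lower().startswith(prefix):
            _drive, tail = ntpath.splitdrive(normalized)
            return ntpath.join(local_app_data, "VirtualStore", tail.lstrip("\\"))
    return normalized  # not a protected location: no redirection


LOCAL_APP_DATA = r"C:\Users\alice\AppData\Local"  # hypothetical user
PROTECTED = [r"C:\Program Files", r"C:\Windows"]  # simplified list

redirected = virtual_store_path(
    r"C:\Program Files\OldApp\settings.ini", LOCAL_APP_DATA, PROTECTED)
untouched = virtual_store_path(
    r"C:\Users\alice\notes.txt", LOCAL_APP_DATA, PROTECTED)
```

The legacy app still asks for the file under Program Files; the redirection layer quietly serves it the per-user copy instead.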
Usually. Can you see if it's active?
Yeah, Task Manager and Process Explorer can show the UAC virtualization status for a process: enabled, disabled, or not allowed. Figures 16-8 and 16-9 illustrate this.
Clever workaround. Okay, then integrity levels. Another layer on top of standard permissions?
Kind of. It's mandatory integrity control. Processes and objects have an integrity level, typically low, medium, high, or system. Most standard apps run at medium, elevated apps run at high, system services run at system, and sandboxed things might run at low.
Viewable in Process Explorer too?
Yep, Figure 16-22 shows it. The key rule is the no-write-up policy. By default, a lower-integrity process generally cannot modify a higher-integrity object.
So a medium-integrity web browser can't easily write to files owned by a high-integrity process.
Correct, or even modify the high-integrity process itself in memory; access is restricted. By default, objects get medium integrity unless they have a specific mandatory label ACE setting it otherwise. You can lower a process's integrity, but raising it needs a special privilege, SeRelabelPrivilege, that users don't normally have.
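[Editor's sketch] The no-write-up rule is a comparison between two levels that happens before the DACL is even consulted. The Python model below uses the numeric values of the standard integrity level RIDs; may_write is a hypothetical helper, and real mandatory labels can carry other policies (no-read-up, no-execute-up) that are left out here.

```python
# Integrity level RIDs (the last component of the S-1-16-... label SIDs).
LEVELS = {
    "low": 0x1000,
    "medium": 0x2000,
    "high": 0x3000,
    "system": 0x4000,
}


def may_write(process_level, object_level=None):
    """No-write-up: a process may write only to objects at or below its level.

    An object without an explicit mandatory label is treated as medium.
    """
    if object_level is None:
        object_level = "medium"
    return LEVELS[process_level] >= LEVELS[object_level]


browser_to_elevated = may_write("medium", "high")  # write up: blocked
sandbox_to_unlabeled = may_write("low")            # unlabeled means medium: blocked
elevated_to_normal = may_write("high", "medium")   # write down: allowed
```

Passing this check does not grant access by itself; the normal DACL evaluation still has to succeed afterwards.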
Interesting. The book also mentioned specialized things like Control Flow Guard, CFG.
Yeah. CFG is a compiler, linker, and OS feature to mitigate exploits that try to hijack the program's control flow. It validates targets of indirect calls against a bitmap of known valid function entry points. If the target isn't valid, the process terminates. Dumpbin can show if a binary uses it.
And process mitigation policies?
These are extra security hardening options you can enable system-wide or per process using tools like GFlagsX or specific APIs: things like blocking non-system fonts, preventing child process creation, or restricting system calls. Figure 16-29 shows using GFlagsX to disable win32k calls for Notepad, causing it to fail.
Wow, lots of layers. Okay, final technical section: debugging and diagnostics. How do we figure out what's really going on?
Crucial tools here. One simple one is OutputDebugString. Your application sends text messages, and a debugger can pick them up.
What if no debugger is attached?
Tools like Sysinternals DebugView or the book's own viewer can monitor these messages system-wide. They use a shared memory buffer and an event that is signaled when new messages arrive.
Okay, then performance counters getting metrics on CPU, memory, etc.
Right. The PDH, Performance Data Helper, API is the way to go. Use PdhOpenQuery, then PdhAddCounter to specify which counters you want, like % Processor Time for a specific process.
Then collect the data?
Yep. PdhCollectQueryData samples the values, and PdhGetFormattedCounterValue gives you the result as a nicely formatted number. The book shows getting CPU usage for all processes using PdhGetFormattedCounterArray. Sometimes you might want PdhGetRawCounterValue for raw data before calculations.
Useful for performance tuning. What about process snapshotting, PSS?
PSS is cool. It lets you capture a consistent point-in-time snapshot of a process's state, memory, threads, handles, et cetera, into a file or memory buffer.
Why would you do that?
Great for offline analysis of crashes or hangs. You grab a snapshot when the problem occurs, then analyze it later without holding up the live system. PssCaptureSnapshot takes the snapshot, PssFreeSnapshot cleans up. Table 20-1 lists the flags for what data to include.
Okay. Now ETW, Event Tracing for Windows. Sounds very powerful.
It's the high-performance tracing framework in Windows. The OS and many applications emit detailed events via ETW providers, identified by GUIDs. The command logman query providers shows you who's registered.
How do you capture these events?
You set up an ETW session using StartTrace. Then you tell the session which providers to listen to using EnableTraceEx, often filtering by severity level or keywords. The events usually go to an ETL file.
And then you analyze the ETL file.
Right, using tools like Event Viewer for simple logs, or Windows Performance Analyzer, WPA, for really deep analysis. WPA is incredibly powerful for correlating events across the system. Figures 20-13 and 20-14 show these tools.
Can you process events in real time?
Yes. You use OpenTrace with PROCESS_TRACE_MODE_REAL_TIME and provide a callback function. Your callback gets invoked for each event as it happens.
How do you make sense of the event data?
The callback receives an EVENT_RECORD. You use the Trace Data Helper, TDH, APIs like TdhGetEventInformation and TdhFormatProperty to parse the event's metadata and format its specific data fields based on the provider's manifest. The book shows parsing and displaying event details.
It even mentions the kernel provider.
Yeah, the Windows kernel trace provider emits tons of low-level OS events about processes, threads, disk I/O, networking, et cetera. Requires admin rights, obviously. You can even create your own custom ETW providers for your applications.
Extremely versatile. Lastly, actual debuggers, the tools that let you step through code.
Right. Windows provides APIs for one process to debug another. You can attach to a running process using DebugActiveProcess. You just need its PID.
Or start a process under the debugger?
Yep. Use CreateProcess with the DEBUG_PROCESS or DEBUG_ONLY_THIS_PROCESS flag. The debugger then gets notified of events in the debuggee via the DEBUG_EVENT structure.
What kind of events?
Exceptions, thread creation and exit, process exit, DLL loads and unloads, output from OutputDebugString. Table 20-5 lists them. The debugger calls WaitForDebugEvent to get the next event, processes it, and then calls ContinueDebugEvent to let the debuggee run again.
The book has a simple debugger example.
Yeah, it shows the basic loop: wait for an event, check the event type, like output string or DLL load, print info, continue. It's the fundamental structure of a debugger.
Okay, we have covered an incredible amount of technical ground there.
We really have, from the depths of memory management and DLLs, through the layers of security, right up to debugging and tracing with ETW. The full book also dives into COM and WinRT, which are foundational too.
And hopefully, for you, the learner, this deep dive has connected some dots, provided those aha moments, and maybe demystified some of these complex Windows internals.
Yeah, hopefully it gives you that solid footing. So maybe next time you're using your computer and something interesting happens, or an application behaves in a certain way,
you can think about these mechanisms underneath. How is memory being used? How are these components talking? What security checks are happening?
Exactly. What new questions does this raise for you about the software you interact with every day? Something to chew on.
A great thought to end on. Thanks for joining us on the Deep Dive.
