Here we go. Take 64. Picked a round number too. Yeah. Power of two. Yeah. Somebody was going to congratulate me on doing a thousand episodes of my other podcast. And I was like, talk to me at 1,024, because a thousand is nothing. There's nothing meaningful about that. Doesn't count. Doesn't count. It's not even a real number.
Hey friends, it's Mark and Scott, or Scott and Mark, depending on who you like better, Learn To. You can check us out at scottandmarklearn2.com. Mark, you are well known for Sysinternals, where "internals" is often thought to speak to all the internal APIs that you were calling to get the information that you need out of Windows. In the early days, you were almost a Windows antagonist because you would do stuff that the Windows team hadn't done. You'd come in and teach the Windows team stuff.
Have you built your career on undocumented zombie and shadow APIs? I think I have. Yeah? Yeah. When did you find your first API that had not been documented, and how did you find it? Well, let's see. Oh, probably 1985. I actually started by reverse engineering the Apple II, where BASIC was in the ROM. So I started reverse engineering that. There was actually a fantastic book, which was the Apple II disassembly. It was a book just of the disassembly of the Apple II ROM, which I studied.
And so one of the first things I did with that information was, Apple II BASIC didn't have a GOSUB command. It only had GOTO. And, you know, with GOSUB, you could provide a list of targets based on a variable, like GOSUB A. And if A was 100, it would go to line 100. So I decided to extend Applesoft by adding that command,
and did that based on the reverse engineering. And I published an article on that in COMPUTE! magazine, which you can still find online. You can find those, actually. I found those as well at archive.org, at the Wayback Machine, and they're scans of the young and sprightly Mark Russinovich in a physical magazine, in COMPUTE! magazine. So that was the first time. But I got into Windows reverse engineering first for practical reasons. I was trying to...
take this idea I had done with my PhD thesis, of checkpointing and restoring operating systems, and tried to apply that to Windows 3.1 and then Windows 95, which took me deep into the guts of both of those to understand how to capture state, how to record keystrokes and mouse input and play them back, and how to capture the state of the system so I could bring it back. So I learned
not just undocumented APIs, but just the internals of the system. You did your PhD at Carnegie Mellon, is that right? And it was on memory management. So the title of the thesis was Application Transparent Fault Management, which is fancy for saying, how can we add fault tolerance and fault resilience to an operating system without modifying the operating system? So including things like checkpointing, restoring, and injecting faults, catching faults, adding voting.
And doing that without patching the kernel itself? Right. Using extension mechanisms. Okay, by extension mechanisms. So if the operating system had an appropriate or some kind of extension, can every operating system be extended in that way or just certain ones that had a problem? I mean, I had...
source code access to the Mach operating system at CMU, so I could extend it. And then on Windows, you're admin on the box. You can kind of patch it and extend it how you want to. So one of the first things that I did with patching was write a tool called Regmon for Windows 3, Windows 95, and then Windows NT. And I did that by patching internal structures, which would insert...
callouts to me when a system call, a registry-related system call, would get invoked. In the process of doing that, I had to reverse engineer the system call interface for the registry, because what is documented is the Win32 interfaces. Win32 is basically a facade, one of the personalities that Windows NT was designed to have, which is compatible with Windows 95 and 3.1. But underneath the hood, there was something called the native API.
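As an aside, the Regmon-style patching described above (replacing an entry in an internal structure so a callout runs whenever a registry call is made) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `DISPATCH_TABLE`, `open_key`, `query_value`, `install_hook`, and `remove_hook` are hypothetical names, and nothing here touches a real operating system.

```python
def open_key(path):
    """Hypothetical registry-style call: returns a pretend handle."""
    return len(path)

def query_value(handle, name):
    """Hypothetical registry-style call: returns a pretend value."""
    return f"value of {name} for handle {handle}"

# Hypothetical dispatch table standing in for an internal system call table.
DISPATCH_TABLE = {"open_key": open_key, "query_value": query_value}

def install_hook(table, name, log):
    """Patch one table entry so each call is logged, then forwarded."""
    original = table[name]

    def hooked(*args):
        result = original(*args)          # forward to the real routine
        log.append((name, args, result))  # the "callout" to the monitor
        return result

    table[name] = hooked
    return original  # returned so the hook can be removed later

def remove_hook(table, name, original):
    """Restore the original table entry."""
    table[name] = original
```

Callers keep going through the table as before; only the table entry changes, which is why the monitored code does not need to be modified.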
And the native API idea was that you could write, like, a POSIX personality for Windows NT, and that calls the same internal APIs underneath, or an OS/2 one. And so that's when I really started to reverse engineer the internal interfaces of Windows NT. And a lot of other people did too, by the way. And I've got some books here that actually came out after I'd already done that predominantly, but other people were very interested in this.
Nice. Literally a book called Undocumented APIs. Documenting them. Yeah. So now they're not undocumented anymore. How are you doing this? Like, forgive my ignorance, but it's been 20 years. dumpbin or strings, or like, what do you use? The combination. So both static and dynamic analysis. And the static one would be disassembling. In fact, I wrote my own disassembler, which...
I got to share this, because my reverse engineering of Windows NT took a big boost. What Microsoft would do is publish the public symbols for Windows NT. And of course, for documented APIs that are called with names, you could find out how they get routed into the operating system. But once you got past that public surface of the public APIs and the public variables...
then you ended up in, you know, full-on reverse engineering and saying, oh, what's in this register? It's this variable. Oh, what's it doing? Oh, and then I'll give it a name so I can track it and understand it. But with one of the NT 4 betas, they produced a CD with the operating system on it for developers, for hardware developers, and they accidentally included the private symbols for it. So I wrote a disassembler that would take...
those private symbols and the disassembly and merge them together. And that kind of gave me a much deeper high fidelity look at what was going on. And like, are you just setting up a call stack and jumping into an address? Well, so that was just disassembly. Like I said, statically looking at it and tracing in my head, how is this going to go? Where does this go? But I also did dynamic analysis, and I did that with a kernel debugger.
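As an aside, the idea of merging symbols with a raw disassembly can be sketched as follows. This is a minimal illustration, not the disassembler described in the conversation: the symbol table, the addresses, the instruction text, and the names `SYMBOLS`, `DISASSEMBLY`, `nearest_symbol`, and `annotate` are all made up for the example.

```python
import bisect

# Hypothetical symbol table: (start_address, name), as a symbol file might list.
SYMBOLS = sorted([
    (0x1000, "ExampleOpenKey"),
    (0x1040, "ExampleQueryValue"),
    (0x10A0, "g_ExampleCounter"),
])

# Hypothetical raw disassembly: (address, instruction text).
DISASSEMBLY = [
    (0x1000, "push ebp"),
    (0x1001, "mov ebp, esp"),
    (0x1010, "call 0x1040"),
    (0x1040, "mov eax, [0x10A0]"),
    (0x1045, "ret"),
]

def nearest_symbol(address, symbols):
    """Return 'name' or 'name+0xoffset' for the closest symbol at or below address."""
    starts = [start for start, _ in symbols]
    index = bisect.bisect_right(starts, address) - 1
    if index < 0:
        return None  # address lies below every known symbol
    start, name = symbols[index]
    offset = address - start
    return name if offset == 0 else f"{name}+0x{offset:x}"

def annotate(disassembly, symbols):
    """Prefix every disassembled line with the symbol it falls under."""
    lines = []
    for address, text in disassembly:
        label = nearest_symbol(address, symbols) or "?"
        lines.append(f"{address:08x}  {label:<24} {text}")
    return lines

if __name__ == "__main__":
    print("\n".join(annotate(DISASSEMBLY, SYMBOLS)))
```

With names attached, a line such as `call 0x1040` can be read as a call into a named routine instead of an anonymous address, which is the "higher fidelity" view being described.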
And the kernel debugger I used was called SoftICE. There was a SoftICE version for DOS, for Windows 3.1, for Windows 95, and for Windows NT. In fact, I worked at NuMega Technologies for a short bit, where I actually added commands and wrote documentation for SoftICE. Yeah. Right at the beginning of me doing the Sysinternals stuff, which back then was called NT Internals. That was the thing
in the 90s. SoftICE was like, crack it open, get SoftICE. It was really cool, just for the people that are not familiar with this. You would run, like, Windows, and it looked like today except with an old interface. And you would hit a special keystroke,
I can't remember what it was, but that would break you into the debugger, which would end up taking over the video card and displaying the interface, which looks a lot like, if you're familiar with the debugger in Visual Studio, it looks a lot like that, where you have
the code, which back then was disassembly, or source code if you gave it source code with symbols. The memory view, the register view, or variable view if you're doing symbolic disassembly. And then you could just step through, and the whole machine was frozen. It felt like you'd be doing your thing, you'd be in Windows, and you'd hit Control-D.
And it was almost as if you're, like, flipping. Imagine, it didn't literally do this, but imagine you just flip it around and look behind the TV. And it's like, oh, here's all the registers at the top. It kind of looks like top or htop. And you'd have all your registers at the top and then your call stack at the bottom.
Yeah. That's pretty cool. So why does an undocumented API exist, and would I want to go looking for one now? And why are we not supposed to use them? I mean, so the reason that they're undocumented is that Microsoft didn't want to officially have to support them or to write the documentation for them. Were they not doing that? That's a good reason, by the way. Yeah, I mean, what it does by not documenting it is give them the freedom to change it.
Like, from one version of Windows to the other, they can expand it without having to worry about, what apps am I going to break? Once an app takes a dependency on it, if it's an important app, then it becomes an API that you can't change without breaking the app and then annoying people. So that's the rationale behind, hey, these things are undocumented. It gives us the freedom to evolve them. And I knew that there's a risk taking a dependency on an undocumented API, but...
For my tools, two things. If it was a commercial tool, I knew that we would continue to upgrade them and maintain them and stay current with the updated versions of those APIs. If it was a tool like a diagnostic tool like Regmon, I knew that if they break, it breaks, and I'll go fix it for a new one. It's not a big deal, and people have this kind of expectation a new version of Windows comes out. The fact is that those APIs are...
the core of them anyway, are very slow moving, just because they have application compatibility constraints internal to the system that make them hard to change. Because, oh, so now there's a whole bunch of code calling this API internally. If we change that, then we've got to go change everything that calls it. You know, back in the day, I want to say with Windows XP, just basically a little bit after Windows XP,
I would always find myself running the Windows compatibility troubleshooter, which would basically lie individually, on a process-by-process basis. So, like, you run an app and the app would go, this is not Windows 95, you naughty guy. And then you'd right-click, hit Properties, and say, lie to this. And you'd be running Windows 98 or whatever, 2000. And then it would go, okay, fine. And it would patch stuff.
And those were, like, checks you would do. And then we all as a community got used to not doing that. Don't make "is not," and then, you know, big statements. But do we, for compat, fake APIs or move APIs around? Yeah. So there are AppCompat shims inside the operating system that get injected into apps that require that help to run on newer versions of Windows. Not that they can't run, it's just that they've got these kinds of...
"I've taken a dependency on an undocumented thing," and Microsoft's like, we want to keep this working so that our customers are happy, or they've got the test for... And a lot of these tests, by the way, are more subtle than, is this Windows 95? They'll be like, is this function at this address? I'm going to use that to detect what version of the system I'm on. And so there are lots of really subtle ways that apps get locked to a particular release.
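As an aside, the "lie to this app" behavior of a compatibility shim can be sketched as a wrapper that sits between an application and the version query it depends on. This is a toy model under stated assumptions, not how Windows implements its shims: `real_get_version`, `legacy_app`, `make_version_shim`, and the version tuples are hypothetical.

```python
def real_get_version():
    """Stand-in for an OS version query; returns (major, minor)."""
    return (10, 0)

def legacy_app(get_version):
    """Hypothetical old app that refuses to run unless it sees version 4.0."""
    if get_version() != (4, 0):
        return "refusing to run: unsupported OS"
    return "running"

def make_version_shim(reported_version):
    """Build a replacement version query that always reports a fixed version."""
    def shimmed_get_version():
        return reported_version  # the "lie" told to this one app only
    return shimmed_get_version

if __name__ == "__main__":
    print(legacy_app(real_get_version))            # app sees the real version
    print(legacy_app(make_version_shim((4, 0))))   # app sees the shimmed version
```

The shim is applied per application, so other programs on the same system still see the real answer.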
Yeah, that's part of the Microsoft Application Compatibility Toolkit. And what's cool about it is that in the old days, people would install something that would see C:\Program Files and then just write data to C:\Program Files. And if that app never got updated...
and the author's gone, you can correct the file path and redirect it, and the app doesn't know. They're cool. You can lie about the registry, lie about... But that, I think, and I'm going to sound like a fanboy here, but Microsoft doesn't get enough credit for letting old stuff run. Other operating systems we won't mention. I'll tell you the credit Microsoft gets. Windows is running on a billion PCs. Oh, that's fair credit, I suppose. But until 64-bit...
I was talking to Dan Bricklin because I was running a 32-bit machine. I was like, let me just run VisiCalc. The Dan Bricklin? You just, like, tossed out that name. Yeah, he's like a buddy on social media, and we had him on my podcast, and he, like, invented VisiCalc. So it's a 1976, 1977 16-bit application and you can just run it on 32-bit Windows. It just runs. 32-bit Windows 10, ran it. And I was like,
This is crazy. This is a 45-year-old program. It just runs. Like, that's DOS, but still it's a principle of the thing. Like, how many times on a non-machine, a non-Windows machine, do you call an app and it's like, no, you're orphaned.
You'll never run that again. I just think we should get credit for running old stuff. And the app compatible... Well, like I said, I think that we get the credit because of the install base is the credit. Because the reason to do that is we want people to move on to a new version of Windows where they get more capability.
and access to newer software that takes advantage of those capabilities. But a lot of, especially in enterprise, just won't move if they can't bring along their IT software, which is either from them or third parties, where they're not like, I'm... This thing does what I need it to do. I'm not going to go fix it or upgrade it. And it just needs to continue running. And if you can't get Windows to support it, then I'm stuck on this old version of Windows. It's not necessarily...
thinking about, damn, I can't run VisiCalc anymore. It's 2024, it's 45, 50-year-old software. Why doesn't 16-bit software run? We have a 32-bit compatibility layer, but we don't have a 16-bit subsystem anymore on Windows. The world has moved on. But I haven't. You need to. Let it go. I'm just going to run DOSBox then, which is glorious. Did I tell you about this? You know about eXoDOS? No. It's a collection. I think that's Peter Molyneux. It's basically a preservationist thing.
It's all the DOS games, 8,000 of them, on a drive, and it all runs in DOSBox. And you think about these tools. This one actually, it's a hard drive, but it looks like a floppy. And when I think about undocumented APIs, I think about, like, there's a Win32 API that just is not documented, and call it, don't call it, good luck. But then there's...
like undocumented behaviors, like getting the Commodore 64, getting the outside frame of the graphics to do colorful things. I wonder how much of the world runs on undocumented APIs and how much of a problem that is. Have you ever called a non-documented API? I have called APIs where I saw a sample code calling a thing and I assumed it was documented. And then a couple of versions later, I got nailed.
And I was like, oh, crap, we can't do that. And I've also worked with teams internally that wanted to call undocumented APIs, went through a review process, and then were told, don't call that one. And they're like, what will happen if we don't? You know, that kind of thing. So you have to find yourself questioning this. But didn't we get in trouble? Didn't Microsoft get in trouble for the undocumented API thing? Well, Microsoft had to document the APIs its products used.
Okay, so if a product uses it, now it's in documentation? Yeah. Okay. Do you think it's overblown? Do people think that there's a whole host of hidden APIs that are just wonderful and exciting that we all want to call? No, because again... If it's going to be useful for a product, it'll be documented. Yeah, exactly. So some offshoots from the NT API. I wrote a tool called NT Crash that would fuzz. Back then it wasn't called fuzzing.
Fuzz the API. What's your blue screen thing, that blue screen, crash, not a crash, or what is it called? Yeah, NT Crash. Sorry. Yeah, that's what. You have another one now that's a crash, like, crash right now. Oh, NotMyFault. NotMyFault. NotMyFault. Yeah. So you wrote a tool to reliably crash NT? This was back in the '96, '97 time frame. I wrote a tool called NT Crash that would barrage the system call interface with garbage,
which today we call fuzzing, but back then there was no name for it. And I'd been inspired by a tool that I'd come across at Carnegie Mellon, when I was working on Unix, on BSD Unix, called CrashMe. And you run CrashMe and it would fuzz the Unix system call interface. And it would often cause a crash or kernel panic because of the code not checking thoroughly for parameters
being inappropriate values. And so when I got to NT, I'm like, well, let me see what happens. So I wrote NT Crash, and that found a bunch of faults. I published the code on ntinternals.com, Microsoft would find out about it, and then they would go and fix those. And then I expanded it to call Win32K, the kernel mode interfaces of Win32
graphics and windowing APIs and found a whole bunch more crashes, as well as some new crashes in the NT API because I came up with more intelligent fuzzing. The funny thing about that is that when I got to Microsoft, and I was looking around the NT source code, I came across NT Crash and NT Crash 2 checked into the Windows NT code base because they used it. Checked in as a tool? Yeah. Was that bad?
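As an aside, the "barrage the interface with garbage" approach can be sketched in miniature. This is not NT Crash itself: `fragile_call` is a hypothetical stand-in for a system call with deliberately incomplete parameter checking, and `random_argument` and `fuzz` are names invented for the example.

```python
import random

def fragile_call(handle, length, flags):
    """Hypothetical stand-in for a system call with incomplete validation."""
    if not isinstance(handle, int):
        raise TypeError("handle must be an integer")
    if handle < 0:
        raise ValueError("invalid handle")
    buffer = [0] * 16
    # Deliberate bug: length is never checked against the buffer size.
    return buffer[length] | flags

def random_argument(rng):
    """Produce one garbage argument: random ints, boundary values, wrong types."""
    generators = [
        lambda: rng.randint(-2**31, 2**31 - 1),
        lambda: rng.choice([0, 1, -1, 15, 16, 2**31 - 1]),
        lambda: None,
        lambda: "garbage",
    ]
    return rng.choice(generators)()

def fuzz(target, iterations=1000, seed=0):
    """Call target with random arguments and record unexpected failures."""
    rng = random.Random(seed)  # seeded so a failing run can be reproduced
    crashes = []
    for _ in range(iterations):
        args = tuple(random_argument(rng) for _ in range(3))
        try:
            target(*args)
        except (TypeError, ValueError):
            pass  # treated here as a graceful rejection of bad input
        except Exception as error:  # anything else counts as a "crash"
            crashes.append((args, type(error).__name__))
    return crashes

if __name__ == "__main__":
    found = fuzz(fragile_call)
    print(f"{len(found)} crashing inputs, for example: {found[:3]}")
```

In this toy, an out-of-range `length` surfaces as an `IndexError`; the analogous failure in a kernel would be a crash, which is why unchecked parameters were worth reporting.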
No, I mean, flattery. Flattery, yeah. Yeah, yeah, yeah. I mean, is there something we can or should do to protect ourselves from an undocumented call? I mean, you're the caller or you're the callee. It depends, right?
document everything, right? Because there's open source people that are like, someone could be listening to this and be like, yeah, that's why everything should be open source. But like there's open source projects that also don't document all their APIs. And they'll tell you. The good reason not to document undocumented is they're internal.
Like when you've got your program that you're writing, you've got a bunch of internal functions. You marked them internal for a reason. Yeah, like, I can go change these at any moment because nobody but me cares about these, should care about these. All right. If you like this show, share it with your friends. Check out scottandmarklearn.to and scottandmarklearn2.com.
Google with Bing for Scott and Mark Learn To. And please, please share this. Put in the comments what you want us to talk about and what you want us to learn about next. We appreciate you for giving us a try and listening. We'll see you again next week.