The Ghidra Book: The Definitive Guide

Speaker 1

00:00

Have you ever looked at a piece of software, you know, an app or maybe part of your OS, and just wondered what's really going on under the hood? Especially when all you've got is that compiled binary, that black box of machine language. How do you even start to peel back those layers?

Speaker 2

00:17

How do you see inside?

Speaker 1

00:18

Our deep dive today is all about cracking that open. We're talking Ghidra. It's a real powerhouse for software reverse engineering.

Speaker 2

00:28

Absolutely. Think of it like a universal translator for machine code.

Speaker 1

00:32

Exactly. It takes that dense, inscrutable stuff and makes it, well, understandable. And here's the kicker, right? Ghidra wasn't just built by some startup. This thing came out of the NSA.

Speaker 2

00:41

Yeah, the National Security Agency. Developed in-house.

Speaker 1

00:44

And then, amazingly, they released it open source to the public, which was...

Speaker 2

00:49

...a pretty big deal at the time.

Speaker 1

00:50

Still is. It really is. So this isn't some niche tool; it's government-grade stuff for understanding compiled code, and now, you know, anyone can use it. So our mission today is to unpack what Ghidra can do, from just looking at a file to really advanced stuff like customization, even tackling anti-reverse-engineering tricks.

Speaker 2

01:10

Yeah, it's about getting you informed about that digital core.

Speaker 1

01:13

Giving you that shortcut.

Speaker 2

01:15

What's really cool is how Ghidra kind of levels the playing field. It gives you insights into binaries that would otherwise just be opaque. Whether you're a pro reverse engineer looking at malware or just, you know, super curious about how software actually works, Ghidra lets you see that digital core.

Speaker 1

01:32

Okay, so before we jump right into Ghidra itself, let's back up a bit. Why do we even need tools like this? Why look inside compiled software? Shouldn't we just, I don't know, trust it?

Speaker 2

01:41

Well, ideally, maybe, but the real world's messier. A huge reason is vulnerability analysis.

Speaker 1

01:47

Finding security holes.

Speaker 2

01:49

Exactly: discovering and analyzing potentially exploitable flaws. You might use fuzzing, dynamic stuff, to find crashes, but Ghidra lets you do static analysis to really understand if something's exploitable, and how. What are the conditions? Static analysis is key there.

Speaker 1

02:04

Gotcha. What else?

Speaker 2

02:05

Then there's software interoperability, a big one. If something's only released as a binary, making other software work with it, or writing plugins for it, is incredibly hard.

Speaker 1

02:15

Ah, like writing drivers for hardware, maybe?

Speaker 2

02:18

Perfect example. If hardware officially supports only, say, Windows, but you want it on Linux, you'll probably need to do some serious reverse engineering, maybe even going beyond the drivers into the firmware.

Speaker 1

02:30

Okay, that makes sense.

Speaker 2

02:31

And then there's always the, let's call it, source code recovery angle.

Speaker 1

02:34

Right, trying to get back to something readable.

Speaker 2

02:36

Yeah. You'll never get the original source code back perfectly, with comments and all that, but getting something readable out of a binary is still super attractive: competitive analysis, understanding old systems. Lots of reasons.

Speaker 1

02:49

Okay, I get the why. But what makes it so darn difficult, going from that compiled binary back to something human-readable? You said it's like rebuilding a cake from crumbs.

Speaker 2

03:00

That's a good analogy. The core problem is the compilation process itself. It's lossy.

Speaker 1

03:04

Lossy, meaning information gets thrown away?

Speaker 2

03:07

Exactly. When you compile C++ down to machine code, tons of useful info is just gone. Variable names, function names, comments, forget it. Even variable types. At the machine level, Ghidra might see, say, 32 bits of data being moved. Is that an integer, a floating-point number, a memory address, a pointer?

Speaker 1

03:28

It doesn't know.

Speaker 2

03:29

It has to infer it. It has to look at how that data is used later on to make an educated guess. And that ambiguity, that inference, is why decompilation is still a really active research area in computer science. It's fundamentally hard.
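A toy version of that inference, as a sketch: the usage labels below (`used_as_address`, `float_op`, `int_arith`) are hypothetical summaries of what an analysis pass might observe about a 4-byte value in a 32-bit program, and the fallback `undefined4` borrows the placeholder type name Ghidra displays for 4 bytes of unknown type. Ghidra's decompiler propagates types far more carefully than this simple vote.

```python
# Hypothetical usage labels -> the C type each one suggests (32-bit program assumed).
EVIDENCE = {
    "used_as_address": "void *",   # value is dereferenced: *(value)
    "float_op": "float",           # value is fed to floating-point instructions
    "int_arith": "int",            # value takes part in integer arithmetic or compares
}

def infer_type(uses):
    """Guess a C type for a 4-byte value from how later code uses it."""
    hints = {EVIDENCE[u] for u in uses if u in EVIDENCE}
    if len(hints) == 1:
        return hints.pop()
    return "undefined4"  # no evidence, or conflicting evidence: stay unknown
```

A value that is both dereferenced and incremented (ordinary pointer arithmetic) ends up as `undefined4` here, which is exactly why real type recovery needs rules about which evidence wins.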

Speaker 1

03:41

So Ghidra's first big job is to try and bridge that gap, to get those machine code crumbs into some kind of order, and that starts with disassembly, right?

Speaker 2

03:49

Right. At its most basic level, Ghidra takes the machine language, those raw binary patterns, the opcodes, and turns them into assembly language.

Speaker 1

03:57

The mnemonics, like push and mov.

Speaker 2

04:00

Exactly, those short, memorable names: push ebp, mov eax, stuff like that. They make the raw instructions way easier for humans to read and track.

Speaker 1

04:10

But even that basic step isn't simple?

Speaker 2

04:14

Not entirely. First, Ghidra has to figure out what's actually code versus what's just data stored in the program.

Speaker 1

04:18

Because they can be mixed together?

Speaker 2

04:20

They often are. Luckily, standard file formats like PE for Windows or ELF for Linux and Unix have headers. These headers give Ghidra clues about where the code sections likely start and end.
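For ELF, the first clues come from the 64-byte file header. The following is a minimal sketch, assuming a little-endian 64-bit ELF file, that reads the field layout defined by the ELF specification with Python's `struct`. It is not how Ghidra's ELF loader is written; a loader also walks the program and section header tables that these fields point to.

```python
import struct

# ELF64 file header: 16-byte e_ident, then 13 fields, little-endian, no padding (64 bytes).
ELF64_EHDR = struct.Struct("<16sHHIQQQIHHHHHH")

def elf64_layout(data):
    """Return the layout clues a loader reads first from a little-endian ELF64 header."""
    if len(data) < ELF64_EHDR.size or data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:  # EI_CLASS == ELFCLASS64, EI_DATA == ELFDATA2LSB
        raise ValueError("this sketch only handles little-endian ELF64")
    (_ident, e_type, e_machine, _version, e_entry, e_phoff, e_shoff, _flags,
     _ehsize, _phentsize, e_phnum, _shentsize, e_shnum, _shstrndx) = ELF64_EHDR.unpack_from(data)
    return {
        "type": e_type,                  # e.g. 2 = ET_EXEC, 3 = ET_DYN
        "machine": e_machine,            # e.g. 62 = EM_X86_64
        "entry": e_entry,                # virtual address where execution starts
        "segments": (e_phoff, e_phnum),  # program headers: what gets mapped where
        "sections": (e_shoff, e_shnum),  # section headers: .text, .data, ...
    }
```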

Speaker 1

04:33

OK.

Speaker 2

04:34

Then it has to match the binary opcodes it finds to the right assembly mnemonic. That's mostly a table lookup, but it gets tricky with things like instruction prefixes or variable-length instructions, which you see a lot in, say, Intel's x86 architecture.
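The table-lookup idea can be sketched in Python for five real 32-bit x86 opcodes. This is a hypothetical toy decoder, not Ghidra's (Ghidra drives disassembly from SLEIGH processor specifications). It ignores prefixes and memory operands, but it shows why variable-length encodings matter: the decoder has to know each instruction's length before it can find the next one.

```python
# Opcode byte -> (text template, total instruction length in bytes).
# A real x86 decoder also handles prefixes, ModRM/SIB memory forms and
# hundreds of opcodes; this table covers five.
OPCODES = {
    0x55: ("push ebp", 1),
    0x89: ("mov {dst}, {src}", 2),   # 89 /r: MOV r/m32, r32 (register form only)
    0x90: ("nop", 1),
    0xB8: ("mov eax, {imm:#x}", 5),  # B8 id: MOV EAX, imm32
    0xC3: ("ret", 1),
}
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def decode_one(code, off):
    """Decode the instruction at off; return (text, length)."""
    op = code[off]
    if op not in OPCODES or off + OPCODES[op][1] > len(code):
        return "db 0x%02x" % op, 1           # unknown or truncated: treat as data
    template, length = OPCODES[op]
    if op == 0x89:
        modrm = code[off + 1]
        if modrm >> 6 != 0b11:                # memory operand: out of scope here
            return "db 0x%02x" % op, 1
        return template.format(dst=REG32[modrm & 7], src=REG32[(modrm >> 3) & 7]), length
    if op == 0xB8:
        imm = int.from_bytes(code[off + 1:off + 5], "little")
        return template.format(imm=imm), length
    return template, length

def decode_all(code):
    """Walk the bytes, advancing by each instruction's own length."""
    out, off = [], 0
    while off < len(code):
        text, length = decode_one(code, off)
        out.append((off, text))
        off += length
    return out
```

Decoding `55 89 E5 B8 2A 00 00 00 C3` yields `push ebp`, `mov ebp, esp`, `mov eax, 0x2a`, and `ret` at offsets 0, 1, 3, and 8.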

Speaker 1

04:48

Right. So, assuming it figures out the instructions, how does it know which one comes next? Programs don't just run straight down; they jump around.

Speaker 2

04:54

That's the next big challenge: navigating control flow. A really simple approach is linear sweep, just disassembling one instruction after the other.

Speaker 1

05:01

But that breaks easily.

Speaker 2

05:03

Oh yeah, it gets totally confused by jumps and branches, and it can easily misinterpret data as code. So Ghidra uses a much smarter method called recursive descent.

Speaker 1

05:12

Recursive descent? How does that work?

Speaker 2

05:15

So when it hits a conditional jump, like jump-if-not-zero, it basically says, okay, the program could go this way, or it could just continue straight on. It picks one path to follow immediately and puts the other path's starting address on a list to check out later. A to-do list, pretty much: a list of deferred addresses. If it's an unconditional jump, it tries to figure out the target and just goes there.

Speaker 1

05:36

Okay, that sounds much more robust, but what about jumps where the target isn't obvious?

Speaker 2

05:40

Ah, yeah, that's where recursive descent hits its limits too. Things like jmp rax on x86-64: the target address is whatever value is in the RAX register at runtime.

Speaker 1

05:50

So Ghidra can't know that just by looking at the code?

Speaker 2

05:53

Not in general, no. Static analysis can't predict arbitrary runtime values. Or think about a ret instruction, return from function. On its own, that tells you nothing about where execution will go next.

Speaker 1

06:02

So what does it do then?

Speaker 2

06:03

That's when it goes back to its to-do list. It picks up one of those deferred addresses it saved earlier and starts disassembling from there. That's the recursive part of the name. It keeps exploring all these potential paths until it's mapped out as much of the program's structure as possible.
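The two strategies can be compared on a tiny, hand-made byte string. This is a minimal sketch with a hypothetical six-opcode subset of 32-bit x86 (real encodings, but nothing like Ghidra's SLEIGH-driven disassembler). The byte at offset 2 is data sitting after an unconditional jump: linear sweep decodes it as the start of a 5-byte `mov eax, imm32` and swallows the real instructions behind it, while recursive descent follows the jump and keeps a worklist of deferred addresses for the conditional branch.

```python
# opcode -> (mnemonic, length, kind); kind says how control continues afterwards.
TABLE = {
    0x55: ("push ebp", 1, "fall"),
    0x90: ("nop", 1, "fall"),
    0xB8: ("mov eax, imm32", 5, "fall"),
    0xC3: ("ret", 1, "stop"),    # no statically known successor
    0xEB: ("jmp", 2, "jump"),    # unconditional, 8-bit relative offset
    0x74: ("jz", 2, "cond"),     # conditional, 8-bit relative offset
}

def decode(code, off):
    """Return (mnemonic, length, kind, target), or None if the byte isn't known."""
    entry = TABLE.get(code[off])
    if entry is None or off + entry[1] > len(code):
        return None
    mnem, length, kind = entry
    target = None
    if kind in ("jump", "cond"):
        rel = int.from_bytes(code[off + 1:off + 2], "little", signed=True)
        target = off + length + rel      # relative to the *next* instruction
    return mnem, length, kind, target

def linear_sweep(code):
    """Decode every instruction back to back, ignoring control flow."""
    out, off = {}, 0
    while off < len(code):
        insn = decode(code, off)
        if insn is None:
            off += 1                     # skip an unknown byte
            continue
        out[off] = insn[0]
        off += insn[1]
    return out

def recursive_descent(code, entry=0):
    """Follow control flow from entry, deferring branch targets to a worklist."""
    out, todo = {}, [entry]
    while todo:
        off = todo.pop()
        while 0 <= off < len(code) and off not in out:
            insn = decode(code, off)
            if insn is None:
                break
            mnem, length, kind, target = insn
            out[off] = mnem
            if kind == "stop":
                break                    # e.g. ret: go back to the worklist
            if kind == "jump":
                off = target             # follow an unconditional jump directly
                continue
            if kind == "cond":
                todo.append(target)      # defer the taken branch, fall through now
            off += length
    return out

code = bytes([0xEB, 0x01,    # 0: jmp 3
              0xB8,          # 2: data byte (never executed)
              0x55,          # 3: push ebp
              0x90,          # 4: nop
              0x74, 0x01,    # 5: jz 8
              0xC3,          # 7: ret
              0xC3])         # 8: ret
print(sorted(linear_sweep(code)))       # [0, 2, 7, 8]
print(sorted(recursive_descent(code)))  # [0, 3, 4, 5, 7, 8]
```

Recursive descent never decodes offset 2, because no path reaches it; linear sweep reports it and misses offsets 3 to 5 entirely.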

Speaker 1

06:19

Wow. Okay, so it's not just translating; it's actively exploring and mapping the possibilities. Quite the detective.

Speaker 2

06:24

You could say that.

Speaker 1

06:26

So Ghidra does all this heavy lifting with disassembly. How does a user, someone sitting down with it, actually get started? What's the entry point?

Speaker 2

06:35

It's actually pretty user-friendly. The first time you launch Ghidra, you'll see the license agreement. Click through that. Then you get a Tip of the Day, and the main Ghidra Project window pops up. Your first real step is to create a project. Ghidra uses the project to keep everything organized: the file you're analyzing, your notes, everything. You choose between a non-shared or a shared project. For working alone, you just pick non-shared and give it a directory and a name. Simple.

Speaker 1

07:01

Got it. Project setup. Then you bring in the file.

Speaker 2

07:03

Right, you import the file, either with File > Import File or by just dragging and dropping it onto the project window. Ghidra looks at the file, tries to figure out what it is, PE, ELF, Mach-O, and suggests a format, a loader.
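Loader selection starts by sniffing the first few bytes. This sketch checks well-known magic numbers: `\x7fELF`, the `MZ` header plus its `PE\0\0` signature at the offset stored at 0x3C, and the Mach-O magic values in either byte order. It is a simplified stand-in for Ghidra's loaders, and it deliberately skips universal (fat) Mach-O files, whose 0xCAFEBABE magic collides with Java class files.

```python
def guess_format(header):
    """Guess a loader from the first bytes of a file (a sketch, not Ghidra's logic)."""
    if header[:4] == b"\x7fELF":
        return "ELF"
    if header[:2] == b"MZ" and len(header) >= 0x40:
        # e_lfanew, the 32-bit value at 0x3C, points to the "PE\0\0" signature.
        pe_off = int.from_bytes(header[0x3C:0x40], "little")
        if header[pe_off:pe_off + 4] == b"PE\x00\x00":
            return "PE"
        return "MS-DOS MZ"
    magic = int.from_bytes(header[:4], "little") if len(header) >= 4 else None
    # MH_MAGIC / MH_MAGIC_64 and their byte-swapped forms (big-endian files).
    if magic in (0xFEEDFACE, 0xFEEDFACF, 0xCEFAEDFE, 0xCFFAEDFE):
        return "Mach-O"
    return "Raw Binary"
```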

Speaker 1

07:17

And you usually just accept the default.

Speaker 2

07:19

Often, yeah. Ghidra's pretty good at recognizing common formats. If it's something weird or just raw binary code, you might select Raw Binary and maybe give it some hints. There are options you can tweak as you get more experienced, like telling it to load external libraries.

Speaker 1

07:34

Okay, file imported. What happens then?

Speaker 2

07:36

Then Ghidra asks if you want to auto-analyze it, and you almost always say yes.

Speaker 1

07:42

That's where the magic happens.

Speaker 2

07:43

That's where a ton of automated work kicks in. Ghidra starts identifying functions, finding data, recognizing common code patterns, building cross-references, all that stuff we talked about. You can actually watch its progress in the bottom right of the main window, the CodeBrowser, or look at the detailed log file.

Speaker 1

07:57

So it does a massive amount of groundwork automatically.

Speaker 2

08:00

Absolutely, it takes that raw imported binary and brings it into a much more structured, analyzed state ready for you to start exploring. It really guides you in.

Speaker 1

08:09

Okay, so the auto-analysis finishes and you're dropped into the Ghidra CodeBrowser. That's the main workspace, right? What does that look like? What are the key parts?

Speaker 2

08:18

The CodeBrowser is your command center. It's a multi-window interface, quite rich by default. You'll likely see a few key windows, but let's focus on maybe the three most important ones to start. First, there's the Listing window. This is your assembly view, the direct translation of machine code into mnemonics. You see addresses, the bytes, the instructions.

Speaker 1

08:37

And those arrows you mentioned?

Speaker 2

08:39

Yeah, crucial feature: flow arrows. They graphically show you where jumps and calls go. Solid arrows for unconditional jumps, dashed for conditional ones. Hover over them, double-click to follow them. It makes tracing the program's path so much easier than just reading addresses.

Speaker 1

08:54

Helps visualize the flow.

Speaker 2

08:56

Exactly. Then, often right next to it, is the Decompiler window.

Speaker 1

09:00

Ah, the crown jewel.

Speaker 2

09:01

Arguably, yes. This is where Ghidra tries its best to reconstruct high-level, C-like source code from the assembly.

Speaker 1

09:10

How good is it?

Speaker 2

09:11

It's often remarkably good. It gives you a much higher-level understanding of what the code is trying to do, the logic, compared to just staring at assembly. Now, it's not perfect.

Speaker 1

09:21

Right, because of that lossy compilation.

Speaker 2

09:24

Precisely. Sometimes it has to guess types, maybe it overuses casts, and it always generates C, even if the original code was C++ or something else. But still, it's incredibly valuable for quickly grasping function logic.

Speaker 1

09:35

Okay: Listing for assembly, Decompiler for pseudo-C. What's the third key window?

Speaker 2

09:40

The Symbol Tree window. This is all about program structure. It shows you information extracted from the binary's symbol tables, if they exist: functions, labels, global variables, classes, namespaces, and, really importantly, imports and exports.

Speaker 1

09:54

Imports being stuff the program uses from outside, like libraries?

Speaker 2

09:58

Yep, functions imported from shared libraries. And exports are the entry points into this file, things it provides for others, including the main program entry point, often called entry or start.

Speaker 1

10:09

And this helps you navigate massively.

Speaker 2

10:12

It's also where you often spot C++ code, because of name mangling. C++ compilers tweak function names to handle overloading, adding extra characters that encode things like the parameter types. Ghidra shows you these mangled names and often tries to demangle them back to something readable. Super useful for understanding C++ binaries.
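As an illustration of what demangling undoes, here is a sketch that handles only the simplest shape of the Itanium C++ ABI scheme used by GCC and Clang: `_Z`, a length-prefixed name, then one letter per built-in parameter type. Anything involving namespaces, classes, pointers, or templates is returned unchanged; Ghidra ships full demanglers for that.

```python
# Itanium ABI codes for a few built-in types.
BUILTINS = {"v": "void", "b": "bool", "c": "char", "i": "int", "j": "unsigned int",
            "l": "long", "m": "unsigned long", "f": "float", "d": "double"}

def demangle_simple(sym):
    """Demangle _Z<len><name><builtin params>, e.g. _Z3addii -> add(int, int).
    Returns sym unchanged if it falls outside this tiny subset."""
    if not sym.startswith("_Z"):
        return sym
    i, digits = 2, ""
    while i < len(sym) and sym[i].isdigit():
        digits += sym[i]
        i += 1
    if not digits:
        return sym
    n = int(digits)
    name = sym[i:i + n]
    if len(name) != n:
        return sym
    params = []
    for code in sym[i + n:]:
        if code not in BUILTINS:
            return sym                   # pointers, classes, templates: out of scope
        params.append(BUILTINS[code])
    if params == ["void"]:
        params = []                      # "v" alone means an empty parameter list
    return "%s(%s)" % (name, ", ".join(params))
```

`_Z3addii` and `_Z3adddd` are two overloads of `add` that the compiler has to keep apart in the symbol table; demangled, they read `add(int, int)` and `add(double, double)`.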

Speaker 1

10:30

So these three windows, Listing, Decompiler, and Symbol Tree, give you different, complementary views into the binary.

Speaker 2

10:37

That's the idea: assembly for the ground truth, decompiler for high-level logic, symbol tree for overall structure and external connections. Ghidra helps you bring order to the chaos.

Speaker 1

10:46

It's like giving you different lenses to look through. Okay, so you have these views. How do you actually see the connections? How does function A relate to function B? Or where is this piece of data used?

Speaker 2

10:56

This is where cross-references, or XREFs, come in. Absolutely fundamental.

Speaker 1

11:00

The glue holding the program together.

Speaker 2

11:03

Exactly. Here's what they are: think of every address in the program as a potential node in a giant graph. XREFs are the edges connecting those nodes.

Speaker 1

11:11

How do they show up?

Speaker 2

11:12

In the Listing window. Next to an instruction or a data definition, you'll often see something like XREF[3], followed by the referring locations, each tagged with a letter such as (R). It tells you what other locations refer to this spot.

Speaker 1

11:23

And the letters mean different things. R, W, C, right?

Speaker 2

11:29

For data references, you'll see R for read, W for write, and an asterisk if its address is taken, meaning it's used as a pointer. For code references, C means it's being called, like a function call, and J means it's being jumped to.

Speaker 1

11:41

Okay, so how would you use that in practice?

Speaker 2

11:43

Let's say you find a suspicious string in the data section, maybe something like "Enter password". You can look at the XREFs to that string's address, and they'll point you directly to the code locations that use that string, so you can instantly jump to the part of the code that's probably handling the password input.

Speaker 1

11:59

Ah, very powerful for tracking things down.

Speaker 2

12:00

Incredibly. Or flip it around: if you identify a function that looks vulnerable, you can look at its XREFs to see which other functions call this potentially dangerous function. Ghidra has dedicated windows just for managing and exploring these references. It lets you build that mental map of dependencies.
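Conceptually, XREFs are an index keyed by target address. This is a minimal sketch of that data structure, not Ghidra's `ReferenceManager`. The addresses in the test are hypothetical, and the type strings follow the conversation's letters (R, W, C, J), plus `*` for a pointer reference.

```python
from collections import defaultdict

class XrefIndex:
    """Toy cross-reference index: target address -> [(from_address, ref_type)]."""

    def __init__(self, refs=()):
        self._to = defaultdict(list)
        for src, dst, kind in refs:
            self.add(src, dst, kind)

    def add(self, src, dst, kind):
        self._to[dst].append((src, kind))

    def refs_to(self, dst):
        """Everything that refers to dst, e.g. the code that uses a string."""
        return sorted(self._to.get(dst, []))

    def callers(self, func):
        """Addresses that call func (call references only, not jumps)."""
        return sorted({src for src, kind in self._to.get(func, []) if kind == "C"})
```

Asking for `refs_to` on a string's address is the "where is this password prompt used?" question; `callers` on a function's entry is the "who calls this dangerous function?" question.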

Speaker 1

12:19

Beyond just lists of references, does Ghidra help visualize these connections?

Speaker 2

12:23

Yes, absolutely. Graphs are another strong point. The main one is the Function Graph view. What does that show? It takes a single function and displays it as a flowchart. Each node is a basic block, a sequence of instructions with no jumps in or out except at the beginning and end. The edges show the control flow, the jumps and branches between blocks.
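Splitting a function into basic blocks is a classic algorithm: an instruction starts a new block (it is a "leader") if it is the function's first instruction, the target of a branch, or the instruction right after a branch or return. Here is a sketch over a hypothetical, pre-decoded instruction list; Ghidra's own block models handle more cases than this.

```python
def basic_blocks(insns):
    """Split a function into basic blocks using the classic "leaders" rule.

    insns: list of (addr, length, kind, target) in address order, where kind is
    "fall", "jump", "cond", or "ret", and target is the branch destination or None.
    Returns {block_start: [instruction addresses in that block]}.
    """
    if not insns:
        return {}
    addrs = {addr for addr, _, _, _ in insns}
    leaders = {insns[0][0]}                    # rule 1: the function's first instruction
    for addr, length, kind, target in insns:
        if kind in ("jump", "cond") and target in addrs:
            leaders.add(target)                # rule 2: every branch target
        if kind in ("jump", "cond", "ret") and addr + length in addrs:
            leaders.add(addr + length)         # rule 3: whatever follows a branch or return
    blocks, current = {}, None
    for addr, _, _, _ in insns:
        if addr in leaders:
            current = addr
            blocks[current] = []
        blocks[current].append(addr)
    return blocks
```

In the test function (a loop), 0x02 becomes a leader because the back-edge at 0x0A jumps to it, and 0x07 becomes one because it follows the conditional jump at 0x05.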

Speaker 1

12:42

So you can see loops and conditional paths visually.

Speaker 2

12:45

Precisely. You can zoom in, pan around, even collapse blocks to simplify complex functions. It makes understanding the logic flow much more intuitive than just reading linear assembly. Ghidra also offers function call graphs, which show the bigger picture: how different functions across the entire program call each other.
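A function call graph is just a directed graph, so questions like "what can the entry point reach?" or "who can reach `strcpy`?" become graph searches. A sketch with a hypothetical call graph, using breadth-first search, plus an inverted graph for the callers direction:

```python
from collections import deque

def reachable(graph, start):
    """Breadth-first walk: every node reachable from start along graph edges."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def invert(graph):
    """caller -> callees becomes callee -> callers."""
    inverted = {}
    for caller, callees in graph.items():
        for callee in callees:
            inverted.setdefault(callee, set()).add(caller)
    return inverted

# Hypothetical call graph for a small program.
calls = {
    "entry": ["main"],
    "main": ["parse_input", "log_message"],
    "parse_input": ["strcpy"],
    "log_message": ["printf"],
}
```

Searching the inverted graph from `strcpy` returns `parse_input`, `main`, and `entry`: the call chain an analyst would audit first.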

Speaker 1

13:01

So, putting it together, the XREFs and graphs really let you untangle that complex web inside a binary.

Speaker 2

13:08

They make the invisible structure visible, letting you navigate and comprehend behavior that would otherwise be incredibly hard to follow.

Speaker 1

13:15

Now, Ghidra sounds amazing out of the box, but you mentioned it's extensible. What if it doesn't do exactly what you need, or you want to automate something repetitive?

Speaker 2

13:24

Great question. That's where Ghidra's extensibility really shines. If you need a feature Ghidra doesn't have, or you want to automate a tedious analysis task, you often can. The primary way is through Ghidra's scripting.

Speaker 1

13:37

Scripting, like writing small programs to control Ghidra?

Speaker 2

13:41

Exactly. You can write scripts in Java or Python. There's a built-in Script Manager to organize, edit, and run them.

Speaker 1

13:47

What can scripts do?

Speaker 2

13:48

Lots of things. They have access to a simplified flat API that lets you interact with the program being analyzed. You can get the current address, read or write bytes, find specific byte sequences, iterate through functions, examine instructions.

Speaker 1

14:02

So you could write a script to, say, find all functions that use a specific instruction?

Speaker 2

14:07

Easily. Or count how many times a certain function is called, or automatically rename functions based on patterns you find, or even modify the program's memory if you're doing emulation or patching. You can print results to the console, add comments automatically. It's very flexible for automating tasks.
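Here is what counting calls might look like as a Ghidra Python script. It is a sketch, assuming Ghidra's Python scripting environment (Jython, or PyGhidra in recent releases), where flat API methods such as `getFirstFunction`, `getFunctionAfter`, `getReferencesTo`, and `println` are available as globals. The counting logic is kept in a plain function so it can be exercised outside Ghidra; the top-20 cutoff is an arbitrary choice.

```python
# CountCalls.py: count call references to every function in the current program.

def count_calls(functions, refs_to):
    """functions: iterable of (name, entry_address);
    refs_to(entry) -> iterable of (from_address, is_call).
    Returns [(name, call_count)] sorted by count (highest first), then name."""
    counts = []
    for name, entry in functions:
        n = sum(1 for _src, is_call in refs_to(entry) if is_call)
        counts.append((name, n))
    counts.sort(key=lambda item: (-item[1], item[0]))
    return counts

def run_in_ghidra():
    funcs = []
    f = getFirstFunction()
    while f is not None:
        funcs.append((f.getName(), f.getEntryPoint()))
        f = getFunctionAfter(f)

    def refs_to(addr):
        return [(r.getFromAddress(), r.getReferenceType().isCall())
                for r in getReferencesTo(addr)]

    for name, n in count_calls(funcs, refs_to)[:20]:
        println("%-40s %d" % (name, n))

try:
    currentProgram      # defined only when Ghidra runs the script
except NameError:
    pass
else:
    run_in_ghidra()
```

Only call references are counted, so a function that is reached by a jump or whose address is merely stored in a table shows a lower count than its real usage.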

Speaker 1

14:24

Beyond simple scripts, can you add bigger pieces of functionality?

Speaker 2

14:28

Yes, you can develop custom analyzers. Ghidra's auto-analysis uses many built-in analyzers. One example is the Function ID analyzer, which tries to identify common library functions by matching their code patterns against pre-built Function ID databases.
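The idea behind Function ID can be sketched as hashing a function body after masking out the bytes that change from one link to the next, such as addresses and call offsets. Ghidra uses its own hash scheme and database format; SHA-256, the explicit mask, and the `lib_helper` name below are assumptions for illustration only.

```python
import hashlib

def masked_hash(code, mask):
    """Hash a function body after zeroing the bytes that change between builds.
    mask[i] == 0xFF keeps byte i; 0x00 ignores it (e.g. a relocated address)."""
    if len(code) != len(mask):
        raise ValueError("code and mask must have the same length")
    return hashlib.sha256(bytes(b & m for b, m in zip(code, mask))).hexdigest()

def identify(code, mask, known):
    """Look the masked hash up in a {hash: name} database; None if unknown."""
    return known.get(masked_hash(code, mask))

# Two builds of the same (hypothetical) library function: identical except for
# the rel32 operand of the CALL at offset 3, which depends on where it was linked.
build_a = bytes.fromhex("5589e5e8100000005dc3")
build_b = bytes.fromhex("5589e5e8200100005dc3")
mask = bytes.fromhex("ffffffff00000000ffff")
db = {masked_hash(build_a, mask): "lib_helper"}
```

With the call offset masked, `identify(build_b, mask, db)` finds `lib_helper` even though the two byte strings differ.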

Speaker 1

14:44

So it labels things like printf automatically, right?

Speaker 2

14:47

If it recognizes the pattern. But you can build your own analyzers in Java, using Ghidra's development framework in Eclipse. Maybe an analyzer that specifically looks for return-oriented programming, or ROP, gadgets: those short instruction sequences, ending in a return, that are useful for exploitation.
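A real ROP analyzer would be a Java class plugged into Ghidra's analysis framework, but the core search can be sketched in a few lines of Python. This sketch only collects byte sequences ending in `0xC3` (`ret`); a proper gadget finder would then decode each candidate and drop those that don't disassemble cleanly.

```python
def rop_candidates(code, base=0, max_len=5):
    """List (address, hex bytes) for every sequence of at most max_len bytes
    that ends in a RET (0xC3). Byte-level search only; no decoding."""
    found = []
    for end, byte in enumerate(code):
        if byte != 0xC3:
            continue
        for start in range(max(0, end - max_len + 1), end + 1):
            found.append((base + start, code[start:end + 1].hex()))
    return found
```

Searching raw bytes rather than the existing listing matters on x86: a gadget can start in the middle of a legitimate instruction, so it never appears in the normal disassembly.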

Speaker 1

15:00

Okay, and you mentioned loaders earlier.

Speaker 2

15:02

Yes, custom loaders. If Ghidra doesn't recognize a specific file format, maybe some custom firmware image or packed executable, you can write a loader module to teach Ghidra how to interpret it and how to map it into memory for analysis.

Speaker 1

15:15

So you could write a loader for, maybe, shellcode embedded inside a Word document?

Speaker 2

15:20

Potentially, yes. If you can write code to extract the shellcode and tell Ghidra where to load it and how it's structured, then Ghidra can analyze it.

Speaker 1

15:27

And the deepest level of customization?

Speaker 2

15:30

That would be processor modules. These define how Ghidra actually disassembles instructions for a given CPU architecture. Ghidra uses its own language for this, called SLEIGH. You could potentially modify existing processor definitions, or even add support for a whole new processor, although that's quite advanced.

Speaker 1

15:48

Wow. And can you automate all this without the GUI?

Speaker 2

15:51

Yep, headless mode. You can run Ghidra from the command line using analyzeHeadless. You can tell it to import a file, run the auto-analyzers, run specific scripts, and output results, all without ever opening the graphical interface.

Speaker 1

16:03

Oh.

Speaker 2

16:03

Great for batch processing lots of files.
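Batch processing usually means a small driver around the `analyzeHeadless` launcher in Ghidra's `support` directory. A sketch, assuming an existing project directory and a post-analysis script that Ghidra can find on its script path (`-scriptPath` can add one). The options used (`-import`, `-postScript`, `-deleteProject`) are analyzeHeadless options; auto-analysis runs by default unless `-noanalysis` is given. `MyScript.py` in the test is a hypothetical name.

```python
import os
import subprocess

def headless_cmd(ghidra_home, project_dir, project_name, binary, post_script=None):
    """Build an analyzeHeadless command line: import a file, let auto-analysis run,
    optionally run a script afterwards, then delete the throwaway project."""
    launcher = "analyzeHeadless.bat" if os.name == "nt" else "analyzeHeadless"
    cmd = [os.path.join(ghidra_home, "support", launcher),
           project_dir, project_name, "-import", binary]
    if post_script:
        cmd += ["-postScript", post_script]
    cmd.append("-deleteProject")
    return cmd

def analyze_all(ghidra_home, project_dir, binaries, post_script=None):
    """Batch mode: one headless run per file, each in its own temporary project."""
    for i, path in enumerate(binaries):
        cmd = headless_cmd(ghidra_home, project_dir, "batch_%d" % i, path, post_script)
        subprocess.run(cmd, check=True)   # raise if Ghidra exits with an error
```

One project per file keeps runs independent, so a crash on one malformed sample doesn't poison the others.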

Speaker 1

16:05

So Ghidra isn't just a tool; it's really a platform. You can shape it, extend it, and automate it to tackle very specific problems.

Speaker 2

16:12

That's exactly right. You're not just using it, you can build upon it.

Speaker 1

16:15

Okay, let's talk about the real world, which, as you said, is messy. Reverse engineering isn't always done against clean, straightforward code. What about differences between compilers, or when software deliberately tries to hide what it's doing?

Speaker 2

16:31

Yeah, those are major hurdles. First, compiler variations. The exact same C++ code can produce wildly different assembly depending on the compiler, GCC versus Clang versus Microsoft's compiler, and even just the optimization level used, like -O2 versus a debug build. How different? Very different. A simple switch statement might become an efficient jump table or a slow

16:52

series of comparisons. Debug builds often have extra checks and symbol information, while release builds are usually optimized and often stripped.
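When a compiler does emit a jump table, recovering the case targets is mostly arithmetic. A sketch, assuming the layout GCC commonly uses for position-independent x86-64 code, where each entry is a signed 32-bit offset added to the table's own address; other compilers and modes store absolute addresses or use different entry widths. The addresses in the test are hypothetical.

```python
import struct

def resolve_jump_table(table_bytes, table_addr, count):
    """Resolve a switch table whose entries are signed 32-bit offsets relative
    to the table's own address. Returns {case_index: target_address}."""
    if len(table_bytes) < 4 * count:
        raise ValueError("table is shorter than count entries")
    offsets = struct.unpack_from("<%di" % count, table_bytes)
    return {case: table_addr + off for case, off in enumerate(offsets)}
```

The compare-chain version has no table to resolve at all; each case shows up as its own comparison and conditional jump.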

Speaker 1

17:00

What about optimizations like inlining.

Speaker 2

17:02

That's a big one. Function inlining can completely remove a function call by pasting the function's code directly where it was called. That makes following the logic harder, because the explicit call is gone.

Speaker 1

17:12

And C++ adds its own wrinkles.

Speaker 2

17:15

Oh yes, things like runtime type information, RTTI, used for features like dynamic_cast. Ghidra has analyzers for some RTTI formats, but it often relies on that name mangling we talked about. If a binary is stripped, meaning symbol information is removed, finding C++ structures gets much harder. Even finding the main function in a stripped binary requires knowing how the specific compiler's startup code sets things up.

Speaker 1

17:42

So just the compiler choices make a big difference. What about intentional roadblocks, anti-reverse engineering?

Speaker 2

17:48

That's the next level of challenge. Obfuscation is about making the binary intentionally confusing, in many ways: inserting junk code, instructions that don't actually do anything but look like real code, to confuse the disassembler, or using obscure jump targets, maybe jumping into the middle of another valid instruction. Sometimes they'll load libraries dynamically, using OS functions like LoadLibrary instead of

18:09

standard imports, hiding dependencies. Or they might use hashes of function names instead of plain strings, making simple string searches less effective.

Speaker 1

18:16

Trying to break the analysis tools.
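One widely documented name-hashing scheme is the ROR-13 hash from classic Windows shellcode: rotate the running 32-bit hash right by 13 bits, then add the next character. Whether a given sample uses it, or a variant that also mixes in the module name, is an assumption to verify. The resolving step is the same either way: hash a list of candidate API names and look up the constants found in the binary.

```python
def ror32(value, count):
    """Rotate a 32-bit value right by count bits."""
    value &= 0xFFFFFFFF
    return ((value >> count) | (value << (32 - count))) & 0xFFFFFFFF

def ror13_hash(name):
    """ROR-13 name hash: for each character, rotate right by 13, then add it."""
    h = 0
    for ch in name.encode("ascii"):
        h = (ror32(h, 13) + ch) & 0xFFFFFFFF
    return h

def resolve_hashes(hashes, candidate_names):
    """Map hash constants found in a binary back to API names by brute force.
    Unknown hashes map to None."""
    table = {ror13_hash(n): n for n in candidate_names}
    return {h: table.get(h) for h in hashes}
```

In practice the candidate list is every export of the DLLs the sample is likely to touch, and the recovered names can be written back into Ghidra as comments or labels.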

Speaker 2

18:20

And then there's anti-dynamic analysis: detecting if the program is running inside a virtual machine or a sandbox by looking for specific files, registry keys, or artifacts left by virtualization software like VMware Tools. If it detects one, the program might just refuse to run, or behave differently.

Speaker 1

18:37

Sneaky. And anti-debugging?

Speaker 2

18:41

Yep, actively trying to stop debuggers. They might intentionally trigger exceptions that debuggers handle differently, or, on Linux and Unix, they might use tricks with the ptrace system call, which debuggers like GDB rely on, to prevent a debugger from attaching in the first place.

Speaker 1

18:54

It really is an arms race. So, faced with all this, can Ghidra still help?

Speaker 2

18:58

It absolutely can. Static analysis is often key to defeating these techniques. For things like packed or encrypted code, where the real code is hidden initially, you can sometimes use Ghidra's scripting and emulation capabilities to simulate the unpacking or decryption routine.

Speaker 1

19:13

You run the deobfuscation code virtually?

Speaker 2

19:18

Sort of. You emulate just that small part, let it decode the hidden section in memory, and then you can tell Ghidra to reanalyze that now-decoded memory. It lets you statically analyze code that was initially hidden.
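Ghidra's emulator runs the decoder's actual instructions. A lighter-weight alternative, shown here as a swapped-in technique rather than emulation, is to re-implement a short decoding loop once static analysis has revealed what it does. The sketch assumes a hypothetical rolling-XOR decoder; the encoded bytes and key are made up for illustration.

```python
def xor_decode(blob, key):
    """Replay a hypothetical rolling-XOR decoder: byte i is XORed with key[i % len(key)]."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(blob))

# Hypothetical encoded bytes and key, as recovered by reading the decoder statically.
encoded = bytes([0x14, 0xCB, 0xA4, 0x81])
key = bytes([0x41, 0x42])
decoded = xor_decode(encoded, key)
print(decoded.hex())   # 5589e5c3: push ebp; mov ebp, esp; ret
```

Inside Ghidra, a script could then write the result back with the flat API's `setBytes` (after clearing any listing already defined there) and call `disassemble` on the start address, so the previously hidden code gets analyzed like the rest of the program.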

Speaker 1

19:28

Clever. And what about changing the binary? Can Ghidra help with patching?

Speaker 2

19:32

Yes. Patching binaries is another common use case. Maybe you want to change a string, bypass a license check (for educational purposes, of course), or fix a bug in a program where you don't have the source.

Speaker 1

19:42

How do you do it?

Speaker 2

19:43

Ghidra helps you find the spot you want to change with its search tools: searching memory for byte sequences or strings, or even searching for specific instruction patterns. Once you find it, you can use the Byte Viewer window, which acts like a hex editor, to modify the bytes directly, or use scripts for more complex patches.

Speaker 1

20:01

And then save the modified file?

Speaker 2

20:03

File > Export Program. That lets you save your changes back to a new executable file, though you need to be careful about things like file offsets and relocations.
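The file-offset caution matters because the Listing shows virtual addresses, while the bytes on disk live at file offsets. A sketch of translating one to the other through a section table and applying a same-length patch, here turning a `jz` (0x74) into an unconditional short `jmp` (0xEB). The section values are hypothetical, and real PE or ELF files add details (image base, alignment, relocations) that this ignores.

```python
from dataclasses import dataclass

@dataclass
class Section:
    name: str
    vaddr: int      # where the loader maps the section in memory
    vsize: int
    file_off: int   # where its bytes live in the file on disk
    file_size: int

def va_to_file_offset(sections, va):
    """Translate a virtual address (what the Listing shows) to a file offset."""
    for s in sections:
        if s.vaddr <= va < s.vaddr + min(s.vsize, s.file_size):
            return s.file_off + (va - s.vaddr)
    raise ValueError("address 0x%x is not backed by file bytes" % va)

def patch(file_bytes, sections, va, expected, replacement):
    """Return a patched copy, refusing to patch if the original bytes don't match."""
    if len(expected) != len(replacement):
        raise ValueError("patch must not change the length")
    off = va_to_file_offset(sections, va)
    if file_bytes[off:off + len(expected)] != expected:
        raise ValueError("unexpected bytes at 0x%x" % va)
    out = bytearray(file_bytes)
    out[off:off + len(replacement)] = replacement
    return bytes(out)
```

Checking the expected bytes before writing guards against patching the wrong spot when the address-to-offset mapping is off.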

Speaker 1

20:11

One last area: comparing files. What if you have two versions of a program, or want to see what changed in an update?

Speaker 2

20:17

Ghidra has excellent binary diffing tools for that. They're super useful for seeing what changed between versions, porting your analysis from an old version to a new one, or merging work with colleagues.

Speaker 1

20:29

What tools does it offer?

Speaker 2

20:31

There's Program Diff, which shows two files side by side in synchronized listing windows, highlighting differences in color. You can choose to ignore, replace, or merge changes. There's also function comparison, for just comparing two specific functions. And for more complex scenarios, Version Tracking helps correlate functions and data between different versions, even finding functions that are similar but not identical: fuzzy matches.
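Version Tracking combines many correlators; this sketch shows just two of the underlying ideas, exact matching first and then fuzzy similarity, using `difflib.SequenceMatcher` over mnemonic sequences. The function names and the 0.8 threshold are hypothetical.

```python
from difflib import SequenceMatcher

def match_functions(old, new, threshold=0.8):
    """Pair functions across two versions of a program.

    old/new: {function_name: [mnemonic, ...]}.
    Pass 1 pairs functions whose mnemonic sequences are identical (score 1.0);
    pass 2 pairs the rest by SequenceMatcher similarity if it reaches threshold.
    Returns {old_name: (new_name, score)}.
    """
    matches = {}
    unmatched_new = dict(new)
    by_body = {}
    for name, seq in new.items():
        by_body.setdefault(tuple(seq), []).append(name)

    for name, seq in old.items():                  # pass 1: exact matches
        for cand in by_body.get(tuple(seq), []):
            if cand in unmatched_new:
                matches[name] = (cand, 1.0)
                del unmatched_new[cand]
                break

    for name, seq in old.items():                  # pass 2: fuzzy matches
        if name in matches:
            continue
        best, best_score = None, 0.0
        for cand, cand_seq in unmatched_new.items():
            score = SequenceMatcher(None, seq, cand_seq).ratio()
            if score > best_score:
                best, best_score = cand, score
        if best is not None and best_score >= threshold:
            matches[name] = (best, best_score)
            del unmatched_new[best]
    return matches
```

In the test, `check` and `check_v2` differ by one inserted `xor`, which gives a similarity of 12/13: six shared mnemonics out of thirteen in total, doubled.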

Speaker 1

20:56

So it's not just analysis, but also modification and comparison. The whole life cycle.

Speaker 2

21:00

Pretty much. Ghidra provides a really comprehensive toolkit for dealing with binaries, from initial understanding right through to modification and tracking changes over time, even when facing obfuscation.

Speaker 1

21:11

It's incredible. From its somewhat shadowy origins inside the NSA to now being this vital open-source tool, Ghidra really does empower anyone to decode that digital core.

Speaker 2

21:21

It truly does. It goes way beyond just showing you bytes. It's an ecosystem for deep analysis, for active modification, for collaboration. That ability to pull together different kinds of information, assembly, decompiled code, cross-references, graphs, and look at the problem from multiple angles is just so valuable.

Speaker 1

21:40

So, thinking about all this, the complexity compilers add, the deliberate tricks used to hide code, it makes you wonder, doesn't it? What surprising things might Ghidra reveal if you pointed it at the everyday software you use, maybe even the OS on your phone or computer?

Speaker 2

21:54

It's a fascinating thought. You might be surprised what's really going on under the surface.

Transcript source: Provided by creator in RSS feed.
