File System Forensics

Aug 25, 2025 (43 min)

Episode description

Provides an extensive overview of file system forensics, primarily focusing on the technical aspects of analyzing various file systems common in digital investigations. It introduces Linux as a forensic platform, detailing its open-source advantages and essential commands for digital forensics. The text then explores the structure and analysis of different file systems, including FAT, ExFAT, NTFS, EXT2/3/4, XFS, Btrfs, HFS+, and APFS, explaining their on-disk structures, metadata, and data recovery techniques. Finally, the document addresses future challenges in digital forensics, such as new file systems and the complexities of live data forensics.

You can listen and download our episodes for free on more than 10 different platforms:
https://linktr.ee/cyber_security_summary

Get the Book now from Amazon:
https://www.amazon.com/File-System-Forensics-Fergus-Toolan/dp/1394289790?&linkCode=ll1&tag=cvthunderx-20&linkId=74dca1e73d7f60c2450486ba81e2d53a&language=en_US&ref_=as_li_ss_tl

Discover our free courses in tech and cybersecurity, Start learning today:
https://linktr.ee/cybercode_academy

Transcript

Speaker 1

Imagine this. You're watching a crime drama, right, and the detective they're dusting for fingerprints. Classic stuff.

Speaker 2

Yeah, you see it all the time.

Speaker 1

But honestly, in the real world today, that's almost quaint, like a rotary phone. Today, the fingerprints, they're digital, and they're...

Speaker 2

Everywhere, absolutely everywhere.

Speaker 1

We're talking the smartphone in your pocket, the satnav in your car, even your home CCTV that's running twenty-four seven. These digital traces, they used to be just for specific cybercrimes, but now they're found in almost every case. It's not just evidence anymore. It's really an explosion of it.

Speaker 2

It truly is. The sheer amount and, well, the variety of digital devices mean pretty much any incident, you know, from a minor theft all the way up to a major criminal investigation, leaves a digital footprint. Something that wasn't even really conceivable a few decades back.

Speaker 1

Exactly. And for you listening in, this deep dive is your shortcut to getting properly well informed about this hidden world. We're not just going to scratch the surface of what information is found. No, definitely not. Our mission here is to pull back the curtain on how it's stored, how it gets retrieved, and maybe most importantly, why understanding those hidden mechanics, well, why it fundamentally reshapes an investigation and maybe even our idea of digital truth.

Speaker 2

Yeah, and our deep dive today, it's built from sources that really focus on file system forensics. So we're going to give you a detailed look at how data is organized right down at its most fundamental level. We'll look at the essential tools investigators use to uncover it and the fascinating, sometimes pretty complex challenges that lie ahead in this field, because it's evolving so fast.

Speaker 1

Okay, so we've painted this picture digital evidence exploding everywhere. But here's where it starts to get tricky. The very basic rules for collecting this evidence they've been shifting under our feet. Let's dig into what we're calling the shifting sands of digital evidence, because what worked maybe ten years ago could actually compromise an entire case.

Speaker 2

Now, that's right.

Speaker 1

For years, the standard advice for seizing a computer in an investigation was just dead simple: pull the plug, just cut the power immediately. Yep, that was the gold standard.

Speaker 2

But today, doing that can be a huge mistake. Why is that old rule suddenly so, well, dangerous?

Speaker 1

What's fascinating, I think, is that modern operating systems and hardware have introduced features that directly conflict with that old advice. Think of it like this. For encrypted storage, right, the decryption keys often only live in the computer's volatile memory. Its RAM, basically, its short-term digital brain. Okay, so you pull the plug, those keys just vanish instantly, and all that data becomes a locked box, you know, permanently inaccessible.

It's like trying to open a safe after the combination's just been wiped from the manager's memory.

Speaker 2

Right, gone forever. And people aren't just saving things locally anymore, are they? They're using remote storage, cloud services, email, social...

Speaker 1

Media, constantly connected.

Speaker 2

So if you yank the power, all those live connections, all that access to potentially crucial data, it's instantly lost.

Speaker 1

Exactly. And this is why live data forensics, or LDF, has become so critical. It lets analysts capture that live data from a running computer system before it disappears. But hang on, that sounds like it completely contradicts one of the most fundamental principles of forensics, doesn't it?

Speaker 2

It absolutely does, and this raises a really critical point. The first principle in digital forensics, often called ACPO principle one, states that no action taken by law enforcement agencies should change data which may subsequently be relied upon in court. Okay, LDF, well, it inherently breaks this principle. I mean, even just moving a mouse on a running system leaves digital traces. So the tension there, it's very real.

Speaker 1

So, okay, if you're altering the data, how on earth can it be admissible in court? Doesn't that just open the evidence up to immediate challenge, say, that you changed it?

Speaker 2

Well, not necessarily. The principles themselves, they're actually still fit for purpose, but they kind of adapt. So while LDF does alter data, it can be admissible in court when you combine it with ACPO principle two, which emphasizes investigator competence, and principle three, which demands a really thorough audit trail.

Speaker 1

Ah okay, the paperwork.

Speaker 2

Essentially, yeah, it means every single step taken, every command run, every change made, it has to be meticulously documented. And it's that meticulous record keeping that allows the court to understand and hopefully trust, the context of the altered data. The integrity of the whole investigation now relies on understanding that fundamental shift in collection and then just rigorously documenting every single step.

Speaker 1

Right. Speaking of integrity, let's talk about maybe an unsung hero of digital forensics: Linux. It's often called an open source powerhouse in this field. But what does open source actually mean? Is it just about software you don't have to pay for?

Speaker 2

That's a common misconception. To really get open source, it helps to contrast it with closed source software. So imagine I write a simple Hello World program in C.

Speaker 1

Right, okay.

Speaker 2

With closed source, I'd give you the compiled executable file. You can run it, sure, but you can't see the underlying code. You can't change it. It's like a mystery box that just, well, does its thing.

Speaker 1

Right. You just trust it works.

Speaker 2

Exactly. With open source, though, I give you the actual C program file, the source code. You can read it, you can understand exactly what it does, and you can even modify it if you've got the skills. Richard Stallman, the founder of the Free Software Foundation, famously said that free in this context means free as in free speech, not free as in free beer.

Speaker 1

Ah. That's a great distinction.

Speaker 2

It is. So while it's often free of cost because of the licensing, the core idea is really the freedom to examine, use, and modify the code.

Speaker 1

That distinction is key. So why is this open source model, particularly Linux, such an advantage for forensics? It seems a bit counterintuitive that something anyone can tinker with would be more trustworthy, maybe.

Speaker 2

Well, it's precisely because anyone can modify it, or at least examine it, that it's often seen as more trustworthy. There are a few big reasons. First, there's community power. Open source projects often have these huge communities of users, developers, testers, all working together. This collective effort often leads to new features being introduced faster and, crucially, issues being resolved much quicker than with small proprietary teams.

Speaker 1

So it's not just about speed, then. Does that community scrutiny also directly help with the trustworthiness and accuracy of the tools, which must be absolutely paramount when potentially lives depend on them?

Speaker 2

That's exactly it. More eyes on the code, more brains solving problems. This leads directly to greater trust and correctness. With closed source software, you're essentially relying on the developers having got everything right, and we've all seen the infamous blue screen of death in Windows, for example. With open source, the community can review and fix the code at any point. That provides a lot more confidence in the tool's accuracy, which, as you say, is vital when people's lives might depend on the investigation's outcome.

Speaker 1

Makes sense.

Speaker 2

And yes, cost effectiveness is a major advantage too. Because of copyleft licensing requirements, it's actually quite difficult to sell open source software directly, so it's often free of cost. Now, companies can still offer services like training or customization around these products, but the core software itself is usually freely available. And lastly, specifically for forensics, Linux offers great support for many file systems by default, often much more than Windows or macOS natively support, which makes it a really ideal forensic workstation right out of the box.

Speaker 1

So when we talk about Linux as an operating system, what are its main parts? You hear about the kernel, but what else makes it a functioning OS?

Speaker 2

Yeah, good question. At its very heart is the kernel, which was created by Linus Torvalds. That's the bit that directly controls the hardware and manages the software. Then layered on top of that are the GNU utilities. These are standard programs that let users control files, run programs, that sort of thing. It's really the combination of the Linux kernel and these GNU utilities that forms the functional operating system we commonly just call Linux. Beyond that core, you've got graphical desktop environments, the visual interface most people see, and of course all the application software that end users are most familiar with, including those powerful forensic tools we've been mentioning.

Speaker 1

Okay, let's get practical then. What are some basic, really fundamental forensic commands in Linux that investigators are using day to day? These must be like the digital equivalent of a magnifying glass and dusting powder.

Speaker 2

Definitely. One of the most fundamental is hashing. This is absolutely crucial for ensuring data integrity.

Speaker 1

Okay, how does that work?

Speaker 2

Well, hashing algorithms like, say, MD5 or the SHA family, they create a unique digital fingerprint for any piece of data. If even a single bit is changed in a file, its hash value will change dramatically. It's like if you change just one letter in the entire collected works of Shakespeare, the hash would completely change, instantly confirming even the tiniest alteration.

Speaker 1

Wow. So if someone sends you a file and you calculate its hash, you can instantly verify it hasn't been tampered with since they calculated their hash. Yeah, that's powerful.

Speaker 2

It is very powerful. Now, while some smaller hashes like CRC-32 can sometimes experience hash collisions, that's where different inputs accidentally produce the same hash, using larger outputs like SHA-512, or maybe using multiple different algorithms together, greatly reduces that probability to almost zero.
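
To make the fingerprint idea concrete, here is a minimal Python sketch using the standard `hashlib` module. The helper name `digest` and the sample sentences are illustrative assumptions, not something taken from the episode or the book.

```python
import hashlib


def digest(data: bytes, algorithm: str = "sha256") -> str:
    """Return the hex digest of `data` using the named hashlib algorithm."""
    return hashlib.new(algorithm, data).hexdigest()


# Two inputs that differ in a single character (hypothetical sample data).
original = b"To be, or not to be, that is the question."
altered = b"To be, or not to be, that is the questioN."

for name in ("md5", "sha256", "sha512"):
    print(name, digest(original, name))
    print(name, digest(altered, name))
```

Comparing a digest computed on receipt with the one supplied by the sender is the verification step described above: any mismatch means the bytes differ.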

Speaker 1

Got it. What else?

Speaker 2

Another really useful tool is a hex viewer like xxd. This lets analysts examine raw binary data, byte by byte, things like a disk's partition table, for example. It's like looking at the absolute purest form of the computer's language, the ones and zeros represented compactly. This often requires root access, though, using the sudo command.
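
As a rough picture of what a hex viewer shows, the following Python sketch prints bytes as offset, hex, and printable text columns. It is a simplified stand-in written for illustration, not a reimplementation of xxd, and the function name `hexdump` and the sample bytes are assumptions.

```python
def hexdump(data: bytes, width: int = 16) -> list[str]:
    """Format `data` as lines of: offset, hex bytes, printable ASCII."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Non-printable bytes are shown as '.' in the text column.
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}: {hex_part:<{width * 3 - 1}}  {text}")
    return lines


# Hypothetical sample: some text, a few control bytes, then 0x55 0xAA.
sample = b"Hello, disk!\x00\x01\x02\x03\x55\xaa"
for line in hexdump(sample):
    print(line)
```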

Speaker 1

Okay, and then there's strings. I've heard that's a really, really powerful one for investigators. What makes it so special?

Speaker 2

It really is. The strings command is deceptively simple but incredibly useful. It displays all the printable ASCII characters it finds within any file, even binary files.

Speaker 1

So even in, like, an image file or a program?

Speaker 2

Exactly. Even if a file is an image or an executable program, strings can pull out any plain text that happens to be embedded within it. And when you combine it with egrep for text searching, it becomes a very powerful forensic tool for quickly finding keywords or phrases within potentially massive amounts of raw data.

Speaker 1

That sounds incredibly useful.

Speaker 2

And the -t option is particularly useful. It displays the byte offset, basically the address where the text is found. This lets an investigator navigate directly to that specific spot within the file or the disk image using other tools.
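
The idea of pulling printable text and its byte offset out of binary data can be sketched in a few lines of Python. This is an approximation for illustration only: the function name `find_strings`, the four-character minimum, and the sample blob are assumptions, and the real strings utility has more options than this.

```python
import re


def find_strings(data: bytes, min_len: int = 4) -> list[tuple[int, str]]:
    """Return (byte_offset, text) for each run of printable ASCII in `data`."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [(m.start(), m.group().decode("ascii")) for m in pattern.finditer(data)]


# Hypothetical binary blob with two embedded strings and one short run.
blob = b"\x00\x01secret-plan.docx\xff\xfe\x00abc\x00password=hunter2\x00"

hits = find_strings(blob)
for offset, text in hits:
    print(offset, text)

# Keyword filtering, in the spirit of piping the output through a text search.
matches = [(off, s) for off, s in hits if re.search("password", s)]
print(matches)
```

The offsets are what let an analyst jump to the same location in a hex viewer and inspect the surrounding bytes.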

Speaker 1

So it's not just about finding a keyword, it's about the context, seeing what's around it. I imagine that's crucial when you're dealing with huge amounts of data where a word might appear harmlessly in one place, but maybe sinisterly in another, all hidden away in binary code.

Speaker 2

Precisely. It's not just finding the needle in the haystack. It's like finding a specific pollen grain on that needle, and its precise digital address. It's truly granular work.

Speaker 1

That's incredible. Okay, that's a great segue into understanding the hidden language, how computers speak in ones and zeros. I mean, at the end of the day, it's all just zeros and ones, right? But how do they turn that into something we actually understand, like text or numbers?

Speaker 2

It is all zeros and ones. But how those zeros and ones are interpreted, that's the key. Computers, as you say, use the binary number system, base two. Humans, we generally use decimal, base ten, but in computing you'll very often encounter hexadecimal, or hex, which is base sixteen. It uses the digits zero through nine and then the letters A through F to represent the values ten to fifteen. Hexadecimal is simply a much more compact way to represent binary data for human eyes.

Speaker 1

Okay, so it's like a shorthand.

Speaker 2

Exactly. For example, the binary number 1011, that's one zero one one, is equivalent to eleven in our decimal system. In hex, it would be B. It just makes reading long strings of binary data much easier.
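
The same conversion can be checked with Python's built-in number formatting. The helper name `in_bases` is an assumption introduced only for this sketch.

```python
def in_bases(value: int) -> tuple[str, str, str]:
    """Return the binary, decimal, and hexadecimal spellings of `value`."""
    return format(value, "b"), format(value, "d"), format(value, "X")


value = int("1011", 2)          # parse the binary digits 1011
print(in_bases(value))          # binary, decimal, hex for the same number

# One hex digit always covers exactly four bits, so a byte is two hex digits.
print(format(0b11111111, "X"))
```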

Speaker 1

Okay, that makes sense. Then there's text. My computer understands what I type, but how does it turn the letter A into numbers and then back again?

Speaker 2

Ah, that's where character encodings come in. They basically assign a unique numerical code to each character: letters, numbers, symbols, everything. Like a secret codebook, kind of. Older encodings like ASCII and ISO 8859 were quite limited. They worked well for English but struggled with accented or special characters, say in Spanish or other European languages. They simply didn't have enough codes assigned for every symbol used across the world.

Speaker 1

Which is where Unicode and UTF-8 step in, I guess?

Speaker 2

Correct. Unicode is this huge standard that supports a vast range of characters from pretty much all the world's writing systems. It covers virtually every character you could imagine, and UTF-8 is a specific encoding method for Unicode. It's a variable-width encoding, which cleverly solves the storage inefficiency you'd get if every single character took up, say, four bytes. UTF-8 uses anywhere from one to four bytes per character.

Speaker 1

So it adapts.

Speaker 2

Exactly. This makes it the de facto standard for web page encoding and much more. What's really clever about UTF-8 is that standard English ASCII characters, A, B, C, one, two, three, are represented identically to their original ASCII form, taking up just one byte, but more complex characters like emojis or characters from other alphabets might take two, three, or four bytes. So it's incredibly efficient for common text, but flexible enough for global communication.
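
The variable width is easy to see by encoding a few characters and counting bytes. The sample characters below are arbitrary choices for illustration, and `utf8_bytes` is a hypothetical helper name.

```python
def utf8_bytes(ch: str) -> bytes:
    """Encode a single character as UTF-8."""
    return ch.encode("utf-8")


for ch in ["A", "ñ", "€", "😀"]:
    encoded = utf8_bytes(ch)
    print(repr(ch), len(encoded), encoded.hex(" "))

# Plain ASCII text is byte-for-byte identical under ASCII and UTF-8.
print("ABC123".encode("ascii") == "ABC123".encode("utf-8"))
```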

Speaker 1

That's smart. And what about time? How do computers keep track of that down to, you know, the millisecond or even nanosecond, which must be crucial for an investigation timeline?

Speaker 2

Time representation in computing is another fascinating area. Many systems, especially Unix-like systems like Linux and macOS, use what's called Unix time.

Speaker 1

Right, I've heard of that.

Speaker 2

It's measured as a number of seconds that have elapsed since midnight UTC on January first, nineteen seventy. That specific moment is known as the epoch.

Speaker 1

Okay, so just a big counter of seconds. But what if two things happen really fast, like within the same second, would they have the exact same time stamp? That could be a real problem for investigators trying to figure out the exact order of events, especially if automated processes are involved.

Speaker 2

It absolutely could be, and it was a limitation. While maybe not an issue for things happening at human speed, automated processes can access or modify many files within a single second. This meant older file systems like, say, ext2, which only had second-level granularity, couldn't always definitively say which event happened first if they occurred in the same second.

Speaker 1

So how do they fix that?

Speaker 2

Well, most modern implementations of Unix time, like you find in the ext4 file system, for example, now include a nanosecond subcomponent. Nanoseconds, yeah, billionths of a second. This significantly improves the granularity, allowing for incredibly precise ordering of events and file creation timestamps that can be absolutely crucial for building an accurate forensic timeline.
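
A small Python sketch shows how a seconds counter maps to a calendar date and how a separate nanosecond field breaks ties within the same second. The function name `describe`, the event labels, and the timestamp values are illustrative assumptions; this does not reproduce any particular file system's on-disk timestamp layout.

```python
from datetime import datetime, timezone


def describe(seconds: int, nanoseconds: int = 0) -> str:
    """Render Unix seconds plus a nanosecond part as a UTC timestamp string."""
    dt = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return f"{dt:%Y-%m-%d %H:%M:%S}.{nanoseconds:09d} UTC"


print(describe(0))  # the epoch itself

# Two hypothetical events in the same second, told apart by nanoseconds.
events = [
    ("file_b_modified", 1_700_000_000, 900_000_000),
    ("file_a_modified", 1_700_000_000, 100_000_000),
]
for name, sec, nsec in sorted(events, key=lambda e: (e[1], e[2])):
    print(describe(sec, nsec), name)
```

With second-level granularity only, both events would carry the same timestamp and their order could not be recovered from the timestamps alone.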

Speaker 1

That's a huge leap in precision. Okay, one more weird term before we move on. Endianness. That sounds like something out of Gulliver's Travels or, I don't know, a really obscure technical debate. What's that about?

Speaker 2

Huh, it does sound a bit strange, doesn't it? But it's actually crucial for correctly interpreting raw hex data off a disk. Endianness is just about the order in which computers store or read multi-byte numbers. Okay, how so? Imagine writing down a date. Do you write month, day, year like in the US, or day, month, year like in Europe? It's the same information, right, just a different order.

Speaker 1

Gotcha.

Speaker 2

Computers have a similar choice for numbers. Big endian is like writing one twenty-three. The most significant byte, the hundreds part, comes first. Little endian, which is more common on PCs, is like writing three twenty-one. The least significant byte, the ones part, comes first.

Speaker 1

So why does that matter for forensics?

Speaker 2

Because if you pull raw data off a disk, maybe from a critical system boot file or a timestamp field, and you don't know the reading order, the endianness of the system that wrote it, you'll completely misinterpret what those bytes actually represent. It could be the difference between seeing December fifth and May twelfth in a critical timestamp, just based on reading the bytes in the wrong order.

Speaker 1

Okay, crucial detail. Then, with that secret language sort of decoded, let's zoom out a bit from the individual bits and bytes to the actual landscape where all this data lives. We're moving to the disks, partitions, and file system fundamentals, the very architecture of digital storage. Starting with how computers organize information physically: what are the different types of storage, and which ones are most relevant for forensics?
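
The effect of reading the same bytes in the two orders can be demonstrated with Python's `int.from_bytes`. The four sample bytes are an arbitrary assumption chosen so the two readings differ clearly.

```python
# Four raw bytes as they might appear in a hex viewer: 00 02 00 00
raw = bytes.fromhex("00020000")

as_little = int.from_bytes(raw, byteorder="little")  # least significant byte first
as_big = int.from_bytes(raw, byteorder="big")        # most significant byte first

print(as_little)  # 512
print(as_big)     # 131072

# Writing a value back out shows the reversed byte layout.
print((512).to_bytes(4, "little").hex(" "))
print((512).to_bytes(4, "big").hex(" "))
```

The same four bytes yield 512 in one interpretation and 131072 in the other, which is why the byte order of a structure has to be known before its fields are decoded.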

Speaker 2

Right. Computer storage is usually classified into a few tiers: primary, secondary, tertiary, and offline. Primary storage is typically RAM, random access memory, and the key thing about RAM is that it's volatile, meaning all the information stored in it is lost as soon as the power is removed. This is exactly why live data forensics, LDF, is so critical. As we've discussed, RAM holds so many ephemeral details about what was just happening on the system: open documents, running processes, network connections, those encryption keys we mentioned.

Speaker 1

Stuff you lose if you just pull the plug.

Speaker 2

Precisely. Then you have secondary storage. This includes your traditional hard disk drives, HDDs, and the now very common solid state drives, SSDs. This is where most of our persistent data resides, the stuff that stays when the power is off.

Speaker 1

And SSDs, they're everywhere now because they're so fast and efficient, but they bring their own unique forensic headaches, don't they? I've heard they can be kind of a forensic investigator's nightmare compared to the old spinning hard drives.

Speaker 2

They absolutely do pose some unique SSD-specific challenges because of how they work fundamentally differently from HDDs. See, unlike HDDs, the flash memory components inside an SSD can only be written to a limited number of times before they wear out, so to extend the drive's lifespan, the SSD controller employs techniques like wear leveling. This involves intelligently moving data around independently of the operating system, just to make sure all the memory cells get written to roughly the same amount.

Speaker 1

So the controller is shuffling data behind the scenes.

Speaker 2

Exactly, which makes predicting the precise physical location of a specific piece of data for forensic analysis much harder. You can't just assume data stays put in one physical spot like you mostly could on an HDD.

Speaker 1

So data might not be where the operating system thinks it is. That sounds like a constant game of digital hide and seek.

Speaker 2

It can be, and it gets worse. Even more critically, when the operating system marks data as unallocated, which happens when you delete a file, modern SSDs use a function called TRIM. The OS basically tells the SSD controller, hey, we don't need the data in these blocks anymore.

Speaker 1

And the controller?

Speaker 2

The controller can then internally mark those blocks for erasure, often almost immediately, as part of its garbage collection routines. This means that deleted files are much less likely to be present and recoverable on SSDs than on traditional HDDs, where the data just sat there until overwritten.

Speaker 1

Wow, so deleting really means deleting much more often on an SSD.

Speaker 2

Often, yes. And here's the real kicker for forensics. These SSD controllers run their garbage collection routines in the background while the drive is powered on. This means that data is potentially changing on the device in question, even if it's just sitting there, plugged in as evidence, doing nothing from the OS perspective.

Speaker 1

Whoa. So the evidence is potentially altering itself.

Speaker 2

Exactly, which, as you can imagine, directly breaks the principle of not altering evidence. It's a fundamental conflict that investigators have to be acutely aware of when dealing with SSDs.

Speaker 1

That's a huge, huge challenge to the core ideas of forensics. Okay, so physical discs, whether HDD or SSD, they're often divided into partitions. Why do we do that? What's the purpose of these logical divisions?

Speaker 2

Partitions are essentially logical divisions of a single physical disk. They allow that one physical disk to be split into multiple logical areas. Each of these areas can then contain a different file system or even a different operating system. For example, you might have one partition for Windows and another for Linux on the same drive, or maybe a separate partition just for your user data.

Speaker 1

Okay, and there are different ways these partitions are laid out on the disk, right? Like MBR versus GPT. What's the practical difference there for someone investigating a system?

Speaker 2

Yes, MBR (Master Boot Record) and GPT (GUID Partition Table) are the two main schemes used to define how partitions are organized on a disk. MBR is the older standard. GPT is more modern and allows for, well, far more partitions on a single disk, and also provides greater space for storing partition information compared to the very limited space in the MBR.

Speaker 1

So more robust.

Speaker 2

Generally, yes. For an investigator, knowing which scheme is used tells you where to look on the disk for critical boot information and how that disk is logically structured overall.
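
To illustrate what "where to look on the disk" means for the older scheme, here is a minimal Python sketch that decodes the four partition entries of a classic MBR sector. It relies on the commonly documented layout (entries of 16 bytes starting at byte 446, signature 0x55 0xAA at bytes 510 and 511, little-endian fields); the function name `parse_mbr` and the synthetic sector are assumptions for demonstration, and GPT disks need different handling.

```python
import struct


def parse_mbr(sector: bytes) -> list[dict]:
    """Decode the non-empty primary partition entries of a 512-byte MBR sector."""
    if len(sector) < 512 or sector[510:512] != b"\x55\xaa":
        raise ValueError("missing MBR boot signature")
    partitions = []
    for index in range(4):
        entry = sector[446 + 16 * index: 446 + 16 * (index + 1)]
        # status, CHS start, type, CHS end, starting LBA, sector count
        status, _chs_a, ptype, _chs_b, lba_start, sectors = struct.unpack(
            "<B3sB3sII", entry
        )
        if ptype != 0:  # type 0 marks an unused slot
            partitions.append({
                "index": index,
                "bootable": status == 0x80,
                "type": ptype,
                "lba_start": lba_start,
                "sectors": sectors,
            })
    return partitions


# Synthetic sector with one bootable partition of type 0x07 (hypothetical values).
sector = bytearray(512)
sector[446:462] = struct.pack("<B3sB3sII", 0x80, b"\x00" * 3, 0x07, b"\x00" * 3, 2048, 204800)
sector[510:512] = b"\x55\xaa"

print(parse_mbr(bytes(sector)))
```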

Speaker 1

Got it. Let's zoom in again now to the core file system concepts. These are the real nuts and bolts of how files actually live on a disk, and understanding them can reveal hidden data. What's a cluster or block, and why does that basic unit matter so much?

Speaker 2

Right. A cluster, sometimes called a block, is the basic, smallest allocatable unit of storage within a file system. Think of it like building blocks. Even if you have a tiny file that's only, say, one byte in size, the file system has to allocate an entire cluster to store it. That cluster might be, for instance, 4,096 bytes.

Speaker 1

So one byte of data takes up 4,096 bytes of space.

Speaker 2

Exactly. The remaining 4,095 bytes in that cluster, the space between the end of the actual file data and the end of the cluster, that's known as slack space.

Speaker 1

And that's interesting because?

Speaker 2

Because this slack space isn't necessarily empty. It can contain data from previous files that happened to occupy this cluster before the current file was written there. It might hold fragments of old documents, emails, chat logs, residual evidence that was never fully overwritten. It can be a real gold mine for investigators.
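
The slack calculation is simple arithmetic, sketched here in Python. The function name `slack_bytes` and the 4,096-byte default are assumptions matching the example cluster size mentioned in the conversation; real volumes may use other cluster sizes.

```python
def slack_bytes(file_size: int, cluster_size: int = 4096) -> int:
    """Bytes left between the end of the file data and the end of its last cluster."""
    if file_size == 0:
        return 0
    clusters = -(-file_size // cluster_size)  # ceiling division
    return clusters * cluster_size - file_size


for size in (1, 4096, 4097, 10_000):
    print(size, slack_bytes(size))
```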

Speaker 1

Wow, okay, and unallocated space. That sounds like just empty space, But I have a feeling it's not always truly empty either.

Speaker 2

You're right, it often isn't. Unallocated space is simply disk space that isn't currently assigned to any active file by the file system. But crucially, when a file is deleted, especially on older HDDs, less so on TRIM-enabled SSDs, its content often isn't wiped immediately. The space is just marked as available. The actual data might still be physically present in that unallocated space, just waiting to be overwritten by new data eventually.

Speaker 1

Which is why you need a full copy.

Speaker 2

Exactly. This is precisely why forensic investigators always aim to create a bit-by-bit image of the device. This ensures that all the unallocated space is captured and can be analyzed later. If you just copied the active files, you'd miss all that potential evidence of deleted but still recoverable files.
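
The core of a bit-by-bit copy, reading every byte in order and hashing while copying, can be sketched as follows. This is only a toy using in-memory streams; the function name `image_stream`, the chunk size, and the fake device contents are assumptions, and real acquisitions involve considerably more than this.

```python
import hashlib
import io


def image_stream(src, dst, chunk_size: int = 1024 * 1024) -> str:
    """Copy every byte from `src` to `dst` and return the SHA-256 of what was read."""
    hasher = hashlib.sha256()
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        hasher.update(chunk)
    return hasher.hexdigest()


# Stand-in for a device: allocated data followed by leftover "deleted" bytes.
device = io.BytesIO(b"ACTIVE-FILE" + b"\x00" * 16 + b"old deleted text")
image = io.BytesIO()

source_hash = image_stream(device, image, chunk_size=8)
image_hash = hashlib.sha256(image.getvalue()).hexdigest()
print(source_hash == image_hash)
```

Because the copy works on raw bytes rather than on files, the leftover region is carried into the image along with the active data.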

Speaker 1

Makes sense. What about file fragmentation? Does that just make files slow to load, or does it complicate forensics too?

Speaker 2

File fragmentation happens when there isn't a single large enough contiguous area on the disk to store an entire file when it's first written or when it grows, so the file system has to split the file into multiple pieces, or fragments, and store them in different physical locations on the disk. While it can definitely affect performance, for forensics it means that to recover or analyze that file, you first have to find and correctly reassemble all those scattered fragments.

It's like putting together a jigsaw puzzle, sometimes with missing pieces or pieces from different puzzles mixed in. It adds complexity.

Speaker 1

Right. And something called copy-on-write, or COW? That sounds like maybe a way to preserve data rather than lose it.

Speaker 2

It is, in a way. Copy-on-write, COW, is a strategy used in many modern file systems like APFS and others. When a resource like a file block is about to be modified, instead of overwriting the original data directly, the file system first makes a copy of the original block and then writes the changes to the new block.

Speaker 1

Ah, so the old version sticks around for a while.

Speaker 2

Potentially, yes. It means the original data might still be present somewhere on the disk, offering the potential to discover earlier versions of artifacts or files. This is also fundamental to how file system snapshots work. They preserve a view of the file system at a specific point in time by referencing these older, unmodified blocks. These snapshots are great for backups but also incredibly valuable for forensic analysis, to see what the system looked like at a previous state.

Speaker 1

Very cool. Finally, in this section, what's RAID? I usually hear that mentioned in the context of big server storage or maybe backups.

Speaker 2

RAID stands for redundant array of independent disks. It's a technology that combines multiple physical disk drives into a single logical unit that the operating system sees as one big disk.

Speaker 1

Why do that?

Speaker 2

It can be done for various reasons. Sometimes it's just to create one large, consolidated file system from several smaller disks. That's RAID 0, which focuses on performance or size but offers no redundancy. Or, more commonly, it's done for redundancy and fault tolerance. For instance, RAID 1 mirrors data exactly across two or more drives. If one fails, the data is safe on the other. RAID 5 uses parity information striped across drives, allowing for a single drive to fail without any data loss.

Speaker 1

And for forensics?

Speaker 2

For forensics, analyzing a RAID array means understanding how the data is striped or mirrored across those multiple physical disks. You often need to image all the member disks and then use specialized software to reconstruct the original logical volume before you can even start analyzing the file system on top of it. It adds another layer of complexity.
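
The parity idea behind RAID 5 can be shown with XOR in a few lines of Python. This sketch covers only a single stripe with made-up block contents; the function name `xor_blocks` is an assumption, and real arrays rotate parity across disks and use layouts that reconstruction tools must detect.

```python
from functools import reduce
from operator import xor


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(xor, column) for column in zip(*blocks))


# One stripe: three data blocks (hypothetical contents) plus their parity.
d0, d1, d2 = b"AAAA", b"BBBB", b"CCCC"
parity = xor_blocks(d0, d1, d2)

# If the disk holding d1 is lost, XOR of the survivors and parity rebuilds it.
rebuilt = xor_blocks(d0, d2, parity)
print(rebuilt)
```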

Speaker 1

Fascinating. Okay, let's move on and take a quick tour of common file systems. These are like the different languages or organizational schemes that operating systems use to manage all these bits, bytes, clusters, and partitions we've been talking about. We'll start with the old, reliable FAT and its newer cousin, exFAT.

Speaker 2

Sure. The FAT (File Allocation Table) file system is, well, very old and relatively simple in its structure. It basically consists of only three main components: the boot sector, the file allocation table itself, which tracks cluster usage, and the directory entries. Because of its simplicity and wide compatibility, you still find it very commonly on removable media like USB drives and older SD cards.

Speaker 1

But it had limitations, especially with file size.

Speaker 2

Big time. FAT32, the most common version, famously had a four gigabyte limit for single files, which is tiny by today's standards. That's where exFAT comes in. It's a newer file system, also from Microsoft, designed specifically for larger removable media like modern high-capacity SD cards and USB drives.

Speaker 1

What's the key difference?

Speaker 2

A key improvement is its support for much larger files. exFAT can handle files up to, theoretically, one hundred and twenty-eight petabytes, that's PB, which is enormous. It achieves this partly by using eight-byte values to store file sizes, compared to FAT32's four-byte values, so it's much better suited for things like large video files on flash drives.
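
The link between field width and maximum file size is plain arithmetic, sketched below in Python. The helper name `max_value` is an assumption, the fields are treated as unsigned integers, and binary units (GiB, PiB) are used for the comparison.

```python
def max_value(field_bytes: int) -> int:
    """Largest unsigned integer that fits in a field of `field_bytes` bytes."""
    return 2 ** (8 * field_bytes) - 1


GIB = 2 ** 30
PIB = 2 ** 50

four_byte_limit = max_value(4)
eight_byte_limit = max_value(8)

print(four_byte_limit, four_byte_limit / GIB)   # just under 4 GiB
print(eight_byte_limit)

# An eight-byte size field is wide enough to hold a 128 PiB file size.
print(128 * PIB <= eight_byte_limit)
```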

Speaker 1

Okay, then there's the standard for Windows for a long time now, NTFS. What are its defining features? What makes it such a robust system? And what kind of hidden details might it hold for an investigation?

Speaker 2

Right. NTFS (New Technology File System) has been the default for Windows for decades now, and it's a much more complex and robust file system than FAT. One key feature is journaling.

Speaker 1

What does that do?

Speaker 2

Journaling means the filesystem records pending changes to its metadata in a log, or journal, before actually committing those changes to the main filesystem structures. This makes NTFS much more fault tolerant. If the system crashes mid-operation, it can use the journal to recover and ensure the filesystem structure remains consistent, reducing the risk of data corruption.

Speaker 1

That sounds important.

Speaker 2

Very. NTFS also supports something called alternate data streams, or ADS. This allows multiple separate streams of data to be associated with a single...

Speaker 1

...file name, like hidden attachments?

Speaker 2

Kind of, yeah. The main file data is in one stream, but you can attach other hidden streams. These aren't always visible through standard tools like Windows Explorer, so they have sometimes been used to hide information, maybe malware components or other data. Investigators definitely need to check for ADS.

Speaker 1

So it's like a hidden digital tag or a secret compartment within a file. Have investigators found surprising ways these ADS are used, maybe to trace a file's origin?

Speaker 2

Absolutely. For example, a common ADS you might encounter, maybe without realizing it, is Zone.Identifier. Windows often automatically attaches this stream to files downloaded from the Internet, and it records information like the original URL the file came from. That can be crucial evidence for tracing malware or suspicious documents back to their source.
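
As a rough illustration of what that stream holds, the content of a Zone.Identifier stream is a small INI-style text block, so once it has been extracted it can be parsed with Python's standard library. The sketch below assumes the stream text is already in hand; the sample URLs are made up, and not every downloaded file carries the URL fields.

```python
import configparser


def parse_zone_identifier(text: str) -> dict[str, str]:
    """Parse the INI-style text of a Zone.Identifier stream into a dict."""
    parser = configparser.ConfigParser(interpolation=None)  # URLs may contain '%'
    parser.optionxform = str  # keep key names such as HostUrl in original case
    parser.read_string(text)
    if not parser.has_section("ZoneTransfer"):
        return {}
    return dict(parser["ZoneTransfer"])


# Hypothetical stream content; ZoneId=3 denotes the Internet zone.
sample = (
    "[ZoneTransfer]\n"
    "ZoneId=3\n"
    "ReferrerUrl=https://example.com/downloads/\n"
    "HostUrl=https://example.com/downloads/report.pdf\n"
)
info = parse_zone_identifier(sample)
print(info.get("ZoneId"), info.get("HostUrl"))
```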

Speaker 1

Interesting. What else is key in NTFS?

Speaker 2

Well, the heart of NTFS is the Master File Table, or MFT. Think of it as a highly detailed database or library catalog for every single file and folder on the volume. Each file has an entry in the MFT.

Speaker 1

And what information is in that entry?

Speaker 2

Lots of metadata. It stores file attributes, including things like object IDs. These are unique identifiers (UUIDs) that can act like digital fingerprints, potentially linking a file back to the specific computer it was created on, even if the file itself has been copied around. An object ID includes a creation timestamp and sometimes even the MAC address of the network card of the creating machine.

Speaker 1

Wow.

Speaker 2

And the MFT also holds security descriptor attributes. These contain information like the owner and group security identifiers (SIDs) and the access control lists (ACLs), which define exactly who has permission to read, write, or execute the file, all crucial details for an investigation.
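
The object ID point can be sketched in a few lines of Python. A time-based (version 1) UUID embeds a 60-bit timestamp and a 48-bit node value that is often a MAC address, and the standard `uuid` module exposes both. The function name below is illustrative, and it assumes the identifier has already been read from the MFT; raw on-disk GUID bytes would typically be loaded with `uuid.UUID(bytes_le=...)`, which is an assumption about the extraction step rather than something covered here.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Version 1 UUID timestamps count 100-nanosecond intervals from this date.
UUID_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def decode_v1_object_id(object_id: uuid.UUID) -> tuple[datetime, str]:
    """Return (creation time, node/MAC string) for a version 1 UUID."""
    if object_id.version != 1:
        raise ValueError("only time-based (version 1) UUIDs carry this data")
    created = UUID_EPOCH + timedelta(microseconds=object_id.time // 10)
    mac = ":".join(f"{byte:02x}" for byte in object_id.node.to_bytes(6, "big"))
    return created, mac
```

Not every object ID is a version 1 UUID, and the node field is not guaranteed to be a real hardware address, so a decoded value is a lead to corroborate rather than proof on its own.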

Speaker 1

Okay, moving across to the Linux world now, we have the ext family of filesystems. How has that evolved over the years, and what does the latest version, ext4, offer investigators?

Speaker 2

Right, the ext family is native to Linux. It started with ext2, which was simpler and stored file metadata in structures called inodes. Each file or directory has an inode containing information like permissions, timestamps, and pointers to the actual data blocks.

Speaker 1

But ext2 lacked something important, right?

Speaker 2

Yes, ext2 lacked journaling, making it vulnerable to corruption if the system crashed during writes. So ext3 was developed, and its main addition was journaling, providing resilience similar to NTFS. ext3 also introduced HTree directory indexing, which was a significant performance improvement, allowing directories to efficiently handle millions of files, overcoming a major limitation of ext2 for large file systems.

Speaker 1

And ext4 is the modern standard now, right? What really makes it stand out, especially for forensics?

Speaker 2

Yes, ext4 is the default for many Linux distributions today and brought several important innovations. One major change was the introduction of extents. Instead of using individual block pointers for large files, extents define a starting block and a length, making storage much more efficient for large contiguous files and simplifying recovery.

Speaker 1

Any other key features?

Speaker 2

Yeah. Another one is inline storage. For very small files, ext4 can actually store the file's data directly within the inode structure itself, saving space and eliminating the need for separate data blocks entirely.

Speaker 1

Oh, interesting.

Speaker 2

And crucially for investigators, ext4 dramatically improved timestamp granularity, down to nanosecond precision. It also officially added support for a file creation timestamp, sometimes called crtime or btime, which was often missing or unreliable in older ext versions. This allows for incredibly detailed timelines of filesystem events, down to billionths of a second.
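
For a sense of how the nanosecond part is stored, ext4 pairs each classic 32-bit seconds field (for example `i_mtime`) with a 32-bit "extra" field (`i_mtime_extra`) whose low two bits extend the seconds range and whose upper thirty bits hold nanoseconds. The Python sketch below decodes such a pair under that documented layout; the function names are illustrative, and it assumes the two raw values have already been read from the inode.

```python
from datetime import datetime, timezone


def decode_ext4_timestamp(seconds: int, extra: int) -> tuple[int, int]:
    """Combine a signed 32-bit seconds field with its 32-bit 'extra' field.

    Returns (seconds since the Unix epoch, nanoseconds).
    """
    epoch_bits = extra & 0x3   # low 2 bits extend the seconds range
    nanoseconds = extra >> 2   # upper 30 bits carry the nanoseconds
    return seconds + (epoch_bits << 32), nanoseconds


def format_timestamp(seconds: int, nanoseconds: int) -> str:
    """Render the decoded pair as a UTC string with nine fractional digits."""
    base = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return f"{base:%Y-%m-%d %H:%M:%S}.{nanoseconds:09d} UTC"
```

A raw inode would normally be unpacked with something like `struct.unpack("<i", ...)` for the seconds field and `"<I"` for the extra field before calling the decoder.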

Speaker 1

Nanoseconds again. Okay, finally in our tour, Apple's next generation system, APFS. How does that compare, especially given Apple's big focus on security and user privacy?

Speaker 2

APFS (Apple File System) was introduced relatively recently to replace the older HFS+ filesystem on macOS, iOS, and other Apple devices. A big architectural change was moving to sixty four bit inodes. This vastly increases the theoretical number of files possible on a volume compared to HFS+, essential for handling the massive amounts of data on modern devices.

Speaker 1

Makes sense. What about features?

Speaker 2

Well, APFS was designed with modern hardware like SSDs in mind. It features robust built in encryption, which can be applied at the whole disk level or even per file. This obviously makes forensic analysis much more challenging if you don't have the decryption keys.

Speaker 1

A big hurdle.

Speaker 2

Definitely. However, APFS also has really robust snapshot creation capabilities built in. These are used by Time Machine for backups, but they also mean that older versions of the filesystem state might be easily accessible, which can be very useful for forensic analysis, allowing you to roll back and see previous file versions or system states.

Speaker 1

Interesting. Anything else unique?

Speaker 2

One other notable feature is space sharing. In APFS, multiple logical volumes can exist within a single physical container and share the underlying free space. This is flexible for users, but can make tracking exact data allocation and free space a bit more complex for forensic analysis compared to traditional fixed partitions.

Speaker 1

Wow, it's a lot of complex hidden infrastructure under the hood. So how do investigators actually get to all this data, these bits, bytes, inodes, and MFT records, without altering the original evidence or causing more problems like those SSD changes? What's in the investigator's toolkit for acquiring and analyzing digital evidence?

Speaker 2

Right. The first and absolutely critical step is forensically sound acquisition. The goal here is to create a perfect copy of the original storage device without changing the original in any way.

Speaker 1

How do they guarantee that?

Speaker 2

Primarily through the use of write blockers. These can be specialized hardware devices that sit between the investigator's computer and the evidence drive, physically preventing any write commands from reaching the evidence, or they can be software based, like specific settings in Linux that mount a device in read only mode at a very low level. The idea is to put that fragile artifact, the original drive, in a protective glass case, metaphorically speaking, before you even begin to study it.

Speaker 1

Okay. Essential step. And a core Linux command for making that copy is dd, I hear. That's incredibly powerful, but maybe a little dangerous if you're not careful.

Speaker 2

Yes. The dd command is a classic, very powerful Linux tool for creating raw images, which are exact bit by bit copies of a device or partition. It's sometimes nicknamed "data destroyer" because of what happens if you mix up the input and output.

Speaker 1

Uh-oh.

Speaker 2

Yeah, you have to be careful, but used correctly, it copies everything, every sector, including all the unallocated space and slack space we talked about earlier, which is vital. You can also use options like count and skip to copy only specific parts, allowing you to extract exactly the data you want. For example, you could use it to copy just the first five hundred and twelve bytes to get the master boot record, or just the blocks belonging to a specific partition table.

Speaker 1

Precise.

Speaker 2

Very. Although for efficiency, error handling, and adding metadata about the acquisition process, many forensic investigators now prefer specialized image formats like the Expert Witness Format (EWF), often created by tools other than just dd. These formats can compress data, handle read errors better, and store case information alongside the image data.

Speaker 1

Okay, so once they have that pristine, forensically sound image, that bit for bit copy, how do they start actually picking it apart, finding those hidden files, deleted remnants, or specific pieces of metadata?

Speaker 2

That's where analysis tools come in. A very widely used suite, especially in the open source world, is The Sleuth Kit, often abbreviated as TSK.

Speaker 1

Okay, what does that do?

Speaker 2

TSK is actually a collection of command line tools designed specifically for filesystem forensics. For example, a tool called fsstat can analyze the image and determine the filesystem type (NTFS, ext4, FAT, et cetera) and provide overall information about the volume, like block size and total blocks.

Speaker 1

So like an initial assessment?

Speaker 2

Exactly. Then there's fls, which is used to list files and directories within the filesystem image. Crucially, fls can often list deleted files that still have metadata entries, and it can even list the low level filesystem structures themselves, like inodes in ext or CNIDs in HFS+, et cetera, showing you where things used to be or how they're...

Speaker 1

...organized. And getting the actual data for that?

Speaker 2

You'd use tools like istat to recover specific file metadata (timestamps, permissions, sizes, block pointers) by referencing its inode number or MFT entry ID, and then icat is used to recover the actual file content itself, essentially concatenating together the data blocks associated with that specific file or structure.
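
As a sketch of how those tools fit together, the Python helper below only builds the command lines for fsstat, fls, istat, and icat, with an optional sector offset passed via `-o`. The image name `evidence.dd` and the metadata address 12 are placeholders, and the helper names are invented for illustration; actually running the commands requires The Sleuth Kit to be installed.

```python
import subprocess


def tsk_command(tool: str, image: str, *positional: object,
                options: tuple[str, ...] = (), sector_offset: int = 0) -> list[str]:
    """Build an argument list of the form: tool [options] [-o offset] image [args]."""
    cmd = [tool, *options]
    if sector_offset:
        cmd += ["-o", str(sector_offset)]  # partition start, in sectors
    return cmd + [image, *(str(arg) for arg in positional)]


def run_tsk(cmd: list[str]) -> bytes:
    """Run a prepared command and return its raw standard output."""
    return subprocess.run(cmd, check=True, capture_output=True).stdout


# Placeholder image name and metadata address, for illustration only.
assess = tsk_command("fsstat", "evidence.dd")
deleted = tsk_command("fls", "evidence.dd", options=("-r", "-d"))
metadata = tsk_command("istat", "evidence.dd", 12)
content = tsk_command("icat", "evidence.dd", 12)
```

Keeping command construction separate from execution makes it easy to log exactly what was run, which fits the audit trail requirements discussed elsewhere in the episode.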

Speaker 1

So TSK can also help put together timelines and maybe even recover deleted files more automatically?

Speaker 2

Indeed, TSK includes tools like mactime, which can parse various timestamps collected from across the filesystem metadata, like modified, accessed, and changed times from inodes or MFT entries, and generate incredibly detailed timelines of activity. There's also blkls for extracting blocks from the unallocated space, making it easier to search that space for remnants of deleted data, and tsk_recover, which attempts to automate the recovery of deleted files by finding their orphaned metadata and associated data blocks.
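
The timeline idea can be sketched without mactime itself. The Python below is a simplified stand-in, not the TSK implementation: it assumes pipe-delimited "body file" lines with eleven fields (MD5, name, inode, mode, UID, GID, size, atime, mtime, ctime, crtime), skips zero timestamps, and ignores the edge case of file names that themselves contain a pipe character. The sample lines are invented.

```python
TIME_FIELDS = ("atime", "mtime", "ctime", "crtime")


def timeline_from_body(lines: list[str]) -> list[tuple[int, str, str]]:
    """Turn body-file lines into sorted (timestamp, field, name) events."""
    events = []
    for line in lines:
        parts = line.rstrip("\n").split("|")
        if len(parts) != 11:
            continue  # not a well-formed body line for this simple parser
        name = parts[1]
        for label, raw in zip(TIME_FIELDS, parts[7:11]):
            if raw.isdigit() and int(raw) > 0:
                events.append((int(raw), label, name))
    events.sort()  # chronological order; ties fall back to field, then name
    return events


# Invented sample entries.
body = [
    "0|/docs/report.txt|12|r/rrw-r--r--|1000|1000|2048|1700000300|1700000200|1700000200|1700000100",
    "0|/tmp/notes.txt (deleted)|13|r/rrw-r--r--|0|0|10|0|1700000150|1700000150|0",
]
for when, field, name in timeline_from_body(body):
    print(when, field, name)
```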

Speaker 1

What if the metadata is completely gone, though, the filesystem structures are corrupted, but maybe the raw data for, say, a picture is still sitting out there in unallocated space?

Speaker 2

Ah, that's where a technique called data carving comes in.

Speaker 1

That sounds like digital archaeology, like trying to reconstruct a shattered vase from its unique edge patterns.

Speaker 2

That's a great analogy. Data carving works by bypassing the file system entirely and instead searching for known file signatures directly within the raw data stream of the image. These signatures are unique sequences of bytes that typically mark the beginning (header) and sometimes the end (footer) of specific file types. For example, JPEG image files almost always start with the hex bytes 0xFFD8 and end with 0xFFD9.

Speaker 1

So you just scan the whole image for those patterns?

Speaker 2

Essentially, yes. Carving tools scan the raw data looking for these known headers and footers, and then extract the data in between as a potential file. It's like looking for specific patterns of color and texture to identify pieces of that shattered vase. However, it's important to know that data carving is not fully reliable. Files can be fragmented, meaning the header and footer might be separated by unrelated data, or the footer might be missing entirely. Also, those byte patterns might occasionally appear randomly within other unrelated data, leading to false positives. So carved files always need careful validation.
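
A bare-bones version of that header and footer scan might look like the Python sketch below. It assumes contiguous files, pairs each header with the next footer it finds, and uses the slightly longer header FF D8 FF (the start-of-image marker followed by the first byte of the next marker) to cut down on false hits. The function name and size cap are illustrative, and, as noted above, anything it returns still needs validation.

```python
JPEG_HEADER = b"\xff\xd8\xff"  # start-of-image marker plus next marker prefix
JPEG_FOOTER = b"\xff\xd9"      # end-of-image marker


def carve_jpegs(data: bytes, max_size: int = 20 * 1024 * 1024) -> list[tuple[int, bytes]]:
    """Return (offset, candidate bytes) for each header..footer span found.

    Assumes unfragmented files; candidates larger than max_size are skipped.
    """
    carved = []
    pos = 0
    while True:
        start = data.find(JPEG_HEADER, pos)
        if start == -1:
            break
        end = data.find(JPEG_FOOTER, start + len(JPEG_HEADER))
        if end == -1:
            break  # no footer anywhere after this header
        end += len(JPEG_FOOTER)
        if end - start <= max_size:
            carved.append((start, data[start:end]))
            pos = end
        else:
            pos = start + 1  # implausibly large span; try the next header
    return carved
```

Because JPEG data can legitimately contain the footer bytes inside embedded thumbnails, a candidate may be truncated; opening each carved file in an image decoder is one simple validation step.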

Speaker 1

Makes sense. Okay, looking ahead now, with all this technology evolving so incredibly rapidly, SSDs, new file systems, encryption, what are some of the biggest things on the horizon, the future challenges in digital forensics? What keeps investigators up at night?

Speaker 2

Well, probably the single biggest challenge is simply the data volume problem. We've seen this vast increase in the quantity of digital evidence: phones with terabytes of storage, cloud accounts, IoT devices. It's everywhere, right? This creates a kind of vicious cycle, because the resources available for analysis (skilled human analysts, processing power, storage for forensic images) haven't kept pace. More data with limited resources to handle it efficiently means backlogs grow and investigations can slow down.

Speaker 1

And new file systems keep popping up too, right? Yeah, constantly changing the rules of the game.

Speaker 2

Absolutely. New file systems, like APFS when it first appeared and even ext4 historically, often require significant reverse engineering by forensic researchers, especially if official documentation is scarce or nonexistent. This takes a huge amount of time and specialized skill, and there's always a risk of misinterpretation, which has obvious implications if the findings are presented in court.

Speaker 1

And you mentioned live data forensics earlier, LDF. That comes with its own set of ongoing challenges too.

Speaker 2

Yes, definitely. Live data forensics revisited is a constant topic. We already talked about how it inherently breaks ACPO Principle 1 by altering data, making it impossible to realize that principle fully. But there are other...

Speaker 1

...risks too, such as the system crashing maybe, or that crucial audit trail we talked about?

Speaker 2

Exactly. System crashes are a real possibility, especially when performing intrusive actions like acquiring the contents of RAM directly, which can sometimes destabilize a running system. And maybe more fundamentally, LDF is inherently non-repeatable. Unlike dead-box forensics, where you can rerun analysis commands on a static image multiple times and get the same result, LDF changes the live system with every action. You can't perfectly recreate the state later.

Speaker 1

So the documentation becomes even more critical.

Speaker 2

Absolutely paramount. That documentation, the ACPO audit trail, becomes the only way to verify the process, to show what was done, when, and why, because you can't simply rerun the experiment.

Speaker 1

Okay, and then there's the elephant in the room for any digital investigation today: encryption. It's fantastic for user privacy, obviously, but it can be a massive hurdle, sometimes a complete dead end, for an investigation.

Speaker 2

Encryption really is the classic double-edged sword. It protects legitimate users' privacy and security, which is essential, but it equally protects criminal communications and data from investigators if they can't get the keys.

Speaker 1

Are there any proposed solutions?

Speaker 2

Well, ideas like key escrow systems have been proposed, where decryption keys would perhaps be held by a trusted third party under specific legal conditions, but these face huge technical and ethical challenges. For one, criminals could simply choose to use other encryption software or methods that aren't part of the escrow system. And second, there are massive privacy concerns about governments or other entities having potential access to everyone's encrypted data. It's a really difficult balance.

Speaker 1

So we're kind of back to square one on that.

Speaker 2

In many ways, yes. It remains a major challenge. And finally, sort of tying a lot of this together, there's a significant ongoing need for better standardization and tool testing in the field, meaning developing standardized corpora, basically large, well defined data sets of known digital evidence containing specific artifacts. These could then be used to rigorously test and validate the accuracy and reliability of different forensic tools and techniques. Having such standard test sets would greatly increase the tools' acceptance in the courts and build more confidence in forensic findings overall, because results could be consistently replicated and verified across different tools and labs.

Speaker 1

Wow, what a journey that was. We've really delved deep into the hidden structures of file systems, haven't we? From the absolute basics of bits and bytes, through the intricate designs of different operating system formats like NTFS, ext4, and APFS, and then we explored the cutting edge techniques investigators use to try and uncover digital truths from all that complexity. It's a world that is just constantly, constantly changing.

Speaker 2

It really is. The evolution is relentless. Every new app, every new device, every new way we interact digitally, it creates new challenges, but also potentially new opportunities for digital forensics. And as that digital world keeps expanding, so does this invisible landscape of information hidden beneath our everyday interactions. It's honestly a constant race just to keep up to understand the new hidden languages that are always emerging.

Speaker 1

So here's something to think about as we wrap up. Given how deeply digital evidence now intertwines with almost every aspect of our lives, how might our growing understanding of these invisible file systems, the stuff we talked about today, reshape the very definition of truth in a modern investigation? And maybe more intriguingly, what new hidden languages, what new forms of digital evidence, might emerge next that will challenge even our current forensic capabilities? Think about that for a moment. This knowledge isn't just for the tech experts or the investigators. It's really about understanding the very fabric of our digital existence and how those unseen layers can reveal, or sometimes conceal, the most profound stories of our time.

Transcript source: Provided by creator in RSS feed