#375 Max: The AI Illusion (Why ChatGPT & Claude are "Pretending" to Read Your Docs)

00:00

You upload a massive 300-page PDF into an AI. You expect it to really read every single word. It spits out this beautifully formatted, deeply confident summary. It looks perfectly organized. It looks flawless. Right. But what if the AI didn't actually read it? Right. What if it just pretended to? That is a truly terrifying thought. Okay, let's unpack this. Welcome to today's deep dive. Yeah, I'm really excited for this one. It fundamentally shatters that blind trust we've

00:28

all built up. We routinely trust these digital oracles to summarize our most important documents. We just blindly assume they process every single page. But today we are exploring a landmark guide. Max Ahm published it in March 2026. It covers context rot and long-document risks. It changes exactly how you view AI. We have a very clear roadmap for you today. First, we explore a fascinating hacker experiment. One involving the Harry Potter books. Exactly. Then we uncover why AI models

00:58

just glaze over. They literally ignore the middle of your documents. Finally, we give you a specific framework. It is called the Divide and Conquer Strategy. This framework will actively protect your hard work. Let's start with that hacker experiment. This took place in early 2026. Researchers went on Hacker News with a brilliant plan. They fed all seven Harry Potter books into an AI. That is well over 1 million words of text. They used Claude 4 and GPT 5.2 for this test. They

01:27

asked the AI for every single spell. They wanted a perfectly complete list from the books. The models did an absolutely amazing job. They pulled 407 spell references. They found 82 unique spells in total. The output looked completely flawless, neatly organized by chapter and character. It looked like a perfect, comprehensive analysis, but there was a massive catch. Yeah. The researchers had laid a very clever trap. What's fascinating here is the specific setup. Before running the

01:54

test, they modified the books. They secretly inserted two fake spells. They put them right into the actual text. They didn't just drop them in randomly. Right, and they engineered them to fit the story perfectly. They made them look incredibly authentic. The first fake spell was called Fumbus, which makes a target float exactly one inch. The second fake spell was called Driplo. It causes rain on one specific person. They hid them seamlessly in the story. Harry used them

02:22

in very believable, mundane moments. They felt exactly like real scenes. But the AI models missed them completely. Not a single one noticed them in the text. Why did they miss them entirely? It all comes down to training data bias. Pure memorization. They were not actually reading your file. They were just reciting what they learned in training. Fumbus and Driplo didn't exist anywhere online. They were not in any fan wikis. They weren't in the original training

02:49

data. The models couldn't recall them from memory, so they completely ignored them in the uploaded text. This was backed up by a 2025 study. Stanford and Yale researchers tested this deep memorization. They wanted to see exactly how strong it was. The results were absolutely stunning. The models have the first book deeply memorized. You give them just the opening sentence of chapter one. Then they reproduce the rest of the book. They hit up to 96% accuracy. Whoa, imagine reciting

03:15

a whole book from one sentence. It is truly mind-blowing to think about. The AI is essentially hallucinating its own accuracy. Right. It recalls historical patterns instead of analyzing your actual text. So is the AI actually thinking about the document we give it or just matching patterns? It is entirely matching patterns from its training. It relies on its vast memory bank rather than actively comprehending the new text in front of it. So it's not thinking, just reciting

03:43

memorized patterns. But that brings up a massive counter-argument. What about documents the AI has never seen before? Right. Harry Potter is literally everywhere on the internet. It is heavily trained, ubiquitous data. But what about a private legal contract? Or a brand new internal financial report? The AI has to genuinely read those, right? In 2025, researchers tested this exact scenario. They created brand new documents totally from scratch and filled them with random, unseen information.

04:11

They completely removed any chance of prior memorization. Then they hid specific facts inside the text. They didn't just place these facts randomly. They specifically engineered the test. They put some facts at the beginning of the file. They buried others deep in the middle. And they put a few near the very end. It was a classic needle in a haystack test. They asked the models to retrieve the hidden facts. This revealed a very real physical limitation. A phenomenon known

04:38

as context rot. This happens when documents exceed 100k tokens. Just to clarify the jargon for a moment. Sure. Tokens are simply pieces of words the AI reads and processes. Exactly. When documents get that massive, transformer models suffer. They develop a severe U-shaped attention span. Chroma research mapped this exact pattern out clearly. As the token count grows, retrieval measurably declines. This is a very predictable curve. At the beginning, the AI's recall is very

05:10

strong. At the end, the recall is still decent. But the middle is a complete disaster. The AI's attention significantly degrades over those middle pages. It simply glazes over the central sections. I still wrestle with prompt drift myself. I always worry it missed something crucial in the weeds. It is a totally valid fear. The deeper the information sits, the worse it gets. Smaller details fade away entirely in the middle. The Harry Potter experiment suffered

05:36

from this too. Yeah. Even if the model tried to actually read it. The fake spells were buried deep in the middle. Context rot made them extremely difficult to find. Why? Why is the middle of a document so uniquely vulnerable to this fading attention? The underlying math of transformer models heavily prioritizes the edges of the provided context window. The AI naturally prioritizes the edges, ignoring the middle. Yeah. Here's where it gets really interesting. A lot of people

06:03

point to a specific tech fix. They say RAG is the ultimate cure-all here. It is a very popular acronym right now. It stands for Retrieval Augmented Generation. Let us define RAG simply for everyone. Okay. Searching small document chunks to help the AI answer your question. That is a perfect explanation. Instead of feeding the whole document, you break it up. You convert those smaller chunks into number patterns. We call those patterns embeddings. Embeddings are just translating text

06:31

into numbers for the AI to understand. Exactly. It's kind of like stacking Lego blocks of data. Then you retrieve only the most relevant chunks. You pass those specific chunks back to the AI. It sounds like a truly perfect solution. It is a brilliant workaround in theory. But it completely fails on very broad requests. Imagine asking the AI to summarize all document risks. This is a very open-ended, extremely broad question. Right. And that completely breaks the RAG system.
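
To make the chunk-and-retrieve idea above concrete, here is a minimal sketch. It is not how any particular product works: real systems use learned embedding models, while this toy version assumes a simple word-count vector as a stand-in. The names `embed`, `cosine`, and `retrieve`, and the sample contract chunks, are all invented for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(chunks, question, top_k=2):
    """Return the top_k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(embed(c), q), reverse=True)
    return ranked[:top_k]


# Hypothetical document chunks, made up for this example.
chunks = [
    "The supplier shall deliver goods within thirty days of the order.",
    "Liability for late delivery is capped at ten percent of the contract value.",
    "Either party may terminate the agreement with sixty days written notice.",
]

# A narrow question lands on the right chunk.
top = retrieve(chunks, "What is the liability cap for late delivery?", top_k=1)

# A broad question shares no words with any chunk, so every score is zero
# and the ranking carries no signal at all.
broad = embed("Summarize all document risks")
broad_scores = [cosine(embed(c), broad) for c in chunks]
```

The narrow question retrieves the liability chunk, while the broad request scores zero against everything. Learned embeddings are far better than word counts, but the same shape of problem remains: a vague query gives the search little to match on.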

07:01

The vector search just panics. There are three main ways RAG fails here. First, the system returns far too many chunks. This causes a mini version of context rot. You just have chunks instead of full pages. The AI still glazes over the middle. Second, the system returns far too few chunks. The search simply doesn't know what to grab. So highly relevant information gets missed entirely. Third, it misses ambiguous search terms completely. The concept might not map cleanly

07:28

to a chunk. If it misses the chunk, it misses the fact entirely. RAG is definitely a very smart tool, but it clearly struggles with massive, comprehensive retrieval tasks. When is RAG actually useful for the average person trying to analyze a large document? It shines brilliantly when you are looking for highly specific, narrow facts in well-organized files. It excels at specific facts, but fails broad summaries. We are back.

07:59

Let us talk about who is actually at risk here. Who really gets hurt by this? If we connect this to the bigger picture, the real-world stakes are incredibly high right now. This is not just an academic coding curiosity. No, it is a massive professional liability for many. Lawyers rely heavily on these AI tools. They upload long contracts to spot liability issues. Imagine a critical clause buried on page 47 of an 80-page legal document. The model easily misses it due

08:24

to context rot. But the rest of the analysis looks completely clean. Medical professionals face the exact same danger. They analyze massive patient histories with AI assistance. A buried contraindication could be overlooked entirely. A critical warning gets lost because the AI glazes over. Financial analysts are highly vulnerable, too. They scan 300-page PDFs for buried risk disclosures. Missing one paragraph can ruin an entire financial deal. Compliance teams and researchers

08:51

also struggle heavily. They review huge regulatory filings and massive data sets. Missing middle sections changes their entire conclusion. This brings us to the most dangerous failure mode. The 2026 guide calls it the polished omission. This is truly terrifying to think about. In 2026, models are incredible at formatting text. They present beautifully structured, numbered lists. They feel deeply authoritative and utterly complete. But they are actually missing critical information.

09:21

A bizarre hallucination is actually quite safe. Because you naturally question something that sounds crazy. Yeah. You check the facts immediately. You verify the bizarre claim without hesitation. But a beautifully formatted list bypasses human defenses completely. It looks absolutely perfect on the screen. It feels incredibly thorough and completely accurate. But it is simply missing two critical items. Almost complete is highly

09:44

dangerous in the real world. Why does our psychology allow us to trust formatted text so easily? We inherently associate neat organization and confident presentation with thorough accuracy and actual human competence. Beautiful formatting tricks our brains into assuming total accuracy. Exactly. So what does this all mean? How do we actually fix this problem? You don't have to abandon AI completely. You just need to use it intelligently. The guide outlines a very clear

10:12

playbook for you. It is called the Divide and Conquer Framework. Step one. Never upload a 300-page document all at once. Split it up into much smaller pieces. Break the massive document into 20-page sections. Analyze each section separately to reduce context overload. Step two. Ask highly targeted questions. Give the AI extremely narrow, specific scopes. Say something like, focus on pages 20 to 35 for liability clauses. Narrow prompts drastically reduce reasoning errors.
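
The splitting and scoping steps above can be sketched in a few lines. This is only an illustration under stated assumptions: the page text is assumed to be already extracted into a list with one string per page, and the helper names `split_into_sections` and `build_prompt`, as well as the prompt wording, are invented here rather than taken from the guide.

```python
def split_into_sections(pages, pages_per_section=20):
    """Group a list of page texts into (first_page, last_page, texts) sections."""
    sections = []
    for start in range(0, len(pages), pages_per_section):
        end = min(start + pages_per_section, len(pages))
        # Page numbers are 1-based for human readers.
        sections.append((start + 1, end, pages[start:end]))
    return sections


def build_prompt(first_page, last_page, topic):
    """Build a narrow, page-scoped question for one section."""
    return (
        f"Focus only on pages {first_page} to {last_page}. "
        f"List every {topic} you find, with its page number."
    )


# Placeholder page texts standing in for a 300-page document.
pages = [f"text of page {n}" for n in range(1, 301)]

sections = split_into_sections(pages)
prompts = [build_prompt(first, last, "liability clause") for first, last, _ in sections]
```

Each section would then be sent to the model together with its own prompt, so no single request has to cover the whole file.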

10:40

Step three, cross-validate your final results. Run the exact same query through Claude. Then run it through GPT. Differences between the outputs will reveal missed details. Their blind spots are different. It is a really great safety net. Step four, spot check the completeness of the output. Test the AI with items you already know exist. Make sure the AI actually found them. Step five is the most important of all. Get a final review. AI is the first set of eyes. It

11:08

should never be the last set of eyes. You must verify everything yourself. Especially in high-stakes documents like contracts or medical records. What is the time trade-off for doing all this manual dividing and cross-checking? It definitely takes longer than a single click, but it completely prevents catastrophic professional failures. It takes more time but saves you from critical errors. We have covered a lot of important ground today. Let us briefly

11:33

recap the core insights for you. AI tools are incredibly powerful assistants, but they suffer from two massive systemic flaws. Training data bias and severe context rot. They heavily memorize famous data like Harry Potter. They often pretend to read your actual files. And their attention predictably fades in the middle of long texts. The professionals who win don't trust AI blindly. They understand exactly where the systemic cracks are. They build smart workflows around those

12:02

physical limitations. They divide and conquer their large files. They stay actively involved in the final review. This raises an important question. If the most dangerous thing an AI can do is give us a perfectly polished, beautifully formatted half-truth, how do we train our own brains to be naturally skeptical of things that look flawless? That is something you should definitely chew on today. Try the

12:29

20-page chunk method yourself. Use it on your very next big project. See the massive difference it makes in accuracy. Thank you for joining us on this deep dive. Stay curious out there.
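
Steps three and four of the framework, cross-validation and spot checking, can also be sketched in code. This is a rough illustration, not a prescribed method: it assumes each model's answer has already been parsed into a plain list of items, and the function names and the sample model outputs below are made up for the example.

```python
def normalize(items):
    """Lowercase and trim items so trivial formatting differences are ignored."""
    return {item.strip().lower() for item in items}


def cross_validate(output_a, output_b):
    """Compare two models' item lists and report where they disagree."""
    a, b = normalize(output_a), normalize(output_b)
    return {"agreed": a & b, "only_a": a - b, "only_b": b - a}


def spot_check(output, known_items):
    """Return the known items that are missing from a model's output."""
    return sorted(normalize(known_items) - normalize(output))


# Hypothetical outputs from two different models for the same query.
model_a = ["Expelliarmus", "Lumos", "Fumbus"]
model_b = ["expelliarmus", "Lumos ", "Accio"]

report = cross_validate(model_a, model_b)

# Items you planted or already know are in the document.
missing = spot_check(model_b, ["Fumbus", "Driplo"])
```

Anything under `only_a` or `only_b` is a prompt to go back to the source pages, and a non-empty `missing` list means the output should not be treated as complete.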
