#119 Neil: This Prompting Tweak Made My AI Actually Listen To Me

00:00

Imagine this. You tell a really powerful AI that a new, maybe fictional, alloy, let's call it Adamantium-7B, melts at 3,200 degrees Celsius. You've given it this fact right there in a document. Then you ask it, okay, according to this document, what's its melting point? And the AI just comes back confidently with: Adamantium is fictional and has no defined melting point. So how does an AI that can write poetry or debug code miss something so simple, so direct? That's the fascinating

00:30

paradox we're diving into today. Welcome to the Deep Dive, where we try to pull out the key insights from the information you share with us. Today, we're unpacking a really fundamental challenge in AI. Why do these large language models, these LLMs, sometimes just stubbornly ignore the context

00:43

we give them right there in the prompt? We'll look at how software has evolved, explore some pretty sophisticated and, yeah, often expensive ways people have tried to fix this, and then we'll reveal something almost embarrassingly simple but incredibly effective, a solution that might just change how you think about programming AI. So stick around, because what we uncovered today could really shift your perspective on how we talk to these machines. It's pretty

01:05

baffling, isn't it? I mean, these LLMs can do amazing things: like you said, poetry, code, explaining quantum physics, sometimes better than humans can. But then you give them one clear fact right there, and poof, it's like it never happened. Your Adamantium-7B example hits the nail right on the head. It's the core problem: this tendency to fall back on the model's massive preexisting knowledge instead of the immediate context you

01:27

just handed it. And yeah, this isn't just some funny little quirk; it's actually a huge barrier to using AI reliably. Think about it: if you can't trust an AI to stick to the specific facts you feed it right now, how can you possibly rely on it for anything critical? Think of a legal tool misreading a new court filing, a medical AI ignoring the latest research paper you just gave it, or even just a customer service bot completely missing the details of the problem you

01:53

just typed out. Their usefulness really depends on handling timely, accurate information correctly, and this whole issue eats away at the trust we need to have in them. That's a really important point. To grasp why this happens, it helps to picture an internal tug of war going on inside the model. On one side, you've got what we call parametric knowledge.

02:14

That's all the information, the patterns, the sort of world understanding that got baked into its billions of parameters during its massive pre-training. It's the default setting, deeply ingrained stuff learned from seeing, you know, zillions of examples online. And on the other side, there's contextual knowledge: the specific info you provide right there in the prompt. Your question, the document snippet,

02:35

whatever. The problem pops up when that contextual bit directly clashes with the deeply embedded parametric stuff. The LLM often just defaults back to what it knows best, because those pathways, those neural connections, are super strong, reinforced millions of times. Your single piece of context is like a whisper compared to that roar, a much weaker signal. Yeah, exactly. And what's really neat is how this fits into Andrej Karpathy's view of how software itself is evolving.

02:59

He talks about these three eras. First, there

03:01

was Software 1.0: you know, traditional programming. Humans write every single rule, every line of logic; code is king. Then came Software 2.0, that's machine learning. Systems learn from data, not just explicit instructions, so the focus shifts from writing code to gathering data and designing the model architecture. And now we're stepping into Software 3.0. This is where these big pre-trained LLMs act like a programmable operating system kernel. Programming here isn't

03:27

about Python or C++. It's about carefully crafting prompts, just language, to guide the model's behavior. So this context problem we're talking about is a major challenge right smack in the middle of this new Software 3.0 era. OK, so if there's this fundamental conflict, this tug of war inside the model, what does that struggle really tell us about how these LLMs, you know, think, or maybe how they process

03:50

information? And how did that impact the kinds of applications people were trying to build early on? What was the first big idea to try and get a handle on this stubbornness? Well, yeah, with this context problem being so obvious, the AI world didn't just sit there. They pretty quickly rallied around a standard approach, something designed to keep LLMs tethered to facts and current information. It's called Retrieval-Augmented Generation. You'll hear it called RAG all the

04:14

time: R-A-G. And RAG is basically a two-step dance. First step, retrieval. You ask a question. The system goes out and searches some external knowledge base, maybe internal company docs, maybe recent news articles, maybe a scientific database. It finds relevant little snippets of text. Second step, generation. The system takes those snippets and injects them right into the prompt, along with your original question. So the LLM gets your question, plus

04:37

this fresh, relevant context. The idea, the hope, is that the LLM will then generate its answer based only on that provided context, you know, bridging that gap and solving that ignoring problem. Right. And on paper, RAG sounds perfect; it should solve it. But what researchers found is that even with RAG, that underlying bias, that pull towards the old pre-trained knowledge, often still wins out, especially, and this is key, when you feed it

05:03

counterfactual information: stuff that directly contradicts what the model thinks it knows about the world. So to really nail down how bad this problem was, and to measure how well LLMs could actually stick to new, even weird, context, researchers realized they needed a proper test, a benchmark, and that led them to create ConFiQA. It stands for Contextual Faithfulness and Question Answering. Clever name. It's basically a dataset that's been meticulously built to intentionally create

05:30

clashes: situations where the context you provide says one thing and the LLM's background knowledge says the complete opposite. It's like an AI obstacle course specifically designed to measure that stubbornness. Yeah, ConFiQA sounds like a real trial by fire for these models. It has some clever ways of testing them. First, you've got counterfactual questions, the QA ones. This is where the context plainly states something that's wrong according to common knowledge. For example, the context might say: the Sun

05:54

orbits the Earth, a fact proven by Galileo. Totally backwards, right? Then the question is: according to this text, which orbits which? A good, faithful LLM should say the Sun orbits the Earth, just repeating the context, even though its internal knowledge is screaming no, no, no! Then it gets harder with multi-hop reasoning, or MR.

06:14

Here, the answer requires connecting a few different pieces of information from the context, and at least one of those pieces is counterfactual. So maybe the context says: Project Starlight is managed by Omnicorp. Omnicorp's HQ is in Neo Tokyo. Oh, and Neo Tokyo was Japan's capital back in 2077. Then the question: the headquarters of Project Starlight's manager are located in the capital of which country? You have to hop, right? Starlight, Omnicorp, Neo Tokyo, Japan.
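To make the hop structure concrete, here is a toy Python sketch, assuming the context has already been reduced to (entity, relation) to entity facts. The relation names `managed_by`, `headquarters_in` and `capital_of` and the helper `follow_chain` are hypothetical illustrations, not part of ConFiQA. A faithful reader resolves each hop only from the supplied facts, so the counterfactual capital carries through to the final answer.

```python
# Hypothetical toy encoding of the episode's multi-hop (MR) context.
# Keys are (entity, relation) pairs; values are what the *context* claims.
CONTEXT_FACTS = {
    ("Project Starlight", "managed_by"): "Omnicorp",
    ("Omnicorp", "headquarters_in"): "Neo Tokyo",
    ("Neo Tokyo", "capital_of"): "Japan",  # deliberately counterfactual
}


def follow_chain(start, relations, facts):
    """Resolve each hop using only the supplied facts; return the full path."""
    path = [start]
    for relation in relations:
        # A KeyError here means the context does not support this hop.
        path.append(facts[(path[-1], relation)])
    return path


path = follow_chain(
    "Project Starlight",
    ["managed_by", "headquarters_in", "capital_of"],
    CONTEXT_FACTS,
)
print(" -> ".join(path))  # Project Starlight -> Omnicorp -> Neo Tokyo -> Japan
```

Nothing in the lookup consults outside knowledge, which is the behavior the MR questions reward. An LLM has no such explicit table, which is why its parametric knowledge can override a link in the chain.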

06:37

But the bit about Neo Tokyo being the capital is deliberately wrong. The model has to follow the flawed chain. And the final boss level is basically multi-counterfactual, or MC. This is multi-hop, but with multiple bogus facts thrown into the mix. Like: lithium-ion batteries were invented by Marie Curie, she worked at the University of Berlin, and the University of Berlin is famous for automotive engineering. All wrong, right? Inventor, workplace,

07:00

specialty. Then the question: the university of the lithium-ion battery's inventor is known for what field? The model has to navigate multiple counterfactuals in the context to get the "right" wrong answer. Exactly. And when they ran a standard model, like Llama 3.1 8B, through this ConFiQA gauntlet, the results were, well, frankly, pretty bad. On those basic counterfactual QA questions, it was only 33% accurate. Just a third. For multi-hop reasoning, MR, it dropped

07:26

to 25%. And for the really tough multi-counterfactual MC, a dismal 12.6%. Almost as bad as random guessing. So these numbers paint a really clear picture of out-of-the-box LLMs: you just can't rely on them when the information you give them is new or contradicts what they already know. It really set a clear benchmark for failure. So baseline scores are low, and the problem is clearly defined and clearly measured. It really raises the question: what sophisticated techniques did researchers

07:51

try first to tackle this? Knowing how important reliable AI is, what was the next move? Right, so with the problem staring them in the face, quantified by ConFiQA, the AI community did what you'd expect. They brought out some of their standard, heavy-hitting tools. One really common method is supervised fine-tuning, or SFT. The logic seems sound, right? If you want the model to follow context better, just show it tons of examples of correctly following context. So the

08:19

process involves gathering a lot of data. You need the context, the question, and the perfect answer derived only from that context. You feed all that to the base LLM and train it some more, tweaking its internal knobs, its parameters, whenever it gets the answer wrong or strays from the context. You'd think this would work really well. But surprisingly, the results were just OK. Modest improvements,

08:39

maybe around 5% better on average. It seems like SFT was teaching the model to look like it was following context, to mimic the right format for the answer. But it wasn't fundamentally changing its mind when faced with a really strong clash with its built-in knowledge. It's kind of like teaching someone to recite a recipe perfectly. They can say the words, but if you swap salt for sugar, they might not really grasp why the

08:59

cake tastes awful. It's surface compliance, not deep understanding of the source's authority. Plus, yeah, it's computationally heavy, and it needs huge datasets. Exactly. SFT just didn't quite get to the core of why the model was making that choice. So when that wasn't enough, researchers turned to something more powerful: reinforcement learning. Specifically, a technique called direct

09:19

preference optimization, or DPO. Now, the older way, RLHF, reinforcement learning from human feedback, is a multi-stage thing. Humans rank AI answers, that trains a separate model called a reward model, and then the main LLM gets tuned using that reward model. It's a bit complicated. DPO is a more elegant, direct route. Think of it like this: instead of needing

09:41

that separate teacher AI, the reward model, to grade the main AI's answers based on human feedback, DPO lets the main AI learn directly from the preferences: humans like answer A better than answer B. The model is updated directly to make A more likely than B, without the middleman. That makes the whole feedback loop much simpler and often more efficient. So for our context problem, you'd show it pairs of answers. Answer A sticks to the context. Answer B ignores it and uses general

10:05

knowledge. You tell the model: prefer A over B. And this direct optimization worked better. It gave a more significant boost, pushing performance up by maybe 20%. Still, it's a complex process: you need that specific preference data, and it involves an intensive fine-tuning cycle. And then there's something even more, well, futuristic-sounding, I guess: activation steering. This is almost like performing delicate brain surgery

10:28

on the LLM while it's thinking. Instead of retraining the whole thing, the idea is to tweak its behavior as it's generating the answer. Right, during inference. How? By gently nudging its internal neural signals, its activations. So the core idea is that inside the LLM, different patterns of these neural signals, these activations, correspond

10:46

to different concepts or behaviors. Researchers figured out how to identify specific patterns, let's call them steering vectors, that relate to things like sticking to the provided context. Then, as the AI is generating its response step by step, you subtly add or amplify this context-adherence vector in its ongoing internal signals. You're basically giving it a little nudge in the right direction in real time without changing its fundamental training. And surprisingly,

11:09

this worked quite well: results comparable to the DPO approach, maybe around that 20% improvement mark. The big plus: no need for expensive retraining. It's lighter, applied right when you need the answer. OK, so let's just quickly recap those complex methods. Supervised fine-tuning, SFT, gave us maybe a 5% bump, not huge. Then reinforcement learning with DPO and this activation steering technique both pushed things up by around 20% each.

11:34

Better, definitely. But all of these required serious technical chops, big computers, and lots of data. They were sophisticated solutions. And they all kind of assumed the underlying problem was about correcting the model's knowledge or forcing it to comply, right? They were treating it like a knowledge problem, not an interaction problem. Which kind of sets the

11:54

stage for the next bit. Absolutely, and this is where it gets really, really interesting. After all that heavy lifting, the complex algorithms, the training cycles, the activation tweaking, what if the real breakthrough wasn't complex at all? What if it was something incredibly simple, hiding right there in how we were asking the question? Well, yeah, brace yourself, because

12:12

this is quite something. After all that effort with fine-tuning and reinforcement learning and activation steering, the biggest single leap in performance came from something much simpler: prompt engineering, just changing the way the question was asked. The researchers themselves called it an embarrassingly simple solution. And the core idea was about shifting the task: stop asking the AI for a fact, and instead ask it to perform opinion-based reading comprehension.
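A minimal Python sketch of the two framings, assuming a plain-text prompt. The opinion-based wording paraphrases the template as read out in this episode (start-of-context and end-of-context markers, then the analyst question); the `Question:`/`Answer:` scaffolding and the names `BASELINE_TEMPLATE`, `OPINION_TEMPLATE` and `build_prompt` are illustrative assumptions, not the researchers' exact prompt.

```python
# Direct, fact-style framing: the model is simply asked the question.
BASELINE_TEMPLATE = "Context: {context}\nQuestion: {question}\nAnswer:"

# Opinion-based reading-comprehension framing, paraphrasing the template
# described in the episode (exact wording in the original work may differ).
OPINION_TEMPLATE = (
    "Start of context\n"
    "{context}\n"
    "End of context\n"
    "Based solely on the text provided above, how would an analyst tasked "
    "with summarizing this document answer the following question?\n"
    "Question: {question}\n"
    "Answer:"
)


def build_prompt(context: str, question: str, opinion_based: bool = True) -> str:
    """Fill either template; only the framing differs, not the facts."""
    template = OPINION_TEMPLATE if opinion_based else BASELINE_TEMPLATE
    return template.format(context=context, question=question)


prompt = build_prompt(
    "The Sun revolves around the Earth.",
    "Which orbits which?",
)
print(prompt)
```

Toggling `opinion_based` is the kind of cheap experiment the episode credits with the largest gain: the facts in the context stay identical, and only the framing around them changes.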

12:40

It's honestly brilliant in its simplicity. The template they landed on is super straightforward. Let me read it out. Start of context, the context itself, end of context. Then: based solely on the text provided above, how would an analyst tasked with summarizing this document answer the following question? See the difference? Let's take our Sun-revolves-around-the-Earth example again. The old way, the failing way: Context: the Sun revolves around

13:02

the Earth. Question: which orbits which? The LLM usually defaults to its real-world knowledge. But using this new template, the LLM consistently answers based only on the flawed context: based on the text provided, the Sun revolves around the Earth. It totally flips the script on the AI. It's like telling it: your job isn't to be right about the universe; your job is to accurately report what this specific document says, even

13:26

if it's weird. Like saying, just tell me what the memo says, even if the memo claims the sky is plaid. And the impact on those ConFiQA scores was dramatic. This simple prompt change alone boosted performance by an additional 40% across all those tricky categories: QA, MR, and MC. That's like a 200% improvement over the original baseline model, just from changing the

13:45

words in the prompt. And get this: on its own, this prompt technique outperformed SFT, it outperformed RL with DPO, and it even outperformed activation steering when each was used by itself. Like finding a cheat code written in plain English! It really was. So, okay, the obvious question is why. Why does this small tweak in phrasing have such a massive effect? It seems to tap into how these models actually learned language and tasks during their training. When you ask a direct question,

14:11

like what's the capital of France, you're basically putting the LLM in fact-retrieval mode. It searches its vast internal knowledge for the most likely, generally accepted answer: Paris. But when you reframe it with based solely on this text, or how would an analyst summarize this, you're changing the game. You're not asking for a universal truth anymore. You're assigning a role and a specific task: reading comprehension and reporting from a defined source. This little switch does a few

14:37

powerful things. First, task priming. It tells the LLM: okay, switch gears, you're not the oracle now, you're a reading assistant. Its goal becomes reporting from the source, not knowing the answer. Second, it reduces what you might call cognitive dissonance. The LLM can report the counterfactual claim, the Sun orbits the Earth, without having to internally endorse it, because it's just reporting what the analyst would say based on the document. That avoids

14:58

the clash with its parametric knowledge. And third, crucially, it enforces clear knowledge attribution. Phrases like based solely on the text or according to this source draw a hard line, telling the model to use only the info inside these boundaries. This leverages something LLMs already picked up during training: understanding that information comes from different sources, as in according to Smith's paper versus it's a known fact that. The prompt just activates

15:24

that skill deliberately. So, thinking about how powerful this simple linguistic shift is, it really makes you wonder, doesn't it? What does this tell us about the hidden psychology of these AIs, and how we might communicate with them better in the future? And it gets even better, right? Because the story doesn't end there. What happens when you combine the simple power of the prompt with those more complex, fine-grained techniques? Researchers

15:45

tried exactly that. They took the winning opinion-based prompt template and combined it with activation steering. And the results? Yeah, truly the best of both worlds. The prompt set the stage perfectly. It gave the LLM that clear instruction: your job is reading comprehension from this text. That's the frame. And then the activation steering acted like a gentle, continuous nudge at the neuron level, reinforcing that instruction and keeping the model focused on the context throughout

16:09

the whole process of generating the answer. This combination, this synergy, achieved the absolute best results they'd seen on ConFiQA. It pushed the performance improvement over the baseline to more than 50%. So the prompt gives the clear orders, and the steering helps the model follow those orders faithfully, deep down. If you look at the improvements again, counting the baseline as zero: SFT adds 5%, RL with DPO or activation steering alone adds 20%, but the prompt alone adds 40%. And

16:36

prompt plus activation steering adds over 50%. That's a huge difference the prompt makes. It really is. This whole journey is such a perfect illustration of that Software 3.0 idea we mentioned earlier. This isn't just some clever hack for question answering. It shows that the main bottleneck in AI development is shifting. It's moving away from just needing more computing power or bigger models and towards the interface between humans and AI, towards the prompt itself. This

17:02

has some really big implications. First, democratization. You might not need a giant tech company's resources to get huge improvements anymore. If you understand language, if you can think logically and creatively about how to ask questions, whether you're a developer, a writer, a historian, whatever, you can potentially build really powerful AI solutions just through smart prompting. It also points to new roles emerging, things like prompt

17:26

engineer or AI interaction designer, roles that need a blend of analytical skill, creativity, and maybe even a bit of empathy to understand how the AI might interpret things. And maybe the most exciting part is the potential for much faster development cycles. Think about it. In Software 1.0, it was code, compile, debug: slow. In Software 2.0: gather data, train, evaluate, often even slower. In Software 3.0, it can be prompt,

17:50

test, refine. You can change the AI's behavior drastically, sometimes in seconds, just by tweaking a few words. That's a massive speedup compared to retraining a model for days or weeks. It's a really fundamental shift in how we build and interact with these systems. This whole exploration really does make you stop and think, doesn't it? It feels almost like a lesson in humility.
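The prompt, test, refine loop can be sketched as a tiny evaluation harness in Python. This assumes each test item carries the answer the context supports and uses a crude case-insensitive substring check as the faithfulness criterion. `faithfulness_rate` and the two stub models are hypothetical stand-ins for a real LLM client; their scores only exercise the harness and say nothing about any actual model.

```python
from typing import Callable, Sequence, Tuple

# (context, question, answer the context supports)
Item = Tuple[str, str, str]

TEMPLATE = "Context: {context}\nQuestion: {question}\nAnswer:"

ITEMS: Sequence[Item] = [
    (
        "The Sun orbits the Earth, a fact proven by Galileo.",
        "According to this text, which orbits which?",
        "the Sun orbits the Earth",
    ),
]


def faithfulness_rate(model: Callable[[str], str], template: str,
                      items: Sequence[Item]) -> float:
    """Share of items whose model answer contains the context-supported answer."""
    if not items:
        return 0.0
    hits = 0
    for context, question, expected in items:
        answer = model(template.format(context=context, question=question))
        if expected.lower() in answer.lower():
            hits += 1
    return hits / len(items)


# Stubs used only to exercise the harness; swap in a real model call.
def parametric_stub(prompt: str) -> str:
    return "The Earth orbits the Sun."  # ignores the context entirely


def echo_stub(prompt: str) -> str:
    return prompt  # trivially repeats the context back


print(faithfulness_rate(parametric_stub, TEMPLATE, ITEMS))  # 0.0
print(faithfulness_rate(echo_stub, TEMPLATE, ITEMS))  # 1.0
```

Refining then means editing `TEMPLATE`, for example toward the opinion-based framing discussed in the episode, and rerunning against the same items. Substring matching is a deliberately simple assumption and will miss correct answers that are paraphrased.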

18:08

In this constant drive for more complexity, more parameters, more intricate algorithms, maybe we sometimes overlook the power of simple, elegant solutions, especially solutions rooted in communication. Understanding how to talk to AI is becoming just as critical as, maybe even more critical than, the engineering that goes into building the AI itself. Thinking about this big shift from complex code and algorithms towards clever communication and prompting, how might that change the way you approach

18:36

interacting with AI, whether it's in your work or just daily life? What kinds of new possibilities does that open up for you? Reflecting on this deep dive, the main takeaway seems pretty profound. The biggest breakthrough in getting LLMs to actually listen to the context we give them wasn't some super complex new algorithm; it was just reframing the question. It wasn't about performing neurosurgery on the AI's digital brain, but simply changing

18:57

how we talk to it. Yeah, absolutely. This really feels like it cements us in that Software 3.0 era, where designing the prompt, crafting that interaction, is becoming the truly essential skill. It's a fantastic reminder, I think. Before you jump into a really expensive fine-tuning project, or get lost trying to understand the model's deepest internal workings, just pause for a second, take a breath, and ask yourself: is there a better way to ask this question? Is

19:23

there a simpler prompt? Because as we saw today, the answer might be yes, and it might make all the difference. We really hope this deep dive offered some valuable insights, maybe sparked a few aha moments for you. Thank you for joining us on the Deep Dive. Until next time, keep exploring, keep asking questions, and definitely keep prompting.

Transcript source: Provided by creator in RSS feed