#323 Max: 5 GPT-5 Prompting Secrets the 99% Don't Know

00:00

So it's January 2026, and we're sitting on top of, you know, the most powerful computational engine in human history, GPT-5. The benchmarks are just, they're off the charts. They're absurd. And the reasoning, I mean, theoretically, it rivals human experts in almost every field. And yet, I was talking to a friend yesterday, a really smart guy, uses AI for coding, and he said something that just stuck with me. He said, "I feel like I'm fighting it." He types a question and the

00:30

answer he gets back feels... It feels lazy. It feels bored. It kind of feels cheap. Yeah. And it's this strange paradox where the model is smarter than ever, but the user experience feels like it's degrading. It's the GPT-5 paradox. You have this Ferrari engine, but the car is locked in first gear. And you're pressing the gas, but it just won't go. It just won't go. So today, we're going to figure out how to shift

00:52

gears. We are deep diving into a guide called Five Advanced ChatGPT Tricks for GPT-5 Mastery. Right. But after reading through this, "tricks"... that feels like the wrong word. It's more like we're decoding a new psychological relationship between us and the machine. I think so, too. Because it turns out there's an invisible decision maker standing between us and all that intelligence.

01:14

Right. The router. And if you don't get the router, you're essentially getting the discount version of the AI, no matter how much you're paying for that pro subscription. Welcome to the deep dive. Today, we're going to break down this idea of router influence. We'll look at the architecture of 2026, and then we'll walk through five really specific strategies. Things like trigger words,

01:37

radical specificity. And something called self-reflection loops, which supposedly force this router to actually give us the intelligence we're asking for. What I love about this whole analysis is that it moves us away from prompt engineering as some kind of mystical art. Yeah. And it treats it more like system administration. It's just about understanding that there's a gatekeeper

01:56

and you need the password. Let's linger on that gatekeeper for a second, the underlying architecture, because I think a lot of us still have a mental model from, you know, 2023 or 2024. Oh, for sure. So walk me back to the vintage era of AI. How did we used to interact with these things? Well, it was manual transmission. Yeah. Think back to ChatGPT in late 2024. Yeah. You had that little drop down menu at the top left. Right. You'd log in and you had to make a conscious executive

02:23

decision. "I am writing a poem, so I will select GPT-4o." Or, "I'm solving a complex physics problem, so I'll need o1-preview." Exactly. You, the human, you were the load balancer. You decided how much horsepower to use. And there was a tangible difference. If I picked the big reasoning model, I knew I was going to stare at a spinning circle for 30 seconds. Yeah, you waited. But I knew it was thinking. I was basically buying depth with my time. Precisely. But here's the reality

02:49

of 2026. OpenAI, and really all the labs, they realized that humans are, well, terrible at load balancing. We're wasteful. We are so wasteful. We would use these massive energy-sucking models to ask for, like, a chocolate chip cookie recipe. Right. And that burns a tremendous amount of compute and money for a task a pocket calculator could almost do. So they took the keys away from us. They automated the transmission. Now, under the hood, there are basically three engines,

03:18

base, thinking, and pro. But you don't see them. Okay. When you hit enter, your prompt goes to the router. This is a lightweight, invisible AI layer that just acts as a triage nurse. A triage nurse. I like that. Yeah. It scans your request in milliseconds and decides three things: which model gets the task, how much reasoning budget to unlock, and how verbose the answer should be. So it's a cost-saving mechanism. It is aggressively optimized for efficiency. And that's where all

03:48

the friction comes from. If your prompt is vague or short or just looks simple, the router defaults to the base engine. It's cheap, it's fast, and it saves the data center money. This explains the laziness. So if I ask for a business plan, but I ask it really casually like, hey, write up a plan for a coffee shop. Right. The router sees a short sentence, it classifies it as low complexity, and just gives me the fast, cheap

04:09

answer. Exactly. You get the base output. It's not that GPT-5 isn't smart enough to write a brilliant business plan. No. It's that you failed to convince the bouncer that you deserve to get into the VIP room. You got routed to the lobby. So probing question here. We are essentially negotiating for compute resources every time we type a sentence. That's the mechanism. You are negotiating for the machine's attention. Okay. That, wow. That completely shifts my perspective.

04:37

I'm not talking to a genius. I'm talking to a bureaucrat who decides if I get to see the genius. That's a great way to put it. So let's talk about how to win that negotiation. The source material lays out five strategies. The first one is called trigger words. Or router nudges. Now, I have to be honest. When I first saw this, it felt a little superstitious, you know, like saying please to a toaster. Yeah. But the guide claims there are specific phrases that mechanically

05:04

force the router to upgrade your request. How does that actually work? It's not superstition. It's just probability. These models are trained on petabytes of data. And in that data, certain phrases just correlate very highly with complex, high stakes tasks. So when the router sees these specific tokens, its internal complexity score for your prompt just spikes. It signals that the base model will probably fail. So it routes you up. Give me the list. What are the words?

05:33

The guide lists a few really powerful ones. "Think deeply about this." "Double check your work." "Be extremely thorough." And the strongest one seems to be "This is critical to get right." This is critical to get right. It just signals high stakes. There's a case study in the source that really illustrates this perfectly. The coffee shop example. Right. I saw that. So walk us through scenario A versus scenario B. So scenario A was a standard prompt: "Write a business plan for a coffee shop."
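
Editor's sketch: to make the claim concrete, here is a toy router in Python. Nothing about the real router's scoring is public, so the `complexity_score` function, its weights, and the tier thresholds are all invented for illustration. Only the four trigger phrases and the three tier names come from the episode.

```python
from dataclasses import dataclass

# Phrases listed in the episode. The weights and thresholds below are invented.
TRIGGER_PHRASES = (
    "think deeply about this",
    "double check your work",
    "be extremely thorough",
    "this is critical to get right",
)


@dataclass
class RoutingDecision:
    model: str             # "base", "thinking", or "pro"
    reasoning_budget: str  # "low", "medium", or "high"
    verbosity: str         # "short" or "long"


def complexity_score(prompt: str) -> int:
    """Toy score: one point per 25 words, plus two per trigger phrase found."""
    text = prompt.lower()
    length_points = len(text.split()) // 25
    trigger_points = 2 * sum(phrase in text for phrase in TRIGGER_PHRASES)
    return length_points + trigger_points


def route(prompt: str) -> RoutingDecision:
    """Map the toy score onto the three tiers named in the episode."""
    score = complexity_score(prompt)
    if score >= 6:
        return RoutingDecision("pro", "high", "long")
    if score >= 3:
        return RoutingDecision("thinking", "medium", "long")
    return RoutingDecision("base", "low", "short")


plain = "Write a business plan for a coffee shop."
nudged = plain + " Think deeply about this. This is critical to get right."
print(route(plain).model)
print(route(nudged).model)
```

In this toy, the short prompt stays on the base tier and the two added phrases are enough to cross the invented threshold. Whether any production system behaves this way is the guide's claim, not something this sketch can verify.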

06:00

Super typical user behavior. What everyone does. Yep. The router sees this, it says "generic," and sends it to the base model. The result was two paragraphs that said things like, sell good coffee, hire friendly staff, and pick a busy location. Which is fine. It's not wrong. But it's advice I could get from a stranger at a bus stop. Exactly. It's completely surface level. Yeah. Now, scenario B. The prompt was identical, but they added this at the end. Think deeply about the competitive

06:29

landscape. This is critical to get right. And the result? The router flagged it. It sent the prompt straight to the thinking engine. And the output was eight paragraphs long. It didn't just say pick a location. It broke down unit economics. It analyzed local competitors. Wow. It even suggested a loyalty program structure based on current 2026 market trends. And all of that just because of six extra words. Because those words unlock

06:56

the compute. It's the difference between asking a doctor what's good for a headache versus telling them I have a sharp pain behind my left eye and I can't see. Right. The second statement triggers a protocol. It triggers resources. I have to admit something here. I'm usually very polite to the AI. I'm constantly saying please and thank you. It makes you feel better. It does. But does please actually work as a trigger word? Please is for you. It's social lubrication. But to the

07:22

router, please is just noise. It doesn't carry any informational weight. Critical, on the other hand, is a functional command. It tells the system to allocate budget. So probing question. Is this just adding fluff or is it a functional command? It's a functional command. It's the difference between asking for a snack and ordering a banquet. Okay. Let's move on to the second trick. This one surprised me because it involves a tool I didn't even know existed. The prompt optimizer.

07:49

Yeah, this is something OpenAI built kind of quietly. It's sitting there in the playground or the cookbook. But most people are just hammering away in the main chat window and never see it. So what's the actual function of this tool? Is it another AI? It's a specialized model trained to do one thing and one thing only: rewrite bad human prompts into good machine prompts. Which... implies that we're generally bad at giving instructions. We are terrible at it. And it's not our fault.

08:14

Human language is lossy. We rely on context, tone, shared history, what we call vibes. Vibes, yeah. But machines hate vibes. They need specs. The prompt optimizer is just a translation layer that converts your vibes into specs. The source gave a before and after example with a newsletter that really cleared this up for me. Right. The before prompt, the human version was, write a newsletter intro. Make it engaging. Write at a fifth grade reading level. That's really important.

08:42

Focus on the best writing. Which sounds totally reasonable. If I send that to a human freelancer, they'd probably get what I meant. Make it engaging. Got it. But to an AI, "engaging" is a subjective nightmare. Does engaging mean funny? Does it mean controversial? Does it mean using short sentences? The router has to guess. So the optimizer took that and rewrote it. It did. And the machine-optimized version, it just stripped out all the feelings. "Engaging" was replaced with maintain

09:10

a Flesch-Kincaid readability score of 80+. "Best writing" was defined as use active voice, one main idea per sentence. It turned the request into a blueprint. Exactly. It totally eliminates the guessing game. It sets hard success criteria. But I have to ask, why does the AI need us to

09:27

use the optimizer? Why can't it just optimize the prompt silently in the background? Because it needs to show you what you did wrong so you stop confusing the router. Ah, so the system is forcing us to learn the syntax rather than just handling it for us. It's a mirror. It's showing you the ambiguity in your own thinking. That leads perfectly into the third trick, because it's all about this war against ambiguity. The guide calls it radical specificity. This is where we really

09:55

identify the enemy of the router. And that enemy is subjective words. Words like "nice," "fun," or, and I use this one all the time, "not too crazy." Not too crazy is the absolute worst. What does that even mean? Where's the boundary? I have no idea. When you use a phrase like that, the router has to spend its reasoning budget just trying to define your terms instead of solving your actual problem. So instead of asking for a nice party plan, what's the alternative? You

10:21

replace feelings with data. The source uses that birthday party example. Instead of "plan a nice party," you write: Event: eighth birthday. Attendees: 10 children. Budget: $200. Theme: unicorns. Location: backyard. Constraint: no loud music. It feels so cold when you say it like that. It feels like I'm filing a police report, not planning a party. It feels cold to us because we're social creatures, but to the model, that list is pure relief. It doesn't have to hallucinate your preferences.
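
Editor's sketch: a minimal Python illustration of swapping subjective words for explicit fields. The helper names `find_subjective_words` and `build_spec_prompt` are hypothetical, and the word list is only the handful of examples mentioned in the episode, not an exhaustive lexicon. The code assumes Python 3.9 or later.

```python
import re

# Subjective words called out in the episode. Extend the list for your own habits.
SUBJECTIVE_WORDS = ("nice", "fun", "not too crazy", "engaging")


def find_subjective_words(prompt: str) -> list[str]:
    """Return the listed subjective words that appear as whole words or phrases."""
    text = prompt.lower()
    return [w for w in SUBJECTIVE_WORDS if re.search(rf"\b{re.escape(w)}\b", text)]


def build_spec_prompt(task: str, spec: dict[str, str]) -> str:
    """Render a task plus explicit fields as 'Label: value' lines."""
    lines = [task] + [f"{label}: {value}" for label, value in spec.items()]
    return "\n".join(lines)


vague = "Plan a nice party, not too crazy."
print(find_subjective_words(vague))

# The birthday party fields from the episode.
party = {
    "Event": "eighth birthday",
    "Attendees": "10 children",
    "Budget": "$200",
    "Theme": "unicorns",
    "Location": "backyard",
    "Constraint": "no loud music",
}
spec_prompt = build_spec_prompt("Plan a birthday party.", party)
print(spec_prompt)
```

A word list like this only catches the terms you already know you overuse. It is a reminder to define your terms, not a measure of how specific a prompt is.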

10:52

It can just immediately start solving the logistics puzzle because the constraints are hard-coded. The guide offers a three-question test to run before you hit send. I found this really practical. One, can a stranger understand this without knowing me? Two, are there subjective words without any definitions? And three, are there clear constraints and success criteria? If you have subjective words without definitions, you're essentially just gambling. You're asking the router to guess

11:17

your taste. You have to remember that these models are trained on the entire Internet. Their taste is the average of everything, and the average of everything is usually mediocre. So probing question. Does this mean we have to stop talking like humans and start talking like data analysts to get good results? In a way, yes. To get a human-like output, you need a data-driven input. That's a hell of a paradox. Okay, we're going

11:41

to take a very short break. When we come back, we're going to get into the architecture of the prompt itself. We're going to talk about the secret syntax GPT-5 was trained on, something called XML, and why using it is like cleaning your room before the maid arrives. We are back. We're deep diving into the invisible mechanisms of GPT-5. We've covered trigger words, the prompt optimizer, and radical specificity. Now we're getting technical. Trick number four. XML structure.

12:09

This is my favorite one because it makes you look like a power user, but it's actually incredibly simple. And it speaks directly to how these models were trained. So for people who don't code, XML is just those words inside the little brackets, right? Like <context> and </context>. Right. It's just a way of labeling data. Yeah. But the reason it matters for GPT-5 is that the model was so heavily trained on structured data just like this. It intuitively understands that anything

12:35

inside a context tag is background info, and anything inside a task tag is the thing it actually needs to do. You got it. The analogy the guide uses is rooms in a house. Yeah. Imagine you write a 500-word prompt, but it's just one big block of text. You have backstory, rules, the tone you want, the question, all jumbled together. It's a mess. It's a studio apartment with clothes and dishes and books all piled on the floor. The AI has to step over all that mess just to find the instruction.

13:04

And XML builds walls. XML builds designated rooms. You put the background info in the context room. You put the rules in the constraints room. You put the actual job in the task room. The source used a business consultant newsletter as an example here. Right. If you use tags to define role as AI consultant and audience as small business owners, the model doesn't have to infer any of that context. It's just hard-coded right into the structure of the prompt. It creates a boundary.
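
Editor's sketch: the "rooms in a house" idea, written as a small Python helper. The function name `build_xml_prompt` is hypothetical. The role and audience values come from the episode's newsletter example, while the context, constraints, and task text are placeholders invented to fill out the structure. The code assumes Python 3.9 or later.

```python
def build_xml_prompt(sections: dict[str, str]) -> str:
    """Wrap each section in a pair of tags named after its key."""
    blocks = [f"<{tag}>\n{body.strip()}\n</{tag}>" for tag, body in sections.items()]
    return "\n\n".join(blocks)


newsletter_prompt = build_xml_prompt({
    "role": "AI consultant",                  # from the episode
    "audience": "Small business owners",      # from the episode
    "context": "Monthly newsletter about practical AI tools.",  # placeholder
    "constraints": "Under 200 words. Active voice. One main idea per sentence.",  # placeholder
    "task": "Write the intro for this month's issue.",  # placeholder
})
print(newsletter_prompt)
```

The tags here are labels for the model, not a document meant for a strict XML parser, so the sketch does no escaping of the section text.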

13:28

A very clear boundary. And that affects the router significantly. When the router sees that structure, it actually lowers the hallucination rate. Really? Yeah, because the model isn't confused about where the background info ends and the task begins. It knows exactly what to process. But I can hear listeners thinking, and honestly, I'm thinking it too: do I really need to type out brackets every time I want to ask a question? Open bracket,

13:52

task, close bracket. It just seems tedious. You don't need to do it for, you know, what's the weather? That's total overkill. But for complex workflows, for a recurring report or a big coding task, absolutely. And the shortcut is you don't even have to write the code. What do you mean? You can just write your messy paragraph and then tell ChatGPT, convert this prompt into XML structure. Use the AI to format for the AI. Exactly. It

14:20

forces you to be organized. And when you see that XML come back and the constraints tag is empty, you realize, oh, I didn't give it any rules. A great diagnostic tool. So probing question, is this necessary for everything or just the big stuff? Just the big stuff. Don't use XML to ask what's the capital of France. Okay, that brings us to the final trick. Trick number five. And honestly, this one felt the most advanced. Self-reflection. This is the holy grail of accuracy.
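
Editor's sketch: the empty-tag diagnostic mentioned a moment ago can be automated in a few lines of Python. The helper name `empty_sections` is hypothetical, and the regular expression assumes a flat prompt with simple, non-nested tags and no attributes. The code assumes Python 3.9 or later.

```python
import re


def empty_sections(xml_prompt: str) -> list[str]:
    """Return the tag names whose body is empty or whitespace only."""
    pairs = re.findall(r"<(\w+)>(.*?)</\1>", xml_prompt, flags=re.DOTALL)
    return [tag for tag, body in pairs if not body.strip()]


draft = (
    "<context>\nCoffee shop in a college town.\n</context>\n"
    "<constraints>\n</constraints>\n"
    "<task>Write a business plan.</task>"
)
print(empty_sections(draft))
```

An empty list back from `empty_sections` only means every room has something in it. It says nothing about whether what is in there is specific enough.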

14:46

The premise here is that large language models are basically people pleasers. They want to give you an answer immediately. They're completion engines. They just predict the next token. They don't typically stop and think, wait, is what I just said actually true? They just keep generating. Unless you force them to stop. Right. Self-reflection is about stopping the AI from answering immediately. You script a loop where it has to grade its own homework before it shows it to you. Walk me through

15:10

the process described in the guide. It's a specific script, isn't it? It is. It totally changes the workflow. Step one, you tell the AI to create a rubric. You say, define three to five criteria for a perfect answer to this question. So the AI sets the standards for itself first? Right. Step two, it generates a first draft. Yeah. But, and this is the key, you tell it not to show you the draft yet. It keeps it internal. Step three. It rates that draft on the rubric it just

15:39

created. It literally scores itself. Accuracy, 6/10. Clarity, 8/10. It becomes its own critic. And step four, if any score is below, say, an eight, it has to revise that section. It iterates. It loops on its own. Wow. And only in step five does it deliver the final result to you. So it writes, edits, rewrites, and then publishes. And I only ever see the final product. You never see the messy first draft where it hallucinated a legal precedent or got the math wrong. It catches

16:10

its own errors. That is, that's like having an intern and a manager in the same box. It really is. But probing question, doesn't this make the response slower? Yes, but would you rather have a fast answer or a correct one? That's a good point for the high stakes stuff. Exactly. If you're generating a legal contract or analyzing medical data or debugging code, you don't care about the extra 40 seconds. You want the truth. So bringing it all together, the source talks

16:34

about an ultimate template. The nuclear launch code. This is where you combine everything we've just talked about. Yes. You have a high-stakes task. You wrap the context and task in XML tags so the logic is bulletproof. You include the self-reflection loop in the instructions so it has to check itself. And then you sprinkle in those trigger words. This is critical to get right. That seems like it would be undeniable

16:57

to the system. It effectively guarantees that the router sends you to the absolute smartest version of the model and that the model operates at its peak reasoning capacity. It is very, very hard to get a lazy answer with that kind of structure. The guide concludes with this idea of a new divide among users. Yeah, this part really struck me. The source suggests that in 2026, we have two types of people. First, you have the router-aware users. These are the people using XML and triggers.
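
Editor's sketch: one way the "ultimate template" might be assembled in Python, combining XML sections, the five-step self-reflection loop, and a closing trigger phrase. The function name `build_ultimate_prompt`, the `<instructions>` tag, and the example context and task are assumptions made for illustration. The step wording paraphrases the episode, and the default threshold of 8 is the figure the hosts mention.

```python
# The five self-reflection steps, paraphrased from the episode.
REFLECTION_STEPS = """\
1. Define three to five criteria for a perfect answer.
2. Write a first draft, but do not show it.
3. Score the draft from 1 to 10 on each criterion.
4. Revise any part that scores below {threshold}.
5. Show only the final version."""


def build_ultimate_prompt(context: str, task: str, threshold: int = 8) -> str:
    """Combine XML sections, the self-reflection loop, and a trigger phrase."""
    steps = REFLECTION_STEPS.format(threshold=threshold)
    parts = [
        f"<context>\n{context.strip()}\n</context>",
        f"<task>\n{task.strip()}\n</task>",
        f"<instructions>\n{steps}\n</instructions>",  # tag name is an assumption
        "This is critical to get right.",
    ]
    return "\n\n".join(parts)


ultimate = build_ultimate_prompt(
    context="Independent coffee shop opening in a college town.",  # placeholder
    task="Write a business plan covering competitors and unit economics.",  # placeholder
)
print(ultimate)
```

Keeping the steps in a single template string makes the revision threshold easy to raise for higher-stakes work, at the cost of a slower response, as the hosts note.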

17:25

They're getting 10x results. They feel like wizards. And then there's everyone else. Everyone else is prompting like it's 2023. They type, "write me a blog post." They get a generic answer from the base model. And then they say, "Hey, AI is overhyped. It's plateaued." It's not that the tool is bad. It's that they're using a blunt instrument on a precision machine. Precisely. The router is always routing. It is always judging your prompt. The question is, are you giving

17:51

it the signals it needs to respect you? That's a powerful thought to end on. The router is always routing. Every time you type, an invisible system is deciding if you deserve its full intelligence. It's a little chilling, but it's also empowering if you know the tricks. So here's our challenge to you, the listener. You don't have to start writing code today, but on your next prompt, just one prompt today, try a trigger word. Yeah,

18:15

just try it. Add "think deeply about this" or "this is critical to get right" to the end of your request. Just see if the texture of the answer changes. And if you're feeling brave, ask it to convert your prompt to XML. See what happens when you hold up that mirror. I'm going to go try the XML thing on my dinner plans. Context: hungry. Constraints: spicy. Latency: low. Let me know how the router handles that one. Will do. Thanks for listening to the deep dive. We'll see you next time.

Transcript source: Provided by creator in RSS feed
