#440 Neil: Agentic Workflow Frameworks Crushing The Context Window Ceiling

00:00

Everyone is fighting the exact wrong war right now. Oh, absolutely. We constantly jump from Claude to ChatGPT to Gemini. Yeah, looking for the next big upgrade. Right. We keep chasing the illusion of a magic model. But the model is no longer the bottleneck. No, it really isn't. Welcome to this deep dive. Today, we explore what an agentic workflow actually is. It is completely going to change how you work. We will discover why cramming prompts absolutely destroys your

00:28

output. And we will map the exact method to build daily skills. People still treat AI like a magic search box. And honestly, that is a massive mistake for your daily productivity. We really need to treat it like a brand new employee instead. Like onboarding a real person. Exactly. We are completely changing how we interact with these tools. Right, because an agentic workflow isn't just one megaprompt. No, not at all. It is the entire loop the agent runs through. It includes the specific context,

00:58

the tools, and your hard rules. Yeah, and it even includes where the results get saved eventually. So it's a whole environment. Picture the model as just one machine on an assembly line. A perfect machine on a broken line still produces broken parts. That is a great way to look at it. You wouldn't hire a brilliant college grad and just walk out. Right. You would never just say, handle the whole company today. Exactly. You have to clearly define what good and bad looks like.

01:24

But wait, these models have read the entire internet. Yeah, they have. Shouldn't that vast public knowledge be enough for basic tasks? Like, why do they need constant hand-holding for every little thing? Because vague inputs will always yield incredibly vague outputs. These models just predict the next likely text chunk mathematically. They know general facts, but not your highly specific standards. They don't know your company's taste. Right. They do not actually think the way that

01:54

humans think. The agentic workflow is the specific training plan for that new hire. Got it. So public knowledge isn't enough. It needs your specific training plan. That is exactly it. That makes total sense when you frame it like onboarding. But here is the problem. Since the AI desperately needs our instructions, we overdo it. We go way too far. Our natural instinct is to give it absolutely everything immediately. We just dump entire company

02:19

manuals right into the chat window blindly. And that instinct completely breaks the machine over time. It totally does. We have to talk about the mechanics of the token window here. A token is basically just a tiny fragment of a word. Right. It is how the AI digests information. The AI processes text using these small mathematical chunks. Anthropic's beta models are hitting 1 million tokens right now. Which is just a massive amount of data. It is. Most standard models give you

02:47

200,000 tokens. But there's a huge problem with actually using all that space. Right. The recent research from Chroma highlights this perfectly. Yeah, the needle in a haystack test. Exactly. All 18 leading models drop in quality with longer contexts. They enter what the developer community calls the dumb zone. The dumb zone. That sounds incredibly frustrating to deal with. Oh, it is. It happens between 50,000 and 150,000 tokens. The attention mechanisms inside the neural network

03:18

literally get diluted. So the agent basically just forgets your initial instruction. Right. It forgets and it gets incredibly sloppy. This degrading phenomenon is widely known as context rot today. I still wrestle with prompt drift myself. It happens to everyone. You run a brilliant prompt with perfect safety guardrails. And then 20 messages later, the AI completely forgets those fundamental rules. Yes. An agent that is sharp at message five fails

03:47

at 30. It just totally loses the plot. And this brings us directly to the 95% rule. 95% of users do not need giant instruction files, because it just wastes resources. Repeating known facts, like explaining what TypeScript is, just burns tokens. You can easily burn 140,000 tokens over 20 turns. Wow, just on repetitive basics. You are paying precious context for information the agent already knows. So why do we instinctively

04:13

react to an agent's failure by adding more? Like, why do we constantly cram more rules into the prompt? Because we mistakenly think that more text creates more safety. We think we are building better walls. In reality, your core signal just gets buried under useless noise. The AI simply loses the plot entirely when it gets overwhelmed. So we are just confusing it. You should only use large files for very specific things. Things like mandatory legal compliance language or strict

04:42

brand voice guidelines. Makes sense. We think more text adds safety, but it just creates confusing noise. Exactly. It is honestly like shouting at someone in a crowded room. That is a perfect analogy. So if we cannot cram the context window, what do we do? Right. How are we supposed to give the AI the rules it actually needs? There has to be a better way. The elegant answer is to use something called skills. Right. A skill is just a small file with detailed instructions.

05:08

Keep it super simple. It has a name, a short description, and an instruction block. And the secret here relies on a concept called progressive disclosure. Which Anthropic talked about recently. Right. The agent only loads the name and description into memory initially. It acts like a lightweight menu of available tools. Just a menu, not the whole meal. The heavy instruction body loads only if the agent actively needs it. The AI triggers a specific action based on a semantic match.
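A minimal Python sketch of the progressive disclosure idea described above, not Anthropic's actual implementation. The `Skill` class, the `load_body` loader, and the word-overlap matcher in `pick_skill` are all hypothetical stand-ins; a real agent would use the model itself for the semantic match.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Skill:
    """A skill: cheap metadata up front, heavy instructions loaded on demand."""
    name: str
    description: str
    load_body: Callable[[], str]  # hypothetical loader, e.g. reads a file
    _body: Optional[str] = None

    def menu_entry(self) -> str:
        # Only this short line sits in context by default.
        return f"{self.name}: {self.description}"

    def body(self) -> str:
        # Load the full instruction block the first time it is needed.
        if self._body is None:
            self._body = self.load_body()
        return self._body


def pick_skill(request: str, skills: list[Skill]) -> Optional[Skill]:
    """Crude stand-in for semantic matching: count shared lowercase words."""
    words = set(request.lower().split())
    best, best_score = None, 0
    for skill in skills:
        score = len(words & set(skill.description.lower().split()))
        if score > best_score:
            best, best_score = skill, score
    return best


def build_context(request: str, skills: list[Skill]) -> str:
    """Menu of every skill, plus the body of the one matching skill, if any."""
    parts = [s.menu_entry() for s in skills]
    chosen = pick_skill(request, skills)
    if chosen is not None:
        parts.append(chosen.body())
    parts.append(request)
    return "\n".join(parts)
```

In this sketch every skill contributes one menu line to the context, and only the skill that matches the request contributes its full instruction body.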

05:34

The token math on this mechanism is absolutely incredible. It really is wild. The metadata only costs about 50 tokens to scan initially, but the hidden instruction body might be 5,000 tokens long. This creates a nearly 18 to 1 token savings ratio for you. Whoa, imagine saving tens of thousands of tokens a week just hiding instructions. It adds up so fast. It is like stacking Lego blocks of data efficiently. It keeps the context window perfectly clean and highly focused. It is a massive

06:03

upgrade for your daily workflow. For sure. But there's a huge warning we need to discuss right now. Okay, what is the catch? You should never use random downloaded skills from the internet. Wait, if I shouldn't use public skills, aren't I stuck? Writing custom code for an agent sounds like a massive bottleneck. Like, how do I actually create one without just guessing the code? Because public skills can contain highly dangerous prompt injections. Oh wow. They might have strange API

06:29

endpoints hidden deep in the code. They could even contain malicious data exfiltration steps inside them. So they are stealing your data? You are essentially letting a total stranger write scripts for your agent. That sounds like a massive security risk for any company. It is. And even if they are totally safe, they were written for someone else. Right, for a different company's workflow. The seams will show up very fast in your own setup. So you read the public

06:55

skills for structural inspiration only. Just to see how they did it. You borrow their general architecture, but you write the code yourself. You tailor it perfectly to your own proprietary data. Right, so borrow structure from public skills, but write the actual code yourself. Exactly. Protect your workflow. So let's look at the exact step-by-step process of building

07:21

one. Yes, let's get into the practical side. We are going to build a daily competitor tracking bot today. I really like this practical example from the outline. This bot checks five competitor websites every single morning. Automating the boring stuff. It looks at their pricing pages, blogs, change logs, and X accounts. Now if you use a lazy prompt, you just ask what is new. Which never works well. The agent will immediately

07:45

fail and ask you confusing questions. It will ask about RSS readers, web scrapers, or visual diffing. Right, because you gave it an incredibly vague objective. It doesn't know how to start. To fix this, you have to build a very clear spec. You need to add specific exclusion rules directly to the prompt. Give it actual boundaries. You tell it to only flag if a price moves 10%. Or flag it if a brand new paid tier suddenly appears. Yes. Or if an X post has over 200 likes. Exactly.
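As a rough illustration, the flag rules from this spec could be written down as a small Python filter. This is only a sketch: the `Observation` fields and the `kind` labels are hypothetical, and reading "moves 10%" as "changes by 10% or more in either direction" is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observation:
    """One thing the agent noticed on a competitor's site or X account."""
    competitor: str
    kind: str  # "price", "tier", or "x_post" (assumed labels)
    old_price: Optional[float] = None
    new_price: Optional[float] = None
    is_new_paid_tier: bool = False
    likes: int = 0


PRICE_MOVE_THRESHOLD = 0.10  # assumption: flag moves of 10% or more, up or down
LIKES_THRESHOLD = 200        # flag X posts with more than 200 likes


def should_flag(obs: Observation) -> bool:
    """Return True only for changes the spec says are worth reporting."""
    if obs.kind == "price" and obs.old_price and obs.new_price is not None:
        change = abs(obs.new_price - obs.old_price) / obs.old_price
        return change >= PRICE_MOVE_THRESHOLD
    if obs.kind == "tier":
        return obs.is_new_paid_tier
    if obs.kind == "x_post":
        return obs.likes > LIKES_THRESHOLD
    # Anything outside the spec is excluded by default.
    return False
```

Writing the thresholds as named constants keeps the exclusion rules in one visible place, so they are easy to tighten after a manual run.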

08:15

Then you tell it to save that output to a Notion page. Keep it organized. But here is the absolute golden rule of this entire process. You must run the workflow completely manually at first. You have to actually prove the logic works in reality. Yes. Do not skip this step. You wait for two completely clean back-to-back manual runs. Just to be sure it wasn't a fluke. Only when it succeeds twice do you save it as a skill. And you don't even write the skill description

08:45

yourself. Well, you let the AI do it. You ask the agent to review those two successful runs. Then it writes its own skill description in under 30 words. That ensures the skill is born from a real success. It is not just a blind guess from a human. It uses its own language. The AI knows exactly what trigger words will activate it best. That is the only kind of skill worth keeping around. But what happens when reality actually sets in on a Tuesday? Oh, it always

09:11

does. Like a website completely changes its layout without any warning, or an API throws a random 503 error during a run. The skill is going to trip and break eventually. Of course it will break when the real world shifts. So what do you do? When it trips, you don't just rewrite it from scratch. You use what we call recursive improvement loops. Okay, how does that work? You ask the AI why it broke, and you fix it together. Ah. So we use recursive improvement

09:36

loops to fix and update it together. Exactly. You iterate on the errors. But once you get one skill working, a dangerous temptation kicks in. Oh, I know what you were going to say. People immediately want to build a massive complex network of agents. I know everyone listening right now immediately wants to build swarms. You want 15 sub-agents talking to each other with a maze of arrows. It looks so cool on a whiteboard. But you are saying that is a massive trap for

10:02

beginners. It is the trap of wanting to look impressive online. You have to build a hut before you build a cathedral. Start small. Stick to one main agent and maybe three to five skills. You should only scale that system when your productivity truly demands it. Let's talk about using your actual code as context. Because the agent can read things straight from your code base. Right. It sees Next.js, Supabase, and Tailwind automatically. It knows what they are. Do not waste your precious

10:29

tokens explaining those public frameworks. You should be using clean starter templates instead of long explanations. There is a really great screening question for this context. You just ask yourself, can the AI find this information publicly? That is the perfect filter. If it can find it on the open web, skip adding it. Don't repeat the internet. You only add the things the AI absolutely cannot find publicly. Things like your highly specific customer personas or

10:58

pricing logic. Exactly. Or the specific things your company likes to say no to. That taste and constraint is highly proprietary to your business. Those are the high value pieces of context you have to protect. But isn't it safer to just remind the agent what framework we use just in case it somehow forgets the syntax during a long task? No, reminding it just clutters the precious context window unnecessarily. It takes up that valuable space. It pushes you right back toward the dumb

11:26

zone we discussed earlier. Keep it out if public, include it if proprietary. I see. Skip it if public. Keep it only if it is completely proprietary. We now have a blueprint for a lean, simple workflow. Which is great. But you need to know about the painful reality of the first 14 days. Let's share that skeleton template example for a weekly revenue review. It pulls your financial data directly from Stripe and Notion. Super useful automation. It calculates week-over-week deltas and flags

11:54

15% moves. Then it saves a clean five-line summary for your team. It sounds great, but getting this to work means hitting the two-week wall. The two-week wall. The first 10 to 14 days are gonna honestly hurt. You are gonna spend way more time fixing errors than producing. The automated system feels significantly worse than just doing it manually. It is super frustrating at first. Yeah. But around day 14, you finally hit the turning point. The recursive loops pay off, and

12:23

the skills stop breaking. Everything just starts clicking. A reporting task that used to take an hour now takes 10 minutes. And this creates a massive career advantage for you over others. People using tight, agentic workflows are going to do the work of three people. They really are. A solo founder with five sharp skills becomes incredibly fast. They will easily outrun a 10-person team that is still arguing about tools. That leverage is entirely real, but you have

12:50

to actually earn it. Is this initial friction the real reason people mistakenly blame the AI models? Like, is this why they jump to a different provider the second something fails? Absolutely. They blame the machine because the initial training is hard. They don't want to do the work. They quit at the wall instead of pushing through the friction. They refuse to pay the initial training cost for their new employee. Exactly. They quit at the two-week wall instead of paying the training

13:15

costs. The big takeaway here is that the model was never the bottleneck. It was always our process. Less context, clearer rules, and smaller starts are the absolute keys. They are what actually create a true agentic workflow for you. Keep it lean. Do it by hand, get two clean runs, and make it a skill. Thank you for joining us on this deep dive today. I want to leave you with a final thought before we go. Think about a repetitive task you do every single morning.

13:42

We all have them. What are the unspoken, invisible rules sitting in your head right now? The ones you have never explicitly written down for another human? Let alone an AI. How could an AI ever guess them if you don't write them down? We used to think the magic was always in the model, but the real magic is making your invisible rules visible.

Transcript source: Provided by creator in RSS feed
