If you don't change how you structure your work, AI just amplifies your sloppiness. Wow. I read that this morning and I had to put my coffee down. It felt less like tech advice and more like a personal attack. It's pretty aggressive, isn't it? But honestly, for 2026, it's the reality check I think a lot of us need. It is. Welcome back to the deep dive. It's Friday, January 23rd. Today, we're slowing things down. We're not talking about the newest apps. No new gadgets.
No. We're looking at the invisible layer, the psychology of how we actually talk to these new kinds of intelligence. We're deconstructing a piece called "AI Problem Solving: Systems for Reliable Intelligence." Which is, you know, a very dry title for what is basically a manual on how to stop messing around with these tools. Right. And it feels like the timing is perfect. For the past few years, it's been all about experimenting, just, you know, throwing stuff at the wall. The
"let's see what happens" phase. Exactly. But the whole idea here is that phase is over. It's time to build systems or you're just going to get buried in your own mess. That's the great divergence we're seeing: people who treat AI like a magic eight ball and people who treat it like an engineering problem. So here's the plan for today. First, we're going to dismantle that search box idea: why that blinking cursor is actually a trap. Then we'll get into reliability, things like
grounding. And the "I don't know" rule, which is way harder than it sounds. Then there's the LLM council. Sounds very grand. Very sci-fi. It does. We'll talk orchestration versus agents, and then spend some real time on a term I love, vibe coding. My favorite. It's a real shift in how we think. And finally, we'll talk about the last human bottleneck, things like judgment and curation. So let's start at the beginning, the search box trap. OK. The source argues our biggest
problem is just muscle memory. It's cognitive inertia. I mean, think about it. For what, 30 years, the entire internet era, we've been trained to do one thing. You have a question? You type it in a box, you hit enter, and Google or whatever finds a list of answers that already exist. It's a retrieval task. You're a librarian, and you're just fetching a book from a shelf. The answer is already out there. Precisely. But here's the fundamental breakdown. An LLM is not a librarian.
It's a poet. It's a poet. It's a generator. When you type in a prompt, it is not looking up an answer. It is calculating the most probable next word. So if I treat a generator like a retriever? You get hallucinations. You're asking a probability engine to act like a database. And because it's designed to be helpful, it will just confidently lie to you. It fills the gaps with noise that sounds plausible. And in a professional setting, legal, medical, that's a huge liability. It's
just dangerous. If your input is vague, if you just kind of toss a question out there, the AI's output is going to be just as vague. It defaults to the average of the internet. And the average of the internet is? Mediocrity. At best. So the goal is to shift from just getting an output to actually solving a problem. It's about building a workflow where errors are caught by the structure itself. You have to assume the model is a little
bit drunk. And you have to build guardrails so that even if it wants to go off the rails, it can't. So let me pause on that. If that search box method is our default, what's the immediate consequence of using it for high stakes work? It creates fast but risky outputs. It gives you speed without reliability. Speed without reliability. That feels like the slogan for the last three years. So how do we engineer that reliability in? The source talks about grounding. We hear
that term a lot. Explain it to me like I'm a skeptic. OK. So an ungrounded AI is improvising. It's just making things up based on its vast, messy training data. Grounding is like handing the AI a script. You're telling it, ignore everything you think you know. Look only at these three PDFs I just uploaded. Answer my question using only this information. So it's the difference between an essay from memory versus an open book
test, where you have to cite your sources. It's even stricter, because with an open book test, you can still kind of fudge it. The magic trick with grounding is you have to add one more instruction. You have to explicitly tell the model: if the answer's not in this text, you must say, "I don't know." That feels almost too simple. Does that really work? It changes everything. Because these models are people pleasers. They're optimized to give you an answer. They hate saying they
don't know. It feels like a failure to them. Exactly. So by giving them permission to fail, you eliminate all those forced hallucinations where they're trying to connect dots that just aren't there. So you're giving it permission to be useless. And that, paradoxically, makes it incredibly useful. Because then when it does give you an answer, you know where it came from. And this connects to RAG, right? Retrieval Augmented Generation. Right. That's just the plumbing for
grounding at scale. If you have, say, 10,000 company documents, you can't paste them all into the prompt. Of course not. RAG is just the system that first searches your documents, finds the right page, and then hands just that one relevant page to the AI to ground its answer. So you move from an answer based on vibes to an answer based on evidence. Evidence is the word. You stop judging if the answer sounds right and you start checking
its citations. So let's nail this down. What's the one critical instruction that turns a guessing AI into a grounded one? Explicitly permitting the model to say "I don't know" eliminates forced hallucinations. The "I don't know" rule. I feel like I need to apply that to my own life. Don't we all. OK, let's scale this up. One model, even a grounded one, can still make a mistake. The source brings up this idea from Andrej Karpathy,
the LLM Council. That's the council, yes. It sounds like something out of Lord of the Rings. It really does. But it's actually just expensive and redundant computing. Walk me through it, because my first thought is, why would I need four different AIs to answer one question? Well, you wouldn't for a simple email. This is for high-stakes work. Imagine you're analyzing a legal contract for a loophole. OK. If you just use one model, say GPT-5, you're a victim of
its specific blind spots. Every model has a different personality, a different way of reasoning. A different flavor. Right. Some are more creative, some are more literal. The council approach is you write one clear prompt and you send it to three or four different models all at the same time. And then you compare the answers. Exactly. If three say this is safe and one says this is a risk, you stop. That disagreement is your signal. It's a smoke alarm telling you there's nuance
you missed. And the source mentioned a meta step that I thought was brilliant, using another AI to judge the council's answers. The judge model. It turns out models are often better at critiquing than creating. It's easier to spot a flaw than have an original idea. So you just feed the four answers into a fifth model. And you say, rank these. Who's right? Who missed the point? It's like peer review at the speed of light. It's an insurance policy against a single point of
failure. OK, so the probing question is, why go through all that trouble and expense of consulting multiple brains for one problem? It exposes blind spots. If the models disagree, you know you need to verify. Blind spots. Okay. So that's how we get reliable answers. Now let's talk about getting work done. The source draws a really clear line between two things, orchestration and agents. The train versus the taxi. This is the mental model everyone needs. Break it down. What's the
train? The train is orchestration. It's on rails. It goes from station A to station B to station C. It's rigid. It can't improvise. Not at all. It's brittle, but it's 100% predictable. Orchestration is for when you map out a boring, repetitive task. Step one, summarize this text. Step two, extract the key dates. Step three, put them in a calendar invite. And if something changes in step one? The whole train derails. But as long as the track is clear, it works perfectly every
single time. OK, so that's the train. What's the taxi? The taxi is an agent. When you get into a taxi, you don't give it step-by-step directions. No, you give it a destination. Right. Get me to the airport. The driver, the agent, figures out the route. It adapts. If there's traffic on one street, it takes another. Agents are goal-oriented, not step-oriented. And you give them tools like access to a browser or your calendar. Exactly. You tell it schedule a meeting with
Sarah for next week. You don't tell it how to do that. It figures it out. See, that's the part that makes me, and I think a lot of people, a little nervous: giving an AI that much agency. What if it decides the fastest way to the airport is through a park? That is the core alignment problem, and it's why real agents are only just now becoming practical. The risk was too high. The train is safe because you built the track yourself. The
taxi requires trust. So you use trains for boring, predictable tasks and agents for messy problems. For dynamic, messy problems. Like, do some research on this new company and find a good person to contact. You can't script that. Every website is different. You need an agent that can try something, fail, back up, and try a different approach. It's creating its own map as it goes. Yes, but you have to put it in a sandbox. You
have to give it clear boundaries. So what's the fundamental distinction here between an agent and just a standard automation workflow? Automations follow rigid steps. Agents follow a goal and adapt their path. Adaptability. Okay. I want to shift to the term that I admit made me roll my eyes at first. Vibe coding. Ah, don't do that. It's important. I'm trying not to. It just sounds like something you'd overhear at a skate park. But the source treats it as this huge revolution.
It is a revolution. Just forget the slang for a second. Think of it as natural language programming. Go on. In the old days, meaning like 2023, if you wanted to build software, you were a translator. You had a human idea in your head and you had to painstakingly translate it into the machine's language. All the semicolons and brackets. Right. Syntax was the barrier. Yeah. Miss one comma, the whole thing breaks. Vibe coding is the shift where you stop being a translator and start being
an architect. You just describe the intent. So instead of set background color to hex code FFF. You say, make the landing page feel clean and minimalist. The AI handles the implementation. It translates clean into the right hex codes and CSS. And this matters because it completely changes who can build things. Totally. You don't need to know Python anymore. You need to know how to clearly describe a problem and what a
solution should feel like. So instead of being stuck with a messy spreadsheet, you can just... You can just say to the AI, build me a simple web page that lets me search and filter the rows in this spreadsheet. You describe the function, the vibe of the tool, and it generates the code. But that requires a totally different skill set, right? If it's not syntax, what is it? It's systems thinking. It's clarity of communication. If your description of the problem is sloppy, the tool
the AI builds will be sloppy. We're seeing this weird thing where people with, say, a background in literature are becoming great vibe coders because they're masters of describing things with precision. That's a wild thought. The entire skill stack is flipping upside down. It's the revenge of the liberal arts. I love that. So the probing question. What's the main skill for vibe coding if it isn't programming? The ability to clearly describe a problem and a desired functional
outcome. Describing the outcome. Sounds so simple, but it's incredibly hard. We're going to take a very quick break. When we get back: if AI is handling all this, what's actually left for us to do? The good stuff. Stay with us. And we are back on the deep dive. Before the break, we covered the search box, the LLM council, trains, taxis, and the rise of vibe coding. Now let's talk about the human in the loop. Where do we fit in all this? The source points to two
main roles: debugging and curation. Let's start with debugging. Well, it makes sense, right? If you're vibe coding an app or using an agent, things are going to go wrong. The AI is going to misunderstand you. The agent will get stuck. So we stop being writers and we become mechanics. We become diagnosticians. Yeah. And the source warns against what it calls the retry loop. You know, when you get a bad answer from the AI and you just hit regenerate over and over, hoping
for a better one. I do that all the time. We all do. It's the definition of insanity. The new skill is to look at the failure and ask why. Was my context bad? Did I assume the AI knew something it didn't? The source mentioned a technique called stress testing. Oh, I use this constantly. After you get an answer you like, you ask the AI a follow-up. Where is this output likely to break? Or what assumptions did you make to get here? You're asking it to audit its own work.
You're forcing it to reveal its own weak points. And that requires a human to understand the context of the real world, which the AI just doesn't have. And then there's the second role, curation. This line really hit me. By 2026, creation is trivial. It's basically free. The cost to generate a thousand words or a piece of code or an image is trending to zero. And in economics, when supply is infinite, value collapses. So being able to write generic text isn't a valuable
skill anymore. No. The value moves up the stack. It moves to judgment. It moves to the role of the editor. I like the magazine editor analogy. It's perfect. It used to be that the hard part was getting the articles written. Now you can generate 1,000 articles in a second. The hard part, the valuable part, is knowing which one is true, which one is important, and which 999 to throw away. So you stop asking the AI to give me 10 ideas. And you start asking it to filter
these 50 ideas. You give it 100 pages and say, reduce this to the single most important paragraph. You use it to prune, not just to generate. The source calls this cognitive offloading. We offload the busy work, but never the final judgment. That is the bright red line. You can outsource labor. You can outsource summarization. You cannot outsource the decision of what actually matters. The moment you do that, you're not using the tool anymore. The tool is using you. Wow. That's
a powerful distinction. So the final probing question. In a world of infinite AI generation, what becomes the most valuable human contribution? Curation. The ability to decide what is worth keeping and what to ignore. Deciding what to ignore. That might be the most important skill of the century. I really think it is. Let's bring this all together. We have covered a ton of ground today. We really have. We started by tearing down the search box idea. Then we built reliability
with grounding and the LLM council. We looked at the actual systems: orchestration for the predictable stuff, agents for the messy stuff. And we talked about the big shift in our roles from being just users to being architects with vibe coding and editors through curation. The single theme connecting all of this seems to be. That's it. The people who are succeeding with AI are the ones who treat it like an engineering discipline. They aren't just chatting with it. They're designing systems
for it to operate within. It's the difference between just experimenting and actually operating. That's the whole game right there. So our challenge for you this week is to pick one boring, repetitive task you do. And don't just ask AI to do it. Try to orchestrate it. Write down the steps: A, then B, then C. See if you can build a reliable track for it. And try the "I don't know" rule. Just add that one sentence to your instructions. See how much better and quieter your outputs
become. I want to leave you with one final thought from the source, a bit of a tough one to sit with. If AI exposes how you think and your AI outputs are messy, what does that say about your current thinking process? Oof. Yeah. That's the one that'll keep you up at night. Something to reflect on. Thanks for diving in with us. Always a pleasure. We'll see you next time.
