🎙️ EP 188: OpenAI’s New Age Scanner, and Anthropic’s Hidden “Assistant Axis”

00:00

It's not just reading your prompt anymore, it's reading you. It knows if you're a teenager based on, you know, when you type, how you type, even if you straight up lie about your age. And once it decides who you are, it just starts quietly locking doors. It's a little unsettling. It is, isn't it? But, you know, looking at the legal landscape, this wasn't just probable, it was completely inevitable. Welcome to The Deep Dive.

00:24

It is Tuesday, January 20th, 2026. So today we're wading through this massive stack of reports that when you put them all together, they paint a picture of an industry that's, I don't know, simultaneously growing up and freaking out. That is the perfect way to put it. We have a lot of ground to cover. We do. So here's what we're going to dig into today. First, we're going to look at OpenAI's new... age prediction system.

00:48

That's the thing watching your keystrokes and the massive legal pressure that's cooking behind it. Right. Then we have to talk about the markets. NVIDIA took a hit and you've got analysts basically shouting that the honeymoon is over. Which is a bold claim, especially when you see the sheer amount of money still flowing into the infrastructure. Exactly. We'll try to square that circle. Then we'll touch on security, specifically a nasty little prompt injection attack on Anthropic's

01:13

Claude Cowork. Yeah, that was a big one. And finally, the main event for this deep dive. A fascinating new paper, also from Anthropic, about something called the assistant axis. They claim they've found the mathematical direction of helpfulness inside the model's brain. It's a massive breakthrough. It really fundamentally changes how we think about controlling AI behavior. It's less like training a dog and more like performing brain surgery. Okay, let's unpack this. We have to

01:42

start with OpenAI. They've rolled out this new layer of guardrails specifically for minors. But this isn't the old "click here if you're 18" checkbox. No, no, those days are long gone. That was the honor system. This is a surveillance system. This new update is an active age prediction model running inside ChatGPT. It's constantly scanning for what they call signals. Okay, define signals for me. Are we talking about the content

02:05

of what I ask? That's part of it, for sure. You know, if you're asking about high school trig or using slang that's consistent with Gen Z, that's a data point. But it's getting much more invasive. It analyzes usage patterns like the time of day you're active. So if I'm on ChatGPT at 2:00 PM on a Tuesday, it assumes I'm playing hooky from school? Potentially, yeah. But the most interesting, and I think most controversial, part is the biobehavioral analysis. It looks at how

02:32

you type. How I type? The speed, the rhythm, the pauses between keys. There's a lot of research suggesting that a teenager's interaction with a keyboard is, well, it's distinct from a 45-year-old's. Wow. It's building a profile based on your digital body language. That feels incredibly dystopian. It's analyzing my keystroke rhythm. It is. And if that profile screens you as under 18, the system just flips a switch. And what happens when that switch is flipped? The safety filters tighten

02:59

up immediately. It's a hard lock. No sexual topics, no self-harm content, no graphic violence. The goal is just to sanitize the experience for anyone the model thinks is a minor. Okay, but play skeptic with me here. Algorithms get things wrong all the time. What if I'm just a, you know, youthful-sounding adult who happens to be up late and types fast? Then you have to prove it. You enter the friction zone. You have to submit a selfie via a third-party identity service called Persona

03:27

to verify your actual age. That is a significant hurdle. Usually tech companies want to remove friction, not add it. Why go this hard right now? Context is everything here. OpenAI is under immense pressure. We've seen wrongful death lawsuits tied to teen suicides where AI chatbots were involved. The FTC is investigating their safety practices. And let's be honest, the public backlash over these models generating inappropriate content for minors has been severe. They're trying to

03:56

clean house. They're clearing the runway. You got to remember, OpenAI is eyeing an IPO. Wall Street might like risk, but they do not like companies with wrongful death headlines. This is about survival as much as it is about safety. So let me ask you this. Is this update actually about protecting kids or is it about making the company palatable for Wall Street? It's absolutely both. You can't ring the opening bell if you're

04:17

facing wrongful death lawsuits. They need to show they can self-regulate before the government steps in and does it for them. It's a preemptive strike. Speaking of Wall Street, let's shift gears to the market. Because while OpenAI is trying to tidy up, the investors seem to be getting, well, cold feet. Cold feet might be putting it mildly. Things are looking a little shaky. We saw Nvidia stock fall 4.4% recently, and Deutsche Bank released a report that was pretty brutal.

04:45

Yeah, I saw that quote. They said the honeymoon is over for AI. Yeah. That feels so dramatic. It is dramatic, but they brought the receipts. They're looking at the burn rates versus the infrastructure build-out. Look at the numbers. OpenAI is burning through roughly $17 billion. Meanwhile, global plans for new data centers are projected at $1.4 trillion. I have to be honest with you here. I see these numbers, $1.4 trillion, and I struggle to even visualize

05:10

that. It just feels like Monopoly money. I try to picture rows of servers, but the scale, it escapes me. You're not alone. It's a scale that defies traditional logic. To put that in perspective, $1.4 trillion is roughly the GDP of Spain. The entire country. We are building the economic equivalent of a European country just to house GPU clusters. And Deutsche Bank is saying that we won't make that money back? They're saying the math is getting scary. You have these massive

05:40

capital expenditures, the capex. But the revenue, the actual profit from software, isn't scaling at the same speed. Right. They call it the AI disconnect. We're building the tracks for a high-speed train, but so far we're mostly selling tickets for a trolley. And yet, if you look at the news from Davos 2026, you wouldn't know there was any problem at all. Oh, Davos sounds like it's on a different planet right now. People are calling it a Silicon Valley launch party,

06:04

not an economic forum. You've got big tech CEOs dancing at literal AI raves. Raves, like glow sticks and techno music? Full-on raves. And politically, it's just fascinating. You have this incredible friction happening right on stage. The CEO of Anthropic openly slammed the U.S. government over AI chip exports to China. Right. And didn't he also go after NVIDIA in the same breath? He did, which is just wild because NVIDIA is a huge investor in Anthropic. Wow. Imagine taking billions

06:34

from a company and then criticizing them on the world stage for their export policies. It's messy. But then almost immediately you see Nvidia turning around and pouring another $150 million into a startup called Basant. Exactly. It's a total contradiction. They're fighting over policy, but the money just keeps moving to build the ecosystem. NVIDIA needs these startups to succeed, so they keep buying chips. So it's a symbiotic relationship, even if they're mad

07:01

at each other. Right. Even if they hate each other at dinner parties. So if the burn rate is this high and the analysts are screaming that the honeymoon's over, are we looking at a bubble burst or is this just a correction? I think it's a reality check. The infrastructure costs are real. The vibes aren't revenue. The hype needs to catch up to the concrete. The party at Davos might be raging, but the accountants are starting to sweat. We're moving from the promise phase to the show me

07:25

the money phase. Exactly. Before we get to the really deep dive on the internals of these models, which I think relates to this maturity problem, I want to touch on security. Because it feels like every week we find another crack in the armor. This week, it's Anthropic's Claude Cowork. What happened there? It was a prompt injection attack. So basically, attackers found a way to trick the system by crafting these very specific prompts. It's almost like casting a spell in

07:51

code. They could convince Claude Cowork to hand over files it wasn't supposed to access. It's like social engineering, but for a machine. Precisely. And it just shows that despite all the advancements, these systems are still fragile. If you ask the right way, the guardrails can bend. It's not hacking in the old sense of breaking encryption. It's hacking the logic of the conversation. But at the same time, we're seeing tools that are becoming so powerful. I was reading about VibeCode.

08:18

Right, VibeCode. It's powered by Claude Code. It turns natural language prompts into full mobile apps. They've already generated over 500 of them. This is that democratization of coding we were promised. And Evernote is back. Evernote v11, yeah. They've integrated an AI assistant for meaning-based search, and multi-speaker transcription. It's not just keyword searching anymore. It gets the context of your notes. And then there's daily.dev, opening up a huge developer community for

08:48

role matching. It feels like the utility is exploding just as the security vulnerabilities are being exposed. That's the tension of 2026. We're effectively giving the keys to the library to these agents, letting them read our notes and write our code, before we've checked if the doors actually have locks. So are we moving too fast? Absolutely. We're prioritizing capability over security, and that bet is coming due. Okay. Hold that thought. We're

09:11

going to take a quick break. When we come back, we are going to look at how Anthropic might have found a way to install those locks, not by patching software, but by rewiring the brain of the AI itself. Okay, we are back. And this is the part of the show where we really go deep. We've talked about the market jitters, the security hacks, but there is a new paper from Anthropic that might be the

09:34

solution to a lot of this chaos. This is one of the most exciting papers I've read in a long time. It's about something they call the assistant axis. The name sounds like a sci-fi novel. What is it actually? To understand it, you have to realize how we usually train AI. When we want an AI to be helpful or harmless, we generally use something called RLHF, reinforcement learning from human feedback. That's basically the good dog, bad dog method, right? Exactly. If the AI

10:04

gives a bad answer, we scold it. If it gives a good answer, we give it a treat. We treat the model like a black box and just try to shape its output from the outside. Right. But Anthropic went deeper. They opened up the black box. That's how deep. They analyzed the internal state of the model, the actual numbers firing inside the neural network while it's thinking, and they found a specific activation direction, a mathematical vector that correlates perfectly with a... Wait,

10:31

hold on. You're saying there's a specific direction in the math that just equals being a good assistant? Yes, exactly. Imagine the AI's brain is this giant multidimensional map of concepts. They found that helpfulness isn't just a random behavior. It's a direction on that map. Think of it like a compass. North is helpful assistant. South is, well, unhelpful or toxic. Okay, so they found north. What do they do with it? This is the cool part. They figured out how to steer the model

11:01

along this axis during inference. Okay. You used the jargon word there, inference. Break that down for me. Sorry. Inference is just, it's the moment the AI is actually thinking and generating an answer for you. It's the live performance. So instead of training it for months to be nice, they can just... Nudge it while it's talking. Exactly. They can mathematically steer the brain activity toward that assistant axis. They're effectively clamping the model's brain to the helpful setting.
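For readers who want to see what "steering along an axis" can look like mechanically, here is a minimal sketch. It is not Anthropic's code and not necessarily their exact method. It assumes the axis is estimated as a difference of mean activations between assistant-like and non-assistant responses, and it "clamps" by putting a floor under each token's projection onto that axis. The function names, the toy 4-dimensional vectors, and the floor value are all hypothetical.

```python
import math

Vector = list[float]


def mean(rows: list[Vector]) -> Vector:
    """Column-wise average of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def assistant_direction(assistant_acts: list[Vector], other_acts: list[Vector]) -> Vector:
    """Hypothetical difference-of-means axis: a unit vector pointing from the
    average non-assistant activation toward the average assistant activation."""
    diff = [a - o for a, o in zip(mean(assistant_acts), mean(other_acts))]
    norm = math.sqrt(dot(diff, diff))
    return [x / norm for x in diff]


def clamp_to_axis(hidden: list[Vector], direction: Vector, floor: float) -> list[Vector]:
    """For each token's hidden state, if its projection onto the unit `direction`
    is below `floor`, add just enough of `direction` to lift it to `floor`.
    Components orthogonal to the axis are left untouched."""
    steered = []
    for h in hidden:
        deficit = max(floor - dot(h, direction), 0.0)  # how far below the floor
        steered.append([x + deficit * d for x, d in zip(h, direction)])
    return steered


# Toy 4-dimensional "activations"; real models use thousands of dimensions.
assistant_acts = [[2.0, 0.0, 1.0, 0.0], [2.0, 0.0, -1.0, 0.0]]
other_acts = [[-2.0, 0.0, 0.0, 1.0], [-2.0, 0.0, 0.0, -1.0]]
direction = assistant_direction(assistant_acts, other_acts)  # [1.0, 0.0, 0.0, 0.0]

hidden = [
    [-1.0, 3.0, 0.0, 0.0],  # token drifting "south" of the axis
    [5.0, 0.0, 0.0, 0.0],   # token already well on the assistant side
]
steered = clamp_to_axis(hidden, direction, floor=1.0)
print(steered)  # [[1.0, 3.0, 0.0, 0.0], [5.0, 0.0, 0.0, 0.0]]
```

The floor only intervenes when a token drifts below it, so normal outputs pass through unchanged, which is one plausible way a method like this could avoid a broad capability penalty. In a real model the same adjustment would presumably be applied to a chosen hidden layer at every generation step (for example through a framework's forward-hook mechanism), which is why it can work at inference time without retraining.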

11:24

And does it work? The results are wild. They saw about 50% fewer jailbreaks across 1,100 red-team prompts. 50% is a huge drop. And here's the best part. There was no performance loss on coding or writing. Usually, when you make a model safer, it gets stupider. That's what we call the alignment tax. Right, it gets scared. Yeah, it starts refusing to answer normal things. But this method kept the capabilities intact while

11:48

making it resistant to going off script. So even if I try to trick it or jailbreak it with a prompt injection like we talked about... The model just naturally resists. It resists because its internal state is sort of locked onto that axis. It doesn't get tempted by the weird personas or the hacks because its brain is being held in the helpful position. This really changes my perception of Claude. We've always said Claude feels more stable and grounded compared to, say... ChatGPT. And

12:18

now we know that isn't magic. It's engineered. They found the roadmap to controlling behavior from the inside out without having to retrain the whole massive model. Whoa. Think about that for a second. We aren't just teaching it rules anymore, like a parent scolding a child. We found the physical volume knob for helpfulness inside its digital brain. That is the perfect analogy. It is a volume knob. You can turn up assistant-ness or turn it down. But that raises a kind

12:44

of a scary question for me. If they can dial up helpfulness, can they dial up other things like obedience or political bias? Theoretically, yes. And that is the double-edged sword here. Once you map the axis for a trait, you can slide the personality wherever you want. If there's an axis for deception or loyalty, or even an axis for a conservative or liberal viewpoint... Those could be manipulated just as easily. Just as easily as helpfulness. Yeah. That is both reassuring...

13:15

...and completely terrifying. It's the dual nature of the tech, right? We're solving the safety problem, which protects us from prompt injections. But in doing so, we are creating tools for total behavioral control. We are moving from influencing the AI to operating it. So where does this leave us? We've covered a lot of ground today, from teenage typing patterns to billion-dollar burn rates to brain surgery on LLMs. I think the big theme here is just maturation. We're watching

13:40

the industry grow up in real time. Externally and internally. Right. Externally, you have the law and the market clamping down. OpenAI is checking ages because of lawsuits. NVIDIA's stock is correcting because the hype math doesn't add up. The party phase, those raves at Davos, is crashing into the business reality phase. Exactly. And then internally, we're moving from just, you know, prompt engineering, where we ask the black box nicely, to internal mapping like this assistant

14:05

axis. We are learning to mechanically control the black box. Just as the black box starts watching us. Exactly. It's a convergence. We are gaining more control over the AI while the AI is gaining more insight into us. Before we go, I want to leave you with a thought. One of the sources we looked at today mentioned a tool for linking your physical library to a private cloud, basically turning your books into a brain you can talk to. It's a cool concept, digitizing your personal

14:32

analog world. It is, but I think it represents something bigger. In a world where OpenAI is analyzing your keystrokes to guess your age, and where models can be steered mathematically from the inside, maybe digitizing your own physical books is the ultimate act of owning your own knowledge. Taking your data offline, or at least owning the source material. I like that. It's a way to keep your own axis steady while the world spins around you. It's just something to

14:58

think about. That is it for this deep dive. We will catch you on the next one. See you then.

Transcript source: Provided by creator in RSS feed.