🎙️ EP 115: Harvard’s AI Doctor Makes History & Samsung’s Model Breaks the Rules

00:00

We often think that AI progress, it just means going bigger, right? More scale. It's kind of the default assumption. Bigger models, massive compute, huge data centers. That's what gets all the attention, the brute force method. But what if the really big step forward, the next

00:16

one, comes from something, well, tiny? Tiny but architecturally smart. We've got some data showing a small recursive AI that's actually out-reasoning the giants, the trillion-parameter models, on really tough benchmarks. Welcome to the Deep Dive. Yeah, you sent over some truly fascinating sources this week. It's like AI is splitting, going down two very different roads at the same time. So it's our job today to bring some clarity. Let's try and separate that, uh, massive computation hype from the real, tangible algorithmic

00:43

intelligence that's actually popping up. It seems smart design might just be starting to beat sheer muscle. Okay, so here's the roadmap for this deep dive. First, we're going to unpack this quiet revolution of these tiny, efficient

00:54

models (TRM, they're calling it) and why they're winning on abstract reasoning. Then, second, we'll look at AI agents. The age of agents seems to be upon us. We'll dive into how they're being integrated, the sort of market wars, and also the societal friction: lawsuits, jobs, that kind of thing. And finally, the story of Dr. Cabot. It's pretty shocking, actually: the first AI to get a diagnosis published in a top medical journal. A huge professional step. So let's get into this divergence in the

01:21

AI landscape. Right. So for the last, what, five years maybe, the mantra in AI has just been scale. Scale solves everything. Your LLM isn't good enough? Fine. Throw more data, more parameters, more GPUs at it. Just make it bigger. But now Samsung seems to be really challenging that idea with this tiny TRM model. And when we say tiny, we mean compared to those trillion-parameter monsters. You know, those things are like enormous computer brains needing billions of dollars and

01:48

just mountains of data to build. Right. But this TRM, it's going head to head with some pretty serious models, like the older OpenAI o3-mini, which is still a solid baseline, and Google's newest Gemini 2.5 Pro, on complex reasoning tasks. And the kicker: it's winning. Yeah. And the concept underneath it all is really compelling, because it shifts focus away from just raw data crunching towards, like, structural intelligence. Instead of one giant pass through

02:17

the data to get an answer, TRM loops. It goes over its own initial answer again and again, recursively. You can think of it like a really careful student doing a hard math problem. They get an answer. Right. But instead of just stopping, they spend maybe five minutes just rechecking every step, refining that first output until they're really, really sure about the logic. Exactly. It's self-refinement built in. People have tried getting LLMs to do this externally,

02:38

right? Like with chain-of-thought prompting, making it write out its steps. But TRM builds that recursive checking right into its core design. It doesn't need some massive, expensive prompt to get that deep reasoning ability. And the proof is pretty striking, especially on abstract tasks, the kind designed to really push LLMs past just remembering facts. Take Sudoku Extreme. Okay. That's famously hard. It needs serious logic, constraint satisfaction. TRM got 87.4% accuracy.
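To make that draft-then-recheck loop concrete, here is a toy Python sketch. It is not Samsung's TRM: TRM learns its refinement step with a small neural network, while this sketch assumes a hand-written rule in its place (fill any empty cell of a 4x4 Sudoku that has exactly one legal value left). The function names `peers`, `refine`, and `solve_by_refinement` and the `max_steps` cap are hypothetical, chosen only for illustration.

```python
from typing import List, Set, Tuple

Grid = List[List[int]]  # 0 marks an empty cell


def peers(r: int, c: int, n: int) -> Set[Tuple[int, int]]:
    """Cells that share a row, column, or box with (r, c) in an n x n grid."""
    box = int(n ** 0.5)
    cells = {(r, j) for j in range(n)} | {(i, c) for i in range(n)}
    br, bc = (r // box) * box, (c // box) * box
    cells |= {(br + i, bc + j) for i in range(box) for j in range(box)}
    cells.discard((r, c))
    return cells


def refine(grid: Grid) -> Grid:
    """One re-checking pass over the current answer.

    Assumption: a hand-written rule stands in for TRM's learned network.
    Every empty cell with exactly one legal value left gets filled.
    """
    n = len(grid)
    new = [row[:] for row in grid]  # never mutate the caller's grid
    for r in range(n):
        for c in range(n):
            if new[r][c] == 0:
                used = {new[i][j] for i, j in peers(r, c, n)}
                options = set(range(1, n + 1)) - used
                if len(options) == 1:
                    new[r][c] = options.pop()
    return new


def solve_by_refinement(draft: Grid, max_steps: int = 16) -> Grid:
    """Loop over our own answer until a pass changes nothing or the budget runs out."""
    current = draft
    for _ in range(max_steps):
        improved = refine(current)
        if improved == current:  # fixed point: nothing left to correct
            break
        current = improved
    return current


if __name__ == "__main__":
    puzzle = [
        [1, 0, 0, 4],
        [0, 4, 1, 0],
        [2, 0, 0, 3],
        [0, 3, 2, 0],
    ]
    for row in solve_by_refinement(puzzle):
        print(row)
```

The design choice worth noticing is the stopping rule: the loop keeps re-checking its own answer until a pass changes nothing or a fixed step budget runs out. On its own, this simple rule stalls on genuinely hard grids like Sudoku Extreme, leaving some cells empty; it illustrates only the loop structure, not TRM's accuracy.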

03:08

The bigger models, around 55%. Wow. Big difference. Huge. And on Maze-Hard problems, testing navigation logic, TRM hit 85%. Again, just beating the larger models pretty easily. So that's the trade-off then, isn't it? TRM isn't going to, like, summarize your emails or chat about the news. It won't replace a generalist like GPT-4. But its thing is deep, abstract, computational reasoning. Yeah, it sacrifices that broad, general knowledge for being incredibly efficient at one type of problem

03:36

solving. And it proves there's another way forward, one that doesn't automatically mean you need a billion-dollar GPU farm to get breakthrough reasoning. Whoa. I mean, imagine scaling that. That kind of precise, self-correcting logic for specialized industrial stuff without needing, you know, a billion queries or a giant carbon footprint. That efficiency is actually kind of amazing. It points to the future, maybe. Okay, but hang on.

04:02

If this TRM is so much better on these complex reasoning benchmarks, why isn't it, like, everywhere? What's the practical catch right now? Well, the catch is that specialization. It's laser-focused. It's not built for chatting or writing emails. Think of it as a research win, proving smart algorithms can beat raw power. OK, so we've talked about size versus smarts in the models themselves. Now let's pivot. Let's talk application integration.

04:23

The real world. This is really where the rubber meets the road, where AI meets the spreadsheet and where the friction really starts. Yeah, the agentic future isn't some far-off theory anymore. It's about immediate utility. We're past just single prompts now. We're talking autonomous systems doing tasks. I mean, look at OpenAI's Agent Builder. It apparently already has, like, 50 real use cases, boosting productivity in sales, marketing, and operations almost everywhere.

04:49

And just for fun, showing how mainstream agents are becoming: mark your calendars. October 27th, there's going to be this hilarious AI poker showdown. ChatGPT versus Claude versus Grok versus DeepSeek. Poker-playing agents. It's like the new competitive sport. OK, noted. But yeah, the integration wars are absolutely heating up. Google just launched Gemini CLI extensions (that's the command-line interface) literally days after OpenAI showed off their

05:16

new app idea. And the key thing here, what's really crucial, is that these extensions let anyone publish tools. You can plug in huge platforms: Figma for design, Stripe for payments. The AI isn't just helping you anymore. It's becoming the hub, kind of like the new operating system for all your other digital tools. It's a vision people have talked about for ages, actually. There's this funny historical echo going around social media now. Someone dug up a 1985 video,

05:39

featuring Steve Jobs. He seems to be predicting tools exactly like ChatGPT: something that could automate using other tools. Kind of wild to see that finally happen. Yeah, it is. But integrating this stuff comes with real, complex tension right now. On one side, you've got resistance: major lawsuits. Like, 17 authors are suing OpenAI right now over using copyrighted books for training. And you hear big creators like MrBeast saying publicly that AI means, quote, scary times for

06:08

YouTubers. They're worried about being drowned out or replaced. And at the exact same time, adoption is moving at warp speed, creating this huge industrial split. IBM teams up with Anthropic, putting Claude models into its business software. Deloitte deployed Claude AI to half a million employees globally. Half a million. Wow. Yeah, 500,000 people using it internally. But that efficiency has consequences we really need to look at. We're seeing actual job displacement.

06:34

Equiture, an insurance broker, is cutting 400 jobs. Why? AI is automating accounting and operations. And this integration, even for the people building it, isn't always smooth sailing, is it? Taking an agent from a cool idea to something stable and useful is tricky work. Oh, absolutely. It's not magic yet. I mean, I still wrestle with prompt drift myself when I'm trying to build complex agents. Yeah. You set up a sequence,

06:58

right? And, like, three steps in, the system just kind of wanders off or misinterprets the data it just produced. It needs constant fiddling, constant tuning. So given all the lawsuits, the creator anxiety, the actual job cuts, how do we balance this massive tech acceleration with these really serious societal risks we're seeing? Yeah, that's the core issue. Integration really

07:21

needs careful management and clear policies. We need to focus on job transitions and figure out fair copyright rules basically as fast as the tech itself is changing. All right, let's shift to maybe the highest-stakes area of all: medicine. The source material you sent talks about an AI hitting a major professional milestone, something that absolutely demands transparency and accountability. All right, that's Harvard's Dr. Cabot and the

07:43

milestone. It's the first AI system ever to publish a diagnosis in the New England Journal of Medicine, the NEJM. Just for context, that journal, it's arguably the top medical journal globally. It's where the absolute best human doctors debate the toughest, most complex medical cases. And this AI, Dr. Cabot, it doesn't just, you know, spit out an answer. That's always the worry with LLMs, right? But this thing, crucially, it simulates

08:11

how a doctor actually thinks. So you feed it a complex case (symptoms, patient history, lab results, the whole picture), and then the system goes to work. Okay. It builds a full slide deck and talks you through its reasoning out loud. It lays out all the possibilities, the differential diagnosis, as doctors call it. It methodically rules out the red herrings, the misleading clues, and it backs up every claim by citing actual clinical papers. And it does all of that in about five

08:37

minutes? Five minutes, yeah. Wow. It's apparently powered by OpenAI's o3 model, but specialized, trained on over 100 years of very specific cases from Mass General, their CPC cases, clinicopathological conferences. So it has this incredibly deep historical knowledge base, probably more than any single human could have. And here's a detail that's, oh, it's kind of freaky, honestly. The source material says it even uses human-like filler words when it presents its case, like,

09:03

uh, and, you know. Hmm. Okay, wait. Is that actually useful, or is it just window dressing? Like, are those uhs helping the diagnosis, or is it just trying to sound more human, maybe less like a scary robot, to build trust? That's a really good question. And maybe the most important part is how the NEJM handled it. They published the AI's diagnosis right next to the human expert's reasoning for the same case. And critically, they didn't hide its flaws. They pointed them

09:28

out. Total transparency, which you absolutely need for medical AI. Right. That transparency is everything for actual deployment. So while this is super impressive, Dr. Cabot isn't quite ready for your local hospital yet. But if you're curious, you can actually watch 15 of these AI case talks online, so you can see it justify its conclusions step by step. So how absolutely vital is it, then, that AI not just get the answer right but also clearly explain its whole thought process

09:56

to really earn that professional trust? Yeah, I think transparency and explaining the why aren't nice-to-haves. They're absolutely essential, the foundation for using any expert system in fields where the stakes are this high. So the big idea kind of weaving through all the sources today seems to be this shift. AI isn't just about

10:15

massive scale anymore. It's moving towards specialized, efficient quality. Exactly. We saw these tiny recursive models like TRM showing that clever design, smart architecture, can actually beat raw computing power on tough reasoning tasks. And then we saw specialized agents like Dr. Cabot hitting major professional milestones in fields like medicine, where transparency and explaining yourself are non-negotiable. Right. So we definitely encourage you to dig deeper into the sources you sent us.

10:41

Maybe read up a bit more on those TRM concepts. Or even better, go watch one of those Dr. Cabot case talks online. See that high-stakes, transparent AI reasoning actually happen. It's quite something. Yeah, it really is eye-opening to watch. So here's a final thought to leave you with. If an AI can publish a complex, peer-reviewed differential diagnosis in just five minutes, what's the next traditionally human-exclusive professional achievement

11:05

we should be preparing for as a society? Something to think about. Outro music.

Transcript source: Provided by creator in RSS feed.
