🎙️ EP 158: OpenAI’s Reality Check… And the 6‑Person Team That Just Beat Google

00:00

So for the last, what, 18 months, we've just been buried in these staggering numbers about AI adoption. Oh, yeah, nonstop. But this new OpenAI enterprise report, it just dropped this huge, almost paradoxical insight. Right. There's this immense gap opening up. A huge chunk of companies haven't even touched the basic AI features, even while the top tier is just, you know, exploding. It's the great AI divide. Yeah. And you really need to look at the metrics here because the

00:29

financial stakes are just enormous. We all know the 800 million weekly ChatGPT users, but in the corporate world, the difference between these frontier firms and everyone else is already creating this huge divergence in shareholder returns. Welcome back to the Deep Dive. Our mission today is built entirely from the sources you sent us, really trying to cut through the noise and give you the most critical, actionable stuff. And we have a pretty clear roadmap for this.

00:54

First, we're going to dissect this great AI divide. Look at the user habits that are translating directly into real financial gains. Second, we'll scan the landscape for 2026. So, monetization trends, hardware shifts, and this crazy accelerating model race. And finally, we have this just ultimate underdog story. How a tiny six-person team just smashed a critical AGI benchmark. Yeah. And they proved that smart system design is maybe more valuable now than just raw computational muscle.

01:27

Okay, let's unpack this. When you just look at the raw growth metrics from this report, the numbers, they almost feel unreal. Right. Enterprise usage is up nine times year over year. Nine times. And the number of weekly messages inside these companies, that's up eight times. This isn't some slow rollout. No, it's an absolute stampede, at least for the companies who've bought in. But that's just the volume side. What's really, I think, insightful here is the quality of the

01:52

usage. The quality, how so? Reasoning token usage. Yeah. Which is basically the model's internal thought process, right? Its ability to do complex, multi-step problem solving. That's up a staggering 320 times since last November. 320. I mean, that's a number you have to sit with for a second. Yeah. It suggests businesses are not just using AI for, you know, drafting a quick email anymore. Not at all. They're giving it deep, complex,

02:16

strategic work. They're running analysis that needs the AI to connect a bunch of different dots and follow this long chain of logic. Right. And this kind of deep reasoning, it only really became commercially viable like 16 months ago. So this explosion is a very recent and, I think, profound shift. And that shift is leading to a productivity payoff that seems undeniably real. It is. The average user saves, what, 40 to 60 minutes a day, which is great. Yeah, that's significant.

02:47

But the heavy users, the ones who are really embedding it in every single workflow, they're saving 10 or more hours a week. That's a full extra workday. It's like buying a full extra workday without a new hire. And on top of that, 75% of workers now say they're doing things they just couldn't do before, like complex coding, data modeling, building custom automations. It's letting non-technical people jump straight into

03:07

the technical deep end. I have to admit, I still wrestle with prompt drift myself sometimes, you know, finding just the right conversational path to get the result I want. But seeing people who a year ago wouldn't touch a line of code, and now they're building actual functional applications, it's astounding. It's just flattening that technical skill curve so dramatically. Yeah, but here's where it gets really interesting, right? Yeah. How do those frontier users, the ones driving

03:34

all this, how do they actually do it? It seems to come down to deliberate, high-volume, and what you'd call high-leverage interaction. Right. They send six times more messages per employee than the average company. Six times. But here's the critical metric. The top analysts in these firms, they're using AI tooling 16 times more often than their peers. 16. That intensity just creates this feedback loop that the non-adopters just can't compete with. And that intensity

04:02

maps directly to the bottom line. The firms operating at this level are seeing 1.7 times faster revenue growth. But the kicker is the shareholder return, 3.6 times higher. 3.6. So if you're a company that hasn't even touched basic AI yet, the data is screaming that you are already way behind the curve. So this raises a big question for me. We see this data and it correlates financial

04:26

return with this high-volume usage. Is this adoption gap we're seeing fundamentally a failure of management, a failure to encourage that kind of deliberate interaction? The data certainly suggests management is key. Financial returns track directly with that deliberate, high-leverage AI usage, so leadership has to be a factor. So while the big players are winning today with sheer usage, we've got to look at what they're prepping for tomorrow. Let's, uh, scan the immediate landscape. The pace is just...

04:55

It's a racing heartbeat. We're seeing utility tools just explode. Like, developers shared 25 pre-built Claude skills. They're kind of like custom GPTs that are just copy-paste. Which massively simplifies things for teams. Totally. It simplifies rapid deployment, and people are hunting for these high-value shortcuts. Right, like that thread with 15,000 bookmarks. Yeah. The one that claims six prompts are better than Duolingo for learning a language. Exactly. People

05:19

want to shortcut expertise instantly. Meanwhile, you know, the big model heavyweights are accelerating. There are these reports that OpenAI's "code red" might have led to them panic-releasing GPT-5.2, like, this week. And Elon's always teasing Grok 4.20. Of course. And the investment side, it really shows where the long-term money is going. Physical AI. Embodiment. Yeah. SoftBank and NVIDIA are reportedly about to put over a billion dollars into Skild AI. They build the

05:47

robot brains, right? They build robot brains using human-like AI. And that company's valuation? It tripled in less than two years. The big money thinks the next frontier is physical. So looking out a bit, to 2026, Microsoft has outlined these seven big trends. And two of them really jump out. First, monetization. Google finally confirmed it. Ads are coming to Gemini in 2026. The end of the ad-free era. It was always going to happen. It was inevitable. But that's a huge psychological

06:13

shift, isn't it? For enterprise users, once ads are in the ecosystem, the whole trust model changes. For sure. Are you really going to build your core workflows on a tool where the attention economy is now a feature? It creates a friction point, for sure. And the second major trend is hardware integration. We're seeing this massive shift from screen-based chat to something more ambient. Google's new AI glasses, also coming in 2026 with Gemini built right in. Yeah, that's

06:42

a direct shot at Meta's Ray-Bans. It signals that AI is about to become deeply integrated into how we actually perceive the world. So what really stands out to you in all this? I mean, are we really moving past AI as just a chat box, a tool we open, to AI as integrated hardware that's just always on? Yes, I think the focus is absolutely shifting to deeply integrated AI and hardware, making the technology ambient and constant.

07:07

All right, let's pivot to what might be the breakthrough that really shocked the AI world this week. We're talking about an incredible underdog story. Oh, this is great. A six-person startup called Poetiq. They just beat giants like Google DeepMind and OpenAI on one of the toughest reasoning benchmarks out there. ARC-AGI-2. And this is a huge deal, a huge win, for a couple of reasons. First, ARC-AGI-2 is a genuinely difficult test. It's not about spitting back facts. It's about

07:38

abstract visual patterns, fluid reasoning. You have to actually understand structure and logic. Exactly. And Poetiq was the first system ever to cross 50% accuracy. They hit 54%. And here's where it gets really wild. Poetiq didn't do what everyone else does. They didn't sink billions into training some giant new model from scratch. No, what's so fascinating is they just used Google's own Gemini 3 Pro as their base engine. They rented it. They essentially rented the foundational

08:05

capability from their biggest competitor. So how does a tiny team win the race when they're using the very engine built by the people they're racing against? They built a smart, self -improving system on top of it. A refinement layer. Okay, so break that down. A refinement layer. Think of it like this. The foundational model, Gemini, it's like a massive, powerful jet engine that

08:25

you can just rent. Okay. The refinement layer is the brilliant, custom-built flight computer that makes sure that engine is always running at peak performance, maximizing efficiency for whatever specific job you give it. That helps a lot. So it's not about brute-force power. It's about elegant, high-leverage engineering. Precisely. And this layer has, basically, three jobs that let them win. First, it's a smart traffic cop. It picks the right base model for the task. Second,

08:52

it guides and improves the output. It's almost like it's teaching the rented engine in real time until the answer is good enough. And third, it is a self-auditing mechanism to verify the quality before it ever spits out the final result. And I'm guessing this method is a lot cheaper. Oh, significantly cheaper. Yeah. And way more flexible. Poetiq's approach cost about $30 per task. Google's Deep Think, which only hit 45% accuracy, cost $77 per task. And that efficiency, that

09:19

changes the whole economic model. Their system can adapt to any new base model from any company in a few hours. No costly retraining. And it's open source. Whoa. I mean, imagine scaling that refinement layer to a billion queries. Right. That completely changes the economics. You're not bottlenecked by the insane cost of building the foundational model anymore. The value just shifts. So what does this all mean then? Does this small team's success show that smart system

09:47

architecture, that elegant design, is that the new moat now? Is it displacing raw data and compute as the main barrier to entry? I think this paradigm shift shows that smart systems, these refinement layers, can outperform raw training scale and compute spend. It's a huge win for efficiency and innovation over just pure capital. You know, this whole deep dive, it really boils down to two core takeaways for you, the listener. First, that massive productivity and financial gap we

10:13

talked about. It's driven by intense, deliberate usage. You have to adopt that frontier mindset. Using AI tooling 16 times more than your peers to see that 3.6 times shareholder return, it's a choice. And second, the race for AGI. It might now be won by the smart architectural layers. That six-person startup proved that system design and refinement can be more powerful than just having the biggest training budget. The value

10:42

is shifting from the size of the engine to the brilliance of the engineering team that's customizing it. Yeah. So we'd encourage you to adopt that frontier mindset. Just look at your own knowledge gathering. How can you increase the leverage of your tools? How can you add your own refinement layer to how you approach problems? And here's a final thought. If an open-source refinement layer can outperform these proprietary foundational models on the world's hardest AGI benchmarks,

11:07

What does that imply for the security and the future market value of those incredibly expensive proprietary models? Something to mull over. We'll catch you next time. Thanks for joining us for this deep dive. [Outro music]
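
Editor's sketch: the refinement layer discussed in the episode is described as doing three jobs: picking a base model for the task, iteratively improving the answer, and auditing the result before returning it. The Python below is only a minimal illustration of that loop, not Poetiq's actual code. Every name in it (`route`, `solve`, `make_stub`, the "general" and "reasoner" model labels, the keyword routing rule) is hypothetical, and the "models" are stub functions standing in for rented base models so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Assumption: a "base model" is any function from prompt text to answer text.
BaseModel = Callable[[str], str]
# Assumption: a verifier returns None when the answer passes, else feedback text.
Verifier = Callable[[str], Optional[str]]


@dataclass
class Result:
    answer: str
    model_name: str
    rounds: int
    verified: bool


def route(task: str, models: Dict[str, BaseModel]) -> str:
    """Job 1 (traffic cop): choose a base model. Toy keyword rule, purely illustrative."""
    wanted = "reasoner" if "puzzle" in task.lower() else "general"
    return wanted if wanted in models else next(iter(models))


def solve(task: str, models: Dict[str, BaseModel], verify: Verifier,
          max_rounds: int = 3) -> Result:
    """Jobs 2 and 3: ask, audit, and retry with feedback until verified or out of rounds."""
    name = route(task, models)
    model = models[name]
    prompt, answer = task, ""
    for round_no in range(1, max_rounds + 1):
        answer = model(prompt)
        feedback = verify(answer)          # Job 3: self-audit before returning
        if feedback is None:
            return Result(answer, name, round_no, True)
        # Job 2: feed the critique back into the next attempt
        prompt = f"{task}\nPrevious answer: {answer}\nProblem: {feedback}\nTry again."
    return Result(answer, name, max_rounds, False)


def make_stub(answers: List[str]) -> BaseModel:
    """Hypothetical stand-in for a rented model: replays canned answers in order."""
    it = iter(answers)
    return lambda prompt: next(it)


models = {"general": make_stub(["4"]), "reasoner": make_stub(["5", "6"])}
check = lambda a: None if a == "6" else "the sequence increases by 2 each step"
print(solve("Puzzle: 2, 4, ?", models, check))
```

Because the layer only depends on the prompt-in, text-out interface, swapping in a different base model means changing one entry in `models`, which is the flexibility the hosts attribute to this kind of design.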

Transcript source: Provided by creator in RSS feed.
