🎙️ EP 290: $200 GPT Plan Costs OpenAI $14K & GLM-5.2 Crushes Banned Fable 5

00:00

That $200 monthly AI subscription you might be using, well, it could actually be a massive problem. Oh, absolutely. It's a huge issue. Yeah, because it could be costing OpenAI up to $14,000 a month. Right, for a single user. It's wild. That is the brutal math behind this current AI boom. Welcome to the Deep Dive. We have a massive stack of research today. And it really paints a fascinating picture. Okay, let's unpack this. Our mission today is to explore the hidden economics of AI

00:30

subscriptions. We're looking at how massive compute costs are driving this crazy explosion of cheap open-source tools. Yeah. And we're also covering how geopolitical bans are radically reshaping the whole global AI landscape. It's all connected. So let's start with that brutal financial reality. It really underpins the entire AI ecosystem right now. It does. It's the foundation of everything we're seeing. We all love a flat-rate subscription. You pay 20 bucks or maybe 200 for a pro tier

00:58

and you just use it. Right. It feels like an all-you-can-eat buffet. Exactly. Like an all-you-can-eat buffet. But if you actually max out your theoretical limits on a $200 ChatGPT Pro plan... It costs OpenAI roughly $14,000. $14,000 in API compute just to serve one user. Yeah. And API compute is just the server power needed to process AI requests. It's the raw electricity and hardware. And Anthropic's top tier is not much better. Their system caps out around $8,000 in token costs.

01:30

The break-even points are actually shocking. Yeah, I was looking at this. OpenAI starts losing money at just 11.4% utilization. Wow, 11%. Just 11.4%. And Claude hits its break-even point at a 20% utilization rate. So if you use it even a quarter of the amount you're allowed to, they lose money. They're just bleeding cash. Yeah. So you have to ask, what drives this massive cost? Because a human typing questions can only work so fast. Right. A person can't

02:00

type $14,000 worth of prompts. The real culprit here is agent workflows. Let's define that quickly. Sure. Agent workflows are AI systems that independently loop and execute multi-step tasks. I have a bit of a vulnerable admission here. I still wrestle with my own token usage when trying to set up agent loops. Yeah. It gets out of hand so fast. Oh, yeah. You are definitely not alone there. I mean, I looked at my dashboard last week and was just staring at the bill. It's because an

02:28

agent breaks a task down. If you ask it to research a company, it doesn't just write a summary. It breaks that into 20 subtasks. It searches the web. It reads a PDF. It scrapes a database. And it feeds every single step back into itself. So it's constantly talking to itself. Exactly. It keeps looping until it solves the problem. And every single loop costs money. Going back to that buffet analogy, it's like an all-you-can-eat buffet where one guy just sits down

02:56

and eats the entire kitchen. Where he be? The restaurant cannot survive that. No, they can't. And we saw this happen. One firm reportedly burned through $500 million in a single month on Claude. Wait, really? Half a billion dollars? Half a billion, yeah. How is that even possible? Don't they have kill switches for enterprise contracts? Well, you would think so, but they didn't cap internal employee access, so everyone was running

03:18

complex multi-step queries. Wow. An employee asks for a massive data sort, the agent hits a glitch, and it just loops 10,000 times in the background. That is just financially terrifying. It is. And it proves you don't need a massive frontier model for every task. You don't need quantum-level AI to summarize a Tuesday meeting. How can frontier models survive if the business model is this fundamentally broken? They'll likely transition from flat consumer rates to strict

03:47

metered enterprise usage contracts soon. Unlimited subscriptions die, replaced by strict pay-as-you-go corporate meters. So because these frontier models are unsustainably expensive, the market is fracturing. Oh, completely. It is urgently fracturing into cheaper, highly efficient alternatives. Let's talk about that push for efficiency. Startups are ditching expensive APIs, right? Yeah, they're moving to dirt-cheap alternatives like DeepSeek. And they're saving

04:13

up to 95% on compute. 95%. That is the difference between life and death for a startup. Absolutely. And we're seeing specialized models emerge too. Look at the new Kimi K2.7 Code. Right. The coding model. Yeah. It beats Opus 4.8 on tool use, and it uses 30% fewer reasoning tokens. And reasoning tokens are just tokens spent thinking before typing an answer. Exactly. Less thinking time means lower cost. Plus, Kimi can be self-hosted entirely for free. So no cloud fees at

04:44

all. None. And the big players are scrambling to adapt. OpenAI just launched a $150 million partner network. To turn AI plans into actual workflows. Right. And Anthropic is testing something called Conway. I saw this. Whoa. Imagine scaling to an always-on agent environment like Conway. The possibilities are staggering. It really is. Conway is a standalone cloud agent environment. It uses webhooks and active Chrome browsing. Just to clarify, webhooks are automated messages

05:13

apps send when something happens. Exactly. And this isn't just for big enterprise either. We're seeing a massive explosion of consumer tools. Like Slashy. Yeah. Slashy handles AI email triage. It writes in your voice and makes sure you don't miss follow-ups. And Tastelab. I thought this was fascinating. Oh, Tastelab is incredible. It extracts a website's design DNA. It pulls the hex codes and typography directly into a template, right? And there's Permute, which is

05:43

a universal media converter. And Athenic 2.0. Athenic is the data analysis agent. Right. It basically replaces a junior data analyst. It ships automated dashboards and reports. Are we moving toward a future of a million micro models instead of one giant god model? Definitely. A swarm of hyper-specialized local models is far cheaper than one expensive monolith. We trade one expensive supercomputer for an army of cheap digital interns.
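
A minimal sketch of the kind of kill switch discussed earlier in the episode: a spending cap and a step cap wrapped around an agent loop, so that a stuck agent stops instead of looping 10,000 times in the background. The names `BudgetGuard`, `run_agent`, and `step_fn`, along with the per-token price, are hypothetical and chosen only for illustration; they are not any vendor's API.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


class BudgetExceeded(RuntimeError):
    """Raised when an agent run has used up its step or spending cap."""


@dataclass
class BudgetGuard:
    max_usd: float          # hard spending cap for one agent run (assumed value)
    max_steps: int          # hard cap on loop iterations (assumed value)
    spent_usd: float = 0.0
    steps: int = 0

    def check(self) -> None:
        # Refuse to start another step once either cap has been reached.
        if self.steps >= self.max_steps:
            raise BudgetExceeded(f"step cap reached after {self.steps} steps")
        if self.spent_usd >= self.max_usd:
            raise BudgetExceeded(f"spending cap reached at ${self.spent_usd:.2f}")

    def record(self, tokens: int, usd_per_million_tokens: float) -> None:
        # Account for the tokens the step just consumed.
        self.steps += 1
        self.spent_usd += tokens / 1_000_000 * usd_per_million_tokens


def run_agent(
    step_fn: Callable[[], Tuple[bool, int]],
    guard: BudgetGuard,
    usd_per_million_tokens: float,
) -> BudgetGuard:
    """Run step_fn until it reports done; step_fn returns (done, tokens_used)."""
    done = False
    while not done:
        guard.check()
        done, tokens = step_fn()
        guard.record(tokens, usd_per_million_tokens)
    return guard
```

With a guard like this, a glitching agent whose `step_fn` never reports completion raises `BudgetExceeded` once the cap is hit, rather than quietly accumulating cost. Because the check runs before each step, the final step can overshoot the spending cap by at most one step's cost.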

06:11

So it's not just cost driving people away from these big proprietary models. It is also access. Geopolitics is playing a massive role here. It really is. Geopolitics is forcing the world to build around US tech giants. I want to maintain complete neutrality here and just report the facts. Sure. Let's look at the actual events. US export restrictions recently forced Anthropic to shut off Fable 5 and Mythos 5. Yeah, that happened incredibly fast. Anthropic is actually

06:37

rushing staff to Washington, D.C. right now. They're claiming the security risk was overstated. What's fascinating here is the global ripple effect this caused. Bans didn't slow the competition down. No, they didn't. India is now actively rethinking its dependence on foreign models. Right. They are having a serious debate. Do they lean into open source or do they build sovereign AI? Sovereign AI simply means an AI model built and controlled entirely by one nation. Exactly.

07:04

And Europe is taking the sovereign AI route. They are heavily backing Mistral AI. Mistral is raising 3 billion euros at a 20 billion euro valuation. Which is huge. Though, to be fair, OpenAI and Anthropic still lead the global market. For now. But the massive breakthrough came just two days after Fable 5 was banned. Yeah, this was wild. China's Zhipu AI stepped in. They released GLM-5.2, and they released it as a fully open-source model under a permissive MIT license.

07:33

That is the key detail. An MIT license means there are zero regional restrictions. Anyone can use it. And GLM-5.2 absolutely dominated the benchmarks. It hit number one on BridgeBench. Right. It scored 100.0 on BS and a 42.8 on reasoning. So it actually beat the banned Fable 5 model. It did. And it runs at 300 tokens per second. That is incredibly fast. And it does

07:58

that at one-tenth the cost of the big proprietary models. Plus, it has a 1 million token context window. A context window is an AI's short-term memory limit during one conversation. Right. So you can dump an entire massive code base into it at once. We saw real-world tests immediately.
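
A rough sketch of how one might check whether a code base fits in a 1 million token context window before sending it. The 4-characters-per-token ratio is an assumed rule of thumb, not a property of any particular model's tokenizer, and the helper names, file suffixes, and output reserve are hypothetical.

```python
import math
from pathlib import Path

# Assumption: roughly 4 characters per token. Real tokenizers vary by
# language and content, so treat the result as an estimate only.
CHARS_PER_TOKEN = 4.0


def estimate_tokens(text: str, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Estimate the token count of a string from its character length."""
    return math.ceil(len(text) / chars_per_token)


def codebase_tokens(root: str, suffixes=(".py", ".ts", ".go")) -> int:
    """Sum estimated tokens over source files under root (suffixes are illustrative)."""
    total = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            total += estimate_tokens(path.read_text(encoding="utf-8", errors="ignore"))
    return total


def fits_in_context(
    prompt_tokens: int,
    context_window: int = 1_000_000,
    reserved_for_output: int = 8_000,
) -> bool:
    """True if the prompt plus a reserve for the model's reply fits in the window."""
    return prompt_tokens + reserved_for_output <= context_window
```

For example, `fits_in_context(codebase_tokens("."))` gives a quick yes or no for the current directory. The reserve matters because the reply shares the same window as the prompt.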

08:14

The platform Z.ai rolled it out instantly. Yeah, developers got their hands on it right away. And it coded a 925-line SVG clock from scratch. It's pure math and visual logic. It also built a functional 3D penalty kick game. And a mini spreadsheet. All running flawlessly. If export bans immediately result in highly capable open-source clones, do these restrictions actually work? Honestly, they seem to act as a catalyst that just supercharges

08:41

global open-source competition instead. Export blocks fail to contain tech and only force competitors to innovate faster. So while superpowers fight over these massive models, AI is quietly rewriting our daily realities. It's happening on a deeply personal level now. It really is. It's both taking jobs and managing our personal lives. The job data is pretty grim right now. AI layoffs are rising. Nearly 120,000 tech workers have lost jobs this year. Wow,

09:12

that's a massive shift. But there is a very surprising statistic here. Almost 75% of unemployed Americans... never apply for unemployment benefits. That is so high. People just quietly absorb the loss. Yeah. And this brings us to what we're calling the personal assistant paradox. Right. Because while AI is displacing tech jobs, people are leaning on AI to manage their daily friction. Exactly. For example, there are these 15 ChatGPT prompts going viral right now. Oh, I saw

09:38

these. People use them to dissect their paychecks. They use the AI to find highly specific ways to squeeze savings out of their budget. It's crazy. The same tech disrupting the job market is what people use to survive the disruption. Right. So what does this all mean? We are seeing the tension of macro-level job losses mixed with micro-level convenience. Yeah. And there's a lighter side to this convenience, too. It's not all just budgeting and layoffs. True. ChatGPT

10:06

now has a dedicated World Cup 2026 page. You can track live scores and matches. And you do it without ever leaving the chat interface. It's all just seamlessly integrated into your conversation. It becomes the primary lens for how you interact with the internet. It's the ultimate trade-off of the AI era, exchanging job security for extreme personal convenience. It really seems like we are trading long-term career stability for hyper-efficient... automated daily task management.

10:33

Trading lifelong career stability for incredibly smooth daily task automation. Exactly. Let's briefly synthesize the major through line of this deep dive. The era of relying on a few massive, expensive, centralized AI models is fracturing. It is absolutely breaking apart. Between the massive token burn rates we discussed, the $14,000 underlying cost per user, and geopolitical export bans. Right. The global developer ecosystem is just actively routing around all of those

11:01

roadblocks. We are entering an era of incredibly cheap, hyper-capable, open-source agents that basically live everywhere. It's a completely decentralized future. I want to leave you with a final lingering question to ponder. If government roadblocks only motivate the global community to build smarter, faster, and completely open-source models with zero regional restrictions, are we rapidly approaching a point where AI regulation is no longer a legal question, but a technical

11:30

impossibility? Thank you for joining the Deep Dive today. Keep questioning the information around you.

Transcript source: Provided by creator in RSS feed
