Monologue: Tokenpocalypse Now

Speaker 1

00:02

Cool Zone Media.

Speaker 2

00:05

Hello and welcome to the second of this week's Better Offline Monologues.

Speaker 3

00:08

I'm your host, Ed Zitron.

Speaker 2

00:18

It's been a funny few weeks watching the little spurts of air coming out of the AI bubble. It's unclear when it will burst, what will burst it, or even really

Speaker 3

00:25

What's going on.

Speaker 2

00:26

Half the time, they just all seem to be running around like a billion Mr. Beans. But I get the sense that everything will accelerate dramatically based on one or two big events, things that just knock the confidence of everyone. Maybe it's an AI company dying, or a funding round not coming together, or maybe it's a hyperscaler cutting capex. Goldman Sachs analyst Rich Privorotsky said last week that he

00:48

believed that everybody was spending simply to remain competitive (fucking love that, don't you?) and added that the first hyperscaler to signal that it can slow the pace of spending

00:57

would likely see its share price rewarded. If I had to bet, I think it's between Meta and Microsoft, the latter of which has made numerous noises about using cheaper AI models like DeepSeek in its Copilot Cowork product. Yeah, and also, I should add, Satya Nadella keeps doing interviews where he's like, yeah, no one company should have all that power. We shouldn't rely on one lab; one lab being able to be shut down at any time is bad.

Speaker 3

01:21

We can't.

Speaker 2

01:22

It's really funny watching this guy change his tune, considering how a year and a half ago, a year ago, he was rock hard for OpenAI, and six months after that, rock harder for Anthropic. It's almost as if he's just doing random shit based on what he

01:36

thinks might work, you know, like a fucking cargo cult. Anyway, as I've mentioned over the last few months, both Anthropic and OpenAI started to charge their enterprise customers (companies with over one hundred and fifty users) for the actual cost of their AI token spend as of the middle of Q1 2026. To explain, when you use a regular ChatGPT or Claude account, you burn tokens, each one about three quarters of a word, up to a five-hour or weekly limit, depending on

02:03

what kind of model you're using. So with Anthropic, on their Claude subscription, Opus four point eight, and Fable when it was around, had a specific limit to themselves. But anyway, the more powerful the model (powerful being measured by the companies, of course), the more it burns and the

02:17

less you can use it. On a regular subscription you're able to burn, and I am not kidding, eight thousand dollars' worth of tokens a month on Claude and fourteen thousand dollars a month on OpenAI's ChatGPT Codex subscription.

Speaker 3

02:30

All for two hundred bucks a month.

Speaker 2

02:32

Pretty good deal, right? You know, it's like the discount prices sketch from Tim and Eric that I reference every so often just to see if anyone emails me. And so, yeah, after years of being able to burn thousands of dollars of tokens for two hundred dollars a month, enterprises are suddenly having to pay the actual costs. The

02:50

result is multiple companies capping their workers' token spend. As I've reported, both T-Mobile and Brex have done so, and others have reported that Uber, Meta, Walmart, Coinbase, and Cisco have all done so. Meta notably had a token-maxing leaderboard. Very funny. Again, just changing direction based on random signals. None of these people have a plan, none of them. None of these companies have a plan. Nobody

03:12

integrating AI has a plan. If they did, they would have been like, yeah, let's make sure we know how much this is gonna cost, or let's make sure we know whether this is good or not, whether we can measure the ROI, you know, anything that would suggest anyone

03:26

knew what the hell they're doing, but they don't. And 404 Media reports that management consultancy firm and evil corporation Accenture (which is an insane way of putting it, but I'm just going to keep it) has seen what it calls soaring token spend, based on leaked audio recordings, with much of that token spend being driven by non-engineers doing things like combining PDFs into presentation slides. I love this shit. It's the easiest stuff in the world made

03:53

slightly easier but probably worse, costing way more. You could just hire a contractor to do this. You could hire a bloke at forty-five bucks an hour to do that for you. Maybe the happiest man alive. He'd probably cost you less, though, too. I know. Ugh, God, this stuff is exhausting. You ever sit and wonder what you could do with the rest of that money? I could have... oh no, not like millions of dollars a month. I could have like a

04:20

week's worth of Diet Coke. Anyway, it turns out that telling your workers that AI can do basically anything and to use it as much as possible starts costing you a lot of money when you actually pay what it costs. As I've said before, we're barely three months into enterprises paying the actual cost of AI, and they're already screeching

04:37

like they're being pecked to death by birds. And that joke is courtesy of Kisei Kugawa, who said that to me about a year ago when we talked about the amount of money that Anthropic was allowing people to burn on Claude subscriptions. Meanwhile, the threat of open-source

04:50

models appears to be getting more serious. I've heard from numerous sources that Chinese AI lab Zhipu's GLM five point two coding model is competitive with Anthropic's Opus four point eight at somewhere between a quarter and a sixth of the price, though that seems to vary based on the task. As far as whether GLM five point two is actually competitive goes, it appears that the hype is somewhat real, but you

05:13

kind of had to check your sourcing. Open-source AI coding company Cline ran benchmarks on a natural bug from its repo and found that while GLM used twice as many tokens to fix it, it also did so at a lower cost: forty-one cents versus Opus four point

05:27

eight's eighty-one cents. Opus was quicker, taking one point six minutes versus GLM's four point seven minutes, but one thing that stood out was that GLM, to quote Cline, "cleaned up dead code and verified the build compiled before completing," something that Opus, the more expensive frontier model run by the company that loses billions of dollars, apparently didn't

05:46

bother to do. It's just one bug, but GLM five point two seems to score well on many of the major benchmarks, coming third on cybersecurity firm Semgrep's cyber benchmarks, with the top two being Semgrep's own harness (their own thing that they put on top of the models) running Opus four point eight and GPT five point five, followed by GLM five point two, which is interesting.

Speaker 3

06:07

It's very interesting if.

Speaker 2

06:09

I was Glamy Samuel, I'd be looking at this and getting a little worried. I'd genuinely be looking at them and even saying, oh, why don't I train my own? But you know, that might actually be a bad idea, even though it's fully possible, because GLM five point two is a true open-weights model. It's released under something called the unrestricted MIT license, which means anyone (by the way, a company, a regular person, whomever) can self-host it, fine-tune it, or use it in any

06:34

way they wish. And yeah, I mean, in theory, any company that wants to use LLMs could either run their own version of it locally on a rig that costs tens of thousands of dollars, or pay an inference provider like Baseten to spin up and train their own version. Now, I must be clear that there's not a ton of evidence that running GLM five point

06:52

two is profitable for the inference provider. In fact, there's really none, and I genuinely have looked. Nor do we have any significant signs that customers are moving away en masse from OpenAI's and Anthropic's models. People generally like name brands, even though Anthropic's shit is constantly broken. Go look at their stability page, uptime page, I don't know what you call that. Email me if you know what they call that, or I could just look after this.

Speaker 3

07:16

I'm going to keep going though. Nevertheless, people want name brands.

Speaker 2

07:19

People like paying for a company that ostensibly works on this and keeps it updated.

Speaker 3

07:24

How true is that? Who knows.

Speaker 2

07:25

I don't think people are buying LLMs for particularly sensible reasons anyway. But what might change things would be Microsoft, Amazon, or Google offering GLM five point two via their Foundry, Bedrock, or Vertex AI platforms. And even then, how many customers are excited about AI, versus just paying OpenAI and Anthropic to use buzzy models? It really

07:44

isn't clear. We should know by now. All that being said, now does feel like the perfect time for a shift towards open-source models, if only because everyone is

07:53

crapping themselves about costs. Everyone is really freaking out, and I think that this is going to be the time when we start separating the wheat from the chaff: the people that actually like this stuff versus the people who just feel good because they're doing AI and they don't know what else to do with their time, and they like forcing people to use it.

Speaker 4

08:12

Now I'll get to who actually does in a minute.

Speaker 2

08:25

OpenRouter's rankings show a dramatic move away from frontier labs. The top ten most popular models are all open source, save for Opus four point seven at six, Opus four point eight at eight, and Sonnet four point six at ten, as well as a mysterious new model called al Alpha, which is currently free to use, which is probably why it's trending. To be clear, Anthropic's models still dominate the top models by task across the board, and that's based

08:47

on the spend on the task. I hate to give him any credit, though; I really don't like doing this, but Scott Galloway recently made the point that China could thrive by dumping high-quality open-source models into the ecosystem as a means of destabilizing the major model providers. And I hate to say it, but that's what's happening. It's like

09:06

a fucking broken clock, I guess. Zhipu and MiniMax are both partially funded by the Chinese government and are also both woefully unprofitable, but neither loses money at the scale of Anthropic or OpenAI. While both do the kayfabe discussion of AGI and autonomous compute and all that bullshit, they seem far more focused on creating open-weight models to compete with them. And just for the differentiation here,

09:29

open source means you can share some of it. Open weights means you can actually meddle with the model itself. That's a flattening. Someone's gonna email. They're gonna be mad at me.

Speaker 3

09:36

I don't care.

Speaker 2

09:38

Sorry, like, it gets the point across, all right? Just write on the subreddit if

Speaker 3

09:42

you're mad. Anyway.

Speaker 2

09:43

In truth, I think the future of LLMs will end up being locally run models and expensive specialist hardware. The underlying economics of cloud-based GPU compute do not make sense for anybody. You need to make sure the GPUs are saturated. You have to buy in advance; otherwise you just won't be able to get the capacity. And if you have fewer people than you expect, you will lose money for sure.

Speaker 3

10:04

And if you have.

Speaker 2

10:05

more people than you expect, you have to get more compute, which means you'll also lose money. So as long as you can Goldilocks this bullshit, maybe you'll eke out the world's worst gross margins in history, or you'll just lose money, which is what everyone's doing.

Speaker 3

10:18

But I think there may be.

Speaker 2

10:19

a point at which Zhipu, MiniMax, or even American model developers start focusing more on services-driven custom deployments and become boring, ugly, license-driven monstrosities like Oracle.

Speaker 3

10:31

And that's the thing.

Speaker 2

10:32

I've talked to quite a few people recently and they are all saying the same thing. Even Nvidia is moving into this realm, and they're selling these giant hundred-grand machines. I think that's kind of interesting. And like I said, GPU compute requires a certain critical mass of customer demand that I don't believe will exist once the cargo cult of the AI bubble passes, because GPU commitments are made a few years in advance, and any drop or lull kills the margins of even

11:01

a cheap-to-run model. Remember Baseten? They're an inference provider. They just raised one and a half billion dollars. They're not profitable. I mean, Zhipu isn't profitable, and they do the same thing OpenAI does with their stuff, where they say, oh yeah, our cost of revenue is lower than our revenue. However, when you add the sales and marketing, it's actually unprofitable as well. Oh, also, we lose hundreds of millions of dollars because

11:24

of training. Other than that, we're super profitable. Yeah, it's a really fucking stupid industry. And look, even if GLM five point two can do things at half the cost of Opus four point eight, that's still half of what in some cases is hundreds of thousands or millions of dollars a month. I've heard tell from multiple sources that hardware demand is growing and that customers at big companies are starting to look into those high-end hardware solutions.

11:50

And I think that that is the only future for this stuff, and I think it would reduce the cost significantly for the end user, probably keep this shit off the internet, and massively marginalize the damage that LLM-based code can do by focusing on people who actually give enough of

12:07

a shit to buy the actual machines. And, ah, I feel like an actual serious fiscal commitment makes companies take things a little bit more seriously, unless, of course, you're Microsoft, Google, Amazon, and Meta buying GPUs, in which case who gives a shit. But I mean, if this takes off, it's curtains, vazousha. It's a bad time for OpenAI and Anthropic unless they too start doing these hardware-driven things. This is just a guess. This is just a guess that they would

12:33

ever think about that. But I think it's possible. I think it's really the only economically viable end for this, unless there are just going to be companies that lose billions of dollars a year for no apparent reason. I just don't think that's gonna happen. There's not enough money to do it. And goddamn, please go and listen to the "OpenAI will not get bailed out, AI is not too big to fail" podcast I did.

Speaker 3

12:56

I'm tired of.

Speaker 2

12:57

getting... Please don't, please stop emailing me.

Speaker 3

13:00

I love hearing from you.

Speaker 2

13:01

I really would love to hear from you all more, other than the people who email me saying, I think the GPUs are all for surveillance.

Speaker 1

13:08

I think it's all for that. There's a big one, and it's called Oracle. Oracle's first client was the CIA. Oracle has a giant government GPU array. They have several of them.

Speaker 2

13:20

Jesus Christ, that's already happening, and they're not doing surveillance on that? They did that with Maven, they did it with the PROMIS software. Jesus fucking Christ. Sorry, I understand that I sound agitated here. It's just that if the only choice you can make when looking at something is to say, well, a bad thing will definitely happen, that's not actually an intellectual pursuit. It's just crapping yourself. It's just saying, well,

13:42

everything's bad, so everything stays bad. I actually want to bring you a message of hope. I don't think these companies get bailed out. I don't think they get anything. I think that they get allowed to die. I think their assets might be scooped up by Amazon, Google, and Microsoft. I think the government might take a chunk out of them. Perhaps they get some sort of bridge financing. But I actually don't know if the government could even extend that

14:05

because AI is very unpopular with regular people. Regular people do not like this, and they hate data centers, and they have intelligently connected data centers to these horrible AI companies. And so, in the end, I tell you, please stop emailing me about the surveillance thing. It's not true. You're wrong. You sound like someone that wants to be paranoid but doesn't want to do the reading well enough to be paranoid. If you want to be paranoid, go

14:31

read about the PROMIS software. Go look at Oracle. Go look at the actual history of these companies if you want to be freaked out by something.

Speaker 3

14:38

But in truth, the AI industry is just.

Speaker 2

14:40

a directionless zeg Grigor of capital that bumbles about and is only kept alive by a belief system that's mostly growth-focused and doesn't really understand how to run a business anymore, or even build innovation or technology. And it's blatantly obvious that LLMs are at best a classic risk-versus-reward software tool for the real sickos that know what they're getting into. But the problem is it's been sold as this mass-market panacea for basically any problem.

15:09

And I'll say that my thesis remains undeterred. I believe a great deal of AI usage is cargo-cult shit driven by executives that don't do any work and people that feel pressured into adopting AI by their peers or the media, or of course, people that spend twelve to eighteen hours a day on Twitter who just follow the algorithm. I call it the Dunce's Casino because you just crank the thing and a bunch of insane people come out. And all of these people come out and they're like, if

Speaker 3

15:36

You don't use agents, you're going to die.

Speaker 2

15:38

They're going to kill your wife, They're going to eat your dog.

Speaker 1

15:41

If you don't use loops, Boris Cherny's going to blow up your house. Ah.

Speaker 2

15:46

Twitter is driving people insane using AI, not because of Grok or anything, but because it just fire-hoses nutters into your feed who will constantly make you feel terrible for not picking up the next AI thing. It's insane. I keep saying cargo cult because that's what it is. It's people just going around being like, yeah, you know, this is the big thing now.

Speaker 3

16:09

We all like that, right, we love it, We love it. It's good.

Speaker 2

16:11

If we all follow this, the great prophecy will come true, and then everyone will make money somewhere, somehow. I think it's a sickness. It's a genuine sickness. So I went a little Tucker Carlson there. What's going on? Why are we using LLMs?

Speaker 3

16:37

What are they going to do to us? What are they going to do to your family?

Speaker 2

16:41

But anyway, back to the podcast, back to the monologue. I apologize for ad-libbing. I must be clear as well: I do love hearing from you, because I keep getting emails from people that actually use AI, and I got one just before I started recording this from a bloke who said, and I quote, he's seen it deliver good work and seen it try to submit absolute crap. And that really is the AI industry. Hey, you know, sometimes it works, and when it does, it's all right,

17:07

and when it doesn't, it's dogshit. Yay, great, I'm so glad we sunk a trillion-plus dollars into this. Look, these are specialist tools that are being forced to solve generalist problems, and they're probabilistic models that fail to do deterministic things, and they're really just an eternal financial black hole for anyone trying to scale them into a Google Search-sized or Meta-sized business. And it's never going

17:31

to work like that. Just before this, as well, I saw that Broadcom and OpenAI announced their Jalipino chip, a giant, dinner-place-sized motherfucker.

Speaker 3

17:42

Dinner place, dinner plate. I'm not fixing that.

Speaker 2

17:45

And it's funny because everyone's doing the cargo cult around it. They're all like, oh well, it's going to bring down inference costs. If they did that, that would fix everything. If Kalapino fixes everything, then everything will work out. There's no proof that that's the case. Actually, here's the thing. If you want... here's a

18:04

specific request if you have a subscription to The Information. At some point in the last year, I know there was a story in The Information that suggested that Broadcom's OpenAI chip would have modest gains. If you bring that to me, I will, fuck it, I will Venmo you five dollars if you can bring me that story. Seriously, I need that story, because I'm pretty sure that if Broadcom's chip was going to change everything for OpenAI, OpenAI would sink every fucking dollar they had into it. OpenAI

18:32

would dispense with Nvidia. Instead, it's like, oh yeah, we're getting Microsoft to pay for some of it, and it'll be there in twenty twenty-seven.

Speaker 3

18:40

Maybe. Uh oh no.

Speaker 2

18:43

Sounds like something you pre-ordered on Kickstarter, and you got it and it kind of sucks, and it turns out the video is AI-generated, which I did just see with this thing where someone had done a Kickstarter where it's a, yeah, you can build things using real principles and you pour your own cement, but it's really small and the whole thing is AI-generated. Fucking hell, this industry really is miserable.

Speaker 3

19:04

Isn't it?

Speaker 2

19:05

In every other era, other than, of course, crypto and the metaverse, something cool happened. Even in the augmented reality era, there were still people doing goofy, funny shit. There were still people doing really interesting things with computer vision. There were people doing interesting things with depth sensing. The fact you can measure stuff on your iPhone (I'm pretty sure you can do it on Android too) is because of depth-sensing cameras and augmented reality.

Speaker 3

19:29

It's cool.

Speaker 2

19:30

The Snap Spectacles are embarrassing, and if you get them, I assume that people will just come and beat you up. I mean, it triggers a deep bully instinct in, I think, ninety percent of human beings. You look like Roy Orbison, but so much worse and with so much less talent. Hell no, this whole era pisses me off. That's the long and short of it. I think you've probably got that by this point. I apologize that I've been a little all over the place in this monologue. But

19:58

it has been a long few months. I've still got plenty of energy, but this has definitely been a tough few months. It's hard to tell what happens next. I know it's hard. I know you ask me when it

20:09

will burst. I truly don't know. But I'm gonna do my best every week to inform you, keep you up to date, and try and explain what the fuck is going on, because quite frankly, I don't think most people have any idea, and I think that that's actually the story of the moment, because when you look around the tech industry right now, does this strike you as an industry full of leaders or followers? Meta is apparently working on a Polymarket competitor after releasing some new three

20:39

hundred-dollar fucking glasses that only perverts will wear. What? Microsoft, a year ago, was saying open source was bad; now it's good. Six months ago, Anthropic was good. Now Mustafa Suleyman says they're too expensive and they're going to stop working with them. This is an industry full of people working things out as they go along.

21:00

Except the difference between you and them is that they have billions of dollars, billions and dollars, billions and dollars, I'm keeping it, billions of dollars, and also all the talent and all the resources and all the ability to raise money. They could do anything.

Speaker 3

21:16

And they choose to do this.

Speaker 2

21:18

Remember that even when this era ends, remember that this is what they chose to do. This is the intentional choice of Google, Microsoft, Meta, and Amazon. They could have done anything else, anything. And there are real problems they could solve. They could solve their own fucking problems, the ones they've caused with their apps. They could make Facebook better or Instagram better, except doing so would lower their revenue.

Speaker 3

21:46

I don't know.

Speaker 2

21:47

Microsoft could redesign the Office suite to actually work. They could make Microsoft Teams stable. Google Search could work again. Again, it would lower the growth. It would make the growth go down. And I think by the time these fucknuts realize that they have a real problem, they'll realize that they've pissed away the only opportunity they really had to change direction, because had they started this era and just been like, we're out of

22:12

hypergrowth ideas now... They probably wouldn't say it in those exact words. They could have said this is the era of experimentation, this is finding our new business lines, the next Google Search, and announced it like some World's Fair bullshit. Oh no, maybe the markets wouldn't have been quite as happy, but they'd be happier than they're going to be at the end of this era. And I think in the end, it's frustrating to watch because logic and reason have gone

22:36

out the window. But gravity exists, and this era will come to an end. And next week we've got an awesome interview, something that's not about AI for once, with two writers from The Wall Street Journal about the greasy, nasty, freaky world of Polymarket's influencer marketing, involving all sorts of weird shit, like ads that include fake bets on fake Polymarket websites. Genuinely weird shit. And that'll be coming up next Wednesday. I enjoy hearing from you. Please

23:04

email me at... ez at betteroffline dot com. Jesus Christ, I messed up my own email. It's been a long few weeks, folks, but I'm loving hearing from you. Please email me, please jump on the subreddit. I love to hear from you. I love feedback. I love feedback that's reasonable. If you send me a typo correction, I'm only going to get mad.

Speaker 3

23:26

But I do.

Speaker 2

23:26

Love hearing from you, even the typo correctors, even the pedants. I love having the feedback.

Speaker 3

23:31

And I love you all. I love you all. I enjoy hearing from you.

Speaker 2

23:35

I enjoy being here and speaking to you, and I'll speak to you next week.

Speaker 3

23:38

Zitron out

Transcript source: Provided by creator in RSS feed
