The Point of No Return: GLM 5.2 Approaches the Frontier

⁠¶ China’s Open Source Comeback

00:00

Ejaaz: Last week, a Chinese company released a free AI model that is as good as Anthropic's best model. It also beats ChatGPT 5.5 at writing and coding, but it comes with a twist: it's a sixth of the price and it's completely open source.

00:13

Ejaaz: You can download it and run it at home. Now, in that same week, the United States government banned Anthropic's most powerful model, Fable 5, after someone revealed that an unrestricted version of it had hacked into the National Security Agency's systems.

00:27

Ejaaz: I think we've reached a point of no return. And not to sound dramatic, but in six months, it is very realistic that we will have open source or open weight models that are accessible to anyone in the world with an internet connection and $5k to $10k to run them at home, models that they can fine-tune to do anything.

00:45

Ejaaz: And these are Mythos-grade models. These are the same models that, according to rumors and verified reports, can exploit some of the most secure systems in the world faster than any other exploiter has been able to in the past. And I think we're going to look back on 2026 as the year that everything really changed, and the point where humanity itself really needs to focus on safeguards and figuring out how to regulate

01:14

Ejaaz: and release these AI models in the future. So we've reached a convergence of this really interesting trend where the most powerful models in the world are freely available and open source, available for anyone to access. And the government, the United States specifically, has an off switch for their most powerful model.

Josh: Yeah, it's been a couple of months, it seems, since we've had some news on the

01:33

Josh: frontier of China. And you kind of forget about them every couple of weeks, where they just kind of disappear, they quiet down. The new models come out, we see the Fables, we see the Mythoses of the world. But then out of nowhere, they strike back, and seemingly every single time it comes as a surprise how powerful these new models have become. So to start with this, we have a new model from our favorite company to pronounce, Zhipu.

01:54

Josh: I feel like I want to name my dog that, it's such a cute name. But Zhipu is doing something not so cute. They're actually releasing a model named GLM 5.2, which kind of blew everyone's expectations out of the water. I remember way back, like six months ago, when DeepSeek was doing this. Like, DeepSeek would release a model and everyone was like, wait, you did what with what? And that's what this model feels like. We're getting that moment again, because

02:18

Josh: this is an open weights model, which is not to be confused with open source, and we'll talk about that in a little bit. But this is an open weights model that is, if I'm correct about this, within one single point of GPT 5.5, the frontier coding model from OpenAI, on the SWE-bench Pro benchmark, which is the benchmark that a lot of people use for coding. And that comes as a surprise because

02:39

Josh: the cost, well, one, if you run it locally, is free, but two, if you run it on a server, is, like you said earlier, just one sixth of the cost. So you're getting an incredible amount of coding capability for something that costs a fraction of what it costs if you were to go to one of these larger language models, and it seems to work

02:58

Josh: almost as well, if I'm right. And this comes as a surprise to most people because every time we start to count China out, we're like, no, surely they can't catch up.

⁠¶ Benchmarks and Cost

03:06

Josh: They continue to chip away at this frontier.

Ejaaz: There are a few things that people will jump to immediately. OK, one, that these benchmarks can be easily gamed. We're going to show you a few examples of benchmarks that couldn't be gamed, and GLM 5.2 performs really, really well. But the second thing is the cost.

03:22

Ejaaz: Cost has become a really important point of discussion amongst enterprises specifically that are spending hundreds of millions of dollars per year to access Claude and GPT. It's just too much money for them to spend relative to the return on investment that they're getting in work that they actually see.

03:37

Ejaaz: So what they're now turning towards is these free open source models, primarily designed and made by Chinese AI labs, that can cut costs down drastically. Just last week, we had Microsoft announce that they're replacing their Copilot LLM with not ChatGPT, not Claude, but DeepSeek itself. So the point is, this comes at a very important time where cheaper models are getting a lot of attention.

03:59

Ejaaz: So now when we look at GLM 5.2 specifically, it is five to seven times cheaper than GPT 5.5 and Claude Opus 4.8, but performs, as we're seeing on the benchmarks right here, almost as well as each of these models, specifically at the metric that is the most important, which is coding. Now, a lot of skeptics quite rightly were like, I don't know if this is actually true. Like, let me test it against a few other independent benchmarks.

04:25

Ejaaz: It came up pretty high. So if you look at front-end development, when it comes to website design, GLM 5.2 Max is just below Fable 5. We're not even talking about Opus 4.7 or 4.8 anymore, which it absolutely beat. And then we're looking at anecdotes or feedback from distinguished individuals in the Western frontier. So right now we're looking at a tweet from the CEO of Vercel.

04:46

Ejaaz: He goes, "I'm genuinely impressed, almost shocked at how good GLM 5.2 is at coding." So this is feedback from real people using this for real use cases. For the last three years, Josh, we've basically been told that the hundreds of billions of dollars being spent on AI CapEx is for one single reason only: to gain a moat ahead of any other model provider. So we spend all this money on compute to train a frontier AI model.

05:11

Ejaaz: And with that moat, it doesn't matter what other companies do in China; we will have the best model and that's enough for us. This release from Zhipu with GLM 5.2 basically shows us the opposite. For a fraction of the cost, you can create a near-frontier model that does, I don't know, 95% of the work. And so it calls into question the valuation of these companies.

05:32

Ejaaz: Should they be spending this amount of money, or can we just do it for a lot cheaper, like these Chinese AI labs?

Josh: Yeah, well, the large AI labs, I'm not sure they have a choice. I mean, it's just that you have to continue to push the frontier forward, whether you like it or not. But I think what we're seeing is that a lot of these questions that we were excited to see play out, we're starting to get answers to.

05:50

Josh: Like, now it's less China versus America and more open source versus closed source, because, I mean, the open source models are coming from inside too. We have NVIDIA. They're working on open source models that are incredible, and they're making progress on that front. We have Apple now, who has an actually functional Siri on everyone's hardware device that runs essentially for free.

06:09

Josh: So they're slowly starting to nibble away at this, I guess, bottom-of-the-barrel set of use cases.

06:14

Josh: And then we have China, which is GLM, that's DeepSeek, that's these larger models where they're actually competing on the frontier. So these big frontier private models are facing heat both from the lower end of the stack and right at the top, where these benchmarks sit. And we're going to see how that plays out economically. In the case of Zhipu, at least, it's been playing out pretty well, and

06:37

Josh: we probably should talk about the stock a little bit. Believe it or not, this company is publicly traded, not here in the United States, but at least in China, and it's gone up.

06:47

Ejaaz: What is that?

Josh: 1,500 percent, 15x on the year. That's like a crazy return. And there are some interesting facts about this return, and it's so funny to see, I guess, how inefficient Chinese markets are. Also note, on the chart you're seeing on screen, they have a lunch break in their stock market. I didn't know this until I saw it labeled. Like, I didn't realize that Chinese stock markets had an hour-long lunch break in the middle of the day. So that's cute and that's fun.

07:11

Josh: But the numbers are pretty outrageous. When we talk about expensive companies, we talk about SpaceX, which is trading at, what is it, a very high multiple of earnings. And what we have with Zhipu, and this company that it's kind of owned by, Knowledge Atlas Technology, is that it's currently trading at about a $136 billion market cap. It made $170 million, or $107 million, I should say, in the full year of 2025.

07:35

Josh: That means it trades at roughly 1,300 times sales, which is just this unbelievably high multiple on this company. And I think it's a testament to, I guess, the lack of availability of AI exposure in Chinese markets, but also the confidence and the excitement and enthusiasm they have around companies like this. That was just an interesting thing to see.
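
The "1,300 times sales" figure can be checked with one division. A minimal sketch, using only the market cap and revenue numbers quoted in the conversation (they are not independently verified here):

```python
# Figures as quoted in the episode; treat them as the speakers' numbers.
market_cap_usd = 136e9     # "about $136 billion market cap"
revenue_2025_usd = 107e6   # "$107 million ... in the full year of 2025"

# Price-to-sales multiple: market capitalization divided by annual revenue.
price_to_sales = market_cap_usd / revenue_2025_usd

print(f"Price-to-sales multiple: {price_to_sales:,.0f}x")
```

The result comes out a little under 1,300, which matches the rounded multiple quoted above.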

07:53

Ejaaz: Yeah, I mean, at this valuation, it's about, what is that, like a fifth of Anthropic's valuation right now, which is, I think, around a trillion dollars. So again, it begs the question: are Chinese AI labs underpriced or are American companies overpriced? And I'm curious to hear what listeners of the show actually think. I tend to think that they probably need to meet somewhere in the middle.

⁠¶ Markets

08:16

Ejaaz: We were actually saying before we started recording: could you imagine the reaction to this news if Anthropic were a publicly traded company and a new open source model that was freely accessible to anyone could achieve pretty much 95% of the capability of Opus 4.8? Like, I wonder what that would have done to the stock price in, like, a fair market

08:36

Ejaaz: value, but crazy to see nonetheless. So if we're looking at a few different metrics that compare cost and performance, just quickly to run you guys through this: for input versus output tokens, per million tokens, you're looking at around $1.50 for input and $4.50 for output. Now, compare that to Opus 4.8, which is around, I believe, $5 versus $25.
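
To make the price comparison concrete, here is a minimal sketch that turns the per-million-token prices quoted above into ratios and into the cost of one workload. The `Pricing` class and the 10M-input / 1M-output workload are illustrative assumptions, not figures from the episode.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """USD per one million tokens, as quoted in the conversation."""
    input_per_million: float
    output_per_million: float

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        """Total USD cost for a given number of input and output tokens."""
        return (input_tokens * self.input_per_million
                + output_tokens * self.output_per_million) / 1_000_000


glm = Pricing(input_per_million=1.50, output_per_million=4.50)
opus = Pricing(input_per_million=5.00, output_per_million=25.00)

# How many times more expensive the closed model is, per token type.
input_ratio = opus.input_per_million / glm.input_per_million
output_ratio = opus.output_per_million / glm.output_per_million

# Hypothetical workload (an assumption): 10M input tokens, 1M output tokens.
glm_cost = glm.cost(10_000_000, 1_000_000)
opus_cost = opus.cost(10_000_000, 1_000_000)

print(f"input: {input_ratio:.2f}x, output: {output_ratio:.2f}x")
print(f"workload: ${glm_cost:.2f} vs ${opus_cost:.2f}")
```

The blended saving depends on the input-to-output mix, since the input and output price ratios differ.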

08:57

Ejaaz: So again, we're getting that 3 to 5x cheaper price compared to a model of similar performance and capability. Now, I was skeptical of the benchmarks, and I have a new favorite benchmark to compare it against, which is called DeepSWE. DeepSWE is basically a benchmark that gives the models no answers. Typically, with a benchmark, you have an answer sheet, and a model can kind of cheat and look at it and figure out a way to get to that answer.

09:20

Ejaaz: There's no answer sheet for this one, so it's a very accurate test of how good your model is at coding. On DeepSWE, GLM 5.2 achieved a very modest fifth place. Or rather, fourth place... fifth place, fifth place. And that is a pretty accurate standing of what agentic coding looks like for this particular model. It is the number one open source model.

09:41

Ejaaz: It absolutely crushed Kimi K2 by 17 percentage points, a very clear lead. And it's great to see how it measures up. It may not be frontier capability, but if you want a workhorse, if you want an agent that basically works overnight and isn't going to break the bank, GLM 5.2 is probably something that you can look at. Another thing is it's really good at front-end web development.

10:02

Ejaaz: So if you're looking at this screen right now, the website that you're seeing was completely one-shotted in about 10 minutes by this one single model, GLM 5.2. And repeatedly across design benchmarks (the Arena benchmark was another one that I saw), it performs really highly, in some cases beating Fable 5. So it's a really good front-end design model, if that is something of interest.

10:21

Ejaaz: And then the final one, because I know a lot of listeners of the show are like, you know, how good are these models at trading, investing, making money for you? Well, there's this very famous benchmark called the Vending Benchmark, which basically allows an AI model to control a theoretical $10,000 and see if it can make money by stocking a vending machine, conducting sales, and managing inventory against competition.

10:44

Ejaaz: It achieved second place, right behind Claude Opus 4.7, which is the current leading model. So it's also pretty good at making money.

⁠¶ A Six-Month Model Gap

10:51

Josh: Yeah, and it also has a very clear roadmap to continue to be good and to get even better. There's an interaction, actually, between Elon Musk and the CEO of Z.ai, which is creating these models. So this guy asked, what's your current timeline for China to reach Fable-class? GLM 5.2 certainly shortened the gap. And then Elon said, probably Q1.

11:10

Josh: And then the CEO said, won't take that long. Which means they expect us to get a new Fable-class model that's open weight and open source within the next six months. Which is incredibly compelling, because that is going to be served up as open weights. And as you know, with open weights, you can actually run it on your own hardware. But the question is, do you actually want to run this on your hardware?

11:29

Josh: I see on Twitter all the time people who are spending tens of thousands of dollars to get those Mac Studios. They're stacking them up in their offices, they're trying really hard to run these models locally. And I hate to break it to you, but the math ain't really mathing on this so well. So there's a tweet by Mike Schweinbach I thought was great. And it says the minimum to run the model is about $20,000 in hardware, and you get about 20 tokens per second out.

11:53

Ejaaz: For $20,000, that's like,

Josh: That's pretty slow. It's not thinking that fast. And if you have these really long chains of thought, these long reasoning traces, it's going to take you a very long time to get an answer that involves deep thinking. So for about $20,000 spent on the hosted model instead, you can get close to 35 billion tokens. And that's at a 12-to-1 input-to-output ratio, assuming you have good token caching set up.

12:14

Josh: So he's saying if you ran the hardware 24/7 with zero downtime, it would take roughly five and a half years just to break even. And that right there is why open weights models are incredible. You're probably better off getting it served directly from their servers, from the cloud, instead of running your own.
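
The break-even claim rests on how few tokens a 20-tokens-per-second machine can produce. A rough sketch of that arithmetic follows. The transcript does not give the tweet's exact accounting, so the decode-only model here (input tokens cost no time, 1 in 13 tokens is output under the 12-to-1 ratio) is an assumption, and it lands in the same range as the quoted five and a half years rather than reproducing it.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000, ignoring leap years


def tokens_generated(tokens_per_second: float, years: float) -> float:
    """Output tokens produced by hardware running 24/7 with zero downtime."""
    return tokens_per_second * SECONDS_PER_YEAR * years


def years_to_generate(output_tokens: float, tokens_per_second: float) -> float:
    """Years of nonstop generation needed to produce `output_tokens`."""
    return output_tokens / (tokens_per_second * SECONDS_PER_YEAR)


# Quoted figures: 20 tokens/second locally, ~35 billion tokens for ~$20,000.
per_year = tokens_generated(20, years=1)
over_quoted_period = tokens_generated(20, years=5.5)

# Assumption: with a 12:1 input-to-output ratio, 1/13 of tokens are output,
# and only output tokens take meaningful time on the local machine.
output_tokens = 35e9 / 13
years = years_to_generate(output_tokens, 20)

print(f"{per_year:,.0f} output tokens per year at 20 tok/s")
print(f"{over_quoted_period:,.0f} output tokens over 5.5 years")
print(f"{years:.1f} years to match the hosted output under these assumptions")
```

Counting prompt-processing time or downtime would push the local estimate further out.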

12:32

Josh: Because not only do you have to deal with the complexity, you have to power it all, you have to deal with hardware stuff, and you have to worry about getting the actual hardware. Because Lord knows, getting those computers now is not as easy as it used to be. So, an interesting note on cost, on how available and accessible these are on a relative basis.

Ejaaz: And the Chinese companies themselves are willing to subsidize these costs, just to be clear.

12:52

Ejaaz: Like, to play around with Kimi K2.7, which is their frontier model, I've been able to access it and use it since they launched it. And I've been freely using it to do research and all that kind of stuff. And I've never once been charged for it. So there's a high subsidy coming from the Chinese side of things as well. The other thing I'll say is these numbers may look big, right?

13:11

Ejaaz: Like, who on earth is spending $20,000 on hardware to run these open source models at home? But the idea is that six months from now, 12 months from now, these very same models will be distilled enough. That means a model can maintain its intelligence but be compact enough to run on your local hardware at home, a custom PC, or maybe even your laptop.

13:30

Ejaaz: The trend that we're undeniably seeing with these open source models in particular is higher intelligence for lower-cost hardware. And if that trend continues, we will end up seeing this model that we're talking about today being able to run off your handset. So it's something that seems unfeasible to access right now. But further on down the line, open source, in my opinion, is pretty undeniable.

13:52

Ejaaz: You'll be able to run it at home, and that's pretty good. But moving on.

⁠¶ The Fable 5 Ban

13:56

Ejaaz: The reason why we wanted to write this episode is there's a convergence of two trends, right? So last week, we had a lot of reporting around Fable 5 being banned by the United States government. The primary reason is the United States government does not think the model is safe. If placed in a malicious actor's hands, it could be used against government systems: hacks, exploits, all that kind of stuff. And it's proven itself in internal testing.

14:22

Ejaaz: And the most recent revelation was a quote from a senator saying that the head of the NSA explained that in a red team exercise, which is like a controlled environment, Claude Mythos 5 was able to breach all of its systems. And typically, it would take months for an individual expert to do that. It did it in hours. And this is just a crazy story and headline to read.

14:48

Ejaaz: They've switched it off. It's not accessible to anyone. If you go on Claude right now, you're unable to access Fable 5. But the point is, these two trends have converged at the same time. And it's important to discuss this because very soon, in a few months' time, as that Elon tweet showed, we're going to end up with Mythos-grade models that are freely available to anyone, subsidized by China or available to run at home for 10k.

15:09

Ejaaz: And that is pretty scary, I guess.

Josh: Yeah. Is that the lead now? Are we at six months? Does that feel about right? Like, if they release Mythos-class by the end of this year, then that gives, I guess, OpenAI and Anthropic a six-month head start.

Ejaaz: And the head of Zhipu has said it.

Josh: So, yeah. It seems like that's about right currently, where we have like a six-month window between us and the current bleeding-edge open source.

15:34

Josh: I could see that kind of getting closer and closer. It feels like they're right on the tail. Of course, understanding what's going on internally would be very helpful to know, because I'm sure GPT 5.5, well, we know we're getting 5.6 pretty soon. I'm sure Anthropic is working on something even more powerful than Mythos. And it feels like we don't really have a choice but to continue progressing as fast as we are. Otherwise, these are going to catch up.

⁠¶ Public Access and Competition

15:56

Josh: And they won't have the guardrails that are put in place currently by the frontier models. Now, what's happening currently is we're seeing this fork in terms of these private models, where only people internally are now able to use them and anyone out in the world is getting, I guess, kind of disabled. They're getting a handicap because they're not actually able to access these frontier models.

16:15

Josh: So we're seeing this weird crossroads where there's a small subset of people who work internally within OpenAI, within Anthropic, who are getting access to these models. The government is limiting their public use, which means the public is getting left behind. And then China is coming up and they're saying, hey, in six months, we're going to be right here at your head.

16:29

Josh: So it's this really interesting dynamic that's at play. And we're going to really have to closely monitor this as these new frontier models continue to be released, because you have to assume, even though the world isn't using Mythos or Fable, they're continuing to iterate and to build better models. They're not just going to stop because of this. Same with OpenAI, same with all the other frontier labs.

16:48

Josh: The question is, are these models going to be held privately for just a small subset of people to use? Or is there going to be this path forward in which the public can use them? I think everyone's hope is that there is a path forward. But currently, we're at this weird standstill where it feels like China's kind of breathing down your neck here.

17:03

Ejaaz: Well, the irony also is, if the government is just going to come in and switch off the frontier model, it's going to push companies to use open source models. Imagine you're an enterprise, right? And you're running your entire company on Fable 5, or whatever the frontier model is from an AI lab. And then suddenly you know that the government can just switch it off, and your company can't do its thing.

17:29

Ejaaz: You're more incentivized to run an open model at home that's privately inferenced, such that no one can ever shut it down. So if I were an enterprise that has been running Fable 5, and that has now been shut off, I'd be looking over at this GLM 5.2 thing and thinking, well, it's MIT open source.

17:45

Ejaaz: Yeah, maybe it costs 20K to run on hardware, but I'd rather spend that and save, you know, hundreds of millions down the line, versus going with Fable 5 and, yeah, maybe achieving frontier-level performance, but then, you know, potentially being shut off by the government according to their agenda. That's not something that you want.

18:02

Ejaaz: Now, I want to give a quick counterpoint to the whole "Chinese open source AI models are going to take over the world because they're cheaper, they're as good, maybe not as good, but good enough" argument, right? Which is very simple.

18:16

Ejaaz: If you're an American lab that has a frontier AI model that is expensive, and you see your neighbors, or your adversaries, China, distilling your model and presenting it as a cheaper model, you just do the same for your own model. And Anthropic has demonstrated that many times, producing Sonnet. Sonnet 4 is basically their cheaper model of Opus 4.8, I believe.

18:38

Ejaaz: And then you see it with ChatGPT, with GPT Flash. These AI labs will produce a cheaper version, and they'll distill it directly from their frontier models. And as these models get good enough to rebuild themselves, it gets easier to do. So I can see a world where they release Fable 6 in the future with a companion model, which is like Sonnet 6. And it's super cheap for anyone that wants 85%

19:01

Ejaaz: of the capability and doesn't care about that extra 15%. So it's competitive with the Chinese models. I don't think America has lost the cheap model argument, but the open source one, they definitely have. I don't see the American labs open sourcing anytime soon.

Josh: Yeah, well, we saw Meta pivot very clearly from open source, from being like the savior of the open source world, to closed source very quickly.

19:22

Josh: And I mean, that hasn't worked out too well for them or anyone really,

⁠¶ Open Source vs Open Weights

19:26

Josh: which is disappointing. There is a small caveat. Maybe we should cover what open source actually means, because it's not truly open source. There are still some secrets. I think a better way to classify this is open weights. And when you go through training, there's, let's say, a trillion parameters. Each one of those parameters gets tuned over and over and over through each training run, which happens trillions of times.

19:46

Josh: And the output of this is the weights. It's just a large text file that has all of those parameters finely tuned, that the model can run off of. What it doesn't include is the actual source code that it took to make that. It doesn't include the ability to reproduce it. All it shares is the outputs.
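
The distinction can be shown at toy scale. The sketch below is an illustration under stated assumptions, not how any lab's release works: `released_weights` stands in for a published checkpoint (two numbers instead of a trillion), and `fine_tune` is a hypothetical helper that keeps adjusting those numbers on new data. Nothing in it needs the original training code or data, which is exactly the part an open weights release leaves out.

```python
# Hypothetical "released" weights for a tiny model y = w * x + b.
# A real release has billions of such numbers; the training recipe is absent.
released_weights = [0.5, -1.0]


def predict(weights, x):
    """Run the model: only the weights are needed for inference."""
    w, b = weights
    return w * x + b


def fine_tune(weights, data, lr=0.1, steps=2000):
    """Continue gradient descent on mean squared error from released weights."""
    w, b = weights
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * ((w * x + b) - y) * x for x, y in data) / n
        grad_b = sum(2 * ((w * x + b) - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return [w, b]


# The user's own data (an assumption for illustration): points on y = 2x + 1.
my_data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
tuned = fine_tune(released_weights, my_data)

print("before:", predict(released_weights, 2.0))
print("after: ", round(predict(tuned, 2.0), 3))
```

The tuned weights fit the new data, yet the example still reveals nothing about how the starting values were produced.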

20:05

Josh: So while you could take their outputs and you could retune and fine-tune those parameters to give you exactly what you want, it's not giving you the recipe. It's not giving you the secrets of how they built it. So there is still some proprietary knowledge as it relates to this open source model. These Chinese companies are actually preserving the recipe by which they landed on this, the data that they trained on. There's a

20:24

Josh: lot of secrets. The output is what's open source, and that's technically open weight. So when we say open source, I think what we really mean... whenever you hear "open source model," chances are it's open weights. And that's a pretty big distinction, because that allows them to keep their secret sauce of how they do it. And it's also probably for the better, because I assume, you've got to imagine, they've been distilling some sort of stuff from, I mean, I

20:44

Josh: remember Seedance, that was so obviously stolen material, because it was just able to reproduce all the copyrighted video formats from any public TV show in the world. So where they get their data from leaves a lot to be desired and questioned, but that's kind of the nuance between open source and open weights. And what we're getting right now is open weights.

Ejaaz: I don't necessarily believe it's open models versus centralized models.

⁠¶ Multi-Model Routing Arrives

21:10

Ejaaz: I think it lands somewhere in between. Now, we've been noticing this new type of product that is getting used by a lot of software engineers and AI users. It's probably best demonstrated by this recent product release from Sakana AI. It's this new model called Fugu. And they describe it as a multi-agent orchestration system. Basically, how it works is you send their model a prompt, as you do with ChatGPT or Claude.

21:38

Ejaaz: And it disperses that prompt across many different models. It could be closed models like Claude and GPT. It could be open models like Zhipu's GLM or Kimi K2.7, as well as their own trained model called Fugu, I believe. And the result of this is like an agentic debate. So these models kind of produce their own answers. Then you have another model that kind of judges these answers and produces the best answer from all of this.

22:04

Ejaaz: And the result from these tests is basically that not only do you have a better quality output, but it's also cheaper. So the orchestration module basically picks a cheaper model to do something when that's good enough, and then only uses the best models when it really needs to solve a really hard task that the other cheaper models can't do. So it saves you a bunch of money, and we see it across other companies like

22:25

Ejaaz: OpenRouter with their new Fusion API. The point being made here is, we are headed towards a world where the ideal AI chatbot uses multiple models, and they may not all be from the same company.
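
The two ideas described above, fanning a prompt out to several models and letting a judge pick, and trying cheap models first before escalating, can be sketched in a few lines. This is a minimal sketch under loud assumptions: the `Model` class, the flat per-call costs, the stub models, and `toy_judge` are all hypothetical stand-ins, and nothing here reflects how Sakana's Fugu or OpenRouter's API is actually built.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Model:
    name: str
    cost_per_call: float           # hypothetical flat cost, for illustration
    answer: Callable[[str], str]   # stand-in for a real API call


Judge = Callable[[str, str], float]  # (prompt, reply) -> quality score


def fan_out_and_judge(prompt: str, models: List[Model],
                      judge: Judge) -> Tuple[str, str, float]:
    """Ask every model, let the judge score the replies, return the best."""
    if not models:
        raise ValueError("need at least one model")
    candidates = [(m.name, m.answer(prompt)) for m in models]
    best_name, best_reply = max(candidates, key=lambda c: judge(prompt, c[1]))
    return best_name, best_reply, sum(m.cost_per_call for m in models)


def cheapest_first(prompt: str, models: List[Model], judge: Judge,
                   threshold: float) -> Tuple[str, str, float]:
    """Try models from cheapest to priciest; stop once a reply is good enough."""
    if not models:
        raise ValueError("need at least one model")
    spent = 0.0
    name, reply = "", ""
    for m in sorted(models, key=lambda m: m.cost_per_call):
        name, reply = m.name, m.answer(prompt)
        spent += m.cost_per_call
        if judge(prompt, reply) >= threshold:
            break
    return name, reply, spent


# Stub models and judge (assumptions): the small model only handles the easy prompt.
small = Model("small-open-model", 1.0,
              lambda p: "4" if p == "2 + 2?" else "not sure")
large = Model("large-frontier-model", 10.0,
              lambda p: "4" if p == "2 + 2?" else "a detailed proof")


def toy_judge(prompt: str, reply: str) -> float:
    return 0.0 if "not sure" in reply else 1.0


easy = cheapest_first("2 + 2?", [large, small], toy_judge, threshold=0.5)
hard = cheapest_first("prove it", [large, small], toy_judge, threshold=0.5)
debate = fan_out_and_judge("prove it", [small, large], toy_judge)
print(easy, hard, debate, sep="\n")
```

Escalation spends little on easy prompts but pays for both models on hard ones, while fan-out always pays for every model, which is one reason the savings from routing depend heavily on the workload.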

22:37

Ejaaz: So the question I have for the United States government, and any government that decides to regulate, whether it's open source models or closed source models, is: how are you going to regulate every single model in the world, especially when the model labs come from other countries or the models are in fact open source? You can't regulate open source models. That's the whole idea of it, whether it's open weight or open source.

22:55

Ejaaz: The whole idea is the government can't shut it down if you're running it on hardware at home. So it's just a really interesting nuance. I just don't think that the stance that the United States government has taken so far is necessarily the most productive one. I understand why they're doing it, but we need to figure out a different framework.

Josh: It's funny, because I saw this news this morning about this Sakana Fugu.

23:16

Josh: I think I'm pronouncing that right. I mean, surely I've never heard of this. I don't know if you've ever heard of this. I think a lot of people watching have never heard of this company. They're Japanese. They came out of nowhere. And suddenly they're posting benchmarks that show that it has higher performance than Fable. And maybe that's true. Maybe they use this mixture of agents. But I think it's also notable that a lot of this is benchmarks.

23:36

Josh: And I actually got some time to play around with the new GLM model this weekend. And while I'm sure it's great at coding and technical use, that's not really what I generally use the models for.

23:46

Josh: And as I'm actually using these models, giving them the general vibe test, I'm noticing that I really do strongly bias toward the American closed source models, like GPT and Anthropic's Opus and Claude. And, I mean, Fable, when it was available, was incredible. And although the benchmarks show that it's very competent at coding, a lot of people aren't using it for coding. They're using it for other things, and

24:08

Josh: the general vibe check doesn't get passed with these models, yet, at least. So I think that's something worth noting too: these are just benchmarks. I encourage anyone who's listening to go try this out and see for yourself. Some people may actually get a lot of benefit from using a cheaper model. Some people just like having all the context in one place, and they want just a better overall experience.

24:27

Josh: With the routing, I think this is a super interesting precedent that we're seeing: Sakana Fugu, and how they are choosing to route their outputs through a series of open source and closed source models in order to generate a better and more powerful outcome. I wonder about the costs. I noticed as I was looking through the documentation that there was no real cost listed. I have to assume it's not as high, but pretty close, because it is routing through

24:51

Josh: a lot of the private models and some open source models in order to get this, which means it's probably consuming a good bit of tokens. It's not totally going to be this, like, open source, very low price model. But it is interesting to see this trend towards more router-based applications, where not everyone needs to solve this incredibly difficult challenge.

25:06

Josh: Perhaps you spin off a few sub agents, they use a more lightweight model to get you an answer without needing to consume a lot of those higher cost tokens. So it's cool, innovative, I won't say it's novel, we've seen this before, but it's a new iteration of this that is now showing pretty compelling benchmarks.
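
Editor's note: the router idea Josh describes can be sketched roughly as follows. This is a minimal illustration, not how Sakana Fugu or any real router works. The model names, prices, keyword list, and difficulty heuristic are all hypothetical assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str           # hypothetical label, not a real product
    cost_per_1k: float  # made-up price in dollars per 1,000 tokens


# Two illustrative tiers; real routers may use many more models.
LIGHT = Model("light-model", 0.10)
FRONTIER = Model("frontier-model", 1.00)

# Assumed keywords that mark a task as hard (purely illustrative).
HARD_KEYWORDS = ("prove", "exploit", "refactor")


def estimate_difficulty(task: str) -> float:
    """Toy heuristic: longer prompts and certain keywords score as harder."""
    score = min(len(task.split()) / 50, 1.0)
    if any(word in task.lower() for word in HARD_KEYWORDS):
        score = max(score, 0.9)
    return score


def route(task: str, threshold: float = 0.5) -> Model:
    """Send hard tasks to the expensive model, everything else to the cheap one."""
    return FRONTIER if estimate_difficulty(task) >= threshold else LIGHT


easy = route("What is the capital of France?")
hard = route("Refactor this module to use async IO")
```

Here `easy` resolves to the lightweight tier and `hard` to the frontier tier, which is the point Josh is making: not every request needs the higher cost tokens.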

25:22

Ejaaz: On the cost side of things, if it's anything like OpenRouter's Fusion API, which uses the same architecture, it comes out roughly 30 to 50% cheaper than the frontier models, which isn't that major compared to some of the Chinese open source models. But it still saves you a bunch of money if you're an enterprise using this at scale.
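
Editor's note: a rough way to see where a figure like "30 to 50% cheaper" could come from is to blend the price of a frontier model with a cheaper one, weighted by how much traffic the router sends to each. The prices and the traffic split below are hypothetical assumptions, not published numbers for any product.

```python
def blended_cost(frontier_price: float, light_price: float, light_share: float) -> float:
    """Average price per unit of tokens when `light_share` of traffic uses the cheap model."""
    if not 0.0 <= light_share <= 1.0:
        raise ValueError("light_share must be between 0 and 1")
    return (1 - light_share) * frontier_price + light_share * light_price


def savings_fraction(frontier_price: float, light_price: float, light_share: float) -> float:
    """Fraction saved relative to sending everything to the frontier model."""
    blended = blended_cost(frontier_price, light_price, light_share)
    return 1 - blended / frontier_price


# Hypothetical inputs: frontier at $10 and light at $1 per million tokens,
# with half of all tokens routed to the light model.
example = savings_fraction(frontier_price=10.0, light_price=1.0, light_share=0.5)
```

Under these made-up inputs the saving lands inside the 30 to 50% range Ejaaz mentions. The actual figure depends entirely on real prices and on how often the router can get away with the cheaper model.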

⁠¶ Regulation and the Road Ahead

25:41

Ejaaz: I'm trying to think about the major takeaway that I have for myself after we've done this episode, Josh. And I think the main one is, I'm inclined to say, and I hope I'm wrong, that future AI model releases, Fable and above, whether they come from GPT 5.6 or 6 or other frontier AI labs, are going to be more controlled in their release because governments are going to start getting more involved.

26:04

Ejaaz: We're going to start seeing nationalization attempts from different nation states in order to figure out how to release these AI models, because if they're out in the wild, they can exploit and cause some real damage. I don't want to think about what could happen in terms of a major event, but I think we're reaching that point where we need to pay careful attention. So that's what we're trying to do on this episode. At least that's what I'm trying to do.

26:27

Josh: Yeah, I think that's right. Like, the speed and acceleration of these models and the cadence at which they're released is up only. If we had a chart that showed you the length of time in between major model releases, it is just getting shorter and shorter and shorter, and that's not changing. So there needs to be a way to reliably be able to push these out.

26:44

Josh: Otherwise, the gap between what exists behind closed doors and what's available to the public is just going to keep growing. And I'm not sure what implications that has, but it sounds like it is noteworthy, and something needs to change in a material way, because the speed and velocity at which progress is being made is not slowing down. Like, what does this look like a year from now? How quick are these models able

27:05

Josh: to improve themselves? What do the benchmarks look like? Can we even create benchmarks anymore, because they will be so capable? We're right on that cusp because we are approaching this vertical asymptote of the curve. And it's just like, it's a little weird. It feels like we're on this roller coaster and we're kind of going down, but I guess it's inverted, where we're going up and we're going up really fast and you're not really sure. It's escaping.

27:24

Josh: It's escaping control in a way. Well, I wouldn't say escaping control, but it's definitely getting fast. And it's like, okay, if you're driving your car really fast, you've got to be a little more careful once you reach high speed, because things can kind of get a little shaky quickly. So we're at that point, and models are getting very capable very quickly. I can't imagine what OpenAI's mythos class model looks like.

27:46

Josh: I'm sure they're working on them. We talk about, I mean, the hardware. I always think about how these are the Blackwell series models. What happens with the Vera Rubin series models? It's like, we are going to accelerate so fast. And I think it's important to, yeah, work on these safeguards now, while it's still reasonable to catch up, while there's only one model release on which you have to focus.

28:06

Josh: And there's not 10 different ones from all these different companies that are being pushed every single week. So, interesting. That's the update: China is back with their open weights model, not to be confused with open source. And yeah, we still don't have Fable access, so hopefully these things will get sorted. But I think it's noteworthy that China never disappeared. I want to know what DeepSeek is doing next.

28:27

Josh: I think that's my next question, like, where's DeepSeek at? Where's DeepSeek V5 or V6?

⁠¶ Closing

28:31

Ejaaz: They just raised a massive round. 50 billion dollars, that's their valuation at least. They're still a fraction of the frontier labs, but yeah, they raised like, was it nine billion dollars? The founder himself put in three billion dollars. They're doing pretty well, and we haven't seen a model release from them in a while. Josh: Yeah, yeah, so it will be fun to see. But that is the update on China, on open source.

28:50

Josh: Thank you guys so much for watching, as always. If you enjoyed this episode, don't forget to share it with a friend who might also like the show, who might care about China or open source models or whatever it may be. If you listen on a podcast player, rating us how you believe we deserve to be rated is always appreciated. We love the five stars, those are always great. Newsletter twice a week, next one

29:10

Josh: is dropping on Wednesday, a day after you listen to this. And yeah, that's... Ejaaz: I have one final request, Josh, something that you and I discussed on our walk last week. We are in the market for sponsors or anyone that can support us, please. Josh and I and producer Luke have been keeping the lights on this entire time, and we've reached a point where we're feeling really confident about the numbers and all the support that you guys have given us.

29:38

Ejaaz: And we would love to have a partner that we feel very passionate about join us and support us in our vision of growing this into the leading frontier AI and tech podcast in the world. So if there's anyone out there listening to this that is inspired or wants to support us, let us know, DM us. You know, we're on X, we're everywhere, just reach out and we would love to hear from you. Josh: That would be great. All the support is very much appreciated.

30:02

Josh: Keep the lights on around here and keep things going strong. So yeah, thank you as always for the support. If you made it this long, you're a real one and hopefully you enjoyed this episode. So thank you as always and we will see you on the next one.

Transcript source: Provided by creator in RSS feed
