Kimi K2 is the Open Source Claude-Killer | US vs China AI | Limitless: An AI Podcast

⁠¶ Intro

00:03

Ejaaz: A bunch of AI researchers from China just released a brand new AI model called Ejaaz: Kimi K2, which is not only as good as any other top model like Claude, Ejaaz: but it is also 100% open source, which means it's free to take, Ejaaz: customize and create into your own brand new AI model. Ejaaz: This thing is amazing at coding, it beats any other model at creative writing, Ejaaz: and it also has a pretty insane voice mode.

00:27

Ejaaz: Oh, and I should probably mention that it is one trillion parameters in size, Ejaaz: which makes it one of the biggest and largest models to ever be created. Ejaaz: Josh, we were winding down on a Friday night and this news broke that this team Ejaaz: had released this model. Ejaaz: Absolutely crazy bomb, especially with like OpenAI rumored to release their Ejaaz: open source model this week. Ejaaz: You've been jumping into this. What's your take?

00:54

Josh: Yeah. So last week we crowned Grok 4 as the new leading private model, closed source model.

⁠¶ The Rise of Kimi K2

00:59

Josh: This week we got to give the crown to Kimi K2 we got another crown Josh: going for the open source team they are winning I mean this is Josh: better than DeepSeek and DeepSeek R2 this is basically DeepSeek R3 Josh: I would imagine um and if you remember back a couple months DeepSeek really Josh: flipped the world on its head because of how efficient it was and the algorithmic Josh: upgrades it made and I think what we see with Kimi K2 is a lot of the same thing

01:20

Josh: it's it's these novel breakthroughs that come as a downstream effect of their Josh: needing to be resourceful Josh: China, they don't have the mega GPU clusters we have, they don't have all the Josh: cutting edge hardware, but they do have the software prowess to find these efficiencies. Josh: I think that's what makes this model so special. And that's what we're going Josh: to get into here is specifically what they did to make this model so special.

01:41

Ejaaz: Yeah, I mean, look at these stats here, Josh, like 1 trillion parameters in total.

01:46

Ejaaz: It's 32 billion active mixture of expert models. So what this means is, Ejaaz: although it's really large in size, typically these AI models can become pretty Ejaaz: inefficient if it's large in size, it uses this technique called mixture of Ejaaz: experts, which means that whenever someone queries a model, Ejaaz: it only uses or activates a number of parameters that are relevant for the query itself.

02:07

Ejaaz: So it's more smarter, it's much more efficient, and it doesn't use or consume Ejaaz: as much energy as you would if you wanted to run it locally at home or whatever Ejaaz: that might be. It's also super cheap.

02:18

Ejaaz: I think I saw somewhere that this was 20% the cost of clawed, Ejaaz: josh which uh we love that insane uh Ejaaz: for all the nerds that kind of want to run you know Ejaaz: really long tasks or you know just set and Ejaaz: forget the ai to to run on like your coding log or whatever that might mean Ejaaz: you can now do it at a much more affordable rate at one-fifth the cost uh than Ejaaz: some of the top models that are out there and it is as good as those models

02:42

Ejaaz: so just insane kinds of things josh i know there's a bunch of things that you Ejaaz: wanted to point out here on benchmarks um And what do you want to get into?

⁠¶ Efficiency and Cost Benefits

02:50

Josh: Yeah, it's really amazing. So they took 15 and a half trillion tokens and they Josh: condensed those down into a one trillion parameter model. Josh: And then what's amazing is when you use this model, like she said, Josh: it uses a thing called mixture of experts. Josh: So it has, I believe, 384 experts.

03:05

Josh: And each expert is good at a specific thing. So let's say in the case you want Josh: to do a math problem, it will take a 32 billion parameter subset of the one Josh: trillion total parameters, and it will choose eight of these different Josh: Experts in a specific thing. So in the case of math, it'll find an expert that Josh: has the calculator tool. Josh: It'll find an expert that has a fact, like a fact checking tool or a proof tool Josh: to make sure that the math is accurate.

03:29

Josh: It'll have just a series of tools to help itself. And that's kind of how it Josh: works so efficiently is instead of using a trillion parameters at once, Josh: it uses just 32 billion and it uses the eight best specialists out of the 384 Josh: that it has available to it. It's really impressive. Josh: And what we see here is the benchmarks that we're showing on screen. Josh: And the benchmarks are really good.

03:48

Josh: It's up there in line with just about any other top model, except with the exception Josh: that this is open source.

⁠¶ Training Breakthroughs Explained

03:53

Josh: And there was another breakthrough that we had, which was the actual way that Josh: they handled the training of this. Josh: And yeah, this is the loss curve. So what you're looking at on screen for the Josh: people who are listening, it's this really pretty smooth curve that kind of Josh: starts at the top and it trends down in a very predictable and smooth way.

04:09

Josh: And most curves don't look like this. And if they do look like this, Josh: it's because the company has spent tons and tons of money on error correction Josh: to make sure this curve is so smooth. Josh: So basically what you're seeing is the training run of the model. Josh: And a lot of times what happens is you get these very sharp spikes and it starts Josh: to defer away from the normal training run.

04:29

Josh: And it takes a lot of compute to kind of recalibrate and push that back into the right way. Josh: What they've managed to do is really make it very smooth. Josh: And they've done this by increasing these efficiencies. So if you can think Josh: about it, there's this analogy I was thinking of right before we hit the record button. Josh: And it's if you were teaching a chef how to cook, right? Josh: So we have Chef Ejaz here. I am teaching him how to cook. I am an expert chef.

04:50

Josh: And instead of telling him every ingredient and every step for every single Josh: dish, what I tell him is like, hey, if you're making this amazing dinner recipe, Josh: all you need that matters is this amount of salt applied at this time, Josh: this amount of heat applied for this length of time, and the other stuff doesn't matter as much. Josh: So just put in whatever you think is appropriate, but you'll get the same answer.

05:11

Josh: And that's what we see with this model is just an increased amount of efficiency by being Josh: direct by being intentional about the data that they used to train it on, Josh: the data that they used to fetch in order to give you high quality queries. Josh: And it's a really novel breakthrough. They call it the MuonClip optimizer, Josh: which, I mean, it's a Chinese company, maybe it means something special there, Josh: but it is a new type of optimizer.

05:34

Josh: And what you're seeing in this curve is that it's working really well and it's

⁠¶ Innovations in AI Training

05:37

Josh: working really efficient.

05:37

Josh: And that's part of the benefit of having this open source is now we have this Josh: novel breakthrough and we could take this and we could use this for even more Josh: breakthroughs even more open source models and and that's part that's been really cool to see Ejaaz: I i mean this is just um time Ejaaz: and again from china uh so so amazing from their research team so so like just Ejaaz: to kind of like um pick up your comment on deep seek at the end of last year

06:02

Ejaaz: we were utterly convinced that the only way to create a breakthrough model was Ejaaz: to spend billions of dollars on compute clusters. Ejaaz: And so therefore it was a pay-to-play game. And then DeepSeek, Ejaaz: a team out of China, released their model and completely open-sourced it as well. Ejaaz: And it was as good as OpenAI's Frontier model, which was the top model at the time. Ejaaz: And the revelation there was, oh, you don't actually just need to chuck a bunch of compute at this.

⁠¶ The Impact of Open Source

06:31

Ejaaz: There are different techniques and different methods if you get creative about Ejaaz: how you design your model and how you run the training cluster, Ejaaz: the training one, which is basically what you need to do to make your model smart, Ejaaz: you can run it in different ways that is more efficient, consumes less energy, Ejaaz: and therefore less amount of money, but is as smart, if not smarter, Ejaaz: than the frontier models that American AI companies are making.

06:55

Ejaaz: And this is just a repeat of that, Josh. Ejaaz: I mean, look at this curve. For those who are looking at this episode on video.

07:03

Ejaaz: It is just so clean yeah it's beautiful Ejaaz: the craziest part about this is when deep Ejaaz: seek was released they pioneered something called uh reasoning Ejaaz: or reinforcement learning uh which are two separate Ejaaz: techniques that made the model super smart um with less energy and less compute Ejaaz: spend um with this model they didn't even implement that technique at all so Ejaaz: theoretically this model can get so much more smarter than it already is um

07:29

Ejaaz: and they just kind of leveraged a new method to make it as smart as it already is right now. Ejaaz: So just such a fascinating kind of like progress in research from China. Ejaaz: And it just keeps on coming out. It's so impressive. Josh: Yeah, this is this was the exciting part to me is that we're seeing so many Josh: algorithms or exponential improvements in so many different categories.

07:49

Josh: So this was considered a breakthrough by all means. And this wasn't even the Josh: same type of breakthrough that DeepSeek had. Josh: So we get this now compounding effect where we have this new training breakthrough Josh: and then we have DeepSeek who has the reinforcement learning and that hasn't Josh: even yet been applied to this new model.

⁠¶ Competitive Landscape of AI

08:05

Josh: So we get the exponential growth on one end, the exponential growth on the reasoning end, Josh: those come together and then you get the exponential growth on the hardware Josh: stack where the GPUs are getting much faster and there's all of these different Josh: subsets of AI that are compounding on each other and growing and accelerating Josh: quicker and quicker and what you get is this unbelievable rate of progress and

08:25

Josh: that's what we're seeing. So Josh: reasoning isn't even here yet and we're going to see it soon because it is open Josh: source so people can apply their own reasoning on top of it i'm sure the moonshot Josh: team is going to be doing their own reasoning version of this model and i'm Josh: sure we're going to be getting even more impressive results soon i see you have Josh: a post up here um about the testing and overall performance can you please share yeah

08:46

Ejaaz: Yeah so um this is a tweet that summarizes really well how this model performs Ejaaz: in relation to other Frontier models. Ejaaz: And the popular comparison that's taken for Kimi K2 is against Claude. Ejaaz: So Claude has a bunch of models out. Ejaaz: Claude 3.5 is its earlier model, and then Claude 4 is its latest.

09:05

Ejaaz: And the general take is that this model is just better than those models, Ejaaz: which is just insane to say, because for so long, Josh, we've said that Claude Ejaaz: was the best coding model. Ejaaz: And indeed it was. And then within the span of, what is it, five days? Ejaaz: Grok 4 released and it just completely blew Claude 4 out of the water in terms of coding.

09:26

Ejaaz: Now Kimi K2, an open source model out of China who doesn't even have access Ejaaz: to the research and kind of proprietary knowledge that a lot of American AI Ejaaz: companies have also beat it as well, right? Ejaaz: So it kind of beats Claude at its own game, but it's also cheaper.

⁠¶ Context Window Capabilities

09:41

Ejaaz: It's 20% the cost of Claude 3.5, which is just an insane thing to say, Ejaaz: which means that if you are a developer out there that Ejaaz: wants to try your hand at kind of like vibe coding Ejaaz: a bunch of things or actually seriously coding something you Ejaaz: know that's quite novel but you don't have the hands on deck to do that you Ejaaz: can now spin up a Kimi K2 AI agent actually multiple of them for a very cost-efficient

10:05

Ejaaz: reasonable you know salary you don't have to pay like hundreds of thousands Ejaaz: of dollars or you know hundreds of millions of dollars which is what Meta is Ejaaz: doing to kind of buy a bunch of these software engineers, Ejaaz: you can spend, you know, the equivalent of maybe a Netflix subscription or $500 Ejaaz: to $1,000 a month and spin up your own app. So super, super cool.

10:23

Josh: And also one added perk that's there is it's that even if you have a lot of Josh: GPUs sitting around, you can actually run this model for free. Josh: So that's the cost if you actually query it from the servers. Josh: But I'm sure there's going to be companies that have access to XS GPUs. Josh: They can actually just download the model because it's open source, Josh: open weights, and they could run it on their own.

10:41

Josh: And that brings the cost of compute down to the cost per kilowatt of the energy Josh: required to run the GPUs. Josh: So because it's open source, you really start to see these costs decline, Josh: but the quality doesn't. Josh: And that's every time we see this, we see a huge productivity unlock in encoding Josh: output and amount of queries used. It's like, this is freaking awesome.

10:58

Ejaaz: Yeah josh i saw something else come up as well so so do you remember when claude Ejaaz: first released um their frontier model i think it was 3.5 or maybe it was four Ejaaz: one of their bragging rights was it had a one million uh token context window which. Josh: Oh yes which was huge Ejaaz: Yeah which for listeners of the show is huge it's like several uh book novels Ejaaz: worth um of words or characters you could just bung into one single prompt.

11:28

Ejaaz: And the reason why that was such an amazing thing was for a while, Ejaaz: people struggled to kind of communicate with these AIs because they couldn't set the context. Ejaaz: There wasn't enough bandwidth within their chat log window for them to say, Ejaaz: you know, and don't forget this. And then there was this. Ejaaz: And then, you know, this detail and that detail, there just wasn't enough space. Ejaaz: And models weren't performing enough to kind of consume all of this in one go.

11:51

Ejaaz: And then Claude came out and was like, hey, we have one million context windows. Ejaaz: Don't worry about it chuck in all the research papers that you want chuck in Ejaaz: your essay chuck in reference books and we got you um i saw this tweet that Ejaaz: was uh deleted i think you sent this to me um. Josh: We got the screenshots we always come with receipts yeah i Ejaaz: Wonder why they deleted it but uh good catch from you um yeah let's get into this.

12:11

Josh: What's your take on it was was first posted i think Josh: earlier today yeah like an hour ago and then deleted pretty shortly afterwards Josh: and this is from a woman name crystal crystal works with the moonshot team she Josh: is part of the team that that released kimmy k2 um and in this post it says Josh: kimmy isn't just another ai it went viral in china as the first to support

12:32

Josh: A 2 million token context window. And then she goes on to say, Josh: we're an AI lab with just 200 people, which is ministerially small compared Josh: to a lot of the other labs they're competing with. Josh: And it was acknowledgement that they had a 2 million token context window.

12:46

Josh: And for those who, just a quick refresher on the context window stuff, Josh: it's imagine you have like a gigantic textbook and you've read it once and you Josh: close it and you kind of have a fuzzy memory of all the pages.

⁠¶ The Surge of Kimi K2

12:56

Josh: The context window allows you to lay all of those out in clear view Josh: and directly reference every single page so when Josh: you have two million tokens which is roughly two million words Josh: of context we're talking about like hundreds and hundreds Josh: of books and textbooks and knowledge and you could really dump a Josh: lot of information in this for the ai to readily access and Josh: that if they release that a two million token Josh: open source model that's huge

13:20

Josh: deal i mean even grok 4 recently i believe Josh: what did we say it was it was a 256 000 uh token context window something like Josh: that so grok 4 is one eighth of what they supposedly have accessible right now Josh: which is a really really big deal um so i'm hoping it was deleted because they Josh: just don't want to share that not because it's not true i would like to believe Josh: that it's true because man that'd be pretty epic yeah

13:42

Ejaaz: And the people are loving it josh um check out this Ejaaz: graph from uh open router which basically shows Ejaaz: uh the split of usage between everyone Ejaaz: on their platform that are querying different models so for context Ejaaz: here open router is a website that you can go to Ejaaz: and you can type up a prompt just like you do at chat gpt and Ejaaz: you can decide which model your Ejaaz: prompt goes to or you could let open router decide for you

14:09

Ejaaz: and it kind of like divvies up your query so if you have a coding query it's Ejaaz: probably going to send it to claude or now kimmy k2 or grok4 but if you have Ejaaz: something that's more like to do with creative writing or something that's like Ejaaz: a case study it might send it to OpenAI's O3 model, right? So it kind of like decides for you.

14:27

Ejaaz: OpenRacha released this graphic, which basically shows that KimiK2 surpassed Ejaaz: XAI in token market share just a few days after launching, which basically means Ejaaz: that XAI spent, you know, Ejaaz: hundreds of billions of dollars training up their Grok4 model, Ejaaz: which just kind of beat out the competition just last week.

14:47

Ejaaz: Then KimiK2 gets released completely open source Ejaaz: and everyone starts to use that more than Ejaaz: grok 4 which is just an insane thing to say and Ejaaz: just shows how rapidly these ai models compete with each other and surpass each Ejaaz: other um i think part of the reason for this josh is it's open source right Ejaaz: which means that not only are retail users like myself and yourself using it Ejaaz: for our daily queries you know uh you know,

15:14

Ejaaz: create this recipe for me or whatever, but researchers and builders all over Ejaaz: the world that have so far been challenged or had this obstacle of pots of money Ejaaz: basically to start their own AI company now have access to a frontier, Ejaaz: world-renowned model and can create whatever application, website, Ejaaz: or product that they want to make.

⁠¶ Market Adoption Insights

15:36

Ejaaz: So I think that's part of the usage there as well. Do you have any takes on this? Josh: Yeah, and it's downstream of cost, right? We always see this when a model is Josh: cheaper and mostly equivalent, the money will always flow to the cheaper model. Josh: It'll always get more queries. I think it's important to note the different Josh: use cases of these models. So they're not directly competing head to head on the same benchmarks.

15:56

Josh: I think what we see is like when we talk about Claude, it's generally known as the coding model. Josh: And I don't think like OpenAI's O3 is not really competing directly with Claude Josh: because it's more of a general intelligence versus a coding specific intelligence. Josh: K2 is probably closer to a Claude. I would assume where it's really good at Josh: coding because it uses this mixture of experts.

16:15

Josh: And I think that helps it find the tools. It uses this cool new novel thing Josh: called like multiple tool use. Josh: So each one of these experts can use a tool simultaneously and they could use Josh: these tools and work together to get better answers. Josh: So in the case of coding, this is a home run. Josh: Like it is very cheap cost per token, very high quality outputs. Ejaaz: I actually think you can compete with OpenAO3, Josh. Check this out.

16:38

Ejaaz: So Rowan, yeah, Rowan Cheng put this out yesterday And he basically goes, Ejaaz: I think we're at the tipping point for AI-generated writing. Ejaaz: It's been notoriously bad, but China's Kimi K2, an open-weight model, Ejaaz: is now topping creative writing benchmarks.

16:53

Ejaaz: So just to put that into context, that's like having the top most, I don't know, Ejaaz: smartest or slightly autistic software engineer, at the top engineering company Ejaaz: working on AI models, also being the best poet or creative script and directing Ejaaz: the next best movie or whatever that might be, Ejaaz: or creating a Harry Potter novel series. Ejaaz: This model can basically do both. And what it's pointing out here is that compared

17:22

Ejaaz: to 03, it tops it. Look at this. Completely beats it. Josh: Okay, so I take that back. Maybe it is just better at everything. Josh: Yeah, that's some pretty impressive results. Ejaaz: I think like what's worth pointing out here is, and I don't know whether any Ejaaz: of the American AI models do this, Josh, but mixture of experts seems to be clearly a win here. Ejaaz: The ability to create an incredibly smart model doesn't come without,

17:47

Ejaaz: you know, this large storage load that is needed, right? One trillion parameters. Ejaaz: But then combining it with the ability to be like, Like, hey, Ejaaz: you don't need to query the entire thing. Ejaaz: We've got you. We have a smart router, which basically pulls on the best experts, Ejaaz: as you described earlier, for whatever relevant query you have.

18:05

Ejaaz: So if you have a creative writing task or if you have a coding thing, Ejaaz: we'll send it to two different departments of this model. Ejaaz: That's a really huge win. Do any other American models use this? Josh: Well, the first thing that came to my mind when you said that is Grok4, Josh: which doesn't exactly use this, but uses a similar thing, where instead of using Josh: a mixture of experts, It uses a mixture of agents.

18:26

Josh: So Grok4 Heavy uses a bunch of distributed agents that are basically clones of the large model. Josh: But that takes up a tremendous amount of compute. And that is the $300 a month plan. Ejaaz: That's replicating Grok4 though, right? So that's like taking the model and copy pasting it. Ejaaz: So let's say Grok4 was one trillion parameters just for ease of comparison. Ejaaz: That's like creating, if there was four agents, that's four trillion parameters,

18:51

Ejaaz: right? So it's still pretty costly and inefficient.

18:53

Josh: Is that what you're saying no it's the actually the opposite direction of k2 Josh: so what they have used is just and again this is kind of similar to tracking Josh: sentiment between the united states and china where the united states will throw Josh: compute at it where china will throw like Josh: kind of clever resource at it so grok yeah Josh: when they use their mixture of agents it actually just costs a lot more Josh: money whereas k2 when they use their mixture of

19:17

Josh: experts well it costs a lot less instead of using 4 trillion Josh: parameters in this case it uses just 32 billion and it Josh: kind of copies that 32 billion over and over and it's really it's a really Josh: elegant solution that seems to be Josh: yielding pretty comparable results so i think as we Josh: see these efficiency upgrades i'm sure they will Josh: eventually trickle down into the united states models and when they do that

19:38

Josh: is going to be a huge unlock in terms of cost per token in terms of the smaller Josh: distilled models that we're going to be able to run on our own computers um Josh: but yeah i don't know of any who are also using it at this scale it might be Josh: novel just to k2 right now and Ejaaz: And i think that this is the method that probably scales the best josh like.

⁠¶ Versions of Kimi K2

19:58

Josh: Yeah it makes sense efficiency Ejaaz: Always wins at the end right and to see um this kind of innovation come pretty Ejaaz: early on in a technology's life cycle is just super impressive to see, Ejaaz: Another thing I saw is there's two different versions of this model, I believe. Ejaaz: There's something called Kimi K2 Base, which is basically the model for researchers Ejaaz: who want full control for fine-tuning and custom solutions, right?

20:26

Ejaaz: So imagine this model as the entire parameter set. So you have access to one Ejaaz: trillion parameters, all the weight designs and everything.

20:36

Ejaaz: And if you're a nerd that wants to nerd out you can Ejaaz: go crazy you know if you have like your own gpu Ejaaz: cluster at home or if you happen to have a convenient Ejaaz: warehouse full of of servers that you weirdly Ejaaz: have access to you can go crazy with it you can if you Ejaaz: think about like um the early gaming days of counter-strike and then you could Ejaaz: like mod it you can basically mod this uh model to your heart's desire and then

21:00

Ejaaz: there's a second version called k2 instruct which is for drop-in general purpose Ejaaz: chat and AI agent experiences. Ejaaz: So this is kind of like at the consumer level, if you're experimenting with Ejaaz: these things, or if you want to run an experiment at home on a specific use Ejaaz: case, you can kind of like take that away and do that for yourself. Ejaaz: That's how I understand it, Josh. Do you have any takes on this?

21:22

Josh: That makes sense. And I think that second version that you're describing is Josh: what's actually available publicly on their website, right? Josh: So if you go to Kimmy.com, it has a text box. It looks just like ChatGPT like you're used to.

21:31

Josh: And that's where you can run that second tier model which Josh: um you described as that's the the drop in general purpose Josh: chat and then yeah for the the hardcore researchers there's Josh: a github repo and the github repo has all the weights and all the code and Josh: you can really download it dive in use the full thing i Josh: was playing around with the kimmy tool and it's it's really cool Josh: it's fast oh i mean it's lightning fast if you

21:52

Josh: go from a reasoning model to an inference model like kimmy Josh: you get responses like this like when Josh: i'm using grok 4 or o3 i'm sitting there sometimes for a couple minutes it's Josh: waiting for an answer this you type it in and it just types back right away Josh: no time waiting so it's it's kind of refreshing to see that but it's also a Josh: testament to how impressive it is i'm getting great answers and it's just spitting

22:11

Josh: it right out so what happens when they add the reasoning layer on top well it's Josh: probably going to get pretty freaking good Ejaaz: So the trend we're seeing, and we saw this last week with Grok4, Ejaaz: is typically we're expected to wait a while when we send a prompt to a breakthrough Ejaaz: model because it's thinking, it's trying to basically replicate what we have in our brains up here. Ejaaz: And now it's just getting much quicker and much smarter and much cheaper.

22:38

Ejaaz: So the long story short is these incredibly powerful, I kind of think about Ejaaz: it as how we went from massive desktop computers to slick cell phones, Ejaaz: Josh, and then we're going to eventually have chips in our brain. Ejaaz: AI is just kind of like fast tracking that entire life cycle within like a couple Ejaaz: of years, which is just insane.

22:57

Josh: And these efficiency improvements are really exciting because you can see how Josh: quickly they're shrinking and allowing eventually for those incredible models Josh: to just run on our phones.

23:06

Josh: So there's totally a world a year from now in which like a Josh: grok 403 kimmy k2 capable model Josh: is small enough that it could just run inside of in our Josh: phone and run on a mobile device or run locally on a laptop Josh: or you're offline and you kind of have this portable intelligence Josh: that's available everywhere anytime even if Josh: you're not connected to the world and that seems really cool Josh: like we were talking a few episodes ago about apple's um local

23:30

Josh: free ai inference running on an iphone Josh: but how the base models still kind of suck like they don't really do Josh: anything super interesting they're basically good enough to do what Josh: you would expect siri to do but can't do and these Josh: models as we get more and more breakthroughs like this that allow you to Josh: run much larger parameter counts Josh: on a much smaller device it's going to start really

23:50

Josh: super powering these mobile devices and i can't help but think about the open Josh: ai hardware device i'm like wow that'd be super cool if you had like oh three Josh: running locally in the middle of the jungle somewhere with no service and you Josh: still had access to all of its capabilities like that's probably coming downstream Josh: of breakthroughs like this where we get really big efficiency unlocks

24:10

Ejaaz: I mean, it's not just efficiency, though, right? It's the fact that if you can Ejaaz: run it locally on your device, it can have access to all your private data without Ejaaz: exposing all of that to the model providers themselves, right?

⁠¶ Privacy and Local AI

24:22

Ejaaz: So one of the major concerns of not just AI models, but also with mobile phones is privacy. Ejaaz: I don't want to share all my kind of like private health, financial, Ejaaz: and social media data, because then you're just going to have everything on Ejaaz: me and you're going to use me. Ejaaz: You're going to use me as a product, right? And that's kind of like been the Ejaaz: quota for the last decade in tech.

24:42

Ejaaz: And so with AI, that's a supercharged version of it. The information gets more Ejaaz: personal. It's not just your likes. Ejaaz: It's, you know, where Josh shops every day and, you know, who he's dating and Ejaaz: all these kinds of things, right? Ejaaz: And that becomes quite personal and intrusive very quickly. Ejaaz: So the question then becomes, how can we have the magic of an AI model without it being so obtrusive?

25:03

Ejaaz: And that is open source locally run AI or privately run AI. and Kimi K2 is a Ejaaz: frontier model that can technically run on your local device if you set up the right hardware for it. Ejaaz: And the way that we're trending, you can basically end up having that on your Ejaaz: device, which is just a huge unlock. Ejaaz: And if you can imagine how you use OpenAI 03 right now, Josh, Ejaaz: right? I know you use it as much as I do.

25:27

Ejaaz: The reason why you and I use it so much isn't just because it's so smart, Ejaaz: but it's because it remembers everything about us. Ejaaz: But I hate that Sam knows or has access to all that data. Ejaaz: I hate that if he chooses to switch on personalized ads, which is currently Ejaaz: the model where most of these tech companies make money right now, Ejaaz: he can, and I've got nothing to do about it because I don't want to use any Ejaaz: other model apart from that.

25:49

Ejaaz: But if there was a locally run Ejaaz: model that had access to all the memory and context, I'd use that instead. Josh: And this is suspicious. I mean, this is a different conversation in total, Josh: but isn't it interesting how other companies haven't really leaned into memory Josh: when it's seemingly the most important mode that there is. Josh: Like Grok4 doesn't have good memory rolled out. Gemini doesn't really have memory.

26:11

Josh: There's no, Claude doesn't have memory the way that OpenAI does. Josh: Yet it's the single biggest reason why we both continue to go back to ChatGPT and OpenAI. Josh: So that's just been an interesting thing. I mean, Kimmy is open source. Josh: I wouldn't expect them to lean too much into it. But for these closed source Josh: models, that's just, it's another interesting just observation. Josh: Like, hey, the most important thing isn't, doesn't seem to be prioritized by

⁠¶ The AI Talent Landscape

26:30

Josh: other companies just yet. Ejaaz: Why do you think that is so so my theory um at least from xai or grok force Ejaaz: perspective is elon's like okay i'm not going to be able to build a better chat Ejaaz: bot or chat messenger than openai has there's not too many features i can um. Ejaaz: Set Grok 4 apart, then that O3 doesn't already do, right? Ejaaz: But where I can beat O3 is at the app layer.

26:59

Ejaaz: I can create a better app store than they have because I haven't really created Ejaaz: one that is sticky enough for users to continually use. Ejaaz: And I can use that data set to then unlock memory and context at that point, right?

27:15

Ejaaz: So I just saw today that they released, they Ejaaz: being um xai released a new feature for grok 4 Ejaaz: called i think it's uh companions josh um Ejaaz: and it's basically these yeah these animated um Ejaaz: avatar like um characters so they basically look like they're from an anime Ejaaz: show and you know how you can use voice mode in open ai and you can kind of Ejaaz: like talk to this uh realistic human sounding ai you now have a face and a character

27:44

Ejaaz: on grok 4 and it's really entertaining, Josh. Ejaaz: Like I find myself kind of like engaged in this thing because I'm not just typing words. Ejaaz: It's not just this binary to and fro with this chat messenger. Ejaaz: It's this human, this cute, attractive human that I'm just like now speaking to. Ejaaz: And I think that that's the strategy that a lot of these AI companies, Ejaaz: if I had to guess, are taking to kind of like seed their user base before they

28:08

Ejaaz: unlock memory. I don't know whether you have a take on that. Josh: Yeah, I have a fun little demo. I actually played around with it this morning Josh: and I was using it totally unhinged, no filter, very vulgar, Josh: but like kind of fun. It's like a fun little party trick. Josh: And yeah, I mean, that was a surprise to me this morning when I saw that rolled Josh: out. I was like, huh, that doesn't really seem like it makes sense. Josh: But I think they're just having fun with it.

28:29

Ejaaz: Can we for a second talk about the team? Ejaaz: So we've mentioned just now how they've all come from China and how China's Ejaaz: like really advancing open source AI models, and they've completely beat out Ejaaz: the competition in America, Mata's Lama being the obvious one. Ejaaz: We've got Kwen from Alibaba. Ejaaz: We've got Deep Seek R1. Now we have Kimi K2. The team is basically...

28:53

Ejaaz: The AI Avengers of China, Josh. So these three co-founders all have deep AI Ejaaz: ML backgrounds that hail from the top American universities, Ejaaz: such as Carnegie Mellon. Ejaaz: One of them has a PhD from Carnegie Mellon in machine learning, Ejaaz: which is basically, for those of you who don't know, is like God-tier degree for AI. Ejaaz: That means you're desirable and hireable by every other AI company after you graduate.

29:19

Ejaaz: But it's not just that. They also have credibility and degrees from the top universities in China. Ejaaz: Especially this one university called Tsinghua, which seemed to be the top of their field. Ejaaz: I looked them up on rankings for AI universities globally, and they often come Ejaaz: in number three or four in the top 10 AI universities. So pretty impressive from there.

29:41

Ejaaz: But what I found really interesting, Josh, was one of the co-founders was an Ejaaz: expert in training AI models on low-cost optimized hardware. Ejaaz: And the reason why I mentioned this is it's no secret that if you want a top Ejaaz: frontier AI model, you need to train it on NVIDIA's GPUs. Ejaaz: You need to train it on NVIDIA's hardware. Ejaaz: NVIDIA's market cap, I think, at the end of last week, surpassed $4 trillion.

30:11

Ejaaz: That's $4 trillion with a T. That is more than the current GDP of the entire British economy. Josh: Where I hail from. And the largest in the world. Ejaaz: And there's never been.

30:20

Josh: A bigger company Ejaaz: There's never been a bigger company it it's just Ejaaz: insane to grab your head around and it's not without Ejaaz: reason they supply basically or they have a Ejaaz: grasp or a monopoly on the hardware that Ejaaz: is needed to train top models now kimmy k2 Ejaaz: comes along casually drops a one trillion parameter model one of the largest Ejaaz: models ever released um and it's trained on hardware that isn't nvidia's um

30:46

Ejaaz: and jensen huang i i need to find this clip josh but But Jensen Huang basically Ejaaz: was on stage, I think it was at a private conference maybe yesterday, Ejaaz: but he was quoted as saying 50% of the top AI researchers are Chinese and are from China. Ejaaz: And what he was implicitly getting at is they're a real threat now.

⁠¶ China's AI Competitive Edge

31:05

Ejaaz: I think for the last decade, we've kind of been like, ah, yeah, Ejaaz: China's just going to copy paste everything that comes out of America's tech sector. Ejaaz: But when it comes to AI, we've kind of like maintained the same mindset up until Ejaaz: now where they're really just competing with us.

31:19

Ejaaz: And if they have the hardware, they have the ability to research new techniques Ejaaz: to train these models, like DeepSeek's reinforcement learning and reasoning, Ejaaz: and then Kimi K2's kind of like efficient training run, which you showed earlier. Ejaaz: They've come to play, Josh. And I think it's worth highlighting that China has Ejaaz: a very strong grasp on top AI researchers in the world and models that are coming out of it.

31:45

Josh: Where are their $100 million offers? I haven't seen any of those coming through. Josh: None, dude. The most impressive thing is that they do it without the resources that we have. Josh: Imagine if they did have access to the clusters of these like H100s that NVIDIA is making. Josh: I mean, that would be, would they crush us?

32:03

Josh: And we kind of have this timeline here where we're kind of running up against Josh: the edge of energy that we have available to us to train these massive models. Josh: Whereas China does not have that constraint. They have significantly more energy to power these.

32:17

Josh: So in the event, the inevitable event that they do get the chips and they are Josh: able to train at the scale that we are, I'm not sure we're able to continue Josh: our rate of acceleration in terms of hardware manufacturing, Josh: large training as fast as they will. Josh: And they already have done the hard work on the software efficiency side. Josh: They've cranked out every single efficiency because they are doing it on constrained hardware.

⁠¶ Open Source vs. Closed Source

32:40

Josh: So it's going to create this really interesting effect where they're coming Josh: at it from the like ingenuity software approach we're coming at it from the Josh: brute force throw a lot of compute added approach and we'll see where both both Josh: sides end up um but it's clear that china is still behind because they are the Josh: ones open sourcing the models and we know at this point now if you're open sourcing Josh: your model you're doing it because you're behind

33:00

Ejaaz: Yeah yeah i mean one thing Ejaaz: that did surprise me josh was that they released a one Ejaaz: trillion parameter open source model i i didn't Ejaaz: expect them to catch up that quickly um like one Ejaaz: trillion is a lot um yeah another thing Ejaaz: i was thinking about is china has dominated Ejaaz: hardware for so long now so it wouldn't Ejaaz: really surprise me if like i don't know a Ejaaz: couple years from now they're producing better models

33:27

Ejaaz: at specific things basically because they have better Ejaaz: hardware than america than the west um but Ejaaz: where i think the west will continue to dominate Ejaaz: is at the application layer and i don't Ejaaz: know if i was a betting man i would say that most of the money is eventually going Ejaaz: to be made on the application side of things i think grok Ejaaz: 4 is starting to um kind of show that Ejaaz: with all these different kinds of novel features that they're releasing i i

33:52

Ejaaz: don't know if you've seen some of the games that are being produced from grok Ejaaz: 4 josh but it is ultimately insane and i haven't seen any similar examples come Ejaaz: out of uh asia from any of their ai models even when they have access to american Ejaaz: models So I still think America dominates at the app layer. Ejaaz: But Josh, I just came across this tweet, which you reminded me of earlier.

34:11

Ejaaz: Tell me about OpenAI's strategy to open source model, because I got this tweet Ejaaz: pulled up from Sam Altman, which is kind of hilarious. Josh: Yeah. All right. So this week, if you remember from our episode last week, Josh: we were excited about talking about OpenAI's new open source model. Josh: OpenAI, open source model, all checks out. This was going to be the big week.

34:30

Josh: They released their new flagship open source. Well, conveniently, Josh: I think the same day as K2 launched, later in the day, or perhaps the very next morning. Josh: Sam Altman posted a tweet. He says, Hey, we plan to launch our open weights model next week. Josh: We are delaying it. We need time to run additional safety tests and review high-risk Josh: areas. We are not yet sure how long it will take us. Josh: While we trust the community will build great things with this model,

34:54

Josh: once weights are out, they can't be pulled back. This is new for us and we want to get it right. Josh: Sorry to be the bearer of bad news. We are working super hard. Josh: So there's a few points of speculation. The first, obviously, Josh: being, did you just get your ass handed to you and now you are going back to Josh: reevaluate before you push out a remodel? Josh: So that's one possible thing where they saw K2. They were like, Josh: oh, boy, this is pretty sweet.

35:16

Josh: This is our first open source model. We probably don't want to be lower than them. Josh: And there is this second point of speculation, which, Ejaz, you mentioned to Josh: me a little earlier today, where maybe something went wrong with the training one. Josh: And it's not quite that they're getting beat up by a Chinese company.

35:32

Josh: Is that like they actually made a mistake on their own accord and can you explain Josh: to me specifically what that might be what the speculation is at least yeah Ejaaz: Well i'll keep it short i think it was a little racist under Ejaaz: the hood and i i can't find the tweet but basically Ejaaz: one of these um ai researchers slash Ejaaz: product builders on x got access to Ejaaz: the model supposedly according to him and he tested it

35:56

Ejaaz: out uh in the background and he said yeah it's it's Ejaaz: not really an intelligence thing it's just worse than Ejaaz: what uh you'd expect from an alignment and uh consumer facing approach it was Ejaaz: it was ill-mannered it was saying some pretty wild shit kind of the stuff that Ejaaz: you'd expect coming out of 4chan um and so sam altman decided to delay whilst Ejaaz: they kind of like figured out why um it was kind of acting out.

36:21

Josh: Got it okay so we'll leave Josh: that speculation where it is there's a there's a funny post Josh: that i'll actually share with you if you want to throw it up which was actually from elon Josh: and we'll abbreviate but it was like elon was basically saying um Josh: it's hard to avoid the the libtard slash Josh: mecha hitler like approach both of them Josh: because they're on so polar opposite ends of the spectrum and he said he spent

36:43

Josh: several hours trying to solve this problem with the system prompt but there's Josh: too much garbage coming in at the foundation model level so basically i mean Josh: what happens with these models is you train them based on all the human knowledge Josh: that exists right so everything that we've believed all the ideas that we've Josh: shared it's been fed into these models.

36:59

Josh: And what happens is you can try to adjust how they interpret this data through Josh: the system prompt, which is basically an instruction that every single query Josh: gets passed through, but at some point is reliant on this swath of human data that is just Josh: It's too overbearing. And that's kind of what Elon shared. Josh: And the difference between OpenAI and Grok is that Grok will just ship the crazy Josh: update. And that's what they did. And they caught a lot of backlash from it.

37:22

Josh: But what I find interesting and what I'm sure OpenAI will probably follow is Josh: this last paragraph where he says, our V7 foundation model should be much better. Josh: And we're being far more selective about training data rather than just training on the entire internet.

37:34

Josh: So what they're planning to do to solve this problem, which is what I assume Josh: OpenAI probably ran into in the case that the AI training model kind of went Josh: off the rails and it started saying bad things about lots of people is that Josh: you kind of have to rebuild the foundation model with new sets of data. Josh: And in the case of Grok, I know one of the intentions for v7 is actually to Josh: generate its own database of data based on synthetic data from their models.

37:57

Josh: And I'm assuming OpenAO will probably have to do this too if they want to calibrate. Josh: A lot of times people call that the temperature, which is the like variance Josh: of aggression in which a model uses. Josh: And I don't know, I think we're gonna start to see interesting approaches from Josh: that because as they get smarter, you really don't want them to necessarily Josh: have these evil traits as the default.

38:18

Josh: And it's very hard to get around that when you train them on the data that they've been trained on so far. Ejaaz: It just goes to show how, I guess, cumbersome it is to train these models, Ejaaz: Josh. It's such a hard thing. Josh: Yeah. Yeah. Ejaaz: It's not something that you can just kind of like jump into the code and tweak a few things. Ejaaz: Most of the time you don't know what's wrong with the model or where it went

38:40

Ejaaz: wrong. I mean, we've talked about this on a previous episode, but Ejaaz: So essentially, if you build out this model, right, you spend hundreds of millions Ejaaz: of dollars, and then you feed it a query. Ejaaz: So you put something in and then you wait to see what it spits out. Ejaaz: You don't really know what it's going to spit out. You can't predict it.

38:58

Ejaaz: It's completely probabilistic. and so if you Ejaaz: release a model and it starts being a little racist or uh Ejaaz: you know um kind of crazy uh you Ejaaz: have to kind of like go back to the drawing board and you have Ejaaz: to analyze many different sectors of of this model Ejaaz: like was it the data that was poisoned or was it the way that we trained it Ejaaz: or maybe it was a particular model weight that we tweaked too much or whatever

39:21

Ejaaz: that might be so i i think over time it's going to get a lot easier once we Ejaaz: understand how these models actually work but my god it must be so expensive Ejaaz: to just continually rerun and retrain these models.

39:32

Josh: Yeah when you think about a coherent cluster of 200 Josh: 000 gpus the amount of energy the amount Josh: of resources just to to retrain a mistake is is huge so i think i mean the more Josh: we go into it the deeper we get the more it kind of makes sense paying so much Josh: money for talent to avoid these mistakes where if you pay a hundred million Josh: dollars for one employee who will give you a strategic advantage to avoid having

39:54

Josh: to do another training run, that will cost you more than $100 million. Josh: You've already, you're already in the profit. So you kind of start to see the Josh: scale, the complexity, the difficulties. Josh: I do not envy the challenges that some of these engineers have to face. Josh: Although I do envy the- I envy the salary. Ejaaz: I envy the salary, Josh.

40:11

Josh: I envy the salary and I envy the adventure. Like how cool must that be trying Josh: to build super intelligence for the world as a human for the first time in like Josh: the history of everything.

⁠¶ Closing Thoughts and Future Prospects

40:20

Josh: So it's gotta be pretty fun. This is where we're at now with the open source Josh: models closed source models k2's pretty epic i think that's a home run i think Josh: we've crowned a new model today um do you have any closing thoughts anything Josh: you want to add before we wrap up here this is pretty amazing i Ejaaz: Think i'm most excited uh for the episode that we're probably going to release Ejaaz: a week from now josh when we've seen what people have built with this open source

40:44

Ejaaz: model that's the best part about this by the way just to remind the listener that, Ejaaz: anyone can take this model right now you if you're listening to this can take Ejaaz: this model right now run it locally at home and tweak it to your preference Ejaaz: now yes it's going to be you know you kind of need to know how to tweak model Ejaaz: weights and stuff but i think we're going to see some really cool applications

41:03

Ejaaz: get released over the next week and i'm excited to play around with them personally.

41:07

Josh: Yeah if you're listening to this um and you can Josh: run this model let us know because that means you have quite a solid uh Josh: rig at your home yeah i'm not sure the average person is Josh: going to be able to run this but that is the beauty of the open weights is that anybody Josh: with the capability of running this can do so they Josh: could tweak it how they like and now they have access to the new Josh: best open source model in the world which i mean just a

41:27

Josh: couple months ago from now would have been the best model in the Josh: world so it's moving really quickly it's really accessible and Josh: i'm sure as the weeks go by i mean hopefully we'll get open ai's model open Josh: source model soon in the next few weeks we'll be able to cover that but until Josh: then just lots of stuff going on this was uh another great episode so thank Josh: you everyone for tuning in again for rocking with us We actually plan on making this like 20 minutes,

41:50

Josh: but we just kind of kept tailing off into more interesting things. Josh: There's a lot of interesting stuff to talk about. I mean, there's really, Josh: you could take this in a lot of places. Josh: So hopefully this was interesting. Josh: Go check out Kimmy K2. It's really, really impressive. It's really fast. Josh: It's really cheap. If you're a developer, give it a try. Josh: And yeah, that's been another episode. We'll be back again later this week with

42:11

Josh: another topic. and just keep on chugging along as the frontier of AMI models continues to head west. Ejaaz: So also we'd love to hear from you guys. So if you have any suggestions on things Ejaaz: that you want us to talk more about, or maybe there's like some weird model Ejaaz: or feature that you just don't understand and maybe we can do a job at explaining it, just message us. Ejaaz: Our DMs are open or respond to any of our tweets and we'll be happy to oblige.

42:37

Josh: Yeah, let us know. If there's anything cool that we're missing, Josh: send it our way and we'll cover it. That'd be great. Josh: But yeah, we're all going on the journeys together. We're learning this as we go. Josh: So hopefully today was interesting. And if you did enjoy it, Josh: please share with friends, likes, comment, subscribe, all the great things.

⁠¶ Get Involved

42:50

Josh: And we will see you on the next episode. Ejaaz: Thanks for watching. See you guys. See you.

Transcript source: Provided by creator in RSS feed: download file

Kimi K2 is the Open Source Claude-Killer | US vs China AI

Episode description

Transcript