#198 - DeepSeek R1 & Janus, Qwen2.5, OpenAI Agents - podcast episode cover

#198 - DeepSeek R1 & Janus, Qwen2.5, OpenAI Agents

Feb 02, 2025 | 2 hr 37 min | Ep. 238

Episode description

Our 198th episode with a summary and discussion of last week's big AI news! Recorded on 01/31/2025.

Join our brand new Discord here! https://discord.gg/nTyezGSKwP

Hosted by Andrey Kurenkov and Jeremie Harris. Feel free to email us your questions and feedback at [email protected] and/or [email protected]

Check out our text newsletter and comment on the podcast at https://lastweekin.ai/.

In this episode:

- DeepSeek releases R1, a competitive AI model comparable to OpenAI's o1, leading to market unrest and significant drops in tech stocks, including a 17% plunge in NVIDIA's stock.
- OpenAI launches Operator to facilitate agentic computer use, while facing competition from new releases by DeepSeek and Qwen, with applications seeing rapid adoption.
- President Trump revokes the Biden administration's executive order on AI, signaling a shift in AI policy and deregulation efforts.
- Taiwanese government clears TSMC to produce advanced 2-nanometer chip technology abroad, aiming to strengthen global semiconductor supply amidst geopolitical tensions.

If you would like to become a sponsor for the newsletter, podcast, or both, please fill out this form.

Timestamps + Links:

Transcript

Andrey

Hello and welcome to the Last Week in AI podcast, where you can hear us chat about what's going on with AI. As usual in this episode, we will summarize and discuss some of last week's most interesting AI news. And you can also check out our Last Week in AI newsletter with even more news stories at lastweekin.ai. I am one of your hosts, Andrey Kurenkov. My background is having studied AI in grad school, and now I work at a generative AI startup.

Jeremie

And I'm your other host, Jeremie Harris. I'm the co-founder of an AI national security company. And I will say, before we got started, Andrey was a real champ. Just, this week has been crazy, and then the week before was crazy, and we didn't cover the week before, so now we're doing two weeks. And I was just, like, at the last minute,

hey, dude, like, I have 20 minutes less than I normally would for this. And you know, sometimes we each go back and forth with different constraints and we're trying to figure it out, but this week it's me, I apologize. And he was very kind and started pruning off a couple stories, which maybe we'll cover later. But like, there's so much shit. This is such a hard deal. It's going to be

Andrey

a dense episode for sure. As people might expect, we'll be talking about DeepSeek quite a lot. But there's other stuff on the business front and the policy front. And I'm sure, Jeremie, part of why you've been crazy busy is that there is a new administration in the United States that is making some moves. Anyway, there's quite a lot going on, so we will just dive deep into it in a bit. I will just say, give a high-level preview.

We're going to start with projects and open source, unlike what we do usually. So we're going to start right away with DeepSeek R1, then talk about some Qwen models and some other models. We're going to cover tools and apps also related to DeepSeek and Qwen, actually, and some other stories about Perplexity. Applications and business as usual: there's updates about OpenAI, which seems to be like half of the news that we cover in some cases, and Microsoft and DeepSeek.

We're going to mostly skip research this week because we are going to dive deep on DeepSeek and then we're going to have some policy and safety stories related to a new administration and our usual sort of geopolitics in that section.

Jeremie

And can I just say also, YouTube viewers may note Andrey's teeth are looking pretty good. That may sound weird. If this is the first episode of this podcast you're listening to, then you think I'm a weirdo. If it's not, you know I'm a weirdo. But congratulations. I guess your surgery went well? Is everything okay?

Andrey

Yes. Yes. I have fully recovered from my unfortunate New Year's event, and I'm glad. Thank you for noticing. And speaking of listeners, one last thing before we get to the news: I do want to quickly acknowledge some listener comments and corrections. I noticed a fun bit of feedback actually just posted recently on Apple Podcasts. We got a three-star review that said that we have consistently bro quality, we are status quo, young Silicon Valley bros, always behind the curve.

But constantly cheerleading; case in point, the ironic hardware episode right before DeepSeek R1. So an interesting take, you know, thanks for the feedback. I will say, on this point, I went back and re-listened to our episode where we covered DeepSeek V3, which was, you know, towards the start of January. And Jeremie, you need to get some credit, because I think at the time you called it as a gigantic, huge deal.

We went deep on the technical details of how they were able to train this very efficiently, and all this news about it costing $6 million, et cetera, et cetera. That's not even R1, right? That's going back to DeepSeek V3, which we did, you know, cover. So anyway, I'm just going to point

Jeremie

that out. Thank you, Andrey. I mean, my goodness. But yeah, no, I will say, we'll talk about this when we get to R1 and R1-Zero and all that jazz. Anyway, I mean, if you listened to the first podcast that we did on V3 when it came out, you're probably not that surprised by R1 and R1-Zero, right?

I mean, the way we were talking about it then, I think it made it clear this thing was a base model that had all the potential of, you know, GPT-4o to give an o1, and so on. And so to the extent that you have a good base model, all you really need is that optimization routine for RL and so on, which is really what popped. So in some ways super consequential, in some ways not too surprising. First of all, status quo, young Silicon Valley bros. I love that.

I'm going to have a t-shirt made with that on it. That's awesome. The case in point, though: the ironic hardware episode right before DeepSeek R1. I would actually love to get this reviewer's take on what specifically they think is ironic about it, because, I will say, we'll talk about this when the discussion on R1 comes. But I think there's been a real kind of misreading of the implications of R1 and R1-Zero for hardware. It's actually strongly bullish for hardware.

This is not stock advice for NVIDIA stock. It's strongly bullish for that hardware ecosystem and for scaling in a way that I think a lot of people have kind of missed. So anyway, just to plant that flag there, unless the reviewer is referring to something else, in which case I think it would be really interesting to hear what it is. And maybe they disagree, but I think

Andrey

everyone's talking about hardware now suddenly, right? So anyway, thank you for the review. Always appreciate constructive feedback like this, which we'll take into account. And we'll mention just one more thing before we dive in, on the Discord, which has been a lot of fun, seeing people's discussion and questions. We did get a question about DeepSeek and its implications for regulations in the US, where overcaution here could now impact the race between the US and China.

So we'll get back to that. Also planting a flag: once we get to policy and safety, we'll discuss the implications for geopolitics. And thank you, DiscombobulatedPenguin,

Jeremie

for the question. DiscombobulatedPenguin. Thank you. Thank you. DiscombobulatedPenguin.

Andrey

and that preview out of the way, let's dive in. And we begin, as I said, with projects and open source. And of course, the first story is DeepSeek R1. And we're going to be diving into the paper, I would say, which is titled "DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning." So I'm sure many people already know what DeepSeek R1 is, but let's quickly summarize. DeepSeek R1 is basically equivalent to, or competitive with, OpenAI o1.

It is a language model, a chatbot, that is optimized for reasoning and that is meant to be able to tackle problems that are challenging and that things like Claude Sonnet, things like GPT-4o, are not able to do well on. So this paper came out just a few weeks after DeepSeek V3, which is the base model that they started out with. So o1, for instance, presumably started out from GPT-4o. Similarly, this is a model that is trained on top of DeepSeek V3; it's not trained from scratch.

And this paper has some very interesting details and implications for how this is done. So with things like o1, we don't really know what they did. There was a lot of speculation, but it was not very clear. We've also covered a lot of news stories over the past year or so looking into various ways to do reasoning with LLMs, various ways to do inference-time scaling.

The very interesting thing with this paper, the thing that I think is exciting from a more technical perspective, is that, as per the title "Incentivizing Reasoning Capability in LLMs via Reinforcement Learning," the focus of the approach is pretty much on reinforcement learning. Meaning that they train the model by giving it rewards. It produces some outputs; it's kind of like trial and error. And so it is trained without being told the right answer, you could say.

We've seen this as one possible approach before, but here it is pretty much the only approach they rely on, rather than one among various, multiple steps at a high level. This really showcases the potential of reinforcement learning. And similar to DeepSeek V3, it seems to be done with relatively few resources and gets really impressive results. Like, you know, people say it's not quite as good as o1, but it's very impressive.

You know, obviously it gets really good benchmark numbers, so that's another reason this is exciting. So I'll stop there, I think. And Jeremie, I'll let you jump

Jeremie

in with more details. Yeah. Yeah. So I think you're exactly right. You know, this idea that reinforcement learning is king, right? So there's reinforcement learning and there's reinforcement learning. The triumph here is showing that if you just reward the model, basically, for getting the right answer versus not getting the right answer, that's kind of enough.

There are all these convoluted strategies, and we've covered a whole bunch of these as they've been discussed; you know, we have DeepMind papers about these, we have kind of blog posts about these. Process reward models as well. Yeah, exactly. Right. So PRMs and ORMs, process reward models and outcome reward models, where, you know, you essentially think about a chain of thought, right? Where your model kind of goes, okay, well, step one, I'll do this. Step two, I'll do that.

Right. So process reward models, of course, are these models that get trained to assess how likely a given step is to be accurate. There are all kinds of ways to do this. You know, often you'll, like, generate 10 different rollouts starting from a given step, 10 different alternative paths that start from there, and see what fraction of those paths lead to the correct answer.

And then based on that fraction, you score the presumed accuracy of the initial step that you started your rollouts from. That's one way to do process reward modeling. And you basically use that to train a model that predicts how accurate a given step in a reasoning stream is likely to be. Process reward models are really, really finicky, very, very hard to get good data for, and so on. Then outcome reward models kind of do the same thing with the output.
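For anyone who wants to see that rollout-based scoring idea in code, here is a minimal sketch in Python. The helper callables are hypothetical stand-ins, not anything from the DeepSeek codebase: one samples a continuation of a partial solution from a model, the other checks the final answer against ground truth.

```python
from typing import Callable

def score_step(problem: str,
               partial_trace: str,
               sample_continuation: Callable[[str, str], str],
               answer_is_correct: Callable[[str, str], bool],
               n_rollouts: int = 10) -> float:
    """Rollout-based process reward label: estimate how promising a partial
    reasoning trace is by the fraction of sampled continuations that end in
    the correct answer. That fraction becomes a training target for a PRM."""
    hits = 0
    for _ in range(n_rollouts):
        continuation = sample_continuation(problem, partial_trace)
        if answer_is_correct(problem, continuation):
            hits += 1
    return hits / n_rollouts
```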

So you basically train a model to just predict whether an output is likely correct or not. These are not ground truth, right? These are models. It's kind of like having a reward model for RLHF or something. They are not the ground truth. So you're training your model to optimize against something that is not the thing you care about, in some sense. And that means that the model can exploit, right? It can basically hack the process or outcome reward models

and just generate outputs that the reward model thinks are good, but that aren't actually good. That has always been a problem. We've been playing whack-a-mole with that issue for the last basically two years or so. What this shows is: take a pre-trained model like DeepSeek V3 and just do RL. All you're going to do is tell it, hey, I'm going to give you thinking tags. I'm going to force you to write your chain of thought between these thinking tags.

Okay, but your output is going to come after the closing of the thinking tag. So you've got this kind of defined area as your scratch pad, essentially, that you can do all your thinking in, and then you've got a defined area for your actual output. And that is it! At least for DeepSeek R1-Zero, that is it, right?

So you just have your pre-trained model, and then you're doing the straightforward reinforcement learning process where, you know, either you get it right and you get a reward, or you don't and you get no reward.
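As a rough illustration of how simple that reward can be, here is a toy rule-based reward in Python. The tag format, the checks, and the weights are illustrative assumptions rather than the actual DeepSeek implementation; the point is just that the only signals are "did you follow the format" and "is the verifiable final answer right."

```python
import re

THINK_FORMAT = re.compile(r"<think>(.*?)</think>\s*(.*)", re.DOTALL)

def rule_based_reward(output: str, reference_answer: str) -> float:
    """Toy outcome-only reward: a small bonus for respecting the thinking-tag
    format, plus the main reward when the final answer matches a verifiable
    reference (as with math or coding problems). Tags and weights are made
    up for illustration."""
    match = THINK_FORMAT.match(output.strip())
    if match is None:
        return 0.0  # malformed output: no reward at all
    _chain_of_thought, final_answer = match.groups()
    format_bonus = 0.1
    accuracy = 1.0 if final_answer.strip() == reference_answer.strip() else 0.0
    return format_bonus + accuracy
```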

Insanely, what ends up happening when you do this, and they're going to use a sort of a data set that has quantifiable outputs, like, you know, math data sets or coding data sets where you actually can give an objective assessment of whether or not the output was correct, and that allows them to generate these rewards efficiently.

We've seen that basically everywhere in the space, but what you find when you do that is that the model will naturally learn to reason in a way that kind of looks like chain of thought. It's more diverse than that, and in that sense it's actually more powerful, right? Because if you think about the chains of thought that we typically force these models to think through, the way we do that is we'll actually take our base model, and instead of going straight into reinforcement learning,

What we'll do is we'll give it some extra training, some just supervised fine tuning on a data set, a curated data set that has a bunch of chains of thought and then outputs. And those are really expensive to produce those data sets, right? Because you've got to get humans in some cases to like, you know, annotate chains of thought, produce chains of thought, solve problems in all kinds of ways and in detail. And then you just train your model to do text auto complete on those basically.

And it learns the pattern of chain of thought. It sort of gets forced to think a little bit like that data set, and so you'll see it naturally run through a chain of thought after it's been fine-tuned on that data set. But what happens here is much more organic than that.

All we're rewarding it for is getting the right answer or not, and just based on that reward the only extra thing we're telling it is put your thoughts between these thinking tags, and the thoughts turn out to look a lot like chain of thought naturally, but they're not forced to look like chain of thought. It can do some more variable, more diverse things because you're not forcing it. You're not training it explicitly on human chains of thought. So it will tend to explore more.

And one of the other things that you'll tend to see is, when you start this process of reinforcement learning, the first couple rounds of reinforcement learning that you do on the base model, the chain of thought, the length of the chain of thought, or the amount of text between those thinking tags, starts off pretty short. And as you spend more time, do more steps of reinforcement learning, you'll find that the model actually tends to fill out those thinking tags even more.

The chains of thought essentially get longer and the outputs get more accurate. And what this is telling you is that the model essentially is independently rediscovering and exploiting inference-time scaling laws itself. They have an amazing figure, it's figure three in the paper, where they show this very kind of linear increase in the average length per response.

In other words, in essentially the amount of text between those thinking tags, and the amount of inference-time compute that the model is investing to generate these outputs. And again, that's not happening because any human hard-coded this idea of, like, hey, by the way, the more tokens you spend on your thinking, in other words the more inference-time compute you expend, the better your output will be. No, that's an organic result. That is the model

just kind of stumbling into this strategy through pure reinforcement learning. This is a really, really big deal. It's not just a big deal because we now don't have to collect these giant supervised fine-tuning chain-of-thought data sets, though that is a really, really big deal. It's also just an indication of how robust inference-time scaling laws are. They are a convergent fact of the matter about AI systems.

When you train systems with reinforcement learning, they will independently discover this. Last thing I'm going to say about R1-Zero before we move on to just R1, which is a slightly different story: R1-Zero has trouble sticking to just one language. Remember, all you're rewarding it for is getting the right answer. You're not telling it how to think, you're not giving it chains of thought to train on.

So what you tend to find is that the model will actually kind of switch between languages sometimes. It'll sometimes generate text that's not human-legible, and they kind of call this out as an issue, as a problem, almost like it's a bug. But it's actually not; the way to think of this is that it's a feature. There is a long compound German word that captures 20 English words' worth of meaning, right?

And there's probably a word in Chinese or a word in French that captures 20 English words' worth of meaning. And when you can use exactly that word and compress essentially more thinking into one or a small set of tokens, you ought to do that if you're trying to optimize for the efficiency of your computation. And that, I think, is the actual correct interpretation of why you see this weird kind of language flipping.

It's the model essentially taking advantage, like, hey, look, to me, there's no real, no such thing really as English. There's no such thing as, like, French or whatever. There's no need to be coherent in one language. I use whatever kind of linguistic or reasoning tools in my chain of thought that happen to be most efficient. And so you end up with exactly this. And the problem, and we called this out a few episodes back when OpenAI's o1 came out: OpenAI

said, hey, look, it's reasoning through human-interpretable chains of thought. How wonderful for AI safety, right? This means that we can understand what the model is actually thinking, and we can intervene if it starts thinking kind of risky, dangerous thoughts. And at the time we said, well, hold on a minute, that's actually absolutely not going to be the end state of this stuff. There are always more efficient ways to reason than the human-legible ways.

And that's what we're seeing here, I think, very clearly. These are sort of the early signs: when you just let the model reason for just the reward, and you don't kind of introduce artificial human bullshit, the model will naturally learn to reason in ways that are less and less human-intelligible, because human intelligibility is a tax. It is a tax that you impose on the model. It's an additional, unnecessary inductive prior that you are better off doing away with.

So anyway, I'll park it there. I'm sorry, just very exciting results.

Andrey

Yeah, it's a very good thing to call out, and it is part of the reason that there is a divide between R1-Zero and R1, which we'll get to next. Right, there are some nuances here to cover, a couple more nuances just with R1-Zero. So R1-Zero, that is the purely reinforcement learning model. That's the first step to get to R1. And they really do just start with a base model. They have a very simple template.

So the prompt they give the model does tell it to think about the reasoning process, to output a thinking process before giving the answer, but that's pretty much the only thing they tell it. They train it on math problems and coding problems only. And I think this is important to call out with respect to reinforcement learning. I think people often kind of forget that this is a limitation of reinforcement learning in general.

If you don't have the ability to get rewards, which in this case you do, it is much more difficult. And you know, you presumably have all sorts of reasoning processes, like reasoning processes for, you know, how to navigate the web, et cetera, where it may not be possible to train with reinforcement learning in general. As to the language thing:

I do wanna mention, this is a bit of a tangent, but I do think it's worth bringing back for anyone who's been following AI for a very long time. Back in 2017, I think it was, back in the old days when deep learning was all the hype and LLMs were still not a thing, there was a news story about AI inventing its own language. So this was some research from Meta; I think at the time it wasn't called Meta yet, maybe.

And they were doing a paper on bartering, a multi-agent system where you had two AI models barter; that's what I'd call it, anyway. And so they did a similar thing, optimizing these two models together, and they found that the models started basically using gibberish, using, you know, punctuation and so on, not human-readable stuff.

And just like here, that makes a lot of sense, because if your reward is only, you know, get the right output, then for the process to get there, you're not telling the model what to do. It can make up its own weird language along the way. And that's not nefarious. It's not surprising, even; that's just a pretty reasonable outcome of leaving a model unconstrained. It can now do whatever it wants. And so at the time, for that paper, they mentioned explicitly that they added a reward component.

They added a bit to the reward that's like, this should actually read like English, and then it was actually interpretable. And in a way that's similar to what they did here in this paper. So now we can move on from R1-Zero to R1. R1 is R1-Zero, but with a few more limitations and constraints and kind of design considerations, you could say. So just to quickly cover the process: they begin to train R1, not R1-Zero, by doing supervised learning.

So they get a data set of reasoning traces through a combination of different things. At least as far as I know, they use DeepSeek R1-Zero for some of it, and they use some other approaches to get some of it. And then they just train the model to mimic that data set, which is part of what presumably OpenAI is maybe doing, you know, paying people to produce data to train on. Then, after doing that supervised fine-tuning, they do some more reinforcement learning.

So they do the same kind of reinforcement learning as R1-Zero on R1, after doing supervised fine-tuning on it to, I guess, bias it in a certain direction that does use human-interpretable approaches. They also go into distillation in the paper and getting to smaller models. So ultimately you get something somewhat complicated, I don't know if you could call it complicated, but it's not quite as simple as it might seem. This set of steps is a little

unintuitive. And I would say, you know, it may not be optimal, but it's still pretty interesting that they also do large-scale reinforcement learning when it gets to training R1. They mix it a little bit with supervised fine-tuning, and that gets you the best of both worlds of sort of your LLM-type clarity with the reasoning of R1-Zero.
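To keep the stages straight, here is that recipe outlined as Python pseudocode. Every argument is a hypothetical placeholder for a real training or data-collection step; this is a map of the pipeline as discussed, not an implementation, and details like data mixes and reward composition are simplified.

```python
def r1_style_recipe(base_model, sft, rl, collect_traces, distill, small_models):
    """Sketch of the multi-stage R1 recipe described above. All arguments
    are hypothetical callables; stage details are simplified."""
    # 1. Cold-start SFT on a curated set of legible reasoning traces,
    #    some of them generated with R1-Zero and then cleaned up.
    model = sft(base_model, data=collect_traces(source="r1_zero_and_other"))
    # 2. Large-scale RL on verifiable math and coding problems,
    #    the same style of training that produced R1-Zero.
    model = rl(model, reward="rule_based_verifiable")
    # 3. Sample fresh traces from the RL checkpoint, filter them,
    #    and run another round of supervised fine-tuning on them.
    model = sft(model, data=collect_traces(source=model, filtered=True))
    # 4. A final RL pass mixing reasoning rewards with more general
    #    preference-style rewards.
    model = rl(model, reward="mixed")
    # 5. Distill the result into smaller dense models.
    return [distill(model, target) for target in small_models]
```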

Jeremie

Yeah, absolutely. And when they do that, right, when they add the supervised fine-tuning step to get it to think in human-legible chain-of-thought terms, yes, the human interpretability absolutely goes up, but the performance drops. It's a slight drop, but there is a performance drop. And so earlier, when I was saying there is a tax that you pay for human legibility, human interpretability, they're literally measuring that tax.

You either are going to optimize for a model that is a really good reasoner, or you're going to optimize for a model that is human-interpretable. But those two things mean different things. The pressure on companies to make better reasoners is ultimately going to be very, very strong, and maybe stronger than the pressure on companies to make them human-intelligible.

And to the extent that that's true, you start to, you know, be concerned about things like steganography, or things like even just explicit reasoning along kind of dangerous reasoning trajectories that are human-legible, because that's where you ought to expect these things to go down the line. So I think the really interesting way to think about this, by the way, is that, you know, R1 is the model you actually use.

It's maybe more human-interpretable; for right now, for a lot of applications, it's what you want to use. But R1-Zero is the model that shows you the future, which is reinforcement learning, right? This is the model that kind of makes the point that RL can scale and it really works. And the big lesson from all of this, and now this gets back to, this is not investment advice, but looking at the stock movement of NVIDIA. So there's a lot that was going on there, but.

When you think about what caused NVIDIA to take off in the first place, it was basically the argument that Rich Sutton first made in the bitter lesson, right? Which is that, well, scale is king. And a lot of people misinterpreted what that meant. The point of the bitter lesson is not that you don't need smart ideas anymore. A lot of people think that. Instead, it's that you need to find smart ways to get out of the way of your optimization process.

You need to find ways to melt away inductive priors. Just let the compute do what the compute does. That actually takes smart ideas. Those are exactly the kinds of ideas that DeepSeek used so well in both V3 and R1-Zero in particular. And so the actual distillate of this is: DeepSeek showed that you could achieve OpenAI o1-quality performance on something like a 30th of the budget, at least at inference time, right?

$5 million or $6 million for the training is what they claim, and their asterisk is that that only applies to compute used during the particular training run that led to the successful output. We talked about this before; it doesn't account for all the experiments they had to run, but still. Okay, so in other words, I can get a lot more intelligence per flop, a lot more intelligence per unit of compute. That is the DeepSeek story. Does that sound like a bearish case for NVIDIA?

To me, it sounds like a bullish case for NVIDIA. Essentially, your GPUs just went up in value 30x at inference time. Like, that's what that says. It says that actually the slope on the scaling curve that you get by applying the lessons that DeepSeek learned in this process is actually way steeper than we thought before. The ROI is even bigger. And there's a never-ending demand for intelligence; that is what the literal economy is based on, or essentially a huge chunk of it.

All this means is people hold the same question. It doesn't matter if you're, you know, Anthropic, OpenAI, whoever else. The question you ask yourself is always going to be the same: how much money can I possibly afford to pump into my compute budget? And then I get whatever intelligence I get out. What this says is you'll actually get 30x more.

So if anything, this actually makes a bull case for: hey, if we possibly can, why don't we try to squeeze even more into that budget? That's what's going to happen. Mark my words, that is what is going to happen. There's this swath of people who see this as pessimistic news, but when you talk to the people in the labs themselves, that is not where they're going. Scaling is very much alive.

We happen to be in this special moment where we're kind of at this turning point in the paradigm, right, where pre-training for a really long time was the dominant paradigm, and now we're having inference-time compute take over a little bit more, and RL and all that. And that gives new entrants the opportunity to just get ahead.

But the fundamental question here, for the next six months, 12 months, is going to return to: yeah, but how much compute can you throw at these same strategies? DeepSeek, if they're not state-backed in a very big way by then, are going to struggle. They're already struggling, as their CEO has said, with accessing enough high-quality compute resources to push this forward. Export controls are absolutely hitting them. This is another lesson that's been mislearned.

Everyone's like, oh wow, a Chinese company did a really impressive thing, so what's the point of export controls? No, no, no. The lesson is: compute matters 30 times more than it did yesterday. Export controls are even more critical. That's the real lesson here. So anyway, there's all kinds of stuff. We'll talk about this as well in the policy section, because Dario from Anthropic had a blog post that's actually kind of a banger, in my opinion.

But anyway, all that is to say, and this is to the question of one of the viewers who was asking this on the Discord, that's kind of my take on the export control story. This is a really impressive model, by the way. I think there are a lot of people who are trying to cope with, you know, "this is not actually super impressive." It is impressive. It's absolutely impressive. Also absolutely on trend.

But the crazy thing is that you have a Chinese company that is on trend in terms of what the frontier ought to be able to do, or reasonably close to it. Maybe not quite on the frontier. So anyway, super impressive model. Just look at the SWE-bench Verified score, you know, 49.2. That's better than OpenAI's o1 model from December 17th. That tells you everything you need to know. This is legit.

It has huge, huge implications, but they are different from what I think a lot of the mainstream narrative is right now.

Andrey

Right. And I think we'll spend a little more time on the mainstream narrative and the reaction to R1, which was pretty extreme, I would say. And we'll get there in the business section; for now we're focusing more on the technicals. I will say one more thing before we move on as to the technical report, the paper. One thing that was very interesting, which I appreciate a lot because it's not often done: they do have a section on unsuccessful attempts and things that didn't work for them.

And they do actually call out process reward models as something that kind of worked, but basically the compute wasn't worth it. Just doing RL turned out to be better than this more complex approach. And they did try Monte Carlo tree search, inspired by AlphaGo, AlphaZero, and others. And this is another kind of idea that people are excited about, doing more of a search process where you do search to get a good outcome rather than just doing RL, which is what seems to be the case here.

I think also there are some details missing on the exact RL setup, because there are various ways to do reinforcement learning. I guess one of the big pieces here is GRPO, which we didn't even mention but is worth mentioning: they're using Group Relative Policy Optimization as the reinforcement learning algorithm, which they came up with back in early 2024. This also demonstrates that that is a very promising algorithm, one that makes it possible to train more efficiently.
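The piece of GRPO that makes it cheaper than a PPO-style setup is that it drops the learned value/critic model: you sample a group of responses to the same prompt and use the group's own reward statistics as the baseline. A minimal sketch of that advantage computation (clipping, the KL penalty, and the policy-gradient loss itself are omitted):

```python
import numpy as np

def group_relative_advantages(rewards: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """GRPO-style advantages: for a group of G sampled responses to one prompt,
    normalize each response's reward by the group mean and std instead of
    using a separately trained value model as the baseline."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# e.g. 8 sampled answers to the same math problem, rewarded 1 if correct else 0
advantages = group_relative_advantages(np.array([1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0]))
```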

We can't get into all the details here, but it seems to work quite well. So anyway, it's a great paper, a very interesting paper if you're tracking this stuff. And R1 certainly is impressive and exciting, and we'll probably get back to it in a bit. Moving on, there's a few more stories we are not going to be able to dive as deep on, so we're going to just start moving quick.

First up, the next story is again on DeepSeek, and this was fun: just after R1, very soon after, they announced another type of model, a multimodal AI model called Janus Pro, which they claim outperforms other models like it. But actually, the very last thing to mention about DeepSeek R1 that is notable: it is very permissively licensed. I think it's the MIT license, which basically means you can do anything you want with it. That means that you can use it for commercial

applications, for, you know, research, obviously for pretty much anything. There are no restrictions of any kind, which often are there for other open source releases. So that is another aspect of why this is exciting. This is now one of the frontier models that you can use to build upon, and obviously that's exciting to many in the space. And now moving on, we have quite a few more stories to cover, so we'll have to go quick.

Next up, we have another story about DeepSeek and another model they released, which is not quite as big of a deal, but still very cool. They now have a model titled Janus Pro, which is a text-to-image model, also released under the MIT license, similar to, you know, other text-to-image models. It's hard to say exactly how it compares, you know, but it looks very good. It does reportedly outperform DALL-E 3 and other models like Stable Diffusion XL on benchmarks.

They released a 7 billion parameter version of it as well as a 1 billion parameter version. So, you know, there are pretty good open source text-to-image generators out there, so it's not quite as big of a deal, but it's pretty impressive that DeepSeek, as a lab, really an R&D project, not a commercial venture, is now releasing multiple models like this into open source and making a big impact, really.

Jeremie

Yeah, and it bears mentioning, right? The same kind of company that tends to be good at making reasoning models also tends to be good at making these kinds of more multimodal systems. That's not a coincidence. But anyway, it'll be interesting to see whether we see more multimodal models from DeepSeek in the future, you know, integrating the reasoning with the vision and other modalities. I mean, I certainly would expect that to be the case.

Andrey

Right. And one other nuance to mention here: in the description for it, what they highlight is that this unifies multimodal understanding and generation. So the big highlight is the text-to-image part, but they are combining the two. We have vision language models, which are image plus text to text; that's image understanding. We also have text-to-image models, image generator models, which are just text to image.

These are usually done in slightly different ways, trained in different ways. So the very interesting thing here is the unification and getting it all to work together. So here again, there are some pretty significant technical insights that are novel and actually potentially quite impactful. And there's another paper on this, "Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling."

Can't go into the details, but again, pretty exciting research as well as a model that people can use. And moving right along, we have another exciting open-weight release, which happened just after R1. Not quite as big a deal, but still pretty notable. And this one is about Qwen 2.5-1M. So Qwen is coming from another Chinese organization, I believe funded by Alibaba. They've been working on this Qwen series of models for quite a while.

And so they have now released the technical report for this latest iteration, which is focused on long context lengths. So the dash-1M in the name is because they are extending it to be able to process 1 million tokens. And so they released a paper pretty much focused on how they get to that, optimization of long context scaling. They also released variants of it, 7 billion parameter and 14 billion parameter, and also updated their APIs to give access to it.

So again, this is one of the, I guess, missing spots in open source models. Typically you get lengths of more like 128,000 tokens, so effectively going up to such a long context is a pretty significant deal.

Jeremie

Yeah. And they use a whole bunch of techniques for this, and they document them quite well. One of the key ones is progressive length training. We've seen this in previous cases, but they push it to its limit here, where you start with a relatively small context window, like around 4,000 tokens in this case. And they gradually increase it; you go to 32,000, you know, like 64,000-ish, basically doubling every time.

Until eventually your model kind of gets to the point where it can accommodate the full context and do really well on things like needle-in-a-haystack evals, which is one of the things they look at. There's also this need to track word order, because the attention mechanism doesn't natively care about word ordering.

You have to use a technique to kind of superimpose some sort of sinusoidal-type pattern on top of, you know, your embeddings so that you can track which words are where. They use adaptive RoPE base frequencies that increase with context length. Basically it's a way of dynamically adjusting this kind of word-ordering accounting strategy as you grow that context window.
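For a flavor of what "adaptive RoPE base frequencies" means in practice, here is a small Python sketch. The scaling rule shown is the common NTK-aware heuristic for growing the base as the window grows; the actual constants and schedule Qwen 2.5-1M uses are not reproduced here, and the stage lengths are just illustrative.

```python
import numpy as np

def rope_inv_freq(head_dim: int, base: float) -> np.ndarray:
    """Standard RoPE inverse frequencies for one attention head."""
    return 1.0 / (base ** (np.arange(0, head_dim, 2) / head_dim))

def scaled_base(orig_base: float, orig_ctx: int, new_ctx: int, head_dim: int) -> float:
    """NTK-aware heuristic: raise the RoPE base as the context window grows,
    so the low-frequency components stretch to cover the longer range."""
    scale = new_ctx / orig_ctx
    return orig_base * scale ** (head_dim / (head_dim - 2))

# Progressive length training, illustrative stages only: grow the window,
# bump the base, recompute the frequencies at each stage.
for ctx in [4_096, 32_768, 262_144, 1_048_576]:
    base = scaled_base(10_000.0, 4_096, ctx, head_dim=128)
    inv_freq = rope_inv_freq(128, base)
```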

The training data mix, too, is kind of interesting. So for that progressive length pre-training, what they do, sorry, training rather, is that 75 percent of the text that they use is actually full context at that length. Like, 75 percent of it is the maximum length it can be, and then they have shorter sequences for the other 25 percent or so. But anyway, they use all kinds of other techniques that we won't go into in too much detail. We talked about sparse attention in the past.

They do use that, and a lot of ways to do VRAM optimization too, on the chip, and all kinds of stuff. So it is really cool. It's another one of these very engineering-heavy kinds of open source developments, right? We're starting to see that, in order to be able to read these papers, you have to understand the hardware. You have to be able to get into the weeds on what your VRAM is doing and what your SRAM is doing even, and all this jazz.

So increasingly, I guess you could say that frontier AI is about the engineering side, or at least the engineering side is totally inseparable, of course, from the architecture and the modeling stuff. So anyway, I find this really interesting, and good timing for our hardware episode. How about that?

Andrey

Yes, exactly. Also, with regards to scaling laws, I think an interesting thing to note is that obviously the general idea of scaling is you make bigger models, you get bigger data, you combine those things, you get better performance. As we see with DeepSeek V3, with R1, with this, ultimately, just doing effective scaling is not easy, as you said earlier, right?

So it's about figuring out the right mix of ingredients, optimization process, the hardware, et cetera, the data, that enables you to do effective scaling and work on various problems. And this is another demonstration of the accumulation of knowledge that exists in this space that just wasn't there two years ago for people to leverage. And on to the next story, again a second release from the Qwen team. And this one is Qwen 2.5 VL. So as I mentioned, this is a vision language model.

This is focused on things like analyzing text and images, video understanding, and object counting. And similar to OpenAI's Operator model and to Anthropic's computer use API, this would power the ability to control website browsing and pretty much use your computer for you in an agentic manner. So this one, I would say, is again less of a big deal.

This also, as you like to mention, Jeremie, came with an interesting blog post; the title of the blog post was "Qwen 2.5 VL, Qwen 2.5 VL, Qwen 2.5 VL." That's what's going on over at Qwen, man. It's in the water. Someone is very creative, you know; they release blog posts that are not boring. So yeah, here they have various demonstrations of the model.

So, clearly these teams are getting a lot of resources, or at least they are able to make a lot of progress at this point, and this is part of the reason, I suppose, why there's been a very strong reaction to all this stuff.

Jeremie

Yeah. And one of the sort of concrete advances they've had to make here is in, as they put it, ultra-long video understanding, just because, you know, that's what you need to make an agent that runs on a computer like this. I will say, just from a national security standpoint, think about it: we've talked somewhat, actually quite a bit, about this idea of the legal picture around pseudo-permissive licenses, right?

So you have a Chinese company that puts out some model that's really performant. And there's a license term that says, if you have any issues with the use of this model, those issues get litigated in a PRC court, in a Chinese court. Right. And this kind of gives you a bit of open source warfare vibes, where, you know, it kind of brings you under the umbrella of the CCP.

That was kind of an interesting, maybe vaguely academic problem, not a huge, huge deal, but a thorn in the side of the United States. Here, when we're moving increasingly into these operator-type models that actually take control of your computer and do actual shit, like potentially send emails for you, or have access to your personal data and the ability to exfiltrate it to servers outside your kind of remit, this starts to become a real frigging issue.

You think about open source warfare in the form of planting kind of backdoors and Trojans in these models to have them behave in certain ways that achieve the objectives of the Chinese Communist Party or whoever developed them. This is actually a very interesting strategy with open sourcing. You know, I'm not saying that that's what's going on here.

I suspect it's not, but as we get more used to using these kinds of PRC-originated models, this is something that we ought to start thinking about: who's building these models? What incentives do they have to bury certain behaviors in ways that are inscrutable to us? Because we lack the interpretability techniques to look at these systems in detail. I think that's an actually under-discussed dimension of this from a national security standpoint.

There's a world where, like, a year from now, we just discover that, oh shit, the latest zero-day is actually to use all these deployed agentic models that come from, you know, Qwen or DeepSeek or whatever else. So I just think that's a really interesting dimension of this, something that is worth tracking anyway.

Andrey

That's right. And moving right along, again, we're going to start moving very quickly, moving on to tools and apps. Going to take a quick break from all the R1 and Qwen stories. So, going to OpenAI and another story related to that agentic computer use kind of story.

So just recently OpenAI launched a research preview of Operator, which is exactly that: a tool that you can use within ChatGPT that will browse the web and do pretty much computer use of the same sort that Anthropic and, in this case, Qwen demonstrated. So at operator.chatgpt.com, if you go there and you have access, you can try it out, but only as a US user at the $200 Pro subscription tier, at least for now.

And there's going to be a small window that will pop up with a dedicated web browser that the agent will then start using. And you know, as a user, you can still take control. Because Operator is using its own thing, it's not controlling your computer, so you can keep doing other stuff. OpenAI says that Operator uses a Computer-Using Agent model.

We don't know very much about it, aside from that it's similar to Anthropic's computer use model, but apparently it's trained to interact with websites visually, to be able to click and read text, navigate menus, et cetera. So it's been something that Anthropic launched, I don't know, back in October, many months ago; they had this preview on their API. At the time it was kind of a big deal, and I think people are still very bullish on agentic AI.

So, you know, I think it was overshadowed a bit by R1 and the conversation around that, but I do think this seems quite notable.

Jeremie

It does. And it's, you know, it's not perfect. They're very kind of forward about that. Obviously they have to be, because if you're going to launch a model and say it's agentic in this way, people are going to use it for real shit. And so they do say currently Operator cannot reliably handle many complex or specialized tasks, such as creating detailed slideshows, managing intricate calendar systems, or interacting with highly customized or non-standard web interfaces.

Fine. So that's what it can't do, but they are explicitly taking a kind of precautionary approach here, requiring supervision for some tasks. So for banking transactions and other areas where you have to, for example, put in your credit card information, the user has to step in and actually do that. And OpenAI does say, and this is relevant in that context, that Operator doesn't collect or screenshot any data.

So this is obviously like, yeah, you might be nervous about entering your credit card information in a context where you have Operator running on your system. Their claim is that they're not collecting that data. So it's sort of interesting: where do you do this handoff between the human and the AI in the context of something this open-ended? I mean, at the end of the day, until we have full-on AGI, right, we're not going to have a clean answer to that question.

It's a little dicey even in self-driving cars, where at least there you're in a very constrained environment. You know, you're on a road, it's just other cars, pedestrians. It's a notoriously complex environment, don't get me wrong, but compared to the entirety of the internet? You're going to run into some really wild out-of-distribution settings there, and the stakes are high there too, right? You can be giving money away.

You can be downloading malware, doing all kinds of stuff. And it is an adversarial environment in a way that driving isn't. So I think this will be really interesting to see: how robust can they make these models? How quickly can they improve them? But there are, you know, all kinds of partnerships, obviously, as you'd imagine, with companies like DoorDash and Instacart.

So a lot of YC companies, which is interesting because obviously Sam Altman was the president of Y Combinator for a while, so, you know, he's got good relationships with those guys, but also eBay, Priceline, StubHub, Uber, and so on. So just, you know, making sure that Operator respects their terms of service is obviously top priority for them. And a good sort of initial trial round, a shakedown cruise for Operator here.

Andrey

Exactly. And I think it's similar to Anthropic's computer use API, similar to Project Mariner from Google that was announced just in December. No real timeline on when this would be widely available and reliable. My impression is that with all these efforts, this is taking us towards that future where agents will just do stuff on your behalf, but it probably will take a while for us to get there, just looking at, you know, OpenAI only releasing this now, many months after Anthropic,

with multiple limitations. It also refuses to send emails and delete calendar events, which, you know, for an assistant, which presumably you want your agent to be, sending emails and deleting calendar events is necessary. So, yeah, it's exciting to see more work on this front. And this whole idea of, like, please buy a ticket for me; I don't know why everyone likes this idea of getting AI to book a ticket for you for travel.

I don't think that's a good idea, but often that's the one mentioned. Clearly eventually we'll get there, and we are making progress towards that future.

Jeremie

I'd talk about my views on that more, but I've got to catch my 3 a.m. flight to New York City. So, good one.

Andrey

And moving right along, getting back to DeepSeek and another aspect of the story. So obviously, as nerdy people who cover AI all the time, the R1 paper was very exciting and very interesting to us. Having an o1-level model, almost an o1-level model, was very unexpected even. But another aspect of the DeepSeek story that I find surprising and interesting is that their app on smartphones has become hugely popular.

So the story is that the DeepSeek app had reached the number one position on the Google Play Store. That means it saw over 1.2 million downloads since mid-January, and 1.9 million downloads globally overall. So that is pretty crazy, right? Because obviously we've seen ChatGPT go viral, we've seen that huge spike in usage. The fact that DeepSeek now pretty much went viral with their own chatbot, which is a ChatGPT competitor; it is a way to talk to the V3 model for free, in this case.

And people are, I guess, flocking to it, which is again something that's kind of surprising, and that I would imagine has OpenAI a bit worried. We've seen some reactions of, like, you know, being grouchy about people being excited about it. So clearly, I think, this is also one of the reasons that we saw a very strong response to the DeepSeek R1

Jeremie

release. Yeah. I do think once these new paradigms get hardware-saturated, like, at the end of the day, it's going to devolve into the same thing, right? Who has the bigger pile of GPUs, and the energy to run them, and the ability to cool them?

So I think in this case, China ends up in more or less the same position they were in: if they were struggling to compete on the basis of pre-training, they're going to continue to struggle to compete when inference-time compute becomes more important. It's just that we're not yet at the point where these specific techniques, where this paradigm, has been scaled on the hardware to the point where we're saturating the full fleet of hardware that we have available.

Those optimizations are unfolding as we speak. Right now that is part of the design conversation with the next generation of not just NVIDIA hardware, but also the way that data centers are being set up and computers are being set up and networking fabric and all those things. Yeah, very rapidly, expect people to blow past the DeepSeek, the o1 and R1, levels of performance.

And I think, you know, unless there is a sustained and concerted effort on the part of the PRC, which there well could be, to consolidate compute and do massive training runs that can compete on a compute basis with what we have in the West, you're going to see just the same thing play out again. I would guess that you're going to see a sort of takeoff happening, with Western models clearly pulling ahead.

But the gap between open source and closed source is probably going to continue to close, and that's worth tracking. I will say, with this launch, you know, number one on the US Play Store, the DeepSeek app itself has shit going to Chinese servers. So you use it at your own risk. But like, this is again a form of, well, it's not open source warfare, it's a little different because this is the app, the kind of deployed app.

But this is part of the structural advantage that OpenAI and Anthropic have enjoyed, especially OpenAI because of the brand recognition: as people use that system more, they get more data, which they can use to train their next models. In this case though, you know, bear in mind, civil-military fusion is a thing in China. And so anything that a Chinese company has, the Chinese military has.

And so that's who you're sending your data to. It won't matter for everybody, but for some people it may.

Andrey

Yeah. You may not want to give it passwords to all your sensitive documents if you're working at Google or whatever. Right. And of course, worth mentioning quickly that, again, this is coming from China. So many people have reported that it is, let's say, censored in the various ways that the Chinese government unsurprisingly would like it to be censored.

Although if you get the open source model, it's pretty easy to circumvent that. As we have covered, you can untrain all sorts of restrictions in these models; the model is aware of the things it's not supposed to say. But yeah, in the app we can expect that. Then again, my impression is, like, if you want to try something that's free and that is a new thing that you might prefer to ChatGPT, and you're not worried about sensitive information, it's actually a good app.

And I'm sure that's part of why people are flocking to it. And next story, again, we covered DeepSeek, now we are going back to Quen. So, in addition to that 1 million context length model, Alibaba did also release Quen Chat V02. And that introduced features like web search, video creation, and image generation within that chat interface. That adds on to some of the, I guess, simpler things like document analysis and the image understanding that was already there.

So web search is coming on the heels of OpenAI, not too long ago, adding web search to ChatGPT as one of the ways for it to get context for answering questions. I think it's notable that, you know, obviously in China, Qwen is filling out that, I guess, niche, or at least is one of the companies filling out that niche of, you know, having a ChatGPT-type consumer service, a chatbot that you can pay to use.

And now I guess one of the nice advantages that you have, if you're using Qwen Chat, is that they have that 1 million token model with a very long context size, which is more akin to Opus or Gemini, which are similarly, I guess, optimized for a long context. So a lot going on clearly with LLMs and AI in the China space in the month of January. And moving right along, coming back to the US, but still talking about DeepSeek. It really is all the rage. The next story is about Perplexity.

Perplexity is that AI-powered search interface that's quite popular. And they very quickly launched a US-hosted DeepSeek R1. So now, if you use Perplexity, you can choose to use R1 to power the Pro Search mode; that used to be an option with o1 only, and now you can choose to use DeepSeek R1.

Not too much else to say, but interesting to see, you know, so quickly after release, them integrating that into their product, hosting the model in the US, and giving it as an option for people to use.

Jeremie

Yeah. I mean, if I'm Perplexity, I'm really loving this strategically. Not that it's going to last, but, you know, at least for now and maybe in the future, just having these alternatives to OpenAI's o1 model that are credible, kind of frontier-level capabilities. Perplexity is an aggregator in a sense, right? An aggregator of capabilities from many different models. They're not building the frontier models themselves. They are outsourcing that to others.

And to the extent that you have many different companies building models, you have the space become more commoditized. The value capture ultimately is a lot easier to do at the aggregation level, if that's the case, or at least it becomes a much more plausible value-add. So I think that's kind of a strategic implication of the R1 release and its incorporation here into Perplexity.

Andrey

And just a couple more stories in this section. Next we go to Apple and a bit of a fun, "oopsie, AI is doing silly things" type story, which is nice to mix into all the serious progress stuff. Apple, as some may have seen, had really silly AI generated notifications for news. This is coming after they released iOS 18.3, which has AI enabled by default. And how weird that that isn't something we are, like, covering as a priority.

I think it went pretty under the radar that Apple is rolling out Apple Intelligence more aggressively now. But the part that didn't go under the radar is the silly things it does, like how it summarizes, I guess, headlines and news stories in the notifications. And that leads to a lot of very silly, incorrect summarizations saying essentially false things. And it was so bad that Apple, as per the news story, has disabled this feature.

And there are also examples of it summarizing messages that you get sent from contacts, similarly very embarrassing or silly types of things, at least as some people are getting. So yeah, Apple, you know, has been quite slow to get to this compared to other companies. You could say strategically, but this is not a strong indication that Apple Intelligence is going well. And it really reminds me of Gemini, right?

When Google rolled out their stuff, you saw similarly very silly types of things, which indicates that these companies are rushing this stuff.

Jeremie

Yeah, absolutely. And I think Apple, sort of like Amazon, is one of a couple of companies that are noteworthy for having been late to the game in recognizing that, yep, it does seem that the scaling laws work, dude. Like, we're on track for AGI. I don't know what you guys have been doing, but the cost of being behind is just so compounded across verticals, right?

Like your hardware stack, your networking, what you do to acquire the power you need to build the data centers, and then the model development teams. At every layer of the stack, you have to find ways to convince the best people, because the best people are in this domain. They are absolutely 10x, a hundred x engineers in terms of the leverage you can get from their work.

And so the difference between being in the number one spot and the number two spot is a world of difference. And so I think that's part of the tax that, anyway, Apple and Amazon are paying, and Amazon at least has had the wisdom to partner with Anthropic to get help getting their Inferentia and Trainium chips online. Apple doesn't really have that kind of partnership, and that, I think, is actually to their detriment.

I think that's something that, if I was at Apple, one of the things I'd be looking to do is: how do you find a way to partner with a true frontier lab and get them to help you with your build? Because it's clearly not going too well.

Andrey

Right. By the way, I guess it's worth mentioning this iOS 18.3 had some other updates. There's now Visual Intelligence, where you can point your phone at a thing and ask questions about whatever you're taking a photo of, similar to things you can do with ChatGPT. So Apple is rolling out some other features, but I guess this was the highlight that people are aware of, at least as far as I've seen. One more story along those lines that I'm sure not many people have heard, but is fun to cover.

The headline is "French AI Lucie looks très chic, but keeps getting answers wrong." So France launched an AI chatbot named Lucie, backed by the government, with the aim of promoting European values and apparently countering English-language dominance in AI tools. So this would be, you know, this is such a

Jeremie

European project. I'm sorry. It's just so European.

Andrey

Yes. And then shortly after this launch, it was suspended for providing incorrect and humorous responses, causing amusement and frustration. So there are, yeah, various examples of very silly things, like it saying that cow's eggs are a healthy food source, things like that. So pretty embarrassing, or at least humorous. And as you said, you know, as part of a larger story, Europe is clearly very behind on competing with both the US and China.

And this is not a great demonstration of Europe's ability to develop this kind of tech.

Jeremie

Yeah. I feel like Emmanuel Macron, the French president, kind of knows just enough to spend a lot of money on stuff that's really dumb, but that's just not obviously dumb. So it's a dangerous place to be. And anyways, there have been a couple of things like this. I mean, I guess I've said before on the podcast, just for those tracking at home, that I think Mistral, for example, is going to be in a world of hurt.

I don't think they can keep up in a world of scaling, and I expect them to fold at some point, or get acquired, or something like that, as we've seen happen to a bunch of other labs. But one specific thing that I find kind of funny here, sorry to bash this too much, but it is funny: Lucie, whose logo is a female face said to be a combination of Marianne, the French republican symbol, and, yes, Scarlett Johansson, the American actress, was widely criticized.

Why you would go with Scarlett Johansson after the GPT-4o debacle, I have genuinely no idea, but this apparently sounded like a really good plan, and they went ahead and did it. And so now this is just another layer of shit in the shit sandwich that is this giant government investment in this chatbot. I don't know, there's so many things going on here where I'm like, I just don't know, but I'm sure they have a plan. I'm sure they have a plan.

Andrey

Right. And they did receive funds from the broader national investment plan. The organization, by the way, is Linagora, a French open source software firm that leads the consortium behind the project. They did say in a statement that the launch was, quote, premature. So yeah, we've seen that happen also to Google and also to Apple clearly. So I guess they're not unique in having this sort of response, but still.

A little silly. And moving right along to applications and business. Once again, we've got to come back to DeepSeek and cover the result of, and response to, the R1 model. So I don't know exactly the timeline of this, that's one interesting kind of thing. Pretty much nobody in the business world, at least, cared about DeepSeek V3; then R1 came out and, you know, everyone went crazy and started panicking. At least there was clearly a large amount of panic in the US business world.

So there was a 2 percent fall in the S&P 500, a 3 percent fall in the NASDAQ, and a 17 percent plunge in NVIDIA's stock. Like, I mean, 17 percent, that's about 600 billion dollars of market cap value. So clearly we saw a lot of news coverage of the story, a lot of coverage that was not very good, that cited the 6 million dollar figure from the paper in comparison to the billions OpenAI spends, which is obviously wrong: the 6 million dollar figure is about the cost of the training run, not the infrastructure costs.
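As a quick sanity check on those figures, here is a minimal back-of-envelope sketch; the 17 percent drop and the roughly 600 billion dollars wiped out are the numbers cited in coverage, and everything else is just implied arithmetic:

```python
# Back-of-envelope sketch: infer the pre-drop market cap implied by a 17%
# single-day decline that erased roughly $600B of value. Figures are the
# ones quoted in coverage; nothing here is an official valuation.
drop_fraction = 0.17
value_lost_bn = 600.0                                     # approximate, in $ billions

pre_drop_cap_bn = value_lost_bn / drop_fraction           # ~3,529
post_drop_cap_bn = pre_drop_cap_bn * (1 - drop_fraction)  # ~2,929

print(f"Implied pre-drop market cap:  ~${pre_drop_cap_bn:,.0f}B")
print(f"Implied post-drop market cap: ~${post_drop_cap_bn:,.0f}B")
```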

And as to, you know, the ramifications for NVIDIA, it's quite nuanced. It may be the case that NVIDIA could see future profits due to the ability to train more efficiently, as demonstrated in DeepSeek V3. But then again, right, the demonstration of that paper is that with the relatively weak hardware available to Chinese companies, which are restricted from buying the latest generation of chips, they were still able to train competitive models.

So on that side, you could argue perhaps that NVIDIA would not be able to sell quite as many of their flagship chips, the most expensive ones. But regardless, yeah, from my perspective, this was a little surprising, perhaps an indication of, almost, a wake-up call. There was a blog post last year about the 600 billion dollar question of AI, where you've seen a huge amount of investment in infrastructure that isn't leading to profits, except for NVIDIA, I suppose.

And so I think this could also be an indication of a bit of worry that all of this massive investment is perhaps not going to pay off quite so well.

Jeremie

So I'm just going to pour cold water, as I tried to do before, on this whole narrative. To be clear, I'm not an NVIDIA shill; it's just a fact of the matter that they kind of run the space and have massive market capture. But there's a great report out by SemiAnalysis that goes over this in detail, though it was already obvious to some extent previously. The actual capex, right?

So the question you ask yourself when you're deciding whether to buy more NVIDIA chips isn't, what is the marginal cost in compute of training my model? It's, how much is my fucking cluster going to cost me? How many chips do I have to buy? The total capex, the total server capex for this cluster, according to SemiAnalysis, was 1.3 billion dollars.

A lot of that went to maintaining and operating the GPU clusters. But this is a massive expense that is orders and orders of magnitude greater than the advertised 6 million dollar training cost. The 6 million dollar training cost, again, is perhaps the compute cost associated with one training run, the one that specifically led to the V3 model. It is not the capex cost, which is the main thing you're thinking about when you're deciding whether to buy more NVIDIA chips.
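To make that distinction concrete, here is a hedged sketch of how the two numbers are typically computed; the GPU-hour and rental-rate figures follow the widely quoted DeepSeek V3 numbers and the SemiAnalysis capex estimate mentioned above, and should be treated as illustrative assumptions rather than authoritative accounting:

```python
# Sketch of marginal training-run cost vs. cluster capex (illustrative numbers).
# "Training cost" figures like the ~$6M one are usually GPU-hours times an
# assumed rental rate; capex is what you actually pay to own the cluster.
gpu_hours = 2.788e6            # reported H800 GPU-hours for the V3 run
rental_rate = 2.0              # assumed $/GPU-hour
run_cost = gpu_hours * rental_rate            # ~$5.6M

server_capex = 1.3e9           # SemiAnalysis estimate cited above

print(f"Marginal run cost: ~${run_cost / 1e6:.1f}M")
print(f"Cluster capex:     ~${server_capex / 1e9:.1f}B "
      f"(~{server_capex / run_cost:,.0f}x the single-run figure)")
```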

That's the thing that NVIDIA's revenue is really based on, to a significant degree. Anyway, the other thing to keep in mind is, you know, what they advertised, and we talked about it at the time when V3 first came out, but what we're learning, and this was kind of a case of the ball being fumbled a little bit.

I think Scale AI CEO Alexandr Wang, and maybe even Dario at Davos, gave interviews where they mistakenly said something like there were 50,000 H100s available to DeepSeek.

In reality, it's a mix of H800s, H100s, and then the H20s, these China-specific chips that we've talked about a lot in the context of export controls, and which probably should have been export controlled as well but weren't. That was a chip that NVIDIA specifically designed to just skirt the export controls, get right underneath the threshold, and be able to sell to China. So the moral of the story is: dude, we've got to tighten up export controls even more.

They are working, because how much worse would it have been if DeepSeek had been able to get their hands on more of this hardware? That's a really key question.

So the stock plummets when people, in my opinion, incorrectly interpret the result here of DeepSeek. But the other complicating factor is that literally the next day we find out that President Trump has tariffs that he wants to impose on Taiwanese semiconductor exports, up to 100 percent tariffs, he says, which would actually justify a crash in NVIDIA stock.

And so now we're left wondering, did the stock crash because people incorrectly priced in the implications of DeepSeek on day one? Or was there some sort of leak ahead of time, and insider trading based on that leak, about the imminent announcement of potential tariffs on TSMC or on Taiwanese imports? It's actually very ambiguous to me right now. I wonder if somebody's done a detailed analysis to parse that out.

I don't know how one would, but I think there's some blurring of the lines there that makes things quite interesting. Bottom line is, I think the fundamentals here are bullish for NVIDIA, modulo the tariffs, which are actually going to be a big issue for US AI competitors.

Andrey

Right. So I think we're both on the same page as far as analysis: this seems like a bit of an overreaction, and could only be said to make sense from the broader perspective of the outlook for AGI and building out these massive data centers in general, rather than DeepSeek specifically on its own. Moving right along, the next story also relates to data centers, and it is on the relationship between Microsoft and OpenAI.

So there was a post by Microsoft updating the details that we know about Microsoft and OpenAI's relationship. Microsoft no longer holds exclusive cloud provider status for OpenAI, although it does have a right of first refusal agreement, where OpenAI, I guess, would have to at least talk to them first. So OpenAI is still committed to using Azure quite a lot, but is clearly also trying to loosen up its relationship with Microsoft. This is also

in the context of the Stargate project, which we'll discuss a little bit more, and which seems to be primarily for the benefit of OpenAI, with OpenAI getting an exclusive license to use the outcome of that project. OpenAI and Microsoft have a long, storied relationship, quite interesting from a business strategy perspective, and this is the latest update to a very ongoing situation.

Jeremie

Yeah, in a way, maybe not the most shocking. We actually talked about this in the context of the OpenAI-Oracle deal, right? Which turns out, in retrospect, to have been part of Project Stargate, this Abilene, Texas build, the cluster that they're partnering on, right? That was the first time we looked at that and were like, hey, you know, this is really kind of OpenAI going off script with respect to their relationship with Microsoft.

As we understood it at the time, it seems like that has evolved. As I recall, the argument back then was, hey, Microsoft seems a little sketched out about following OpenAI as far as they want to go, at the rate they want to go, in terms of these buildouts, the very aggressive, right, like $500 billion over a four year buildout pace.

It is also worth noting, by the way, that Microsoft has been investing $80 billion per year in new data center builds for AI, which, actually, if you look at it over a four year period, isn't that far off from the 500 billion number. There's a lot going on there, and maybe, you know, OpenAI's desire to have exclusive access to that cluster was a big factor too. This is a big deal.

The other piece that's been talked about, and you know, Elon tweeted about this and, actually, I guess he was technically right: Sam kind of framed this up as, yeah, it's a 500 billion dollar investment, funding secured, to coin a phrase. But Elon said, no, you don't have the funding secured, like, I have it on good authority.

He said, I think on X at some point, that, you know, SoftBank only has, I don't know, 10 or 15 billion available to spend in liquid terms. Anyway, when you add together the amounts, like 15 billion from OpenAI and another 15 billion from Oracle or whatever, that order of magnitude just didn't add up to anywhere close to 500 billion. And that was absolutely correct. So there's a hundred billion that is actually secured.

And the hope is to raise the additional 400 billion in time. That extra 400 billion, therefore, is to some degree a marketing play. OpenAI is trying to make this the project, the government-favored project. That's a big, you know, element here. I will say, I mean, OpenAI's security is shit.

I know this because we've been doing an investigation for the last year on frontier lab security and the implications for kind of national superintelligence projects, which will be coming out, I guess, two weeks from now. We'll probably be covering it here. But when you're announcing to the fucking world that you're building a 500 billion dollar cluster that you internally believe to be the superintelligence cluster, you're inviting nation state attention.

So, you know, Sam A has put China on notice that this is going to be a big, fat, juicy facility, and they know exactly what he plans to use it for. So not great from a security standpoint, not that you can hide these builds, but there are ways to go about this that might've been a little better. I think, you know, you face this media incentive because you're trying to draw investors as well. But one challenge with this build is who the investors are.

So G42 is one of the investors, though they're not investing as G42; they're investing through MGX, but it is tied to G42, a UAE fund. There's also Saudi money, which is a dominant contributor to Masayoshi Son's SoftBank. So in a very real sense, the Stargate project is

UAE and Saudi funded, and I'd have to take a look, but I would not at all be surprised if the lion's share of funding came from those sources. That is interesting from a national security standpoint; the strings that are attached to that funding would have to be very, very carefully vetted, and that, I think, is a very serious issue.

And so it speaks to, you know, some of the challenges people have had with OpenAI in particular, being reportedly willing to trade off American national security interests for even the national security interests, according to some reports, of the Russians and the Chinese, where they say, you know, we're going to get these countries to bid against each other to have the AGI project based in their country.

Like, this is the sort of thing where, unfortunately, when there are stories like that that are very credible, it leads to questions when you then start taking Saudi and UAE money to do these builds. Like, what actually is the thought behind that? I'm not pretending to know; I can't read Sam Altman's mind. But these are things that, you know, you want to be considering, especially if you believe yourself to be building a project of this significance.

Andrey

That's right. And next up, another story related to OpenAI and their ongoing journey, you could say, related to the governance piece of OpenAI. They're updating their board of directors with the addition of Adebayo Ogunlesi, founding partner of Global Infrastructure Partners, now part of BlackRock, I think is how you could describe him. So he is focused on infrastructure investment and was at Credit Suisse for 23 years.

I can't say that I know too much about the implications of this, but it's clearly coming at a time when OpenAI is still making a strong effort to shift towards a for-profit structure, coming on the heels of, you know, just slightly over a year ago, the nonprofit board staging a coup. And since then there's been a gradual transition of power, presumably behind the scenes. So this is happening alongside that and probably does have some meaningful implications.

Jeremie

Yeah, basically, my read on this is they need a guy who can help bring in massive amounts of Saudi and sovereign wealth fund money into giant projects. And this is a great finance guy, really experienced doing this kind of thing. They say actually in October he launched a joint GIP-BlackRock 30 billion dollar fund, with backing from Microsoft, NVIDIA, and Abu Dhabi, to build data centers and adjacent power infrastructure.

So this is a guy with experience with those UAE stakeholders, the sort of G42s of the world, and presumably networks that run deep in there. So my read is that that's the play here with this appointment.

Andrey

And one last story, just to cover something kind of normal, I guess, something more along the lines of what we may get on a calmer week. Worth knowing that ElevenLabs, which specializes in AI voice technology, has raised 250 million dollars in a Series C funding round that leads to them being valued at 3 billion dollars. I believe we covered this as a potential story.

And this is the confirmation of that funding round, led by ICONIQ Growth and Andreessen Horowitz. A name that perhaps isn't as well known, right, as OpenAI or Anthropic, et cetera, but as a leader in the space of AI voice technology, it's a very significant organization, and I do think that is clearly reflected in this funding and in the valuation. And that's all there is to say on that.

Let us move on to policy and safety, skipping research entirely, just because we don't have time. And we begin policy and safety with, once again, Stargate, and the announcement that happened at, probably, the White House, or regardless, happened with Donald Trump present. So there was a lot of fanfare about Stargate, marketed, you could say, as that 500 billion dollar investment in AI infrastructure in the US.

And so there was this presentation where Trump hailed this project and said it would position the US as being competitive, as being part of a Make in America initiative. And he also mentioned using emergency declarations to facilitate infrastructure.

So, interesting. Obviously, Jeremy, you would know more about this, whether the US government can get behind this project and what the implications are, with this announcement happening kind of weirdly, because Stargate has been ongoing for a while and they seem to be pushing it now in a way that isn't really news but is being made to seem like a new thing.

Jeremie

Yeah, in fairness, that aspect is not so unusual. I think TSMC did something similar late in the Biden administration, where they had a big fab investment they wanted to announce and they just said, oh, we're going to wait till Trump's in office and then, you know, give him credit for that. That's just how it's done; it's part of politics as usual.

It must be said, this is an especially Sam Altman type move, especially given that he's playing catch-up right now in terms of his relationship with the administration, having been, you know, a very public anti-Trump guy for so long. And then he put out some pretty, I don't know, to some, cringy tweets. When you've been tracking his views on the previous Trump administration, to see this 180 is kind of like, oh, that's interesting.

You know, it very clearly, at least to me, seems like an attempt to ingratiate himself, which, yeah, I get it. You're running a business here and there are obvious governance implications, but it's going to be part of the calculus for anybody in that position. In terms of what this actually means for government support: I'm not tracking any government investment in this.

In fact, for amounts this high, it would be really difficult for just the president to be able to say, hey, yeah, we're going to fund you, because Congress is responsible for appropriations. And so getting more funds in the door just isn't something the presidency can easily do without taking money away from other things. That being said,

Trump has been very forward on deregulation, in particular of environmental regulations and other issues that slow data center build times and power builds, which is actually really important. Right now, our biggest gap relative to China is our ability to field enough power, through whatever means, to build really big campuses. We have all the hardware we can eat, more or less, but we need that energy infrastructure.

A bunch of executive orders came out towards the end of Biden's term which seem to still be live, and that's kind of interesting. So Trump has let those be, because they did point towards deregulation. But he's making other moves, and really bolder moves, to deregulate and move things forward, which, I think, if you're a fan of American competitiveness in the space, is an important play.

No matter what position you have in this field, even if you care about loss of control, which I absolutely do, you want America to be first, so that American labs have enough lead time that they can work on their alignment techniques and not be too rushed by geostrategic factors. So anyway, I think it's good that Trump is saying he's behind this.

The funding sources, and this is more of a Sam Altman sourcing-the-funds thing, the funding sources are potentially an issue unless the strings attached are very carefully looked at. You may need sovereign wealth fund money from outside the country; that may just be a fact of the matter with this stuff, but you definitely want some intense national security scrutiny on the source of those funds and the amount of leverage they have on the project.

Andrey

And as you said, once again, that 500 billion dollar number is basically just what they hope to get over the next four years. Apparently the hundred billion dollar number is from SoftBank CEO Masayoshi Son, and there are other investors there, including OpenAI. So a huge project and a very ambitious project. We'll see, I suppose, what winds up happening with it. And next, some more news related to Trump taking office, which happened, I think, last week, which we couldn't cover.

So, unsurprisingly, I suppose, we knew this would happen: President Trump has rescinded the executive order on AI from the Biden administration, the Safe, Secure, and Trustworthy Development and Use of AI order. That was a huge, very, very long executive order that did a lot of stuff. Trump had another executive order, Initial Rescissions of Harmful Executive Orders and Actions, that went into effect. And as you said, there is kind of a mix of things that Trump seems to be doing.

So this is more focused on the safety piece, on a lot of the things that various agencies were directed to do, but there are other Biden policies and orders that haven't been targeted in this.

Jeremie

Yeah, this is actually pretty interesting. So I think we talked about this when this EO first came out, but the executive order that Trump has just revoked is an EO that tried to do a little bit of everything, right? The Democratic coalition behind it included people with a variety of interests and concerns. Some of them were hardcore NatSec concerns that are bipartisan: things around the weaponization of AI, loss-of-control risks, stuff like this.

And then there was a bunch of stuff that was, you might say, more clearly Democrat-coded, so stuff around ethics and bias in algorithms and things like this. Anyway, it was, at the time, the longest executive order in US history; I think it may still be. So that was a fun fricking thing to read when it came out. But you can certainly read this as: they tore down the executive order because it had so much extraneous stuff.

And the question is, what are they going to replace it with? One of the key things this EO did that was good was it included a reporting requirement for any models trained with more than 10^26 FLOPs, right? At the time, no model had been trained at that threshold; now we do have some. But yeah, so the question is going to be, is that going to be reinstated in some form? What other executive orders are coming? That remains an open question.
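For a rough sense of where that threshold sits, here is a hedged sketch using the common approximation that dense transformer training compute is about 6 times parameters times tokens; the example model sizes are illustrative, not drawn from the EO or from any specific lab:

```python
# Rough FLOP estimate for dense transformer training: ~6 * params * tokens.
# This is a common rule of thumb for threshold math, not an exact accounting.
def training_flops(params: float, tokens: float) -> float:
    return 6 * params * tokens

THRESHOLD = 1e26  # reporting threshold from the (now revoked) executive order

examples = [
    ("405B params on 15T tokens", 405e9, 15e12),  # ~3.6e25, under
    ("1T params on 20T tokens",   1e12,  20e12),  # ~1.2e26, over
]
for name, p, t in examples:
    flops = training_flops(p, t)
    status = "over" if flops > THRESHOLD else "under"
    print(f"{name}: ~{flops:.1e} FLOPs ({status} the 1e26 threshold)")
```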

So I think people are kind of reading a lot right now into things that are actually quite unclear. But the reasoning behind this is somewhat clear. Anybody who's been tracking this knows the administration was talking about how they were going to revoke this for quite some time. And it was clear why as well: there was just so much fluff in there that was not germane to the core national security issues that President Trump cares about.

So that's, I guess, the angle they're taking on that. And a lot of this, we have yet to see how it's going to play out.

Andrey

And now moving back to DeepSeek as promised, getting back to the policy and geopolitics implications. And we're going to get into it through, as you mentioned, Jeremy, the take of Anthropic CEO Dario Amodei. As Amodei has done previously, he posted a blog post conveying his viewpoints, saying that they do not perceive DeepSeek as adversaries, basically saying this is not necessarily a bad thing, but at the same time emphasizing the importance of export controls.

So Amodei kind of walked a fine line here. He had good words to say about DeepSeek and the research they've done, but at the same time, I suppose, tried to remind people of the fact that they are based in China and therefore are directly tied to, and have to sort of follow the orders of, the authoritarian government of China. And again, we want to be very clear here: we do have

a bit of bias, you could say, or a viewpoint that is net negative with respect to the Chinese government. And similarly here, Amodei kind of positioned China as not a good thing, and as it still being important to double down on, or continue with, export controls.

Jeremie

Yeah, he also published a blog post, which is quite good, kind of laying out in a little bit more detail his thinking on what DeepSeek actually means. I think everybody in the space has more or less converged; there are more or less two groups of people: people who looked at DeepSeek V3 and were

like, holy shit, and had kind of already run these calculations in their head, and then there are people who were just getting shocked by R1 when it came out. And media takes have been dominated by the latter. But the former, anyway, we've kind of done it to death, but it's basically aligned with that idea, right? That scale is going to continue to work, scaling curves are going to continue to dominate.

And the question now is, how quickly can China and the West saturate the compute that they already have? Once that's done, then we'll get a real sense for who's ahead in the space. But ultimately hardware is king; that hasn't really changed. We just have a second axis on which to scale. And one of the points that Dario makes really effectively too is, it's been a minute since o1 was trained. It's been a minute since Claude 3.5 Sonnet was trained.

And in that time, you know, you'd pretty much expect, given the rate of algorithmic and hardware improvements, that yeah, you would get a model like this trained on about a billion dollars' worth of infrastructure, where the individual training run costs around six million dollars. Like, none of this is really that shocking; in fact, it's slightly behind the curve. The shocking thing here is not that China managed to do this per se.

It's just that the curves themselves are so steep, the improvement curves. Like, we are on course, at least many people believe, I believe, we're on course for superintelligence here. That's what these curves say. If you take that seriously, then yeah, every incremental breakthrough is going to seem shocking. And even things like DeepSeek, which are, you know, maybe a couple of months behind where the frontier is in terms of the cost-performance trade-offs, are, yeah, going to shock people.

And when you open source them and, you know, you add some marketing that says 6 million dollars, then yeah, it's going to make an impact. And I think the main lesson here is: expect more of this sort of thing, not necessarily from China, as Western scaling kind of starts to drive things, I would expect, but certainly from frontier labs elsewhere.

Andrey

And this is a good time to circle back to that question we got from Discord and, particularly, Jeremy, your take. Clearly there's a bit of tension here. On the one hand, we want to have safety, you know, have interpretability. You, of course, are a big safety hawk and would like to, you know, be aware of the alignment concerns and so on. At the same time, there are, you could argue, race dynamics between the US and China, and DeepSeek demonstrates that.

So yeah, what was your reaction here in terms of, I guess, how this relates to alignment and those sorts of things?

Jeremie

Like, China obviously has a very impressive national champion in DeepSeek. And a lot has been made of a meeting between Li Qiang, who is kind of the second in command in China, and Liang Wenfeng, one of the co-founders of DeepSeek. That is interesting. The Bank of China announced a 1 trillion yuan investment in AI infrastructure, which, incidentally, has been incorrectly framed by Western media as 137 billion dollars.

That's what you get if you just do the currency conversion naively, but the number that actually matters is purchasing power, the PPP number, purchasing power parity. And in PPP terms, that's actually about a 240 billion dollar investment. So that is more than the actual total funding committed to Project Stargate; it's more than double, in fact.
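Here is a minimal sketch of that nominal-versus-PPP arithmetic; the market exchange rate and PPP conversion factor below are approximate assumptions, not official figures:

```python
# Nominal FX conversion vs. PPP adjustment for the 1 trillion yuan figure.
# The exchange rate and PPP factor below are rough assumptions.
investment_cny = 1e12       # 1 trillion yuan
fx_rate = 7.3               # approx. CNY per USD at market rates
ppp_factor = 4.2            # approx. CNY per international dollar (PPP)

nominal_usd_bn = investment_cny / fx_rate / 1e9     # ~137
ppp_usd_bn = investment_cny / ppp_factor / 1e9      # ~238

print(f"Nominal conversion: ~${nominal_usd_bn:.0f}B")
print(f"PPP-adjusted:       ~${ppp_usd_bn:.0f}B")
```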

So when you think about how serious the CCP is on this: they're fucking serious, and they now have a national champion in DeepSeek that absolutely has the technical chops to compete if they have enough hardware on hand.

Andrey

I think it's also worth mentioning not just DeepSeek but also Alibaba and Qwen. We shouldn't overlook them; they are very much competitive on the frontier model front.

Jeremie

No, great call out. Yeah. And as well, when you think of the Huawei-SMIC axis and how much, anyway, there's a whole story about what the hardware picture might look like with their seven nanometer process and whether that's enough to make enough chips at scale, with good enough yields, to do interesting things. It may well be. But the bottom line is, China is for real here. And this is where, you know, Western

national security agencies have a lot of work to do and will have to get more involved. Anyway, this is spiraling into essentially the work that we're going to launch in two weeks. But the bottom line is, I think we have to make some very thoughtful trade-offs and calculations regarding what it means for China to be a live player in this race, and at the same time recognize that, yeah, alignment is unsolved.

There are too many people who look at the fact that alignment and control of superintelligent systems is probably a really big issue, and they almost don't want to acknowledge that, because they also recognize that trying to negotiate in good faith with China is not going to happen. The question we've been wrestling with over the last year, as we've done our investigation, is: what happens if you take both those things seriously?

What happens if you acknowledge that, yes, China has basically violated every international security treaty they've participated in? They've taken advantage of the treaties that the US and Russia have engaged in on nuclear, and show no sign of stopping. At the same time, we don't know how to control superintelligence, and the likely outcome is not great if we build superintelligence before we can control it. How do you reconcile those two views?

And I think that's at the core of a lot of sort of Pollyanna-ish, unrealistic takes on both sides that don't take the full picture into account. So I'll park it there, because I will go on way too long on that piece.

Andrey

Time for a policy episode, as you heard just now. And for the last bit, there are a couple of stories related to TSMC. We'll focus on one of them. So there was a story of Trump threatening tariffs, and there was a response from the Taiwan government to that.

And then also an interesting story of the Taiwanese government clearing TSMC to make 2 nanometer chips abroad. They have this so-called silicon shield restricting that, and now they're lowering it, which is related, of course, to TSMC's work in the United States.

Jeremie

Yeah. Look, the way to think about Taiwan in this situation is that they are a person who has taken your baby and is holding your baby. And then there's another person who's pointing a gun at them, and they're not going to let go of your baby, because if they do, then you're going to be like, eh, yeah, I don't really care if Taiwan gets shot. But they're holding onto your baby so that you care about them getting shot. They're like, no, we're building all the semiconductors here.

And if China attacks us, then you don't get any semiconductors, and it's really, really bad. This is a perfect metaphor, I'm really happy with it. Anyway. So yeah, this has actually been a matter of Taiwanese state policy for a long time that, whatever TSMC's leading node is, they're only allowed to build fabs outside of Taiwan that make nodes two generations behind that.

And so when you look at the Arizona fabs that are famously being teed up by TSMC, those are like 4 nanometer, and that's because the leading node at TSMC right now is the 2 nanometer node. And so, yeah, that's changing now, and that's a really interesting development, right? That is essentially green-lighting the build-out of fabs for, you know, the 2 nanometer, the 1.6 nanometer nodes and so on

in the United States, which obviously America would be really interested in, because they need to ramp up their capacity to produce these chips really fast if something happens hot-war style, you know, China invades Taiwan. I mean, assume to first order that all the TSMC fabs are booby-trapped to blow; basically, no TSMC for you. And then everything resets to, okay, well, what are the next leading fabs? And in that context, SMIC

is a really interesting player. Actually, I mean, they'll have issues because they can't get lithography machines and other shit, but they definitely become more important. And so China rises much closer to parity with the West in that situation. So there's a lot of interest in onshoring Taiwanese TSMC fabs and capabilities at those higher resolutions. So that's kind of what we're seeing here; that's been green-lit essentially.

Andrey

Makes sense. And with that, we are finished with this very dense, DeepSeek-focused episode. Thank you for listening as always. You can go to the description for all the links, or go to lastweekin.ai on the web to get those as well. As always, we appreciate your views, your sharing and subscribing, but more than anything, you listening, and you chatting on Discord is also great. So thank you and be sure to keep tuning in.
