Dylan Patel — Deep dive on the 3 big bottlenecks to scaling AI compute | Dwarkesh Podcast

¶ Why an H100 is worth more today than 3 years ago

00:00

All right, this is the episode where my roommate teaches me semiconductors.

00:02

It's also the send-off for this current

00:06

set. Yeah, you know, after you use it, I'm like, I can't use this again. I gotta get out of here.

00:11

No, no, sloppy seconds for dark guys.

00:13

Okay, Dylan is the CEO of SemiAnalysis. Dylan, the burning question I have for you: if you add up the big four, Amazon, Meta, Google, Microsoft, the combined forecasted CapEx that you published recently for this year is six hundred billion dollars.

00:30

And given yearly prices of renting that compute, that would be close to 50 gigawatts. Now obviously we're not putting on 50 gigawatts this year. So presumably that's paying for compute that is going to be coming online over the coming years. So I have a question about how to think about the timeline around when that CapEx comes online. Similar question for the labs, where, you know,

00:53

OpenAI just announced that they raised a hundred and ten billion dollars. Anthropic just announced they raised thirty billion dollars. And if you look at the compute that they have coming online this year, you should tell me how much it is, but isn't it like another four gigawatts total that they'll have this year? It feels like the cost to rent the compute that OpenAI and Anthropic will have this year, to sustain their compute spend

01:13

at, you know, ten to thirteen billion dollars a gigawatt, is low enough that those individual raises alone would cover their compute spend for the year. And this is not even including the revenue that they're going to earn this year. So help me understand: first, what is the timescale on which the big tech CapEx actually comes online? And second, what are the labs raising all this money for, if the yearly price of a one-gigawatt data center is like thirteen billion dollars?
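The "close to 50 gigawatts" figure is a single division. A minimal Python sketch, assuming the ~$600B combined CapEx and the ~$10-13B per gigawatt-year rental price quoted in the question, and treating all of that spend as if it were a one-year rental budget (a simplification; much of the CapEx pays for capacity arriving in later years):

```python
# Convert a dollar budget into the gigawatts of compute it could rent for one year.
# Inputs are the rough figures quoted in the conversation, not reported data.

BIG_FOUR_CAPEX_USD = 600e9  # ~$600B combined forecast CapEx for this year


def implied_gigawatts(spend_usd: float, rent_per_gw_year_usd: float) -> float:
    """Gigawatts whose one-year rental cost equals `spend_usd`."""
    return spend_usd / rent_per_gw_year_usd


for rent in (10e9, 12e9, 13e9):  # ~$10-13B per gigawatt-year
    gw = implied_gigawatts(BIG_FOUR_CAPEX_USD, rent)
    print(f"${rent / 1e9:.0f}B per GW-year -> {gw:.1f} GW")
```

At $12-13B per gigawatt-year this lands at roughly 46-50 GW, far more than the roughly 20 GW of incremental US capacity discussed in the answer, which is the gap the question is probing.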

01:41

So when you talk about the CapEx of these hyperscalers, on the order of six hundred billion dollars, and you look across the rest of the supply chain, it gets you to on the order of a trillion dollars. A portion of this is immediately for compute going online this year, along with the other parts of CapEx that do get paid this year. But there's a lot of setup CapEx as well. So when we're talking about 20 gigawatts this year in America, roughly.

02:08

Incremental.

02:09

Incremental added capacity. A portion of the CapEx for it isn't spent this year; a portion was actually spent the prior year. And when you look at Google's $180 billion, a big chunk of that is actually spent on turbine deposits for 2028 and 2029. A chunk of that is spent on data center construction for 2027. A chunk of that is spent on, you know,

02:30

power purchase agreements and down payments and all these other things they're doing further out into the future so that they can set up this super-fast scaling. And this applies to all the hyperscalers and other people in the supply chain. So, roughly twenty gigawatts deployed this year, a big chunk of that being hyperscalers and a chunk not, and for all of these companies, their biggest customers are Anthropic and OpenAI.

02:55

Anthropic and OpenAI are roughly in the one-and-a-half to two-and-a-half gigawatt range right now. They're trying to scale much larger. If you look at what Anthropic has done over the last few months, you know, four billion, six billion of revenue added. And if we just draw a straight line, they'll add another six billion dollars of revenue a month.

03:15

People would argue that's bearish and that they should go faster. What that implies is that they're going to add $60 billion of revenue across the next 10 months. And sixty billion dollars of revenue at Anthropic's current gross margins, at least as last reported by the media, would imply roughly forty billion dollars of compute spend on the inference behind that sixty billion of revenue.

03:39

That $40 billion of compute at roughly $10 billion a gigawatt in rental cost means that they need to add four gigawatts of inference capacity just to grow revenue. And that assumes their research and development training fleet stays flat. So in a sense, Anthropic needs to get to well above five gigawatts by the end of this year. It's going to be really tough for them to get there, but it's possible.
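The chain of estimates above (straight-line revenue, compute share of revenue, rental price per gigawatt) can be reproduced in a few lines. A minimal Python sketch using only the quoted round numbers; the 40/60 compute-to-revenue ratio is derived here from the $40B and $60B figures (it corresponds to roughly a 33% gross margin) and is not an independently sourced margin:

```python
# Straight-line projection from the conversation: revenue added over the rest of
# the year, the compute needed to serve it, and the gigawatts that implies.

MONTHLY_REVENUE_ADDED_USD = 6e9   # ~$6B of revenue added per month (straight line)
MONTHS_REMAINING = 10
COMPUTE_PER_REVENUE = 40 / 60     # ~$40B of compute per ~$60B of revenue
RENT_PER_GW_YEAR_USD = 10e9       # ~$10B per gigawatt-year rental cost

added_revenue = MONTHLY_REVENUE_ADDED_USD * MONTHS_REMAINING
inference_compute = added_revenue * COMPUTE_PER_REVENUE
implied_gross_margin = 1 - COMPUTE_PER_REVENUE
added_inference_gw = inference_compute / RENT_PER_GW_YEAR_USD

print(f"revenue added:     ${added_revenue / 1e9:.0f}B")
print(f"inference compute: ${inference_compute / 1e9:.0f}B")
print(f"implied margin:    {implied_gross_margin:.0%}")
print(f"added capacity:    {added_inference_gw:.1f} GW (training fleet held flat)")
```

Any growth in the training fleet comes on top of the roughly 4 GW this produces, which is why the target lands "well above five gigawatts."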

04:01

Can I ask a question about that? If Anthropic was not on track to have five gigawatts by the end of this year, but it needs that to serve both the revenue that's grown faster than expected, and maybe it's going to be even more than that, plus the research and training to make sure its models are good enough for next year, where is that going to come from?

04:20

You know, Dario, when he was on your podcast, was very, very conservative. He's like, I'm not going to go crazy on compute, because if my revenue inflects at a different rate, at a different point, I don't want to go bankrupt. I want to make sure that we're being responsible with this scaling. But in reality, he's definitely missed the boat compared with OpenAI, whose approach was, let's just sign these crazy fucking deals.

04:43

And OpenAI has kind of got way more access to compute than Anthropic by the end of the year. So what does Anthropic have to do to get the compute? Well, they have to go to lower-quality providers that they would not have gone to before. Historically, Anthropic has had the best-quality providers, like Google and Amazon, the biggest companies in the world,

05:04

and now Microsoft, and now they're expanding across the supply chain and going to other players that are newer. OpenAI has been a bit more aggressive about going to many players. Yes, they have tons of capacity from Microsoft. They have Google and Amazon as well, but they also have tons with CoreWeave and Oracle, and they've gone to

05:20

random companies, or what one would think are random companies, like SoftBank Energy, which has never built a data center in its life, but they're building data centers now for OpenAI. And many others, like Nscale, that they're going to and getting capacity from. And so there's this

05:36

conundrum for Anthropic, because they were so conservative on compute, because they didn't want to go crazy. And in some sense, a lot of the financial freakouts in the second half of last year were like: OpenAI signed all these deals, but they don't have the money to pay for them. Okay, Oracle stock's going to tank. Okay, CoreWeave stock's going to tank. All these companies' stocks tanked, and credit markets went crazy because people were like,

06:00

the end buyer can't pay for this. Now it's like, oh wait, they raised a ton of money. Okay, fine, they can pay for it. But in that sense, Anthropic was a lot more conservative. They were like, we'll sign contracts, but we'll be principled, and we'll purposely undershoot what we think we can possibly do, because we don't want to potentially go bankrupt.

06:17

But the thing I want to understand is: what does it mean to have to acquire compute in a pinch? Is it that you have to go with neoclouds? Is it that they have worse computers? In what way is it worse? And is it that you have to pay gross margins to a compute provider that you wouldn't otherwise have had to pay, because you're coming in at the last minute? Who built the spare capacity such that

06:37

it's available for Anthropic and OpenAI to get at the last minute? And basically, what is the concrete advantage that OpenAI has gotten? If they end up at similar compute numbers by 2027, is it just that they're going to end this year with different gigawatts? If so, how many gigawatts are Anthropic and OpenAI going to have by the end of this year? Yeah.

06:56

To acquire excess compute: yes, there is capacity at hyperscalers, and not all compute contracts are long-term, five-year deals. There's compute from 2023, 2024, and 2025, H100s, that was signed on deals shorter than five years. OpenAI signed the vast majority of their compute on five-year deals.

07:13

But there were many other customers that had one-year, two-year, three-year deals, six-month deals, on-demand. And as these contracts roll off, who is the participant in the market most willing to pay the price? In this sense, we've seen H100 prices inflect a lot and go up.

07:33

And people are willing to sign long-term deals above $2, even. I've seen deals where certain AI labs, and I'm going to be a little bit vague here for a reason, have signed at as high as two dollars and forty cents an hour for two to three years for H100s. Think about the margin: a dollar forty an hour was roughly the cost of Hopper when it was released,

07:54

counting the cost to build it out across five years. And now, two years in, you're signing two-to-three-year deals at two dollars and forty cents. Those margins are way higher. And so now you can crowd out
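To make "those margins are way higher" concrete, here is a minimal Python sketch that treats the quoted ~$1.40 per GPU-hour five-year cost as the whole cost basis. This simple ratio gives about 30% at $2.00, below the roughly 35% quoted later for a $2.00 deal; the speaker's TCO model presumably classifies some costs (for example, financing) differently, which this sketch does not attempt to reproduce.

```python
# Simple gross-margin check on H100 rental deals, using the ~$1.40/hour
# five-year all-in cost quoted in the conversation as the cost basis.

HOPPER_COST_PER_HOUR = 1.40


def gross_margin(price_per_hour: float, cost_per_hour: float = HOPPER_COST_PER_HOUR) -> float:
    """Share of the hourly rental price left after the hourly cost."""
    if price_per_hour <= 0:
        raise ValueError("price must be positive")
    return (price_per_hour - cost_per_hour) / price_per_hour


for price in (2.00, 2.40):
    print(f"${price:.2f}/hour -> {gross_margin(price):.0%} gross margin")
```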

08:05

other buyers across all of these suppliers, whether the Hoppers sit at Amazon or CoreWeave or Together AI or Nebius or whoever it is. These neoclouds are the firms that had a higher percentage of Hopper in general because, A, they were more aggressive on it, and B, they tended to sign shorter-term deals. Not CoreWeave, but the others tended to sign shorter-term deals. And so if I want Hopper, there is some capacity out there. And then also,

08:34

On Blackwell, most of the capacity at an Oracle or a CoreWeave is signed on long-term deals, and anything that's going online this quarter is already sold. And in some cases,

08:45

they're not even hitting all the numbers they promised they would deliver, because there are some data center delays, not just at those two, but at Nebius and all the other folks, like Microsoft, Amazon, Google. But there are a lot of neoclouds, as well as some of the hyperscalers, who have capacity they're building that they have not sold yet,

08:58

or capacity that they were going to allocate to some internal use that is not necessarily super AGI-focused, which they may now turn around and sell. And in the case of Anthropic, they don't have to hold all the compute directly. Amazon can have the compute and serve Bedrock, or Google can have the compute and serve Vertex, or Microsoft can have the compute and serve Foundry, and then do a revenue share with Anthropic, or vice versa. Okay.

09:20

Basically, you're saying Anthropic is having to pay either a fifty percent markup, in the sense of the revenue share, or a premium for last-minute spot compute, which they wouldn't have had to pay had they bought the compute early.

09:32

Right. And there's a trade-off there. But also, for a solid four months, everyone was like, OpenAI, we're not going to sign deals with you, that sounds crazy, because you guys don't have the money. Now everyone's like, yeah, OpenAI, we believed you the whole time, we can sign any deal because you've raised all this money. But Anthropic is constrained in that sense.

09:56

There are not that many incremental buyers of compute yet, because Anthropic was the first to hit the capability level where revenue is mooning.

10:03

That's interesting, 'cause otherwise we'd think, well, having the best model is an extremely depreciating asset: three months later, you don't have the best model. But the reason it's important is that you can sign these deals, lock in the compute in advance, and get better prices. Doesn't this also imply, by the way, and maybe this is an obvious point, but until recently people had made this huge point about

10:26

"Oh, what is the depreciation cycle of a GPU?" And the bears, Michael Burry or whoever, have said, look, people are saying four or five years for these GPUs, but in fact, maybe because the technology is improving so fast, it makes sense to have two-year depreciation cycles for these GPUs.

10:46

Which increases the reported amortized CapEx in a given year, and so makes it maybe financially less lucrative to build all these clouds. But in fact, you're pointing at

10:57

maybe the depreciation cycle being even longer than five years, 'cause we're still using Hoppers. And especially if AI really takes off and in 2030 we're like, fuck, we gotta get the seven-nanometer fabs up, we gotta go back to the A100s, we turn on the A100s again, then actually the depreciation cycle is incredibly long. So I feel like that's an interesting financial implication of what you're saying.

11:21

There are a few strings to pull on there. One is what happens to the depreciation of GPUs. And I guess I didn't answer your prior question, which is: I think Anthropic will be able to get to five gigawatts-ish, maybe a little bit more, by the end of the year, through their own compute as well as their product being served

11:41

through Bedrock or Vertex or Foundry. I think they'll be able to get to five or six gigawatts, which is way above their initial plans. Right. And OpenAI will be roughly the same, maybe a little higher. Actually, a little bit higher based on our numbers, but

11:59

anyways, the depreciation cycle of a GPU. Michael Burry's argument is that it's three years or less. And there are two lenses to look at this through. Mechanically, there's a TCO model, total cost of ownership of a GPU, where we project

12:16

pricing out for GPUs and build up the total cost of a cluster. There are a number of costs: your data center cost, your networking cost, your smart hands, the people in the data center swapping stuff out, your spare parts,

12:30

your actual chip cost, your server cost. All these various costs get lumped together, and there are depreciation cycles on them and certain credit costs on them. That's how you build up to: an H100 costs a dollar forty an hour to deploy at volume across five years, if your depreciation is five years. And then if you sign a deal at two dollars an hour

12:50

for those five years, your gross margin is roughly thirty-five percent, a little bit above that. If you sign it at a dollar ninety, it's roughly thirty-five percent.
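The mechanical lens is a TCO build-up: lump the per-GPU capital costs together, spread them over the depreciation period, and add running costs. A minimal Python sketch of that structure; the class name and every input value are hypothetical and are not the inputs behind the $1.40/hour figure, and the sketch assumes 100% utilization and leaves out the credit (financing) costs mentioned above:

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 24 * 365


@dataclass
class GpuTco:
    """Hypothetical per-GPU cost inputs for a simplified TCO model."""

    capex_usd: float          # chip, server, networking, data center share, spares
    opex_per_hour_usd: float  # power, smart hands, maintenance, etc.

    def cost_per_hour(self, depreciation_years: float) -> float:
        """Straight-line depreciation of capex over every hour, plus opex.

        Financing/credit costs are deliberately left out of this sketch.
        """
        if depreciation_years <= 0:
            raise ValueError("depreciation period must be positive")
        depreciation = self.capex_usd / (depreciation_years * HOURS_PER_YEAR)
        return depreciation + self.opex_per_hour_usd


# Illustrative numbers only.
example = GpuTco(capex_usd=30_000, opex_per_hour_usd=0.30)
for years in (2, 3, 5):
    print(f"{years}-year depreciation -> ${example.cost_per_hour(years):.2f}/hour")
```

The depreciation period is the lever in the Burry debate: with the same inputs, a two-year schedule roughly doubles the hourly cost relative to a five-year one in this example, which shrinks the margin on any fixed rental price.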

12:58

And then you assume that in the fifth year the GPU falls off a bus. It's dead. And the argument people are making is, well, if you didn't sign a long-term deal, because every two years NVIDIA is tripling or quadrupling the performance while only 2xing the price or increasing it 50%, then the price of an H100... sure, maybe its value in the market was $2 at 35% gross margins in 2024, but in 2026,

13:25

when Blackwell is in super-high volume and deploying millions a year, it's actually only worth a dollar an hour. And when Rubin is in super-high volume in 2027 (it starts shipping this year, but hits super-high volume next year, with millions of chips a year deployed into clouds), you've got another 3x in performance and another 50% or 2x in price. Then the Hopper's only worth seventy cents an hour.
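This first lens is replacement-cost pricing: the old chip's spot value scales by the new generation's price multiple divided by its performance multiple. A minimal Python sketch using the quoted figures; pairing 3x performance with a 1.5x price for Blackwell and a 2x price for Rubin is one pick from the ranges given (a 1.5x Rubin price would give $0.50 instead), and the function name is ours:

```python
# "Replacement cost" lens: an old GPU's spot value tracks what the same
# performance would cost on the newest chip.


def repriced_value(old_value_per_hour: float, perf_multiple: float, price_multiple: float) -> float:
    """New spot value of the old chip once a chip with `perf_multiple` x the
    performance at `price_multiple` x the price ships in volume."""
    return old_value_per_hour * price_multiple / perf_multiple


hopper_2024 = 2.00  # ~$/hour market value quoted for 2024
hopper_2026 = repriced_value(hopper_2024, perf_multiple=3.0, price_multiple=1.5)  # Blackwell
hopper_2027 = repriced_value(hopper_2026, perf_multiple=3.0, price_multiple=2.0)  # Rubin

print(f"2024: ${hopper_2024:.2f}  2026: ${hopper_2026:.2f}  2027: ${hopper_2027:.2f}")
```

This lens only holds if the newest chip can be built without limit, which is exactly the assumption the second lens drops.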

13:47

And so the price of a GPU would continue to fall. That's one lens. The other lens is: what is the utility you get out of the chip? Because if you could build infinite Rubin, or infinite amounts of the newest chip, then yes, that's exactly what would happen. The price of a Hopper would

14:02

fall at a spot or short-term contract rate as the new chips come out and performance per dollar goes up. But because you are so limited on semiconductors and deployment timelines and all these things, what actually prices these chips is not

14:20

"What's the comparable thing I can buy today?" It's actually: what value can I derive from this chip today? In that sense, let's take GPT-5.4. GPT-5.4 is way cheaper to run than GPT-4. It has fewer active parameters; it's much smaller in that sense.

14:40

Plus, it's a sparser MoE, versus GPT-4 being a coarser MoE. There have also been so many other advancements in training, RL, model architecture, data quality, et cetera. All these things have made GPT-5.4 way better than GPT-4, and it's cheaper to serve. And so when you look at an H100, it can

14:59

serve more tokens per GPU of GPT-5.4 than if you ran GPT-4 on it. So in some sense, it's producing more tokens of a model that is of higher quality. Interesting. And so, obviously, what was the maximum TAM for GPT-4's tokens? Maybe it was a few billion dollars, maybe it was tens of billions of dollars. Adoption takes time.

15:19

For GPT-5.4, that number is probably north of 100 billion, but there's an adoption lag, there's competition, so other people are getting it, and there are the constant improvements that everyone else is making. So if improvement stopped here, the value of an H100 is predicated on the value that GPT-5.4 can get out of it instead of

15:36

the value that GPT-4 could get out of it, and on the margins these labs are charging, and they're in a competitive environment, so their margins can't go to infinity. So you have this quite interesting dynamic in which an H100 is worth more today than it was three years ago.

15:51

That's crazy. And it's also interesting to take that forward. If we had actual AGI models developed, if we had genuinely a human on a server... On a FLOP basis, and these are such hand-wavy numbers about how many FLOPs the brain does, an H100 does roughly 1e15 FLOPs per second, which is about what some people estimate the human brain does.

16:16

Obviously, in terms of memory, the human brain has way more. An H100 has eighty gigabytes, and the brain might have petabytes.

16:23

Oh yeah, you've got petabytes? Name a petabyte of ones and zeros, bro. Name me a string.

16:31

Well, this is actually the point, or actually, in

16:33

No, we've just got the best sparse attention techniques.

16:35

Oh, genuinely, right? In terms of the amount of information that is compressed, it might be petabytes, but it's like an extremely sparse MoE. But anyways.

16:46

Imagine: a human knowledge worker can produce six figures a year of value. So if an H100 can produce something close to that, if we had actual humans on a server, the value of an H100 is such that it can repay itself in the course of a couple of months.
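The payback claim is cost divided by monthly value. A minimal Python sketch; the conversation gives no H100 price, so the $30,000 all-in cost below is a hypothetical round number, and the sketch assumes the GPU produces that value continuously with no running costs:

```python
def payback_months(all_in_cost_usd: float, annual_value_usd: float) -> float:
    """Months of produced value needed to cover the GPU's all-in cost."""
    return all_in_cost_usd / (annual_value_usd / 12)


# Hypothetical inputs: the conversation gives neither an H100 price nor an exact value.
H100_ALL_IN_COST_USD = 30_000
for annual_value in (100_000, 200_000, 500_000):  # "six figures a year"
    months = payback_months(H100_ALL_IN_COST_USD, annual_value)
    print(f"${annual_value:,}/year of value -> payback in {months:.1f} months")
```

Under these assumptions, payback ranges from a few months at the low end of "six figures" to under one month at the high end.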

17:59

So when I interviewed Dario, the point I was trying to make is not that I think the singularity is two years away and therefore Dario desperately needs to buy more compute, although the revenue is certainly there such that he needs to buy more compute. The point I was trying to make is that given what Dario seems to be saying, given his statements that we're two years away from a data center of geniuses,

18:19

certainly not more than five years away, and a data center of geniuses should be earning trillions upon trillions of dollars of revenue, it just does not make sense why he keeps making these statements about being more conservative on compute or, to your point, being less aggressive than OpenAI on compute.

18:33

And I guess that point got lost, because then people were roasting me, like, oh, this podcaster was trying to convince the CEO of this multi-hundred-billion-dollar company, why don't you YOLO it, bro? But no, I was trying to say that his statements are internally inconsistent. Anyway, it's good to iron it out.

18:52

Yeah, I think, going back to the earlier view: if the models are so powerful, the value of a GPU goes up over time. Right now only OpenAI and Anthropic have that viewpoint, but as we get further and further out, everyone, even with open-source models, is going to be able to

19:14

start to see that value per GPU skyrocket. And so in that sense, you should commit to compute now. Interestingly, in Anthropic fashion, there's a bit of a meme that they have commitment issues and they're sort of polyamorous. So, not... this is a bit of a meme.

19:39

It explains everything. By the way, there's this interesting economics effect called the Alchian-Allen effect, which is the idea that if you add the same fixed cost to different goods, one of which is higher quality and one of which is lower quality,

19:55

that will make people choose the higher-quality good on the margin. To give a specific example, suppose the better-tasting apple costs two dollars and the shittier apple costs one dollar. Now suppose you put an import tariff on them, and now it's three dollars for the great apple and two dollars for the medium apple.

20:15

Is that because they both increase by a dollar or should it be like fifty percent increase?

20:18

No, no, 'cause they both increase by a dollar. The whole effect is that if there's a fixed cost of supply added to both, the price ratio between them changes. Previously the more expensive one was 2x more expensive; now it's just 1.5x more expensive. So I wonder if, applied to AI, that would mean that if GPUs are going to get more expensive, there will be a fixed cost increase

20:41

in the price of compute. Yes. As a result, that will push people to be willing to pay higher margins for slightly better models, because the calculus is: I'm going to be paying all this money for the compute anyway, so I might as well pay slightly more to make sure it's the very best model rather than one that's slightly worse.
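The apple example is the textbook Alchian-Allen arithmetic: adding the same per-unit charge to both goods shrinks the premium good's relative price. A minimal Python check of the $2/$1 apples with a $1 tariff on each (the function name is ours, for illustration):

```python
def price_ratio(premium: float, basic: float, per_unit_charge: float = 0.0) -> float:
    """Relative price of the premium good after adding the same per-unit charge to both."""
    return (premium + per_unit_charge) / (basic + per_unit_charge)


print(price_ratio(2.0, 1.0))       # 2.0: before the tariff
print(price_ratio(2.0, 1.0, 1.0))  # 1.5: after a $1 tariff on each apple
```

As the shared charge grows, the ratio keeps falling toward 1, so the premium good looks relatively cheaper and buyers shift toward it on the margin.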

21:01

Right. So the Hopper went from two to three dollars, and if a Hopper can make a million tokens of Opus or two million tokens of Sonnet, the price differential between Opus and Sonnet has decreased, because the price of the GPU has gone up by a dollar from the factory. Exactly. Interesting.

21:19

I think that makes a ton of sense. Also, I think we just see that all of the volume is on the best models today, all the revenue is on the best models today. And in a compute-limited world, there are two things that happen. A, companies that have locked up compute, and don't have commitment issues, have these five-year contracts for compute.

21:41

They've kind of

21:42

locked in a humongous margin advantage, because they've locked in compute for five years at the price it transacted at five years ago or three years ago or two years ago, whatever it is. Whereas if you're now three years into that five-year contract, and someone else's two-year or three-year contract rolled off, and now you're trying to buy that at, you know,

21:59

modern pricing, when you're priced to the value of models, the price is going to be up a lot more. And so in a sense, the person who committed early has better margins in general. And the percentage of the market that is in long-term contracts is much larger than the percentage in short-term contracts that can serve as the flex capacity you add at the last second.

22:20

And then at the same time, where does the margin go, given that models get more valuable? How much can the cloud players flex their pricing? Well, if you look at CoreWeave, their average contract duration is over three years right now. For 90%-plus of their compute, it's over three years. And so they end up with this conundrum: they can't actually flex price. But

22:47

every year they're adding incrementally way more capacity than they had previously. This year alone, Meta's adding as much capacity as they had in their entire fleet of compute and data centers for all purposes, serving WhatsApp, Instagram, and Facebook, in 2022,

23:02

and doing AI. They're adding that this year alone. And in the same sense, Meta's doing that, and CoreWeave and Google and Microsoft and Amazon, all these companies are adding insane amounts of compute year on year on year. That new compute gets transacted at the new price.

23:16

So in a sense, yes, you've locked in. But as long as we're in a sort of takeoff (OpenAI went from six hundred megawatts to two gigawatts last year, from two gigawatts to six-plus this year, and six to twelve next year), the incremental added compute is where all the cost is, not the prior long-term contracts.
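How much of a growing fleet is new each year follows directly from the quoted trajectory. A minimal Python sketch, assuming (as the argument implies) that capacity added in a given year is contracted at that year's prices; "six-plus" is rounded to 6 GW, and the helper name is ours:

```python
# OpenAI fleet trajectory as quoted (gigawatts): 600 MW -> 2 GW -> 6+ GW -> 12 GW.
fleet_gw = [0.6, 2.0, 6.0, 12.0]
periods = ["last year", "this year", "next year"]


def incremental_share(start_gw: float, end_gw: float) -> float:
    """Fraction of the year-end fleet that was added during the year."""
    return (end_gw - start_gw) / end_gw


for label, (start, end) in zip(periods, zip(fleet_gw, fleet_gw[1:])):
    share = incremental_share(start, end)
    print(f"{label}: {start:g} -> {end:g} GW, {share:.0%} of year-end fleet is new")
```

While the fleet doubles or triples each year, half or more of it is always freshly contracted, so old long-term prices cover only a minority of the compute.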

23:33

So then the question is who holds the cards for charging margin: the infra providers. So now the cloud players, the neoclouds or the hyperscalers, can charge the margin. Or they can't, because... well, to some extent they can. But then as you go upstream: who has access to all the memory and logic capacity? Well, it's NVIDIA, for the most part. They've signed a lot of long-term contracts.

23:52

They've got like ninety billion dollars of long-term contracts today, and they're negotiating three-year deals with the memory vendors today. Obviously you've also got Amazon and Google, through Broadcom, and Amazon directly, and AMD: these companies hold all the cards because they've secured the capacity. And TSMC is not raising prices, but memory vendors are sort of

24:14

raising prices a lot. They're going to double or triple price again. But they're also signing these long-term deals. So who is able to accrue all the margin dollars is potentially the cloud, potentially the chip vendors and the memory vendors, until TSMC or ASML break out and say, no, actually, we're going to charge a lot more. But at the same time, do the model vendors get to charge crazy margins?

24:37

I think at least this year we're going to see margins for the model vendors go up a lot, because they're so capacity-constrained that they have to destroy demand. There's no way Anthropic can continue at the current pace without destroying demand.

24:51

Yeah. Let's get into logic and memory.

¶ Nvidia secured TSMC allocation early; Google is getting squeezed

24:56

How specifically has NVIDIA been able to lock up so much of both? I think according to your numbers, by 2027, NVIDIA is going to have 70-plus percent of N3 wafer capacity or something like that. We're around that area. And I forget what the numbers were for memory at SK Hynix and Samsung and so forth. But if you look at

25:18

So think about how the neocloud business works and how NVIDIA works with that, or how the RL environment business works and how Anthropic works with that. In the first case, NVIDIA is purposely trying to fracture the complementary industry to make sure that it has as much leverage as possible. So they're giving

25:35

allocation to random neoclouds to make sure that there's not one player that has all the compute. Similarly, when Anthropic or OpenAI work with data providers, they say, no, we're going to seed a huge industry of these so that we're not locked into any one supplier for data environments. And here's what I wonder.

25:53

On the three-nanometer process, there's going to be Trainium 3, TPU v7, and potentially other accelerators. So why is TSMC just giving it all up to NVIDIA rather than trying to fracture the market?

26:07

Yeah, so I think there are a couple of points here. On three nanometer, if we go back to last year, the vast majority of three nanometer was Apple. Apple's being moved to two nanometer, and memory prices are going up, so Apple's volumes may go down, because as memory prices go up, they either cut margin or pass it on. There's some time lag, because they have long-term contracts. But basically,

26:32

Apple likely reduces demand and/or moves to two nanometer faster, where two nanometer is only capable of mobile chips today; in the future, AI chips will move there. So Apple has that. And Apple's also talking to third-party vendors, because they're getting squeezed out of TSMC a little bit,

26:47

because TSMC's margins on high-performance computing (HPC, AI chips, et cetera) are higher than they are for mobile, since TSMC has a bigger advantage in HPC than in mobile. But anyways, when you look at TSMC's running calculus here, they're actually providing really good

27:05

allocations to companies that are doing CPUs, right? So Amazon has Trainium and Amazon has Graviton, both on three nanometer, Graviton being their CPU and Trainium being their AI chip.

27:19

TSMC is actually much more excited to give allocation to Graviton than to Trainium because they view the CPU business as more stable long-term growth, right? And as a company that is conservative and doesn't want to ride growth cycles too hard, you actually want to allocate to the market that is more stable and lower-growth first, before you allocate all the incremental capacity to the fast-growth market. Now,

27:46

that is the case generally. It's the same for AMD, right? TSMC is much more excited about the allocations for their CPUs than for their GPUs. Likewise for Amazon. And NVIDIA is a bit unique because, yes, they have CPUs, yes, they make switches, yes, they make networking. They make NVLink, InfiniBand, Ethernet, all these different products.

28:11

By and large, most of these will be on three nanometer by the end of this year with the Rubin launch and all the chips in that family, the GPU being the most important one. And yet NVIDIA is getting the majority of supply, right? Part of this is how TSMC and others look at the market; there are many ways they forecast market demand.

28:34

But it's also the market demand signal, right? The market signals, hey, we need this much capacity next year. We need this much. We'll sign non-cancelable, non-returnable. We may even pay deposits, right? Things like this. NVIDIA just did it way earlier than Google or Amazon.

28:51

And in some cases, Google and Amazon had stumbling blocks. One of the chips, Trainium, got delayed by a couple of quarters, and all these sorts of things happened. So in that case, there was

29:02

a huge sort of, okay, well, these guys are delaying, but NVIDIA wants more, more, more. And we're checking with the rest of the supply chain: is there enough capacity? So TSMC is going to all the PCB vendors and asking, hey, is there enough? Victory Giant, is there enough PCB? This is one of the largest suppliers of PCBs to NVIDIA, and they're a Chinese company. All the PCBs come from China, largely from them.

29:23

Or many of them. And anyway, they're like, do you have enough PCB capacity? Great. Oh, hey, memory vendors, who has all the memory capacity? Oh, okay, NVIDIA does. Great. So it's sort of the same as who is AGI-pilled enough to buy compute on long timelines at levels that seem ridiculous

29:39

to people who aren't AGI-pilled, but who are nonetheless willing to pay a pretty good margin and sign now because they think that ratio is going to be screwed up in the future. The same thing happens in the semiconductor supply chain, right? NVIDIA was... well, I don't think NVIDIA's quite AGI-pilled, right? Jensen doesn't believe software is going to be fully automated and all these things, right?

30:00

Accelerated computing, not AI. Right.

30:02

It's AI chips, right?

30:03

But that's what he calls it, right?

30:04

Yeah, because I think it's a broader term, right? AI is within that, but so are physics modeling and simulations and...

30:11

But it's just that he's not embracing the main use case...

30:14

I think he's embracing it. I just don't think he's AGI-pilled like Dario, right? Or Sam. But he's still way, way more AGI-pilled

30:22

than

30:23

Google was in Q3 of last year, or Amazon was in Q3 of last year. And he saw way more demand, right? And the reason is pretty simple. You can see all the data center construction and say, okay, I want to have this much market share. We have all the data centers tracked, and you can see there are a lot of data centers where you could say, well, they could be one or the other, right? And so to some extent,

30:45

Google and Amazon, Google especially, even though their TPU is just better for them to deploy, have to deploy a crapload of GPUs because they don't have enough TPUs to fill up their data centers. They can't get them fabbed.

30:55

Wait, I have a question about that. Google sold, I think, a million of the v7s, the Ironwoods, to Anthropic? And you're saying in general the big bottleneck right now, this year or next year, and I guess going forward forever now, is going to be logic and memory, the stuff it takes to build these chips.

31:15

And Google has DeepMind, the third prominent AI lab. If this is the big bottleneck, why would they sell it rather than just give it to DeepMind?

31:23

Right. So this is again a problem where DeepMind people were like, this is insane, why did we do this? Yeah. Right. But Google Cloud people and Google executives had a different thought process, right? And basically

31:35

you and I know the compute team. Both of the main people on the compute team at Anthropic actually came from Google. They saw this dislocation, they negotiated a deal, and they were able to get access to this compute before Google realized. And so actually, the chain of events, at least from the data we found, was that in early Q3, over the course of about six weeks,

32:01

we saw capacity on TPUs go up by a significant amount. Over those six weeks it went up multiple times, right? There were multiple requests. Google even had to go to TSMC and explain why they needed this increase in capacity, because it was so sudden.

32:21

But a lot of that capacity increase was for selling to Anthropic. Yeah. Because Anthropic saw it before Google. And then Google had Nano Banana and Gemini 3, which caused their user metrics to skyrocket. And leadership at Google was like, oh. They started making the statement that we have to double compute every six months, or whatever it was; I don't remember the exact number they said. But they really woke up a lot more, and then they're like, oh,

32:43

hey, TSMC, we want more. We want more. And it's like, well, sorry guys, we're sold out for next year. We can maybe get you five, ten percent more for '26, but really we're going to work on '27, right? So there's this information asymmetry with the labs, in my mind, right? I don't know if this is exactly right; it's the narrative I've spun myself from seeing all the data in the supply chain on wafer orders and

33:03

what's going on with the data centers that Anthropic signed and Fluidstack signed and all this. It's pretty clear to me that Google screwed up. And you can see this from Google's Gemini ARR, right? They had next to nothing from Q1 through Q3, a little bit in Q3 once they started inflecting, but in Q4 they were at like five billion ARR.

33:22

Right, exiting the year, or something like this. Or five billion of revenue in Q4 on an ARR basis. So clearly Google didn't see its revenue skyrocket until late in the year. And in a sense, Anthropic had a bit of commitment issues before their ARR exploded, even though they have far more of an information advantage and see what's coming down the pipe. Google is going to be more conservative than Anthropic is.

33:48

That's A. And B, Google had even less ARR. So I think they just weren't willing to do it, and then they realized they should. And since then, Google has gotten absurdly AGI-pilled, right, in terms of what they're doing. They bought an energy company, and they're putting deposits down for turbines.

34:09

They're buying a ridiculous percentage of the powered land. They're going to utilities and negotiating long-term agreements. They're doing this on the data center and power side very, very aggressively, right? So I think Google woke up towards the end of last year, but it took them some time.

34:25

And how many gigawatts do you think Google will have by the end of next year? You charge for that kind of information.

34:33

⁠¶ ASML will be the #1 constraint for AI compute scaling by 2030

34:34

I feel like every year the bottleneck preventing us from scaling AI compute keeps changing. A couple of years ago it was CoWoS. Last year it was power. You'll tell me what the bottleneck is this year. But I want to understand: five years out, what will be the thing that is constraining us from deploying the singularity?

34:51

Yeah, I think the biggest bottleneck is compute. And for that, the longest-lead-time supply chains are not power or data centers. They're actually the semiconductor supply chains themselves, right? The major bottleneck switches back from power and data centers to chips. And in the chip supply chain, there are a number of different bottlenecks, right? There's memory, there are logic wafers from TSMC, there are the fabs themselves. Construction of a fab takes a

35:19

couple of years, two to three years, whereas a data center takes less than a year, right? We've seen Amazon build data centers in as little as eight months. So there's a big difference in lead times because of the complexity of the building, the fab that actually makes the chips. And then the tools, right? Those also have really long lead times.

35:35

And so as we've scaled, the bottlenecks have been whatever the supply chain currently isn't able to do, which was CoWoS, power, and data centers, but those were all shorter-lead-time items, right? CoWoS is a much simpler process of packaging chips together. Power and data centers are ultimately way simpler than actually manufacturing the chips. And so there's been some sliding of capacity from

36:04

mobile or PC to data center chips, and that's been somewhat fungible, whereas CoWoS, power, and data centers have sort of had to start anew as supply chains. But now there's no more capacity for

36:18

the mobile and PC industries, which used to be the majority of the semiconductor industry, to shift over to AI, right? NVIDIA is now the largest customer at TSMC, and NVIDIA is the largest customer at SK Hynix, the largest memory manufacturer, right? So it's sort of impossible for this sliding of resources away from the common person, right, away from PCs and smartphones, to shift

36:42

any further toward AI chips. So now, how do we scale AI chip production? That's the biggest bottleneck as we go to 2030.

36:51

It would be very interesting if there's an absolute gigawatt ceiling that you can project out to 2030 based just on, hey, we can't produce more than this many EUV machines.

37:03

Right. So to scale compute further, there are some different bottlenecks this year and next year, but ultimately by '28, '29, the bottleneck falls to the lowest rung on the supply chain, which is ASML, right? ASML makes the world's most complicated machine, the EUV tool.

37:20

And the selling price for those is three to four hundred million dollars. Currently they can make about seventy a year. Next year they'll get to eighty. Even under very aggressive supply chain expansion, they only get to a little over a hundred by the end of the decade. So what does that mean? Okay, they can make a hundred of these tools a year by the end of the decade, and seventy right now.

37:40

How does that actually translate to AI compute, right? We see all these numbers from Sam Altman and many others across the supply chain: gigawatts, gigawatts, gigawatts, right? How many gigawatts are we adding? And we see Elon saying, hey, a hundred gigawatts in space.

37:55

A year.

37:56

A year, right. The challenge to any of these numbers is actually not the power, not the data center; we can dive into that. It's manufacturing the chips, right? So take a gigawatt of NVIDIA's Rubin chips. Rubin is being announced at GTC, I believe, the week this podcast goes live.

38:16

And to make a gigawatt's worth of data center capacity of NVIDIA's latest chip, which they're releasing towards the end of this year, you need a few different wafer technologies, right? You need about 55,000 wafers of three nanometer, about 6,000 wafers of five nanometer, and then about 170,000 wafers of DRAM, right? Memory. And across these three different buckets,

38:43

each of these requires different amounts of EUV, right? When you manufacture a wafer, there are thousands and thousands of process steps where you're depositing material and removing it. But the key critical step, which at least in advanced logic is like 30% of the cost of the chip, is

39:00

something that doesn't actually put anything on the wafer, right? Lithography. You take the wafer, you deposit photoresist, which is a chemical that changes when you expose it to light, and then you stick it into the EUV tool, which shines light at it in a certain way. It patterns it, right? Because there's what's called a mask,

39:13

which is effectively a stencil for the design. So when you look at a leading-edge three-nanometer wafer, it has 70 or so masks, right? Seventy or so layers of lithography, but twenty of them are the most advanced, EUV, right? So if you think about it, okay, well, if I need 55,000 wafers

39:32

For a gigawatt.

39:33

If I do 20 EUV passes per wafer, then you can do the math: okay, that's 1.1 million passes of EUV for a single gigawatt. So it's actually pretty simple. And then once you add the rest of the stuff, it ends up being two million, right, across

39:48

five nanometer and all the memory. You're at roughly two million EUV passes for a single gigawatt. These tools are very complicated. When you think about what one is doing across a wafer, it's taking the wafer and it's scanning

40:01

and stepping across, right? It's scanning, stepping across, and it does this dozens of times across the whole wafer. So when you're talking about how many EUV passes, that's the entire wafer being exposed

40:13

at a certain rate. An EUV tool can do roughly 75 wafers per hour, and the tool is up roughly 90% of the time, right? So in the end, you need about three and a half EUV tools to do the 2 million EUV wafer passes for the gigawatt.
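The pass-count and tool-count arithmetic above can be reproduced in a few lines. This is a sketch using only the rough figures quoted here (55,000 three-nanometer wafers, 20 EUV layers, about 2 million total passes, 75 wafers per hour, 90% uptime). The assumption that one gigawatt's worth of wafers is exposed over a single calendar year is ours, as are the function names.

```python
HOURS_PER_YEAR = 24 * 365  # assumption: one gigawatt's wafers are exposed over one year


def euv_passes(wafers: int, euv_layers: int) -> int:
    """EUV exposures for one process node: every wafer goes through once per EUV layer."""
    return wafers * euv_layers


def scanners_needed(total_passes: float, wafers_per_hour: float, uptime: float) -> float:
    """Number of EUV scanners required to deliver total_passes in one year."""
    passes_per_scanner_year = wafers_per_hour * HOURS_PER_YEAR * uptime
    return total_passes / passes_per_scanner_year


# Three-nanometer logic alone: 55,000 wafers x 20 EUV layers.
n3_passes = euv_passes(wafers=55_000, euv_layers=20)

# Quoted total once five-nanometer logic and ~170,000 DRAM wafers are added.
TOTAL_PASSES_PER_GW = 2_000_000

tools_per_gw = scanners_needed(TOTAL_PASSES_PER_GW, wafers_per_hour=75, uptime=0.90)

print(f"3 nm EUV passes per GW: {n3_passes:,}")        # 1,100,000
print(f"EUV scanners per GW:    {tools_per_gw:.2f}")  # 3.38
```

The result, about 3.4, is what the conversation rounds to "three and a half"; lower real-world throughput or uptime pushes the number up.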

40:32

So three and a half EUV tools satisfy a gigawatt. It's funny to think about the numbers, right? Because we're talking about, oh, what does a gigawatt cost? It costs roughly fifty billion dollars, right? Whereas what do three and a half EUV tools cost? That's like 1.2 billion, right? Right. It's actually a much lower number. Which is interesting to think about: fifty billion dollars of economic

40:51

CapEx in the data center, and what gets built on top of that in terms of tokens is even larger, right? It might be a hundred billion dollars' worth of AI value in the supply chain held up by this $1.2 billion worth of tooling, which simply cannot expand its supply chain quickly.
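The "fifty billion versus 1.2 billion" comparison is a simple ratio. A minimal sketch, assuming a $350 million tool price (our midpoint of the three-to-four-hundred-million range quoted earlier) and charging each scanner's full price against a single gigawatt, which overstates the tooling cost per gigawatt since a scanner keeps producing for many years:

```python
TOOLS_PER_GW = 3.5        # quoted figure
TOOL_PRICE_USD = 350e6    # assumption: midpoint of the quoted $300-400M price
GW_CAPEX_USD = 50e9       # "roughly fifty billion dollars" per gigawatt

# Full purchase price of the scanners needed for one gigawatt.
euv_tooling_per_gw = TOOLS_PER_GW * TOOL_PRICE_USD
# Data-center capex that each dollar of EUV tooling "holds up".
capex_per_tooling_dollar = GW_CAPEX_USD / euv_tooling_per_gw

print(f"EUV tooling per GW: ${euv_tooling_per_gw / 1e9:.1f}B")
print(f"Data-center capex per $1 of EUV tooling: ~${capex_per_tooling_dollar:.0f}")
```

Roughly forty dollars of data-center spend rides on every dollar of EUV tooling, before counting the token revenue built on top.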

41:06

And I think you had this article recently where you said that over the last three years, TSMC has done a hundred billion dollars of CapEx. So it's like thirty, thirty, forty. And a small fraction of that is being used by NVIDIA for the three nanometer it's going to use, or previously the four nanometer it's using for its chips. But NVIDIA has turned that into... what were its

41:31

earnings last quarter? Like forty billion. So forty billion times four, a hundred and sixty billion dollars. So NVIDIA alone is turning some small fraction of a hundred billion in CapEx, which is going to be depreciated over many years, not just this one year, into a hundred and sixty billion dollars in a single year.

41:48

And that gets even more intense when you go down the supply chain to ASML, which takes about a billion dollars' worth of machines to produce a gigawatt. And of course those machines last for more than a year, right? So they're doing more than that. Okay, so now I want to understand how many such machines there will be by 2030 if you include not just the ones sold that year, but the ones that have been accumulating over the previous years.

42:06

And what does that imply? Sam Altman says he wants to do a gigawatt a week in 2030. When you add up those numbers, is that compatible?

42:16

Right. That's completely compatible, right? Because if you think about TSMC and the entire ecosystem, it has something like 250 to 300 EUV tools already. Then you stack on 70 this year, 80 next year, growing to 100 by 2030, and you're at like 700 EUV tools by the end of the decade. 700 EUV tools, three and a half tools per gigawatt.

42:36

Assuming it's all allocated to AI, which it's not, three and a half tools per gigawatt gets you to 200 gigawatts' worth of AI chips a year for the data centers to deploy, right? So two hundred gigawatts, and Sam wants fifty, right? Fifty-two gigawatts a year. He's only taking roughly a 25% share then, right? Obviously, there's some share given to mobile and PC, assuming that

42:57

for some reason we're still even allowed to have consumer goods and don't get priced out of them. But roughly, he's saying a twenty-five percent market share of the total chips fabbed. That's very reasonable given that this year alone, I think he's going to have access to twenty-five percent of the Blackwell GPUs that are deployed, right? So it's not that crazy.
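The fleet arithmetic behind the "200 gigawatts" figure can be sketched as follows. Taking "this year" as 2026, the 275-tool starting base (our midpoint of the quoted 250 to 300) and the 2028 and 2029 shipment numbers (hypothetical interpolations between the quoted 80 and 100) are assumptions; tool retirements and throughput upgrades are ignored, as in the conversation.

```python
INSTALLED_BASE = 275  # assumption: midpoint of the quoted 250-300 EUV tools already in the field

# Annual shipments: 2026, 2027 and 2030 follow the quoted figures;
# 2028 and 2029 are hypothetical interpolations.
SHIPMENTS = {2026: 70, 2027: 80, 2028: 87, 2029: 94, 2030: 100}

TOOLS_PER_GW = 3.5       # quoted EUV scanners per gigawatt of AI chips per year
ALTMAN_GW_PER_YEAR = 52  # "a gigawatt a week"

fleet_2030 = INSTALLED_BASE + sum(SHIPMENTS.values())
# Upper bound: pretends every scanner works on AI chips, which the text notes is not the case.
ai_ceiling_gw = fleet_2030 / TOOLS_PER_GW
altman_share = ALTMAN_GW_PER_YEAR / ai_ceiling_gw

print(f"EUV fleet by end of 2030: {fleet_2030}")
print(f"Ceiling: ~{ai_ceiling_gw:.0f} GW of AI chips per year")
print(f"OpenAI share at 52 GW/yr: {altman_share:.0%}")
```

With these inputs the fleet lands near 700 tools and the ceiling near 200 GW a year, so a gigawatt a week works out to roughly a quarter of the all-AI upper bound.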

43:20

I find it surprising that... when did ASML start shipping EUV tools? When seven nanometer started, so I don't know exactly when that was. But you're saying in 2030 they're going to be using machines that were initially shipped in 2020. So for ten years you're using the same most important machine in the most technologically advanced industry in the world. I find that surprising.

43:41

So ASML has been shipping EUV tools for roughly a decade now, but EUV only entered high-volume production around 2020. And the tool's not the same. Back then the tools had even lower throughput. There are various specifications around them, one called overlay, right? I was mentioning that you're stacking layers on top of each other, right? You'll do some EUV, then you'll do a bunch of different process steps, depositing stuff, etching stuff,

44:05

cleaning the wafer, dozens of those steps before you do another EUV layer. There's a spec called overlay, right? Which is, okay, you did all this work, you drew these lines on the wafer. Now I want to draw these dots, right? Let's just say I want to draw these dots to connect

44:19

these lines of metal, and the dots are holes, and the next layer up is another set of lines that goes perpendicular. So now you're connecting wires going perpendicular to each other.

44:28

You have to be able to land them on top of each other, so it's called overlay. Overlay is a spec that ASML has improved rapidly. ASML has also improved wafer throughput rapidly. And the price of the tool has gone up, but not as much as its capabilities.

44:42

Right. Initially the EUV tools were like $150 million, and over time they're now like $400 million as I look out to 2028, but the capabilities of the tools have more than doubled as well, right? Especially throughput and overlay accuracy, which is the ability to accurately align subsequent passes on top of each other, even though you do tons of steps in between. And so

45:07

ASML is improving super rapidly. I think something else is noteworthy: ASML is maybe one of the most generous companies in the world, right? They have this linchpin thing. No one has anything competitive. Maybe China will have some EUV by the end of the decade, but no one else has anything even close to EUV. And yet they haven't taken price and margins up like crazy, right? You go ask people that we talk to all the time,

45:36

like Leopold, for example, and they're like, well, let's have the price go up, right? Because they can. The margin is there. You can take the margin. NVIDIA takes the margin, memory players are taking the margin, but ASML has never raised the price more than they've increased the capability of the tool.

45:51

And so in a sense, they've always provided a net benefit to their customers. It's not that the tool is stagnant. Yes, these tools are old, but you can upgrade them somewhat, and new tools are coming. For simplicity's sake, we're ignoring the advances in overlay or throughput per tool for this podcast.

46:07

So you say we're producing sixty of these machines this year and then seventy, eighty over subsequent years. What would happen if ASML just decided to double or triple its CapEx? What is preventing them from producing more than a hundred in 2030? Why are you so confident that even five years out, you can be relatively sure what their production will be?

46:29

So I think there are a couple of factors here, right? ASML has not decided to just go YOLO and expand capacity as fast as possible, right? In general, the semiconductor supply chain has not, right? It's lived through the booms and busts, and we can talk a bit more about that, but basically,

46:45

some players have very recently woken up, but in general, no one really sees demand for two hundred gigawatts a year of AI chips, or trillions of dollars of spend a year in the semiconductor supply chain. They're just not AI-pilled, right? They're not AGI-pilled.

47:02

...going to get to a trillion dollars this year.

47:04

Yeah. I feel you, but I'm saying no one in the supply chain really understands this. We're constantly told our numbers are way too high. And then when they turn out right, they're like, oh yeah, yeah, but your next year's numbers are still too high. But anyway, ASML

47:20

has a tool with four major components, right? It has the source, which is made by Cymer in San Diego. It has the reticle stage, which is made in Wilton, Connecticut, right? It has the wafer stage and the optics, right, the lenses and such, and those two are made in Europe, right? And when you look at each of these four, they're tremendously complex supply chains

47:45

that, A, they have not tried to expand massively, and B, when they do try to expand them, the time lag is quite long, right? Again, this is the most complicated machine that humans make, period, right? At any sort of volume. But let's talk about the source specifically, right? What is the source?

48:07

It drops tiny tin droplets and hits each one three successive times with a laser, perfectly. The first pulse hits the tin droplet and it expands out, it hits it again so it expands out to this perfect shape, and then it blasts it at super high power. And the tin droplets get excited enough that they release

48:21

EUV light, 13.5 nanometers, and it's inside this thing that is basically collecting all the light and directing it into the lens stack, right? Then you have the lens stack, which is Carl Zeiss, right, as you mentioned, and some other folks, but Zeiss is the most important part of it.

48:36

They also have not tried to expand production capacity, because they're like, oh yeah, yeah, we're growing a lot because of AI. We're going from sixty to a hundred.

48:45

Right. It's like, no, no, no, we need to go to a couple hundred, but it's fine, whatever. Each of these tools has, I think, eighteen of these lenses, effectively. Mirrors. They're multilayer mirrors, which are perfect layers of molybdenum and silicon

49:02

stacked on top of each other in many layers, and then the light bounces off of it perfectly. But when we think about a lens, it's in a shape and it focuses the light. This is like a mirror that's also a lens, so it's pretty complicated.

49:16

Any defect in these super thinly deposited stacks will mess it up, and so will any curvature issues. There are a lot of challenges with scaling the production. It's quite artisanal in this sense, right? Because you're not making

49:31

tens of thousands of these a year; you're making hundreds, maybe a thousand, right? At 60 tools a year and 18 of these per tool, you're at roughly a thousand of these lenses and projection optics a year.

49:50

Then you step forward to the reticle stage, which is also something really crazy. This thing moves at, I want to say, nine Gs, because as you step across a wafer, the tool has to shift quickly. And the wafer stage is complementary; it's the wafer part. So you line these two things up, and you're taking all the light that's focused through the lenses,

50:12

and here's the reticle, here's the wafer: the reticle's moving in one direction and the wafer's moving in the other direction as it scans a 26 by 33 millimeter section of the wafer, and then it stops. It shifts over to another part of the wafer and does it again. And it does that in just seconds, right? And each of them is moving at nine Gs in opposite directions.
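To get a feel for the stepping across the wafer described above, the sketch below counts how many complete 26 by 33 millimeter fields fit on a 300 mm wafer. It is a simplified model under our own assumptions: a rectangular grid with one field corner at the wafer center, no edge exclusion, and no partial edge fields, which real scanners do expose and whose layout is optimized.

```python
import math


def full_fields_on_wafer(diameter_mm: float, field_w_mm: float, field_h_mm: float) -> int:
    """Count grid-aligned rectangular fields lying entirely within a circular wafer.

    Simplified model: the grid has a field corner at the wafer center, and only
    fields whose four corners are all inside the wafer are counted.
    """
    r = diameter_mm / 2
    nx = math.ceil(r / field_w_mm)  # columns needed on each side of the center
    ny = math.ceil(r / field_h_mm)  # rows needed on each side of the center
    count = 0
    for i in range(-nx, nx):
        for j in range(-ny, ny):
            x0, y0 = i * field_w_mm, j * field_h_mm
            corners = [(x0, y0), (x0 + field_w_mm, y0),
                       (x0, y0 + field_h_mm), (x0 + field_w_mm, y0 + field_h_mm)]
            # A rectangle lies inside a circle exactly when all four corners do.
            if all(math.hypot(x, y) <= r for x, y in corners):
                count += 1
    return count


print(full_fields_on_wafer(300, 26, 33))  # 64
```

About sixty full fields per wafer is consistent with "dozens of times across the whole wafer", and at 75 wafers per hour each wafer gets under a minute in the scanner.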

50:34

So each of these things is a wonder and marvel of chemistry, fabrication, mechanical engineering, and optical engineering, because you have to align all these things and make sure they're perfect. All these things have crazy amounts of metrology because you have to test everything perfectly; if anything is messed up, the yield goes to zero,

50:56

because this is such a finely tuned system. And by the way, it's so large that you're building it in the factory in Veldhoven, Netherlands, then deconstructing it, shipping it on many planes to the customer site, reassembling it there, and testing it again. And that process takes many, many months. There are just so many steps in the supply chain, right? Whether it's Zeiss making their lenses and projection optics

51:21

or Cymer, which is an ASML-owned company, making the EUV source. And each of these has its own complex supply chain, right? ASML has commented that their supply chain has over ten thousand people in it, right?

51:31

Like individual suppliers.

51:32

Yes. And it might not be directly; it might be through, hey, Zeiss has so many suppliers, and XYZ company has so many suppliers. But if you just think about it, okay, you're talking about two physically moving objects that are like this large and this large, the size of a wafer, right? And it has to be accurate to the level of

51:53

single-digit nanometers or even smaller, because for the entire system, the overlay, right, the layer-to-layer variation has to be on the order of three nanometers, right? And if the overlay is three nanometers, the accuracy of each individual part's physical movement has to be even tighter than that, right? It has to be sub-one-nanometer in most cases, because the errors of these things stack up, right? And so there's no way to

52:21

just snap your fingers and increase production, right? Take something as simple as power, right? The US going from zero percent power growth to two percent power growth, even though China's already at thirty, was so hard for America to do, right? And that's a really simple supply chain, with very few people in it who make difficult things. And there are

52:42

probably, what, a hundred thousand or more electricians and other people who work in the electricity supply chain in the US. And when you look at it, ASML employs so few people, and Carl Zeiss probably employs fewer than a thousand people working on this. All of those people are super, super specialized. So you can't just train random people up for this.
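The point that each moving part must sit well below the 3 nm overlay spec follows from how errors combine. A minimal error-budget sketch, assuming independent contributors that add in quadrature (root-sum-square, a common error-budgeting convention, not something stated in the conversation) and a hypothetical count of nine equal contributors such as the reticle stage, wafer stage, and lens distortion:

```python
import math


def rss(errors_nm: list[float]) -> float:
    """Combine independent error terms in quadrature (root-sum-square)."""
    return math.sqrt(sum(e * e for e in errors_nm))


def equal_share_rss(total_nm: float, n: int) -> float:
    """Largest equal per-contributor error whose RSS stays within total_nm."""
    return total_nm / math.sqrt(n)


def equal_share_worst_case(total_nm: float, n: int) -> float:
    """Same budget if errors are assumed to add linearly (worst-case stacking)."""
    return total_nm / n


OVERLAY_NM = 3.0    # quoted layer-to-layer overlay spec
N_CONTRIBUTORS = 9  # hypothetical number of independent error sources

per_part_rss = equal_share_rss(OVERLAY_NM, N_CONTRIBUTORS)
per_part_linear = equal_share_worst_case(OVERLAY_NM, N_CONTRIBUTORS)

print(f"per-part budget (RSS):        {per_part_rss:.2f} nm")     # 1.00 nm
print(f"per-part budget (worst case): {per_part_linear:.2f} nm")  # 0.33 nm
print(f"check: {rss([per_part_rss] * N_CONTRIBUTORS):.2f} nm total")  # 3.00 nm
```

Either way, each contributor ends up at or below about one nanometer; more contributors or correlated errors tighten the budget further.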

53:03

Not in the snap of a finger. You can't just get your entire supply chain galvanized, right? NVIDIA has had to do a lot to get the entire supply chain to even deliver the capacity they're going to make this year, and even so, when you go talk to Anthropic, they're like, well, we're short of TPUs, we're short of Trainium, we're short of GPUs. When you go talk to OpenAI, they're like, we're short of these things, right? So OpenAI and Anthropic know they need X.

53:25

NVIDIA is not quite as AGI-pilled, and they're building X minus one. And as you go down the supply chain, everyone's doing minus one. In some cases they're doing divided by two, right? Because they're not AGI-pilled, right? And so you end up with a time lag for this whip to react, right? The

53:46

AI-pilledness and desire to increase production take so long to propagate. And then, once they finally understand, hey, we need to increase production rapidly, right? They think they understand: oh, AI means we have to go from sixty to a hundred.

53:59

In addition to the tools all getting better and faster, the source going from five hundred watts to a thousand, and all these other aspects of the supply chain advancing technically, plus the increase in production, they think they're actually increasing production a lot.

54:12

But if you flow through the numbers of, hey, what does Elon want? He wants a hundred gigawatts a year in space by twenty twenty-eight, is it? Or twenty twenty-nine. And Sam Altman wants fifty-two gigawatts a year by the end of the decade. And you look at, you know, Anthropic needs the same, and then Google needs that. You go across the supply chain, and it's like:

54:33

Wait, no, the supply chain can't possibly build enough capacity for everyone to get what they want on the compute side.

54:41

Real conversations are full of fits and starts and pauses and interruptions. I mean, just listen to this episode. At least superficially, voice models have gotten pretty good at handling these kinds of things. But at a deeper level, interruptions can throw off a model's understanding and degrade the quality of its response.

54:56

And it's not always clear why. Labelbox realized that this was a huge bottleneck for their customers, so they built an evaluation pipeline called Echo Chain to help you diagnose and fix your voice model's specific failure modes. Echo Chain starts by feeding conversations into your voice model, then injects interruptions at specific intervals and classifies any failures into one of three modes. One, did it acknowledge the correction but keep the old plan?

55:18

Two, did it adapt briefly but then slide back to old assumptions? Or three, did it abandon the old task entirely? This is extremely useful information, because Labelbox can get your model the exact data it needs to fix whatever issue is preventing it from being a viable and competent voice model. So if you want to ensure that your voice model stays performant in real conversations, you should reach out to Labelbox. Go to labelbox.com/dwarkesh.

55:44

So I feel like in the data center supply chain, for the last few years, people have been making arguments that there's this specific thing we're bottlenecked by, and therefore AI compute can't scale more than X.

⁠¶ Can't we just use TSMC's older fabs?

55:57

But then, as you've written about: oh no, if, say, the grid is a bottleneck, then we just do behind-the-meter power on site, we do gas turbines, et cetera. If that doesn't work, there are all these other alternatives that people fall back on. And I want to ask you whether we can imagine a similar thing happening

56:16

in the semiconductor supply chain. So if EUV becomes a bottleneck, well, what if we just went back to seven nanometer and did what China is doing currently, producing seven-nanometer chips with multi-patterning on DUV machines? And if you look at a seven-nanometer chip like the A100, there's obviously been a lot of progress from the A100 to the B100 or B200. But

56:42

how much of that progress is just numerics? If you just hold numerics constant, say FP16, from A100 to B100: the B100 is a little over one petaflop, and the A100 is like three hundred teraflops.

56:58

So holding numerics constant, you have basically a three-x improvement from A100 to B100. And some of that is the process improvement, and some of that is just the accelerator design improving, which we could replicate again in the future. So then it seems like the process improving from seven nanometer to four nanometer is actually a very small effect. So, I don't know, say we have

57:23

I don't know the numbers offhand, but let's say there's a hundred and fifty K wafers per month of three nanometer, and eventually similar amounts for two nanometer. But then there's a similar amount for seven nanometer, right? So you have all those old wafers. And then there's maybe a fifty percent haircut because the process, you know, the transistor density per wafer area is like

57:41

What is it, fifty percent less or something? Then it doesn't seem that bad to just bring on seven-nanometer wafers, and oh, that gives you another fifty or a hundred gigawatts. Yeah, tell me why that's naive.
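The interviewer's argument reduces to two ratios. Here is a minimal Python sketch of that arithmetic, using only the rough figures quoted in this exchange (about 312 TFLOPS of dense FP16 for A100 and about 1,000 for Blackwell) plus his explicitly hypothetical 150K wafers per month and 50% density haircut. The function and constant names are illustrative, and this is not a capacity model:

```python
# Rough figures quoted in this exchange (FP16, dense); not official specs.
A100_FP16_TFLOPS = 312.0
BLACKWELL_FP16_TFLOPS = 1000.0


def flops_ratio(new_tflops: float, old_tflops: float) -> float:
    """Generation-over-generation throughput ratio with numerics held constant."""
    return new_tflops / old_tflops


def leading_edge_equivalent_wafers(old_node_wafers: float, density_haircut: float) -> float:
    """Express older-node wafers as leading-edge-equivalent wafers.

    density_haircut is the assumed fractional loss in useful output per wafer
    (the interviewer's guess is ~0.5 for 7 nm versus 3 nm).
    """
    return old_node_wafers * (1.0 - density_haircut)


if __name__ == "__main__":
    ratio = flops_ratio(BLACKWELL_FP16_TFLOPS, A100_FP16_TFLOPS)
    print(f"A100 -> Blackwell, FP16: {ratio:.1f}x")
    # Hypothetical: 150K wafers/month of 7 nm with a 50% haircut.
    equiv = leading_edge_equivalent_wafers(150_000, 0.5)
    print(f"150K 7 nm wafers/month ~ {equiv:,.0f} leading-edge-equivalent wafers/month")
```

Dylan's answer is that the first ratio is the wrong yardstick, since system-level inference performance depends on far more than per-chip flops.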

57:54

Yeah, so I think we potentially do go crazy enough that this happens, because we just need incremental compute and the compute is worth the higher cost, power, et cetera, of these chips. But it's also unlikely, to a large extent, because some of these are not fair comparisons, right? For example, from A100, which is three hundred twelve teraflops, to Blackwell, which is like a thousand

58:24

ish, of FP16, or maybe it's two thousand, and then Rubin is like five thousand or so FP16. It's not a fair comparison, because these chips have vastly different design targets, right? With A100, what NVIDIA optimized for was FP16 and BF16 numerics. When you look at Hopper,

58:47

they didn't care as much about that; they cared about FP8. When you look at Rubin, they don't care about FP16 and BF16 as much. They care mostly about FP4 and FP6, right? Numerics are what they've designed their chip for.

59:03

So, okay, let's just say we redesign: let's make a new chip design on seven nanometer. Sure, we can do that, and it's optimized for the numerics of the modern day. The performance difference is still going to be much larger than the flops difference you mentioned, right? It's often easy to boil things down to flops per watt or flops per dollar.

59:25

But that's actually not a fair comparison, right? And so this is where you can bring in, hey, let's look at Kimi K2.5 and DeepSeek. When you look at these two models and their performance on Hopper versus Blackwell with very optimized software, you get vastly different performance, right? And

59:49

most of this is not attributable to flops or to numerics, right? Because those models are actually eight-bit. Hopper and Blackwell are both optimized for eight-bit, and Blackwell isn't really taking advantage of its four-bit support there.

01:00:02

So the performance gulf is actually much larger. And the way you can compare them and think about them is: sure, it's one thing to shrink process technology, make the transistors smaller, and give each chip X number of flops.

01:00:16

But you forget the big gating factor, which is that these models don't run on a single chip. They run on hundreds of chips at a time, right? If you look at DeepSeek's production deployment, which is well over a year old now, they were running on 160 GPUs, right? That's what they serve production traffic on. So they split the model across 160 GPUs.

01:00:33

Every time you cross the barrier from one chip to another, there is an efficiency loss, because you now have to transmit over high-speed electrical SerDes. There's a latency cost, there's a power cost, there are all these dynamics that hurt. As you shrink and shrink and shrink the process node, you

01:00:51

increase the amount of compute in a single chip. On chip, movement of data is at least tens of terabytes a second, if not hundreds of terabytes a second.

01:01:03

Whereas between chips, you're on the order of a terabyte a second, right? And you can only put so many chips physically close to each other, so you have to put chips in different racks. The data rate between racks is on the order of hundreds of gigabits a second, right? 400 gig or 800 gig.

01:01:24

So roughly a hundred gigabytes a second. And so you've got this huge ladder: on chip, I can communicate at super-fast speeds. Within the rack, I can communicate at an order of magnitude lower. Outside the rack, I can communicate at yet another order of magnitude lower than that. And as you break the bounds of chips,

01:01:41

you end up with this performance loss. So anyway, the reason I explain this is that when you look at Hopper versus Blackwell, even if both of them are using a rack's worth of chips, Hopper is significantly slower, because the amount of performance that you have

01:01:54

to leverage on the task within each domain, tens of terabytes a second of communication between processing elements on a chip, and terabytes a second between chips, is much, much higher on Blackwell, and therefore the performance is much higher. So when you look at inference at, let's say, 100 tokens a second for DeepSeek and Kimi K2.5, Hopper versus Blackwell, the performance difference is on the order of 20x.
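The bandwidth ladder Dylan describes can be made concrete with a few unit conversions. This sketch assumes rounded, order-of-magnitude values: 10 TB/s on chip (the low end of "tens of terabytes a second"), about 1 TB/s chip to chip, and an 800 Gb/s rack-to-rack link. The names are hypothetical, and real systems vary widely:

```python
def gbit_to_gbyte(gbit_per_s: float) -> float:
    """Convert a link rate from gigabits/s to gigabytes/s (8 bits per byte)."""
    return gbit_per_s / 8.0


# Order-of-magnitude tiers from the conversation, in GB/s. The on-chip value
# uses the low end of "tens of terabytes a second"; all values are rounded.
LADDER = [
    ("on-chip", 10_000.0),
    ("chip-to-chip, same rack", 1_000.0),
    ("rack-to-rack, 800 Gb/s link", gbit_to_gbyte(800.0)),
]


def step_down_factors(ladder):
    """Bandwidth ratio between each tier and the next, slower tier."""
    return [
        (fast[0], slow[0], fast[1] / slow[1])
        for fast, slow in zip(ladder, ladder[1:])
    ]


if __name__ == "__main__":
    for fast, slow, factor in step_down_factors(LADDER):
        print(f"{fast} -> {slow}: ~{factor:.0f}x less bandwidth")
```

Each step off the chip, and then out of the rack, costs roughly another 10x in bandwidth, which is why splitting a model across more chips carries a performance penalty.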

01:02:21

Interesting.

01:02:21

Not two or three x, like the flops difference indicates, even though those are on the same process. There are just differences in networking technologies and what they've worked on. And you can translate some of these back. But when you look at what they're doing with Rubin on three nanometer, some of these things are just not possible to do all the way back at the A100 level, even if you make a new chip for

01:02:40

Interesting.

01:02:41

seven nanometer. There are certain architectural improvements you can port, and certain ones you cannot. So the performance difference is not just going to be the difference in flops. It's, in some sense, cumulative: the differences in flops per chip, networking speed between chips, how many flops are on a chip versus a system, and memory bandwidth on a single chip and across an entire system all compound.
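A toy illustration of why per-dimension gains compound rather than add. It assumes, generously, that each improvement scales end-to-end throughput independently; in practice a gain only pays off when it relieves the current bottleneck. The factors are hypothetical placeholders, not measurements of any chip:

```python
import math


def compounded_speedup(factors):
    """Total speedup if each independent improvement scales end-to-end throughput."""
    return math.prod(factors)


def naive_additive_speedup(factors):
    """What you would get by (incorrectly) adding the incremental gains."""
    return 1.0 + sum(f - 1.0 for f in factors)


if __name__ == "__main__":
    # Purely hypothetical gains, NOT measured Hopper/Blackwell numbers:
    # flops per chip, chip-to-chip bandwidth, memory bandwidth.
    gains = [2.0, 2.0, 2.0]
    print(compounded_speedup(gains))      # 8.0
    print(naive_additive_speedup(gains))  # 4.0
```

Under this assumption, three modest 2x gains multiply to 8x, which is how a system-level gap can end up far larger than the per-chip flops ratio.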

01:03:03

Can I ask you a very naive question? So as of last year, the B200 has two dies on a single chip, so you can get that bandwidth on a single chip without having to go through NVLink or InfiniBand. And then next year, Rubin Ultra will have four dies on one chip. What's preventing us from just doing more of that? How many dies could you have on a single chip and still get these tens of terabytes a second?

01:03:27

Yeah, so even within Blackwell, there are differences in performance when you're communicating within a die versus across the package. Those bounds are obviously much smaller than when you go outside the entire package, but they exist. And so anyway, when you scale the number of chips

01:03:48

up, there is some performance loss. It's not perfect, but it is way better than going across entire packages. Now, how large can advanced packaging scale? The way NVIDIA's doing it is CoWoS, and the way Google with Broadcom and MediaTek, and Amazon with Trainium, all these chips are doing it is also CoWoS. But actually you can go and look back at

01:04:10

what Tesla did with Dojo, right? Dojo, which they canceled and restarted, I don't know. Anyway, Dojo was a chip the size of an entire wafer. They had 25 chips on it. And there were some trade-offs, right? They couldn't put HBM on it. But the positive side was that they had 25 chips on it. And so to date, it's still probably the best chip for running convolutional neural networks.

01:04:34

It's just not great at transformers, because the shape of the chip, the memory, the arithmetic, all these various specifications are not well suited for transformers. They're well suited for CNNs. And so Dojo chips were optimized around that, and they made a bigger package. But at the same time,

01:04:54

as you make packages bigger and bigger, you have other constraints, right? Networking speed, memory bandwidth, cooling capabilities, all of these things start to rear their heads. It's not simple, but yes, you will see a trend line of more chips on the package. And yes, you're going to be able to do that on seven nanometer. In fact, that's what Huawei did with their Ascend 910C or D. They were initially just one die, and then they did two.

01:05:19

And they're focusing on scaling the packaging up, because that's an area where they can advance faster than process technology, where they can't shrink. But at the end of the day, that's something you can do on leading-edge chips too, right? Anything you do on seven nanometer in terms of packaging, you can probably also do on three nanometer.

01:05:36

So if you end up in this world in twenty thirty where the West has the most advanced

⁠¶ When will China outscale the West in semis?

01:05:41

process technology but has not ramped it up as much, whereas China, I don't know if you think by twenty thirty they'd have EUV and, I don't know, two nanometer or whatever, but they're semiconductor-pilled, so they're producing in mass quantity. Basically I'm wondering what the year is where there's a crossover, where our advantage in process technology has faded enough and their advantage in scale has increased enough.

01:06:06

And also their advantage in having one country with the entire supply chain indigenized, rather than random suppliers in Germany and the Netherlands and wherever, would mean that China would be ahead in its ability to mass-produce flops.

01:06:21

Yeah, so to date, China still does not have an entirely indigenized semiconductor supply chain, right?

01:06:28

But will they in twenty thirty?

01:06:29

Yeah, by twenty thirty it's possible that they do. But to date, all of China's seven-nanometer and fourteen-nanometer capacity uses ASML DUV tools, right? And the amount they can import from ASML is large. But the point is that the vast majority of ASML's revenue, and on EUV all of it, is outside of China. So the scale advantage is still in favor of, let's call it, the West plus Taiwan, Japan, et cetera.

01:07:01

They're trying to do all these things. The question is how fast they can advance and scale up production as well as quality, and to date we haven't seen that. Now, I'm quite bullish that they're going to be able to do these things over the next five to ten years, right? Really scale up production, really kick it into high gear. They have more engineers working on it. They have more desire to throw capital...

01:07:24

By twenty thirty, do they have fully indigenized DUV?

01:07:27

I think for sure, for sure.

01:07:28

Yes. And fully indigenized EUV by twenty thirty?

01:07:30

I think they'll have working tools. I don't think they'll be able to manufacture a bunch yet, right? There's having it work, and then there's production hell, right? Ultimately, ASML had EUV working in some capacity in the early 2010s, right? But the tools were not accurate enough. They were not

01:07:51

scaled for high-volume manufacturing, or reliable enough. And then they had to ramp production, and that all took time. Production hell takes time, right? Which is why it took another five to seven years to get EUV into mass production at a fab, rather than just working in the lab.

01:08:06

So how many DUV tools do you think China will manufacture in twenty thirty?

01:08:13

Well, that's a great question. It's a bit of a challenge to look into this supply chain especially; we try really hard. In some instances they're buying stuff from Japanese vendors, and if they want to fully indigenize the supply chain, they need to not buy these lenses or projection optics or

01:08:37

stages from Japanese vendors; they need to build them internally. So it's really tough to say where they'll be able to get to. Honestly, I think it's a shot in the dark. But it's not unlikely that they'll be able to do on the order of a hundred DUV tools a year, whereas ASML is doing hundreds of DUV tools a year currently. No one's made

01:08:57

a process node, no company has a process node where they make a million wafers a month, right? Elon says he wants to do it, and China's obviously going to do it, right? And I don't think TSMC is trying to do that. The memory makers may get there as well, right? To a million wafers a month, but not in a single fab. It's sort of mind-boggling to think of that scale, and challenging to... Yeah.

01:09:26

So I'm not sure. I don't want to doubt China's capability to scale, right?

01:09:29

I guess this is an interesting question, and at some point SemiAnalysis will do the deep dive on this. But this question of by when indigenized Chinese production could be bigger than the rest of the West combined, if you just add up

01:09:49

all of it and put in as the inputs to your model when they'll have DUV machines at scale, when they'll have EUV machines at scale. Because I think there's this question of: if you have long timelines on AI, long meaning twenty thirty-five, which is not that long in the grand scheme of things.

01:10:01

Should you expect a world where China's just dominating in semiconductors? Which I think doesn't get asked enough, because in San Francisco we're thinking on time scales of, like, weeks. And if you're outside of San Francisco, you're not thinking about AGI at all.

01:10:15

And so this question of: okay, what if we have AGI? What if you have this transformational thing that is commanding tens or hundreds of trillions of dollars of economic growth and token output and so forth, but it happens in twenty thirty-five? What does that imply for the West versus China? I don't know. SemiAnalysis has got to write the definitive model on this.

01:10:36

Yeah, so I think it's really challenging when you move timescales out that far, right? What we tend to focus on is: we're tracking every data center, we're tracking every fab, we're tracking all the tools and where they're going, but the time lags for these things are relatively short, right?

01:10:53

We can only make reasonably accurate estimates for data center capacity based on land purchasing and permits and turbine purchasing and all these things. And we know where all these things are going, and that's the data we sell, but

01:11:07

as you go out to twenty thirty-five, things are just so radically different, and your error bars get so large, that it's hard to make an estimate. But at the end of the day, if takeoff or timelines are slow enough, then certainly I don't see why China wouldn't be able to catch up drastically, right? In some sense we've got this valley, right, where,

01:11:32

call it three to six months ago, or maybe even now, Chinese models are as competitive as they've ever been. I think Opus 4.6 and GPT-5.4 have really pulled away and made the gap a little bit bigger, but I'm sure some new Chinese models will come out. But as we move from these companies selling tokens, where they provide the entire reasoning chain and all that, to selling automated

01:11:55

white-collar work, right? Automate a software engineer: send them the request, they give you the result back, and there's a bunch of thinking on the back end that they don't show you. Then the ability to distill out of American models into Chinese models will be harder. That's A. B

01:12:07

is the scale of the compute that the labs have, right? OpenAI exited last year with roughly two gigawatts. Anthropic will get to two-plus gigawatts this year. And by the end of next year, they'll both be at like ten gigawatts of capacity. China is not scaling their AI lab compute nearly as fast. And so at some point, when you can't distill the learnings from these labs

01:12:29

into the Chinese models, plus this compute race that OpenAI, Anthropic, Google, Meta, et cetera are all racing on, at some point the model performance should start to diverge more. And then all of this CapEx that's being spent on data centers and all that, right? Amazon two hundred billion, Google one eighty, so on and so forth; all these companies are spending hundreds of billions of dollars of CapEx.

01:12:56

There's nearly a trillion dollars of CapEx being invested in data centers in America this year, roughly, right? You end up with: okay, well, what's the return on invested capital here? You and I would think that the return on invested capital for data center CapEx is very high. At least if we look at Anthropic's revenues: in January they added like four billion, and in February, which is a shorter month, they added like six.

01:13:21

We'll see what they can do in March and April, given compute constraints are what's bottlenecking their growth, right? The reliability of Claude Code is actually quite low because they're so compute-constrained. But if this continues, then the ROIC on these data centers is super high. Yeah. And at some point the US economy starts growing faster and faster,

01:13:40

this year and next year, because of all this CapEx and all this revenue that these models are generating, and the downstream supply chain, whereas China doesn't have that yet, right? They have not built the scale of infrastructure to invest in models to get to these capabilities, and then to deploy these models at such scale, right? Because when you look at Anthropic, hey, they're at, call it, twenty billion ARR.

01:14:04

Of that, the margins are sub-fifty percent, at least as last reported by The Information. So then you're at, okay, that's like thirteen, fourteen billion dollars of compute that it's running on, rental-cost-wise, which is actually like fifty billion dollars' worth of CapEx that someone laid out for Anthropic to generate their current revenue. And China has just not done this.

01:14:25

If and when Anthropic 10x's its revenue again, and I think our answer would be when, not if, China doesn't have the compute to deploy at that scale. And so there is some sense of, oh, we're in fast takeoff-ish, right? And it's not like we're talking about a Dyson sphere by X date. It's more like the revenue is compounding at such a rate that it does affect economic growth, and the resources these labs are gathering are growing so fast that

01:14:51

you know, China hasn't done that yet. So in that case, the US and the West are actually diverging. The flip side is that these infrastructure investments actually have middling returns. Maybe they're not as good as hoped.

01:15:03

Maybe Google is wrong for wanting to take free cash flow to zero and spend three hundred billion dollars on CapEx next year. Maybe they're just wrong, and people on Wall Street who are bearish and people who don't understand AI are correct, right? In which case the US is building all this capacity, it doesn't get really great returns, and China's able to build the fully vertical, indigenized supply chain, rather than the US,

01:15:26

Japan, Korea, Taiwan, Southeast Asia, Europe, all these countries together building this less vertical supply chain. And at some point China is able to scale past us, if AI takes longer to get to certain capability levels than, I would say, the vast majority of your guests on this podcast believe.
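Dylan's Anthropic arithmetic can be written out as a small sketch. It assumes compute is essentially all of cost of revenue, and it uses a 30% gross margin as an illustrative input chosen to match his "sub-fifty percent" remark and his $13-14B compute figure. The function names are hypothetical:

```python
def annual_compute_cost(arr_billion: float, gross_margin: float) -> float:
    """Compute spend implied by revenue and gross margin, assuming compute
    is essentially all of cost of revenue (a simplification)."""
    return arr_billion * (1.0 - gross_margin)


def capex_to_rent_multiple(capex_billion: float, annual_rent_billion: float) -> float:
    """How many years of rent the up-front data center CapEx corresponds to."""
    return capex_billion / annual_rent_billion


if __name__ == "__main__":
    # ~$20B ARR from the conversation; the 0.30 margin is an assumption chosen
    # to be consistent with "sub-fifty percent" and the $13-14B compute figure.
    rent = annual_compute_cost(20.0, 0.30)
    print(f"implied compute rent: ~${rent:.0f}B/yr")
    # ~$50B of CapEx, per Dylan's estimate.
    print(f"CapEx / annual rent: ~{capex_to_rent_multiple(50.0, rent):.1f}x")
```

On these inputs, roughly $14B a year of rented compute sits on about 3.6 years' worth of up-front CapEx, which is why a further 10x in revenue implies a very large build-out.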

01:15:48

It's like fast timelines, US wins, long timelines, China wins. Right.

01:15:51

But I don't know what fast timelines means, right? I don't think you have to believe in AGI to have the timelines where the US wins.

01:16:00

Okay. Let's go back to memory, because I think people on Wall Street and people in the industry understand how big this is, but maybe people generally don't understand how big a deal this is.

⁠¶ The enormous incoming memory crunch

01:16:10

So we've got this memory crunch, as you're talking about. And earlier I was asking about, oh, could we solve the EUV tool shortage by going back to seven nanometer? So let me ask a similar question about memory. HBM is made of DRAM but has three to four times fewer bits per wafer area than the DRAM it's made out of.

01:16:28

Is it possible that accelerators in the future could just use commodity DRAM and not HBM, so that we can make much more capacity out of the DRAM we get? And the reason I think this might be possible is: look, if we're going to have agents that are just going off and doing work, and it's not a synchronous chatbot application, then you don't necessarily need extremely low latency anymore. And so maybe you can have lower bandwidth,

01:17:00

because the reason you stack DRAM and make HBM is for higher bandwidth. So is it possible to go to non-HBM accelerators and basically have the opposite of Claude Code fast, like have Claude Code slow and

01:17:15

Mm-hmm.

01:17:15

And do that.

01:17:16

Yeah. I think at the end of the day, the incremental purchaser who's willing to pay the highest price for tokens also ends up being the one that's less price-sensitive, and in a capitalistic society compute should be allocated towards the goods that have the highest value, and the private market determines this by willingness to pay. And so to some extent,

01:17:36

sure, Anthropic could actually release a slow mode, right? They could release Claude Slow Mode and increase tokens per dollar by a significant amount. They could probably reduce the price of Opus 4.6 by four or five x and reduce the speed by maybe just two x. The curve of inference throughput versus speed is there already, just on HBM. And yet they don't,

01:18:01

because no one actually wants to use a slow model. And furthermore, on these agentic tasks,

01:18:07

it's great that the model can run at this time horizon of hours. But if the model were just running slower, those hours would become a day, right? Or vice versa: if the model's running faster, those hours become an hour. And yet no one really wants to move to that day-long wait, because the highest-value tasks also have some time sensitivity to them, right?
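The throughput-versus-speed trade-off Dylan mentions can be illustrated with a toy, memory-bandwidth-bound decode model. Everything here is an assumption for illustration (100 GB of weights read per step, 1 GB of KV cache per sequence, 8 TB/s of memory bandwidth), and it ignores compute, network, and capacity limits. Larger batches amortize the weight reads across more users, so aggregate tokens per second rises while each user's tokens per second falls:

```python
from dataclasses import dataclass


@dataclass
class DecodeSetup:
    """Toy memory-bandwidth-bound decode model (hypothetical numbers).

    Each decode step streams the weights once for the whole batch, plus each
    sequence's KV cache. Compute, network, and capacity limits are ignored.
    """
    weights_gb: float      # weight bytes read per step, in GB
    kv_gb_per_seq: float   # KV cache bytes read per sequence per step, in GB
    mem_bw_gb_s: float     # memory bandwidth, in GB/s

    def step_time_s(self, batch: int) -> float:
        return (self.weights_gb + batch * self.kv_gb_per_seq) / self.mem_bw_gb_s

    def per_user_tokens_s(self, batch: int) -> float:
        # Each sequence produces one token per decode step.
        return 1.0 / self.step_time_s(batch)

    def aggregate_tokens_s(self, batch: int) -> float:
        return batch / self.step_time_s(batch)


if __name__ == "__main__":
    setup = DecodeSetup(weights_gb=100.0, kv_gb_per_seq=1.0, mem_bw_gb_s=8000.0)
    for batch in (1, 16, 64, 256):
        print(f"batch={batch:4d}  per-user={setup.per_user_tokens_s(batch):6.1f} tok/s  "
              f"aggregate={setup.aggregate_tokens_s(batch):8.1f} tok/s")
```

Tokens per dollar tracks the aggregate column and user-perceived speed tracks the per-user column; a "slow mode" is simply a point further along this curve.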

01:18:30

And so I struggle to see it. Yes, you could use DDR, you could use regular DRAM, but there are a couple of things that are challenging with this, right? One is that you're still limited by one of the core constraints of chips: a chip is a certain size.

01:18:51

All of the I/O escapes on the edges of the chip, right? So oftentimes what you see is the left and the right of the chip are HBM; the I/O from the chip to the HBM is on the sides. And then the top and bottom are I/O to other chips, right? And so if you were to change from HBM to DDR,

01:19:11

then all of a sudden the I/O on this edge would have significantly less bandwidth, but significantly more capacity per chip. Yeah. And so yes, with HBM you're making fewer bits per wafer, but the metric that you actually care about is bandwidth per wafer, not bits per wafer.

01:19:30

Because the thing that is constraining the flops is just getting the next matrix in and out. And for that, you just need more bandwidth.

01:19:39

Yeah, getting the weights and the KV cache in and out, right. And in many cases, these GPUs are not running at full memory capacity. Obviously it's a system design thing, model-hardware-software co-design: how much KV cache do I keep? How much do I keep on the chip? How much do I offload to other chips and call back when I need it,

01:19:58

for tool calling or whatever? How many chips do I parallelize this across? The search space of this is very broad, which is why we have InferenceX, an open-source project that searches for the optimal inference points across eight different chips and a range of models. Anyway, the point is you're not always constrained by memory capacity.

01:20:22

You can be constrained by flops, by network bandwidth, by memory bandwidth, or by memory capacity. If you really simplify it down, there are four constraints. And each of these can break out into more, but

01:20:35

in this case, if you switch to DDR, yes, you produce four x the bits per DRAM wafer, but all of a sudden the constraints shift a lot, and your system design shifts a lot. You go slower, yes. Is the market smaller? Okay, maybe, possibly. But also, now all of a sudden all these flops are wasted because they're just sitting there waiting for memory. And it's like, great, I don't need all that capacity, because I can't really

01:20:56

increase batch size, because then the KV cache is going to take even longer to read. So, yeah.
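The four constraints Dylan lists can be sketched as a simple, roofline-style bottleneck check. This assumes memory capacity is a hard feasibility limit and that compute, memory traffic, and network traffic overlap perfectly, so the slowest of the three sets the step time. The class names and every number are hypothetical:

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Hardware:
    peak_flops: float     # FLOP/s
    mem_bw: float         # memory bandwidth, bytes/s
    net_bw: float         # off-chip network bandwidth, bytes/s
    mem_capacity: float   # bytes


@dataclass
class Step:
    flops: float           # FLOPs required for one step
    mem_bytes: float       # bytes moved to/from memory
    net_bytes: float       # bytes sent to other chips
    resident_bytes: float  # weights + KV cache that must fit in memory


def bottleneck(hw: Hardware, step: Step) -> Tuple[str, float]:
    """Return the binding constraint and the step time it implies.

    Capacity is a feasibility check; the other three are times, assumed to
    overlap perfectly, so the slowest one sets the step time.
    """
    if step.resident_bytes > hw.mem_capacity:
        return "memory capacity", float("inf")
    times = {
        "flops": step.flops / hw.peak_flops,
        "memory bandwidth": step.mem_bytes / hw.mem_bw,
        "network bandwidth": step.net_bytes / hw.net_bw,
    }
    name = max(times, key=times.get)
    return name, times[name]


if __name__ == "__main__":
    # Hypothetical accelerator and decode step, chosen only for illustration.
    hbm_hw = Hardware(peak_flops=2e15, mem_bw=8e12, net_bw=1e12, mem_capacity=192e9)
    step = Step(flops=1e12, mem_bytes=2e11, net_bytes=1e10, resident_bytes=1.5e11)
    print(bottleneck(hbm_hw, step))
    # DDR-like memory system: more capacity, ~20x less bandwidth, same flops.
    ddr_hw = Hardware(peak_flops=2e15, mem_bw=4e11, net_bw=1e12, mem_capacity=768e9)
    print(bottleneck(ddr_hw, step))
```

In the DDR-like case the chip keeps the same flops, but the step takes 20x longer, so those flops sit idle waiting on memory, which is the point Dylan makes about moving away from HBM.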

01:21:02

Interesting. What is the bandwidth difference between HBM and normal DRAM?

01:21:07

Yeah. So take a stack of HBM4. Let's just talk about the stuff that's in Rubin, because that's what we've been indexing on. It's 2048 bits, connected across an area that's about 13 millimeters wide, and it transfers at around 10 gigatransfers a second. So a stack of HBM4 is 2048 bits on an area that's 13 millimeters wide, roughly, or 11. And that's the shoreline you're taking on the chip.

01:21:33

And in that shoreline, you have 2048 bits transferring at 10 gigatransfers per second. You multiply those together and divide by eight bits per byte, and you're at roughly two and a half terabytes a second per HBM stack. When you look at DDR in that same area, it's maybe 64 or 128 bits wide. And DDR5 transfers at anywhere from 6.4 gigatransfers a second to maybe 8 gigatransfers, that is, 8,000 megatransfers a second.

01:22:01

So your bandwidth is significantly lower, right? It's 64 times 8,000 divided by 8, so you're at 64 gigabytes a second. And even if you take a generous interpretation of 128 bits times 8 gigatransfers, you're at 128 gigabytes a second for the same shoreline, versus two and a half terabytes a second. There's an order-of-magnitude difference in bandwidth per

01:22:23

edge area. And if your chip is square, or 26 by 33 millimeters, which is the maximum size for an individual die, you only have so much edge area. Then on the inside of that chip you put all your compute. There are things you can do to try to change that, right? More SRAM, more caching, blah, blah, blah. But at the end of the day, you're very constrained by bandwidth.
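The per-shoreline arithmetic above can be checked with a short sketch. It uses only the figures quoted in the conversation (2,048 bits at 10 GT/s for one HBM4 stack, 64 or 128 bits at 8 GT/s for DDR5) and computes peak interface bandwidth as bus width times transfer rate divided by 8 bits per byte; real sustained bandwidth will be lower.

```python
def peak_bandwidth_gbps(bus_width_bits: int, gigatransfers_per_s: float) -> float:
    """Peak interface bandwidth in GB/s: bits per transfer x transfers per second / 8."""
    return bus_width_bits * gigatransfers_per_s / 8

# Figures quoted in the conversation, for roughly the same ~11-13 mm of die edge.
hbm4_stack = peak_bandwidth_gbps(2048, 10.0)   # one HBM4 stack
ddr5_narrow = peak_bandwidth_gbps(64, 8.0)     # 64-bit DDR5 at 8 GT/s
ddr5_wide = peak_bandwidth_gbps(128, 8.0)      # generous 128-bit case

for name, bw in [("HBM4 stack", hbm4_stack), ("DDR5 64-bit", ddr5_narrow), ("DDR5 128-bit", ddr5_wide)]:
    print(f"{name:>13}: {bw:,.0f} GB/s")
print(f"HBM4 advantage over the generous DDR5 case: {hbm4_stack / ddr5_wide:.0f}x")
```

The HBM4 figure works out to 2,560 GB/s, the "roughly two and a half terabytes a second", and even the generous DDR5 case leaves a 20x gap for the same edge length.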

01:22:40

Interesting. So then there's a question of where you can destroy demand to free up enough memory for AI. And I guess the picture is especially bad because, as you're saying, it takes 4x more wafer area to get the same byte of HBM. You have to destroy 4x as much consumer demand for laptops and phones and whatever in order to free up one byte for AI. So yeah, what does this imply for the next year or two of

01:23:08

Sorry for the run-on question. I think in your newsletter you said thirty percent of big tech's 2026 capex is going towards memory. That's insane, right? Yeah. Like, of the six hundred billion or whatever, you're saying thirty percent is going just to memory?

01:23:23

And obviously there's some level of margin stacking that NVIDIA does, so you'd have to separate that out and apply their margin to the memory and the logic. But at the end of the day, yeah, like a third of their capex is going to memory.

01:23:33

That's crazy. Okay, so what's the question I'm trying to ask? It's something like: what should we expect over the next year or two as this memory crunch hits?

01:23:41

Yeah, so the memory crunch will continue to get harder and harder, and prices will continue to go up. And this affects different parts of the market differently, right? It gets to the question of whether people are going to hate AI more and more. Yes, because now smartphones and PCs are not going to get incrementally better year on year. In fact, they're going to get incrementally worse.

01:24:00

If you look at the bill of materials of an iPhone, what fraction of it is the memory? How much more expensive does an iPhone get if the memory is 2x more expensive, or whatever it has to be?

01:24:08

So I believe an iPhone has twelve gigabytes of memory. Each gig used to cost roughly three or four dollars, so it's about fifty bucks. But now the price of memory has roughly tripled. Let's call it twelve bucks per gig for DDR. So now you're talking about $150 versus $50, right?

01:24:27

That's a hundred-dollar cost increase for Apple. Also, Apple has some margin, and they're not just going to eat the margin. And that's just on the DRAM. NAND also has the same sort of market dynamic, so in fact it's probably a hundred-and-fifty-dollar increase on the iPhone.

01:24:41

Apple has to either (A) pass it on to the consumer, or (B) eat it. I don't see Apple reducing their margin too much. Maybe they eat a little bit. But at the end of the day, that means the end consumer is paying $250 more for an iPhone. And that's just comparing last year's memory pricing with today's. Now, there is some lag before Apple feels the heat, because they have tended to have three-month, six-month, or year-long contracts for a lot of memory.
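The jump from a roughly $150 BOM increase to roughly $250 at retail follows if the seller keeps its gross margin constant, since price = cost / (1 - margin). The sketch below uses the per-GB prices quoted above; the $50 NAND increase and the 40% gross margin are assumptions chosen to match the conversation's totals, not Apple figures.

```python
def retail_increase(bom_increase: float, gross_margin: float) -> float:
    """Price rise that keeps gross margin constant, using price = cost / (1 - margin)."""
    return bom_increase / (1.0 - gross_margin)

DRAM_GB = 12
old_dram_cost = DRAM_GB * 4.0    # ~$3-4/GB before the crunch (upper end used here)
new_dram_cost = DRAM_GB * 12.0   # ~$12/GB after prices roughly tripled
dram_increase = new_dram_cost - old_dram_cost   # $96, the "~$100" in the conversation

NAND_INCREASE = 50.0   # ASSUMPTION: NAND share implied by the "~$150 total" estimate
GROSS_MARGIN = 0.40    # ASSUMPTION: hypothetical per-device margin, not an Apple figure

bom_increase = dram_increase + NAND_INCREASE
print(f"DRAM: +${dram_increase:.0f}, BOM: +${bom_increase:.0f}, "
      f"retail: +${retail_increase(bom_increase, GROSS_MARGIN):.0f}")
```

Under these assumptions a $150 BOM increase becomes a $250 retail increase, and the slightly lower $146 computed here lands at about $243, in the same neighborhood.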

01:25:08

But at the end of the day, Apple gets hit pretty hard by this, though they won't really adjust until the next iPhone release. But that's the high end of the market, and it's only a few hundred million phones a year, right? Apple sells what, two to three hundred million phones a year? The bulk of the market

01:25:23

is the mid-range and low end, right? It used to be that 1.4 billion smartphones were sold a year. Now we're at like 1.1 billion. But our projections are that we maybe get down to like 800 million this year, and next year like 600 or 500 million, because

01:25:37

And we look at, you know, some data points out of China from our analysts in Asia: Singapore, Hong Kong, and Taiwan. They've been tracking this, and they see Xiaomi and Oppo cutting low-end and mid-range smartphone volumes by half.

01:25:52

Because yes, it's only a $150 BOM increase on a $1,000 iPhone, where Apple has a larger margin. But if we look at the cheaper phones, the percentage of the BOM that goes to

01:26:06

memory and storage is much larger and the margins are lower. So there's less capacity to even eat it in margin. And they have generally tended not to sign as many long-term agreements on memory. And why this is a big deal: say smartphone volumes halve. The halving will frankly happen in the low and mid range, not in the high end. So it's not like the bits released are halving, right? Currently consumer is more than half of memory

01:26:34

demand. Even if you halve smartphone volumes, the shape of the halving matters: the low end gets cut by more than half, and the high end gets cut by less than half. Because you and I will buy the high-end phones that cost north of a thousand dollars, even if they get a little more expensive. Apple's volumes will not go down as much as a low-end smartphone provider's. And the same applies to PCs. And what this does to the market is

01:26:56

quite drastic, right? DRAM gets released and goes to AI chips, whose buyers are willing to do longer-term contracts and pay higher margins, et cetera, because at the end of the day the margin they extract from the end user is much larger. And so this probably leads to

01:27:16

people hating AI even more, right? Today you already see all the memes on PC subreddits and gaming-PC Twitter: cat dancing videos, and it's like, this is why memory prices have doubled and you can't get a new gaming GPU, or you can't get a new desktop. And it's going to be even worse when memory prices double again, especially DRAM.
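The earlier point that halving smartphone units does not halve the DRAM bits released can be illustrated with a weighted sum. The segment mix below is entirely hypothetical (only the ~1.1 billion total echoes the conversation); it assumes low-end phones carry less DRAM and get cut harder than high-end phones.

```python
# HYPOTHETICAL segment mix: (units in millions, GB of DRAM per phone, fraction of units cut)
SEGMENTS = {
    "low end":   (450, 6, 0.65),
    "mid range": (350, 8, 0.55),
    "high end":  (300, 12, 0.15),
}

def unit_and_bit_drop(segments):
    """Fractional decline in phones shipped and in DRAM bits consumed."""
    units_before = sum(u for u, _, _ in segments.values())
    units_after = sum(u * (1 - cut) for u, _, cut in segments.values())
    bits_before = sum(u * gb for u, gb, _ in segments.values())
    bits_after = sum(u * (1 - cut) * gb for u, gb, cut in segments.values())
    return 1 - units_after / units_before, 1 - bits_after / bits_before

unit_drop, bit_drop = unit_and_bit_drop(SEGMENTS)
print(f"units: -{unit_drop:.0%}, DRAM bits: -{bit_drop:.0%}")
```

With these made-up inputs, units fall about 48% while DRAM bits fall only about 42%, because the surviving volume skews toward memory-heavy high-end phones.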

01:27:37

Another dynamic that's quite interesting is that it's not just DRAM, it's also NAND. NAND is also going up in price. Both of these markets have expanded capacity very slowly over the last few years, NAND almost not at all. But the percentage of NAND that goes to phones and PCs is larger than the percentage of DRAM that goes to phones and PCs. So as you destroy demand, mostly for DRAM's sake, you unlock

01:28:00

more NAND that gets reallocated and can go to other markets. And so the price increases for DRAM will be larger than those for NAND, because you've released more NAND from the consumer; in effect, you've produced more of that memory for AI.

01:28:17

Maybe you just explained it and I missed it: is it because SSDs are being used in large quantities for data centers, or

01:28:23

They are, but not in as large quantities as DRAM.

01:28:27

Okay, so you're saying NAND prices will also increase because it's being used in some quantity, but there's not as much demand from AI as there is for HBM. Makes sense. One thing I didn't appreciate until I was reading some of your newsletters is that the same constraints that are preventing logic scaling

01:28:43

over the next few years are quite similar to what's preventing us from producing more memory wafers. In fact, it's literally the exact same machine: this EUV tool is needed for memory too. So I guess there's a question somebody could be asking right now: why can't we just make more memory?

01:29:00

Is that somebody you know? So I think the constraints, as I was mentioning earlier, are not necessarily EUV tools today or next year. They become that as we get to the latter part of the decade. But currently the constraint is more that they physically just haven't built fabs, right? So over the last

01:29:21

three to four years, these vendors have just not built new fabs. That's because memory prices were really low and their margins were low. In fact, they were losing money on memory in 2023. So they were like, we're not building new fabs. Then the market slowly recovered over time, but it never really got amazing until last year.

01:29:39

In 2024 we were banging the drum that reasoning means long context, which means a large KV cache, which means a lot of memory demand. We've been talking about that for like a year and a half, two years. And people who understood AI

01:29:53

went really long memory then, right? And so you've seen that dynamic, but now it has finally played out in pricing. It took so long for what was obvious, right? Long context, the KV cache gets bigger, you need more memory. And for accelerators, half their cost is memory. So of course they're going to start going crazy on it.

01:30:13

It took a year for that to actually be reflected in memory prices. Once memory prices reflected it, it took another three to six months for the memory vendors to start building fabs. And those fabs take two years to build. So we don't have really meaningful fabs that you can even put these tools in until late 2027 or 2028.

01:30:33

So instead what you've seen is some really crazy stuff to get capacity, right? Micron bought a fab from a company in Taiwan that makes lagging-edge chips. SK Hynix and Samsung are doing some pretty crazy things to try to expand capacity at their existing fabs, which also have large knock-on effects in the economy. And so, why can't we build more

01:30:59

capacity? There's nowhere to put the tools, right? And it's not just EUV; there are other tools involved in DRAM and logic. In logic on N3, EUV is about 28% to 30% of the cost of the final wafer. When you look at DRAM,

01:31:16

it's in the teens. It's going up, but it's in the teens. So EUV is a much smaller percentage of the cost for DRAM. These other tools are also bottlenecks, although their supply chains are not as complex as ASML's. And so you see Applied Materials, Lam Research, and all these other companies also expanding capacity a lot. But anyway, you don't have anywhere to put the tools, because the most complex buildings people make are fabs, and fabs take two years to build.

01:31:42

You can think of Jane Street as a research lab with a trading desk. Their infrastructure team has built some of the biggest research clusters in the world, with tens of thousands of high-end GPUs, hundreds of thousands of CPU cores, and exabytes of storage. This compute is part of how Jane Street surfaces the hidden patterns embedded in incredibly noisy markets.

01:32:01

Even beyond the noise, the nature of the signal changes constantly in reaction to things like pandemics and elections and new regulations and even changes in sentiment. There's this unremitting game of trying to figure out whether your old models still reflect the real world, and if not, what to do about it.

01:32:15

If you're interested in working on this sort of thing, Jane Street is hiring ML researchers and engineers. They're also accepting applications for their summer ML internship program, with spots in London, New York, and Hong Kong. And if you happen to find yourself at GTC, which is happening the week after this episode drops, Jane Street's GPU performance team is giving a talk. Go to janestreet.com/dwarkesh to learn more.

01:32:38

I interviewed Elon recently, and his whole plan is that they're going to build this Gigafab, Terafab, some power of ten, and they're going to build a clean room. I won't even ask you about the dirty-rooms thing, but let's say they build the clean rooms. Okay, I have a couple of questions. One:

01:33:00

Do you think this is the kind of thing that Elon & Co. could build much faster than people are conventionally building it? This is not about building the end tools, just the facility itself. How complicated is it to build the clean room and do it extremely fast? Is this something that Elon, with his move-fast approach, could do much faster, if that's what we're bottlenecked on this year or next year?

01:33:18

And two, does that even matter if, in two years, your view is that we're not bottlenecked on clean room space but on tooling?

01:33:26

So I think, as with any complex supply chain, it takes time, and constraints shift over time. And even if something is no longer a constraint, that doesn't mean that market no longer has margin, right? For example, energy will not be a big bottleneck a couple of years from now. But that doesn't mean energy isn't growing super fast or that there's no margin there. It's just not the key bottleneck.

01:33:46

And in the space of fabs, clean rooms are the biggest bottleneck this year and next year. As we go out to 2028, 2029, 2030, there will still be constraints there. The thing about Elon is I think he's had a tremendous capability to garner physical resources and really smart people to build things.

01:34:06

And the way he's able to recruit really amazing people is to try to build the craziest stuff. In the case of AI, that hasn't really worked, because everyone's trying to build AGI; everyone's very ambitious. But in the case of "we're going to go to Mars, we're going to make rockets that land themselves," or "we're going to make fully autonomous cars that are electric," or "we're going to make humanoid robots," these are methods of recruiting

01:34:26

the people who think that's the most important problem in the world to work on that problem, because he's the only one trying really hard. In the case of semiconductors: "I want to make a fab that does a million wafers per month." No one has a fab that big. That's what he stated, right? He wants to make a million wafers a month. It's possible that he's able to recruit a lot of really awesome people and get them working heroically on this crazy task

01:34:45

of trying to build a fab that does a million wafers per month. Step one is to build the clean room, and I think that he probably can do. There's his mindset around deleting things: "It can be dirty, it's fine." Probably not right. Actually, I think it's 100% not right. You need the fab to be very clean. I think all of the air in the fab gets replaced like every three seconds.

01:35:09

It's that fast, and there are so few particles per... But I think he can build the clean room. It'll take a year or two, maybe. Initially it won't be super fast, but over time they'll get faster and faster at it. But then the really complex part is actually developing the process technology and building wafers. And I don't think he can develop that quickly. That takes a lot of built-up knowledge. Again, the most complicated integration of

01:35:32

very expensive tools and supply chains that's ever been done is a TSMC, an Intel, or a Samsung. And the other two of those companies aren't even that great, and they're still tremendously complex.

01:35:42

How surprised would you be if in 2030 there just happened to be some total disruption? We're not using EUV; we're using something that has much better effects, is much simpler to produce, and can be produced in much bigger quantities.

01:35:56

I'm sure to an industry insider that sounds like a totally naive question, but do you see what I'm asking? What probability should we put on something totally out of left field coming out and none of this being relevant?

01:36:07

Something that's very simple and easy to scale, I have very, very low probability for. There are a number of companies working on what are effectively particle accelerators or synchrotrons that generate light that's either 13.5 nanometers, like EUV, or even X-ray, an even narrower wavelength,

01:36:24

like seven nanometers or whatever, to then use in lithography tools. But those are massive particle accelerators generating this light, so they're very complicated to build. There are a couple of companies doing this, and I think that could be a big disruption to the industry

01:36:37

beyond what EUV is. I don't necessarily think we're going to just magically build something new that is direct-write and super simple and can be manufactured at huge volumes, although there are some attempts to do things like this.

01:36:50

Yeah. I ask because if you think about Elon & Co. in the past, rocketry was thought to be... I mean, it is incredibly complicated.

01:36:59

I'm just a naive yapper compared to Elon, right? What have I built? So maybe it's possible, right?

01:37:04

Yeah. In order to be able to build more memory in the future, could we build 3D DRAM the way we do 3D NAND, and then go back to DUV?

01:37:14

That's the hope, but currently everyone's roadmap for 3D DRAM is that you'll still use EUV, because you want that tighter overlay. When you're doing these subsequent processing steps, everything's vertically stacked and you have more layers on top of each other,

01:37:30

and you want the pitches to be tighter, and all these things. So generally people are still trying to use EUV, but what 3D would change is the calculation of how many bits a single EUV pass can make. That number would go up drastically if you go to 3D DRAM. That's the hope, but right now everyone's roadmap is that you go from the current

01:37:50

what's called a 6F² cell to a 4F² cell, and then finally 3D DRAM by the end of the decade or early next decade. So there's still a lot of R&D, manufacturing, and integration to be done. I wouldn't rule it out; I think it's very likely going to happen. It also is going to require a huge retooling of fabs, right?
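For a sense of what the 6F² to 4F² step buys on its own: a 6F² cell occupies six squares of the minimum feature size F, and a 4F² cell occupies four. Holding F fixed and ignoring periphery circuitry (both simplifying assumptions), the bit density of the cell array improves by

$$
\frac{\text{bit density}(4F^2)}{\text{bit density}(6F^2)} = \frac{1/(4F^2)}{1/(6F^2)} = \frac{6}{4} = 1.5
$$

That is a one-time gain of about 1.5x; the much larger hoped-for jump in bits per EUV pass comes from stacking cells vertically in 3D DRAM.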

01:38:09

The breakdown of tools in a fab is very different, right? Actually, the lithography tool is the only thing that isn't that different, but the number of them relative to different types of chemical vapor deposition, atomic layer deposition, dry etch, or different kinds of etch chambers with different chemistries, all of these things differ.

01:38:27

You have all these different kinds of tools for different process nodes. You can't just convert a logic fab to a DRAM fab or vice versa, or a NAND fab to a DRAM fab, in a short amount of time. In the same way, existing DRAM fabs require a lot of retooling just to go from 1-alpha to 1-beta to 1-gamma process nodes, because now they have to add EUV and change the chemistry stacks for deposition and etch when you're using EUV.

01:38:51

And the EUV tool has to be there. Furthermore, when you change to 3D DRAM, there's going to be an even larger shift. So there's a lot of retooling of these fabs that needs to happen. That would be a big disruption, and it would make EUV demand generally lower. But as we've seen across time, lithography as a percentage of wafer cost has trended up. Lithography initially,

01:39:15

I want to say in the 2014-ish era, was like sixteen or seventeen percent of the wafer cost, and it's gone to thirty over the last, you know, fifteen years. For DRAM it was in the low-to-mid teens, and now it's trended towards the high teens. Before we get to 3D DRAM, it'll likely cross into the twenties. But if we get to 3D DRAM, EUV as a percentage of the total wafer cost tanks again.

01:39:39

Yeah. I guess you care less about the percentage of cost and more about how much of a bottleneck it is.

01:39:44

But the percentage of cost is sort of a

01:39:46

proxy, yeah. Yeah. So if you're a Jensen or a Sam Altman or whoever stands to gain a lot from scaling up AI compute, there are these stories that they go to TSMC and say, hey, why can't you do X, Y, and Z? But I think the point you're making here is

01:40:04

it doesn't really matter, in some sense, what TSMC does. Even if you have Intel and Samsung building more foundries, in the long run you're still going to be bottlenecked by ASML and other tool makers and other material makers. So,

01:40:17

first, is that the correct interpretation? And second, should Silicon Valley people basically be going to the Netherlands right now to pitch ASML on making more of these tools, so that in 2030 they can have more AI compute?

01:40:29

You know, it's a funny dynamic we saw in 2023, 2024, and 2025. People who saw the energy bottleneck before others went asymmetrically to Siemens, Mitsubishi, and of course GE Vernova and bought up turbine capacity, and now they're able to charge excess amounts for deploying these turbines places because of energy constraints. In the same sense, this could be done for EUV, except ASML is not just going to trust any random bozo who wants to buy EUV tools.

01:40:58

These turbines are much cheaper than EUV tools, and there are many more of them produced, right? Especially once you get to industrial gas turbines, not just combined cycle but the cheaper, smaller, less efficient ones. People put down deposits for these. So in a sense, someone could do this, right? Someone should go to the Netherlands and be like, I'll pay you

01:41:19

a billion dollars, you give me the right to purchase 10 EUV tools two years from now, and I'm first in line two years from now. Then over those two years, you wait for everyone to realize, oh crap, I don't have enough EUV tools, and then you try to sell your option

01:41:37

at some premium. But all you're effectively doing is saying, ASML, you're dumb, you weren't making enough margin on these, I'm going to make that margin. And the question is, will ASML even agree to this? And I don't think so, right?

01:41:49

Well, there's a world where they at least get the demand signal from that to increase production.

01:41:53

Potentially, potentially. I agree.

01:41:55

But now it sounds like you're saying they couldn't even increase production if they wanted to, given the

01:41:59

But that's exactly the market in which, if they can't increase production, just like TSMC can't increase production that fast, and yet demand is mooning, the obvious solution is to arbitrage it, because you and I know demand is way higher than they're projecting

01:42:13

and than their capability to build. So you arbitrage this by locking up the capacity, sort of doing a forward contract, and then trying to sell it at a later date once other people realize, actually, shit, everything is fucked and we don't have enough capacity. Then you capture the insane margin that ASML and TSMC should have been charging. But the thing is, I don't know if ASML and TSMC will ever agree to this.

01:42:33

Okay. Let me ask about power now. So it sounds like you think power can be arbitrarily scaled.

⁠¶ Scaling power in the US will not be a problem

01:42:40

Not arbitrarily, but yes.

01:42:41

But beyond these numbers? If I'm remembering correctly, in your blog post on power, you were implying that what GE Vernova, Mitsubishi, and Siemens could produce in gas turbines was like sixty gigawatts a year. And then there are these other sources, but they're less significant. And so we're not going to be able to

01:43:06

And only a fraction of that goes to AI, I assume. So if in 2030 we have enough logic and memory to do two hundred gigawatts a year, do you think these things are on a path to ramp up to more than two hundred gigawatts a year, or what do you see?

01:43:20

So right now we're at thirty, right? Or twenty. This is critical IT capacity, by the way. That's an important thing to mention: when I'm talking about these gigawatts, I'm talking about critical IT capacity, meaning how much power the plugged-in servers pull.

01:43:31

But there are losses along the chain, right? There are losses on transmission, losses on conversion, losses on cooling, et cetera. So you should gross this figure up, from twenty gigawatts this year or two hundred gigawatts by the end of the decade, to

01:43:48

some number 20 to 30% higher. And then you have capacity factors, right? Turbines don't run at 100%. In fact, if you look at PJM, which I think is the largest grid operator in America, covering sort of the Midwest and part of the Northeast (not the full Northeast), they rate in their models for, like,

01:44:07

hey, turbines, how much capacity? We want to have roughly twenty percent excess capacity. And on top of that twenty percent excess capacity, we're running all the turbines at ninety percent, because they're derated some for reliability: things go down, maintenance, et cetera. So then

01:44:21

in reality, the nameplate capacity for energy is always way higher than the actual end critical IT capacity because of all of these factors. But it's not just turbines, right? If you were just making power from turbines, that's simple, boring, easy, right? Humans and capitalism are far more effective. And so the whole point of that blog was: yes, there are only three companies making combined cycle gas turbines.
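The gross-up from critical IT load to generator nameplate can be sketched as a chain of multipliers. This assumes, for illustration, that the factors quoted above simply compound: 25% chain losses (the midpoint of "20, 30%"), a PJM-style 20% reserve margin, and turbines derated to 90%. Real planning models are more involved than this.

```python
def nameplate_gw(critical_it_gw: float, chain_losses: float = 0.25,
                 reserve_margin: float = 0.20, derate: float = 0.90) -> float:
    """Gross critical-IT load up to generator nameplate capacity (illustrative only)."""
    delivered = critical_it_gw * (1 + chain_losses)   # transmission, conversion, cooling (20-30%)
    planned = delivered * (1 + reserve_margin)         # ~20% excess capacity held in reserve
    return planned / derate                            # turbines run at ~90% of rating

for it_gw in (20, 200):
    print(f"{it_gw} GW critical IT -> ~{nameplate_gw(it_gw):.0f} GW nameplate")
```

Under these assumptions, 200 GW of critical IT needs roughly 333 GW of nameplate generation, about two-thirds more than the headline number.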

01:44:46

But there's so much more we can do, right? We can do aeroderivatives: take airplane engines and turn them into turbines as well. And there are even new entrants in the market; Boom Supersonic is trying to do that, and they're working with Crusoe. And there are all the other ones that already exist in the market.

01:45:02

There are medium-speed reciprocating engines, right? Piston engines, sort of like any diesel engine. There are like ten companies who make engines that way, like Cummins. I'm from Georgia, and people used to be like, oh man, you've got a Cummins engine in there, regarding Ram trucks. But it's like, well, actually,

01:45:20

automobile manufacturing is going down, so these companies all have capacity and could scale and convert it to data center power, right? Stick in all these reciprocating engines. Yes, it's not as clean as combined cycle. Maybe you can convert them from diesel to gas if you want.

01:45:33

But at the end of the day, there are all these spinning engines. Oh, what about ship engines, right? All of these engines for massive cargo ships, those are great. Nebius is doing that for a data center for Microsoft

01:45:44

in New Jersey, right? They're running these ship engines to generate power. Oh, and Bloom Energy is doing fuel cells. We've been very positive on them for like a year and a half now, because they have such a capability to increase their production, and their payback period on a production increase is very fast, even if the cost is a little bit higher than combined cycle, which is the best in cost and efficiency.

01:46:05

And then there's solar plus battery, which can come online as those cost curves continue to come down. There's wind. And of course there's the derating of those: when you put in a wind turbine, you might say, I'm only going to expect fifteen percent of the maximum power, because things

01:46:18

just oscillate. But yeah, batteries. There are all these things. And then the other thing is that the grid is scaled so that we never cut off power at peak usage, which is like the hottest day in the summer. But in reality, that's a load spike that is 10, 15, 20% higher than the average.

01:46:37

Well, if you just put in enough utility-scale batteries, or you put in peaker plants that only run a small portion of the year, and those could be gas, industrial gas turbines, combined cycle, any of the other

01:46:50

sources of power I mentioned, or batteries, then all of a sudden you've unlocked 20% of the US grid for data centers, because most of the time that capacity is sitting idle and it's really only there for that peak, right? Which is a day or two.

01:47:03

Right. It's maybe a few hours on a few days of the full year that make up that peak. So you just have enough capacity to absorb that peak load, and all of a sudden you've freed all of that up. And today data centers are only three or four percent of the power on the US grid. By 2028, they'll be ten percent.

01:47:18

But if you can just unlock twenty percent of the US grid like this, it's not that crazy. The US grid is terawatt-level, not hundreds-of-gigawatts-level. So we can add a lot more energy. It's not easy; I'm not saying it's easy. These things are going to be hard. There's a lot of hard engineering, a lot of risk that people have to take, a lot of new technologies people have to use. But
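The peak-headroom argument can be put in rough numbers. This sketch assumes a round 1,000 GW grid ("terawatt-level"), treats the 20% peak reserve as fully available outside peak hours, and assumes (hypothetically) that a peak event lasts about 4 hours on about 3 days a year, with storage or peakers covering the full headroom during those hours. None of these parameters are measured figures.

```python
HOURS_PER_YEAR = 8760

def peak_headroom(grid_gw: float, peak_fraction: float,
                  peak_hours_per_event: float, peak_events_per_year: int):
    """Capacity held only for peak load, storage to cover one peak event, and how often it's needed."""
    headroom_gw = grid_gw * peak_fraction                      # capacity idle outside the peak
    storage_gwh = headroom_gw * peak_hours_per_event           # energy to ride through one event
    hours_needed = peak_hours_per_event * peak_events_per_year
    return headroom_gw, storage_gwh, hours_needed / HOURS_PER_YEAR

# ASSUMPTIONS: round grid size, 20% headroom, ~4-hour peaks on ~3 days a year.
headroom, storage, share = peak_headroom(1000, 0.20, 4, 3)
print(f"headroom: {headroom:.0f} GW, storage per event: {storage:.0f} GWh, "
      f"needed {share:.2%} of the year")
```

Under these assumptions, about 200 GW of headroom is needed for only about 0.14% of the hours in a year, which is why covering those hours with batteries or peakers frees the rest of the time for data centers.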

01:47:41

Elon was the first to do this with behind-the-meter gas. Since then we've seen an explosion of different things people are doing to get power. And they're not easy, but people are going to be able to do them. And the supply chains are just way simpler than chips.

01:47:55

Interesting. So he made the point during the interview that, for the specific blade in the specific turbine he was looking at, the lead times go out beyond 2030. And your point is

01:48:05

That's great. There's so many other ways to make energy. Okay. So just be inefficient. Like it's fine.

01:48:10

Right. So right now, I guess, combined-cycle gas turbines have capex of fifteen hundred dollars per kilowatt, and you're saying it would make sense to have either technologies that are much more expensive than that, or other things are getting cheap enough to make it competitive.

01:48:24

Exactly, exactly. You know, it can be as high as thirty-five hundred dollars per kilowatt, even, right? So it could be twice as much as the cost of combined cycle. And the total cost of the GPU on a TCO basis has gone up a few cents per hour. Right. Again, because we've been talking about Hopper pricing: a dollar forty now becomes, you know, oh, the power price doubles?

01:48:45

Okay, the Hopper that was a dollar forty is now a dollar fifty in cost. Oh, I don't care, because the models are improving so fast that the marginal utility of them is worth way more than that ten-cent increase in energy.
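
The "few cents per hour" claim can be sanity-checked. A minimal sketch, assuming about 1.4 kW of all-in facility power per H100 (GPU plus its share of CPU, networking, and cooling), amortizing the power plant over a five-year GPU life, and a hypothetical $0.07/kWh energy price. Only the $1,500 and $3,500 per kW figures and the $1.40 per hour Hopper price come from the conversation; the helper names are illustrative.

```python
HOURS_PER_YEAR = 8760


def power_capex_per_gpu_hour(capex_per_kw, kw_per_gpu, amortization_years):
    """Spread power-plant capex ($/kW) over every hour of the GPU's life."""
    return capex_per_kw * kw_per_gpu / (amortization_years * HOURS_PER_YEAR)


def energy_cost_per_gpu_hour(price_per_kwh, kw_per_gpu):
    """Energy bill per GPU-hour at a constant draw."""
    return price_per_kwh * kw_per_gpu


KW_PER_GPU = 1.4  # assumption: all-in facility power per H100
YEARS = 5         # assumption: amortize the plant over a five-year GPU life

cheap = power_capex_per_gpu_hour(1500, KW_PER_GPU, YEARS)   # combined cycle
pricey = power_capex_per_gpu_hour(3500, KW_PER_GPU, YEARS)  # expensive alternative
print(f"Power capex premium: ${pricey - cheap:.3f} per GPU-hour")

energy = energy_cost_per_gpu_hour(0.07, KW_PER_GPU)  # assumption: $0.07/kWh
print(f"Doubling the energy price adds ${energy:.3f} to a $1.40 GPU-hour")
```

Doubling the energy price adds exactly the current energy cost again, which under these assumptions lands near the ten cents mentioned above.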

01:48:59

Okay. And so you're saying twenty percent of the grid could just come online from utility-scale batteries, increasing what you'd be comfortable putting on it.

01:49:11

The mechanism there is not easy, by the...

01:49:13

But that's two hundred gigawatts, if that hypothetically happens. But just from the different sources of gas generation you mentioned, the different kinds of engines and turbines, combined, how many gigawatts could they unlock by the end of the decade?

01:49:27

Yeah, so we're tracking in some of our data that there are over sixteen different manufacturers of power-generating equipment just from gas alone. Right. So, you know, yes, there's only three turbine manufacturers for combined cycle.

01:49:40

But we're tracking 16 different vendors and we have all of their orders and things like that. And it turns out there are just hundreds of gigawatts of orders to various data centers. As we get to the end of the decade, we think something like half of the capacity that's being added will be behind the meter. And when we look at, like, a lot of this is

01:49:59

Actually, behind the meter is almost always more expensive than grid-connected, but there are just a lot of problems with getting grid-connected: permits and interconnection queues and all this sort of stuff. So even though it's more expensive, people are doing behind the meter, and then what they're doing behind the meter with

01:50:14

ranges widely, right? It could be reciprocating engines. It could be ship engines. It could be aeroderivatives. It could be combined cycle, although combined cycle's not that great for behind the meter. It could be Bloom Energy fuel cells, it could be solar plus battery, right? It could be any of these.

01:50:28

You're saying any of these individually could do tens of gigawatts?

01:50:32

Many of these individually will do tens of gigawatts, and as a whole they will do hundreds of gigawatts.

01:50:36

Okay. So that alone should more than...

01:50:39

I mean, it's gonna take... I mean, electrician wages probably double or triple again, right? And there's gonna be a lot of new people entering that field, and there's gonna be a ton of people who make money, but I don't see that as the main bottleneck, right?

01:50:52

So right now in Abilene, the 1.2 gigawatt data center that Crusoe is building for OpenAI, I think they have like five thousand people working there. Or at peak they did. And if you scale that to 100 gigawatts, and I'm sure things will get more efficient over time, that would be like 400K people it would take to build a hundred gigawatts.

01:51:15

And if you think about the US labor force, how many electricians there are, how many construction workers there are. Yeah, I guess there's like 800K electricians. I don't know if they're all substitutable in this way. There are millions of construction workers. But if we're in a world where we're adding two hundred gigawatts a year, are we gonna be crunched on labor eventually, or do you think that is actually not a real constraint?
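
The 400K figure is a linear extrapolation from Abilene. A minimal sketch, assuming peak headcount scales proportionally with gigawatts under construction at the same time; the 5,000 workers, 1.2 GW, and ~800K electricians are the figures from the question, and the `efficiency_gain` knob is a hypothetical stand-in for the modularization gains discussed next.

```python
def workers_needed(target_gw, reference_workers=5000, reference_gw=1.2, efficiency_gain=1.0):
    """Linearly scale a reference site's peak headcount to a larger build-out.

    efficiency_gain > 1 models modularization or tooling that cuts labor per GW
    (a hypothetical knob, not a measured figure).
    """
    return target_gw * reference_workers / reference_gw / efficiency_gain


for gw in (100, 200):
    print(f"{gw} GW under construction -> ~{workers_needed(gw):,.0f} peak workers")

# If modularization halved labor per GW (hypothetical), 200 GW would need:
print(f"~{workers_needed(200, efficiency_gain=2.0):,.0f} workers vs ~800,000 US electricians")
```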

01:51:36

So labor is a humongous constraint in this; people have to be trained. Likewise, we probably start importing the highest-skilled labor in this way, right? Because now it makes sense that, you know, hey, a really high-skilled electrician in Europe who was working on destroying power plants now comes to America and is building data center, you know, high-voltage

01:51:59

electricity, you know, power moving across the data center, right? Something like this. Humanoid robots maybe start to, or robotics at least start to, but the main factor for reducing the number of people is going to be modularizing things and making them in factories in Asia. Unfortunately, at least for America, but

01:52:16

you know, Korea, Southeast Asia, in many ways China as well. These areas are going to ship more and more built-out sections of the data center, and those will be shipped in, right? Today you ship servers in, or a rack in, and then you plug that into different pieces that you're shipping from different places. But now you'll ship it to a factory and integrate

01:52:44

the entire, you know, hey, maybe this is a two-megawatt block. And this block goes from high-voltage power to the voltage, and maybe DC, that you deliver to the rack, instead of being AC and high voltage, right? Or something like this. Or cooling: you take

01:53:02

you ship a fully integrated thing that has a lot of the cooling subsystems already put together, because plumbers are also a constraint here. Or furthermore, instead of shipping a single rack and having people wire up all these racks of power and electricity and blah blah blah, you take a skid and put an entire row of servers on it, and that is shipped from the factories.

01:53:24

And today a single rack may be 120, 140 kilowatts. But as we get to the next generation, NVIDIA Kyber and things like that, it's almost a megawatt. And in addition, if you do an entire row, it'll have the racks, the networking, and the cooling and power racks all integrated together. So now when you come in, you have much less stuff to cable, whether it be networking with fiber, whether it be

01:53:50

the power, right? There are fewer power things to connect, and there are fewer plumbing things to connect, right? And so this can drastically reduce the number of people working in data centers, and therefore the capability to build these will be much larger. And along the way, new things mean

01:54:06

you know, some people move faster to new things, some people move slower, right? Crusoe and Google have been talking a lot about this modularization, as have people like Meta and

01:54:15

many others, right? And others are gonna be slower to do it. At the end of the day, people who move faster to new things may have more delays, and people who are slower have labor problems. So there will always be dislocations in the market, because this is a very complex

01:54:30

supply chain. At the end of the day, it's still simple enough that we will be able to solve it through capitalism and human ingenuity on the timescales that are required.
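
The modularization argument is largely about how many units have to be landed and hooked up on site. A minimal sketch, assuming 130 kW per rack today (the midpoint of the 120 to 140 kW quoted), roughly 1 MW per Kyber-class rack, a hypothetical 1 GW campus, and a hypothetical eight racks per factory-built row skid; it treats field connection work as scaling with the number of units delivered, which is a simplification.

```python
import math


def units_on_site(site_mw, kw_per_rack, racks_per_skid=1):
    """Count the pieces that must be landed and connected on site.

    Each piece needs its own power, cooling, and network hookups, so the count
    is a rough proxy for field labor (an assumption, not a measured figure).
    """
    racks = math.ceil(site_mw * 1000 / kw_per_rack)
    return math.ceil(racks / racks_per_skid)


SITE_MW = 1000  # hypothetical 1 GW campus
print("Per-rack delivery, 130 kW racks:", units_on_site(SITE_MW, 130))
print("Per-rack delivery, ~1 MW racks:", units_on_site(SITE_MW, 1000))
print("Row skids of 8 x ~1 MW racks:", units_on_site(SITE_MW, 1000, racks_per_skid=8))
```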

01:54:38

Yeah. Okay. So speaking of big problems to solve, Elon Musk is very bullish on space GPUs.

⁠¶ Space GPUs aren't happening this decade

01:54:46

If you're right that power is not a constraint on Earth, I guess the other reason it would make sense is this: even though there'll be enough gas turbines or whatever to build it on Earth, I think Elon's next argument is that you can't get the permitting to build hundreds of gigawatts on Earth. Do you buy that argument?

01:55:02

Land-wise, America's big. Data centers don't take that much space. You can solve that. Permitting-wise, air pollution permits are a challenge, but the Trump administration's made it much easier. You go to Texas and you can skip a lot of this red tape.

01:55:17

And so, you know, Elon had to deal with a lot of this complex stuff in Memphis, and then building a power plant across the border and all these things for Colossus 1 and 2. But at the end of the day, there's a lot more you can get away with in the middle of Texas, right?

01:55:32

Why, given that Elon lives in Texas, didn't he just go to Texas?

01:55:34

I think it was partially that they over-indexed on grid power for a temporary period of time, right? Because that's just what they thought they needed more of.

01:55:42

Because they had an aluminum refinery connected to the grid there.

01:55:45

It was an appliance factory that was shut down. But I think they may have indexed more to what was grid power. They may have indexed more to water access and gas access, because I think they bought that knowing that the gas line was right there and they were gonna tap it. Same with water.

01:56:02

It was a whole host of different constraints. It was probably an area where electricians and things like that were easier to find. But at the end of the day, I'm not exactly sure why they chose that site. I bet Elon would have chosen somewhere in Texas if he could have like gone back. But

01:56:15

Yeah, because of the regulatory challenges he's faced. Ultimately, permitting is a challenge, but America is a big place, and there are fifty states and things will get done, and there are a lot of small jurisdictions where you can just transport in all of the workers that you need for a temporary period of six months to a year.

01:56:35

Depending on the type of contractor, it can even be three months. You bring them in, put them in temporary housing, and pay out the butt, because labor is very cheap relative to the GPUs and the networking and so on and so forth, and the end value of the tokens it's going to produce. So all of these things have plenty of room to be paid for.

01:56:57

It's fine, right? And also, people are diversifying now, right? Australia, Malaysia, Indonesia, India: these are all places where data centers are going up at a much faster pace, but currently still 70% plus of the AI data centers are in America, and that continues to be the trend.

01:57:14

And so I think people are figuring out how to build these things. And ultimately, permitting and red tape in middle-of-nowhere Texas or middle-of-nowhere Wyoming or middle-of-nowhere New Mexico is probably a hell of a lot easier than sending

01:57:30

Right. Well, other than the fact that the economic argument makes less sense once you consider that energy is a small fraction of the cost of ownership of a data center, what are the other reasons you're skeptical?

01:57:41

Yeah, so obviously power's basically free in space.

01:57:44

That's the reason to do it.

01:57:45

Yeah, that's the reason to do it. But then there are all the other counterarguments, right? Which is that even if power costs double, you're still at a fraction of the total cost of the GPU. The main challenge is

01:57:58

And what we've seen is a wide dispersion, right? We have ClusterMAX, which rates all the neoclouds, and we test them. We test over 40 cloud companies, including the hyperscalers and neoclouds. What differentiates some of these clouds the most, outside of software, is their ability to deploy and manage failure, right?

01:58:12

GPUs are horrendously unreliable. Even today, 15% or so of Blackwells that get deployed have to be RMA'd. You have to take them out. Maybe you just unplug them and plug them back in. But sometimes you have to take them out and ship them to NVIDIA, or rather to the partners who do these RMAs and such.

01:58:28

Do you buy Elon's argument that after the initial phase, they actually don't fail that much?

01:58:34

Sure, but now you've done this: you've tested them all, deconstructed them, put them on a spaceship, fucking put them into space, and then put them online again. That's months, right? And if your argument is that GPUs have a useful life of X years, right, if a GPU has a useful life of five years and it takes

01:58:54

three additional months, probably six, let's say six additional months, then that is ten percent of your cluster's useful life. And because we're so capacity constrained, that compute is most valuable

01:59:06

theoretically in the first six months you have it, because we're more constrained now than in the future, because that compute now can contribute to a better model in the future or can contribute to revenue now, which you can use to raise more money to get better, you know, all these sorts of things. Now is always the most important moment. And so

01:59:21

you've delayed your compute deployment by six months, potentially. And the thing that separates these clouds is that we see clouds that take six months to deploy GPUs today on Earth, right? We see clouds that take a lot less than six months, right? And so the question is, where does space fit in there? I don't see how you would test them all on Earth, deconstruct them, ship them and shoot them into space, and not have it take longer than just putting them in the spot where you're testing them.
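
The ten-percent figure, and why Dylan argues the real cost is larger, can be sketched as follows. A minimal sketch, assuming the five-year depreciation clock starts at delivery (so transit eats the window) and using a hypothetical 3% per month decay in the value of a month of compute as a stand-in for "now is always the most important moment"; the decay rate and function names are illustrative, not from the conversation.

```python
def fraction_of_life_lost(delay_months, useful_life_months):
    """Share of the depreciation window spent in transit instead of serving."""
    return delay_months / useful_life_months


def value_weighted_loss(delay_months, useful_life_months, monthly_decay):
    """Share of lifetime value lost when the skipped months are the first ones.

    Month t is weighted by (1 - monthly_decay) ** t, a hypothetical stand-in for
    "compute now is worth more than compute later".
    """
    weights = [(1 - monthly_decay) ** t for t in range(useful_life_months)]
    return sum(weights[:delay_months]) / sum(weights)


print(f"Six-month delay, five-year life: {fraction_of_life_lost(6, 60):.0%} of the life")
print(f"Weighted by a hypothetical 3%/month decay: {value_weighted_loss(6, 60, 0.03):.0%} of the value")
```

With no decay the two measures agree; any front-loading of value makes the delay cost more than its share of calendar time.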

01:59:44

Yeah. So the question I wanted to ask is about the topology of space communication. Right now, Starlink satellites talk to each other at 100 gigabits per second. And you could imagine that being much higher with optical intersatellite laser links that are optimized for this. And that actually ends up being quite close to the InfiniBand bandwidth, which is like four hundred gigabits a second, right?

02:00:09

That's per GPU, not per rack. So multiply that by seventy-two. Also, that was Hopper; when you go to Blackwell and Rubin, that 2x's and 2x's again.
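
Dylan's correction can be made concrete. A minimal sketch, assuming 400 Gb/s of scale-out bandwidth per GPU for Hopper, doubling for Blackwell and again for Rubin as he describes, 72 GPUs' worth of bandwidth per rack, and the 100 Gb/s Starlink laser-link figure from the question; a purpose-built satellite could carry several faster links, so the link counts are only an order-of-magnitude illustration.

```python
STARLINK_LINK_GBPS = 100  # per inter-satellite laser link, figure from the conversation
GPUS_PER_RACK = 72


def rack_scale_out_tbps(per_gpu_gbps, gpus=GPUS_PER_RACK):
    """Aggregate scale-out bandwidth for one 72-GPU rack, in terabits per second."""
    return per_gpu_gbps * gpus / 1000


for generation, per_gpu_gbps in [("Hopper", 400), ("Blackwell", 800), ("Rubin", 1600)]:
    tbps = rack_scale_out_tbps(per_gpu_gbps)
    links = tbps * 1000 / STARLINK_LINK_GBPS
    print(f"{generation}: {tbps:.1f} Tb/s per rack = {links:.0f} Starlink-class links")
```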

02:00:19

All right. But how much compute is happening per... Like, during inference, are the different scale-ups still working together, or is it just happening as a batch within a single scale-up?

02:00:30

A lot of models fit within one scale-up domain, but many times you split them across multiple scale-up domains. You really have to as models become more and more sparse. At least this is the general trend: you want to ping just a couple of experts per GPU. And if leading models today have hundreds, if not thousands, of experts, then you want to run this across hundreds of chips or thousands of chips.

02:00:59

And that continues as we advance into the future. And so then you end up with this problem: well, now you need to connect all these satellites together comms-wise as well. Yeah.
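
Why sparsity forces traffic between scale-up domains can be shown with a simple expectation. A minimal sketch, assuming experts are spread evenly over scale-up domains and each routed expert's domain is effectively uniform at random; the top-8 routing and the function name are hypothetical, and real routers are skewed while real deployments work hard to keep traffic local.

```python
def expected_remote_sends(top_k, num_domains):
    """Expected routed expert hops per token that leave the token's home domain.

    By linearity of expectation, each of the top_k routed experts lives outside
    the home domain with probability 1 - 1/num_domains under uniform placement.
    """
    return top_k * (1 - 1 / num_domains)


for domains in (1, 2, 8, 32):
    remote = expected_remote_sends(8, domains)
    print(f"{domains:>2} domains: {remote:.2f} of 8 expert hops leave the domain")
```

Once a model spans more than a handful of domains, almost every expert hop crosses a domain boundary, which in orbit means crossing an inter-satellite link.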

02:01:11

Because I was imagining that if there's a world where you could do inference for a batch on a single scale-up, then maybe it's more plausible. But if not, then it's...

02:01:20

Yeah, I mean, networking these chips together is a problem, and you can't just make the satellite infinitely large, right? There are a lot of challenges with physics to making a satellite really big, right? So that's why you need these interconnects between the satellites, and those interconnects

02:01:36

are more expensive than, you know... In a cluster, like twenty percent of the cost, or fifteen percent of the cost, is networking. All of a sudden now you're making it space lasers instead of pretty simple lasers that are manufactured in volumes of millions, with, you know, pluggable transceivers.

02:01:50

And those things are very unreliable as well. More unreliable than the GPUs, by the way. Across the life of a cluster, you have to unplug them and clean them all the time, right? Unplug them, replug them, just for random reasons. These things are just not as reliable. So you've got that problem as well: a more expensive, complicated space laser to communicate instead of this pluggable optical transceiver that's been in super high volume.

02:02:10

Okay, so all in all, what does that imply for space data centers?

02:02:13

So space data centers effectively are not limited by, you know, hey, we have this energy advantage. They're actually just limited by the same contended resource. We can only make 200 gigawatts of chips a year by the end of the decade. So what are we going to do to get that capacity? It doesn't matter if it's on land or in space, right? Because you can build that power. And I think human capabilities and capacity could get to the

02:02:41

point where we're adding a terawatt a year globally of various types of power. At some point we do cross the chasm where space data centers make sense, but it's not this decade, right? It is much further out, once energy constraints are actually a big bottleneck, once space, land, and permitting are a much bigger bottleneck as AI subsumes more and more of the economy, and chips are no longer the bottleneck.

02:03:07

Because chips are the biggest bottleneck, you want them deployed working on AI the moment they're done being manufactured. And so there are a lot of things people are doing to increase that speed, whether it be modularizing data centers or even modularizing racks, where you actually put the chip in

02:03:23

at the data center, but only the chip, and everything else is already wired up and ready to go. So there are things like this that people are doing to decrease that time that you cannot do in space. And at the end of the day, all that matters in a chip-constrained world is getting these chips producing tokens ASAP. In a world

02:03:42

you know, maybe 2035, once the semiconductor industry and ASML and Zeiss and all these other suppliers, Lam Research, Applied Materials, the fab manufacturers, once the pendulum swings and they're able to make enough chips.

02:03:54

And really, we're optimizing every dial, and it makes sense to optimize the ten percent of energy cost, or fifteen percent of energy cost, or, as we potentially move to ASICs and NVIDIA's margins aren't seventy-plus percent, maybe that energy cost is thirty percent of the cluster. And fab construction, data center construction, these are the things to optimize. But Elon doesn't win by doing

02:04:16

you know, twenty percent gains. Elon never wins that way. Elon wins when he swings for the fences and does 10x gains. Right. That's what SpaceX is about. That's what Tesla was about. That's what all of his success has been about. Right. It's not been about chasing the twenty percent. So I think space data centers will eventually be a 10x gain, potentially, as Earth's resources get more and more contentious. But that's not this decade.
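
The point that lower-margin ASICs would raise energy's share of cluster cost can be sketched as a cost-share calculation. A minimal sketch with hypothetical per-hour figures for the chip's build cost, an "everything else" bucket, and energy; only the direction of the effect, and energy sitting around 10 to 15% at NVIDIA-like margins, comes from the conversation, and the dollar amounts were chosen so the first case lands near that range.

```python
def energy_share(chip_cost, gross_margin, other_cost, energy_cost):
    """Energy as a fraction of total hourly cost of ownership.

    chip_cost is the vendor's cost to build the hardware, per hour of its life;
    the buyer pays chip_cost / (1 - gross_margin) for it.
    """
    hardware_price = chip_cost / (1 - gross_margin)
    return energy_cost / (hardware_price + other_cost + energy_cost)


# Hypothetical $/hour figures, chosen only to illustrate the shape of the effect.
CHIP, OTHER, ENERGY = 0.25, 0.25, 0.15
print(f"75% vendor margin: energy is {energy_share(CHIP, 0.75, OTHER, ENERGY):.0%} of TCO")
print(f"20% vendor margin: energy is {energy_share(CHIP, 0.20, OTHER, ENERGY):.0%} of TCO")
```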

02:04:39

Yeah. I mean, I think, just to drive some intuition about how much land there is on Earth: obviously the chips themselves, especially if you move to a world where you have racks that draw megawatts, like, literally it's not even a rounding error.

02:04:52

The other thing, right, is power density. If chip manufacturing is the constraint: right now, roughly, it's one watt per millimeter squared for AI chips and such. One easy way is to pump that to two watts per millimeter squared. Now you may not get

02:05:06

2x the performance; you may only get twenty percent more performance. And that requires much more exotic cooling, right? It requires more complicated cold plates and very complicated liquid cooling, or maybe it requires things like immersion cooling. But in space, higher watts per millimeter is very difficult, whereas on Earth, these are solved problems. And one of these things enables you to get a lot more tokens, maybe twenty percent more tokens.

02:05:29

Per wafer that's manufactured. And that's a humongous win.

02:05:32

A millimeter, you mean, of die area?

02:05:33

Yeah, of die area. Square millimeters of die area.

02:05:36

I mean, it would be better for space, because if you can run more watts per millimeter, the chip runs hotter. I guess it's a question of chip engineering. But radiated heat scales with temperature to the fourth power, by the Stefan-Boltzmann law. So if you can run a very hot chip, because that allows a lot of

02:05:51

You can't run it hotter; you can only run it more densely. And the problem is that getting the heat out of that dense area means you have to move away from standard air cooling and liquid cooling to more exotic forms of liquid cooling, or even immersion, to get to higher power densities. And that's more difficult in space than it is on Earth.
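
Dwarkesh's T^4 intuition is easy to quantify, with the caveat Dylan raises: radiator temperature is bounded by how hot the silicon may run, so packing more watts into the same die area does not by itself raise it. A minimal sketch of the Stefan-Boltzmann radiator sizing, assuming an ideal single-temperature radiator facing deep space, no absorbed sunlight or Earth-shine, and an assumed emissivity of 0.9; the function name is illustrative.

```python
STEFAN_BOLTZMANN = 5.670374419e-8  # W / (m^2 K^4)


def radiator_area_m2(heat_w, temp_k, emissivity=0.9):
    """Area an ideal radiator needs to reject heat_w to deep space.

    Uses P = emissivity * sigma * A * T^4 and ignores absorbed sunlight and
    Earth-shine; emissivity 0.9 is an assumed coating value.
    """
    return heat_w / (emissivity * STEFAN_BOLTZMANN * temp_k ** 4)


for temp_k in (300, 350, 400):
    print(f"{temp_k} K: {radiator_area_m2(1000, temp_k):.2f} m^2 per kW rejected")
```

Raising the radiator from 300 K to 350 K cuts the area by (350/300)^4, roughly 1.85x, which is why a hotter chip would help in space if the silicon could tolerate it.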

02:06:08

And maybe it's worth explaining at this point what exactly a scale-up is and what it looks like for NVIDIA versus TPUs.

02:06:20

Yeah. So earlier I was mentioning how communication within a chip is super fast. Communication between chips in the same rack is fast, but not as fast; it's on the order of terabytes a second. And then communication very far away is on the order of hundreds of gigabytes a second, right? So there's this order-of-magnitude drop as the compute gets further apart, and maybe across the country it's on the order of gigabytes a second, right?

02:06:40

The scale-up domain is this tight domain where the chips are communicating on the order of terabytes a second. And so for NVIDIA, previously this meant an H100 server had eight GPUs, and those eight GPUs could talk to each other at terabytes a second. With Blackwell NVL72, they implemented rack-scale

02:07:02

scale-up. And that meant all 72 GPUs in the rack could connect to each other at terabytes-a-second speed. And the speed doubled gen on gen, but the most important innovation they made was going from eight to seventy-two in the domain.

02:07:14

When we look at Google, their scale-up domain is completely different, right? It has always been on the order of thousands. With TPU v4, they had pods the size of 4,000 chips. With v7, they have pods in the 8,000 to 9,000 range.

02:07:30

And what's relevant here is that it's not the same as NVIDIA. It's not like for like. Google has a topology that's a torus, right? So every chip connects to six neighbors. Whereas with NVIDIA, the 72 GPUs connect all-to-all, right? So they can send terabytes a second to any arbitrary other chip in that scale-up pod. Whereas with Google, you have to bounce through chips. So this means if TPU 1 needs to talk to TPU 76, then it has to bounce through various chips.

02:07:57

And there is always some blocking of resources when you do that, because each TPU is only connected to six other TPUs. And so there's a difference in topology and bandwidth, and there are trade-offs and advantages to both, right? Google gets to have a massive scale-up domain, but then they have the trade-off that you have to bounce across chips to get from one chip to another. You can only talk to six direct neighbors. And so there is this trade-off. Then there's Amazon.

02:08:19

Amazon has mutated their scale-up domain. They're somewhere in between NVIDIA and Google, effectively, where they're trying to make larger scale-up domains. They try to do all-to-all to some extent with switches, which is what NVIDIA does, but also to some extent they use torus topologies like Google does.

02:08:37

And as we advance to the next generations, all three of them are moving more and more towards a dragonfly topology, which means there are some fully connected elements and some elements that are not fully connected. So you can get the scale-up to be hundreds or thousands of chips, but also not contend for resources when you're bouncing through chips.
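
The "bounce through chips" cost of a torus can be counted directly. A minimal sketch, assuming a hypothetical 16 x 16 x 16 wraparound torus (4,096 chips, six links each) rather than any real TPU pod's exact shape or wiring; hop count is only a proxy for the latency and link contention Dylan describes. In the switched all-to-all NVL72 domain, by contrast, any two GPUs are one switch apart.

```python
from itertools import product


def torus_hops(a, b, dims):
    """Minimal hop count between chips a and b in a wraparound torus.

    Each chip links to its two neighbors along every dimension (six in 3D),
    so along each ring you take the shorter way around.
    """
    hops = 0
    for x, y, size in zip(a, b, dims):
        d = abs(x - y)
        hops += min(d, size - d)
    return hops


def torus_hop_stats(dims):
    """Average and worst-case hop count from one chip to every other chip.

    A torus is symmetric, so measuring from the origin is representative.
    """
    origin = (0,) * len(dims)
    dists = [
        torus_hops(origin, chip, dims)
        for chip in product(*(range(n) for n in dims))
        if chip != origin
    ]
    return sum(dists) / len(dists), max(dists)


avg_hops, worst_hops = torus_hop_stats((16, 16, 16))  # hypothetical 4,096-chip 3D torus
print(f"3D torus, 4,096 chips: average {avg_hops:.1f} hops, worst case {worst_hops} hops")
print("72-GPU switched all-to-all domain: every pair is one switch away")
```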

02:08:56

Related question. I heard somebody make a claim about why parameter scaling has been slow, and why only now are we getting bigger and bigger models from OpenAI and Anthropic. The original GPT-4 is over a trillion parameters, and only now are models starting to approach that again. And the theory I heard is that NVIDIA scale-ups have just not had that much memory capacity. So what was the claim exactly? If you have, say...

02:09:37

Let's say you have a 5T model running at FP8. So that's five trillion bytes, five terabytes. Yeah. And then you have the KV cache; let's say it's the same size for one batch. So you need ten terabytes to be able to run a

02:09:54

Single forward pass, yeah.

02:09:55

And then only with the GB200 NVL72 do you have an NVIDIA scale-up that has twenty terabytes. And before that they were much smaller. Whereas Google, on the other hand, has had these huge TPU pods that are not all-to-all, but still have, I think, hundreds of terabytes of capacity in a single scale-up. So does that explain why parameter scaling has been slow?
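
The memory arithmetic in the question can be checked directly. A minimal sketch, assuming 1 byte per parameter at FP8, a KV cache equal in size to the weights as in the question, and nominal per-GPU HBM of 80 GB (H100), about 192 GB (B200-class), and about 288 GB (B300-class); treat the latter two as approximate. The ~20 TB figure in the question lines up with 72 GPUs at roughly 288 GB each. Activations and framework overhead are ignored, and the helper names are illustrative.

```python
def serving_footprint_tb(params_trillions, bytes_per_param=1.0, kv_cache_tb=None):
    """Weights plus KV cache in TB (1 TB = 1e12 bytes).

    By default the KV cache is assumed to be as large as the weights, as in the question.
    """
    weights_tb = params_trillions * bytes_per_param  # 1e12 params x bytes each / 1e12
    kv_tb = weights_tb if kv_cache_tb is None else kv_cache_tb
    return weights_tb + kv_tb


def scale_up_memory_tb(gpus, hbm_gb_per_gpu):
    """Total HBM across one scale-up domain, in TB."""
    return gpus * hbm_gb_per_gpu / 1000


need_tb = serving_footprint_tb(5)  # 5T params at FP8 plus an equal-size KV cache
domains = {
    "8x H100, 80 GB each": scale_up_memory_tb(8, 80),
    "72 GPUs, ~192 GB each": scale_up_memory_tb(72, 192),
    "72 GPUs, ~288 GB each": scale_up_memory_tb(72, 288),
}
for name, capacity_tb in domains.items():
    verdict = "fits" if capacity_tb >= need_tb else "does not fit"
    print(f"{name}: {capacity_tb:.1f} TB, {need_tb:.0f} TB footprint {verdict}")
```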

02:10:16

I think it's partially the capacity and bandwidth, but also

02:10:20

as you build a larger model, the ability to deploy it is slower, right? In terms of, hey, what is the inference speed for the end user? That's kind of irrelevant. What's really relevant is RL. And what we've seen with these models and allocation of compute at a lab is that there are a few main ways you can allocate compute. You can allocate it to inference, i.e. revenue; you can allocate it to development, i.e. making the next model; and you can allocate it to research.

02:10:45

And in development specifically, you split it between pre-training and RL, right?

02:10:52

And so when you think about, hey, what exactly is happening? Well, the compute efficiency gains you get from research are so large that you actually want most of your compute to go to research, not to development. Because all these researchers are generating new ideas, trying them out, testing them, and continuing to march along and push the Pareto-optimal curve of scaling laws further and further.

02:11:13

And at least what we've seen empirically is that model cost gets 10x cheaper every year, or even more than that. At the same capability it gets 10x cheaper, while reaching new frontiers costs the same amount or more, right?

02:11:27

So you don't want to allocate too many resources to pre-training and RL. You actually want to allocate most of your resources to research. And then in the middle is this development period. If you pre-train a five-trillion-parameter model, now you have to spend all this time. How many rollouts do you have to do in RL?

02:11:47

And these rollouts for a five-trillion-parameter model are five times larger than for a one-trillion-parameter model. Maybe the larger model is more sample efficient; let's say it's 2x more sample efficient. Okay, great. Now you need 2.5x as much time of RL to get the model smarter.

02:12:04

Or you could RL the smaller model for 2x the time, and you'd still have a 25% difference: the big model, which is 2x more sample efficient, doing X number of rollouts, versus the small model, which is a trillion parameters,

02:12:18

doing twice as many rollouts; although it's less sample efficient, it's still done faster. And so you get the model sooner, and you've done more RL. And then you can take that model to help you build the next models, help your engineers train, and do all these research ideas. And so this feedback loop is actually weighted towards smaller models

02:12:37

in every case, no matter what your hardware is. And then as you look to Google, Google does deploy the largest production model of any of the major labs, right? With Gemini Pro. It is a larger model than GPT-5.4. It's a larger model than Opus. And so you end up with, yes, Google does this because they have a unipolar set of compute, right? Almost all TPU.

02:13:03

Whereas Anthropic is dealing with H100s, H200s, Blackwell, Trainiums, and TPUs of various generations, right? And OpenAI is dealing with mostly NVIDIA right now, but going towards having AMD and Trainium as well. With their fleet of compute, Google can just optimize around a larger model, and they can leverage a thousand chips in a scale-up domain to make the RL time much faster.

02:13:30

So that you can actually have this feedback loop be fast. But at the end of the day, in isolation, you almost always want to go with a smaller model that gets RL'd faster and gets deployed into research and development, so you can build the next thing and get more compute efficiency wins.

02:13:45

And then there's this compounding effect of, oh, I made a smaller model that I RL'd more, that I then deployed into research and development earlier, and I spent less compute on the training itself because I was able to allocate more compute to the research. This compounding effect of being able to do the research faster and faster is potentially a faster takeoff. And that's what all these companies want: the fastest takeoff possible.
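
Dylan's RL timing comparison works out as follows. A minimal sketch of his simplification: per-rollout cost scales linearly with parameter count (the 5x from the conversation), and a model that is 2x more sample efficient needs half the rollouts for the same learning signal. Wall-clock time is expressed relative to a 1T model doing one unit of rollouts; the function name is illustrative.

```python
def rl_time(param_ratio, sample_efficiency, baseline_rollouts):
    """Relative wall-clock time for an RL phase.

    baseline_rollouts: learning signal wanted, measured in rollouts of the 1T
    reference model. A model `sample_efficiency` times more sample efficient
    needs proportionally fewer rollouts, and each rollout costs `param_ratio`
    times as much (cost assumed linear in parameter count).
    """
    rollouts_needed = baseline_rollouts / sample_efficiency
    return rollouts_needed * param_ratio


big = rl_time(param_ratio=5, sample_efficiency=2, baseline_rollouts=1.0)    # 5T model
small = rl_time(param_ratio=1, sample_efficiency=1, baseline_rollouts=2.0)  # 1T model, 2x rollouts
print(f"5T model: {big:.1f}x the 1T baseline's RL time")
print(f"1T model doing twice the rollouts: {small:.1f}x")
print(f"The 5T run takes {big / small - 1:.0%} longer, with half the learning signal")
```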

02:14:05

Okay, spicy question.

⁠¶ Why aren't more hedge funds making the AGI trade?

02:14:08

You know, SemiAnalysis sells these spreadsheets, and you're always like, six months ago or a year ago you told people about the memory crunch, or now you're telling people about the clean room crunch, and then in the future the tool crunch. Why is Leopold the only person that is using your spreadsheets to make outrageous money? What is everybody else doing?

02:14:29

I think there are a lot of people making money in many ways. Obviously Leopold jokes that, you know, he's the only client of mine that tells me our numbers are too low. Everyone else tells me our numbers are too high.

02:14:42

Almost ad nauseam. Whether it's a hyperscaler saying, hey, that other hyperscaler's numbers are too high, and we're like, nah, that's it. And they're like, no, no, no, it's impossible, blah, blah, blah. And then you finally have to convince them through all these facts and data, when we're working with hyperscalers or AI labs, that in fact, no, that number isn't too high. That's correct.

02:14:59

But eventually they realize it, sometimes six months later, sometimes a year later. I think other clients on the trading side also use our data, right? We sell data to a lot of... yeah. Roughly sixty percent of my business is industry: AI labs, data center companies, hyperscalers, semiconductor companies, the whole supply chain across AI infrastructure.

02:15:22

But then about forty percent of our revenue is hedge funds, right? And I'm not going to comment on who our customers are, but I think a lot of people use the data. It's just: how do you interpret it, and what do you see beyond it? And I will say Leopold is pretty much the only person who always tells me my numbers are too low.

02:15:41

And sometimes he's too high, sometimes I'm too low, right? But in general, I think other people are doing that too, and you can look across the space at hedge funds, look at their 13Fs, and see that they actually own

02:15:55

Maybe not exactly what Leopold does, because it's always a question of what the most constrained thing is. What's the thing that's most outside of expectations? That's the inefficiency in the market you're really trying to exploit. And in a sense, our data makes the market more efficient by making the base data on what's happening more accurate. But in a sense, I think many, many funds do trade

02:16:19

on information that is out there. I don't think Leopold's the only person. I think he has the most conviction about the AGI takeoff, though, right?

02:16:33

Right. I mean, but the bets are not about what happens in 2035. The bets being made, at least as exemplified by the public returns we can see for different funds, including Leopold's, are about what has happened in the last year. And the last year's developments could be predicted using your spreadsheets, right? So it's less about that; it's about buying the next spreadsheet.

02:16:52

Not just spreadsheets, you know; there are reports, there's API access to the data, there's a lot of data. But anyway, I think

02:16:57

You see what I mean? It's not about some crazy singularity thing. It's about, oh, do you buy the memory crunch?

02:17:03

A simple one, though: you only buy the memory crunch if you believe AI is going to take off in a huge way. And the memory crunch, a lot of it was predicated on something that, at least for people in the Bay Area who think about infrastructure, is

02:17:17

obvious: the KV cache explodes as context lengths get longer, so you need more memory. And then you do the math, and you also have to have a lot of supply chain understanding of what fabs are being built, what data centers are being built, how many chips, and all these things. And so we track all these different data sets very

02:17:34

You know?

02:17:35

someone to fully believe that this is going to happen. A year ago, if you told someone memory prices would quadruple and smartphone volumes would go down forty percent over the following year or two, people would have said, You're crazy, that'll never happen. Except a few people did believe it, and those people did trade memory, right? And people did. I don't think

02:17:55

Leopold was the only person buying memory companies. I think a lot of people were buying memory companies. He, of course, sized and positioned and did things better than some, maybe most, right? I don't want to comment on whose returns are what. He certainly did well. But other people also did really well, right?
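The claim that the KV cache explodes as context lengths grow follows from a standard sizing formula: a decoder-only transformer stores one key tensor and one value tensor per layer for every token in context, so cache size grows linearly with sequence length and with the number of concurrent sequences. The sketch below assumes a hypothetical model shape (80 layers, 8 KV heads via grouped-query attention, head dimension 128, 16-bit values); it is not a claim about any specific deployed model, and `kv_cache_bytes` is an illustrative name.

```python
"""Back-of-envelope KV-cache sizing for a decoder-only transformer.

The model shape below is a hypothetical example (80 layers, 8 KV heads
via grouped-query attention, head_dim 128, 16-bit values); it is not a
claim about any specific deployed model.
"""

GIB = 2**30


def kv_cache_bytes(
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    seq_len: int,
    batch_size: int = 1,
    bytes_per_elem: int = 2,  # bf16/fp16
) -> int:
    """Bytes needed to cache keys and values for every token in context.

    Leading factor of 2 = one tensor for keys plus one for values, per layer.
    Grows linearly in both sequence length and batch size.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch_size * bytes_per_elem


if __name__ == "__main__":
    shape = dict(n_layers=80, n_kv_heads=8, head_dim=128)
    for ctx in (8_192, 131_072, 1_048_576):
        size = kv_cache_bytes(**shape, seq_len=ctx)
        print(f"{ctx:>9,} tokens -> {size / GIB:6.1f} GiB per sequence")
```

Under this assumed shape, the cache goes from 2.5 GiB per sequence at 8K tokens to 40 GiB at 128K and 320 GiB at 1M, before multiplying by the number of users served at once. That linear growth is why longer contexts push directly on memory capacity.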

02:18:16

You're trying to be like... Wow, you've made me diplomatic for the first time ever. No, no, you're fine. You're fine. I think it's hilarious, right? I'm being a diplomat, whereas usually I'm spicy. Yeah.

02:18:25

Okay, maybe some rapid fire to close out. Can TSMC... if you're saying, look, the memory, logic, etc., the

⁠¶ Will TSMC kick Apple out from N2?

02:18:37

N3 is mostly going to be AI accelerators, but then there's N2, which is mostly Apple now. And in the future, I guess AI would also want to go on N2. Can they kick out Apple if NVIDIA, Amazon, and Google say, Hey, we're willing to pay a lot of money for N2 capacity?

02:18:59

So I think the challenge is that chip design timelines take a while, more than a year, and the designs on two nanometer are more than a year out. Yeah. So what would really happen is NVIDIA and all these others will say, Hey, we're going to prepay for the capacity, and you're going to expand it for us. And maybe TSMC takes a little bit more margin, but not a ton. They're not going to kick Apple out entirely.

02:19:23

Right. What they're going to do is, when Apple orders X, they may say, Hey, we project you only need Y, or X minus one, so X minus one is what we're going to give you. And then Apple's kind of screwed on that flex capacity, whereas traditionally Apple has always

02:19:34

over-ordered by about ten percent and cut back by ten percent over the course of the year. And some years they hit the entire ten percent; volumes vary, right, based on the season, macro, and so on.

02:19:46

And so...

02:19:47

I don't think TSMC would kick out Apple. I think Apple will become a smaller and smaller percentage of TSMC's revenue, and therefore TSMC will have less reason to cater to their demands. And TSMC could eventually start saying, Hey, you've got to pre-book your capacity for next year, or two years out, and you have to prepay for the CapEx, because that's what NVIDIA and Amazon and Google are doing.

02:20:07

Yeah. I wonder if it's worth going into specific numbers. I don't have any of them on hand: how many N2 wafers, or what percentage of N2, does Apple have its hands on over the coming years versus AI?

02:20:21

Yeah, I mean, this year Apple has the majority of the N2 that's going to get fabricated. There's a little bit from AMD; they are trying to make some AI chips and CPU chips early. There's a little bit, but for the most part it's Apple. And as we go to the year after that, Apple gets closer to about half of it as other people start ramping.

02:20:40

But then it falls drastically, right? Just like for N3, where they were half... we'll see. And when I say N2, that includes A16, which is a variant of N2. Over time, those nodes will be the majority. And what's also interesting is that traditionally Apple has been the first to a process node.

02:21:00

Two nanometer is actually the first time they're not. Well, besides Huawei, right? Huawei, back in 2020 and before, was first alongside Apple, but they were both making smartphones. Now with two nanometer, you've got AMD trying to make a CPU and a GPU chiplet, packaged together with advanced packaging, in the same timeframe as Apple.

02:21:21

And this is a big risk for AMD that could cause delays, because it's a brand-new process technology. It's hard. But at the end of the day, this is a bet they want to make to scale faster than NVIDIA and try to beat them. As we move forward, when we move to the A16 node, the first customer there is not even Apple. It's AI.

02:21:39

And as we move forward, that will become more and more prevalent. Not only will Apple not be the first to a node, they also won't be the majority of the volume on the new node.

02:21:48

And then they'll just be like any old customer. Because the scale of TSMC's CapEx keeps ballooning while Apple's business is not growing at the same pace, they become a less and less relevant customer. They will also just cut their orders, because things in the supply chain are squeezing them out, whether it be packaging or materials or DRAM or NAND; these things are

02:22:09

increasing in cost. They likely can't pass on all the cost to customers, because the consumer is not that strong. And you end up with this conundrum where Apple is just not TSMC's best bud like they have been historically.

02:22:20

Do you think, if Huawei had access to three nanometer, they would have a better accelerator than Rubin?

02:22:27

I think Huawei was the first with a seven-nanometer AI chip as well. They were the first with a five-nanometer mobile chip, and they were also the first with a seven-nanometer AI chip, the Huawei Ascend,

02:22:39

about two months before the TPU and about four months before NVIDIA's, I want to say, was it the V100 or the A100? The A100, I think. And so, I mean, that's just moving to a process node. That doesn't imply software, it doesn't imply hardware design, all these other things. But Huawei is arguably the only

02:22:58

company in the world that has all the legs, right? Huawei has cracked software engineers. Huawei has cracked networking technology; that's in fact been their biggest business historically, right? And they have cracked AI talent. Furthermore, compared with NVIDIA, they actually have better AI researchers. And unlike NVIDIA, they have their own fabs.

02:23:18

And unlike NVIDIA, they have their own end market of selling tokens and things like that. And Huawei is able to get the top, top, top talent. NVIDIA is as well, but not in as much concentration. And Huawei has a bigger pool in China. It's very arguable that Huawei, if they had TSMC, would be better than NVIDIA. And there are areas where China has advantages outside of

02:23:42

in areas that NVIDIA can't access as easily, right? Not just scale, but also certain technologies, like certain optical technologies China's actually really good at. So I think it's very reasonable that if, in 2019 or so, Huawei had not been banned from using TSMC,

02:24:02

Huawei would have already eclipsed Apple as the biggest TSMC customer. And since Huawei has huge share in networking and compute and CPUs and all these things, they would have kept gaining share, and they'd likely be TSMC's biggest customer.

02:24:14

Wow, that's crazy. I've got kind of a random final question for you. The other part of the Elon interview was robots. So if humanoids take off faster than people expect, if by 2030 there are millions of humanoids running around, each of which needs local compute.

⁠¶ Robots and Taiwan risk

02:24:33

Any thoughts on what that implies? What would be required for that?

02:24:36

You know, there are a lot of difficulties with the VLMs, the VLAs, that people are deploying on robots. But to some extent, you don't need to have all the intelligence in the robot, and it would be much more efficient not to, right? Because in the cloud, you can batch process and all these things. So what you may want is, hey, a lot of the planning and longer-horizon tasks

02:25:01

are determined by a much more capable model in the cloud that runs at very high batch sizes. Then it pushes those directions to the robots, which interpolate between each subsequent action, or are told, Hey, pick up that cup, and then the model on the robot picks up the cup. And as it's picking it up, things like weight and force may have to be

02:25:26

determined by the model on the robot. But not everything needs to be, you know, hey, pick up this, right? Or, hey, that's a headphone. Actually, I'm the super model in the cloud; I know these headphones are Sony XM6s, which is not a Dwarkesh ad spot, but you know.

02:25:41

Why is this guy plugging this thing so hard? It's on the table, and it was on his neck when we were interviewing Satya together. Is he getting paid by Sony?

02:25:49

Unfortunately not. Unfortunately not. But anyway, it might say, Hey, the headband is soft, and this is its weight, and all these things. And then the model on the robot can be less intelligent, take these inputs, and do the action.
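The split described here, where a large cloud model issues sparse targets and a small on-robot model fills in the motion between them, can be sketched as a two-rate loop. This is a minimal illustration, not any vendor's robotics stack: `cloud_plan` is a hypothetical stand-in for the remote planner, and the 10 Hz planner rate, 100 Hz control rate, 2-D points, and straight-line interpolation are all assumptions made for the example.

```python
"""Two-rate control sketch: sparse cloud waypoints, dense on-robot interpolation.

`cloud_plan` is a hypothetical stand-in for a large remote model; the rates
and linear interpolation are illustrative assumptions only.
"""

from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) position in metres (assumed units)


def cloud_plan() -> List[Point]:
    """Hypothetical remote planner: returns coarse waypoints toward a cup."""
    return [(0.0, 0.0), (0.5, 0.0), (0.5, 0.5)]


def interpolate(start: Point, goal: Point, steps: int) -> List[Point]:
    """On-robot filler: `steps` evenly spaced points after `start`, ending at `goal`."""
    return [
        (start[0] + (goal[0] - start[0]) * i / steps,
         start[1] + (goal[1] - start[1]) * i / steps)
        for i in range(1, steps + 1)
    ]


def run(waypoints: List[Point], planner_hz: int = 10, control_hz: int = 100) -> List[Point]:
    """Expand planner-rate waypoints into a control-rate trajectory."""
    steps = control_hz // planner_hz  # control ticks per planner update
    trajectory = [waypoints[0]]
    for start, goal in zip(waypoints, waypoints[1:]):
        trajectory.extend(interpolate(start, goal, steps))
    return trajectory


if __name__ == "__main__":
    traj = run(cloud_plan())
    print(len(traj), traj[-1])
```

The design point is that only the small `interpolate` step has to run on the robot at the high rate; the expensive planning call happens ten times less often and can be batched in the cloud alongside other robots' requests.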

02:26:02

And it may get instructions from the model in the cloud every second, or ten times a second, maybe; it depends on the frequency of the action. But a lot of that can be offloaded to the cloud, because if you do all of the processing on the device,

02:26:15

One, I believe it would be more expensive, because you can't batch. Two, you couldn't have as much intelligence as you do in the cloud, because the models will just be bigger in the cloud. And three, we're in a semiconductor shortage world, and any robot you deploy needs a leading-edge chip

02:26:29

because the power budget is really tight for robots, right? You need it to be low-power and efficient. And all of a sudden you're taking power and chips that would have gone to AI data centers and putting them in robots. Yeah. So now that 200 gigawatts gets lower if you're deploying millions of humanoids.
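The first reason, that on-device inference can't batch, comes down to utilization: a shared cloud accelerator splits its cost across every robot whose requests it packs into a batch, while an on-robot chip carries its full cost for one robot no matter how idle it is. The sketch below shows that arithmetic only. All names (`CloudChip`, `robots_served`, and so on) and all numbers are hypothetical placeholders, it assumes a batched forward pass takes about as long as an unbatched one (roughly true only up to some hardware-dependent batch size), and it ignores that the cloud model would also be larger.

```python
"""Per-robot inference cost: shared, batched cloud chip vs dedicated on-robot chip.

Every number here is a hypothetical placeholder, not a measured price or
throughput. Assumes one batched forward pass takes about the same time as an
unbatched one, which only roughly holds up to a hardware-dependent batch size.
"""

from dataclasses import dataclass


@dataclass
class CloudChip:
    cost_per_hour: float   # amortized hardware + power, arbitrary units (hypothetical)
    batch_size: int        # robot requests packed into one forward pass
    passes_per_sec: float  # forward passes the chip completes per second


def robots_served(chip: CloudChip, queries_per_robot_per_sec: float) -> int:
    """Robots one shared chip can keep up with at the given per-robot query rate."""
    return int(chip.batch_size * chip.passes_per_sec // queries_per_robot_per_sec)


def cloud_cost_per_robot_hour(chip: CloudChip, queries_per_robot_per_sec: float) -> float:
    """Shared chip: its hourly cost is split across every robot it serves."""
    n = robots_served(chip, queries_per_robot_per_sec)
    if n == 0:
        raise ValueError("chip cannot keep up with even one robot at this rate")
    return chip.cost_per_hour / n


def edge_cost_per_robot_hour(chip_cost_per_hour: float) -> float:
    """Dedicated on-robot chip: the full cost lands on one robot, however idle it is."""
    return chip_cost_per_hour


if __name__ == "__main__":
    rate = 10.0  # hypothetical: each robot queries the big model 10 times per second
    cloud = CloudChip(cost_per_hour=3.0, batch_size=64, passes_per_sec=20.0)
    print("robots per cloud chip:", robots_served(cloud, rate))
    print(f"cloud cost per robot-hour: {cloud_cost_per_robot_hour(cloud, rate):.4f}")
    print(f"edge cost per robot-hour:  {edge_cost_per_robot_hour(0.2):.4f}")
```

With these placeholder values one cloud chip keeps up with 128 robots. Raising the per-robot query rate or slowing the batched pass shrinks that number, so treat this as a way to reason about the trade-off rather than a verdict on it.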

02:26:43

I think this is very interesting, because something people might not appreciate about the future is how physically centralized intelligence will be. Right now, with humans, compute is distributed.

02:26:54

There are eight billion humans, and their compute is in their heads, on their person. In the future, even with robots physically out in the world... I mean, obviously knowledge work will be done in a centralized way from data centers, with hundreds of thousands or maybe millions of instances.

02:27:11

But even for robotics, the future you're suggesting is one where there's more centralized thinking and centralized computation driving millions of robots out in the world. And so I think that's an interesting fact about the future that people might not appreciate.

02:27:28

I think Elon recognizes this, which is why he's going to different places for his chips, right? He signed this massive deal with Samsung to make his robot chips in Texas because, I personally think, he thinks Taiwan risk is huge.

02:27:44

And because of that, and the centralization of resources in Taiwan, he's having his robot chips made in Texas, which is also a separate supply chain that is not as constrained. No one's really making AI chips on Samsung besides NVIDIA's new Groq LPU that they're launching. They're launching it next week, but we're recording this the week before.

02:28:02

It's coming out this week.

02:28:05

This episode's coming out before GTC. So they're launching this new AI chip next week, which is built on Samsung, but that's sort of a recent development from NVIDIA. And that's the only other AI demand there, whereas on TSMC, everything is competing. So he gets both geopolitical diversification and supply chain diversity for his robots.

02:28:25

And he's not competing as much with the infinite willingness to pay of the data center of geniuses.

02:28:33

Okay, final question, on Taiwan. If we believe that tools are the ultimate bottleneck, how much of Taiwan's place in the AI semiconductor supply chain could we de-risk simply by having a plan to airlift every single process engineer at TSMC out if they get blockaded or something? Or do you actually still need to ship out the EUV tools, which would be multiple plane loads per tool and would not be practical?

02:29:04

If you ship out all the process engineers, and assuming things get hot enough that you destroy the fabs, now nobody has the fabs that were in Taiwan, which is a big risk, right? Yeah.

02:29:14

You know, these tools actually use a lot of semiconductors, which are manufactured in Taiwan. So it's like a snake-eating-its-own-tail sort of meme, because you can't make the tools without the chips from Taiwan, which you can't make without the tools in Taiwan. There's obviously some diversification there, but

02:29:29

they don't use super advanced chips in lithography tools. Still, at the end of the day, there is some of that snake eating its own tail. Just shipping out all the engineers and blowing up the fabs means China has a stronger semiconductor supply chain than the rest of the world, right, in terms of verticalization, now that you've removed Taiwan. Now you've got all the know-how, but you've got to replicate it in, let's say, Arizona or wherever for TSMC.

02:29:56

And it's going to take a long time to build all the capacity that TSMC has built over the years. So you've drastically slowed US and global GDP; not just growth, you've shrunk GDP massively. You've got much bigger problems, and your incremental ability to add compute goes to almost zero, right, instead of hundreds of gigawatts a year by the end of the decade. Let's say by the end of the decade something happens to Taiwan. Now you're at maybe

02:30:22

ten or twenty gigawatts across Intel and Samsung. It's nothing. Right. And now all of a sudden you've really caused some crazy dynamics in AI. Of course you have all the existing capacity, but that existing capacity pales in comparison to the capacity that's being added. Yeah.

02:30:37

Okay, Dylan, that was excellent. Thank you so much for coming on the podcast.

02:30:40

Thank you for having me, and see you tonight.

This transcript was generated by Metacast using AI and may contain inaccuracies.
