Bloomberg Audio Studios, podcasts, radio, news.
Hello and welcome to another episode of the Odd Lots podcast. I'm Joe Weisenthal.
And I'm Tracy Alloway.
Tracy, here's something I know about AI. I don't know much, but here's something I do know.
How to log into ChatGPT.
No, I'm good at it. I'm good at that. I'm good at logging into ChatGPT and Claude, and I'm reasonably good at asking questions. Now, here's something about the business of AI that I actually do know.
Okay.
I know that Nvidia is making a ton of money and the stock has gone to the moon, and that other companies would like a slice of that pie.
Yes, yes, that's a good thing to know.
It's like a basic, simple thing, which is that when people think about AI chips, there's literally one company that comes to mind. I know others are involved. AMD has stuff, Intel obviously wants to play, others too, but there is obviously that one gigantic pile of cash that's flowing to this one company. I don't know if it's still the biggest company in the world, but at one point it was. It's pulled back
A little bit.
Well, I would say two things. One, other companies would like a piece of that pie. And two, companies that are in the business of building AI models would like to find a way to get cheaper, more efficient, less energy intensive chips so that they don't have to always pay the Nvidia tax.
Do you want to know what I know about AI and semiconductors? Let's go for it. Okay, here's the one thing that I know, which is that whenever you have this conversation about Nvidia, the one word that always comes up is moat.
Oh yes, moat yeah.
So, like, you're either talking about medieval castles or you're talking about semiconductor manufacturing. That's when you hear the word moat, because over and over again people will say it is expensive to make the chips. You need a lot of money for research and development and to set up the fabs, and you need a lot of firsthand expertise in building them. And then there's also the network effect. So a company like Nvidia has this huge moat around its business. The question, of course, is whether or not, getting back to the medieval castle analogy, it is unassailable. That's right.
The semiconductor industry seems to be moat after moat after moat, because there's ASML's moat, and then there's Taiwan Semiconductor's moat, and then there's Nvidia's moat. And so yes, it's like there's a series of moats, and if someone could overcome these moats or find a way to build a bridge over one of these moats and enter this proverbial castle, that would be very lucrative. We know that many are trying to breach these moats, but it's incredibly costly and capital intensive and difficult. There are just not many people who know how to do any of this stuff, and so there's the question of whether these moats can be overcome. But again, there are many businesses that would love to see more robust competition in the space so that they're not always paying the Nvidia tax.
You know, one thing I don't know, and I don't think we've ever done an episode purely on this, but I don't really understand the different designs of chips. So I know that some chips, specifically Nvidia's, are supposed to be better at AI. They're better at running lots of little calculations all at the same time. And I know there's basic chips that go into your refrigerator or your car or whatever. But I don't really know the difference between a chip that was designed specifically to run a large language model and a standard basic chip.
I don't know anything about chip design. I just sort of imagine someone using some CAD software, etching little lines in the thing and drawing some sort of circuitry, or, you know, placing the traces.
You know, a chip design game would be really fun, now that I think about it. Yeah, you could just draw little things on the square. Okay, anyway. Well, we are going
to learn about how chip design works. We are going to learn about what makes a chip particularly good for the task of training and running inference on these AI models. And I have to say, I really do believe we have the two perfect guests, because they are both veterans in this space, and they are both active in the attempt to bridge some of these moats and enter the space and bring competition to the industry. We are going to be speaking with Reiner Pope, co-founder and CEO of MatX, as well as Mike Gunter, co-founder and CTO of MatX. It's a new company that's trying to build chips specifically for the purpose of large language models. Both of them have a lot of experience in the space. We're going to get our hands dirty, so to speak, and understand how you build the hardware for all this stuff, what it takes to win, and whether it's even a winnable game. Reiner and Mike, thank you so much for coming on Odd Lots.
Thanks, happy to be here, pleasure to be here.
So why don't you tell us: what does a chip designer do? I know I have this completely cartoonish view in my head, which cannot possibly be right, of someone on a big screen using some CAD software to sort of, you know, figure out what's going to be etched into that wafer of silicon. What is the job of chip design?
So maybe this is best told through the story of chip development, from the beginning of a project to the end of it. There's a range of different ways this can go, but there's a lot of things that are in common. So generally a chip design team is, at the low end, maybe thirty people, up to many, many thousands of people at the high end, and the project typically runs for somewhere in the range of three to five years from conception to actually shipping to customers. And so over that time, what we see in the life cycle is we tend to
start with a small team of architects. If you think of designing a house, the team of architects are the people who decide what rooms go in here, how many bedrooms, how many bathrooms, what are the flows between them, how do people walk through the corridors, and so on. That's the coarse-grained design of the chip itself, that is, you know, what kinds of components we have at the high level. And then after that initial exploration, this moves over to the microarchitects. These are the people who are designing the individual rooms: what are the components that go in the individual rooms. So at that point, everything we've done so far is a design-stage thing. This is done in documents, spreadsheets, and in verbal and human communication. But beyond that, that's when it starts to actually touch the computer in a more meaningful sense. And so the microarchitects will hand over to the logic designers. They are the people who
are actually writing code. So even though you think of chips as being this very physical thing where there's wires and gates and everything, the way we express this design to the computer is actually by writing code. We write Verilog that expresses the design of the chip. So that's what the logic designers are doing. That's an extended period of time building out all of the different, you know, matrix multiplies, memories, circuitry that connects to the outside world,
and so on. And then the output of all of that is this Verilog, a piece of software code, that then gets compiled by a computer down to a set of gates, which are logic gates, AND gates and OR gates and so on, and then wires that connect them together. That's the netlist, this file. Then there's a few more stages still to come here. This file gets handed off to physical designers, who again work with CAD tools to convert this kind of logical description.
I was right. Someone is using CAD tools.
Absolutely, there's a CAD tool, but it's only part of the job. Okay. So the physical designers are converting this sort of logical description into a physical placement. So where do each of these gates go? Now, there's two hundred billion logic gates on a chip, so a human is not going to be placing all of those manually.
So there's a huge amount of software assistance here. But what the human is doing is providing oversight through this process and saying, I've done this a ton of times before, this placement kind of looks wrong, it doesn't match my heuristics, and so I can probably do a better job here. So that's the physical designers, and the output of their work, eventually, is you get polygons, so basically an image saying here is the thing that is going to get etched onto a piece of silicon. So that file is ultimately a huge, like, really big image in some form with a bunch of polygons on it. It gets handed over to a manufacturing company such as TSMC. They spend maybe four or five months initially creating a mask set, so those are like the templates or the stencils that will be used to stamp out many, many copies of the chip, and then they stamp out many copies of the chip. You get a chip back. This is typically about two
or three years after you started the project. You get chips back, and now you have a bring-up team who puts this chip onto a board, connects it to power, and starts testing it. And then after another six to twelve months, or maybe even more, eventually you actually can hand this over to customers. There's maybe just one or two other things which are not in that flow but are very essential to call out too. Because of this whole process taking so long, especially the manufacturing, we also have very large teams of verification people. So before we actually send it to manufacturing and pay twenty to thirty million dollars for manufacturing, we have a substantial team doing a lot of testing. And this is software-based testing, so writing tests in the same way a software engineer might, to make sure that the functionality actually works as intended.
To underline the comparison to ordinary software, which Reiner touched on: we're writing code, but it's on super hard mode. So if you have software that's deployed to a website, you can fix a bug in, you know, ten minutes at basically zero cost. Whereas in our case, the reason that we have a large team of people doing verification, making sure that what we've done is correct, is that it's potentially four months and thirty million dollars for every mistake that you let through. Likewise, there is software that's very performance critical, where you want the code to run as fast as possible, but it's a relatively small fraction of software. Whereas in some sense, every line of code that you write in hardware has an impact on the overall performance of the product, because every line of code ends up getting embodied in silicon, and every line of code affects the eventual performance. So it's kind of coding, but on hard mode.
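To make the netlist and verification steps described above a bit more concrete, here is a minimal toy sketch in Python. Real flows use Verilog, commercial synthesis and CAD tools, and far larger test suites; the half adder, gate names, and data structures here are purely illustrative assumptions.

```python
# Toy illustration only: a "netlist" as gates plus the wires connecting them,
# and software-style verification against a reference model.

GATE_FUNCS = {
    "AND": lambda a, b: a & b,
    "OR":  lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
}

# A tiny netlist for a half adder: two inputs (a, b), two outputs (sum, carry).
NETLIST = [
    {"gate": "XOR", "inputs": ("a", "b"), "output": "sum"},
    {"gate": "AND", "inputs": ("a", "b"), "output": "carry"},
]

def simulate(netlist, inputs):
    """Evaluate each gate once; assumes gates are listed in dependency order."""
    wires = dict(inputs)
    for g in netlist:
        a, b = (wires[w] for w in g["inputs"])
        wires[g["output"]] = GATE_FUNCS[g["gate"]](a, b)
    return wires

def test_half_adder():
    """Verification in the software sense: exhaustively compare to a reference model."""
    for a in (0, 1):
        for b in (0, 1):
            out = simulate(NETLIST, {"a": a, "b": b})
            assert out["sum"] == (a + b) % 2
            assert out["carry"] == (a + b) // 2

test_half_adder()
print("netlist matches the reference model")
```

The exhaustive check mirrors the point the guests make: catching a logic mistake in simulation costs minutes, while catching it after tape-out costs months and tens of millions of dollars.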
So I intuitively understand the importance of getting the software right. But why does placement on the actual chip or wafer matter? Are you trying to make it more efficient? Are you trying to reduce the rise time? Why does it matter where the little bits and bobs are placed, to use the scientific term?
Yeah, you're right that reducing the rise time is a massive issue. And you know, fundamentally the issue is that chips, at a very abstract level, or really, at a somewhat concrete level, are composed of transistors and wires, and the placement has a dramatic effect on the length of the wires, which has a dramatic effect on both the performance of the chip and how much you can fit. In terms of the impact that this has on the quality of chip that you produce, wires have over time not been shrinking in the same way that transistors have, and so getting the wiring right, which usually means getting the placement right, has become more and more important over time.
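A tiny, purely hypothetical sketch of why placement matters: the same set of connections scored under two different placements by total Manhattan wire length, a standard proxy for wire delay and routability. The block names and coordinates are made up.

```python
# Hypothetical example: identical connections, two candidate placements,
# scored by total Manhattan wire length (a common proxy used in placement).

CONNECTIONS = [("A", "B"), ("B", "C"), ("A", "C")]  # which blocks must be wired together

def total_wire_length(placement):
    length = 0
    for u, v in CONNECTIONS:
        (x1, y1), (x2, y2) = placement[u], placement[v]
        length += abs(x1 - x2) + abs(y1 - y2)
    return length

clustered  = {"A": (0, 0), "B": (1, 0), "C": (0, 1)}
spread_out = {"A": (0, 0), "B": (9, 0), "C": (0, 9)}

print(total_wire_length(clustered))   # 4  -> shorter wires, faster signals, easier routing
print(total_wire_length(spread_out))  # 36 -> longer wires, slower signals
```

Scale that idea up to billions of gates and it is clear why the work is done by placement software guided by human heuristics rather than by drawing things manually.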
Can chips be beautiful? I know code can be elegant, and some people will say certain code is beautiful, But have you ever looked at a semiconductor and been like, oh, wow, that's really nicely put together.
For me, I mean, I think absolutely yes. This is why I work in this space: I just really like geeking out on the design of things. But to me, what beautiful means for a chip is that it kind of does exactly what it was designed to do, and no more and no less. I mean, obviously less would be a bit of a disappointment, but often if it does more, you think, well, maybe I designed it for slightly the wrong purpose or something like that.
I think this is a good segue into getting into your business specifically. So we all know that so much of this AI is powered by these Nvidia GPUs, but Nvidia GPUs have been used for a long time for many things that do not have anything to do with large language models or the specific AI applications that people are excited about today in twenty twenty-four.
So for a while they were, well, video games is obviously the big one, for decades and decades, and then there was like five minutes where people got really excited to use them for Ethereum mining, and now everyone's really excited about their use for artificial intelligence and large language models and some of these other generative AI applications that people are excited about right now. Why don't you tell us maybe the sort of idea behind MatX, but specifically what you were both doing when you were at Alphabet or Google, which, you know, has its own chips. I believe it has something called TPUs. What was the project at Google? Why did Google find it necessary or a good business to start building their own chips for in-house purposes? And then why did you feel the need to then leave to build what you're building now for LLMs specifically?
Yeah. So what Google was seeing, and this was at this point some time back, more than a decade ago, they were seeing that the use of artificial intelligence, LLMs were not a thing at that point, was going up, and they were worried about how much money they would have to spend on traditional, it would have been GPUs at that time, and so they built a very specialized chip to do neural nets, and that chip specializes in matrix multiplication. So they put in a structure called a systolic array, which they definitely didn't invent, it has existed since the seventies, that is especially good at doing matrix multiplication. Now, after that, Nvidia has added a similar structure into their chips. And the initial Google TPU was an inference-focused-only chip, and then they have subsequently made chips that can be used for both training and inference. And I guess now is a good point to... So the very last thing that I was doing at Google was I was on the TPU team, and Reiner was on the large language model team. And it's probably good to have him sort of take it from there.
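For readers curious what a systolic array actually does, here is a rough, simplified simulation in Python of the idea: a grid of multiply-accumulate cells that matrix data streams through, so each value is reused many times without going back to memory. This is a toy model for intuition, not a description of the TPU's or anyone else's actual hardware.

```python
# Simplified simulation of an output-stationary systolic array computing C = A @ B.
# Each cell (i, j) keeps one accumulator; A streams in from the left and B from the
# top, skewed so that matching operands arrive at the right cell on the right cycle.

def systolic_matmul(A, B):
    m, k = len(A), len(A[0])
    k2, n = len(B), len(B[0])
    assert k == k2
    C = [[0] * n for _ in range(m)]
    # All of the skewed data has flowed through after m + n + k - 2 cycles.
    for t in range(m + n + k - 1):
        for i in range(m):
            for j in range(n):
                step = t - i - j            # which element of the dot product arrives now
                if 0 <= step < k:
                    C[i][j] += A[i][step] * B[step][j]   # one multiply-accumulate per cell per cycle
    return C

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(systolic_matmul(A, B))  # [[19, 22], [43, 50]], same as ordinary matrix multiplication
```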
So, I mean, what we were seeing, and this is what we personally were seeing, but Google was seeing it more generally as well, is just that large language models were a thing. There was this period of time between GPT-3 and ChatGPT coming out. GPT-3 came out in twenty twenty, and so people who were very plugged into the field recognized the importance of it, or at least to some extent recognized the importance of it back then, and so there was this push, you know, everyone wanted to create their own large language model that was better than GPT-3. And so, I mean, at the time, I was on the large language model team. We helped train Google PaLM, and we were using thousands of TPUs for that, and one of the things we were saying is, well, look, what does it cost to deploy this in Google Search? There's quite a lot of search queries. I think the public estimates are about one hundred thousand of them per second. If you multiply out how much each query costs, and if you want to run that on large language models, that's a lot more expensive. And then also, if I want to train a model that's ten times bigger than my current model, or one hundred times bigger, suddenly these models have just moved from costing, you know, a million dollars or a hundred thousand dollars to train, to tens of millions and hundreds of millions of dollars, and so the overall goal was, can we make it cheaper by any way possible? So, of course, there's algorithmic approaches.
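To put rough numbers on the kind of back-of-envelope serving math Reiner just described, here is a sketch with entirely made-up cost figures; only the hundred-thousand-queries-per-second estimate comes from the conversation above.

```python
# Back-of-envelope serving cost; the per-query cost is invented for illustration.
queries_per_second = 100_000     # public estimate of Google Search volume cited above
cost_per_llm_query = 0.003       # hypothetical dollars per query if an LLM ran on every search
seconds_per_year = 365 * 24 * 3600

annual_cost = queries_per_second * cost_per_llm_query * seconds_per_year
print(f"~${annual_cost / 1e9:.0f} billion per year")   # ~$9 billion at these assumed numbers
```

Even with generous assumptions, the multiplication makes clear why "can we make it cheaper by any way possible" became the goal.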
There's a lot of opportunity on the algorithm and research side. But then the other really big lever is just making better hardware. So one of the things we were looking at was trying to make Google's TPUs better for large language models. What led us, actually, I mean, this is personally about Mike and me in this case, what led us to leave Google to make MatX was we saw that there was, we believe that there is, some opportunity to make chips substantially better if you're only looking to focus on large language models. And so the chips that were designed pre-GPT-3, and especially pre-ChatGPT, try to do a really good job on small models as well as a really good job on large models. And so what you find is that in the circuitry in those chips, there's a bit of circuitry for what you need for small models, there's a bit of circuitry for what you need for large models, also for maybe embedding lookups. There's three or four different kinds of workloads, and all of them take some of the real estate in your silicon. And so if you really want to make the best use of the real estate, you should just focus on the thing you care about most and hope that there's a big market there. So the game, or what we decided to do, and we see some others deciding to do this as well, is to really try and focus on just the one workload that seems like it's going to become a hundred billion dollar or a trillion dollar industry.
I know there's always this sort of cliche when talking about tech: oh, Google and Facebook, they can just build this and they'll destroy your little startup because they have infinite amounts of money. Except that doesn't actually seem to happen in the real world as much as people on Twitter expect it to happen. But can you just sort of give a sense of maybe the business and organizational incentives for why a company like Google doesn't say, oh, this is a hundred billion dollar market, Nvidia is worth three and a half trillion or three trillion dollars, let's build our own LLM-specific chips. Why doesn't that happen at these large, hyperscaler companies that presumably have all the talent and money to do it?
So Google's TPUs are primarily built to serve their internal customers, and Google's revenue for the most part comes from Google Search, and in particular from Google Search ads.
Google Search ads is, you know, a customer of the TPUs. It's a relatively difficult thing to say that, with the hundreds of billions of dollars of revenue that we're making, we're going to make a chip that doesn't really support that particularly well and focuses on this, at this point, unproven-in-terms-of-revenue market. And it's not just ads; there are, you know, a variety of other customers. For instance, you may have noticed how Google is pretty good at identifying good photos and doing a whole variety of other things that are supported in many cases by the TPUs.
I think one of the other things, too, that we see in all chip companies in general, or companies producing chips, is because producing chips is so expensive, you end up in this place where you really want to put all your resources behind one chip effort, just because the thinking is that there's a huge amount of return on investment in making this one thing better rather than fragmenting your efforts. Really, what you'd like to do in this situation, where there's a new emerging field that might be huge or might not, but it's hard to say yet, is maybe spin up a second effort on the side and have like a skunk works. Yeah, a skunk works, right.
That would be just to let Reiner and Mike, just let the two of you, go have your own little office somewhere else.
Yeah. Just organizationally, it's often challenging to do, and we see this across all companies. Every chip company really has essentially only one mainstream chip product that they're iterating on and making better and better over time.
To what degree is the design driven by the customer? And what I mean by that is, so the TPUs at Google were developed to handle Google's internal workloads, but at other chip designers, to what degree will customers come and, like, basically do a reverse inquiry and ask for a specific chip, or what does the dialogue between customers and the big chip designers actually look like?
Yeah, it's a fun interplay of, I want my provider to do a good job, but I also don't want to leak my IP too much. So you can see how this played out in, so Mike was talking about, the development of the TPUs, which were publicly announced in twenty sixteen, and around the same time, Nvidia's first GPU with the tensor cores. So that was the first GPU that was really focused on matrix multiplication. That was the Volta generation; it came out at about the same time. And some of this actually was a result of, when Google had this recognition of, look, matrix multiplication is so important, we need to make it really better, they simultaneously worked on it themselves but also went to Nvidia and said, we're not telling you much, but can you do better at matrix multiplication? And so that was enough for Nvidia to go on for the first generation. They made a pretty good attempt. But if you talk to people at Nvidia, they'll say that actually the second generation of the tensor core, which was in the Ampere generation, was where they really nailed it. So when it's big enough, you sometimes see these customers coming and saying what they want, but maybe they'll try and disguise what they're asking for, or only give the absolute minimum amount of information to help a vendor make what they want without revealing too much about their IP.
Let's get to MatX.
Tell us the product that you're designing and how it fundamentally will differ from the offerings on the market, most notably from Nvidia.
Yeah.
Yeah. So we make chips, and in fact racks and clusters, for large language models. So when you look at Nvidia's GPUs, you already talked about all of this: the original background in gaming, this brief moment in Ethereum, and then even within AI, they're doing small models and large models. So what that translates to, you can think of it as the rooms of the house or something: they have a different room for each of those different use cases, so different circuitry in the chip for all of these use cases. And the fundamental bet is that if you say, look, I don't care about that, I'm going to do a lousy job if you try and run a game on me, or I'm going to do a lousy job if you want to run a convolutional network on me, but if you give me a large model with very large matrices, I'm going to crush it. That's the bet that we're making at MatX, so we spend as much of our silicon as we can on making this work. There's a lot of detail in making all of this work out, because you need not just the matrix multiplication but all of the memory bandwidth and communication bandwidth and the actual engineering things to make it pan out. But that's the core bet.
And why can't Nvidia do this? So, you know, Nvidia has a lot of resources. It has that big moat, as we were discussing in the intro, and it has the GPUs that are already in production, and it's working on new ones. But why couldn't it start designing an LLM-focused chip from scratch?
Right. So you talked about Nvidia's moat, and that moat has two components. One component is that they build the very best hardware, and I think, you know, that is the result of having a very large team that executes extremely well and making good choices about how to serve their market. They also have a tremendous software moat. And, you know, both of these moats are important to different sets of customers. So their tremendous software moat: they have a very broad, deep software ecosystem based on CUDA that allows it.
Oh yeah, I remember this came up in our discussion with CoreWeave.
Yeah, yeah. And so that allows customers who are not very sophisticated, who don't have gigantic engineering budgets themselves, to use those chips, to use Nvidia's chips, and be efficient at that. So the thing about a moat is, not only does it in some sense keep other people out, it also keeps you in. So insofar as they want to keep their software moat, their CUDA moat, they have to remain compatible with CUDA, and compatibility with that software moat, compatibility with CUDA, requires certain hardware structures. So Nvidia has lots and lots of threads, they have a very flexible memory system. These things are great for being able to flexibly address a whole bunch of different types of neural net problems, but they all cost in terms of hardware, and the choices to have those sorts of things are not necessarily, in fact are not, the choices that you would want to make if you were aiming specifically at an LLM. So in order to be, you know, fully competitive with a chip that's specialized for LLMs, they would have to give up all of that. And, you know, Jensen himself has said that the one non-negotiable rule in our company is that we have to be compatible with CUDA.
This is interesting. So the challenge for them of spinning out something totally different is that it would be outside the family. It's outside the CUDA family, so to speak.
And meanwhile, you already have, like, PyTorch and Triton waiting in the wings.
I guess, so why don't you tell us a little bit more about the business of LLM chips specifically, because there's a lot of questions. Like, you know, one question is, you have all these people in Silicon Valley who seem motivated by the idea of, like, AGI, that that's the goal, that we're going to have superintelligence one day, maybe IQs in the thousands and the hundreds of thousands one day, that'll make us all seem very dumb, et cetera. Are you implicitly making a bet with your company that it'll be LLMs that get us there? Because, as you mentioned, there are other algorithmic ideas, there are other ideas for how you might be able to expand intelligence. How much of your company's bet is the idea that the future of generative AI as we know it is going to be along the LLM pathway?
I think there's two core ingredients of the LLM pathway. One, so far, is the transformer architecture, which is a model architecture and was substantially better than the things that came before. But the other one, and that actually has a much longer history, is the scaling hypothesis. There's a general observation, which has been widely recognized for a decade or more, that if I'm training a neural net or some kind of AI model, if I want to make its quality better, I make it bigger. And so what does bigger mean? Bigger means I have to spend more compute training it. Bigger means I have more neurons. Those are loosely analogous to the sort of processing power in a human brain, although the analogy is weak. If I make my model bigger, I get better quality. That's a sort of simple qualitative thing to say, and that's been true for a really long time in these models. And the thing that we've seen really recently is we've seen this turned up to eleven. So around the time when GPT-3 came out, so in twenty twenty, a paper was published on scaling laws, and this took this qualitative observation and made it quantitative and said, actually, we can even fit an equation to it, and so that gave people a lot more conviction in it. And this is what led to people saying, well, if I have a better model, I can solve more problems with AI than I could before, and so every time I spend ten times as much training on it, I unlock new use cases. And so that's what led to this craze. And the remarkable thing is that, while there are these diminishing returns, I have to spend ten times as much computing power to get some improvement beyond that sort of logarithmic scale, we don't see as yet any plateau, and so it seems like there continues to be opportunity here. So the key thing is this scaling hypothesis, or scaling laws in general, that are causing these models to grow. And then, I mean, as a hardware provider, what you might look at is, you might say, that's the thing I really want to bet on. I want to bet on the growth of models. And, I mean, now it's a little more in the details, but the thing you actually have to bet on is the growth of matrix sizes, which is very strongly correlated with the growth of models.
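A rough sketch of the kind of relationship the scaling-law work made quantitative: loss falling as a power law in training compute, improving with every factor of ten but showing no plateau in the fitted range. The constants below are invented for illustration; the real fitted values are in the papers (e.g. Kaplan et al., 2020).

```python
# Toy scaling-law curve: loss = floor + a * C^(-alpha). Constants are made up;
# actual fitted values come from the scaling-law literature.

def loss(compute_flops, a=10.0, alpha=0.05, floor=1.7):
    return floor + a * compute_flops ** (-alpha)

for c in (1e20, 1e21, 1e22, 1e23):
    print(f"{c:.0e} FLOPs -> loss {loss(c):.3f}")
# Each extra 10x of compute buys a smaller but still nonzero improvement:
# diminishing returns on a log scale, with no plateau in the fitted range.
```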
Just to hammer this point home, if more AI was learning from stuff like self-play or synthetic data rather than scraping the internet, would the design of the chips have to take that into account? Like, how would the chips vary between those different learning styles?
Yeah, so in general, when you're building a chip, you have to make it programmable because you're going to make this chip and you will ship a new version every two years, but what people want to do with the chip is going to change every month or so, so
it has to be programmable to some extent. So that's true for all of the chips that anyone ships, and so there's different scales of programmability and what kinds of changes you need to adapt to. So changes in kind of the way you feed it data, that's maybe on the very, very outer layers and doesn't affect much of the core of the chip, and so those kinds of changes tend to be some of the easier changes to adapt to. The things that then become a little harder to adapt to is if I'm substantially changing my model architecture. So a small change might be maybe I change the number of layers, or I reorder some of the layers in my model, or maybe I use the same ingredients but shuffle them around in some way. A bigger change would be to say, okay, I'm actually going to throw out all of these ingredients and use a completely different set of primitives. And that last step is the one that would really kill you if you're betting very much on a particular set of ingredients.
So an example of a potentially different set of primitives that are used in other models but aren't used in LLMs: we made mention of these embedding things that are used in recommender and ad models. So Facebook has talked about building special-purpose hardware to support inference on those kinds of models. Those have much less relative emphasis on matrix multiply in particular. Another possible direction that model architecture could go, that would be different and bad for a chip designed for current LLMs, would be, instead of having very large matrices in about one hundred layers, you could have much smaller matrices but ten thousand layers, and that would demand a different sort of design to be good at that kind of model. So a bet that looks good, given the modern history of neural nets, is that matrices will get larger over time.
You know, you're talking about scaling laws, and so everyone talks about, okay, computation, power, energy efficiency, et cetera, and I never know if they're true. But then sometimes you read these stories, they're like, Sam Altman wants to go around the world and raise like five trillion dollars to, like, build his own semiconductor fabs and have the entire architecture, because that's what it's going to take. What about the data side? Because this is another thing people talk about, the data wall, that, you know, there's only one Internet to scrape, and then, you know, after that, what if you're not there at AGI yet? Again, I know you're solving for the hardware side, but when you think about risks going forward along the LLM pathway, what's your perspective on, well, what happens when we've just ingested all the data?
So there's two ways you can make a model better. One of them is by training on more data, and the other one is making a bigger model. And these two effects work in a really complementary way. So you can think of it like having a bigger brain and then practicing more, and so both of these are going to help to some extent. So there's a risk that we hit a data wall. In general, there's been a long history of people predicting walls, different kinds of walls in technology, and then ingenuity overcoming this, and so I would bet that there's a fairly large amount of mileage to continue here. Tracy mentioned self-training and generating new data. The vibe in the industry is that this is a promising direction for sure. But even if you don't bet on that, there's mileage, and it's less attractive mileage, but there is mileage in making the models bigger. So I believe, and I think this is shared by many people, insiders in the industry as well, that there's at least a few more orders of magnitude available here before we run out of easy engineering knobs to turn. But of course, one of the limiting factors here is just the dollars you spend. So you have some amount of budget that you're willing to spend. And, I mean, maybe Sam can raise five trillion dollars; I don't think necessarily everyone else can raise that amount of money to train a model. And so if you've got a fixed amount of dollars that you want to spend, and you want to train the best model, you want to make the best use of the dollars you spend, and so that means, fundamentally, what you're paying for is the flops. A flop is a floating point operation, so the number of multiplications you can do. And then every time I increase my model size or increase the amount of training data I've got, I'm spending more flops, and so flops converts into intelligence. And then, if I've got a fixed budget, really what I want to maximize is my flops per dollar.
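A minimal sketch of how flops per dollar flows straight into training cost, using the common rule of thumb that training takes roughly six floating point operations per parameter per training token. The rule of thumb is a widely used approximation, and the hardware and model numbers below are hypothetical, not MatX's or anyone else's actual figures.

```python
# Rough training-cost arithmetic. "6 * parameters * tokens" is a standard
# approximation for training FLOPs; every number below is hypothetical.

def training_cost_dollars(params, tokens, flops_per_dollar):
    total_flops = 6 * params * tokens        # ~6 FLOPs per parameter per training token
    return total_flops / flops_per_dollar

params = 70e9       # a 70-billion-parameter model
tokens = 2e12       # trained on 2 trillion tokens
baseline = 3e17     # assumed effective FLOPs per dollar on today's hardware

print(f"baseline hardware: ${training_cost_dollars(params, tokens, baseline) / 1e6:.1f}M")
print(f"10x flops/dollar:  ${training_cost_dollars(params, tokens, 10 * baseline) / 1e6:.1f}M")
# At a fixed budget, better flops per dollar buys a bigger model or more tokens instead.
```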
I find this so fascinating because there are so many different directions that you could theoretically go in, and so many decisions that need to be made. You know, do you go after that scale? How do you tailor the design for different methods of data input? Although, as you said earlier, maybe that's one of the easiest things
to respond to. But then there are other trade offs that you have to think about between speed and power consumption and I guess area utilization or the placement of all the bits and bobs that we were discussing earlier, and cost effectiveness too. How do you balance all those elements and are there particular things that you're willing to sacrifice for others.
So different people can choose different targets to go after in the market. One target, which you could argue Nvidia is winning on currently, and one of the reasons that their products are so popular, is, as Reiner said, just the amount of flops you can get out of a chip, and if all the chips cost roughly the same to make, that translates into flops per dollar. Another target you could also go after would be the time to respond to one user, so the time to get the answer back. One approach is maximizing the throughput that you can have, and the other is minimizing the latency. So it's kind of the difference between a 747 flying a group of passengers across the country versus an SR-71 getting there in a couple hours but only bringing one or two people.
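A toy illustration of the 747-versus-SR-71 tradeoff Mike describes, applied to serving: batching more requests together keeps the hardware fuller, raising total throughput, but every individual request waits longer. All numbers are invented; the simple model just assumes the cost of streaming the model weights each decode step is shared across the batch.

```python
# Toy throughput-vs-latency tradeoff with made-up numbers.

def step_time(batch_size, weight_load_s=0.010, per_request_s=0.0005):
    # One decode step: loading weights is shared by the whole batch,
    # per-request compute is not. Both constants are hypothetical.
    return weight_load_s + per_request_s * batch_size

for batch in (1, 8, 64, 256):
    t = step_time(batch)
    latency = 200 * t          # a 200-token response, as one user experiences it
    throughput = batch / t     # tokens generated per second across the whole system
    print(f"batch {batch:>3}: latency {latency:5.1f}s, throughput {throughput:6.0f} tok/s")
# Bigger batches: much higher total throughput, noticeably slower individual answers.
```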
Let's talk about the business itself. So, you know, in the old days, you know, ten years ago, someone starting a tech startup, they, you know, get three or four people in an office and then they write something up, they write some code, and maybe they don't even have to raise any money to do it, and they certainly don't have to depend on whether Taiwan Semiconductor has any capacity at their fab or anything like this. Walk us through the sort of nuts and bolts of what it actually takes to build a chip business from the ground up, both in terms of costs and time and what you have to rely on. You know, we've talked about some of the design elements. What are the business-side requirements, and what will it take to actually succeed?
So, fortunately, and we've kind of referred to this in multiple places, there's a huge ecosystem around designing chips. So there's a portion you have to do yourself, and there's a portion that you can buy, so the placement of Tracy's bits and bobs and also the testing that we've talked about. There are EDA, electronic design automation, companies that build those tools. Likewise, there are companies that do just manufacturing, so TSMC and their suppliers, and then there are many other companies.
So most companies don't go directly to TSMC. Very sophisticated companies like Apple or Nvidia interface directly with them, but most other companies go through ASIC vendors. And so, you know, the most prominent companies in that space are Broadcom and Marvell, and then there are a bunch of smaller companies. A couple that are close to TSMC are Alchip and GUC, and so they'll do a lot of the work of taking your code and actually getting it placed on the chip. That's often a very good thing to outsource, because the work is somewhat seasonal. You're only ready to do that placement when you're near the end of this three-year project, and so, unless you're a massive company, you kind of don't have work for those people the whole time. So while that ecosystem means that you don't have to hire a huge number of people yourself, all of those people have to get paid, and so you do
have to raise a fair bit of money. And another big thing that you end up spending money on is that there are parts of the chip that are very special, difficult to design, and take multiple iterations of taping things out and seeing if they work. So the very high speed interconnect that connects chips together is an example of that. Those are designed by yet another set of companies, and the design is difficult and fairly expensive because of the need to do multiple tapeouts, and so it's fairly expensive to buy that IP. So when you add up the cost of the IP, the cost of the ASIC vendor's services, and then the mask fees that TSMC charges, using ASML's mask creation software, you're talking about tens of millions of dollars to bring a state-of-the-art chip to market. The numbers are much lower for a simpler chip without the very high speed I/Os and on an older node, but for an advanced node it's a pretty expensive process.
When do you think you'll be able to bring your chips to market?
Generally, we see these projects taking three to five years across most companies. We started on this seriously at the beginning of twenty twenty-four, so about three years from there is likely for us.
Tell us about the customers, because I've heard this, you know, we're all trying to find some alternative to Nvidia, whether it's to reduce energy costs or just reduce costs in general, or be able to even access chips at all, since not everyone can get them because there are only so many chips getting made. But when you talk to, like, theoretical customers, A, who do you imagine as your customers? Is it the OpenAIs of the world? Is it the Metas of the world? Is it labs that we haven't heard of yet that could only get into this if there were sort of more focused, lower cost options? And then B, what are they asking for? What do they say? Like, you know, we're using Nvidia right now, but we would really like X or Y in an ideal world?
So there's a range of possible customers in the world. The way that we see it, or a way you could divide them up, and how we choose to do that, is: what is the ratio of engineering time they're putting into their work versus the amount of compute spend that they're putting in?
So the ideal customer, in general, for a hardware vendor who's trying to make the absolute best, but not necessarily easiest to use, hardware is a company that is spending a lot more on their computing power than they are spending on engineering time, because then that makes a really good trade-off of, maybe I can spend a bit more engineering time to make your hardware work, but I get a big saving on my computing costs. So companies like OpenAI would be obviously a slam dunk.
There's many more companies as well. So the companies that meet this criterion of spending many times more on compute than on engineering, there's actually a set of maybe ten to fifteen large language model labs that are not as well known as OpenAI, but you might think Character.AI, Cohere, Mistral, and many other companies like that. So the common thing that we hear from those companies, all of which are spending hundreds of millions of dollars on compute, is, I just want better flops per dollar. That's actually the single deciding factor, and that's primarily the reason they're deciding today on Nvidia's products rather than some of the other products in the market: because the flops per dollar of those products is the best you can buy. When you give them a spec sheet, the first thing they're going to look at is just, what's the most floating point operations I can run on my chip? And then you can rule out ninety percent of products there, on the basis of, okay, it just doesn't meet that bar. But then after that you go through the more detailed analysis of saying, okay, well, I've got these floating point operations, but is the rest going to work out? Do I have the memory bandwidth and the interconnect? But for sure, the number one criterion is that top-line flops.
When we talk about delivering more flops per dollar, what are you aiming for? What is the current benchmark flops per dollar? And then, are we talking, like, can it be done ninety percent cheaper? What do you think is realistic in terms of coming to market with something meaningfully better on that metric?
So Nvidia's Blackwell, in their FP4 format, offers ten petaflops in their chip, and that chip sells for ballpark thirty to fifty thousand dollars, depending on many factors. That is about a factor of two to four better than the previous generation Nvidia chip, which is the Hopper chip. Part of that factor is coming from going to lower precision, going from eight-bit precision to four-bit precision. In general, precision is one of the best ways to improve the flops you can pack into a certain amount of silicon, and then some of it is also coming from other factors, such as cost reductions that Nvidia has deployed. So that's the benchmark for where Nvidia is. Now, you need to be at least a few integer multiples better than that in order to compete with the incumbent. So at least, you know, two or three times better on that metric, we would say. But then, of course, if you're designing for the future, you have to compete against the next generation after that too, and so you want to be many times better than the future chip, which isn't out yet. And so that's the thing you aim for.
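Just to spell out the spec-sheet arithmetic with the round figures quoted above (the price is a wide ballpark, and a real comparison would also weigh memory bandwidth, interconnect, software, and achievable utilization):

```python
# Flops-per-dollar arithmetic from the ballpark numbers quoted above.
blackwell_fp4_flops = 10e15              # ~10 petaflops in FP4
price_low, price_high = 30_000, 50_000   # ballpark dollars per chip

for price in (price_low, price_high):
    print(f"${price:,}: {blackwell_fp4_flops / price:.1e} peak FLOPs per dollar")
# A challenger would want an integer multiple of this number (and ideally of the
# next generation's number) to displace the incumbent, per the discussion above.
```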
Is there anything else that we should sort of understand about this business that we haven't touched on that you think is important?
One thing, given that this is Odd Lots: I think the reason that Sam Altman is going around the world talking about trillions of dollars of spend is that he wants to move the expectations of all of the suppliers up. So, as we've observed in the semiconductor shortage, if the suppliers are preparing for a certain amount of demand, and, you know, in the case famously of the auto manufacturers, who as a result of COVID canceled their orders and then found that demand was much, much, much larger than they expected, it took a very long time to catch up. A similar thing happened with Nvidia's H100. So TSMC was actually perfectly capable of keeping up with demand for the chips themselves. But the chips for these AI products use a very special kind of packaging, which puts the compute chips very close to the memory chips and hence allows them to communicate very quickly, called CoWoS. And the capacity for CoWoS was limited, because TSMC built with a particular expectation of demand, and when the H100 became such a monster product, their CoWoS capacity wasn't able to keep pace with demand. So, you know, the supply chain tends to be really good if you predict accurately, and if you predict badly, you know, on the low side, then you end up with these shortages. But on the other hand, these companies, because the manufacturing companies have very high capex, they are fairly loath to predict badly on the high side, because that leads them to having spent a bunch of money on capex that they're unable to recover.
So, yeah, this is very interesting, this idea that in some part it's a signal: we're not slowing down, we, you know, have more and more that we want to do. So if you're anywhere along the semiconductor supply chain, don't start, you know, curbing your expectations or curbing your production, because we want to build a lot more. I'm curious, one last question, I guess, for both of you. You know, you hear a lot of people in the industry say, like, we might just be three or four years away from AGI or superintelligence, however that's defined, and then you get into a lot of these philosophical questions and ethical questions about, you know, what is the AI going to do, what's going to be the role for humans, or is it going to kill us all, or whatever, you know, fear scenario you want. But the two of you, like, how do you see that question? Like, could we hit it in just a few short years, where we have something that people agree is, oh, this is AGI? Like, is it a short runway, just a couple of years away from this, or does it feel like, no, that's still quite a few years out,
if ever? I think what we have... What's your p(doom)?
Approximately zero, to be blunt. Thank you. My p(great things)? I mean, I think we kind of already have great things, and we've just gotten models of this level of quality recently and we're learning how to use them, and the quality is going up. You know, the fact that we can get a computer to write code pretty well is fairly amazing to me. That you can ask it to tell a good joke in the style of a particular person and it can do that is also amazing. Yeah.
Well, uh, I'm glad, I'm glad your odds of total doom and annihilation are zero. That makes me feel a little bit better. Reiner and Mike, thank you so much for coming on Odd Lots.
I learned a lot from that conversation. It was a pleasure.
Tracy, there was obviously a ton that was really interesting in that conversation, but I particularly liked the part about the incentives of large legacy incumbents when it comes to entering a totally new business. So for a company like Google, the primary purpose of their chips is going to be serving an in-house business purpose.
And even with all the money that they have, and even with the engineering talent, there's still a sort of trade off question involved of how much do we want to build chips for some other purpose, for some sort of external service.
Yeah, and I also thought the point about why Sam Altman is going around talking about, you know, how many billions he's going to spend was really interesting, and it kind of makes sense in the aftermath of the pandemic and semiconductors. I'm sure you remember this. I think that was actually where we first learned about the bullwhip effect and this idea that very small changes at one end of the supply chain, which would be customer demand, can end up reverberating, you know, all the way through the supply chain. And so when you had carmakers start to cut back on their orders, that had a much bigger and longer impact than you might have anticipated. And so it's interesting to see companies coming at it from the other end and saying, like, no, we have all this money and we're going to be here for a long time.
We're not slowing down. We are going to AGI. And so if you think, like, oh, we're going to come out with GPT-5 and then we're going to focus on just, like, commercializing that and selling it to airlines to do customer support after that, and just go into glide mode and take it easy, no, they want to signal that they're, like, building more and more and more. I thought
that was interesting. I thought it was interesting, the point about Nvidia and CUDA and the idea that, okay, yes, the CUDA software ecosystem is perceived to be this moat that makes it harder for other semiconductor companies to break into the same business, but it's also constraining from an Nvidia perspective. The idea that, okay, if they want everything to be CUDA compatible, or be within the same family of software usage, then that also constrains the potential sidelines that they might get into, right?
And opens up space for competitors. But I don't know why I haven't really, like, internalized this lesson before, because it comes up in every conversation we do on semiconductors. But I think there's still a perception, or at least maybe I still have this perception, that the moat around Nvidia is, like, the actual hardware. Yes, but it's not. It's the software. It's CUDA.
It seems like it's both.
Well, yeah, but I think I'm starting to appreciate how much of it is CUDA, is what I'm saying.
It certainly seems to come up over and over again, how much the fact that this is what people use, it's the software that makes it easy for less sophisticated customers to use the applications. It seems extremely powerful. It's also interesting to hear about, like, the ecosystem of businesses around semiconductor design. And, you know, he mentioned Broadcom, Reiner mentioned Broadcom, which is a company that I don't think we've ever really talked about very
much on the show. But if you look at that stock, I mean, it looks kind of like you're looking at a chart of Nvidia, like that has been a gigantic winner over the last few years. Back in twenty twenty, it was a thirty-one dollar stock. Now it's a one hundred and forty-six dollar stock. Okay, that's like a five-bagger or so, maybe not quite Nvidia returns.
And this idea that, like, Nvidia has just skewed what's expected of every stock. It's like, but this is on a different plane.
And this idea that a semiconductor startup doesn't necessarily interface directly with TSMC, that that's really for the most sophisticated and advanced, and then there are some of these companies in the middle. I thought that was extremely interesting.
Uh, you know what, Joe, I asked ChatGPT what the most beautiful semiconductor is. Yeah, it says gallium arsenide is considered beautiful for several reasons. Its crystal structure is often admired for its clarity and elegance. Wow. So I guess with semiconductors, maybe gallium arsenide,
there's beauty at the molecular level. Yeah. But actually, I thought, you know, when you asked that question, it's like, oh, it's just sort of a, you know, philosophical, fun, whimsical question, but there's this idea of, like, doing the minimum required, or not building a bunch of extra rooms in the house that you don't really need. And as we know, I mean, it's just objectively true that even if Nvidia chips are the best in the world for AI, they do other stuff beyond AI. They do Ethereum mining, or they used to, and that was based on proof of work back in the old days. And of course they're for video games. But if you really just want a computer, or if you really just want a model that can speak in English or write code, or can just think, without doing video games and crypto mining, then perhaps there are a bunch of rooms in the house that are totally unnecessary.
Yeah. And I mean, there's efficiency costs to that. Efficiency costs, yeah. You're trying to streamline it as much as possible. All right, shall we leave it there?
Let's leave it there.
This has been another episode of the Odd Lots podcast. I'm Tracy Alloway. You can follow me at Tracy Alloway.
And I'm Joe Weisenthal. You can follow me at The Stalwart. Follow our guests: Reiner Pope, he's at Reiner Pope, and Mike Gunter, he's at Mike Gunter Underscore. Follow our producers: Carmen Rodriguez at Carman Armann, Dashiell Bennett at Dashbot, and Kel Brooks at Kel Brooks. Thank you to our producer Moses Ondam. For more Odd Lots content, go to Bloomberg dot com slash Odd Lots, where we have transcripts, a blog, and a newsletter, and you can chat about all of these topics twenty-four seven in the Discord: Discord dot gg slash odd lots. There's even a semiconductor room in there, so you can just go there and talk about chips all day if you want.
If you enjoy Odd Lots, if you like it when we talk about what the most beautiful semiconductor is, then please leave us a positive review on your favorite podcast platform. And remember, if you're a Bloomberg subscriber, you can listen to all of our episodes absolutely ad free. All you need to do is connect your Bloomberg account with Apple Podcasts. In order to do that, just find the Bloomberg channel on the platform and follow the instructions there. Thanks for listening.