
Inside the Battle for Chips That Will Power Artificial Intelligence

May 08, 2023 · 59 min

Episode description

Nobody knows for sure who is going to make all the money when it comes to artificial intelligence. Will it be the incumbent tech giants? Will it be startups? What will the business models look like? It's all up in the air. One thing is clear though — AI requires a lot of computing power and that means demand for semiconductors. Right now, Nvidia has been a huge winner in the space, with their chips powering both the training of AI models (like ChatGPT) and the inference (the results of a query). But others want in on the action as well. So how big will this market be? Can other companies gain a foothold and "chip away" at Nvidia's dominance? On this episode we speak with Bernstein semiconductor analyst Stacy Rasgon about this rapidly growing space and who has a shot to win it.


Transcript

Speaker 1

Hello, and welcome to another episode of the Odd Lots podcast. I'm Joe Weisenthal.

Speaker 2

And I'm Tracy Alloway.

Speaker 1

Tracy, I'm not sure if you've heard anyone talking about it or anything, but have you heard about like this sort of AI thing people have been discussing?

Speaker 2

Oh, you know what, I discovered this really cool new thing called ChatGPT.

Speaker 1

Oh yeah, I saw that website too. Yeah.

Speaker 2

Have you tried it?

Speaker 1

I tried it, yeah. I was kind of like, "write a poem for me." It's pretty cool technology. We should probably learn more about it.

Speaker 2

Yeah, I think we should. Okay, all right, obviously we're being facetious and joking, but everyone has been talking about AI and these new sorts of natural language interfaces that allow you to ask questions or generate all different types of text and things like that. It feels like everyone is very excited about that space.

Speaker 1

Almost every time. Like, I went out with some friends that I hadn't seen in a long time, I was at a bar last night, and the conversation turned to AI within, like, two minutes. We never got to talk about the experiments they did. But yes, there is a lot. It's basically, like, this wall of noise, and everyone's been talking about it except us, because I don't think we have done, as far as I can recall, an AI episode. We don't want to just add to the noise and do another sort of chin-stroking conversation. But obviously there's a lot there for us...

Speaker 2

To discuss, totally. And I'm sure this will be the first of many episodes. But one of the ways that it fits into sort of classic Odd Lots lore is via semiconductors.

Speaker 3

Right.

Speaker 2

If you think about what ChatGPT, for instance, is doing, it's taking words and transforming them into numbers and then spitting those words back out at you. And the thing that enables it to do that is semiconductors, chips.

Speaker 1

Right. So here's, like, the four things I think I know about this. A: training the AI models so that they can do that is a computationally intensive process. B: each query is much more computationally intensive than, say, a Google search.

Speaker 3

Three.

Speaker 1

The company that's absolutely crushing the space and printing money because of this is Nvidia. Yeah. And four, there's a general scarcity of computing power, such that even if you and I were brilliant mathematicians and AI theorists, et cetera, if we wanted to start a ChatGPT competitor, just getting access to the computing power in order to do that would not be trivial, even if we had tons of money.

Speaker 2

I'm going to buy an out-of-business crypto mine and take all the...

Speaker 1

They've already been bought. Someone got that. But that's basically the extent of my understanding of the nexus between AI and chips, and I suspect there's more to know.

Speaker 2

There's just... well, I also think having a conversation about semiconductors and AI is a really good way to understand the underlying technology of both those things. So that's what I'm hoping for out of this conversation.

Speaker 1

All right. Well, you mentioned we've done lots of chips episodes in the past, so we're going to go back to the future or something like that. We're going to go back to our first guest from when we started exploring chips episodes. I think it was the first one that we did, sometime maybe in early twenty twenty-one. We're going to be speaking with Stacy Rasgon, managing director and senior analyst of US semiconductors and semiconductor capital equipment at Bernstein Research, someone who's great at breaking all this stuff down and has been doing a lot of research on this question. So Stacy, thank you so much for coming back on Odd Lots.

Speaker 3

I am so happy to be back. Thank you so much for having me.

Speaker 1

Right. So I'm going to start with just sort of, like, not even a business question, but a sort of semiconductor design question, which is this company Nvidia. Like, for years I just sort of knew them as the company that made graphics cards for video games, and then for a while it was, oh, and they're also good for crypto mining, and they were very popular for a while in Ethereum mining when it used proof of work. And now my understanding is everyone wants their chips for AI purposes. And we'll get into all that. But just to start, what is it about the design of their chips that makes them naturally suited for these other things? A company that started in graphics cards, what makes them naturally suited for things like AI, in a way that, apparently, other chipmakers, say Intel, their chips do not seem to be as used for in this space?

Speaker 3

Yeah, so let me step back.

Speaker 1

Yeah, sure. If the question is totally flawed in its premise, then feel free to say, "your question is totally flawed, let me step back."

Speaker 3

So sure. I'd say the idea of using compute for artificial intelligence has obviously been around for a long, long time, and actually the AI industry has been through a number of what they call AI winters over the years, where people would get really excited about this and then they would do work, and then it would just turn out it wasn't working, pretty much because the compute capacity and capabilities of the hardware at the time weren't really up to the task, and so interest would wane and you'd go through this winter period. And a while back, I don't know, ten or fifteen years ago, whenever it was, it was sort of discovered that the types of calculations that are used for neural networks and machine learning turn out to be very similar to the kinds of mathematics that are used for graphics processing and graphics rendering.

As it turns out, it's primarily matrix multiplication, and we'll probably get into this on this call a little bit in terms of how these machine learning models actually work. But at the end of the day, it really comes down to really, really large amounts of matrix multiplication and parallel operations. And as it turned out, the GPU, the graphics processing unit, was quite suitable.

Speaker 1

Before you go on then, and maybe we'll get into this in hour three of this conversation... no, we're not going to go that long. But what is matrix multiplication?

Speaker 3

Yeah. So, I don't know how many of our listeners here have had linear algebra or anything, but a matrix is just an array of numbers. Think about, like, a square array of numbers, okay? And matrix multiplication: I've got two of these arrays and I'm multiplying them together, and it's not as simple as the kind of multiplication that maybe you're typically used to, but it can be done. And it turns out there are some characteristics of these kinds of matrices: they can be really big, and there are lots and lots of operations that need to happen, and this stuff needs to happen quite rapidly. And again, I'm grossly simplifying here for the listeners, but when you're working through these kinds of machine learning models, that's really what you're doing. It's a bunch of different matrices, a bunch of different arrays of numbers that contain all of the different parameters and things. We should probably step up a bit and talk about what we actually mean when we talk about machine learning and models and all kinds of things. But at the end of the day, you have these really large arrays of numbers that have to get multiplied together, in many cases over and over again, many, many times, and it turns into a very, very large compute problem. And it's something that the GPU architecture can do really, really efficiently, much more efficiently than you could on, say, a traditional CPU. And so, as it turns out, the GPU has become a good architecture for this.
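To make the matrix multiplication point concrete, here is a minimal sketch, assuming nothing beyond NumPy; the matrix sizes are made up for illustration:

```python
import numpy as np

# Two "arrays of numbers": 4096 x 4096 matrices of random values.
a = np.random.rand(4096, 4096).astype(np.float32)
b = np.random.rand(4096, 4096).astype(np.float32)

# Matrix multiplication: each output cell is a dot product of a row of `a`
# with a column of `b`. For n x n matrices that's roughly 2 * n^3 arithmetic
# operations (~137 billion here), and every output cell can be computed
# independently, which is exactly the parallel workload a GPU is built for.
c = a @ b
print(c.shape)  # (4096, 4096)
```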

Now, what Nvidia has done on top of this: not only do they have the hardware, they've also built a really massive software ecosystem around all of this. Their software is called CUDA. Think about it as kind of like the programming environment, the parallel programming environment, for these GPUs, and they've layered all kinds of other libraries, SDKs and everything on top of that, which actually makes this relatively easy to use and to deploy and to deliver. And so they've built up not just the hardware but the software around this, and it's given them a really, really massive gap versus a lot of the other competitors that are now trying to get into this market as well.

And it's funny, if you look at Nvidia as a stock, I mean today, this morning, it's at about two hundred and sixty or two hundred and seventy dollars a share. This was a ten-to-twenty-dollar stock forever, and they did a four-to-one stock split recently, so that'd be more like a two-dollar-and-fifty-cent to five-dollar stock on today's basis, for years and years and years. And just the magnitude of the growth that we've had with these guys over the last five or ten years, particularly around their data center business and artificial intelligence, everything has just been quite remarkable. And so the earnings have gone through the roof, and clearly the multiple that you're placing on those earnings has gone through the roof, because, you know, the view is that the opportunity here is massive and that we're early and there's a lot of runway ahead of us. And the stock, I mean, it's had its ups and downs, but in general it's been a home run.

Speaker 2

I definitely want to ask you about where we are in the sort of semiconductor stock price cycle. But before we get into that, you know, I will also bite on the really basic question that you already alluded to: how does machine learning slash AI actually work? You mentioned this idea of, I guess, processing a bunch of data in parallel versus, I guess, old-style computing, where it would be sequential. But, like, talk to us about what is actually happening here and how it fits into the semiconductor space.

Speaker 3

You bet, you bet. So let me first abstract this up. I'll give you a really contrived example, just sort of simplistically, about what's going on, and then we can go a little bit more into the actual details of what's happening. Machine learning is typically done with something called a neural network, and I'll talk about what that is in a moment. But let's imagine, for example, you want to build an artificial intelligence, a neural network, to recognize pictures of cats. So let's imagine I've got this black box sitting in front of me, and it's got a slot on one side where I'm taking pictures and feeding them in. It's got a display on the other side which tells me, yes, it's a cat, or no, it's not. And on the side of the box there are a billion knobs that you can turn, okay? And they'll change various parameters of the model that's inside the black box. Don't worry about what those parameters are, but there are knobs that can change them.

And by the way, with artificial intelligence, what you have is this big black box, and you need to train it to do a specific task. That's called training. And then, once it's trained, you need to use it for whatever task you've trained it for. That task is called inference. So you've got training and inference.

So for the training, here's what we have: I've got my box with a slot and the display and a billion knobs, okay? What I do for the training process, effectively, is I take a known picture, so I know if it's a cat or not. I feed it into the box and I look at the display, and it tells me yes, it's a cat, or no, it's not, and it probably gets it wrong. And so then what I do is I turn some of the knobs and I feed another picture in, and then I turn some of the knobs again. I'm basically tuning all of the parameters and sort of measuring how accurate this network is at doing this task of recognizing: is this a picture of a cat or is it not? And I keep feeding known pictures in, a known data set, and I keep playing with all the knobs until the accuracy of the thing is wherever I want it to be. So yes, it's decided that now it's very good at recognizing whether this is a picture of a cat or not. At that point, my model, my box, is trained. I now lock all of those knobs in place, I don't move them anymore, and I use it. Now I can just feed in pictures and it'll tell me yes, it's a cat, or no, it's not.

And so that's really what the process of training this model is about. It's about varying all of the parameters. And by the way, these models can have billions or hundreds of billions or even more parameters that can be changed. That's the process of training. You're basically trying to optimize this situation: I'm changing the parameters a little bit at a time such that I can optimize the response of this thing, such that I can get the performance of it, the accuracy of the network, to be high. So that's the training process, and it is very, very compute intensive, because you can imagine, if I've got a billion different knobs I'm turning and I'm trying to optimize the output, that takes a lot of compute. The inference process, once all that is done, is much less compute intensive, because I'm not changing anything. I'm just applying the network as it is to whatever data I'm feeding in at that point. But I may be doing a lot more of it. That's the difference with inference: I may be using it all the time, whereas once I've trained the model, it's trained. So it's more like a one-and-done versus, like, a continual-use sort of thing.
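A toy version of the knob-turning loop he describes; this is a hypothetical sketch of the black-box intuition (real training uses gradients, which comes up later), with a one-knob "detector" and made-up data:

```python
import random

random.seed(0)

# A one-knob "cat detector": says cat if the picture's score clears the knob.
def black_box(knob, score):
    return score > knob

# Known, labeled pictures: (score, is_actually_a_cat).
pictures = [(0.9, True), (0.8, True), (0.3, False), (0.2, False)]

def accuracy(knob):
    return sum(black_box(knob, s) == label for s, label in pictures) / len(pictures)

# Training: nudge the knob at random, keep any nudge that doesn't hurt accuracy.
knob = 0.0
for _ in range(2000):
    candidate = knob + random.uniform(-0.1, 0.1)
    if accuracy(candidate) >= accuracy(knob):
        knob = candidate

# Inference: the knob is locked now; just feed new pictures through.
print(accuracy(knob))  # 1.0 once the knob settles between 0.3 and 0.8
```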

Speaker 1

Since you said that, we're getting into sort of the economics of training versus inference. A: is there any way to get a sense of, like, let's say Tracy and I start Odd Lots GPT, a competitor to ChatGPT, a competitor to OpenAI. What are we thinking of in terms of just that scale? How much are we spending on compute for the training part, and then how much are the recurring costs in terms of inference? And then I'm also just curious: I know you said the inference is much cheaper, but how much cheaper is it versus, say, asking Google a question? How much more expensive is a ChatGPT query, or an Odd Lots GPT query, versus just a normal Google search?

Speaker 3

Yeah. And by the way, when I say cheaper, it's for any given single use, right? Again, if I've got, like, one hundred billion different inference activities, maybe it's not.

Speaker 1

It's still expensive.

Speaker 3

Yeah. But first I want to talk, just really quickly... so that was my big abstract, contrived example about what's going on. If I go just a little bit deeper about what this thing is: let's talk just briefly about a neural network, and then I will get to your question, but it kind of influences it. So, what is a neural network? If I were to draw a representation of a neural network for you, what I would do is draw a bunch of circles. Each of the circles would be a neuron. I wish I were there, I could draw a picture for you. But imagine, like...

Speaker 1

Send a picture after you're done, and we'll run it with the episode.

Speaker 3

We'll run it with the... okay, okay, I can do that.

Speaker 1

There you go, a hand-drawn explanation.

Speaker 3

Fine, but anyways. Imagine I've got a group of circles. I've got, you know, column one with three circles, and then in column two I've got, I don't know, three or four circles, and in column three I've got some circles. These are my neurons. And imagine I've got arrows connecting each circle in one column to all of the circles in the next column. Those are the connections between my neurons. So you can see it looks kind of like a net, or a network, okay?

And within each circle I've got what's called an activation function. What each circle does is it takes an input, the arrows that are coming into it, and it has to decide, based on those inputs, do I send an output out the other side or not? Right? So there's some certain threshold: if the inputs reach some amount of threshold, the neuron will fire, just like a neuron in your brain. Okay. Each neuron can have more than one input coming in, from more than one neuron in the previous layer (these columns of circles are called layers, by the way), and the neuron can weight those different inputs differently. So it can say, you know, from this one neuron I'm going to give that a fifty percent weight, and from the other neuron I'll only weight it at twenty percent, I'm not going to take the full signal. Those are called the weights of the network. And so each neuron has inputs coming in and outputs going out, and each of those inputs and outputs will have a weight associated with it. So when I talk about those knobs, those parameters: those weights are one set of parameters. And then within each neuron there's basically a certain threshold. With all those signals coming in, when you add them up, if they reach a certain threshold, then the neuron fires. That threshold is called the bias, and you can tune that. I can have a really sensitive neuron, where I don't need a lot of signal coming in to make it fire, or I can have a neuron that's less sensitive, where I need a lot of signal coming in for it to fire. That's called the bias, and that's also a parameter.

So those are the parameters that you're setting. The structure of the network itself, the number of neurons and the number of layers and everything, that's sort of set, and then you're trying to determine these weights and biases. And again, just to level-set: ChatGPT, which everyone's been getting excited about, has one hundred and seventy-five billion separate parameters that get set during the training process. Okay? So that's kind of what's going on.
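A minimal sketch of a single neuron as he describes it, weights, bias, and an activation, with every number made up (assuming NumPy):

```python
import numpy as np

def neuron(inputs, weights, bias):
    # Weight each incoming signal, add them up against a threshold (the bias),
    # and squash with a sigmoid: a smooth version of "fires or doesn't".
    total = np.dot(weights, inputs) + bias
    return 1.0 / (1.0 + np.exp(-total))

inputs  = np.array([0.8, 0.1])   # signals arriving from the previous layer
weights = np.array([0.5, 0.2])   # "weight one input at 50%, the other at 20%"
bias    = -0.3                   # how much total signal it takes to fire

print(neuron(inputs, weights, bias))  # output passed on to the next layer
```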

Speaker 2

Before you talk about the economics, can I just ask: one of the things about the technology is it's supposed to be iterative, right? Like, it's learning as it goes along. Can you talk just briefly, maybe, about how it incorporates new inputs as it develops?

Speaker 3

Yeah. So when you do the training, let's talk about training now. When you train the network, it happens on a static data set. Okay? So you have to start with a data set, right? And in terms of ChatGPT, it has a large corpus of data that it was trained on. There's a lot of data from the Internet and from other sources.

Speaker 1

Right, it's basically trained on, like, all of the Internet, but also a lot of Reddit. So it's like we've trained the greatest brain of all time, and it's, like, Reddit-pilled.

Speaker 2

Now it talks like a seventeen year old boy.

Speaker 3

So there's a lot of data, and so, yes, how does that data get incorporated? I don't want to get too complicated. Let me talk about how standard training works, and then we can talk about ChatGPT, because that uses a different kind of model. It's called a transformer model. But anyways, when I'm training this, what happens is there's a process called back propagation. Basically, what you do is you feed this stuff through the network itself, and then you work it backwards, and what you're doing is you're measuring the output against a known response. That's my cat picture: is it a cat or is it not a cat? Right? I'm trying to minimize the difference, because I want it to be accurate. So what you do is you run a certain step through the network, you measure the output against what it should be, the known answer, and then there's this process called back propagation, where you calculate what are called the gradients of all of these parameters. You're basically looking at, sort of, the rate of change with respect to these different parameters, and you work the network backwards, and that gradient you're calculating kind of tells you how much to adjust each parameter. So you work it backward, and then you work it forward again, and then backward, and forward, and backward, and you do that until you've converged, until the network itself is accurate to wherever you want it to be. So again, I'm grossly simplifying here, I'm trying to keep this as high-level as possible, but that's kind of what's going on.
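A minimal sketch of that forward/backward loop: gradient descent on one made-up parameter, so "the gradient tells you how much to adjust each knob" is concrete:

```python
# Toy data: inputs with known answers (the labeled pictures of the analogy).
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # here the "right" knob is w = 2

w = 0.0      # one knob, starting wherever
lr = 0.05    # how big a nudge each backward pass makes

for step in range(200):
    for x, target in data:
        out = w * x              # forward pass: run the input through
        error = out - target     # compare against the known answer
        grad = 2 * error * x     # backward pass: gradient of squared error w.r.t. w
        w -= lr * grad           # adjust the knob against the gradient

print(w)  # converges to ~2.0
```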

And just in terms of the amount of compute it takes to train ChatGPT: they've actually released all the details of the network, like how many layers and what the dimensions are, the parameters, all this stuff, so we can do this math. It turns out to take about three times ten to the twenty-third operations to train it. That's three hundred sextillion operations it took to train ChatGPT. Now, in terms of how much it costs: ChatGPT, they kind of said this, was trained on ten thousand Nvidia V100s. That's the Volta chip; that's a chip that's several years old for Nvidia. But it was trained on supposedly about ten thousand of these. And we did some of this math ourselves. I was coming out more like three or four thousand, but there are a ton of other assumptions you have to make here, and ten thousand seems to be the right order of magnitude. That part, at the time, cost about, you know, I don't know, eight thousand bucks apiece. And so the number that was kind of tossed out was something like eighty million dollars to train ChatGPT one time.
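The back-of-the-envelope version of that training math; every input here is a rough assumption rather than a quoted spec:

```python
total_ops = 3e23          # ~3 x 10^23 operations, per the released details
gpu_count = 10_000        # the reported V100 count
gpu_ops_per_sec = 1e14    # assume ~100 teraflops of V100 throughput, near peak
gpu_price = 8_000         # dollars per V100, ballpark

seconds = total_ops / (gpu_count * gpu_ops_per_sec)
print(f"~{seconds / 86_400:.1f} days of compute")  # ~3.5 days at peak; real runs
                                                   # are far less efficient
print(f"~${gpu_count * gpu_price / 1e6:.0f}M of hardware")  # the ~$80M figure
```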

Speaker 1

On some level, it doesn't seem like that much to me. Well, so, did I get that right? Like, there are a lot of companies that could spend, that have, eighty million dollars.

Speaker 3

I actually agree with that. We're jumping ahead, but my take is that for large language models, and we can talk about these different things, but for large language models like ChatGPT, I actually think inference is a bigger opportunity, and you're kind of getting to the heart of it. It's because inference scales directly with the number of queries I run.

Speaker 1

It's trained once and that's done, and that's the eighty million. Or even if...

Speaker 3

You're training it more than once, and again, to your question, Tracy, you can add to the data set and retrain it. But let's say I'm training it every two weeks, okay? That'd be training it, like, twenty-four to twenty-five times a year. But I've got the infrastructure that is in place already, right, to do that. And so the training TAM will be more around how many different entities actually develop these models, and how many models does each develop, and how often do they train those models, and importantly, how big do the models get? Because this is one of the things: ChatGPT is big, but GPT-4, which they've released, is even bigger. They haven't talked about specs, but I wouldn't be surprised: GPT-4 is rumored to have over a trillion parameters, and it very well might. And we're very early into this. These models are going to keep getting bigger and bigger and bigger. And so that's how I think the training market, the training TAM, will be growing. It's a function of the number of trainings of all these models we're doing every year and the size of these models, and the models will get big.

Speaker 1

So let's get into it. In your view, the big money is going to be made on the inference. So let's talk about that.

Speaker 3

I think.

Speaker 1

So let's talk about what happens there and your sort of sense of the size. I don't know, yeah, just talk to us about the inference part and the economics.

Speaker 3

You bet. ChatGPT and these large language models: it's a new type of model called a transformer model, and there's a bunch of compute steps that have to happen. There's also a step in there that helps it capture the relationships between... you know, by the way, if you've ever used ChatGPT, you type a query into a box and it returns a response. That query is broken into what are called tokens. Think about a token as kind of like a word or a group of words, sort of. And the transformer model has something called a self-attention mechanism, and what that does is it captures the relationship between those different tokens in the input sequence, based on the training data that it has. And that's how it knows. What it's really doing is predictive text. It knows, based on this query, I'm going to start the response with this word, and based on this word and this query and my data set, I know these other words typically follow, and it kind of constructs the response from that.
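A minimal sketch of that self-attention step: scaled dot-product attention over a few token vectors, with random matrices standing in for the learned weights (assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))  # 4 tokens from a query, 8-dim embeddings each

# Learned projections in a real model; random stand-ins here.
Wq = rng.normal(size=(8, 8))
Wk = rng.normal(size=(8, 8))
Wv = rng.normal(size=(8, 8))
Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv

# Every token scores its relationship to every other token (a 4 x 4 table)...
scores = Q @ K.T / np.sqrt(8)
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)  # softmax over each row

# ...then blends in the other tokens' information in proportion to the scores.
out = weights @ V
print(out.shape)  # (4, 8): each token's vector, now informed by its context
```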

And so our math suggests that for, like, a typical query response, call it five hundred tokens, or maybe two thousand characters, it was something like four hundred quadrillion operations needed to accomplish something like that. And so you can size this up, because for, like, an Nvidia GPU (and you can do it for different GPUs), I know how many operations per second each GPU can run, and I know roughly how much these GPUs cost. And so then you assume, well, okay, how many queries per day are you going to do, and you can come up with a number. And, I mean, frankly, the number can be as big as you want. It depends on how many queries. But I think a TAM, you know, at least in the multiple tens of billions of dollars is not unreasonable, if not more. And just to level-set, I guess, on your Google question: Google does about ten billion searches a day, give or take. I think a lot of people have been looking at that level as, like, the end-all-be-all for where this could go.
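And the same envelope math for inference, again with every input an illustrative assumption:

```python
ops_per_query = 4e17      # ~400 quadrillion operations per ~500-token answer
gpu_ops_per_sec = 1e15    # assume ~1 petaflop of usable data-center GPU throughput
gpu_price = 30_000        # dollars per data-center GPU, ballpark
gpu_life_sec = 3 * 365 * 86_400  # amortize the card over ~3 years

gpu_seconds = ops_per_query / gpu_ops_per_sec  # ~400 GPU-seconds per query
cost = gpu_seconds * gpu_price / gpu_life_sec
print(f"~${cost:.2f} of hardware per query")   # ~$0.13 on these inputs
```

Run a hundred million queries a day on numbers like these and the hardware bill alone lands in the billions of dollars a year, which is how a tens-of-billions TAM stops sounding unreasonable.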

I'll be honest: I understand why people, especially the Internet investors, are concerned that large language models and things like ChatGPT can start to disrupt search. I'm not exactly sure that search is the right proxy, personally. It feels kind of limiting to me. I mean, I've watched a little too much Star Trek, I guess, but you could imagine, you know, when you have, like, a virtual assistant in the ceiling and I'm calling out to it, it doesn't have to be just search on my screen. I could have it in my car, right? I could call up American Airlines to change my airline tickets and it's a chatbot that's talking to me. So this could be very big. And by the way, the one problem with this sort of calculation is that it's kind of static. The cost is sort of an output rather than an input. I think to drive adoption, cost will come down, and we've already seen that. Nvidia has a new product, it's called Hopper, which is, like, two generations past those V100s that I was talking about, past the Volta generation. The cost per query, or the cost for training, on Hopper is much lower than on Volta, because it's much more efficient. And that's a good thing, because it will drive adoption.

Speaker 4

Nvidia actually has specific products specifically designed to do this kind of thing, and Hopper has specific blocks on it that actually help with the training and inference on these kinds of large language models.

Speaker 3

And so I actually think, over time, as the efficiency gets better and better, you're going to drive adoption more and more. I think this is a big thing. And remember, we're still really early. ChatGPT only showed up in November.

Speaker 1

Yeah, it's crazy, it's really early.

Speaker 3

Still.

Speaker 2

Well, just on that note, can you draw the connection between the software and the hardware here? Because I think at this point probably everyone listening has tried ChatGPT, and you're used to seeing it as, you know, an interface on the Internet: you type stuff into it and it spits something out. But, like, where do the semiconductors actually come in when we're talking about crunching these enormous data sets? And, you kind of touched on this a little bit with Nvidia, but what makes a semiconductor better at doing AI versus more traditional computational processes?

Speaker 3

Yeah, yeah, you bet. So, to answer that second question: AI is really much more around parallel processing, and in particular this kind of matrix math. It's a single class of calculations that these things do very, very efficiently and do very, very well, and they do them much more efficiently than a CPU, which performs more serially versus in parallel. You just couldn't run this stuff on CPUs. But don't get me wrong: we've been talking about inference on large language models, but there's all kinds of inference. Inference workloads range from very simplistic to very, very complex. My, you know, cat recognition example was very simplistic. Something like autonomous driving, that is an inference activity, but it's a hugely computationally intense inference activity. And there's still a lot of inference today that actually happens on CPUs; in fact, most inference today actually happens on CPUs. But I'd say the types of things that you're trying to do are getting more and more complex, and CPUs are getting less and less viable for that kind of work. And so that's kind of the difference between GPUs and other types of parallel offerings versus, like, a CPU. I should say, by the way, GPUs are not the only way to do this. Google, for example, has their own AI chips. They call them a TPU, a tensor processing unit.

Speaker 1

One thing I like about talking to Stacy is, two things: A, I think he comes up with better versions of our questions than we do.

Speaker 2

Which is, like, one thing about the question is to just ask it.

Speaker 1

He's always like, all right, that's a good question, but let me actually reframe the question to get a better response. So I appreciate that. And he also anticipates, because I literally, on my computer right now, had "Google Cloud tensor processing units," because that was my next question. And it's also important because I think yesterday The Information reported that Microsoft is also working on its own chips. So why don't you talk to us about these others, and are they competing directly?

Speaker 3

Yeah, yeah, yeah. So Google, and by the way, this is not new: Google has been doing their own chips for seven or eight years. It is not new, right? They have what they call the TPU, and they use it extensively for their own internal workloads. Absolutely. Amazon has their own chips. They have a training chip that's called, you know, kind of hysterically, Trainium. They have an inference chip; it's called Inferentia. Microsoft apparently is working on their own. My feeling is every hyperscaler is working on their own chips, particularly for their own internal workloads. And that is an area, we talked about Nvidia's software moat: Google doesn't need Nvidia's software moat. They're not running CUDA. They're just running TensorFlow, right, and doing their thing. They don't need CUDA at all. However, anything that is facing an end customer, like an enterprise end customer on a public cloud, like a customer going to AWS and renting, you know, compute power, that tends to be GPUs, because customers don't have Google's sophistication. They really do need the software ecosystem that's built around what they use. So, for example, I can go to Google Cloud and I can actually rent a TPU instance. It can be done. Nobody really does. And actually, if you look at how they're priced, typically it's actually more expensive, usually, even than the way that Google is pricing GPUs on Google Cloud. It's similar for Amazon and others. And so I do think that all the hyperscalers are working on their own, and there is certainly a place for that, especially for their own internal workloads. But anything that's facing a customer, that Nvidia GPU ecosystem is really kind of...

Speaker 1

Yeah, this is so... actually, just to clarify, because that point is really interesting: if, again, Tracy and I want to launch Odd Lots GPT, part of the issue would be not necessarily the hardware, the silicon, but actually that Nvidia's software suite built around it would make it much easier for us to sort of start and use Nvidia for training our model?

Speaker 3

Yes. And they've built a lot of... it's funny, you can go listen to Nvidia's announcements and their analyst days and things, and they're as much about software as they are about hardware. So not only have they continued to extend, like, the basic CUDA ecosystem, they've layered all kinds of other application-specific things on top of it. So they've got what they call RAPIDS, which is for enterprise machine learning. They've got a library package called Isaac, which is for automation and robotics. They've got a package called Clara, which is specifically for medical imaging and diagnostics. They've got something called cuQuantum, which is actually for quantum computing simulations. They've got something for drug discovery. So they're layering all these things on top, right, depending on your application. They've got internal teams that are working on it. It's not just throwing the software out there; they've got people there that can actually, like, help you work with it and come along with it. They're making other things easier, you know. So they actually just launched a cloud service, and this is with Oracle and Google and Microsoft, where you can get, like, a fully provisioned Nvidia AI supercomputer in the cloud. Because, like, they sell these AI servers, and they can cost hundreds of thousands of dollars apiece. If you want, now you can just go to Oracle Cloud or Google Cloud or whatever, and you can sort of rent a fully provisioned Nvidia supercomputer sitting in the cloud, and all you've got to do is access it through a web browser. This kind of gets super easy.

Speaker 2

This was going to be my next question, actually. So I take the point about software, but what do these AI supercomputers actually look like nowadays? Like, is there a physical thing in a giant data center somewhere? Are they mostly, like, cloud-based? What does this look like? Walk us through it.

Speaker 3

So Nvidia sells something they call a DGX. It's a box. I don't know exactly what the dimensions are. It's got eight GPUs and two CPUs and a bunch of memory and a bunch of networking. They've got their own, you know... they bought a company called Mellanox a while back that did networking hardware, so it's got a bunch of proprietary networking. And that's something else we haven't talked about: it's not just enough to have the compute. These models are so big they don't fit on a single GPU, so you have to be able to network all this stuff together, right? And so they've got networking in there, and they have this box, and then you can stack a whole bunch of boxes together. Nvidia has their own internal supercomputer; it's fairly high on the Top500 list. They call it Selene. It's a bunch of these DGX-like servers that they make, all just, like, stacked together, effectively. And for the older generation, their prior generation, which was called Ampere, that box sold for one hundred and ninety-nine thousand dollars. I don't believe they've released pricing on the Hopper version, but I know the Hopper GPU costs two to three times what Ampere, the prior generation, costs.

Speaker 1

So there's a separate question to me, which is: okay, there's the price, and it exists, and you could theoretically go and use Google's tensor-based cloud. But is it available? Because I sort of get the impression that, for some of the technology that people want to use, it's not available at any price, that there actually is scarcity. Is that real or not?

Speaker 3

It seems to be. So their new generation, which is called Hopper, which, like I said, has characteristics that make it very attractive, especially for these kinds of, like, ChatGPT large language models, is indeed tight. We're at the very beginning of that product cycle. They just launched it, like, in the last couple of quarters, and so that ramp-up takes time, and it does seem like they are seeing accelerated demand because of this kind of stuff. And so, yeah, I think supply is tight. We've heard stories about GPU shortages at Microsoft and the cloud vendors, and I think there was a Bloomberg story the other day that said these things were selling for, like, forty thousand dollars on eBay. It's a thing, right? I took a look at some of those listings; they looked a little shady to me. But yeah, it's tight. You have to remember, these parts are very complicated, so the lead times to actually get more made, it takes a while.

Speaker 2

Wait, so just on this note: I joked about this in the intro, but, you know, could I buy, like, a bitcoin mining facility and take all that computer processing power and, like, convert it into something that could be used for AI? Is that a possibility?

Speaker 3

You could. The bitcoin stuff, at least a lot of the bitcoin stuff that was done with GPUs, those were still mostly gaming GPUs. People were buying gaming GPUs and repurposing them for bitcoin and Ethereum (mostly Ethereum) mining. They're not nearly as compute-efficient as the data center parts, right? But, I mean, in theory, yeah, you could get, you know, gaming GPUs if you could string them together, but it would be prohibitive. And even now, most of that stuff's cleared out, I think, as Joe said. The math is somewhat similar, I'd say, for these kinds of models, though. Again, like, Hopper, Nvidia's new data center product, has something that they call a transformer engine. What it really does is it allows you to do the training at a slightly lower precision. It lets you do it at eight-bit floating point versus sixteen-bit, so it lets you get higher performance. And then there's another process, like a conversion process, that sometimes has to happen when you go from training to inference. It's called quantization, and with these transformer engines you don't have to do that. So it increases the efficiency in ways which you wouldn't get by picking up some random GPUs.
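A minimal sketch of what the quantization step he mentions looks like: rounding float32 weights down to 8-bit integers. This is the generic post-training version, for illustration; Hopper's FP8 transformer engine is a different, hardware-level mechanism that avoids this conversion:

```python
import numpy as np

weights = np.random.randn(5).astype(np.float32)  # weights trained at high precision

# Quantize: map the float range onto 8-bit integers (256 levels).
scale = np.abs(weights).max() / 127
q = np.round(weights / scale).astype(np.int8)  # smaller, cheaper to move and multiply

# Dequantize to compare: close to the originals, but not exact.
print(weights)
print(q * scale)  # the precision given up in exchange for efficiency
```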

Speaker 1

Where is Intel in this story?

Speaker 3

Well, so let's talk about the other competitive options that are out there. Okay? So we talked about some of the captive silicon at the hyperscalers. That is there, and it is real, and they're all building their own, and they've been doing it forever, and it hasn't slowed anything down in the slightest, because we're still early and the opportunity is big. By the way, I will say, I want to lead with it: I don't worry so much about competition at this point, because think about it. Nvidia is run-rating their data center business right now at something like fifteen billion dollars a year. That's where it is. It's growing, but that's where it is. So Jensen, Nvidia's CEO, likes to throw out big numbers, and he threw out, I think he said, for silicon and hardware TAM in the data center, that their TAM over time is three hundred billion dollars. And it seemed kind of crazy, although I would say it's seeming a little less crazy every day. But if you thought the TAM was three hundred billion, or two hundred, or one hundred billion, or whatever, and they're run-rating at fifteen billion dollars, there's tons of headroom; competition doesn't really matter. And that's what we've seen. We've seen competition, but there's so much opportunity, like, who cares? Versus, if you thought it was a twenty-billion-dollar TAM, they would have a problem, like, already today. So that's why I don't worry too much, because I think the opportunity is still very, very large relative to where they're running the business today.

In terms of other competitors, though: you mentioned some, so let's talk about AMD first, because AMD actually makes GPUs. They make data center GPUs. They don't sell very many of them. Their current product is something called the MI250, and they've sold de minimis amounts, basically. And in fact, you know, when the China sanctions were put on (we didn't talk about that, but the US has stopped allowing, like, high-end AI chips from being shipped to China), the MI250 was on the list, but it didn't affect them at all, because they weren't selling anything. Their sales were zero. They've got another product coming out following that; it's called the MI300, and people have been getting kind of excited about AMD. They've been sort of looking to play it as kind of, like, the poor man's Nvidia. I'll be honest, I don't think it's the poor man's Nvidia. Nvidia is doing, you know, close to four billion dollars a quarter in data center revenues. I don't know that I see anything like that with the MI300. AMD, as far as I can tell, has not even released any sort of specifications for what it looks like at this point. So... but that is an option, and some people would say, and there's maybe some truth to this, you know, if you want an alternative, AMD will present an alternative. And if the opportunity is really that big, they'll get some. They'll probably get some.

Then you have Intel. So Intel's got a few things. On their CPUs, their current version is called Sapphire Rapids, and it has AI-specific acceleration for inference. Not so much maybe for this kind of stuff, but for general inference activities, they're trying to play up the capabilities of their CPU on that. Fine. And why are they doing that? It's because their accelerator roadmap isn't so good. So they have a GPU roadmap; the code name for it was Ponte Vecchio, and they've kind of gutted that roadmap. The follow-on product was something called Rialto Bridge, which they've since canceled, and one of the Ponte Vecchio products they also recently canceled. And Ponte Vecchio originally was designed for the Aurora supercomputer, and it was massively late. I mean, they took something like a three-hundred-million-dollar charge, I think it was at the end of twenty twenty, or in twenty twenty-one, where they basically gave it away. It was so late. So that's how late they were. They also have another product: they bought an Israeli AI company called Habana, and Habana has a product called Gaudi. It's not a GPU exactly, but it's, like, a specific accelerator technology. And Amazon bought some of them, and they sell a little bit, but again, versus Intel's total revenues, it's de minimis. So they're not really there.

There are also a bunch of startups, and the problem with most of the startups is their story tends to be something like, you know, "we have a product that's ten times as good as Nvidia." And the issue is, with every generation, Nvidia has something that's ten times as good as Nvidia, and they have the software ecosystem that goes with it. Neither AMD, nor Intel, nor most of the startups have anything remotely resembling Nvidia's software. So that's another huge issue, right, that all of them are facing. There are a few startups that have had some niche success. The one that's probably gotten the most attention is called Cerebras, and their whole thing is they make a chip: imagine taking a three-hundred-millimeter silicon wafer and inscribing a square on it. That's their chip. It's, like, one chip per wafer, and so you can put very large models onto these chips, and they've been deploying them for those kinds of things. But again, the software becomes an issue. But they've had a little bit of success. There are some other names out there; you've got Groq and some others, I think, that are still out there. And then there's a company called Tenstorrent, which is interesting, not because of what they're doing so far, because it's early, but because it's run now by Jim Keller. And do you guys know who Jim Keller is? Jim Keller is sort of, like, a star chip designer. He designed Apple's first custom processor. He designed AMD's Zen and the Epyc roadmap that they've been taking a lot of share with. He was even at Tesla for a while, and at Intel. And so he's now running Tenstorrent, and they do a RISC-V chip (RISC-V is another type of architecture); they do an AI chip. So Jim is running that.

Speaker 2

So can I just ask, based on that: how, like, capex-intensive is developing chips that are well suited for AI versus other types of chips? And then, secondly, like, where do the improvements come from, or what are the improvements focused on? Is it speed, or, like, scale, given the data sets involved in the parallel processes that you described?

Speaker 3

Yeah, so it's a few things. So in terms of capex intensity: these are mostly design companies, so they don't have a lot of capex. It's certainly R&D intensive, so maybe that's what you're getting at. Nvidia spends, like, many billions of dollars a year on R&D. And Nvidia has a little bit of an advantage too, because it's effectively the same architecture between data center and gaming, so they've got other volume, effectively, to sort of amortize some of those investments over. Although now, I mean, this year, data center is probably sixty percent of Nvidia's revenues, so data center is sort of the center of gravity for Nvidia now. But it's very R&D intensive, and probably getting more so. And you've got folks all up and down the value chain that are investing: both the silicon guys, you know, and the cloud guys and the customers and everything else. But, I mean, that's kind of where we are.

In terms of what you're looking for, there are a few things. Performance: on training, quite often that comes down to, like, time to train. So I've got a model, and some of these models, I mean, you could imagine, could take weeks or months historically to train, right? And that's a problem. You want it to be faster, so I can get that down, you know, to weeks, or to days, or hours. That would be better. So that's one thing, clearly, that they work on.

Speaker 1

I don't want to...

Speaker 3

There's something else... yeah, go ahead.

Speaker 1

No, finish your thought; I have a slightly different question after. Oh yeah.

Speaker 3

The other thing I was talking about: there's something like scale-out. So basically, remember I said you're connecting lots and lots of these chips together. So for example, if I increase the number of chips by ten x, does my training time go down by, like, a factor of ten, or by, like, a factor of two? Ideally you would want linear scaling, right? I add resources, and it scales linearly.
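A quick illustration of why that scaling is rarely linear: if some fraction of the work (say, communication between chips) can't be parallelized, Amdahl's law caps the speedup. The five percent serial fraction here is a made-up number:

```python
def speedup(n_chips, serial_fraction=0.05):
    # Amdahl's law: the serial part never gets faster no matter how many chips
    # you add; only the parallel part divides across them.
    return 1 / (serial_fraction + (1 - serial_fraction) / n_chips)

for n in (1, 10, 100, 1000):
    print(n, round(speedup(n), 1))
# 10x the chips gives ~6.9x; 1000x gives only ~19.6x on these assumptions
```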

Speaker 1

So this kind of gets into my next question, actually. And, you know, we can talk with someone else about certain, like, AI fantasy doom scenarios.

Speaker 3

But I think... I'm not an AI, I'm not an AI architecture expert. I'm downstream here. So I'd just say, you may want to get a...

Speaker 1

Know somebody, but I am curious though, because I do think it relates to this question, which is that okay, like with each one like GPT five and they're going to keep adding more knobs on the box, et cetera, like and is your perception that this sort of quality of the output is growing exponentially or is it the kind of thing where it's like GPT four, you know, there's a lot more knobs and they got a big jump from GPT three. GPT five will be way more knobs,

but like is it going to be marginally better? Like what is this sort of like where are we in the sort of like what does the shape of the output curve look like? And this sort of like cost of you know, these chip developments of getting there. I don't know, it's kind of so there's a couple of things.

Speaker 3

So there are a couple of things. First of all, when you're talking about large language models, accuracy is sort of a nebulous term, because it's not just accuracy, it's also capability: like, what can it do, what can ChatGPT and GPT-4 do? And also, I think as you're going forward and you talk about the trajectory here, it's not just text, right? We're talking text-to-text here, but there's also text-to-image, like anybody who's played with, like, DALL-E, where, you know, it's generating images from a text prompt. And now we've got, like, video. What was it, Midsummer? Is that what it's called? Midjourney? Yeah, Midjourney. So it's creating, like, video from prompts. I mean, so, like, text-to-text is just the tip of the iceberg, I think, in terms of what we're going to need.

Speaker 1

Never they're never going to get to where they could have three people having a conversation with voices sound like Tracy, Joe and Stacy. Right, No, I'm just kidding, No, I mean, I'm just kidding. It feels like, yeah, this job.

Speaker 3

Now, one of the dangers, clearly, and maybe this gets to capabilities: one thing with ChatGPT is it's very, very good (this is why I should worry about my job) at sounding like it knows what it's talking about, where maybe it doesn't. Hey, so maybe I should be worried about my job. And accuracy, I think, is a big issue, but you have to remember...

Speaker 1

So, but on this accuracy question: like, I assume, you know, like self-driving cars, when people were really hyped about them ten years ago, they were like, oh, it's ninety-five percent solved, we just have a little bit more. And then ten years later, it feels like they haven't made any progress on that final five percent.

Speaker 3

Yeah. I mean, these things are always a power law.

Speaker 1

So this is my question: when we talk about accuracy for these things, like, are we at the point where it's going to be the kind of thing where, yeah, GPT-5 will definitely be better than GPT-4, but it will be, like, ninety-six percent of the way there?

Speaker 3

Well, again, let me separate accuracy from capability. So on accuracy: you have to remember, like, the model has no idea what accurate even means. Remember, these things are not actually intelligent. I know there's a lot of worry about, like, AGI, artificial general intelligence, right? I don't think this is it. This is predictive text. That's all. The model doesn't know if it's spewing bull crap or truth. It has no idea. It's just predicting the next word in the sequence, and it's doing that because of what it's trained on. So you need to add on maybe other kinds of things to ensure accuracy, maybe to put guardrails on, or things like that. You may need to very carefully, like, curate your input data sets and things like that. I think that's a problem now; I think it'll get solved. There's enough data. But, like, this has already been an issue, and you can take, like, the other side of it (I don't know if it's the converse of it or not), but things like deepfakes: people are deliberately trying to use AI to deceive. I mean, this is just human nature. This is why we have problems. But I think they can work through that.

Just in terms of capabilities now: I think it's really interesting to look at, like, the responses to a similar prompt between ChatGPT and GPT-4, and what people are getting out of GPT-4. It's miles ahead of some of the stuff that ChatGPT, which was trained on GPT-3.5, was delivering, in terms of nuance, right, and color and everything else. And I think that's going to continue. And already we're at the point where these things can already pass the Turing test. Oh yeah, right: it can be very difficult to know, putting the question of accuracy aside for a moment, it's very difficult to know, for some of these things, if you didn't know any better, whether it was coming from a real person or not. And I think it's going to get, like, harder and harder to tell, like, whether, you know, even if it's not, quote-unquote, really thinking, it's going to be hard for us to tell what's really going on. And that has sort of other interesting, you know, implications for what this might be over the next five or ten years.

Speaker 2

Just going back to the stock prices: I mean, we mentioned the Nvidia chart, which is up quite a lot, although it hasn't reached its peak from back in twenty twenty-one. The SOX index is recovering but, you know, still below its highs. And Intel, I mean, I won't even mention. But, like, where are we in the semiconductor cycle? Because it feels like, on the one hand, there's talk about excess capacity and orders starting to fall, but on the other hand, there is this real excitement about the future in the form of AI.

Speaker 3

Yes. So semis in general were pretty lousy last year. They've had a very strong year-to-date performance; the sector is up, you know, twenty, twenty-two percent year to date, quite a bit above the overall market. And the reason is, to your point, we've been in a cycle. Numbers have been coming down.

And we may have talked about this last time, I don't remember, but for semiconductor investors, it turns out the best time to buy the stocks in general is after numbers come down but before they hit bottom. Like, if you could buy them right before the last cut, if you could have perfect foresight. You never know when that is.

But numbers have come down a lot. Forward estimates for the industry peaked last June and they're down over thirty percent, like thirty-five percent, since then, which is actually the largest negative earnings revision we've had probably since the financial crisis. Wow. And people have been playing the bottoming theme, hoping that things get better into the second half.

You know, hopefully we get China reopening. And you've got markets, and this relates to Intel, like PCs and things, where we've now corrected; we're back to more of a pre-COVID run rate for PCs versus where we were. And the CPUs, which were massively overshipping at the peak, are now undershipping. So we're in that inventory-flush part of the cycle, and people have been playing the space for that second-half recovery.

Now, all that being said, if you look at the overall industry, numbers in the second half are actually above seasonal. So people are starting to bake that cyclical recovery into the numbers. And if you look at inventories, overall in the space they are ludicrously high. I've actually never seen them this high before. So we've had some inventory correction, but we may just be getting started there. And if you look at valuations, I think the sector is trading at something like a thirty percent premium to the S and P five hundred, which is the largest premium we've had, again, probably since things normalized after the tech bubble, or after the financial crisis at least. So people have been playing this recovery, but yeah, we'd better get it.

As it relates to some of the individual stocks, like you mentioned Intel, it's funny, I think you guys may not know this, I just upgraded Intel. Oh. The title of the note was "We Hate This Call," and I meant it. It was not a "we like Intel" call. It was just that I think they're now undershipping in PCs by a wide margin, and I think for the first time in a while the second-half Street numbers might actually be too low. So it's not like a super compelling call, but I felt uncomfortable not making it. Although they report earnings next week, so I may be kicking myself. We'll see.

Nvidia, however, as you said, hasn't reached its prior peak on a stock price basis, and the reason is the numbers have come down a lot. I mean, let's be honest, the gaming business was inflated significantly by crypto, right, and so that's all come out. And then with data center, you had some impacts from China. China in general was weak, and then there were the export controls that they had to work their way around, so they had some issues there. Now, all of that being said, graphics cards in gaming, we talked about some of these inventory corrections, graphics cards actually corrected the most and the most rapidly. So those have already hit bottom and they're growing again. And Nvidia has a product cycle there that they just kicked off. The new cards are called Lovelace, they look really good, and they're starting to fill out the rest of the stack. So gaming is okay.

And then in data center, again, this generative AI has really caught everybody's fancy. Nvidia is saying they're at the beginning of a product cycle in data center. They had their GTC event a couple of weeks ago, where they basically directly said, we're seeing upside from generative AI even now. So people have been buying Nvidia on that thesis. The last time the stock hit these peaks, at least in terms of valuation, the issue was we were at the peak of their product cycles and numbers came down. This time, valuations have gone back to where they were at those peaks, but we're at the beginning of the product cycles, and numbers are probably going up, not down.

Speaker 1

So that's why, Stacy, I joked at the beginning that we could talk about this for three hours, and I'm sure we could. It's such a deep area. But that was a great overview of the state of competition, the state of play, and the economics of this space, and a very good way for us to enter talking about AI more broadly. Thank you so much for coming back on Odd Lots.

Speaker 3

My pleasure. Anytime you guys want me here, just let me know. All right.

Speaker 1

We'll have you back next week for Intel. Take care, Stacy. I really like talking to Stacy. He's really good at explaining complicated stuff, yeah.

Speaker 2

I know. He made a point of saying that he's not an AI expert, but I thought he did a pretty good job of explaining it. I do think the trajectory of all this, and this is such an obvious thing to say, but it's going to be really interesting to watch how businesses adapt to it. What's kind of fascinating to me is that we're already seeing that differentiation play out in the market, with Nvidia shares up quite a bit and Intel, which is seen as not as competitive in the space, down quite a bit.

Speaker 1

I was really interested in some of his points about software in particular. Sometimes I see someone post on Twitter, like, look at this cool thing Nvidia just rolled out where they can make your face look like something else or whatever. But thinking about how important that is in terms of, okay, you and I want to start an AI company and we have an idea for a large language model or something, a specific model to train, there's going to be a big advantage in going with the company that has this huge wealth of libraries and code bases and specific tools around specific industries, as opposed to where some of the other competitors seem to be, where it's just much more technically challenging to even use the chips, if they exist, like Google's.

Speaker 2

TPUs, totally. The other thing that caught my attention, and I know these are very different spaces in many ways, but so much of the terminology is very reminiscent of crypto. Just the idea of an AI winter and a crypto winter. And you can see the pivot happening right now, with crypto people moving into AI. So that's going to be interesting to watch play out: how much of it is hype, the classic Gartner hype cycle, versus the real thing.

Speaker 1

Two things I think would be interesting. It'd be interesting to go back to past AI summers: what were some past periods when people thought we'd made this breakthrough, and then what happened? So that might be an interesting episode. And then the other thing is, look, in twenty twenty three, I have never actually found a reason I felt compelled to use a blockchain for something, and I get use out of ChatGPT almost every day. For example, just yesterday, after we recently did an episode on lending, a question came up: what's the difference, structurally, between the leveraged loan market and the private debt market? And I thought, this might be an interesting question for ChatGPT, and I got this very useful, clear answer from it that I couldn't have gotten perhaps as easily from a Google search. So I do think some of these hype cycles are really useful. But I am already, in my daily life, getting use out of this technology in a way that I cannot say for anything related to Web3. No, that is very true.

Speaker 2

And you know, the fact that this only came out a few months ago and everyone has been talking about it and experimenting with it kind of speaks for itself.

Speaker 1

Shall we leave it there? Let's leave it there.

Speaker 2

This has been another episode of the Odd Lots podcast. I'm Tracy Alloway. You can follow me on Twitter at Tracy Alloway.

Speaker 1

And I'm Joe Weisenthal. You can follow me on Twitter at The Stalwart. Follow our guest Stacy Rasgon. He's at SRasgon. Follow our producers Carmen Rodriguez at Carmen Arman and Dashiell Bennett at Dashbot. And check out all of our podcasts at Bloomberg under the handle at Podcasts. And for more Odd Lots content, go to Bloomberg dot com slash Odd Lots. We blog, we post transcripts, we have a newsletter. And check out the Odd Lots Discord, where listeners are chatting twenty-four-seven about all the things we talk about here. We even have an AI-specific room that's really fun, and the semiconductor room, so people are chatting about these things. I even sourced some questions for today from that group, so it's really fun. I like hanging out there. Go to Discord dot gg slash odd lots. Thanks for listening.
