
Ep18. Jensen Recap - Competitive Moat, X.AI, Smart Assistant | BG2 w/ Bill Gurley & Brad Gerstner

Oct 13, 2024 | 54 min | Ep. 18

Episode description

Open Source bi-weekly convo w/ Bill Gurley and Brad Gerstner on all things tech, markets, investing & capitalism. This week, joined by Sunny Madra (Groq) they discuss Jensen Huang’s recent appearance on the podcast, including scaling intelligence towards AGI, NVIDIA's strategic positioning, the role of CUDA in the developer ecosystem, the future of inference workloads, systems-level thinking,  Elon Musk's influence on AI development, the future of AI assistants, open vs closed AI models, safety and security in AI development, & more. Enjoy another episode of BG2.


Chapters


(00:00) Introduction and Initial Reactions to Jensen


(04:32) NVIDIA's Position in Accelerated Compute


(05:11) CUDA and NVIDIA’s Competitive Moat


(12:53) Challenges to NVIDIA’s Competitive Advantage


(18:22) Future Outlook on Inference


(24:46) The Insatiable Demand for AI and Hardware


(27:12) Elon Musk and X.ai


(31:47) Scaling AI Models and Clusters


(34:17) Economic Models and Funding in AI


(39:08) The Future of AI Pricing and Consumption Models


(42:25) Memory, Actions, and Intelligent Agents


(47:08) The Role of AI in Business Productivity


(51:03) Open vs Closed Models in AI Development



#jensenhuang #nvidia #bradgerstner #billgurley #clarktang #xai #memphiscluster #elonmusk #noambrown #openai #gptstrawberry

Transcript

You may also be running up against, even for the Mag 7, the size of CapEx deployment, where the CFOs are starting to talk at higher levels. For sure. Totally. Sunny, Bill, great to see you guys. Good to see you. Good to be back. Thanks, man. It's great to have you. We literally just finished two days of the Altimeter annual meetings. I mean, we had hundreds of investors, CEOs, founders, and the theme was scaling intelligence to AGI. We had Nikesh talking about enterprise AI.

We had Rene Haas talking about AI at the edge. We had Noam Brown talking about Strawberry and the o1 model and inference-time reasoning. We had Sunny talking about accelerating inference. And of course, we kicked off with Jensen talking about the future of compute. I did the Jensen talk with my partner Clark Tang, who covers the compute layer on the public side. We recorded it on Friday. We'll be releasing it as part of this pod. And man, was it dense.

I mean, he was, you know, he was on fire. I asked him at the beginning of the pod, what do you want to do? He said, grip it and rip it. And we did 90 minutes. We went deep. I shared it with you guys. We've all listened to it. I learned so much playing it back that I just thought it made sense for us to unpack it, right? And to really analyze it, see what we agree with, what we may disagree with, things we want to further explore. Sunny, any high level reactions to it?

Yeah, you know, first, it's the first time I've really seen him in a format where you got all that information out in one sitting, because usually you just get the tidbits. And the one that really struck me was when he said Nvidia's not a GPU company, they're an accelerated compute company. I think the next one, you know, which you'll touch on, is where he really said the data center is the unit of compute. Right. I thought that was massive.

And, you know, sort of just closing out, when he talked about how he thinks about using, and is already utilizing, so much AI within Nvidia, and how that's a superpower for them to accelerate past everyone they're competing with. I thought those were really awesome points from him, you know, eating the dog food, as they say. It is incredible.

You know, there's this thing we'll talk about later, but he said he thinks they can 3X, you know, the top line of the business while only adding 25% more humans, because they can have 100,000 autonomous agents doing things like building the software, doing the security, and that he becomes really a prompt engineer, not only for his human direct reports, but also for these agents, which, you know, really is mind boggling. Bill, anything stand out for you?

Well, one, I mean, you should be pleased that you were able to get his time. You know, this is, at points in time, the largest market cap company in the world, if not number one, then number two. And so I think it was so kind of him to sit down with you for so long, and during the pod saying, I can stay as long as you want. I was like, don't you have something to be doing? He's incredibly generous and it's fantastic. But, I mean, I had two big takeaways.

One, I mean, it's obvious that this guy's, you know, firing on all cylinders here, right? Like you have a company at a 3.3 trillion market cap that's still growing over 100% a year. And the margins are insane. I mean, 65% operating margins. There are only like five companies in the S&P 500 at that level, and they certainly aren't growing at this pace. That's right. And when you bring up that point about getting more done on the increment with fewer employees, where's this going to go?

Like 80% operating margin. I mean, that would be unprecedented. A lot of what's already here is unprecedented. But obviously Wall Street is fully aware of the unbelievable performance of this company. And, you know, the multiple and the market cap reflect it, but it's super powerful how they're executing. And you can see the confidence in every answer that he gives.

We spent about a third of the pod on Nvidia's competitive moat, really trying to break it down, really trying to understand this idea of systems-level advantages, the combinatorial advantages that he has in the business. Because I think when I talk to people around the investment community, despite how well it's covered, Bill, right, there's still this idea that it's just a GPU and that somebody's going to build a better chip. They're going to come along and displace the business.

And so when he said, again, it can sound like marketing speak, Sunny, when somebody says it's not a GPU company, it's an accelerated compute company. You know, we showed this chart where you can see kind of the Nvidia full stack. And he talked about how he just built layer after layer after layer of the stack, you know, over the course of the last decade and a half. But when he said that, Sunny, I know you had a reaction to it, right?

Even though you know it's not just a GPU company, when he really broke it down, it seemed like, you know, he did break new territory here. Yeah, what was great to hear from him, and really, you know, positive for folks thinking about where Nvidia lives in the stack right now, is that he got into the details, and then the sub-details below CUDA.

And he really started going into what they're doing very particularly on mathematical operations to accelerate their partners, and how they work really closely with their partners, you know, all the cloud service providers, to basically build these functions so they can further accelerate workloads. The other little nuance that I picked up in there: he didn't focus purely on LLMs.

He talked in that particular area about how they're doing that for a lot of traditional models, and even newer models that are being deployed for AI. And I think it really showed how they are partnering much more closely on the software layer than on the hardware layer alone. Right. I mean, in fact, you know, he talked about how the CUDA library now has over 300 industry-specific acceleration algorithms, right, where they deeply learn the industry, right?

So whether this is synthetic biology or image generation or autonomous driving, they learn the needs of that industry and then they accelerate the particular workloads. And that for me was also one of the key things. This idea that every workload is moving from kind of this deterministic, hand-coded workload to something that's really driven by machine learning and really infused with AI, and therefore benefits from acceleration.

Even something as ubiquitous as data processing. Yeah. And I shared this code sample with Bill as, you know, we were just preparing for this pod.

And, you know, I knew Bill would want to look at it right away, and he processed it and ran it. What it really showed is that every piece of code that's out there now that's related to AI, or not every piece, but many of them, have this sort of: if device equals CUDA, do X, and if it's not, do Y. And that's the level of impact they're having across the entire ecosystem of services and apps that are being built that are related to AI. But I don't know what you thought when you saw that piece.
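For context, here is a minimal sketch of the device-selection pattern Sunny is describing, assuming PyTorch; the snippet is illustrative, not the actual code he shared:

import torch

# Pick the accelerator if one is available, otherwise fall back to CPU.
# This "if the device is CUDA, do X, otherwise do Y" branch is the pattern
# that now shows up across much of the AI-related code in the ecosystem.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = torch.nn.Linear(512, 512).to(device)  # move weights to the GPU if present
x = torch.randn(8, 512, device=device)        # allocate inputs on the same device

with torch.no_grad():
    y = model(x)

print(f"Ran on: {device}")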

Yeah. I mean, I think there's a question for the long term that relates to CUDA, and I want to go back to the systems point you made, Brad, later, but while we're on CUDA: what percentage of developers will touch CUDA, and is that number going up or down? And I could see arguments on both sides.

You could say the models are going to get more and more hyper-specialized, and performance matters so much that the models that matter the most, the deployments that matter the most, are going to get as close to the metal as possible, and then CUDA is going to matter. The other argument you can make is that those optimizations are going to live in PyTorch, they're going to live in other tools like that, and the marginal developer is not going to need to know that.

And I could make both arguments, but I think it's an interesting question going forward. I mean, I just asked ChatGPT how many CUDA developers there are today, just to be on top of it. Three million CUDA developers, right? And a lot more that touch CUDA that aren't specifically developing on CUDA. So it is one of these things that has become pretty ubiquitous.

And his point was it's not just CUDA, of course, it's really full stack all the way from data ingestion through kind of the post-training. On the latter, to your point, Bill, I think there are going to be fewer people touching that. And I do think that's a point where the moat is not as strong longer term, as you say.

And the analogy that I would go with is, think about the number of iOS developers working at Apple building the platform versus the number of app developers, right? And I think you're going to have, you know, a 10-to-1 or 100-to-1 ratio of people building at the layers above versus people building down closer to the bare metal. That'll be something to watch. We can ask more people over time. Obviously, it's a big lock-in today, for sure.

You know, and I think, Bill, to your point, I reached out to Gavin, actually, before I did the interview. Gavin Baker is a good buddy who obviously knows the space incredibly well and has followed it at a deeper level for a longer period of time than I have. And, you know, when I asked him about the competitive advantage, he really said a lot of the competitive advantage is around this algorithmic diversity and innovation, and that's why CUDA matters.

He said, if the world standardizes on transformers on PyTorch, then it's less relevant for GPUs, you know, in that environment. Like, if you have a lot of standardization, right, then the advantage goes to the custom ASICs. But I'll tell you this, you know, I've had this conversation with a lot of people. And I asked Jensen, I pushed him on, you know, custom ASICs.

I was like, hey, you know, you've got accelerated inference coming from Meta with their MTIA chip, you've got Inferentia and Trainium, you know, coming. He's like, yeah, Brad, you know, they're my biggest partners. I actually share my three to five year roadmap with them. Yes, they're going to have these point solutions that are going to do these very specific tasks.

But at the end of the day, the vast majority of the workloads in the world that are machine learning and AI infused are going to run on Nvidia. And the more people I talk to, the more I'm convinced that that's the case, despite the fact that there'll be a lot of other winners, including Groq and Cerebras, et cetera. And they're acquiring companies, they're moving up the stack, they're trying to do more optimization at higher levels. So they want to extend, obviously, what CUDA is doing.

Don't go to inference yet. That's a whole other topic. No, no, not that, the bit about the deep integrations, right? That's a playbook that I think Microsoft has run really well for a long time in enterprise software. And you really haven't seen that in hardware ever.

If you go back to, say, Cisco or the PC era or the cloud era, you didn't see that deep level of integration. Now, Microsoft pulled it off with Azure, and when I heard him talking, all I could think about was, man, that was really smart. But what he's done is he's gotten teams together to really understand what the use cases are, and built an organization that deeply integrates into his customers, and does it so well, all the way up into his roadmap, that he's much more deeply embedded than anyone else is.

When I heard that part, I kind of gave him a real tip of the hat on that one. But what was your take on that? You and I had this conversation after we first listened to it. It's really telling how he goes about it. He talks as a systems-level engineer, right? Even if you hear people, you know, people who went to Harvard Business School, say, how can this guy possibly have 60 direct reports, right? But how many direct reports does Elon have?

He thinks at a systems level, and he said, I have situational awareness, right? I'm a prompt engineer to the best people in the world at their specific tasks. I think when I look at this, the thing that I deeply underappreciated a year and a half ago about this company was the systems-level thinking, right? He spent years thinking about how to embed this competitive advantage, and it really goes all the way from power all the way through the application.

And every day they're launching these new things to further embed themselves in the ecosystem. But I did hear from somebody over the last two days, you know, Rene Haas, the CEO of ARM, right? Rene was also at our event and he's a huge Jensen fan. He worked eight years at Nvidia before joining ARM in 2013. And he said, listen, nobody is going to assault the Nvidia castle head on, right? Like the mainframe of AI, right, is entrenched and it's going to become a lot bigger.

At least as far as the eye can see, he said. However, if you think about where we're interacting with AI today, right, on these devices, on edge devices, he's like, our installed base at ARM is 300 billion devices. And increasingly, a lot more of this compute can run closer to the edge. If you think about an orthogonal competitor, right? Again, if Nvidia has a deep competitive moat in the cloud, what's the orthogonal competitor?

The orthogonal competitor peels off a lot of the AI on the edge, and I think ARM is incredibly well positioned to do that. Clearly, Nvidia's got ARM embedded now in a lot of their Grace, Blackwell, et cetera. But that to me would be one area. Like, if you looked out and you said, where can their competitive advantage, you know, be challenged a little bit. I don't think they necessarily have the same level of advantage on the edge as they have in the cloud.

You started the pod by saying, you know, everyone's heard this in the investment community: it's not a GPU company, it's a systems company. And in my brain, I think I had thought, oh, well, they've got four in a box instead of, you know, just one GPU, or eight in a box. At the time I was listening to the podcast you did with Jensen, I was reading this Neocloud Playbook and Anatomy post by Dylan Patel. Yes.

It went into extreme detail about the architecture of some of the larger systems, you know, like the one at x.ai that we're going to talk about that was just deployed, which I think is 100,000 nodes or something like that. And it literally changed my opinion of exactly what's going on in the world and actually answered a lot of questions I had.

But it appears to me that Nvidia's competitive advantage is strongest where the size of the system is largest, which is another way of saying what Rene said, just flipping it on its head. It's not to say it's weak on the edge, but it's super powerful when you put a whole bunch of them together. That's when the networking piece thrives. That's where NVLink thrives. That's where CUDA really comes alive, in the biggest systems that are out there.

And some of the questions it answered for me were: one, why is demand so high at the high end, and why are nodes available on the internet, you know, single nodes available on the internet, at or below cost? And this starts to get at it, because you can do things with the large systems that you just can't do with a single node. And so those two things can be simultaneously true. Why was Nvidia so interested in CoreWeave existing?

Now I understand: if the biggest systems are where the biggest competitive advantage is, you need as many of these big-system companies as you can possibly have. And, if that trajectory remains true, you could have an evolution where customer concentration increases for Nvidia over time rather than going the other way.

Depending on how, you know, if Sam's right that they're going to spend a hundred billion or whatever on a single model, there are only so many players that are going to be able to afford that. But a lot of stuff started to make sense to me that didn't before. And I clearly underestimated the scale of what it meant to not be a GPU company but to be a systems company. This goes way, way up the stack. Yeah. And, you know, again, Bill, you touched on something that I think is really important here.

So there's this question of whether their competitive moat is as powerful in inference as it is in training, right? Because I think that there's a lot of doubt as to whether their moat is as strong in inference. But I asked him if it was as strong. He actually said it was greater, right? To me, you know, when you think about that, in the first instance, right, it didn't make a lot of sense.

But then when you really started thinking about it, he said there's a trail of infrastructure behind him that's already out there, that is CUDA compatible and can be amortized for all this inference. And so he, for example, referenced that OpenAI had just decommissioned Volta. So it's like this massive installed base. And when they improve their algorithms, when they improve their frameworks, when they improve their CUDA libraries, it's all backward compatible.

So Hopper gets better and Ampere gets better and Volta gets better. That, combined with the fact that he said everything in the world today is becoming highly machine learned, right? Almost everything that we do. He said almost every single application, Word, Excel, PowerPoint, Photoshop, AutoCAD, like it all will run on these modern systems. Do you buy that? Do you buy that when people go to replace, you know, compute, they're going to replace it with these modern systems?

So when I was listening to it, I was buying it. But then he said one thing that kept resonating in my mind, which is: inference is going to be a billion times larger than training. And if you kind of double-click into that, these old systems aren't going to be sufficient, right?

If you're going to have that much more demand, that much more workload, which I think we all agree on, then how is it that these old systems, which are being decommissioned from training, are going to be sufficient? So I think that's where that argument just didn't hold strong enough for me. If that grows as fast as he says it is, as fast as, you know, you guys have seen it in their numbers, then there are going to be a lot more net new inference-related deployments.

And there, I don't think that argument holds on the transfer from older hardware to newer hardware. Well, you said something pretty casually there, right? Let's underscore this. We were talking about Strawberry and the o1 preview, and he said there's a whole new vector of scaling intelligence: inference-time reasoning, right? It's not going to be single shot, it's going to be lots of agent-to-agent interactions, thinking time, as Noam Brown likes to say, right?

And he said, as a consequence of that, inference is going to 100,000x, a million x, maybe even a billion x. And that in and of itself, to me, was kind of a wow moment. 40% of their revenues are already inference. And I said, over time, does inference become a higher percentage of your revenue mix? And he said, of course, right? But again, I think conventional wisdom is all around the size of clusters and the size of training.

And if models don't keep getting bigger, then their relevance will dissipate. But he's basically saying every single workload is going to benefit from acceleration, right? It's going to be an inference workload. And the number of inference interactions is going to explode higher. Yeah. One technical detail, which is: you need bigger clusters if you're training bigger models. But if you're running bigger models, you don't need bigger clusters. It can be distributed, right?

And so I think what we're going to see here is that the larger clusters will continue to get deployed. And as Bill said, they'll get deployed for folks, maybe a limited number of folks, that need to deploy them for $100 billion runs or even bigger than that. But you'll see inference clusters be large, but not as large as the training clusters, and be a lot more distributed, because you don't need it all to be in the same place. And I think that's what'll be really interesting. It was interesting.

He simplified it even more than you did there, Brad. He said, think about a human: how much time do you spend learning versus doing? And he used that analogy as to why this was going to be so great. But, in a little different way than Sunny, I thought the argument that the reason they're going to be great at inference is that there's so much of their old stuff lying around wasn't super solid.

In other words, what if some other company, Sunny's or some other one, decided to optimize for inference? It wasn't an argument about optimization. It was an argument about cost advantage, because it might be fully depreciated or whatever. And of course, if you had poked him back on that, he might have had another answer about why they'd win on optimization.

But there are clearly going to be people, whether it's other chip companies or some of these accelerator companies, working on inference optimization, which may include edge techniques. I think some of the accelerators may look like AI CDNs, if you will, and they're going to be putting stuff closer to the customer. So that's all TBD, but just the argument that you've got leftover hardware didn't seem super solid to me.

And the three fastest companies in inference right now are not Nvidia. Right. So who are they, Sunny? Show it. We'll post the leaderboard. Yeah. It's a combination of Groq, Cerebras, and SambaNova, right? Those are three companies that are not Nvidia that are on the leaderboards for all the models that they run. You're ranking on performance, you said? Performance. Yeah. And I would argue even price. Yeah. And make the argument: why are they faster?

Why are they cheaper, in your mind? But yet, notwithstanding that fact, Nvidia is going to do, let's call it 50 or 60 billion of inference this year. And these companies are, you know, still just getting started, right? Why is their inference business so big? Is it just because of installed base? Yeah, I think it's a combination of installed base and the fact that the inference market is growing so incredibly fast.

I think if you were making this decision even 18 months ago, it would have been a really difficult decision to buy from any of those three companies, because your primary workload was training, and in the first part of this pod we talked about how they have such a strong tie-in, integration-wise, to getting training done properly. Right. And so, you know, as you can see, all the non-Nvidia folks can get the models up and running right away.

There is no tie-in to CUDA that's required to go faster, that's required to get the models running. Right. Obviously, none of the three companies run CUDA. And so that moat doesn't exist around inference. Yeah, CUDA is less relevant in inference. That's another point, you know, worth making. But I wanted to add one other thing to what Sunny just said.

If you go back to the early internet days, and this is just an argument that optimization takes a while, all of the startups were running on Oracle and Sun. Every single fucking one of them was running on Oracle and Sun. And five years later, they were all running on Linux and MySQL, like in five years. And it literally went from 100% to 3%. And I'm not making that projection, that that's going to happen here.

But you did have a wholesale shift as the industry went from developing and building it for the first time to optimizing, which are really two separate motions. It seems to me, I pulled up this chart, right, that we shared. We made it, Bill, way earlier this year for the pod, and it showed the trillion dollars of new AI workloads expected over the next four to five years.

And the trillion dollars of effectively data center replacement. And I just wanted to get his updated reaction or forecast, now that he's had six more months to think about whether or not, you know, he thinks that's achievable. And what I heard him say was, yes, the data center replacement is going to look exactly, you know, like that. Of course, he's just making his best educated guess. But he seemed to suggest that the AI workloads could be even bigger, right?

Like that once he saw Strawberry, you know, o1, he thought the amount of compute that was going to be required to power this, and, you know, the more people I talk to, the more I get that same sense: there is this insatiable demand. So maybe we just touch on this. You know, he goes on CNBC and he says the demand is insane. Right. And I kept trying to push on that. I was like, you know, yeah, but what about MTIA? What about custom, you know, inference?

What about all these other factors? What if models stop getting so big? I said, would any of that change the equation? And he consistently pushed back and said, you still don't understand the amount of demand in the world, because all compute is changing, right? I thought he had one nuance to that answer, which was, when you asked him that, he said, look, if you have to replace some amount of infrastructure, you know, whatever that number was, it was really big.

And you're part of that, and you're a CIO somewhere tasked with doing this. What are you going to do? What are you going to replace it with? It's accelerated compute. And then immediately, once you make that choice, because you're not going to go to traditional compute, Nvidia is your number one choice. So I thought he kind of tied that back together, like, are you really going to, you know, get yourself in trouble by having something else there? Or are you just going to go to Nvidia?

Yeah. Yeah. I didn't want to say it, Bill, but it felt like the old IBM argument. Yeah. Look, I mean, one thing, Brad, is this company's public. When a private company says, oh, the demand's insane, I immediately get skeptical. This company's doing 30 billion a quarter, growing 122%. Like, the demand is insane, we can see it. There's no doubt about it. And part of that demand was a conversation about Elon and x.ai and what they did.

And I thought it was also just incredibly fascinating, right? I thought it was funny. I asked him the question about the dinner that he and Elon and Larry Ellison apparently had. And he's like, you know, just because that dinner occurred and they ended up with 100,000 H100s, don't necessarily connect the dots.

But listen, he confirmed that his mind was blown by Elon, and he said Elon is an N of one, a superhuman who could possibly pull that off, who could energize a data center, who could liquid-cool a data center. And he said, what would take somebody else years to get permitted, to get energized, to get liquid-cooled, to get stood up, x.ai did in 19 days, you know. And you could just tell the immense respect that he had for Elon.

It's clear, you know, he said it's the single largest coherent supercomputer in the world today, and it's going to get bigger. And if you believe that the future of AI is tied closely together with the systems engineering on the hardware side, you know, what hit me in that moment was: that's a huge, huge advantage for Elon. Yeah, I forget the exact number, but he talked about how many thousands of miles of cabling were in there as part of the task.

Look, coming at it from, you know, doing a lot of that ourselves right now, building data centers, standing them up, racking and stacking our nodes, it's impressive. It's impressive to do something at that scale in 19 days. And that doesn't even include how quickly they built that data center. I think it all happened, you know, within 2024. And so that's part of the advantage.

The interesting thing there is he didn't touch on it as much as when he talked about doing the integration with the cloud service providers. What I'd love to double-click into is, you know, Elon is in a unique situation where he's obviously bought this cluster, he has a ton of respect for Nvidia, but he's also building his own chip, building his own clusters with Tesla.

So I wonder how much, you know, cross-correlation or information there is for them to be able to do that at scale. And, you know, you guys look at this. What have you seen on their clusters? I don't really have a lot of data on the non-Nvidia clusters that they have. I'm sure somebody on my team does. I just don't have it off the top of my head. If we have it, I'll pull a chart and I'll show it.

So you said you now think the xAI cluster is the largest Nvidia cluster? Well, I'm saying that because I believe Jensen said it in the pod, that it's the largest supercomputer in the world. Yeah, I mean, I just want to spend 30 seconds on what you said, Brad, about Elon. I'm staring out my window at the Gigafactory in Austin. That was also built in record time. Starlink's insane.

When we were walking into the ablo, I just kept thinking, you know who I'd love to have reimagine this place? Elon, right? And the world should study how he can do infrastructure fast, because if that could be cloned, it would be so valuable. Not really relevant to this podcast, but worth noting. The other thing that I thought about on the Elon thing, and this is also where these pieces came together in my mind about these large clusters and how important that was to Nvidia.

He got allocation, right? This is supposed to be the hottest company, the hottest product, backed up for years on demand. And he walks in and takes what sounds like, looks like, about 10% of the quarter's availability. And in my mind, I'm thinking that's because, hey, if there's another company that's going to develop these big ones, I'm going to let them go to the front of the line.

And that speaks to what's happening in Malaysia and the Middle East and any one of these places where people get excited. He's going to spend time with them, put them at the front of the line. You know, well, I'll tell you, I pushed him on this. I said, you know, the rumor is Elon's going to get another 100,000, you know, H200s, and add them to this cluster. I said, are we already at the phase of 200,000 and 300,000 cluster scale? And he said, yes.

And then I said, and will we go to 500,000, a million? And he's like, yes. Now, I think these things, Bill, are already being planned and built. And what he said is, beyond that, you start bumping up against the limitations of base power. Like, can you find something that can be energized to power a single cluster? And he said, we're going to have to develop distributed training.

And he said, but just like with Megatron, which we developed to allow what is occurring today to occur, we're working on the distributed stuff, because we know we're going to have to decompose these clusters at some point in order to continue scaling them. You may also be running up against, even for the Mag 7, the size of CapEx deployment where the CFOs are starting to talk at higher levels. For sure. Totally.

And there's a super interesting article in The Information, it just came out today, where Sam Altman is questioning whether Microsoft's willing to put up the money and build a cluster. And that may have been triggered by Elon's comments, or Elon's willingness to do it at x.ai. What I will say on the size of the models is, we're going to push into this really interesting realm where obviously we can have bigger and bigger training clusters.

That naturally implies that the models are bigger and bigger. But here's the thing: you can train a model across distributed sites, and it may just take you a month longer because you have to move traffic around, so instead of taking three months, it takes you four months. But you can't really run a model across distributed sites, because inference is a real-time thing. And so, you know, we're not pushing it there.

But when the models get way too big to run in single locations, that may be a problem that we want to be aware of and keep in our minds as well. On this question of scaling our way to intelligence, one of the things I asked Noam Brown, you know, today in our fireside chat, he made very clear his perspective.

Although he's working on inference-time reasoning, which is a totally different vector and a breakthrough vector at OpenAI, which we ought to spend a little bit of time talking about, he said, you know, now there are these two vectors, right, that again are multiplicative in terms of the path to AGI. He's like, make no mistake about it, we're still seeing big advantages to scaling bigger models, right?

We have the data, we have the synthetic data, we're going to build those bigger models, and we have an economic engine that can fund it. Right. This company, you know, is at over four billion in revenue, scaling, probably most people think, to 10 billion plus in revenue over the course of the next year. They just raised six and a half billion. They got a four billion dollar line of credit from Citigroup. So among the independent players, Bill, right?

Microsoft can choose whether or not they're going to fund it, but I don't think it's a question of whether or not they're going to have the funding. At this point, they've achieved escape velocity. I think for a lot of the other independent players, there's a real question whether they have the economic model to continue to fund the activity.

So they have to find a proxy, because I don't think a lot of venture capitalists are going to write multi-billion dollar checks into the players that haven't yet caught lightning in a bottle. That would be my guess.

I mean, you know, I just think it's hard. Listen, at the end of the day, we're economic animals, and I've said before, if you look at the forward multiple most of us underwrote to on OpenAI, it was about 15 times forward earnings, right? If ChatGPT wasn't doing what it was doing, if the revenue wasn't doing what it was doing, right, this would have been massively dilutive to the company. It would have been very hard to raise the money.

I think if Mistral or all these other companies want to raise that kind of money, it'd be very difficult. But, you know, there's still a lot of money out there, so it's possible. But, you said 15 times earnings. I think you meant revenue. Or 15 times revenue, for sure. Which, as I said, you know, when Google went public, it was about 13 or 14 times revenue, and Meta was like 13 or 14 times revenue.

So I do think we're on the precipice of a lot of consolidation among the new entrants. What I think is so interesting about X is, you know, when I was pushing Jensen on this model consolidation, he was like, listen, with Elon, you have somebody with the ambition, with the capability, with the know-how, with the money, right, with the brands, with the businesses.

So I think a lot of times when we're talking about AI today, we talk about OpenAI, but a lot of people quickly then go into all of the other model companies. I think X is often left out of the conversation. And one of the things I took away from this conversation with Jensen is, again, if scaling these data centers is a key competitive advantage to winning in AI, right, you absolutely cannot count out x.ai in this battle.

They're certainly going to have to figure out, you know, something with the consumer that's going to have a flywheel like ChatGPT, or something with the enterprise. But in terms of standing it up, building the model, having the compute, I think they're going to be one of the three or four in the game. You touched on maybe wanting to close out on the Strawberry-like models.

You know, one thing we don't have exposure to, but we can guess at, is cost. And that chart that they showed when they released Strawberry, the X axis was logarithmic. So the cost of a search with the new preview model is probably 20x or 30x what it costs to do a normal ChatGPT search. Right. I think that's fractions of a penny, but it also takes longer.

So figuring out which problems it's acceptable for, and Jensen gave a few examples, for it to take more time and cost more, and to get the cost-benefit right for that type of result, is something we're going to have to figure out, like which problems tilt to that place. Right. And, you know, the one thing I feel good about there, and again, I'm speculating, I don't have information from OpenAI on this, but what we know is that the cost of inference has fallen by 90% over the course of the last year.

What Sunny has told us, and other people in the field have told us, is that the cost of inference is going to drop by another 90% over the course of the next, you know, period of months. If you're racing up against logarithmic needs, you're going to need that. Right. And here's what I also think happens, Bill: in this chain of reasoning, you're going to build intelligence into the chain of reasoning, right?

So you're going to optimize where you send each of these inference interactions, you're going to batch them, you're going to take more time with them, because it's just a time-money trade-off, right? At the end of the day, I also think that we're in the very earliest innings as to how we're going to think about pricing these models, right? So if we think about this in terms of System 1, System 2 level thinking, right?

System 1 being, you know, what's the capital of France? Right. You're going to be able to do that for fractions of a penny using pretty simple models on ChatGPT, right? When you want to do something more complex, if you're a scientist and you want to use o1 as your research partner, right, you may end up paying for it by the hour, and relative to the cost of an actual research partner, it may be really cheap, right?
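As a purely illustrative sketch of that System 1 / System 2 routing idea: a simple dispatcher could send easy questions to a cheap, fast model and harder ones to a slower reasoning model. The model names, the difficulty heuristic, and the thresholds below are hypothetical assumptions, not anything OpenAI has described:

# Hypothetical sketch: route "System 1" prompts to a fast model and
# "System 2" prompts to a reasoning model. Names and heuristic are
# illustrative assumptions only.

def looks_hard(prompt: str) -> bool:
    # Stand-in heuristic; a real system might use a trained classifier.
    keywords = ("prove", "derive", "design an experiment", "step by step")
    return len(prompt) > 200 or any(k in prompt.lower() for k in keywords)

def route(prompt: str) -> str:
    return "reasoning-model" if looks_hard(prompt) else "fast-model"

print(route("What's the capital of France?"))                   # -> fast-model
print(route("Derive the result and design an experiment ..."))  # -> reasoning-model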

So I think there are going to be consumption models, you know, for this. I think we haven't even scratched the surface of thinking about how that's going to be priced, but I totally agree with you: it is going to be priced very differently. Again, I think OpenAI has suggested, you know, that the full o1 model may even be released yet this year, right?

One of the things that I'm kind of waiting to see is, I think, you know, listen, having known Noam Brown for quite a while now, he's an N of one, right? And he wasn't the only one working on this at OpenAI, for sure, but, you know, listen, whether it's Pluribus or winning at the game of Diplomacy, he's been thinking about this for a decade, right? And it was his major breakthrough on how to win the game of six-handed poker. And so he brought this to OpenAI.

I think they have a real lead here, which leads me back to this question Bill and I talk about all the time, which is memory and actions, right? And so I have to tell you this funny thing that occurred at our investor day. So I had Nikesh on stage, and, you know, obviously Nikesh was instrumental at Google for a decade. And so I wanted to talk to him about both consumer AI as well as enterprise AI. And I asked him, I said, I want to make a wager with you.

I knew, of course, he would take a bet. And I said, I want to make a wager with you, over-under. I'll set the line at two years until we have an agent that has memory and can take action. And the canonical use case, of course, that I used was that I could tell my agent, book me the Mercer Hotel next Tuesday in New York at the lowest price. And I said, over-under, you know, two years on getting that done. I said, I'll stake five thousand bucks. I'll take the under. He snap-calls me.

He says, I'll take the over. And he said, but only if you 10x the bet. And of course, we're doing it for a good cause. So I had to call him, because, you know, I can't not step up for a good cause. So we're taking the opposite sides of that trade. Now, what was interesting is, over the course of the next couple of days, I asked some other friends who took the stage, you know, where they would come down on the same bet, right?

My friend Stanley Tang took the under. A friend from Apple, who will remain nameless, kind of took the over. And then Noam Brown, who was there, pleaded the Fifth. He says, I know the answer, so I can't say. And so I was kind of provocative, and I texted Nikesh and I said, I think you better get your checkbook ready. You know, so coming back to that, Bill: Strawberry, o1, is an incredible breakthrough, something that thinks, this whole new vector of intelligence.

But it kind of makes us forget about the thing you and I focus so much on, which is memory and actions, right? And I think that we are on the real precipice of, not only can these models think, you know, spend more time thinking, not only can they give us fewer hallucinations with just scaled compute, but I also think, I mean, you already see the makings of this. I mean, if you use these things today, they already remember quite a bit.

So I think they're sliding this into the experience. But I think we're going to have the ability to take simple actions. And I think this metaphor that people had in their minds, that they were going to have to build deep APIs and deep integrations to everybody, I don't think is the way this is going to play out. And let me just ask, what do you think is going to play out?

Well, I mean, the Easter egg that I thought got dropped last week is they did this event on, you know, their voice API, right? And it's literally your GPT calling a human on the telephone and placing an order. So why the hell can't my GPT just call up the Mercer Hotel and say, Brad Gerstner would like to make a reservation, here's his credit card number, and pass along the information? There is a reason for that.

I mean, look, scrapers and form fillers have existed for how long, Sunny, 15 years? You could write an agent to go fill out a form and book the Mercer Hotel 15 years ago. There's nothing impossible about that. It's the corner cases, and like the hallucination when your credit card gets charged 10 grand. You just can't have failure. And how do you architect this so that there's not failure and there's trust? I'm sure you could demo this tomorrow. I have zero doubt.

You could demo it tomorrow, but could you provide it at scale in a trustworthy way where people are allocating their credit cards to it? That might take a little while. Okay. So, over-under, Bill, on two years. I mean, I'm going to get you action either way, but what's the test? The demo, I think you could do today. No, not the cheesy demo you just said. I'm talking about a release that allows me, you know, at scale, to book a hotel while spending your credit card.

Yeah. And not just you, but everybody, a full release. Yeah, well, I'll call it a full release just because I know that's the only way I can get you to take the bet. Hmm. Which today is October 8th, 2024. I mean, Sunny, you already know what he's going to say. You'll take the over, right, Bill? Yeah. Yes. Okay. So Bill's in the Nikesh camp. Sunny, where do you come down, over or under, on two years? I mean, don't start hedging, Bill. Don't start hedging. I already said you could demo it today, and 15 years ago.

Yeah. And I think people are still working their way through it. You don't need a single agent right now to book the Mercer and deal with all the scraping stuff you're talking about. You can have a thousand agents working together. You can have one that's making sure that the credit card charge is not too big. You can have another one making sure that the address is right. You can have another one checking your calendar. And so all of that's free.

So I'm on the under, and Brad, I'll even go under one year. Wow. Yeah. Wow. I've got a little side action, you and I, Sunny. I'm not going to go under a year, but I think we could have limited releases in a year. But Sunny, you and I now have action with Bill. What do you want, Bill? A thousand bucks. Sure. To a good cause. Okay. A thousand bucks each to a good cause. And I'll just assume, Sunny, that we'll get action from Nikesh as well.

And you know our friend Stanley Tang is definitely in for some. So we're going to give some good money to a good cause. And listen, I think this is the trillion dollar question. I know we're all focused on, you know, scaling models, and I know we're all focused on the compute layer, but what really transforms people's lives? What really disrupts 10 blue links, right? What really disrupts the entire architecture of the app ecosystem, right?

It's when we have an intelligent assistant that we can interact with, that gets smarter over time, that has memory and can take actions. And when I see the combination of advanced voice mode, the voice-to-voice API, Strawberry o1 thinking, combined with scaling intelligence, I just think this is going to go a lot faster than most of us think. Now, listen, they may pull on the reins, right? They may slow down the release schedule, you know, for a lot of business reasons.

That's harder to predict. But I think the technology, I mean, even Noam said, I thought it was going to take us much, much longer to see the results that we have seen. Can I hit on one other thing? We started the pod a little bit talking about it, and I just want to get your impression, Bill: this idea that Jensen can scale the business two or three times while, you know, only increasing the head count by 20 or 25 percent, right?

We know that Meta's done that over the course of the last two years. And you and I have talked about, are we on the eve of just a massive productivity boom and massive margin expansion, like we've never seen before, right? Nikesh said we ought to be able to get 20 or 30 percent productivity gains out of everybody in the business.

First of all, I think Nvidia is a very special company, and even if it's a systems company, it's an IP company, and the demand is growing at such a rate that they don't need more designers or more development engineers to create incremental revenue. That's happening on its own. And so their operating margins are at record levels compared to the majority of companies. You know, I've always just held this belief that you evolve with your tools.

And the real answer is, the companies that don't deploy these things are going to go out of business. Yeah. And so I think margins get competed away in many, many cases. I think it's ridiculous to imagine, oh, every company goes to 60 percent operating margin. No, no, no, no. I mean, listen, Delta Airlines is going to do all of these things with AI, and immediately, because it's in a commodity market, it'll get competed away by Southwest and United. Bad industries remain bad industries.

Yeah. Yeah. But there might be some, you know, that figure it out. And I have another theory that I always keep in mind, which is hyper growth tends to delay what you learned in microeconomics class. You know, I remember when I was a PC analyst and there were five public PC companies all growing 100 percent.

And so in moments of hyper growth, you will have margins that may or may not be durable, and you'll have a number of participants in a market that may or may not be durable. I have two more things on my mind, Sunny. Do you have any reactions? I mean, I just have to get to a couple of these topics or this is going to be a, uh, Lex Fridman-length podcast. But you can answer.

No, look, I really, you know, have been thinking a lot about Jensen's point in the pod about how much AI they're using internally for design, design verification, for all those pieces, right? And I think, you know, it's not 30 percent. I actually think that's an underestimate. I think you're talking, you know, multiple hundreds of percent improvement in productivity gains. And the only issue is that not every company can grasp that that quickly.

And so, you know, I think he was kind of holding some cards back when he made that comment. And it really got me thinking about, like, how much are they doing there that they don't want everybody to know about? And you kind of see it now in the model development, because, if you've noticed, over the last couple of weeks they've put some models out there that they trained on their own.

And they don't get as much noise as, you know, the ones from Meta and the other players that are out there, but they're really doing a lot more than we think. And I think they have their arms around a lot of these very, very difficult problems. And why did they put their own model up? Well, it's related to this topic of open versus closed. So, Bill, you know, I hope you're proud of me. I went back and I had to ask this question. And I agree with him.

Right. And, you know, I thought Jensen gave a great answer, which is like, listen, we're going to have companies that for economic reasons, right, push the boundary toward AGI or whatever they're doing, and it makes sense to have a closed model that can be the best and that they can monetize. But the world's not going to develop with just closed models. He's like, it's both open and closed.

And, you know, he said, because open, he's like, it's absolutely a required condition. It's going to be the vast majority of the models in the industry. He's like, right now, if we didn't have open source, how would all these different fields in science, you know, be able to be activated on AI? He talked about Llama models exploding higher. And then with respect to his own open source model, which I thought was really interesting.

He said, we focused on a specific capability, and the capability that we were focused on is how to agentically use this model to make your model smarter, faster. Right. So it's almost like a training coaching model that he built. And so I think for them, it makes perfect sense why they may, you know, put that out into the world.

But I also, you know, a lot of times the open versus closed debate, you know, gets hijacked into this conversation about safety and security. And, you know, I think he said, listen, these two things are related, but they're not the same thing. You know, one of the things he commented on there is, he said, there's so much coordination going on at the safety and security level.

Like, we have so many agents and so much activity going on making sure, you know, just look at what Meta's doing, you know, on this. He's like, I think that's one thing that's under-celebrated: that even in the absence of any, you know, platonic-guardian sort of regulation, right, without any top-down, you already have an extraordinary amount of effort going in by all of these companies into AI safety and security. That I thought was a really important comment.

Thanks for jumping in, guys, and kicking this one around. It was a special one. Getting Jensen and having that opportunity, that's pretty unique. And now we've got a little wager. So, I mean, listen, I am so looking forward to doing a live booking at the Mercer on the pod, right? And then, Sunny, we can just drop the money from the sky. We can just collect. Exactly, exactly. Good to see you guys. Let's talk soon. All right. Peace.

As a reminder to everybody, just our opinions, not investment advice.

This transcript was generated by Metacast using AI and may contain inaccuracies.