GPUs and how the cloud is changing (with Cedana Founder, Neel Master)

⁠¶ Cedana: Real-Time Compute Migration

00:00

Thanks so much for joining us. Neel Master is joining us today. Neel is the CEO and co-founder of Cedana. Before this, he was the CEO and co-founder of Nguden. He spent a decade as a VC and started his career as an AI and computer science researcher. Did I get all that right? Yes. Awesome. Well, welcome to the podcast, Neel.

00:20

I'd love to start out with a super easy question: what's Cedana doing? Thanks, Dan. I'm excited to be here. Cedana is real-time save, migrate, and resume for compute. What we can do is take a running compute job, which can be any process or container, and move it from one instance to another without breaking anything, maintaining all state, and without causing any interruption.

00:45

The objective is that when you're able to migrate jobs like that, you unlock value in reliability, capacity, latency, and, significantly, price-performance.

⁠¶ Checkpoint Resume Technology Explained

00:59

That sounds like sci-fi. Are you lying, or is that all real? It's interesting: one of the core technologies behind this (it's not the only one) is called checkpointing, or checkpoint/resume. Interestingly enough, checkpoint/resume has been used in high-performance computing for decades. They had to do it because on a single machine, you don't assume a hard drive

01:25

is going to fail. A hard drive might have a 10,000-hour mean time between failures. But when you scale that up across thousands of hard drives, all of a sudden those hard drives are failing every month, every week, or even every day. So they had to find a way to...

01:39

preserve their work. What people would do is write code in their software to save the state and then read it back and restore it. The way they did that was very application-specific: they would do it for their sophisticated computational fluid dynamics simulation, or their AI software, or whatever it may be. We've expanded this concept of checkpoint/resume in a couple of different ways. The first is that we said,

02:06

What if you could not just checkpoint and resume on the same machine, but migrate to any other instance? The other is: could you do it in a way that's not application-specific, so you don't have to write code for every new program,

02:18

and instead make it general for every process or container? And to answer your question, it's not magic. It's actually working today. We have a sandbox and an API that people can use today to do their own checkpoint/resume, and it works on Kubernetes clusters.
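For contrast with the generalized approach, the traditional application-specific style of checkpoint/resume can be sketched in plain Python. This is a minimal sketch under stated assumptions, not Cedana's API: the toy `TrainingState` (weights, a momentum-style optimizer state, and a step-decay learning rate) and every helper name are hypothetical. In PyTorch the same idea usually means bundling `model.state_dict()`, `optimizer.state_dict()`, and the scheduler's `state_dict()` into one `torch.save` call. The state is written to a temporary file and then renamed with `os.replace`, so a crash mid-write leaves the previous checkpoint intact.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass, field


@dataclass
class TrainingState:
    """Everything needed to resume: parameters, optimizer state, scheduler state."""
    step: int = 0
    weights: list = field(default_factory=lambda: [0.0, 0.0])
    momentum: list = field(default_factory=lambda: [0.0, 0.0])  # optimizer state
    lr: float = 0.1  # learning-rate scheduler state


def save_checkpoint(state: TrainingState, path: str) -> None:
    """Write to a temp file in the same directory, fsync, then rename over the old file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(asdict(state), f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)  # atomic on POSIX within one filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def load_checkpoint(path: str) -> TrainingState:
    """Resume from the last checkpoint, or start fresh if there is none."""
    if not os.path.exists(path):
        return TrainingState()
    with open(path) as f:
        return TrainingState(**json.load(f))


def train_step(state: TrainingState) -> None:
    """Toy SGD with momentum on f(w) = sum((w_i - 1)^2), plus step-decay LR."""
    grads = [2.0 * (w - 1.0) for w in state.weights]
    state.momentum = [0.9 * m + g for m, g in zip(state.momentum, grads)]
    state.weights = [w - state.lr * m for w, m in zip(state.weights, state.momentum)]
    state.step += 1
    if state.step % 10 == 0:
        state.lr *= 0.5


def run(path: str, total_steps: int, checkpoint_every: int = 5, crash_at=None) -> TrainingState:
    """Train to total_steps, checkpointing periodically; crash_at simulates a failure."""
    state = load_checkpoint(path)
    while state.step < total_steps:
        train_step(state)
        if state.step % checkpoint_every == 0:
            save_checkpoint(state, path)
        if crash_at is not None and state.step == crash_at:
            return state  # everything since the last checkpoint is lost
    return state


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        ckpt = os.path.join(d, "train.json")
        run(ckpt, total_steps=20, crash_at=7)  # "fails" at step 7; last checkpoint is step 5
        resumed = run(ckpt, total_steps=20)    # redoes steps 6 and 7, then finishes
        print(resumed.step)  # 20
```

The application has to enumerate its own state by hand, and steps 6 and 7 are recomputed after the simulated crash; those two costs are what a process-level checkpoint aims to avoid.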

⁠¶ Process Checkpointing for ML Training

02:33

So wait, just so I understand: you started by talking about hard drives, and in the hard drive space it makes a lot of sense that you could freeze the file state. But you're talking about not just freezing the state of a file. This is something like pickling, I guess, where you pickle an object in Python, but for an entire process.

02:47

Is there some magic, because you're in Kubernetes land, that lets you actually stop on one Kubernetes cluster and continue on another? Is that part of the secret sauce? No, it's not. And actually, when I talked about the high-performance computing community, they were doing it for programs. I mentioned hard drives as a failure mode, but they were checkpointing programs.

03:07

For example, imagine you're using PyTorch: you just save the weights. That's actually a common way people do checkpointing in PyTorch today; they save the weights of whatever their intermediate solution is. What we're doing is lower level. We go deeper into the process and its memory. We checkpoint the entire process: the memory, the network sockets, the file descriptors.

03:31

That enables us to recreate the process in its entirety on another machine. Now, the devil is in the details, because you don't know exactly what the environment will be on the next machine. You can't assume the file systems are going to be the same.

03:46

There are a number of things you have to do to get this done. Just for reference, there are open-source solutions: there's a package called CRIU (Checkpoint/Restore In Userspace), made by some amazing people, that does checkpointing. It's not really designed for migration, but it does checkpoint/restore, and it's a piece of software that we leverage. So just to ask, actually: with checkpointing, if I have a machine learning process and I pause and checkpoint...

04:10

It's actually a bunch of work: you have to make sure you capture the state of the learning-rate scheduler and the state of the optimizer. The optimizer state is often much bigger than the model weights themselves, which means that checkpointing, if nothing else, takes a really long time. Is it any faster when you go lower level, relative to checkpointing a normal machine learning process? Yeah, you can take advantage of kernel space. The way you can do that is...

04:35

We've leveraged the fact that the operating system is continuously... Because the operating system deals with multiple processes at the same time and knows how to interleave between them, we're able to leverage that aspect of the operating system so that when we take a snapshot of a process or container, we get a fully consistent snapshot, meaning we snapshot it right at the point where all of its computation is ready to be serialized to disk. And it can be done in a way that doesn't interrupt

05:04

the actual process while it's being checkpointed. That means we can copy and move that memory in a very optimized way. When we run it and measure the results, it has an impact of between 1% and 2% on actual CPU utilization. So you're actually able to checkpoint... I mean, this sounds transformational for training processes, right? If nothing else, and I'm sure there's a lot of other stuff, but...

05:27

You're telling me this is as opposed to the normal process of checkpointing machine learning model training, which takes a long time, so you don't do it too often, and you lose all the work since the last checkpoint.

05:35

It takes a bunch of compute, and you're saving stuff over the network. Here, you're actually pausing the execution of the program, so that even if a file is halfway through being read, you would preserve the fact that you're halfway through reading it. You would preserve the state of your network connections,

05:50

at least theoretically. That's crazy. Yeah, and in fact, you don't have to take our word for it. Large companies like Meta, Microsoft, and a number of others are actually using checkpointing in their own very specialized ways. Facebook wrote a paper about how impactful their ability to checkpoint and resume has been for their number one use case of AI, which is recommendation engines.

06:16

There's a paper that they released, and I forget what conference it was in. I think it might have been one of the USENIX conferences. They showed significant performance improvements. And one of the things they did for one of their training jobs is open-source the logs. You can see, just from hardware failures and out-of-memory failures in the workloads, how often failures happen when you're doing training, and how much time and money

06:43

checkpoint/resume can save in terms of efficiency for training.
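At the process level, the open-source CRIU tool mentioned above exposes checkpoint and restore as `criu dump` and `criu restore`. The sketch below only builds the command lines for a naive stop-and-copy migration; it is an illustration under loud assumptions, not how Cedana works. It assumes CRIU is installed on both machines and runs as root, that the target is reachable over `ssh` and `rsync` (the host name `worker-2` is a hypothetical placeholder), and that the target has the same files, libraries, and free PIDs the process expects, which is exactly the environment problem the conversation calls the devil in the details.

```python
import shlex
import subprocess


def criu_dump_cmd(pid: int, images_dir: str, leave_running: bool = False,
                  tcp_established: bool = True) -> list:
    """Checkpoint the process tree rooted at `pid` into `images_dir`."""
    cmd = ["criu", "dump", "-t", str(pid), "-D", images_dir]
    if leave_running:
        cmd.append("--leave-running")  # keep the source process alive after dumping
    if tcp_established:
        cmd.append("--tcp-established")  # include established TCP connections
    return cmd


def criu_restore_cmd(images_dir: str, tcp_established: bool = True) -> list:
    """Recreate the checkpointed process tree from `images_dir`, detached from the shell."""
    cmd = ["criu", "restore", "-D", images_dir]
    if tcp_established:
        cmd.append("--tcp-established")
    cmd.append("--restore-detached")
    return cmd


def migration_plan(pid: int, images_dir: str, target_host: str) -> list:
    """Hypothetical stop-and-copy plan: dump locally, copy the images, restore remotely."""
    src = images_dir.rstrip("/") + "/"
    return [
        criu_dump_cmd(pid, images_dir),
        ["rsync", "-a", src, f"{target_host}:{src}"],
        ["ssh", target_host, shlex.join(criu_restore_cmd(images_dir))],
    ]


def execute(plan: list, dry_run: bool = True) -> None:
    """Print the plan, or run each step and stop at the first failure."""
    for cmd in plan:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    execute(migration_plan(4242, "/var/tmp/ckpt", "worker-2"))  # dry run only
```

A stop-and-copy plan like this pauses the job for the whole dump, copy, and restore; the low-overhead, non-interrupting snapshots described in the conversation need considerably more machinery than three commands.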

⁠¶ Why Compute Migration Isn't Ubiquitous

06:50

So just to ask an obvious question: you're a startup founder, and this isn't your first startup. You've done this before. Every founder is always trying to figure out, and if I were a founder here, which I am, the big question would be: why the hell is no one doing this? You mentioned Facebook doing it at scale, but...

07:04

This sounds like very foundational, fundamental technology that you'd imagine being in every cloud, integrated everywhere. It sounds like the technology is there, but it's pretty low level and pretty obscure. Big companies have started to do it. Why is this not ubiquitous already?

07:19

I think there are a couple of reasons. One, there's a trend right now: generative AI is so powerful, and one of the most powerful things about it is that it can be used by so many different people in so many different ways. Many people are focusing their efforts on that layer of the stack. They're working at the higher level, doing prompt-level engineering and using existing models. There's so much to do in that area.

07:43

The techniques and skills you need to do what we're doing really go lower level. You have to work at the kernel level and the operating-system level, and fewer people seem to want to go in that direction. So we just find...

07:55

that we're zigging while other people are zagging. That's one way to think about it. And it's really painful code to develop effectively, because there's so much debugging and optimization that needs to be done. The second thing is that the large companies have built aspects of this, but they haven't focused on providing it as a service, because I think they see it as a competitive advantage.

08:21

There's a project by Microsoft called Singularity. They've published papers about it, and they've shown that this stuff can work. They do a lot of interesting and amazing things in it. But it's not something they plan to provide to the public anytime soon. Why? Well, I think it's because...

08:40

Right now, there's a good amount of value they're accruing by getting this efficiency themselves. But I think a bigger reason is that it could cannibalize some of their revenue, right? Because if you use this, one of the big advantages is that your GPU utilization in training goes from the 30% to 40% that a lot of people are stuck at to above 70% or 80%.

09:05

And that means you spend less money on GPUs. Wait, why would your utilization go up that dramatically? Because oftentimes, the way these training jobs run, there's a difference between your actual GPU usage and your effective compute time: the amount of compute you used that actually progressed your training. Those two things differ widely. There are a number of papers showing that the difference between

09:33

what training jobs use in terms of utilization and the optimal amount is very large. To give an example, you might have a PyTorch job where the epochs are hours, if not many hours, long. If it fails at any point, you have to restart that epoch. So it's wasting wall-clock time, but it's also wasting compute time.
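The trade-off between checkpoint cost and lost work can be made concrete with a first-order model. This is a sketch that swaps in a standard textbook result rather than anything specific to Cedana: system MTBF falls as components are added (assuming independent, exponentially distributed failures), and Young's approximation gives the checkpoint interval sqrt(2 · C · M) that minimizes overhead, where C is the time to write one checkpoint and M is the system MTBF. The checkpoint costs below are hypothetical; the 10,000-hour part MTBF is the figure from the conversation.

```python
import math


def system_mtbf(component_mtbf_hours: float, n_components: int) -> float:
    """With independent, exponentially distributed failures, rates add: MTBF shrinks by N."""
    return component_mtbf_hours / n_components


def young_interval(checkpoint_cost_hours: float, mtbf_hours: float) -> float:
    """Young's first-order optimal time between checkpoints: sqrt(2 * C * M)."""
    return math.sqrt(2.0 * checkpoint_cost_hours * mtbf_hours)


def overhead_fraction(interval_hours: float, checkpoint_cost_hours: float,
                      mtbf_hours: float) -> float:
    """First-order share of machine time that does not advance training:
    C / tau for writing checkpoints, plus tau / (2 * M) for rework, because a
    failure lands on average halfway through an interval."""
    return checkpoint_cost_hours / interval_hours + interval_hours / (2.0 * mtbf_hours)


if __name__ == "__main__":
    mtbf = system_mtbf(10_000, 1_000)  # 1,000 parts, each with a 10,000-hour MTBF
    print(f"system MTBF: {mtbf} hours")  # system MTBF: 10.0 hours
    for cost in (0.1, 0.001):  # hypothetical: 6-minute vs 3.6-second checkpoints
        tau = young_interval(cost, mtbf)
        print(f"C={cost}h  interval={tau:.2f}h  overhead={overhead_fraction(tau, cost, mtbf):.1%}")
    # Restarting from the start of a hypothetical 4-hour epoch instead:
    print(f"epoch-only: overhead={overhead_fraction(4.0, 0.1, mtbf):.1%}")
```

Under these assumptions, a 6-minute checkpoint costs roughly 14% of machine time even at the optimal interval, a 3.6-second one about 1.4%, and epoch-only restarts over 20%. The approximation gets rough once the interval approaches the MTBF, so treat the last figure as indicative only.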

09:56

Okay, so you're saying basically that if you can increase fault tolerance, you're actually just reducing compute usage. Exactly. That's a simpler and better way to put it. Sure. And okay, just to give the big vision as you gave it to me: you want to turn the cloud into a robot?

⁠¶ The Cloud as a Robot

10:10

Did I say that right? It sounds a little facetious the way I said it, but it sounds so cool as a concept. Yeah, so let me tell you how we think about that. Both founders have backgrounds in AI and robotics. We see this ability to migrate compute jobs across instances as a core primitive, but really as an actuator, because now you can move compute jobs. Think about it: today, the entire DevOps cycle

10:38

is based on a batch, a priori mode, meaning you deploy your jobs assuming you know what resources they should be on. You deploy them, and you have to wait for those jobs to finish or fail, or restart them, before you can optimize the next set of parameters: how you allocate your instances, your jobs, and your workloads, and how and when you deploy. Now imagine that instead of having to interrupt your jobs, you could update those parameters on a minute-by-minute basis.

11:08

That changes the way you do DevOps. Now you can do control optimization: the way you would control a robot, you can control a cloud. And the actuator is this ability to dynamically move compute jobs across instances without losing work. Why would you do that? Because then, instead of all the time and effort people spend thinking about

11:32

questions like: Should I use AWS or GCP? Should I use this type of GPU? When should I do training? When should I do inference? You can now have policies that automate those decisions on an ongoing basis with minute-to-minute updates. So you're imagining like...

11:46

Don't think about what your compute job looks like. If it's going to be CPU-heavy, don't worry, we'll move it over to a CPU machine. You don't have to know, because you can just kick it off on the highest-resource machine available and then keep moving it until it's on the right machine.

11:59

And it never has to stop moving. If prices change, or spot instances become available, you just keep moving it to the optimal machine, basically forever. Yeah, I'll give you an example. This is common once you start to scale as a generative AI company.

12:14

You have two main operations. You have training, where you constantly want to train your models to become bigger and better, and then you have inference. Those are actually two different types of jobs in terms of the SLAs you might want. For training, you might say, look, I don't care whether this training job gets done this Friday or next Friday. I want to maximize my price-performance. I want to use the lowest possible...

12:39

So you're imagining people submit a job and say, here, I need 100 GPU-hours done by next Saturday, and they don't care whether those are 100 consecutive hours or 100 parallel hours, or whether they run at night or during the day. And you might have machines that are running inference during the day, where you're actually optimizing. Am I going in

13:05

the right direction? Yes, exactly. For inference, you can imagine saying, look, I'll pay more money, but these are customer-facing, so I want my inference jobs to have four or five nines of reliability. If there's a failure, I want you to immediately, without losing any work,

13:23

create a new instance and resume that inference job from exactly where it left off. So does that mean... I mean, just to ask, because there are a bunch of companies that brag about fast inference scale-up, right? About having good cold-start times. I imagine this

13:36

is the trick to getting the fastest cold-start times. Yeah, that's a really great insight. You can eliminate the entire boot process with this, because you can resume a job exactly after it has already booted, with all of its state initialized.
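The "cloud as a robot" idea can be sketched as a small policy layer that a controller re-evaluates every minute. Everything here is hypothetical: the `Offer` type, the availability numbers, and the rule that a move is worthwhile when the price difference times the remaining hours exceeds a fixed migration cost. For a deadline-bound training job, the policy simply runs in the cheapest hours before the deadline, which only works because the job can be checkpointed and resumed between them; for inference, it first filters to offers that meet an availability target.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    name: str              # hypothetical instance/provider label
    price_per_hour: float  # e.g. a spot or on-demand price
    availability: float    # hypothetical expected uptime, e.g. 0.9999 for four nines


def cheapest_hours(hourly_prices: list, hours_needed: int) -> list:
    """Deadline-bound training: run only in the cheapest hours before the deadline.
    Valid only if the job can pause and resume without losing work."""
    if hours_needed > len(hourly_prices):
        raise ValueError("not enough hours before the deadline")
    by_price = sorted(range(len(hourly_prices)), key=lambda h: hourly_prices[h])
    return sorted(by_price[:hours_needed])


def pick_offer(offers: list, job_kind: str, min_availability: float = 0.9999) -> Offer:
    """Inference must meet the availability target; training just takes the cheapest."""
    if job_kind == "inference":
        offers = [o for o in offers if o.availability >= min_availability]
    if not offers:
        raise ValueError("no offer meets the requirements")
    return min(offers, key=lambda o: o.price_per_hour)


def should_migrate(current: Offer, best: Offer, hours_remaining: float,
                   migration_cost: float) -> bool:
    """One control-loop step: move only if the savings outweigh the cost of moving."""
    savings = (current.price_per_hour - best.price_per_hour) * hours_remaining
    return best != current and savings > migration_cost


if __name__ == "__main__":
    # Hypothetical hourly spot prices over the next 8 hours; the job needs 3 GPU-hours.
    prices = [2.1, 1.4, 0.9, 0.7, 0.8, 1.6, 2.4, 2.2]
    print("run in hours:", cheapest_hours(prices, 3))  # run in hours: [2, 3, 4]
```

A real controller would also account for how long each migration pauses the job and for uncertainty in future prices; the fixed `migration_cost` here stands in for both.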

⁠¶ GPU Glut or Bottleneck?

13:51

Damn. All right, awesome. You've had your chance to pitch, so now I'm curious to ask you the hard questions. The big topic you and I were talking about was GPUs. There are a lot of GPUs now, and now we're in the world of speculation, but the question is basically: are we in a world of GPU glut?

14:05

Or are we still bottlenecked by not having enough GPUs? What do you think? Good question. I think this is one of those things, and again, I think that's why it's still getting asked. So on one hand, you hear these announcements from

14:22

somebody like Meta, which is now saying that by the end of 2024 they're going to have 350,000 H100s. I heard that. And a total of 600,000 H100 equivalents. And you're seeing the allocation... Hold on, can we just do the math here? 350,000 H100s. What do we think the price per unit is? Probably 10 to 20K at that scale? Yeah, I think it could be even more than that.

14:46

So you're talking about billions of dollars. Even optimistically, if it were 10K apiece, which would be crazy optimistic, that's $3.5 billion, right? And if you include the 600,000 and assume a roughly equivalent price, we're talking a minimum of $10 billion of Meta investment in GPUs, right? Yes, exactly. I mean, these are between 25 and 40K a pop.

15:08

That's not including operating costs. These are very power-hungry. I don't know, I think they're 700 watts, so running these for a year is significant in terms of power consumption, like the equivalent of a small family household. Wait, a kilowatt-hour is something like 10 cents, roughly? Yeah, I think it depends on where you are.

15:27

Obviously it varies, but if it's 700 watts, there are 24 times 365 hours in a year, so that would be roughly $876 for each H100 per year. Amazing when you think about it. It's still negligible compared to the price of the machine, though, right? I think energy becomes a big bottleneck, because when you do this at scale, you have to procure that much power, you have to cool it, and you have to operate it. All of those things become more sensitive.

15:56

And if you look at the metric of FLOPs per joule of energy, I think that's one of the things that comes up:

16:02

all of that goes into the total cost of operation. Facebook obviously has an incredible operation; they know how to do this. You're just saying that if they have those GPUs, they also have CPUs. The CPUs are plugged in and also cost energy, and then they have hard drives, and networking, and air conditioners, and armed guards they pay a bunch of money for, and then the facilities.

16:20

All right, I can see it adding up. An extra $1,000 per computer per year is still a lot of money, I won't pretend otherwise. But if we were talking a year ago and Meta had spent $1 billion on machines, we would have said that's absurd, right? Now they're spending $10 billion, something like that.
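The back-of-the-envelope numbers in this exchange can be reproduced directly. This sketch uses only figures from the conversation (unit counts, $10K to $40K per GPU, 700 W, about $0.10 per kWh) and ignores the cooling, networking, CPUs, and facility costs also mentioned, so treat the energy figure as a lower bound. One correction falls out: 0.7 kW × 8,760 h = 6,132 kWh, or about $613 a year at $0.10/kWh; the roughly $876 figure mentioned in the conversation is what a full 1 kW draw would cost.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760


def fleet_capex(units: int, price_per_unit: float) -> float:
    """Up-front purchase cost of a GPU fleet."""
    return units * price_per_unit


def annual_energy_cost(watts: float, usd_per_kwh: float) -> float:
    """Electricity for one device at full draw all year (no cooling or facility overhead)."""
    kwh_per_year = watts / 1000 * HOURS_PER_YEAR
    return kwh_per_year * usd_per_kwh


if __name__ == "__main__":
    for price in (10_000, 25_000, 40_000):
        print(f"350k H100s at ${price:,}: ${fleet_capex(350_000, price) / 1e9:.2f}B")
        print(f"600k H100-equivalents at ${price:,}: ${fleet_capex(600_000, price) / 1e9:.2f}B")
    print(f"one 700 W GPU for a year at $0.10/kWh: ${annual_energy_cost(700, 0.10):,.2f}")
```

At these prices the fleet costs billions while each GPU's electricity is a few hundred dollars a year, which matches the point in the conversation that energy is small per unit but becomes a procurement and cooling problem at scale.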

⁠¶ GPU Market Dynamics & Future Efficiencies

16:35

I mean, can this keep scaling? Have we reached a point where everyone who wants a GPU has one, and GPUs stop being so scarce? What do we think? It's a great question. Let's look at Facebook, and then zoom out to the higher level. Facebook is interesting because what they've published is that their single

16:57

biggest use case for AI is recommendation engines. Recommendation engines are in some ways a killer use case for this, because they can measure how much more revenue they make. So if there's enough market there for them to keep spending money on AI and generating more value in terms of recommendations or engagement...

17:19

they have a financial incentive they can map out to say, this is worth our investment in AI. Obviously, they have other special use cases, and they're doing Llama 3 and some other R&D. But one of the big drivers for them is how much of that market they can continue to capture and build on.

17:35

It's true. They also spend money on ads, meaning actually helping produce high-quality ads that get clicked on: not just recommending ads, but producing them. And obviously they're investing in future products. A lot of this is going to be Facebook AI Research. A lot of this is going to be the metaverse. A lot of this is going to be...

17:52

They have the chatbots they've launched on every platform now, and they're actually really good. Not the characters; the characters kind of suck. But the more general Meta AI, I don't know if you've used it, is really good. So I think what we're getting at here is: is there a cap where they say, now we've milked recommendations?

18:07

Or is there going to be another product, or another 20 products? Or, jumping to an extreme, absurd statement here: Meta just prints money. Is there really a point where they can't keep investing in AI in the hope that it pays off in 10 years? They're nowhere near

18:21

running out of money. I agree with that. And they're in a really great position where they have so many users that if they can monetize VR faster, monetize other use cases for AI, and come up with other chatbot applications, that becomes an area where they can keep generating revenue. When you zoom out and look at the volumes that NVIDIA and TSMC can produce for GPUs, it seems like all of those GPU allocations are accounted for,

18:54

meaning there's a buyer at the end of all of them, and it's really hard to get allocation. But when does that stop? That seems to be baked in, because there are supply-chain bottlenecks for the next couple of years. So you can argue that for some meaningful period over the next couple of years, GPUs will still be a very valuable and, in some ways, large commodity.

19:17

I guess what's interesting here is that you're not paying for an expensive GPU. You're paying to be the first one to get access to an expensive GPU. Yeah, and how long will Facebook/Meta, Microsoft, and others use these? They'll use these H100s for two or three years and then...

19:33

start to use more advanced ones, and other people get access to the older ones. How does all this start to work? The way we like to think about it is as a liquid market. What if all of this is just trying to scale GPUs, and then, as a market...

19:46

As Meta starts to buy B100s, some of those H100s become available to other people, right? So there's a hierarchy of who gets access to these GPUs. I think a really interesting angle is to look at how Bill Dally from NVIDIA summed up the way NVIDIA has boosted the performance of its GPUs on AI tasks a thousandfold over 10 years.

20:10

And what's really interesting, and actually I'd recommend anyone listening check out Dylan Patel's SemiAnalysis articles on this; there's a lot of useful information, and he has a really good take on it. But it was interesting, just like... Everyone knows about semiconductor processes and how they get smaller and smaller. Yeah, not everyone knows about semiconductor processes, but sure.

20:32

The geometry of the circuits that are etched gets smaller and smaller, which leads to the doubling of transistors roughly every 18 months. That's kind of the fundamental of Moore's Law. But according to what Bill Dally talked about, that component only contributed around two and a half times the performance efficiency, right? Another big area of efficiency was the way numbers are represented: floating point and other formats,

20:58

and that contributed 16x. That blew me away: a 16x performance increase came just from the way numbers are represented on the chips. Yeah, you're talking about moving from FP64 to...

21:12

INT8, basically, right? Exactly, and other improvements in floating-point representations. The reason I bring that up is that it's really interesting: who knows what other things will come up in the next 10 years? Model efficiency also contributed, according to that slide; that was the next item. So while there's this physical bottleneck of chips...
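To make "the way numbers are represented" concrete, here is a minimal sketch of symmetric per-tensor INT8 quantization in plain Python. It is an illustration of the general idea, not what NVIDIA's hardware or any particular framework does, and it does not reproduce the 16x figure from the slide. Note that INT8 is an integer format paired with a floating-point scale factor; the gains mentioned in the conversation also come from lower-precision floating-point formats. Each value shrinks from 8 bytes (FP64) or 4 bytes (FP32) to 1 byte, at the cost of a rounding error of at most half the scale.

```python
def quantize_int8(values: list) -> tuple:
    """Symmetric per-tensor INT8 quantization: map [-max|x|, max|x|] onto [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]  # round to nearest, clamp
    return q, scale


def dequantize(q: list, scale: float) -> list:
    """Approximate reconstruction of the original values."""
    return [x * scale for x in q]


if __name__ == "__main__":
    weights = [0.5, -1.27, 0.0, 1.0, -0.3]  # made-up example values
    q, scale = quantize_int8(weights)
    approx = dequantize(q, scale)
    worst = max(abs(a - b) for a, b in zip(weights, approx))
    print(q, f"scale={scale:.4f}", f"max error={worst:.6f}")
```

Production schemes typically use per-channel scales, calibration data, and hardware-specific formats; this per-tensor version only shows the basic trade of precision for memory and arithmetic throughput.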

⁠¶ Evolving GPU Market & AI Applications

21:31

There are a lot of performance optimizations that I think are going to happen. Yeah, I see that. Let's get concrete here. I feel like when we talk about GPUs, sometimes we're talking about one, or eight A100s, or a thousand. And then with Meta, we're talking about...

21:44

350,000. And we're not talking about 350,000 GPUs just hanging out serving 350,000 users. It's basically: what if we took 350,000 GPUs and dedicated them to one purpose, creating a foundation model that no one else can create? Globally, there are a handful of organizations that can and want to do this, right? Basically Anthropic, Google, Amazon, Microsoft, and Facebook. And that's...

22:07

That's probably it. I don't think even Apple can compete on anything close to this, and that's Apple. Maybe I'm wrong about this; maybe Apple has all those. But one way to look at this is that there are two completely separate GPU markets. There's the GPU market of Facebook, which is 350,000 GPUs.

22:22

And then there's the GPU market for the rest of us, which probably includes Hugging Face and Mistral, where maybe you get a thousand GPUs for some period of time, but you're not getting 350,000. That's just not even in this world. From that point of view, I guess the question, with all these optimizations you're talking about, is: what if you do cap out,

22:39

meaning in terms of what people need? What if these performance boosts, switching to INT8, whatever, mean that using 350,000 GPUs is not that much better than using 1,000, and a company like Mistral competes with GPT-3.5 because they just don't need all that compute? I guess that's more of a scientific, empirical question about what happens in the world. No, exactly. How that plays out is going to be a factor, because, one,

23:05

there are so many unanticipated ways these GPUs seem to get used. And two, there are people working on every layer of this stack to make all of this much more efficient and more effective. And there's an interesting tipping point that I think is going to happen, which we've talked a little bit about before.

23:20

I think this will be a very interesting milestone or marker for AI in general, which is the notion of Jevons paradox. Simply stated: when the efficiency with which a resource is used increases, you would think the increased efficiency would cause less consumption of that resource, meaning fewer GPUs. But what actually happens is that increased efficiency increases demand.
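Whether efficiency gains shrink or grow total GPU use depends on how strongly demand responds to lower cost. A minimal sketch, assuming a constant-elasticity demand curve (a textbook simplification, not a claim about the real market): if work gets `efficiency_gain` times cheaper and demand for that work scales as `efficiency_gain ** elasticity`, then GPU-hours scale as `efficiency_gain ** (elasticity - 1)`. Jevons paradox corresponds to an elasticity above 1.

```python
def gpu_hours_used(efficiency_gain: float, elasticity: float,
                   baseline_gpu_hours: float = 1.0) -> float:
    """Constant-elasticity rebound model (an assumption, not a law):
    - cost per unit of AI work falls as 1 / efficiency_gain
    - work demanded grows as efficiency_gain ** elasticity
    - GPU-hours = work / efficiency_gain = baseline * efficiency_gain ** (elasticity - 1)
    """
    return baseline_gpu_hours * efficiency_gain ** (elasticity - 1)


if __name__ == "__main__":
    # A 100x efficiency gain, as in the GPT-4-on-a-wristwatch example.
    for elasticity in (0.5, 1.0, 1.5):
        print(f"elasticity {elasticity}: {gpu_hours_used(100, elasticity):g}x the GPU-hours")
```

With a 100x efficiency gain, GPU-hours fall to a tenth at elasticity 0.5, stay flat at 1.0, and grow tenfold at 1.5; the embedded-in-every-device scenario is a bet on the last case.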

23:52

One example: imagine that in five years, running a model with the equivalent of GPT-4's capability becomes 100x more efficient. Then you may see that capability embedded in every little device, from a wristwatch to an embedded sensor, and people might end up adding it to more and more things.

24:17

That would increase demand, and you'd start to get greater saturation. I mean, even just so far, by the way, GPT-4 is used almost nowhere. At this point, it's amazing, it's phenomenal, and people use it as a chatbot.

24:32

Your paid subscription gets you, and I don't know if they've changed it now, something like 25 messages max an hour for 20 bucks a month. They were literally capping usage because they were like, we just can't afford to give you more compute. And you've got all these startups saying, look, we've found ways to train tiny versions of models that aren't quite as good as GPT-4 but are so much cheaper. And now GPT-4 is $0.01, I believe, per 1,000 input tokens and $0.02 per 1,000 output tokens.

24:56

That means if you have a 1,000-token article and you want to get the topics from it, and you're like, GPT-4, tell me the topics, you're spending one cent. If you have 100,000 articles you want topics for, you're spending $1,000. That's a joke at that kind of scale.
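The per-article arithmetic can be written as a tiny cost estimator. The prices are the figures used in the conversation (one cent per 1,000 input tokens, which is what the one-cent-per-article estimate implies, and two cents per 1,000 output tokens as quoted later), not current official pricing. The conversation's estimate counts input tokens only; the 50-token topic list per article is a hypothetical addition to show how output tokens change the total.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 usd_per_1k_input: float, usd_per_1k_output: float) -> float:
    """Cost of one API call under per-1,000-token pricing."""
    return (input_tokens * usd_per_1k_input + output_tokens * usd_per_1k_output) / 1000


def batch_cost(n_docs: int, input_tokens_per_doc: int, output_tokens_per_doc: int,
               usd_per_1k_input: float, usd_per_1k_output: float) -> float:
    """Cost of running the same prompt over many documents."""
    return n_docs * request_cost(input_tokens_per_doc, output_tokens_per_doc,
                                 usd_per_1k_input, usd_per_1k_output)


if __name__ == "__main__":
    IN, OUT = 0.01, 0.02  # per-1K-token prices as discussed in the conversation
    print(f"one article: ${request_cost(1_000, 0, IN, OUT):.2f}")
    print(f"100,000 articles: ${batch_cost(100_000, 1_000, 0, IN, OUT):,.2f}")
    print(f"with a 50-token answer each: ${batch_cost(100_000, 1_000, 50, IN, OUT):,.2f}")
```

Under these prices the 100,000-article job comes to $1,000 for input alone and about $1,100 with short answers, which is the scale of cost the conversation calls a joke.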

25:10

And if you imagine these numbers just falling, I'm completely with you. The nature of compute could change when suddenly you're like, yeah, I could write pandas code to do this, or I could just pass it into GPT-4 and it'll be fine; it'll do the data transformation correctly. Exactly. That's where I think things start to become really interesting. There's a leading edge, but... And I think there is going to be a point where

25:34

some of these things become so low-cost that they almost become like utilities. And that could drive bigger demand for this type of compute. There's another aspect of this, too, when we talk about GPUs specifically. Look at how many use cases now have an almost directly proportional correlation with the amount of compute they use: protein folding and computational biology, right? That seems to be one area. Drug discovery is an area where

26:05

the amount of processing they do is increasingly correlated with their ability to identify new drugs and new pathways. It's pretty incredible, right? Weather simulation and climate modeling is another one. And what you're starting to see now, early on, is people simulating enterprises: simulating how their customers behave, or their web traffic. I think you'll start to see more enterprises do that.

⁠¶ Future of Cloud: GPU Dominance?

26:32

I'm curious to get your take on one of the trends we're talking about here, which is just the move to GPUs, right? The nature of the cloud could fundamentally change if it becomes increasingly cheap and efficient to use GPUs to do, through matrix multiplication, work that previously would have required a bunch of complex, task-specific code.

26:49

In your eyes, do you see the nature of compute shifting, where GPUs just explode and eventually become 90 to 99% of cloud spend? That's a really good question. I don't know if I can answer that, in the sense that traditional compute continues to grow. I can certainly see GPUs outpacing it in dollar value over the next 10 years, just because GPUs are much more expensive. But what that ratio will be is going to be really interesting to see.

27:15

One way I might think about it, and I'm just thinking out loud here because I haven't thought about it that deeply, is as a framework: how many CPUs, on average, does a human on Earth use, right?

27:25

If you take all compute and normalize it per person in the developed world versus the developing world, you'd probably get two different numbers for how much compute each of those users uses on average, right? And you mean like when you send me an email and Google processes it and sends it to spam while I'm asleep, I just used a little bit of compute right there? Or when you booked an airline ticket, and even the fractional amount of compute used in your flight

27:49

to London or wherever you're going, all of that, right? If you think about it that way, because people look at energy that way too, right? People look at per-capita energy use, so let's look at per-capita CPU use. So one way to think about it is: what will per-capita CPU use be versus per-capita GPU use? I mean, I'm imagining the email case, right?

28:11

If every one of my emails went through a spam filter running on a GPU (which it might already, to be honest, I'm not sure), imagine some decently sized language model processing all my emails. That might dominate compute relative to everything else my email client is doing. And then if it...

28:26

went through each one, decided how I'd reply, and left a draft for me, this could be an astronomical amount of compute that makes everything else look small. Or, another way to put it: when you talk about compute per human, humans are expensive as workers, right? Humans do human work. And say GPT-4 costs one cent per thousand input tokens and two cents per thousand output tokens.

28:46

Getting to the point of... you know, right now it would be hard to spend $10 an hour running language models. If that were your goal, to generate that kind of cost, you probably could do it, but even that is kind of hard. GPT-4 already throttles how much you can use per hour. Now imagine going full throttle: spending $100 per hour to replace humans who cost $200 per hour. Even the amount of compute you would have to use

⁠¶ AI Costs, Winner-Takes-All, Healthcare

29:09

to get to that $10 an hour is pretty high, right? It is. Here's a question for you: how much do you think about this problem? Say the training cost for GPT-4 was a lot of money, like nine figures, right? But GPT-4 is also one of the fastest-growing products out there, and even more so if we look at GPT-3 or 3.5. What's interesting is that those costs amortize across more people very quickly.
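To put rough numbers on the "$10 an hour" point, here is a back-of-the-envelope sketch in Python. It uses the per-token prices quoted in the conversation (one cent per thousand input tokens, two cents per thousand output tokens), which are the speaker's figures rather than a current price list; the 50/50 input/output split is an assumption for illustration.

```python
# Back-of-the-envelope LLM spend, using the prices quoted in the conversation.
# These figures are the speaker's, not an authoritative price list.
INPUT_PRICE_PER_1K = 0.01  # dollars per 1,000 input tokens (as quoted)
OUTPUT_PRICE_PER_1K = 0.02  # dollars per 1,000 output tokens (as quoted)


def hourly_cost(input_tokens_per_hour: float, output_tokens_per_hour: float) -> float:
    """Dollar cost of processing the given token volumes in one hour."""
    return (input_tokens_per_hour / 1000) * INPUT_PRICE_PER_1K + (
        output_tokens_per_hour / 1000
    ) * OUTPUT_PRICE_PER_1K


def tokens_per_second_for_budget(budget_per_hour: float, input_share: float) -> float:
    """Sustained token throughput needed to spend `budget_per_hour` dollars.

    `input_share` is an assumed fraction of tokens that are prompt (input)
    tokens; the rest are treated as output tokens.
    """
    blended_price_per_token = (
        input_share * INPUT_PRICE_PER_1K + (1 - input_share) * OUTPUT_PRICE_PER_1K
    ) / 1000
    tokens_per_hour = budget_per_hour / blended_price_per_token
    return tokens_per_hour / 3600  # seconds per hour


if __name__ == "__main__":
    for budget in (10, 100):
        rate = tokens_per_second_for_budget(budget, input_share=0.5)
        print(f"${budget}/hour at a 50/50 split: about {rate:,.0f} tokens per second")
```

Under those assumptions, spending $10 an hour means pushing roughly 185 tokens per second through the model for the entire hour, and $100 an hour means roughly 1,850 tokens per second, which helps explain why the speakers find that level of spend hard to reach.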

29:36

How does the amortization of costs differ between CPUs and GPUs? A lot of these GPU use cases have a really big upfront training cost, but then inference behaves differently when it's serving a lot of users. I don't know if you have any thoughts about that. Yeah, I mean, I think about this a lot, actually, just because it's a weird one.

29:57

It demonstrates how winner-takes-all this space could be, at least. If we're actually talking about 350,000 GPUs, imagine OpenAI saying, "350,000? That's a joke. You only spent $6 billion. We want to spend 100." The other thing is, think about how expensive the Manhattan Project was relative to GDP. We haven't gotten close to that. The market cap of crypto is a trillion dollars, with a T.

30:19

A trillion dollars for crypto; that's what people have invested in crypto. I don't think AI has reached those kinds of investment levels yet at all, and I don't think the US government has remotely gotten to Manhattan Project levels of spending. Imagine a single entity just going big

30:33

and saying, "We're going to spend a trillion dollars on GPU compute." At the very least, you have to imagine it becomes winner-takes-all, because the only way anyone does that is by saying, "We're amortizing over the fact that we will be the single best project." And that model could eventually, theoretically, be distilled down to the same size and compute level as equivalent models.

30:53

It will be way more expensive to train, but it'll outperform all equivalent models. So you leave behind the normal CPU model, which is like: my CPU is half as good, but half the price, and people say, all right, sure, let's do it. No: if your model is 90% as good, it's actually going to be more expensive

31:07

because you can't amortize over as many people. I wonder if that trend is going to emerge. Although I have to say, I thought it would happen and it hasn't yet, because right now we're not in a winner-takes-all market, and it's weird. Yeah. To your point about winner-takes-all, it's interesting when you come to government use cases, right? The government has used a form of computer vision for decades now to recognize addresses automatically

31:31

or semi-automatically, right? Yeah, that was the original MNIST use case, right? What's interesting, though, is the economics of that: you build a computer vision system for analyzing envelopes, but then the value is that it reduces the cost of postage, right? Because it makes the operation more efficient.
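The amortization argument above ("if your model is 90% as good, it's actually going to be more expensive, because you can't amortize over as many people") can be sketched with a deliberately simple cost model: training cost divided evenly across users, plus a flat per-user inference cost. Both the model and every number in it are hypothetical; the point is only the shape of the trade-off.

```python
from dataclasses import dataclass


@dataclass
class ModelEconomics:
    """Hypothetical cost structure for one model; not real figures."""

    name: str
    training_cost: float  # one-time training spend, in dollars
    users: int  # number of users the training spend is amortized over
    inference_cost_per_user: float  # ongoing serving cost per user, in dollars


def cost_per_user(m: ModelEconomics) -> float:
    """Training cost spread evenly across users, plus per-user inference cost."""
    return m.training_cost / m.users + m.inference_cost_per_user


# Illustrative numbers only: the leader spends 5x more on training
# but amortizes it over 20x more users.
leader = ModelEconomics("leader", training_cost=1e9, users=100_000_000, inference_cost_per_user=2.0)
runner_up = ModelEconomics("runner-up", training_cost=2e8, users=5_000_000, inference_cost_per_user=2.0)

if __name__ == "__main__":
    for m in (leader, runner_up):
        print(f"{m.name}: ${cost_per_user(m):.2f} per user")
```

With these made-up numbers the leader costs $12.00 per user and the runner-up $42.00, even though the runner-up spent one fifth as much on training. A real comparison would also need to account for quality, pricing, and inference costs that differ between models.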

31:51

So think about how it amortizes there. Or think about spending a lot of money on weather: there aren't that many weather models. The best weather models are out there; there are several of them. And predictability and accuracy in weather have significant impacts on the economy. And so...

32:10

How you think about how much money gets spent on that, versus the benefit across how many people, is pretty interesting. Those correlations, I think, you're going to see in a lot of different areas, especially those that...

32:22

I think, especially from my previous company, in healthcare, you'll see areas of opportunity where costs are really high, they haven't come down, and they're rising above inflation. That's where I think there's going to be a lot of really interesting... If you're open to commenting on it, you...

32:36

At Nguden, you guys hired human contractors to make phone calls, right? Yes. We used human contractors to make phone calls, but the flip side of it was that we used AI. We built, and this is kind of pre-GPT, systems to analyze text, in particular unstructured text, which comprises a significant portion of medical records. What happens there is that there's a huge amount of conditions, signs, symptoms, medications, and other...

33:04

important insights in the unstructured text that don't get properly documented. They get overlooked, and there's a lot of valuable information there. What we did is analyze that, so we were able to summarize and contextualize those insights.

33:18

Care managers could then move very rapidly: what would normally take 30 to 60 minutes of EHR review time was compressed into a few minutes for them. So we had a human in the loop, and it basically augmented their time. I guess what I'm wondering here is: you were trying to save human time. I mean, Neil and I met earlier this week, the founder of Hippocratic AI, which is trying to automate nursing. Just to be clear, that's another Neil, not me. Yeah, sorry, different Neil, not you.

33:42

My bad. Neil Parikh. Anyway, I guess this is the trend, right? At some point, if we're talking about a GPU explosion... you know, my brother wrote this article about how there are... from more of a GPU perspective. And I think the main thing here is revenue, right? For GPUs to actually explode, they have to make money. You're talking about where they can save money through efficiencies.

34:00

But at some point, we are talking about doing what is currently considered human work. How far away do you think we are from being able to... If you're not comfortable commenting live, that's fine. So I think in those areas it will take time, because healthcare, and in particular the PCP workflows we were inside of, are slower moving

34:19

and you have to deal with regulatory and compliance issues, and also with patient behavior. All of those things take more time and thought to implement and move forward. So I think your brother's article does have a point about the ROI on spending. And I think that...

34:36

is clearer for the larger companies than for some of the other enterprises or smaller companies. And that's what all startups have to figure out. But I also think that's where there are going to be opportunities, because there are for sure going to be areas, like some of the examples we've spoken about, that can take advantage of augmenting humans, right? And I think the question is going to be: where are those levers? Where

35:01

one human is managing a lot of costs. And actually, that's one of the reasons we went into primary care: a single primary care doctor's average panel is over 3,000 patients. And if you think about 3,000 patients, with primary care being the entry point for preventive care and overall care management, that's a lot of medical spend that a single doctor manages.

35:23

So I think that can be a very interesting entry point, over time, to really ask: what are all the different ways we can augment all the human costs? Between a doctor and the patient, there's a huge number of people when you add them up, administrative staff and others on the care team, who sit between that doctor and that patient in getting care.

35:44

And a lot of that can be made more efficient, and it contributes a lot to overall health care costs. So you see the doctor, I guess, as the target person. You're like, let's keep their job, basically, but scale out their ability to take care of the 3,000 patients they're already responsible for. Yes, exactly. Is that really 3,000? That's an insane number. 3,000, and just for some stats, it's only going to get worse because...

36:08

I think it's two-thirds of primary care doctors who are close to retirement age. It's an aging population of primary care physicians. And in the next 10 years, we're going to have more people over the age of 65 in America than under the age of 18, and that's going to be the first time in this country's history. So this is another good reason to invest in AI, right? The demographics are going to be an area where people are living longer and...

36:35

at least compared with generations ago. And so we're going to need ways to address the fundamental economics of a smaller number of working-age people supporting a greater number of retired people. How is that going to work? The US actually has a pretty healthy demographic picture, but if you start to look at other countries, they're going to have a much harder time. And so there's going to be a global need for...

37:00

applications if people can figure out how to make that relationship more effective. Yeah.

⁠¶ Startup GPU Strategy & Cloud Advantage

37:05

All right, so just going back to GPUs so we can close out. We were talking a lot about the future of GPUs, the present of GPUs, the glut or the bottleneck. But I think there is good reason to believe that if Facebook doesn't need to get rid of the insane number of GPUs it's holding on to, there could be a bottleneck for a long time. Do you think startups should just buy GPUs? You know, that question comprises, I think, at least two

37:29

interesting points that a founder has to think about. The first is trying to anticipate the market, which is always really difficult to do as a founder. The second is: how do I spend my money? How do I stay default alive? Which is just the fundamental problem every startup has. And so my take would be...

37:45

Without knowing anything specific, and I'm sure there are always exceptions among startups, which is part of the point of being a founder: you always want to be the exception, when you know you are the exception. But I would say err on the side of not buying them, because you want to do things as capital-efficiently as possible.

37:59

And if you have to buy GPUs to have an advantage, and you end up spending more on GPUs than on other aspects of your R&D, then the question is always: why can't someone with deeper pockets out-compute you? So I think it's going to be...

38:13

How can you find or use GPUs more efficiently? In what ways can the market... can you share them? Can you barter? Can you do other things that get you access to them? I think those are going to take entrepreneurial moves by founders. I agree.

38:27

I'm not buying GPUs right now, but it's something to think about, because the flip side is: how the hell are you supposed to compete with Apple, Google, etc. without buying GPUs, right? That's, I think, the flip-side argument. You look at companies that have raised a ton of money all at once, and the pitch makes sense to me when they say, look, GPUs matter and we need GPUs. Are we supposed to

38:48

joke with you about our ability to compete with Facebook? No. We want to build great models; we're going to need GPUs. And especially if we have this bottleneck, with people keeping their GPUs private, there's the flip side of a company that says, let's wait a while before we buy GPUs.

39:02

Sure, that could work. But there's also the world where, you know, that's a really bad idea. This is one of the reasons we started Cedana: we see an opportunity here. When people buy a reserved GPU instance because they need access to it, oftentimes utilization is less than 50 to 60%. They should be able to resell that unused capacity to someone who wants to do training.
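To make the utilization point concrete, here is a small sketch of what idle reserved capacity costs. The 50 to 60% and 90% utilization levels come from the conversation; the $2 per GPU-hour price, the 8-GPU reservation, and the 30-day month are hypothetical assumptions, and these functions are illustrative only, not part of any Cedana API.

```python
def cost_per_utilized_hour(price_per_gpu_hour: float, utilization: float) -> float:
    """Effective price of each GPU-hour that does useful work.

    With a reservation you pay for every hour, so idle time raises the real
    cost of the hours you actually use.
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return price_per_gpu_hour / utilization


def resellable_idle_hours(gpus: int, hours: float, utilization: float) -> float:
    """Reserved GPU-hours left idle, i.e. capacity that could in principle be resold."""
    return gpus * hours * (1.0 - utilization)


# Hypothetical reservation: 8 GPUs for a 30-day month at $2 per GPU-hour.
PRICE = 2.0
GPUS, HOURS = 8, 30 * 24

if __name__ == "__main__":
    for util in (0.5, 0.6, 0.9):
        print(
            f"utilization {util:.0%}: ${cost_per_utilized_hour(PRICE, util):.2f} per used GPU-hour, "
            f"{resellable_idle_hours(GPUS, HOURS, util):,.0f} idle GPU-hours"
        )
```

At 50% utilization each GPU-hour of actual work effectively costs twice the reserved price ($4.00 here), and the hypothetical 8-GPU reservation leaves 2,880 GPU-hours idle per month; reselling those hours is one way to recover part of that cost.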

39:26

You should ideally be focused on getting GPU utilization as close to 90% or above as you can, because that is an indicator of how quickly you can use it, and what's unused you can pass on. And so, you know, if you can increase GPU capacity... Yeah. And I think it's a funny one, though, because it's what makes me personally very bullish on Google.

39:46

Does Google win the AI wars? Microsoft and Google both have clouds, and that's kind of the advantage. Google in particular is using its own cloud, and that's basically how you get perfect efficiency, right? Own the GPUs, train on the GPUs, and also sell them to companies who are using your cloud. Anyway, I wonder if that itself is just a reason to bet on big clouds.

40:07

I think, again, I'll reference Dylan Patel. He has some really good articles about this, and his point is that they have really scaled out TPUs. They have multiple generations of them, and they've been using them for some time now, on YouTube as well as other applications. And so they have one of the biggest infrastructures, one that they both use themselves and can also...

40:28

And I think that capacity, as well as their ability to offer both traditional GPUs like NVIDIA's and also TPUs, will give them an edge going forward. Awesome. All right. Just to ask, is Cedana hiring?

⁠¶ Cedana Hiring and Recommended Resources

40:42

Cedana is hiring. We're hiring. We tend to look for people who want to go deep into the stack: people who understand Linux fundamentals, kernels, GPUs, all of those kinds of low-level, embedded things, and who also want to learn distributed systems. Just being real here, and I don't want to ask any uncomfortable questions, but are you competing with NVIDIA in your hiring? No, we're not competing with NVIDIA.

41:07

You know, it's interesting... I imagine if you're dealing with low-level people, these must be the most... I don't know any people like this, or I know very few. I imagine these are just the most wanted people on Earth. I think, you know, it's interesting. They are in demand; some of these people are very much in demand. We haven't been competing directly with them yet, but...

41:27

Who knows, maybe in the future we will. Yeah, because, I mean, we're talking about these resources. I imagine that at Cedana, if you're not investing in hardware, you're going to be investing in your team. So it seems like a good place to be.

41:38

And I know you mentioned a couple of resources already, but I'd love any resources you'd recommend for people to catch up in this space. Yeah, again, SemiAnalysis. It's a blog that Dylan Patel writes; he focuses on semiconductor analysis. I would also look at the Hot Chips conference, and Bill Dally from NVIDIA in particular, to get it directly from the source. I think those are two great sources for people who want to understand this.

42:03

And I always recommend digging in. The good thing about this market is that there's really only one company's annual report you need to look at, and that's NVIDIA's. Yeah. That makes a lot of sense. Awesome. Well, thanks so much for joining us. This has been Neel talking about Cedana, GPUs, and some other fun stuff. Oh, yeah, and I guess a disclosure: Neel is an investor in Slingshot.

42:22

I'm amazed that you've somehow balanced being a researcher and an investor, getting into healthcare, and now getting into seriously deep tech. I really enjoy our conversations. Thanks so much for joining. Thanks, Daniel. I really appreciate it. That's a wrap for today. Thanks so much for joining us. If you're an ML enthusiast, I'd love to hear from you. Feel free to reach out on LinkedIn or at hello@slingshot.xyz. We'll be back with more next week.

This transcript was generated by Metacast using AI and may contain inaccuracies.
