
The Future is Fine Tuned (with Dev Rishi, Predibase)

May 24, 2024 · 52 min

Episode description

Dev Rishi is the founder and CEO of Predibase, the company behind Ludwig and LoRAX. Predibase just released LoRA Land, a technical report showing 310 models that can outcompete GPT-4 on specific tasks through fine-tuning. In this episode, Dev tries (pretty successfully) to convince me that fine-tuning is the future, while answering a bunch of interesting questions, like:

  • Is fine-tuning hard?
  • If LoRAX is a competitive advantage for you, why open-source it?
  • Is model hosting becoming commoditized? If so, how can anyone compete?
  • What are people actually fine-tuning language models for?
  • How worried are you about OpenAI eating your lunch?

I had a ton of fun with Dev on this one. Also, check out Predibase’s newsletter called fine-tuned (great name!) and LoRA Land.

Transcript

Dev Rishi

Like, I'm sure in the next three to six months, we're going to see great open source variants that do something very similar. So open, you're thinking of like,

Daniel Reid Cahn

arms race is good, basically.

Dev Rishi

I think arms race is like, net good for everybody, like in, like, for all

Daniel Reid Cahn

consumers. Oh, awesome. So we can just jump in. Well, thanks so much for joining us. I'm here with Dev Rishi. Dev did his undergrad and master's at Harvard in CS, went on to work at Google, where he worked on Firebase, Kaggle, and then Google Cloud AI, which you mentioned became Vertex AI just towards the tail end of your time there. Yeah. And then you started Predibase, which you founded to do a couple of things. You started with, if I understood, the low code framework Ludwig. Mm hmm.

Yeah. And then you guys also produced LoRAX, and now at Predibase you focus on fine-tuning and hosting language models that are task-specifically trained. Is that

Dev Rishi

right? Yeah, that's pretty much perfect. You know, when we started Predibase in 2021, it was on top of Ludwig, which is a low code framework for building deep learning models. And we were convincing everyone why deep learning mattered. And then in 2023, everyone cared about deep learning, but just one variant of it, with large language models. So we really, I think, pivoted the company to be very focused on LLMs.

And our, you know, unique view is: how do we help you fine-tune those models, and how do we help you deploy them?

Daniel Reid Cahn

So, language. Curious: you weren't just focused on language, then you became focused on language. Do you think that language is special, or do you think people are focused on it today because that's, like, what's possible?

Dev Rishi

I think two things. One, I think language is special. I also think we're focused on it because we're very early. Like, you know, the industry is early, and as startups we're also quite early. Like, if you think about a lot of the kind of core models that have existed for a while, the transformer architecture I think has been applicable for language and, like, computer vision in particular.

Now you've applied it to audio, you've applied it to video and multimodal, but like the types of models people actually put into production pre-OpenAI were probably like BERT, Longformer, DistilBERT, like these variants that were pre-trained deep learning models. And then, like, you know, ViT and Vision Transformers from Google. Those are like the two areas. And so I think language is special because it's like, because it was possible to get started very early on that.

And we had a lot of prior art to be able to build with. And a lot of use cases are transcribed in language today. You

Daniel Reid Cahn

say it's easy to get started in language. Like, why? Is language easy?

Dev Rishi

For two reasons. One, I think there is that history and background already. So we, like, have a lot of tasks that are already defined. A lot of the types of tasks we see people looking to apply LLMs for today, four years ago companies were trying to do a lot of those same types of tasks, but using different model variants like BERT. So the use case imagination, the evaluation criteria, the datasets themselves, I think, are already set up nicely in language.

There's more prior art there, I think is really what I want to say, actually.

Daniel Reid Cahn

Because, I mean, I tend to think the other, the other way to look at it is just like information theory, and like how condensed information is, that like, if a picture's worth a thousand words, like, that's awesome, but it has like a ton of pixels, you know? Yeah. So, You know, you now have like megabytes of data that really represent only a thousand words. A thousand words is pretty small.

I mean, like I, given that you guys have focused on infrastructure and tooling, like, do you think that there is, you know you know, is it a good place to be in the language space or is it sort of like scary that we're all handicapping ourselves by just focusing on this relatively easier problem?

Dev Rishi

To be honest, it's both. Like, I feel like it is a good place to be in the sense that the compute infrastructure and layout, I think, is much more optimized for being able to do the types of production, lower latency, and higher throughput applications we'd want with language models. You know, even like, again, four years ago on, like, V100s, T4s, you could effectively serve some of these types of, like, smaller language models. Today, you can get the type of throughput you'd want.

I kind of struggle to imagine, if you're really fine-tuning multimodal video models, what kind of throughput you'd actually wanna be able to do, right? Yeah. If you're a security company and you have, like, security camera footage from all of your different offices, I dunno if there's realistically a great way to be able to process all of that video that comes in daily for many different cameras very quickly.

However, if you're a bank, or if you're, you know, an enterprise, and you have many emails that go through, we know we can scale that infrastructure up and down. So

Daniel Reid Cahn

just to clarify, you're saying if I am a security company and I want security cameras. Yeah. I can't use multimodal language models because they're super heavy and powerful, and if my goal is just to see that someone walked in the door, like, that's crazy overkill, way too slow. So. I think it's harder

Dev Rishi

to use in production. Like, I think you can start to use it, but what you can't do is probably throw all of your video feeds directly at it without it being really cost prohibitive. Especially given kind of, like, the crunch on what hardware you need to be able to run on. Whereas I think you can go into production with something like email use cases, where people have just as many emails that get sent, probably, as, like, you know, security camera footage, as an example.

Daniel Reid Cahn

I mean, there also are general purpose, like, image models, right? Yes. Like segment anything model, like SAM. I don't know if that's outdated now, but.

Dev Rishi

Yeah, so I guess, it makes sense. And that's where, actually, honestly, that's where I see the progression, though. It's like language really kind of first, CV and computer vision and images next, but then there's, like, even higher fidelity of images mixed with audio mixed with video. Yeah. Where I think things more or less get heavier and heavier over time. And you asked, is it convenient or is it scary?

And I said both. The reason I'd say it's scary is because I think the space is moving so quickly that, like, what you want to make sure to do is offer benefits that aren't gonna get commoditized at the infrastructure layer, right? Like, yeah, it's possible to do language modeling; that means a lot of people are doing language modeling, which means the infrastructure for it's getting cheaper and cheaper. And so how do you think about your moat?

Daniel Reid Cahn

Yeah, whereas like very few companies are doing video because it's so freaking hard

Dev Rishi

Yes,

Daniel Reid Cahn

so, in language, you mentioned you want to avoid being commoditized. I guess, just correct me if I'm wrong, but, like, it seems like model hosting is getting commoditized. Is it? Yeah,

Dev Rishi

I think that there's, I mean, I think that there are some differentiators that model hosting providers will go ahead and try to offer. There's two key ones that I think about. The first is, like, performance on throughput and latency. And Groq, I think, is a great example of one that has gone to the hardware layer to be able to optimize that. The second, I would say, is around workflows for, like, model hosting, but usually for more advanced types of use cases.

For us, the workflow we're most interested in is when people have multiple fine-tuned models, which is why we developed LoRAX, a framework for hosting many fine-tuned LLMs. But we've seen other people invest in hosting solutions that, like, allow you to do shadow deployments, A/B testing, you know, incremental rollouts, blue-green deployments, and others.

Just the part that I think is getting commoditized is like if you're just doing base model inference and that's it, it's hard for me to understand why you pick one provider versus another other than cost

Daniel Reid Cahn

and quality. I mean, I'm totally with you, by the way. I think, I won't list all of them, but there are other companies today offering free APIs of base models. That must be really hard to compete with.

Dev Rishi

Yes. I think that, in my view, it's not actually something I want to compete with at all. So I think there are companies, and I actually think some of these companies have made meaningful amounts of revenue even, but I don't know if it's necessarily high-margin revenue. That's probably the thing that needs to get litigated. Right.

But essentially what you have is you sign up for these massive pre commits to GPU clusters, you optimize the throughput of your models a lot, and then you hope that you get really good utilization across those, so that you can more or less squeeze the lowest price per million tokens.
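A rough back-of-the-envelope sketch of that utilization math. Every number here is a hypothetical assumption for illustration, not any real provider's figures:

```python
# Hypothetical back-of-envelope: what utilization does to price per million tokens.
# All numbers are illustrative assumptions, not real provider figures.

gpu_cost_per_hour = 2.50        # assumed hourly cost of one GPU, USD
throughput_tok_per_sec = 2_500  # assumed aggregate tokens/sec at full load

for utilization in (0.10, 0.50, 0.90):
    tokens_per_hour = throughput_tok_per_sec * 3600 * utilization
    cost_per_million = gpu_cost_per_hour / tokens_per_hour * 1_000_000
    print(f"{utilization:.0%} utilization -> ${cost_per_million:.2f} per 1M tokens")
```

The squeeze Dev describes falls out directly: the same machine serves tokens roughly an order of magnitude cheaper at 90% utilization than at 10%.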

Daniel Reid Cahn

Yeah, whereas with, so, LoRAX. Can you just quickly explain what LoRAX is?

Dev Rishi

Definitely. So LoRAX is essentially our serving framework at Predibase that we open sourced last year. It's a multi-LoRA serving infrastructure, and what that really means is: we observed that when most people were fine-tuning LLMs, they were doing a parameter-efficient or LoRA fine-tuning, where you only customize a small subset of the weights of the model, usually much less than 1%.

Daniel Reid Cahn

So the theory, just to clarify, is like: you can fine-tune an entire model, but the vast majority of the language model is just, like, reading English, understanding basic concepts. And so realistically, like a human, you could have very specialized humans who are really damn good at their job, but 99 percent of their brain is identical. So we think, like, 99 percent of the model can be standardized, 1 percent specialized.

Dev Rishi

I think that's a great analogy, yeah. Actually, Databricks just put out a paper today on LoRA fine-tuning versus full fine-tuning. Those are generally, like, the two main approaches, and obviously there are different variants: there's LoRA, there's DoRA, there's quantized LoRA, and a few others.
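As a concrete reference point, here is a minimal sketch of what LoRA fine-tuning setup looks like with Hugging Face's peft library. The base model, rank, and target modules are illustrative choices, not Predibase's actual settings:

```python
# Minimal LoRA sketch with Hugging Face peft; values are illustrative.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")

config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,                                 # adapter rank: capacity of the update
    lora_alpha=32,                        # scaling factor for the adapter
    target_modules=["q_proj", "v_proj"],  # which weight matrices get adapters
    lora_dropout=0.05,
)

model = get_peft_model(base, config)
# Prints the trainable fraction, typically well under 1% of the 7B weights,
# which is the "much less than 1%" Dev mentions above.
model.print_trainable_parameters()
```

Only the small adapter matrices train; the base weights stay frozen, which is what makes the adapter-swapping trick discussed later possible.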

Daniel Reid Cahn

Like, when I first saw the LoRA paper... it's really hard to overestimate how surprising it was. Like, it was freaking surprising as hell. Like, I do not think it was, like, remotely obvious. Like,

Dev Rishi

I don't think it's, like, I don't think it's intuitive. And, like, the key intuition is, you, like, you know, essentially discover a small transformation, a small subset of weights, that you can go ahead and apply against, like, the overall model. And then that is very customized towards the tasks that you're looking to do. But then I think, like,

Daniel Reid Cahn

even in the paper, I'm just saying, like, they, you know, they acknowledge at some point, they're like: we know LoRA is not going to work for everything, but we haven't yet found a case where it doesn't work.

Dev Rishi

From our standpoint, I've been really impressed with the broadness, the broad applicability of it. I'll talk a little bit about why I see that broad applicability in a second, but the intuition, I think, is actually very similar to what you said, which is: let's imagine you had, you know, two human brains. 99 percent of the overlap is actually quite similar. A lot of, like, you know, both of us understand the fundamentals of the English language.

We can both see that these mics are black and, you know, that we're sitting on blue chairs, for example. But there are things that make us very unique. It's a very small subset of, like, the actual weights or neurons, like, in our brains.

And what the paper from Databricks that was released earlier today, I think, actually indicated was: LoRA fine-tuning is not necessarily as good as full fine-tuning when you have lots and lots of data and you want to relearn something extremely specialized, but it also is not as prone to catastrophic forgetting. And what I think is important there is: the kind of base model knowledge is very important.

That foundational understanding of the English language, the understanding of, like, these basics, you don't want to overwrite a lot of those. And that's why, when we actually talked to customers a year ago, I talked to some customers that were like: I don't believe in fine-tuning, because I fine-tuned and my model got way worse. Like, you know, worse than base model performance. And this was a common thing.

I think it was because of the same things we saw with BERT models in 2017, with catastrophic forgetting and others. What we did in LoRA Land, you know, LoRA Land was a project where we essentially picked out 27 different datasets. They represented tasks from legal clause classification to question answering to comment classification to content moderation.

And we fine-tuned, we LoRA fine-tuned, in fact we QLoRA fine-tuned, just for efficiency, quantized LoRA, across those 27 different tasks on, like, Mistral 7 billion. And initially our goal was just to show: hey, actually, you know, your LoRA fine-tuned model is better than your base model. But what we actually saw was that for 25 out of those 27 tasks, it was significantly better than the base model, up to the point where it was kind of matching or exceeding GPT-4 level performance.

And so I agree with the authors of the QLoRA paper where they said, you know, I don't know, I'm sure it's not applicable for everything, but we're not sure what it's not applicable for yet. We found LoRA to be extremely applicable across, you know, many, many different types of tasks that we've benchmarked.
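For the curious, "QLoRA fine-tuned just for efficiency" roughly means: quantize the frozen base model to 4-bit, then train LoRA adapters on top. A sketch with assumed, illustrative settings (not the actual LoRA Land configuration):

```python
# QLoRA sketch: 4-bit quantized frozen base + LoRA adapters on top.
# Settings are illustrative assumptions, not the LoRA Land configuration.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize the frozen base weights
    bnb_4bit_quant_type="nf4",              # NormalFloat4, from the QLoRA paper
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 for stability
)
base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1", quantization_config=bnb
)
# ...then attach a LoraConfig as in the earlier sketch and train as usual.
```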

Daniel Reid Cahn

Yeah, that makes sense. And then I guess the other big benefit is, you mentioned, like, the efficiency thing. Which is, like, if you're offering a base model, you can scale to a lot of GPUs, which means you have, like, some threshold of the number of requests per minute you can handle across the platform.

So let's say a company is like, you know, we're going to offer, you know, inference to a thousand companies, each get a thousand requests per second or whatever it is, thousand requests per minute. You know, and therefore we need a million requests per minute to handle. We're going to get those GPUs on all the time, and as long as we get utilization high, we can make money. The problem is, If you're in fine tuning land, you can't do that generally, right?

Because you would need to have, you know, you'd need each company to have, let's say, their own fixed thousand requests per second. I guess with LoRAX, what's really cool here is that you can host many different LoRAs on a single machine, right?

Dev Rishi

That's exactly right, yeah. So I think the origins of LoRAX really came in because we wanted to have a free trial version of our product. We wanted anyone to be able to try out fine-tuning. And it turned out that wasn't that big of a deal. You know, we could even give away GPU access for fine-tuning, because the models are pretty inexpensive to train. The 27 ones that we benchmarked in LoRA Land were, on average, $8 or less to fine-tune for a given run.

So they weren't that expensive to be able to give away for free trial access. But what was gonna get a lot more painful was once you fine-tuned that model: I was a lot more concerned about people deploying them to test them.

Because if every single person who is fine tuning a model had to spin up a new GPU, that was gonna get very kind of painful and restrictive for us, because there's only a limited set of, you know, A100s that I would say the world has, and also, you know, a much smaller subset of that that we have.

Daniel Reid Cahn

Yeah, yeah.

Dev Rishi

And so we needed this technology to allow free trial users. This is the origin story for Lorax. We needed a way to be able to allow free trial users to be able to test out their models. Because it probably wouldn't be good enough in a trial to say, Hey, congrats, you fine tuned a model, now pay us in order to be able to go ahead and try out a single inference. So we developed this way where we could host a single base model, call it Mistral 7 billion.

And then, you know, Daniel could come in with his fine-tuned LoRA, and I could come in with my fine-tuned LoRA, and we'd just exchange those LoRAs to be on top of the same base model. And that was, you know, where LoRAX came out, and a lot of our kind of, I'd say, innovations in the open source have really been: how do we make that exchange extremely efficient? Which is
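The serving pattern Dev is describing looks roughly like this from the client side: one deployed base model, with the adapter chosen per request. The endpoint shape and adapter names below are illustrative; check the LoRAX docs for the exact API:

```python
# Multi-LoRA serving sketch: one base model, per-request adapter selection.
# URL, payload shape, and adapter IDs are illustrative assumptions.
import requests

LORAX_URL = "http://localhost:8080/generate"  # assumed local LoRAX deployment

def generate(prompt: str, adapter_id: str | None = None) -> str:
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": 64}}
    if adapter_id:
        # The only thing that changes between Daniel's model and Dev's model
        # is which LoRA is applied on top of the shared base weights.
        payload["parameters"]["adapter_id"] = adapter_id
    return requests.post(LORAX_URL, json=payload).json()["generated_text"]

print(generate("Classify this legal clause: ...", adapter_id="daniel/legal-clauses"))
print(generate("Moderate this comment: ...", adapter_id="dev/content-moderation"))
```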

Daniel Reid Cahn

crazy cool, by the way. I mean, like, I think it's hard to overestimate, like, the impact of it. Typically, I mean, the big thing is about cold start time to me. Which is, like, if I don't have dedicated compute and I am a free trial user... or, like, at Slingshot, we play with a lot of new techniques constantly. We might train 20 models just with minor differences on some experiment, and we're like, let's run inference on all of them.

Either we have to, you know... if you're just doing arbitrary compute, the cold start time to load 70 billion parameters into memory can be huge. You know? Exactly. First, allocate the machine. And if I wanna make ten requests to ten models, then that means that, like, either I need to get ten machines in parallel, each one taking 15 minutes to spin up (no one has that kind of compute, you know), or you go to Predibase and then you're like: hey, can you load all 10 onto one machine?

And it's like, sure, I'll do each one. It'll take me half a second.

Dev Rishi

Yeah, exactly. You know, when we first launched LoRAX, we thought one of the use cases would be people who had many, many different fine-tunes for many different use cases. So, an example: you wanted to fine-tune for many different customers. Turns out the earliest use case is just A/B testing different versions of the models. And that's obvious to say, like, in retrospect.

But usually when you have fine tuned, you didn't just fine tune one model, you've kind of created a workflow, you've tweaked a few different parameters. Or at least

Daniel Reid Cahn

like, for us, multiple checkpoints, if we're like: hey, did more training actually help? All right. I'm being really nice. So let me ask on the harder side, I guess, a little bit. There's one other company doing this: OpenAI. Are they a competitor?

Dev Rishi

I view OpenAI in two ways. I actually love when people use OpenAI. I think they are, like... look, I think they are a competitor. I think that the way that I see it is: today, the vast majority of workloads that exist on GenAI are on a proprietary model like OpenAI's. And I think, you know, open source is a very small percentage. I think that there are absolutely roles for both of these to exist. So I don't think it's an either-or sort of market. I don't think everything is open source.

I don't think everything is OpenAI. I just expect the market, like the overall pie, to grow. And I expect the overall market share of open source models to grow. The reason I expect that market share of open source models to grow (and at Predibase we do all open source LLMs and fine-tuning of open source LLMs) is, I think it's a very natural progression. OpenAI is extremely easy to use. We have to give them a lot of credit for that, right?

Almost all of our customers started on OpenAI. And they decided to go ahead and fine tune a smaller task specific model for three reasons. Cost, latency, or control. And so when somebody isn't an active user of OpenAI, I almost get a little bit nervous. Because I'm like, well, do you know what your GenAI use cases are going to be? Is there any reason you haven't already experimented at that level?

Whereas some of our most qualified users are saying things like: I have lots of GenAI, I have lots of OpenAI GPT-4 load, it's just getting very expensive or very slow at scale, I'm getting throttled on rate limits. Or, you know: I built my POC and prototype there, but to actually go into production, my company isn't going to allow me to use that external service, and I want to fine-tune a smaller open source model.

Daniel Reid Cahn

That actually makes a ton of sense, yeah. I like the framing of like, I want to know that you've used OpenAI.

Dev Rishi

Yeah, they're a competitor, and they're also, I'd say, a gateway drug to open source. I think that's

Daniel Reid Cahn

interesting. So, last week, it hasn't been published yet, but my last podcast was with this guy Talfen Evans, a friend of mine at DeepMind who works on pre-training of Gemini. And I asked him similarly, like, is there going to be one model to rule them all? And he surprised me by saying no. And he was like: no, I think that companies like OpenAI care only about the very biggest models. I think there will be a space for smaller models.

But just to ask you, I mean, Do you think there will be one model to rule them all?

Dev Rishi

No, I don't think the history of machine learning has ever been like that. And I don't see that, like, really going forward. The main reason I think that there won't be one model to rule them all is, I think that there will necessarily be a price-performance versus quality trade-off, like, over time. Like, I think you will necessarily see larger models are a little bit better from, like, a, you know, quality standpoint in a general setting, and smaller models are cheaper to serve.

And I think that we see this same experience today: people distill these larger models into smaller task-specific models once they have very specific data sets as well. Just to be

Daniel Reid Cahn

clear, I think there are, like, two separate things here. Yeah. The first being, like: will there be one general purpose model that's so powerful that we could just prompt it to do everything? Hmm. Which I think is definitely, you know, up for a lot of contention. I think there's also the element of just, like, performance. Like, it might be that that happens eventually, but it just is a long time in the future.

The other question then is just, like: what if OpenAI's models are so good that fine-tuning them is just so freaking good, even if you took their smallest possible model? What do you think? I mean, they don't have a small model right now. GPT-3.5 is the smallest.

Dev Rishi

Yeah.

Daniel Reid Cahn

Do you think there's a reason why they're not going smaller?

Dev Rishi

I think, I mean, so I'm just going to speculate as to what OpenAI would say, but I think that's actually pretty aligned. Like, I think their core mission is around, you know, essentially building towards AGI. And I think, like, if that is your core mission, one thing I've learned as a startup is you have to be extremely focused on what you do.

And I'm not convinced what OpenAI wants to be able to solve, necessarily, is like, how to be able to deploy the most optimized, small, task specific, fine tuned models that can be deployed, you know, as individual instances that people have high degrees of control with inside their VPC, versus, how do we go ahead and solve alignment issues for my single, large, big model?

So that's why I think OpenAI probably is, like... I think that it's not antithetical, but it's a, it's a detraction from focus on their core mission. I

Daniel Reid Cahn

think it's interesting, 'cause Talfen said the same thing. Yeah. And he's coming from the opposite side. He's coming from a big company. Right. You know, literally Gemini. But he had a similar conclusion. I think you're both wrong. I'm, like, so confused by this. Mm. Like, OpenAI had smaller models for a long time. They don't right now. Yeah, but they did. That's one thing.

And secondly, yes, their focus is on AGI long run, but they've also been very consistent about this foundation model mentality where they want people fine tuning, right? So what I wonder is like if they believe themselves to be a foundation model where people are going to train on top, doesn't that necessarily mean that they, you know, want lots of fine tunes and you know, even if that's I don't know.

Dev Rishi

I think the piece that I feel like is most critical for this is like the G in AGI. Like my view is OpenAI is going to solve things like in as generalized of a setting as possible. But there are like very specific, like domain specific models that are foundation models as well that people go ahead and like train. Code Llama being like one of the most like popular subsets, right? Yeah,

Daniel Reid Cahn

but I mean, the most popular code model is still OpenAI's. It's still OpenAI. Yeah. Through GitHub Copilot.

Dev Rishi

But I think, like, what I, But I think if you had, like, a low latency code classification use case, as an example. I'm just mentioning,

Daniel Reid Cahn

like, a good example here is, like, Codex is super low latency. It's their low latency distillation of GPT, basically. Yeah. Right.

Dev Rishi

Yeah. But I think that, like, will OpenAI want to offer smaller distilled versions of their own models?

Daniel Reid Cahn

They are. Codex, I'm just saying. Like, it's literally Codex.

Dev Rishi

Yeah, but I haven't seen this, like, across... like, I haven't seen this be a key focus for them, right? Like, if I think about what OpenAI's key business is, Codex exists, but I think that there are so many others. Like, for example, I think ServiceNow's StarCoder. You know, I think the code foundation model space is itself quite competitive, and I think you're going to start to see this across, like, every single subset of domain: medical, legal, you know, low latency hardware on-device. Maybe OpenAI will go ahead and start to chew into some of this. I think there's always a risk. Anytime you have, like, a hyperscaler, there's always a question of, like: well, why can't they do X, Y, Z different things?

But I feel like, if you ask what OpenAI is focused on today, I feel like they're focused on, you know: how do we get to the next level of performance from GPT-4 to GPT-5, for example? And, like, how do we

Daniel Reid Cahn

But then GPT-4o, I mean... And

Dev Rishi

how do we get, and how do we get through some of these developer... Like, I think what they're most interested in is: from a research standpoint, building towards AGI, which is where I see that performance side. And from a business standpoint, building the developer ecosystem. So I think one of the big things about GPT-4o was: how do we eliminate some of the frictions we consistently hear from developers about using OpenAI? That's

Daniel Reid Cahn

fair. And I think that's what they care about. You know, they can do that without it being their central focus, because it's not that far off. But I guess the point here being: if they really wanted to have the best ecosystem to host their product, it's basically LoRAX, and lots of LoRAs of small models. That could be a very different business model that they might or might not wanna touch.

Dev Rishi

I think that's true, but I think in that world they'd have to compete with a decentralized model, essentially. Where today, you know, you have many different institutions that are putting out some of these very domain-specific, like, foundation models. Yeah, but, for code, and, like, just to mention, on OpenAI.

Daniel Reid Cahn

We have a partnership with OpenAI. We also train our own models. But from what we've heard, I believe it was not confidential: they mentioned that they had 150 partners at the time that were partnering to fine-tune GPT-4, I think. I could be getting that wrong, I could be. Anyway, their philosophy when we had spoken to them was very much: we want AGI, we want to cover every use case.

We could do that through one model to rule them all, but our actual main plan is probably to support these foundation models. Like, have 150, perhaps, or whatever, a thousand companies, where we have, like, you know, maybe a specific AI-for-immunology startup that builds their immunology model on top of GPT-4. Personally, I think about this so much because it affects my space, but, like, I'm so conflicted.

Like on one hand, are we stuck relying on OpenAI because their models are going to be so freaking good that the only choice is to fine tune them? Or, you know, is OpenAI focused so much on generality, the biggest fish, the general purpose assistance that we should, you know, Not rely on them

Dev Rishi

Empirically, hasn't OpenAI been focused on generality, though? Like, if you think about their subset of usage today, I would, like, hazard a guess that 98% of usage, if not more, is probably, like, GPT-3.5 Turbo, GPT-4, and, you know, GPT-4o. Maybe. Like, it's very much these general purpose models rather than small task-specific ones. Like, I'm not confident about that.

Daniel Reid Cahn

I would have to look into it. Yeah. Obviously I don't know, but I think of demos, yes. I mean, you're probably right, 98 percent of usage might even just be demos, but I do think among companies I tend to talk to, I'm not sure. You might be right, but I wouldn't be nearly as confident about the

Dev Rishi

98%. Of OpenAI usage, I would say, I think that's true. What I think people do is then build a lot of workflows so that they can then customize those models, or they, you know, graduate to the point where they're using a different open source model.

Daniel Reid Cahn

Could be. I just... and then the other thing is, we were talking about pricing. Yeah. When we were talking about GPT-4o, you were not surprised by the huge price decrease.

Dev Rishi

Yeah. Look, I think if you look at the history of, like, OpenAI's price cuts over the last 18 months, you have seen pretty substantive cuts in price and pretty substantive, like, improvements in latency. And I think that it's actually very good for the ecosystem that that happens, because every time that there is a massive cut in price, you see the next release of, like, six to twelve open source models over the next six months that are catching up at quality, but with a smaller footprint than others.

And so I, look, I expect GPT-4 will get cheaper too, like, over the next six months.

Daniel Reid Cahn

I do think the coolest thing, like the kind of craziest thing here to notice with OpenAI, has been: they have been at the forefront. Yeah. But there's also some phenomenon... there's, like, the classic, you know, four-minute mile story, of, like, the first time someone runs a four-minute mile, it's considered impossible. And then as soon as they do, other people know it's possible. And I feel like GPT-4o was, like, a huge shock to the ecosystem. The minute it comes out, it becomes possible.

Similarly with prices. Were you shocked by the audio model, by the way?

Dev Rishi

I think I was most surprised by the low latency. You know, that was the part that was... yeah. I think the idea that they were working on it heavily, just, like, following what they had been doing, you know, over the last six months with multimodal video, adding in kind of audio towards that, like, that part wasn't so surprising. The piece that I think was actually the most interesting was where they were able to get to, like, kind of near-real-time response rates. I actually so

Daniel Reid Cahn

disagree. I personally... I don't know. So a couple of weeks back I had Chris Gagne from Hume on the podcast. And I asked him, actually, I don't think it was on the podcast, but I did ask him, about: when do you think we'll have end-to-end language models for audio? And I don't remember his exact answer, but he and I actually fully agreed on this. I really thought we were, like, far away. I thought two to three years. And why is that? Do you

Dev Rishi

think it's, like, a function of the training data? Like, model architectures don't support it well, or something

Daniel Reid Cahn

else? Like, I still have no idea how GPT-4o did it, but the biggest challenges I'd imagine are training data. There's way more text data out there than audio. Yeah. And then text data is so much cleaner. Like we were talking about with images: text has exactly the right signal. Most of the time, text is grammatically correct. It has sentence structures, it has meaning. Think about, like, essays. People thought about it. They, like, wrote the thing they wanted to say.

It's not just, like, a speech. Audio tends to be more this, like, speech format. It tends to have a lot of artifacts. And then fundamentally, like, architecture-wise, you need some way to tokenize the speech and then return back from speech space... sorry, from token space back to speech space. No one's really built great ways to do that. I think, like, OpenAI was the closest, right? They built Whisper, which at least was speech-to-text.

And they were, as soon as they built it, it was like state of the art, super impressive until other people started building similar things. Do you think

Dev Rishi

Whisper was, for example, a way to be able to collect data? To be able to start to train kind of an end to end audio model?

Daniel Reid Cahn

Totally could be.

Dev Rishi

Like, I actually think that if you consider some of the work that they had done in video generation, you think about Whisper... like, the idea that they got the end-to-end audio model was actually, for me, a little bit less surprising, I think. The idea that the end-to-end audio model was able to do, like, generation in such a quick time, like, sub-200 milliseconds, that, I think, was actually really, really... that's the part that I would say, like, I didn't expect.

If I had to expect what I would see from an audio model, I think I would have expected to see something clunky.

Daniel Reid Cahn

But I,

Dev Rishi

I've never

Daniel Reid Cahn

even seen a high latency audio model. That's why I'm like, even if, personally, if they had released a crazy high latency audio model, I would have had my mind blown.

Dev Rishi

You've seen some of these startup demos, though, right? Where somebody is, like, chatting with somebody on a phone, like, on calls.

Daniel Reid Cahn

Text-to-speech. The language modeling right underneath it is not.

Dev Rishi

Yeah.

Daniel Reid Cahn

But I think, like, the idea is, like: language models generally are always pre-trained on, like, prediction of next tokens. So conceptually that makes sense, because you can have a lot of examples of, like, question-answer, message-response. Yeah. Whereas with audio, you know, usually, most audio, if you take a second of audio and the next second, it'll be the same speaker.

If you were trying to learn how to have a conversation from this podcast right now, you know, me and you talking, That would be so much harder than taking a transcript of the same thing, you know?

Dev Rishi

You know, it's interesting. Again, I can't speculate on how they made the training data, but what you just said is, like, traditional modeling is speech-to-text-to-speech. Yeah. And it does feel like that actually helps you create some synthetic data sets for doing end-to-end audio modeling. So I'm betting they

Daniel Reid Cahn

either used some synthetic data, or the other possibility, which I think is phenomenal if they were able to pull it off, is learning almost entirely from text, and then, like, 0.1 percent of training being speech, and then somehow getting the whole thing to work. I think when you watch the demo, it's clearly not perfect, like, speech generation.

It's clearly... it's something... it's clearly, like, way better than anything we've seen before. Yeah. But I also, by the way, give a huge amount of credit that I think they didn't try to, like, cherry-pick the demos and try to show the best possible cases. Like, the demos show things go wrong. They show it via v1. They show all the problems.

Dev Rishi

Did the latency surprise you? Like, it was crazy surprising to me. Very surprising. But I

Daniel Reid Cahn

mean, I'm also surprised that they were able to... I mean, for a single announcement to be, you know, we made our model faster, better, and cheaper, all at once? Like, usually you get one, maybe two. But faster, better, and cheaper... I don't want to, like, you know, blow too much steam up their ass, but,

Dev Rishi

I don't know, I was shocked. I do think the 'better' is the part that's getting litigated right now, though, in some of the... Better in the sense of

Daniel Reid Cahn

audio is really what I meant. Okay, in audio that makes sense. Increased capabilities. Yeah,

Dev Rishi

yeah. One of the things we've been doing is benchmarking GPT-4o on, like, the same kind of fine-tuning leaderboard that we've had. And so we're going to be releasing those results in a few days, but I think it's like, I think it remains to be seen how much better, for sure.

Daniel Reid Cahn

Yeah, in terms of like accuracy, obviously okay, I want to ask about fine tuning. Do you think fine tuning is hard?

Dev Rishi

I think that there's... I don't think it has to be hard, but I think there's practically two things that are hard about it today. Okay. The first is the data. Like, I think that's always been one of the struggles for it. Which is, like, making sure that you have either a good completions dataset, maybe that's a little bit less challenging, or a good instruction fine-tuning supervised dataset. This was, like, the problem for machine learning in 1990.

It's, like, still the problem with fine-tuning and machine learning in 2024. Then the part that I think has gotten a lot easier is the tweaking of the algorithms. It used to be the case that, like, if you were building, like, a model, it was, like, this weird art slash highly experimental setup where you're tweaking every parameter, from, like, learning rate and batch size to, like, your regularization lambda, to, like, anything, right?

Like, everything was fair game, and you were going ahead and throwing it all at the model. With fine-tuning, the, like, scope of the number of parameters that you actually need to adjust has gotten to be a lot smaller, I think, in order to be able to see kind of meaningfully good results.

Daniel Reid Cahn

Like what are they?

Dev Rishi

I think, like, target modules tend to matter. The LoRA rank, I think, also matters, which is, like, the LoRA rank will go ahead and correlate towards the size of the adapter and capacity. How many? Yeah, exactly. And then, you know, I think, like, classic things like the number of epochs, as well as your learning rate, that you might wanna work on, which is, like, not necessarily a hyperparameter as much as it is maybe a business

Daniel Reid Cahn

What about batch size and learning rate?

Dev Rishi

Oftentimes, I think, now you use, like, automatic batch sizes and automatic learning rate schedulers with fine-tuning jobs. At least that's what we typically do, like, in Predibase as a default. We do, like, auto batch sizing, because it's really a function of the hardware that you're on, the size of the model, and the size of the datasets' input sequences. And so rather than having to do a lot of experimentation, you can do some of that in a warm-up phase.
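In Hugging Face terms, that "use the automatic defaults" posture looks something like the sketch below; the specific values are illustrative, not Predibase's internals:

```python
# Sketch of "automatic defaults" fine-tuning arguments; values are illustrative.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="ft-out",
    auto_find_batch_size=True,   # probe the largest batch size the hardware fits
    learning_rate=2e-4,          # a common LoRA starting point
    lr_scheduler_type="cosine",  # scheduler instead of a hand-tuned fixed rate
    warmup_ratio=0.03,           # the "warm up phase" Dev mentions
    num_train_epochs=3,
)
```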

Yeah. So, I think, like, the tricky part of fine-tuning there has gotten less difficult. I think the infrastructure still tends to be a bit of a pain for people. For sure. But, I don't know, I

Daniel Reid Cahn

do have to say, like, depending on the model, like, the first time our team fine-tuned GPT-4, we diverged. Like, our loss just went up instead of down. Yeah, yeah. And we were looking at that, we were looking at the curve, and we were like: wait a minute, isn't this curve supposed to go down, like, decrease loss, not up? Yeah. I mean, GPT-4 is massive, so it's particularly tricky. But I mean, we do think about parameters... I mean, not a lot. I think we think about rank, we think about learning rate, we think about batch size, and we think about epochs. Yeah. Epochs you can compare with checkpoints, because you just train multiple. Right. Exactly. We do find learning rate can make a huge difference. Yes. And in a way that, you know... a difference that's impossible to really detect. Like, it's really hard to know. You're like: hmm, this one seems smarter, this one seems nicer, and you're

You're like, hmm, this one seems smarter, this one seems nicer, and you're

Dev Rishi

like,

Daniel Reid Cahn

that makes no sense because all I did was change the learning rate.

Dev Rishi

And I should be fair: like, I think learning rate schedules aren't perfect, so I actually do think, like, playing with learning rate is a fundamental factor. But, like, I mean, the biggest difference I think about is, like... I remember in early versions of Predibase where you could build any, like, model. And then you have to think about the whole architecture: you have to think about, like, learning rates of subparameters, you have to think about dropouts and, like, regularization rates, you have to think about so many different things, choosing your activation function and all that. And that was before you got to a good enough level. Like, that was before you even started to see something. Like, you know, it wasn't, like, the optimization step. It was, like, the I-want-to-see-any-value-at-all step.

Yep. Today, I just chatted with somebody who, like, you know, is a former colleague of mine, can be critical. Like, he's honest, I would say, about products. And it was like: hey, what was your experience on Predibase? And, you know, he told me a couple of things he thought could be improved, but his main point was: I didn't expect it to be this easy. I didn't expect it to be, yeah.

And he said: I didn't expect to like it this much either, in terms of, like, just the fine-tuning piece, because now you actually can fine-tune. And usually there's a lot of these parameters you want to tweak, but that first model that you fine-tune in a platform like ours, or if you fine-tune, kind of, you know, externally, that has the right kind of default settings, you're probably going to actually immediately get a lift. In LoRA Land, it was all our basic defaults.

We didn't optimize the hyperparameters at all, which is kind of crazy. And I think that, like, that lack of need for hyperparameter optimization is probably the part of fine-tuning that has gotten easy, and why fine-tuning is no longer very difficult. The parts that are hard are the data and the underlying infrastructure, not the algorithms.

Daniel Reid Cahn

I mean, I think the data is really freaking hard, though. I saw you guys did some partnership with Gretel.

Dev Rishi

Yes, yeah, we just did a webinar, I think, a couple days ago in a partnership with Gretel. And I think one of the main motivations there is, I think synthetic data is getting better and better kind of around this training workflow, and we see a lot of people use it in different ways. So anything that helps the data set creation side of it, I want to be very front and center on, because that means more people can start to use us for fine-tuning.

Daniel Reid Cahn

Totally agree. You know, on the data side, I have to say... so the challenge for us on the data side, I definitely think, like, at Slingshot, our biggest challenges on machine learning are definitely data related. But the way that we frame it is basically: we are trying to achieve, and I think this is true of a lot of AI companies, a task for which no data exists.

So if you're training a general purpose chat assistant like ChatGPT, or if you're trying to train Hume, or whatever kind of specialized, you know, legal assistant, those don't exist, right? There are no legal AIs, right? And so if you're trying to show, like, what would a great answer from, you know, an AI doctor sound like, well, you can look at what doctors say. But doctors, ha. Doctors make mistakes all the time. Doctors forget to ask the right question.

They're constrained by the amount of time they can spend. They can't write out long answers because of those constraints. They, you know, et cetera, et cetera. Yeah. And so you're basically like: if I could just find a data set of a billion examples of people going to the doctor, asking a question, getting a perfect answer, right? Boom. All I need to do is walk over to Predibase and train with that data. That would be phenomenal, right? And I'm sure you guys can handle that.

But where the hell do I get a billion examples? Like, even if I got a billion examples of doctors talking to patients, I still wouldn't have, you know, an AI doctor.

Dev Rishi

Yeah, I don't have an easy answer here, honestly. And, like, this is, again, where I feel like I'm most interested in, like, tools for the data infrastructure side of things to be able to advance. What I will say is, we see people... maybe not as complicated as, like, there's no such thing as an AI doctor, but we see people that are like: I don't have labeled data. And the tricky thing is, one way or another, you have to bootstrap it.

One way that we've seen people bootstrap it is... okay, the risky way you can bootstrap it is you go to a subset of your traffic and just launch with something subpar, and you collect, like, you know, some user signal on kind of what you want to be able to get feedback from. That could be one approach, depending on which industry you're in. One thing we've seen is people actually find GPT-4 quality, with some edits in post-processing, is actually roughly where they might want to be.

So maybe the case is, like, you can get... and this is not, like... it's something I've seen as a repeating pattern. Maybe the case is, like, GPT-4 can sound close enough to an AI doctor for what you might want. Maybe that's not the case, but, like, in some cases GPT-4 is at the quality where you'd want to be. And the real concern is, like: I just can't use GPT-4 live in production, because, like, you know, it's one of the most expensive models that exist out there.

It's really slow and rate limited, and can be, like, maybe outside the organizational policy. So we've seen people bootstrap with GPT-4 for data collection, then distill that model down into a smaller open source model.
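That bootstrap-then-distill loop, sketched end to end: label raw inputs with GPT-4, review the outputs, and use the result as a fine-tuning dataset for a smaller open model. The prompts, file names, and toy inputs here are illustrative assumptions:

```python
# Distillation-data sketch: GPT-4 labels raw inputs, the reviewed results
# become a fine-tuning set for a smaller open source model. Illustrative only.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def label(ticket: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "Classify the support ticket as billing, access, or bug."},
            {"role": "user", "content": ticket},
        ],
    )
    return resp.choices[0].message.content

with open("distill_train.jsonl", "w") as f:
    for ticket in ["I can't sign in to Zoom", "I was charged twice this month"]:
        # In practice you'd review and edit these labels before training on them.
        f.write(json.dumps({"prompt": ticket, "completion": label(ticket)}) + "\n")
```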

Daniel Reid Cahn

Yeah, I was going to ask: distillation. I mean, I think one thing I also want to ask about... so I tend to be very optimistic about AI, and also pretty pessimistic, just because of, like, AI workloads out there in the world. Like, AI is, you know, still pretty much hype. Like, I'm very excited for AGI. I'm very excited for where we're going. I think the technology is freaking phenomenal. I think it's exciting.

You could really see progress, but most people, I don't think, should be using AI at work all that much, personally. Like, I don't know about, like, AI writing, all that kind of stuff. And one thing that gives me a little pause here is that it does seem like, a lot of the time, the use cases we're talking about with AI are still kind of the old ones. I know you guys pivoted. So pre-pivot, you were focused on the churn prediction, revenue projections type stuff.

Dev Rishi

Yeah, I mean, pre-pivot we were focused on... we were an ML platform, we can help you do any type of model. Which, by the way, is a very broad, like, value proposition. The types of things people would come with were, yeah, like, churn prediction, lifetime value. Like, these are the types of things people knew to use machine learning on.

Daniel Reid Cahn

Yeah.

Dev Rishi

And so we saw a lot of that. So

Daniel Reid Cahn

I wonder, similarly, like, you know, you were excited about deep learning because you got it. You're like: oh my God, this thing is, like, intelligent, it could do anything. Why would you want to calculate customer lifetime value when you could literally, like, understand everything about your customer? Isn't that way more interesting? So similarly, I mean, do you think some of the use cases that we're imagining for AI are just limited by this same deep-learning-past point of view? Of, like, you know, before we could do lifetime value prediction; now we can still do it, but with deep learning, you know. And the analogy being, like: I can go through a blog post and find, you know, a list of tags for SEO. And it's like, yeah, yeah. But is that really why we want to build AI?

Dev Rishi

Yeah, but I'm actually not convinced that's a bad thing. Which is to say, like, I think that over a short period of time... like, we're so early within the phases, we have to remember that most people didn't really have a good lifetime value or churn prediction machine learning model. That's why they were looking for these innovative solutions to be able to do it. I talk to customers still, like, you know, that have to do classification tasks or extraction tasks over email.

The way that this happens today is there is a back office function somewhere that's, like, you know, maybe offshore, going through a lot of these. So is it as flashy to say, like, you know: what we're going through is blog posts and extracting tags for SEO, we're going through emails for compliance, we're going through transcripts to be able to do that?

No. But we have to recognize that the industry is... like, the industry is in the early stages of AI adoption, full stop, whether that's, like, deep learning, AGI, or anything else along those lines. And I actually think the biggest thing we can do is start to solve some of these narrow, boring tasks so we can get to the cooler stuff.

What I actually don't love is when you see, like, this really interesting, flashy demo from a company that's like: here's our bot that's going to teach us how to make more revenue. But it's like, all right, have you figured out how to, like, even just solve customer support or something else along those lines, you know, these classic use cases within it? So, to me, like, we were trying to say, like, this is going to be a thing in 2021, using an old class of, maybe, like, technologies now.

Not all of them are bad, like Longformer, still actually quite effective models, but, like, using an older class and trying to show this is the value.

Daniel Reid Cahn

Yeah.

Dev Rishi

And now, if we can get... you know, I would say, like, maybe 1 percent of organizations got it, right? Like, what were the number of organizations using deep learning pre-GPT-4? Not many. And so what we can do is increase that 1 percent to, like, 20%, 30%, or 50 percent even, doing those tasks. I think that's a massive win over the past year. So there's still a lot of the

Daniel Reid Cahn

boring work that still needs to get automated. Yeah, let's do the boring work. Let's just get it over with.

Dev Rishi

Yeah, let's do the boring work, because that boring work is taking up a lot of time. And, like... I actually, I almost think, like, it's not necessarily: do we want to go ahead and skip five steps forward, or do we want to go ahead and, like, build and solve the problems that are sort of, you know, in front of us today? I am actually personally just most excited about this:

For 20 years, people have been talking about being able to automate some parts of these workflows, and now we actually see, you know, more than just the most advanced companies trying to do that. And I think that's a great place for us to land as an industry, I think, over the next few years.

Daniel Reid Cahn

I hear that. I was talking to my father-in-law earlier today. He runs an IT company in the Bay Area. And he was telling me, like: we're finally adopting AI for, like, our ticket understanding. A lot of our, you know, agents get these, like, support tickets, and they have to read through the whole thing, and they miss the context. And with one click, AI can, like, read it, understand it, make it so much faster. Customers get support faster, our team is happier.

And I'm like, yeah, that's nice, but, you know, what do you really want from AI? And he was like, I want AI that could solve the ticket,

Dev Rishi

you know? Yeah, exactly. And do you think that that's too narrow or non creative, or how do you think about that? I

Daniel Reid Cahn

don't know. No, I mean, I think, like, he's very pragmatic. Yeah. Like, obviously, similar to, like, what you're describing: would he want, you know, Predibase to be able to actually help him host models that solve his use case? Like, of course he wants to solve his problems now; he has problems now that AI can solve now. Yeah. But it also seems like, if AI right now can go read through a boring ticket and understand it enough to get a human to move from taking 10 minutes to solve it to five,

Yeah. And then, you know, very soon it moves from 10 to zero. You know, there does seem like... you know, the exciting thing to me, I have to at least be excited from a sci-fi point of view, about this leapfrogging: about the point at which this ticket... you know, the person emails in and they're like, I'm having trouble with signing into Zoom. And then the guy says, like: oh yeah, here are three things you can try. And, like, GPT-4 isn't there yet.

Like, he did experiment with, like: what if GPT-4 wrote the answer? It's just not there yet, right? Yeah, his team is smarter. But he also knows, you know, for a lot of those tickets, we can actually fully automate them, just with technology that's not quite here yet. That's kind of where I'm, you know.

Dev Rishi

I think the place that I come from is, like... I remember getting into, like, democratizing AI in 2016, 2017. And I think that AI has never been underhyped. Like, people talk about the hype cycle for AI in 2020, in 2022, in 2023. We started Predibase in '21, and I remember thinking: man, we're in machine learning, this is, like, one of the hottest spaces, like, in the world. And I remember thinking that when I was at Kaggle in 2018 and 2019. It's never been underhyped.

I would say it's consistently underdelivered in economic value outside of the top percentage of companies. And so if the way we get there is we solve some of these narrow use cases, I would love that a lot more than if we kind of build the next series of kind of sci-fi-esque use cases. But, you know, the construction companies are still consistently going through every manual invoice themselves.

Daniel Reid Cahn

Like, a sci-fi guy? Just curious.

Dev Rishi

I do like sci fi. Okay. Yeah, yeah, I do like sci fi.

Daniel Reid Cahn

I mean... look, I think there's,

Dev Rishi

I like sci fi, but I've spent so much time with the customers that are like, look, AI seems great, but here's my, like, here's the problem that I actually have. And it's the same things that we haven't solved in 2016, 2018, 2020. And now I see the solution like here.

Daniel Reid Cahn

Although I don't know, I, I go back and forth. Cause I do get it. I do get that. Like, I think we are going to go through some sort of market crash, some bubble popping, some, you know, investors saying, show me the money. What's going on. Yeah. On the other hand, you look at these, AI hype cycles, like we're talking about, and you're like, it's not really a cycle. It's just been hyped. But the truth is, like, it was pretty hyped during the big data era.

Yeah. Then it got more exciting during the ML era, and then it got more exciting when Transformers came out, and then it got more exciting when GPT-4 came out. Right. It's not like we've gone through some hype cycle where it was exciting and then not, and then more exciting and then not. It seems like it's only gotten higher and higher.

And the reason why is because I think we moved from a place where AI can automate back office tasks to now, AI doctors. Like, I also wonder, just economically: if we actually delivered on all the back office tasks, would that be nearly enough to account for the investment and the hype and the excitement? Or do we really need AI doctors for that, you know?

Dev Rishi

That's a good question. I feel like we should put a top-three consulting or accounting firm on that, to figure out what the costs directly are. The nice thing is, once you solve something with AI, it's kind of like recurring value, right? And so whatever value you created in a year, you basically get the same the next year. It's like the beautiful thing about SaaS in some ways. So I imagine over some horizon of time, the answer is probably yes, but it'd be interesting to compare that against investment.

I mean, I will say, like, I'm very excited about the really cool end-to-end multimodal models that we have. I've seen these end-to-end physical world models, too. Like, I think there are some amazing areas where AI is gonna go.

Daniel Reid Cahn

Self-driving cars, by the way. Where are all the self-driving cars?

Dev Rishi

I just think that, like... I actually think that the ecosystem is not currently limited in imagination on that.

Daniel Reid Cahn

Okay.

Dev Rishi

And so I think, like, the funding exists for those types of environments. I think the progress is happening. Maybe it could happen faster. I think the progress is happening, kind of, in the startup landscape there. And so I'm not so concerned that, you know, the Fortune 500s of the world are mostly starting to think about back office task automation.

Because I think by the time that they figure that out, I'd love for some of these physical world models to then be like, okay, and here's how we can actually help you. Like, end-to-end physical world models? Incredible. Incredible as a line of thinking. Do I think there'll be something someone could use there in six months? No. And I don't want AI to be a disappointment just because, you know, they haven't quite gotten there yet.

Daniel Reid Cahn

Yeah. I mean, I also think the reason why I'm excited about Predibase is because I think there are a lot of sci-fi models that we can actually deliver on, hopefully on Predibase. Meaning we can get to the point where it's not just that we were able to do some tag identification, but, you know, there are probably a lot of really high impact things that can be delivered with fine-tuning.

Totally. You know, I do think there has been some lack of creativity, personally. Like, I hear way too often when I talk to people about AI this idea, like you said, about 98 percent of GPT-4 request compute being on the base model. Most people I talk to assume, yeah, that AI is just base models.

I hear a ton from AI engineers about, you know... I think RAG has sort of fallen off a bit, but for a while it was like, RAG, RAG, RAG, don't ever fine-tune. And I'm wondering: is it just because fine-tuning is hard? Were people not targeting hard enough use cases to bother with fine-tuning? Would Predibase make more money if people tried harder shit on Predibase?

Dev Rishi

I actually just think that people think fine-tuning is hard. I don't think it actually is hard.

Daniel Reid Cahn

I think it's hard because the data is hard, is what I mean to say. The data is, yeah. Like, if I said, I can call GPT-4 to anonymize some text, versus I can fine-tune a model to anonymize some text: calling GPT-4 is so easy. It is so insanely easy. For the latter, maybe I have to talk to Gretel, create some synthetic data, train a Llama model on Predibase. Like, that could take me real effort and thought.

Dev Rishi

For sure, but I think that's actually why no one starts off fine-tuning a model. They always start off with GPT-4, you know, or GPT-3.5, and then the base open source model, and then fine-tuning is next in that progression lifecycle. And I am actively thinking about ways we can move fine-tuning up closer to the beginning, or at least make it so that data prep requirement becomes less painful for the user, kind of on their side.

But I think the fact that this progression exists is very logical. And to me, some of the limitation in thinking is just that we have this cascade of people still going through these steps, and I feel like we've just started to see the fine-tuning wave starting right now. Like, it's the very early days of people saying, fine-tuning actually makes sense.

Daniel Reid Cahn

Yeah,

Dev Rishi

And I think launches like, you know, the Databricks paper comparing LoRA fine-tuning against full fine-tuning, and I think LoRA Land, all these things help advance that ecosystem.

Daniel Reid Cahn

Also, just, shout out to Harvey, fine-tuning on 10 billion tokens for legal. Yes, their whole point is about fine-tuning. They are fine-tuning. Everyone likes to make these distinctions, but that is fine-tuning. What they're doing is fine-tuning, and that's phenomenal, because they're trying to take on an insanely hard use case by getting 10 billion tokens of data. That's not easy. Exactly. I feel like, again, it's not the training part.

I think with training, we could debate how hard it is, but the point is, the data is definitely the hard part.

Dev Rishi

Yeah. I feel like a year and a half ago, the thing was pre-training, and that wave actually didn't really make a lot of sense to me, because the vast majority of organizations probably don't need to pre-train. Some definitely do; the vast majority definitely don't. But fine-tuning, I think, makes all the sense in the world to me.

I feel like we're in the very early eras of that, and so I think we're gonna see a lot more creativity in terms of the ideas that get brought out. The direction where I'm most interested in the sci-fi-like creativity: right now, people still think about fine-tuning as one model per use case. Like, I'm going to fine-tune a model to do this.

What I'm really interested in is: we have the ability to fine-tune many models very easily, very cheaply, create these adapters, and now with LoRAX you can serve many of these adapters very cheaply. So what does it look like if you didn't have one fine-tune, but you had a hundred fine-tunes, or four hundred fine-tunes, that all do slightly different things, maybe, but can be served just as easily as a single base model?

And then what you really have is, you know, what I think some of these mixture-of-experts models have exploited, which is the ability to understand which part of a model architecture you actually want to use to answer a question.
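To make that concrete, here's a minimal sketch of the many-adapters-one-base-model idea, assuming a LoRAX-style server that accepts a per-request adapter ID; the endpoint URL, payload shape, and adapter names here are illustrative assumptions, not an exact API:

```python
# Minimal sketch: many task-specific LoRA adapters served off one
# shared base model. Assumes a LoRAX-style server where each request
# can name an adapter; the URL, payload shape, and adapter IDs are
# illustrative assumptions, not an exact API.
import requests

LORAX_URL = "http://localhost:8080/generate"  # hypothetical deployment

def generate(prompt: str, adapter_id: str | None = None) -> str:
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": 128}}
    if adapter_id is not None:
        # Only the small adapter weights swap per request; the base
        # model stays resident, so many fine-tunes can share one GPU.
        payload["parameters"]["adapter_id"] = adapter_id
    resp = requests.post(LORAX_URL, json=payload, timeout=60)
    resp.raise_for_status()
    return resp.json()["generated_text"]

# A hundred (or four hundred) fine-tunes, each slightly different,
# served about as cheaply as the single base model:
for adapter in ["triage-priority", "routing-area", "reply-drafting"]:
    print(adapter, "->", generate("My order never arrived.", adapter))
```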

Daniel Reid Cahn

So can you imagine for me, even if it were ten: how would you use ten LoRAs for a use case?

Dev Rishi

Yeah. So I think that there's two ways I can think about it, but let's just imagine you were that customer. Like, let's imagine you just wanted a customer service bot, right? Customer service is actually many different tasks, right? There's like, hey, what is the triage level and priority for this? What is the routing area it should go to? How do I write a response back to this user?

You know, if they want to cancel their subscription, how do I... it's a lot of these different types of tasks. Right now, I think your two options are: you try to fine-tune models per task, then deploy all those and figure out an orchestration for them all that lives in business logic, which is like, hey, first do this step, then do this step.

Daniel Reid Cahn

Or that would be like: first, classify the email. Okay, it looks like a cancellation email. Run the cancellation bot, and then run the response bot.

Dev Rishi

Yeah, yeah. And then, obviously, there's maybe 16 other steps that go in there.
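As a sketch of that hand-wired, per-task orchestration: one fine-tuned model per step, glued together with deterministic business logic. The three bots below are hypothetical stand-ins for separately fine-tuned models, stubbed so the flow runs:

```python
# Per-task orchestration living in business logic, as described above.
# Each function stands in for a separate fine-tuned model.

def classify_email(email: str) -> str:
    # Fine-tune #1 (triage). Stubbed with a keyword rule for the sketch.
    return "cancellation" if "cancel" in email.lower() else "other"

def cancellation_bot(email: str) -> str:
    # Fine-tune #2 (act on the request). Stubbed.
    return "Order flagged for cancellation."

def response_bot(email: str, outcome: str) -> str:
    # Fine-tune #3 (draft the reply). Stubbed.
    return f"Hi! {outcome} Let us know if we can help further."

def handle_ticket(email: str) -> str:
    intent = classify_email(email)           # step 1: classify the email
    if intent == "cancellation":             # step 2: run the cancellation bot
        outcome = cancellation_bot(email)
        return response_bot(email, outcome)  # step 3: run the response bot
    return "Routed to a human agent."        # ...plus maybe 16 other steps

print(handle_ticket("Please cancel my order, I won't be in town."))
```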

Daniel Reid Cahn

And then the problem is that the person says, okay, cancel my order, but only if it's not able to arrive by Labor Day, 'cause I'm not in town on Labor Day. Exactly. But if you're able to tell me that it's gonna be the day before Labor Day, then I can tell my neighbor to come by the house and pick it up. Yeah. Yeah. And they're just like, oh shit, what do we do?

Dev Rishi

And then they're gonna say something like, oh, by the way, what delivery mechanism is it gonna be? Is it gonna be FedEx, or is it gonna be dropped off at my door? Like, they have all these interspersed questions, right? And so, solving a use case like that... it weirdly reminds me of building old-school assistants. Like, I worked on the Google Assistant for a little while too, right?

And the way you used to do it would be, basically, you map everything to an intent, and you map intents to fulfillment logic and slot filling. And it kind of reminds me of that. You have to build a deterministic logic of: fine-tuned model X to this, fine-tuned model Y to this.

Daniel Reid Cahn

So that was like, someone says, turn the lights off. And then it's like, okay, this person's trying to change the status of the lights. Which way? Oh, off. Off is one of the on/off options. Yeah.
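A tiny sketch of that old-school assistant pattern: an utterance maps to an intent plus slots, and each intent maps to fulfillment logic. The intent and slot names here are illustrative, not the Google Assistant's actual schema:

```python
# Old-school assistant pattern: utterance -> intent + slots ->
# deterministic fulfillment logic. Intent/slot names are illustrative.

def parse(utterance: str) -> tuple[str, dict]:
    text = utterance.lower()
    if "light" in text:
        # Slot filling: "off" is one of the on/off options.
        state = "off" if "off" in text else "on"
        return "set_lights", {"state": state}
    return "unknown", {}

def fulfill(intent: str, slots: dict) -> str:
    if intent == "set_lights":
        return f"Turning the lights {slots['state']}."
    return "Sorry, I didn't get that."

print(fulfill(*parse("Turn the lights off")))
```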

Dev Rishi

That's the way we built a lot of AI systems historically, right? Conversational AI in particular. But I think what would be really interesting is if you had specialized adapters that were well trained for all of these different tasks, served off the top of a single base model, and kind of a router logic, like a routing layer, that understood, for a given question the user is asking, which specialized model should I go ahead and hand this off to?
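And a minimal sketch of that routing layer: pick a specialized adapter per request, all served off one base model. The router here is a keyword rule standing in for what could itself be a small fine-tuned classifier, and the adapter names are hypothetical:

```python
# Routing-layer sketch: classify the request, then dispatch to a
# specialized adapter on the shared base model. Adapter names are
# hypothetical; route() could itself be a small fine-tune.

def generate(prompt: str, adapter_id: str) -> str:
    # Stand-in for the LoRAX-style call in the earlier sketch.
    return f"[{adapter_id}] draft reply to: {prompt}"

ROUTES = {
    "cancel": "cancellation-adapter",
    "refund": "refund-adapter",
    "deliver": "delivery-eta-adapter",
}

def route(question: str) -> str:
    for keyword, adapter_id in ROUTES.items():
        if keyword in question.lower():
            return adapter_id
    return "general-support-adapter"

def answer(question: str) -> str:
    return generate(question, adapter_id=route(question))

print(answer("Cancel my order unless it can arrive by Labor Day."))
```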

Daniel Reid Cahn

So, I'd love to dive deeper there, but I know we're running out of time. I want to ask just some last questions. First: AGI. Do you have plans for AGI?

Dev Rishi

Do I have plans for AGI? Retire early, I think? I'm not sure, actually. With AGI... at Predibase, we build infrastructure and tooling, so I would love for people to get closer and closer towards AGI building fine-tuned LLMs, maybe using this routing approach that I was talking about. I feel like what I'm still missing is a really good, crisp definition for AGI. If it's just the Turing test, I feel like we're probably, you know, around that right now.

Daniel Reid Cahn

Contentious issue. We've had contentious podcast conversations on, like, how close are we to passing it? And is the Turing test outdated? But I, I...

Dev Rishi

Whether it's outdated or not... I think the Turing test especially is, like, a measure, maybe, of... I would say it's some bar around intelligence.

Daniel Reid Cahn

Sure.

Dev Rishi

I feel like we're hovering around it, whether we've passed it or not. No one talks to GPT-4 and says, this is, you know, the furthest thing I've ever heard from a human. Like, there is some kind of criteria where we're close, but...

Daniel Reid Cahn

Let me just, just for your sake, for now, let's stick with one definition, which is: work that can be done by a human remotely can be done by an AI.

Dev Rishi

Yeah, I think we're very... I think, in some ways, it depends on the work, but I think we're very close to that.

Daniel Reid Cahn

Can your job be done remotely?

Dev Rishi

No, but I wish.

Daniel Reid Cahn

I don't know. I think this is like my father-in-law, you know, his IT company: he'd be very much affected. But the other way to look at it would be the point at which humans become useless. Like, where you're like, hey, I would love to solve world hunger, and then you have this AI that's like, that's cute, you know? Right. Like, leave it to me, because I'm smarter and faster.

Dev Rishi

I think the way that the debate usually breaks up is, like: one, that would be terrible and there'd be mass unemployment and other ills; and then the second being, humans will find the next series of things that they want to go ahead and spend time doing. Similar to, like, the Industrial Revolution: I'm no longer spending time on agriculture, so what do I spend time on?

Daniel Reid Cahn

Yeah.

Dev Rishi

And I probably put myself a little bit more in the latter bucket, in terms of, like, I think that if we end up getting to the point where AGI establishes some base level of productivity for the overall economy, then I think people will choose different ways to spend their time. Some of them will be productive and expand the Pareto frontier, some of them will be for leisure, and I think both of those are good things.

Daniel Reid Cahn

But you're gonna go for the latter. You're gonna retire?

Dev Rishi

We'll see. We'll see how everything goes with, like, this next stretch with Predibase. But you know, I think that I would love to be able to have a...

Daniel Reid Cahn

I heard Sam Altman say at some point, I don't know if he was being facetious, but he was like, I'll just have a lot of kids.

Dev Rishi

If that was the case... I mean, I think if I got to the point where AGI was able to do, let's say, 70 percent of my role.

Daniel Reid Cahn

Yeah.

Dev Rishi

I don't think I'd backfill that 70 percent with just work. I think I'd mix in some work-life balance in there. More work-life balance than I'd say I have today.

Daniel Reid Cahn

Very nice. And then, just in terms of resources for keeping up with AI, it sounds like you have some. If you had, like, an ML engineer looking for stuff to check out, things to read?

Dev Rishi

Yeah, I mean, there's some obvious things. Like, I think Andrew Ng's been putting together some great short-form courses on YouTube with DeepLearning.AI; we did one on efficient serving for LLMs. We have a fine-tuned LLM newsletter. And I hate to say it, but I actually think I get a lot of my LLM news on Twitter still.

And it's, like, weirdly actually something where I feel like probably everyone I follow right now is an ML influencer of some sort or other. But the space is moving so quickly, I feel like things just get organically shared on social. Other newsletters I read include The Sequence and a few others. But I would say the place to start would probably be to check out our fine-tuned newsletter, like, on top of Predibase. Great name. Yeah, exactly.

We have shirts that say "the future is fine-tuned," so we just called our newsletter that.

Daniel Reid Cahn

I love that. Yeah. That is such a great idea, to get a t-shirt.

Dev Rishi

We'll get you a shirt. Yeah, we'll make sure that you have one.

Daniel Reid Cahn

Alright, well this was awesome, Dev. Thanks so much for joining us.

Dev Rishi

Yeah, you got it, Daniel. Thanks for having me.

Daniel Reid Cahn

Awesome.
