Why I Believe in SOTA Models Over Custom Ones

00:00

I'm not completely sure I'm right about this, but I've never been a big believer in training custom models. I've also never believed in fine-tuning, going all the way back to 2023. My intuition has always pushed me towards the best state-of-the-art model possible, combined with context management. I just finally crystallized my reasoning around this.

00:21

Anytime you think you're using a small model for a small task, there's usually a whole lot more going into a given decision than just that individual area of expertise. For example, labeling emails, writing reports, processing security events, searching for threats on a network. On one hand, I think these are specialized, but the fact is, the smarter and more experienced a human is who has this expertise, the

00:46

better job they're going to do. This is because most specialized tasks still benefit from the general life experience of the person doing the execution. This is why I think the future is not a whole bunch of extremely small, specialized models throughout the enterprise. I think what's far more likely is more of an Opus, Sonnet, Haiku model, where the best of the best just keeps coming down in price,

01:10

including going into open source. And those smaller models are used in conjunction with context to perform all the different tasks in an organization at much lower cost. But I think they'll still be extremely general models, not tiny and narrow custom ones. I think the TL;DR here is, when you think you're doing a narrow task, that narrow task is actually benefiting from a ton of general experience. And I think this applies to humans, and I think it

01:38

also applies to models. I'm not completely convinced of this. I'm about 70% sure. But yeah, I think this is the way it's going to go.

Transcript source: Provided by creator in RSS feed
