#80- Layer pruning and Mixture of Depths.

Apr 18, 2024 · 14 min

Episode description

Hey guys, continuing the series of episodes about PEFT, in this episode I talk about inference optimization techniques for LLMs.


I talk about layer pruning, where we prune consecutive layers of the LLM with almost no loss in model performance.
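To make the idea concrete, here is a minimal sketch of pruning a block of consecutive layers. The selection rule (pick the block whose input and output representations are closest in angle) is an assumption for illustration, since the description above does not say how the block is chosen; the function names and the toy data are hypothetical.

```python
import math


def angular_distance(u, v):
    """Angle between two vectors, scaled to the range [0, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cos) / math.pi


def pick_block_start(hidden_states, n):
    """Pick the start of the n-layer block that changes the representation least.

    hidden_states[i] is the input to layer i; the last entry is the output
    of the final layer. This heuristic is an assumption, not a quoted method.
    """
    num_layers = len(hidden_states) - 1
    candidates = range(num_layers - n + 1)
    return min(
        candidates,
        key=lambda s: angular_distance(hidden_states[s], hidden_states[s + n]),
    )


def prune_consecutive_layers(layers, start, n):
    """Return a new layer list with layers start .. start+n-1 removed."""
    return layers[:start] + layers[start + n:]


# Toy example: 4 layers, one representation vector per layer boundary.
hidden_states = [[1.0, 0.0], [0.0, 1.0], [0.1, 1.0], [0.2, 1.0], [1.0, 1.0]]
layers = ["layer0", "layer1", "layer2", "layer3"]
start = pick_block_start(hidden_states, n=2)
pruned = prune_consecutive_layers(layers, start, n=2)
```

In a real model the representations would be averaged over many tokens from a calibration set, and the pruned model may still need a short fine-tuning pass to recover quality.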


I also talk about Mixture of Depths, a technique similar to Mixture of Experts, where a router chooses which tokens will be processed in which layers of the LLM.
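The routing idea can be sketched roughly as follows. This is a simplified illustration under stated assumptions: the router is a single linear score per token, only the top `capacity` tokens go through the block, and every other token skips it through the residual path. The names `mod_layer`, `router_weights`, `block`, and `capacity` are hypothetical, and the block here acts on each token independently, whereas a real transformer block would also run attention across the selected tokens.

```python
def mod_layer(tokens, router_weights, block, capacity):
    """Route only the top-`capacity` tokens through `block`; the rest skip it.

    tokens: list of token vectors (lists of floats)
    router_weights: vector used to compute one scalar score per token
    block: hypothetical per-token function standing in for a transformer block
    """
    # One scalar router score per token.
    scores = [sum(w * x for w, x in zip(router_weights, tok)) for tok in tokens]

    # Indices of the highest-scoring tokens, up to the layer's capacity.
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    selected = set(ranked[:capacity])

    outputs = []
    for i, tok in enumerate(tokens):
        if i in selected:
            # Processed token: residual plus the block output scaled by the
            # router score, so the router sits on the gradient path.
            update = block(tok)
            outputs.append([x + scores[i] * u for x, u in zip(tok, update)])
        else:
            # Skipped token: passes through unchanged, costing no compute.
            outputs.append(list(tok))
    return outputs, sorted(selected)


# Toy example: 3 tokens, capacity of 1 token for this layer.
tokens = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]
outputs, routed = mod_layer(
    tokens,
    router_weights=[1.0, 0.5],
    block=lambda tok: [2.0 * x for x in tok],
    capacity=1,
)
```

The compute saving comes from the fixed capacity: the block always sees the same number of tokens, regardless of sequence content.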


Paper MoD: https://arxiv.org/pdf/2404.02258.pdf

Paper layer pruning: https://arxiv.org/pdf/2403.17887v1.pdf

Instagram of the podcast: https://www.instagram.com/podcast.lifewithai

LinkedIn of the podcast: https://www.linkedin.com/company/life-with-ai
