Low-Rank Adapters Meet Neural Architecture Search for LLM Compression

Jan 30, 2025 • 22 min • Ep. 444

Episode description

🤗 Upvotes: 5 | cs.LG, cs.AI, cs.CL

Authors:
J. Pablo Muñoz, Jinjie Yuan, Nilesh Jain

Title:
Low-Rank Adapters Meet Neural Architecture Search for LLM Compression

Arxiv:
http://arxiv.org/abs/2501.16372v1

Abstract:
The rapid expansion of Large Language Models (LLMs) has posed significant challenges regarding the computational resources required for fine-tuning and deployment. Recent advancements in low-rank adapters have demonstrated their efficacy in parameter-efficient fine-tuning (PEFT) of these models. This retrospective paper comprehensively discusses innovative approaches that synergize low-rank representations with Neural Architecture Search (NAS) techniques, particularly weight-sharing super-networks. Robust solutions for compressing and fine-tuning large pre-trained models are developed by integrating these methodologies. Our analysis highlights the potential of these combined strategies to democratize the use of LLMs, making them more accessible for deployment in resource-constrained environments. The resulting models exhibit reduced memory footprints and faster inference times, paving the way for more practical and scalable applications of LLMs. Models and code are available at https://github.com/IntelLabs/Hardware-Aware-Automated-Machine-Learning.
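To make the core idea in the abstract concrete, here is a minimal sketch of a low-rank adapter whose rank is elastic, so that adapters of different ranks share one set of weights in the spirit of a weight-sharing super-network. It is an illustration under stated assumptions, not the authors' implementation (their code is in the repository linked above): the class `ElasticLoRALinear`, the helper `sample_sub_adapter`, the slicing scheme, and the `alpha / rank` scaling are all hypothetical choices made for this example.

```python
import numpy as np


class ElasticLoRALinear:
    """Frozen linear layer plus a low-rank adapter with a selectable rank.

    Hypothetical illustration: a sub-adapter of rank r <= max_rank reuses the
    first r rows of A and the first r columns of B, so every candidate rank
    shares the same underlying weights.
    """

    def __init__(self, in_features, out_features, max_rank, alpha=16.0, seed=0):
        rng = np.random.default_rng(seed)
        # Frozen "pre-trained" weight; random values stand in for a checkpoint.
        self.W = rng.standard_normal((out_features, in_features)) * 0.02
        # Adapter factors: A small and random, B zero, as in standard LoRA init.
        self.A = rng.standard_normal((max_rank, in_features)) * 0.01
        self.B = np.zeros((out_features, max_rank))
        self.max_rank = max_rank
        self.alpha = alpha
        self.active_rank = max_rank

    def set_rank(self, rank):
        """Activate the sub-adapter that uses only the first `rank` components."""
        if not 1 <= rank <= self.max_rank:
            raise ValueError(f"rank must be in [1, {self.max_rank}], got {rank}")
        self.active_rank = rank

    def forward(self, x):
        r = self.active_rank
        A, B = self.A[:r], self.B[:, :r]  # slices are views, so weights are shared
        scale = self.alpha / r            # assumption: LoRA-style scaling per active rank
        return x @ self.W.T + scale * (x @ A.T) @ B.T

    def adapter_params(self):
        """Number of adapter parameters used by the active sub-adapter."""
        r = self.active_rank
        return r * (self.A.shape[1] + self.B.shape[0])


def sample_sub_adapter(layers, rank_choices, rng):
    """Pick one rank per layer at random and activate it (a toy search step)."""
    config = [int(rng.choice(rank_choices)) for _ in layers]
    for layer, rank in zip(layers, config):
        layer.set_rank(rank)
    return config


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    layers = [ElasticLoRALinear(16, 16, max_rank=8, seed=i) for i in range(3)]
    config = sample_sub_adapter(layers, rank_choices=[2, 4, 8], rng=rng)
    x = rng.standard_normal((4, 16))
    for layer in layers:
        x = layer.forward(x)
    print("sampled ranks:", config)
    print("adapter parameters:", sum(l.adapter_params() for l in layers))
```

In this sketch, a search procedure would evaluate many sampled rank configurations against an accuracy and cost target and keep the best one. How the paper's methods train the shared weights and choose a final configuration is not covered by the abstract and is not modeled here.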
