NoWag: Unified Compression for Large Language Models

Best AI papers explained

Apr 26, 2025•18 min

Episode description

We discuss NoWag, a framework for compressing large language models (LLMs) while preserving the shape of their weight matrices. The unified approach covers both pruning (removing less important weights) and vector quantization (replacing groups of weights with indices into a shared codebook), and guides both with a normalization step informed by weight and activation statistics. Experiments on Llama models show that NoWag significantly outperforms existing state-of-the-art zero-shot quantization methods while using less calibration data, and achieves competitive results in pruning. This suggests a shared underlying principle for effective LLM compression.
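To make the pruning half of the idea concrete, here is a minimal sketch of activation-guided pruning in plain Python. It is not NoWag's published algorithm. The column-then-row normalization, the score (squared normalized weight times the squared norm of the matching input feature, in the spirit of Wanda-style pruning), the per-row sparsity target, and the function names `normalize`, `importance` and `prune` are all illustrative assumptions. The sketch shows only the general principle described above: weights are rescaled before scoring, scores combine weight and activation data, and the pruned matrix keeps its original shape.

```python
import math


def normalize(W):
    """Hypothetical normalization: scale each column, then each row, to unit L2 norm."""
    rows, cols = len(W), len(W[0])
    # Column norms (fall back to 1.0 so an all-zero column does not divide by zero).
    col_norms = [math.sqrt(sum(W[i][j] ** 2 for i in range(rows))) or 1.0 for j in range(cols)]
    Wc = [[W[i][j] / col_norms[j] for j in range(cols)] for i in range(rows)]
    # Row norms of the column-normalized matrix.
    row_norms = [math.sqrt(sum(v * v for v in row)) or 1.0 for row in Wc]
    return [[v / row_norms[i] for v in Wc[i]] for i in range(rows)]


def importance(W, X):
    """Score each weight by its squared normalized value times the squared L2 norm
    of its input feature over the calibration samples X (each of length cols)."""
    Wn = normalize(W)
    cols = len(W[0])
    act_sq = [sum(x[j] ** 2 for x in X) for j in range(cols)]
    return [[Wn[i][j] ** 2 * act_sq[j] for j in range(cols)] for i in range(len(W))]


def prune(W, X, sparsity):
    """Zero the lowest-scoring fraction of weights in each row; the shape is unchanged."""
    scores = importance(W, X)
    cols = len(W[0])
    k = int(cols * sparsity)  # number of weights to drop per row
    pruned = []
    for i, row in enumerate(W):
        order = sorted(range(cols), key=lambda j: scores[i][j])
        drop = set(order[:k])
        pruned.append([0.0 if j in drop else w for j, w in enumerate(row)])
    return pruned


# Toy 2x4 weight matrix and two calibration activation vectors.
W = [[1.0, -2.0, 0.5, 3.0], [0.1, 0.2, -4.0, 1.0]]
X = [[1.0, 0.1, 1.0, 0.5], [2.0, 0.2, 1.0, 0.5]]
pruned = prune(W, X, sparsity=0.5)
print(pruned)  # [[1.0, 0.0, 0.0, 3.0], [0.0, 0.0, -4.0, 1.0]]
```

Plain magnitude pruning would drop the 1.0 in the first row, since it is one of that row's two smallest weights. The activation-guided score keeps it because its input feature carries most of the calibration activation energy, which is the kind of weight-plus-activation signal the description refers to.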
