Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models

Jan 16, 2026 · 14 min

Episode description

The researchers introduce Engram, a conditional memory module that augments large language models with a scalable lookup mechanism for static knowledge. Whereas modern models rely on Mixture-of-Experts (MoE) for sparse computation, Engram uses N-gram embeddings to retrieve formulaic or factual information in constant time, making memory lookup a second axis of sparsity alongside conditional computation. Dividing parameters between neural computation and this static memory follows a U-shaped scaling law, so the best results come from balancing the two rather than favoring either extreme. The design also lets the model hand simple retrieval off to lookups in its early layers. By delegating local patterns to these lookups, the transformer preserves its attention capacity for complex reasoning and long-context processing. Experiments show that an Engram-augmented 27B model significantly outperforms standard MoE baselines on math, coding, and general-reasoning tasks. Because the lookup indices depend only on the input tokens, the system can also offload massive parameter tables to host memory with minimal computational overhead.
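
A minimal Python sketch of the lookup idea described above, written under loudly stated assumptions: the table size, the polynomial hash (`HASH_BASE`), the N-gram orders (2 and 3), the helper names (`ngram_row`, `precompute_rows`, `make_tables`, `engram_update`) and the sigmoid gate are all hypothetical choices for illustration, not Engram's actual design. It shows two points from the description: each lookup costs the same no matter how large the table grows, and the row indices come from token IDs alone, so they can be computed before the forward pass.

```python
import math
import random
from typing import Dict, List, Sequence

# Illustrative settings (assumptions for this sketch, not values from the paper).
TABLE_ROWS = 4099          # rows per N-gram table; real tables would be far larger
EMBED_DIM = 4              # embedding width, kept tiny for readability
NGRAM_ORDERS = (2, 3)      # suffix lengths to look up at each position
HASH_BASE = 2_654_435_761  # multiplier for a simple polynomial hash (hypothetical)


def ngram_row(tokens: Sequence[int], n: int, rows: int = TABLE_ROWS) -> int:
    """Hash the last n token IDs into a row index in [0, rows).

    The loop runs n times whatever the table size, so the lookup cost does
    not grow as the memory table grows.
    """
    h = 0
    for tok in tokens[-n:]:
        h = (h * HASH_BASE + tok) % rows
    return h


def precompute_rows(token_ids: Sequence[int]) -> List[Dict[int, int]]:
    """For each position, map every N-gram order that fits to its table row.

    Only token IDs are needed, never hidden states, so this can run before
    the forward pass (e.g. to prefetch rows kept in host memory).
    """
    plan = []
    for pos in range(len(token_ids)):
        prefix = token_ids[: pos + 1]
        plan.append({n: ngram_row(prefix, n) for n in NGRAM_ORDERS if len(prefix) >= n})
    return plan


def make_tables(seed: int = 0) -> Dict[int, List[List[float]]]:
    """Create one small randomly initialised embedding table per N-gram order."""
    rng = random.Random(seed)
    return {
        n: [[rng.gauss(0.0, 0.02) for _ in range(EMBED_DIM)] for _ in range(TABLE_ROWS)]
        for n in NGRAM_ORDERS
    }


def engram_update(hidden: List[float], rows: Dict[int, int],
                  tables: Dict[int, List[List[float]]]) -> List[float]:
    """Add each retrieved vector to the hidden state, scaled by a sigmoid gate.

    The gate (sigmoid of the hidden/memory dot product) is a hypothetical
    choice that lets the model down-weight lookups that do not fit the context.
    """
    out = list(hidden)
    for n, row in rows.items():
        mem = tables[n][row]
        score = sum(h * m for h, m in zip(hidden, mem))
        gate = 1.0 / (1.0 + math.exp(-score))
        out = [o + gate * m for o, m in zip(out, mem)]
    return out


# Usage with made-up token IDs and a made-up hidden state.
tokens = [17, 42, 42, 7, 99]
tables = make_tables()
plan = precompute_rows(tokens)   # computed from token IDs alone
print(plan[0])                   # prints {} (no complete 2-gram at position 0)
print(sorted(plan[4]))           # prints [2, 3]
hidden = [0.1, -0.2, 0.05, 0.3]
updated = engram_update(hidden, plan[4], tables)
```

In a real system the tables would hold far more rows and the precomputed indices would drive a prefetch from host memory; here everything lives in Python lists so the sketch stays self-contained.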
