Sleep-time Compute: Beyond Inference Scaling at Test-time

May 22, 2025 · 12 min

Episode description

This academic paper explores "sleep-time compute" for large language models (LLMs): an approach in which a model processes a given context while idle, anticipating likely future queries. The authors introduce Stateful GSM-Symbolic and Stateful AIME, two datasets created by splitting existing reasoning problems into a context and a question so that the approach can be tested. Their experiments show that sleep-time compute significantly reduces the test-time compute needed to reach similar accuracy, making inference more efficient. Furthermore, when several related questions are asked about the same context, the sleep-time work is shared across them, which lowers the average cost per query. The paper concludes that sleep-time compute is most effective when queries are predictable from the provided context.
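
The cost-sharing point can be illustrated with a little arithmetic. The sketch below is not taken from the paper: the function name `average_cost_per_query` and all token counts are hypothetical, chosen only to show how a one-off sleep-time pass is spread over several queries about the same context.

```python
def average_cost_per_query(sleep_tokens: float,
                           test_tokens_per_query: float,
                           num_queries: int) -> float:
    """Average tokens spent per query when one sleep-time pass over the
    context is shared by num_queries queries (hypothetical cost model)."""
    if num_queries < 1:
        raise ValueError("num_queries must be at least 1")
    # The sleep-time cost is paid once and divided across all queries;
    # the test-time cost is paid again for every query.
    return sleep_tokens / num_queries + test_tokens_per_query


# Assumed, illustrative numbers (not results from the paper):
BASELINE_TEST_TOKENS = 1000   # test-time tokens per query with no sleep-time pass
SLEEP_TOKENS = 2000           # one-off tokens spent while idle
REDUCED_TEST_TOKENS = 300     # test-time tokens per query after the sleep-time pass

for n in (1, 2, 5, 10):
    cost = average_cost_per_query(SLEEP_TOKENS, REDUCED_TEST_TOKENS, n)
    print(f"{n} queries: {cost:.0f} tokens per query "
          f"(baseline {BASELINE_TEST_TOKENS})")
```

Under these assumed numbers, a single query costs more with sleep-time compute than without it, and the average drops below the baseline only once the same context is queried three or more times. Real figures depend on the model, the task, and how predictable the queries are.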