Sleep-time Compute: Beyond Inference Scaling at Test-time

Best AI papers explained

May 22, 2025 • 12 min

Episode description

This paper explores "sleep-time compute" for large language models (LLMs): while idle, a model processes a given context offline, anticipating the queries it may later receive about that context. The authors introduce Stateful GSM-Symbolic and Stateful AIME, two datasets built by splitting existing reasoning problems into a context and a question. Their experiments show that sleep-time compute significantly reduces the test-time compute needed to reach a similar accuracy, making inference more efficient. Furthermore, when several related questions are asked about the same context, the sleep-time work is done once and shared across them, which lowers the average cost per query. The paper concludes that sleep-time compute is most effective when queries are predictable from the provided context.
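
The amortization argument in the description can be illustrated with a small sketch. It assumes a simple additive cost model (a one-off sleep-time cost per context plus a per-query test-time cost), and every number and name below is hypothetical, chosen for illustration rather than taken from the paper.

```python
def average_cost_per_query(sleep_cost: float,
                           test_cost_per_query: float,
                           num_queries: int) -> float:
    """Amortized cost per query under an assumed additive cost model.

    sleep_cost is paid once per context; test_cost_per_query is paid
    for every query asked about that context.
    """
    if num_queries < 1:
        raise ValueError("num_queries must be at least 1")
    return sleep_cost / num_queries + test_cost_per_query


# Hypothetical token budgets, not results from the paper.
BASELINE_TEST_COST = 1000.0   # test-time cost per query without sleep-time compute
SLEEP_COST = 2000.0           # one-off cost of processing the context while idle
REDUCED_TEST_COST = 200.0     # test-time cost per query after sleep-time compute

for n in (1, 2, 5, 10):
    baseline = average_cost_per_query(0.0, BASELINE_TEST_COST, n)
    with_sleep = average_cost_per_query(SLEEP_COST, REDUCED_TEST_COST, n)
    print(f"queries={n:2d}  baseline={baseline:7.1f}  with_sleep={with_sleep:7.1f}")
```

With these made-up numbers, a single query is more expensive with sleep-time compute, and the approach only pays off once three or more queries share the same context. That matches the description's point that the benefit depends on asking multiple related questions about one context.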
