From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models

Best AI papers explained

May 23, 2025 • 32 min

Episode description

This academic paper surveys methods for improving the text that large language models (LLMs) generate after training, focusing on inference-time algorithms. It groups these techniques into three areas: token-level generation algorithms, which operate on individual tokens; meta-generation algorithms, which orchestrate multiple generation calls; and efficient-generation strategies, which reduce both token cost and latency. The work formalizes the objectives behind different generation approaches and discusses how to incorporate external information, such as other models or tools, to improve output quality. The authors also analyze the cost-performance tradeoffs of these algorithms and outline directions for future research.
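
As a rough illustration of how the first two categories fit together, here is a minimal sketch (not the paper's code). The token-level step is temperature sampling over a hypothetical toy model, and the meta-generation step is best-of-N, one common pattern that calls the generator several times and keeps the candidate a scoring function prefers. `toy_logits`, `generate`, `best_of_n`, and the scorer are illustrative assumptions; in practice the scorer would be another model, such as a reward model or verifier.

```python
import math
import random

# Hypothetical toy "language model": three-token vocabulary with fixed rules.
VOCAB = ["a", "b", "<eos>"]

def toy_logits(prefix):
    # Favors "a" early on, then increasingly favors ending the sequence.
    return [2.0 - 0.5 * len(prefix), 1.0, 0.5 * len(prefix)]

def sample_token(logits, temperature, rng):
    """Token-level algorithm: sample an index from softmax(logits / temperature)."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

def generate(rng, temperature=1.0, max_len=8):
    """Generate one sequence token by token until <eos> or max_len tokens."""
    prefix = []
    for _ in range(max_len):
        idx = sample_token(toy_logits(prefix), temperature, rng)
        if VOCAB[idx] == "<eos>":
            break
        prefix.append(VOCAB[idx])
    return prefix

def best_of_n(n, score, seed=0, temperature=1.0):
    """Meta-generation: call the generator n times and keep the top-scoring output."""
    rng = random.Random(seed)
    candidates = [generate(rng, temperature) for _ in range(n)]
    return max(candidates, key=score)
```

For example, `best_of_n(8, score=lambda seq: seq.count("b"))` returns the sampled sequence with the most `"b"` tokens; the exact sequence depends on the seed. Raising `n` trades more generation calls (token cost) for a better chance of a high-scoring output, which is the kind of cost-performance tradeoff the paper analyzes.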

For the best experience, listen in Metacast app for iOS or Android