ShiQ: Bringing back Bellman to LLMs

May 22, 2025 · 18 min

Episode description


This paper introduces ShiQ, an offline reinforcement learning algorithm that adapts traditional Q-learning to fine-tuning large language models (LLMs). The authors address the main obstacles to applying Q-learning to LLMs, such as computational cost and initialization issues, by deriving theoretically grounded loss functions from Bellman equations. ShiQ supports off-policy, token-wise learning. It is evaluated on several benchmarks, including multi-turn settings, where it is reported to be effective compared with existing methods such as DPO and CoPG. The paper details the theoretical basis of ShiQ and presents empirical results on both synthetic and real-world datasets.
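
For listeners who want the background, the textbook Bellman optimality equation that Q-learning builds on can be written for token-by-token generation as follows. This is a generic sketch, not the ShiQ loss itself, which the description does not spell out. The notation is an assumption made here for illustration: the state $s_t$ is the prompt plus the tokens generated so far, the action $a_t$ is the next token, $r$ is the reward, $\gamma$ is the discount factor, and $\oplus$ denotes appending a token.

```latex
% Token-level view: appending a token is a deterministic transition.
s_{t+1} = s_t \oplus a_t

% Standard Bellman optimality equation for the optimal Q-function.
Q^{*}(s_t, a_t) = r(s_t, a_t) + \gamma \max_{a'} Q^{*}(s_{t+1}, a')

% Classical Q-learning minimizes the squared residual of this equation
% over transitions (s_t, a_t, s_{t+1}) drawn from a fixed dataset D.
\mathcal{L}(Q) = \mathbb{E}_{(s_t, a_t, s_{t+1}) \sim \mathcal{D}}
  \Big[ \big( r(s_t, a_t) + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \big)^{2} \Big]
```

Because the residual is defined per transition, it can be evaluated on data produced by any policy, which is what makes this family of methods off-policy and token-wise.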
