Understanding Best-of-N Language Model Alignment

May 25, 2025 · 14 min

Episode description

This paper critically examines the best-of-n policy, a common method for aligning generative language models: draw $n$ samples from a reference policy and keep the one with the highest reward. It disproves a widely used analytical formula for the KL divergence between the best-of-n policy and the reference policy, proving that the formula is only an upper bound on the true value. The authors characterize when this bound is tight and when it is loose, and they propose a new, more accurate estimator of the KL divergence. They also derive upper and lower bounds on the win rate of the best-of-n policy against the reference policy, and compare best-of-n with another rejection sampling method, rewind-and-repeat, showing that best-of-n achieves a better trade-off between win rate and KL divergence.
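To make the upper-bound claim concrete, here is a minimal Python sketch on a toy problem. It is not the paper's estimator. It assumes the formula in question is the commonly cited $\log n - (n-1)/n$ (the description does not spell it out), and it assumes a hypothetical three-outcome reference policy with distinct rewards, so the best-of-n distribution can be computed exactly from the reference CDF.

```python
import math

def best_of_n_pmf(ref_probs, n):
    """Exact best-of-n distribution over a finite set of outcomes.

    Assumption: ref_probs is sorted by ascending reward and all rewards
    are distinct, so P(best = y) = F(y)^n - F(y-)^n, where F is the
    reference CDF in reward order.
    """
    pmf, cdf_prev = [], 0.0
    for p in ref_probs:
        cdf = cdf_prev + p
        pmf.append(cdf ** n - cdf_prev ** n)
        cdf_prev = cdf
    return pmf

def kl_divergence(p, q):
    """KL(p || q) in nats for two distributions on the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def analytical_formula(n):
    """Commonly cited closed form, log(n) - (n - 1) / n (assumed here)."""
    return math.log(n) - (n - 1) / n

# Hypothetical toy reference policy: three outcomes, ascending reward.
ref = [0.5, 0.3, 0.2]
n = 4

bon = best_of_n_pmf(ref, n)
exact_kl = kl_divergence(bon, ref)
formula_kl = analytical_formula(n)

print(f"exact KL   = {exact_kl:.4f}")
print(f"formula    = {formula_kl:.4f}")
print(f"formula is an upper bound here: {exact_kl <= formula_kl}")
```

On this toy policy the exact KL divergence falls below the formula's value, which matches the direction of the claim. A reference policy that puts large probability on a few outcomes is one setting where such a gap appears, because the best-of-n policy cannot move far from a distribution that already concentrates its mass.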