Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF

Oct 09, 2025 · 17 min

Episode description

This paper investigates two major drawbacks in the reward learning phase of RLHF: reward overfitting and reward overoptimization, which often occur because the standard cross-entropy loss is inadequate for imbalanced preference datasets. To address these issues, the paper introduces an algorithm called Iterative Data Smoothing (IDS), which iteratively replaces hard comparison labels with softer, model-predicted labels during training. Theoretical analysis and empirical results in both multi-armed bandit and neural network settings indicate that IDS outperforms traditional Maximum Likelihood Estimation (MLE), offering a more robust approach to reward training.
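
To make the label-smoothing idea concrete, here is a minimal Python sketch of an IDS-style loop in a multi-armed bandit setting. The episode description does not spell out the update rule, so the details are assumptions for illustration: a Bradley-Terry (sigmoid) preference model, a plain gradient step on the cross-entropy loss, and a label update that mixes each current label with the model's prediction using a hypothetical rate `beta`. The function name `ids_bandit` and its parameters are likewise illustrative, not taken from the paper.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def ids_bandit(comparisons, num_arms, epochs=200, lr=0.5, beta=0.1):
    """Illustrative IDS-style training loop (assumed update rules).

    comparisons: list of (i, j, y), with y = 1.0 if arm i was preferred to arm j.
    Returns the learned per-arm rewards and the smoothed labels.
    """
    rewards = [0.0] * num_arms
    labels = [float(y) for _, _, y in comparisons]  # start from the hard labels
    n = len(comparisons)

    for _ in range(epochs):
        # Step 1: update the model with the data (gradient of cross-entropy
        # against the current, possibly softened, labels).
        grads = [0.0] * num_arms
        for k, (i, j, _) in enumerate(comparisons):
            p = sigmoid(rewards[i] - rewards[j])
            g = p - labels[k]
            grads[i] += g
            grads[j] -= g
        for a in range(num_arms):
            rewards[a] -= lr * grads[a] / n

        # Step 2: update the data with the model (assumed rule: move each
        # label a fraction beta toward the model's predicted preference).
        for k, (i, j, _) in enumerate(comparisons):
            p = sigmoid(rewards[i] - rewards[j])
            labels[k] = (1.0 - beta) * labels[k] + beta * p

    return rewards, labels


# Imbalanced toy data: arms 0 and 1 are compared often, arms 0 and 2 only once.
data = [(0, 1, 1.0)] * 5 + [(0, 2, 1.0)]
rewards, labels = ids_bandit(data, num_arms=3)
```

With plain MLE on hard labels, a pair that is compared only once pushes the reward gap toward infinity. In this sketch the label for that pair drifts toward the model's own prediction, so the gradient shrinks and the reward estimates stay bounded.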
