A Snapshot of Influence: A Local Data Attribution Framework for Online Reinforcement Learning

Best AI papers explained

Jun 02, 2025 • 26 min

Episode description

This paper proposes a local data attribution framework for online reinforcement learning (RL). Within each training round, the framework uses influence functions to identify which training records harm the RL agent's learning. The proposed method, Iterative Influence-Based Filtering (IIF), filters out these harmful records; it improves performance and sample efficiency on standard RL tasks and shows promise in reducing toxicity in Reinforcement Learning from Human Feedback (RLHF) for large language models. The paper also analyzes the characteristics of influential records and how different filtering levels affect learning.
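To make the filtering idea concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: it assumes a simple first-order proxy in which a record's influence is the dot product between that record's gradient and the gradient of some target quantity (for example, expected return), and it drops records whose score falls below a threshold. The names `influence_scores`, `filter_harmful`, `record_grads`, `target_grad`, and `threshold` are hypothetical, and the gradients are toy numbers.

```python
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def influence_scores(
    record_grads: Sequence[Sequence[float]],
    target_grad: Sequence[float],
) -> List[float]:
    """Assumed proxy: score each record by the alignment of its gradient
    with the target gradient. A negative score suggests the record pushes
    the parameters in a direction that lowers the target."""
    return [dot(g, target_grad) for g in record_grads]


def filter_harmful(
    records: Sequence[str],
    record_grads: Sequence[Sequence[float]],
    target_grad: Sequence[float],
    threshold: float = 0.0,
) -> Tuple[List[str], List[float]]:
    """Keep only records whose score is at or above the threshold."""
    scores = influence_scores(record_grads, target_grad)
    kept = [r for r, s in zip(records, scores) if s >= threshold]
    return kept, scores


# Toy data for one training round (illustrative values only).
records = ["r0", "r1", "r2"]
record_grads = [[1.0, 0.0], [-1.0, 0.5], [0.5, 0.5]]
target_grad = [1.0, 0.0]

kept, scores = filter_harmful(records, record_grads, target_grad)
print(scores)  # [1.0, -1.0, 0.5]
print(kept)    # ['r0', 'r2']
```

In an online setting, a step like this would be repeated every round on freshly collected data, since the scores are only meaningful relative to the current policy. Raising `threshold` filters more aggressively, which corresponds to the filtering-level trade-off mentioned in the description.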
