Value Flows: Flow-Based Distributional Reinforcement Learning

Oct 14, 2025 · 16 min

Episode description

This paper introduces "Value Flows," a reinforcement learning algorithm that uses flow-based models to estimate the full distribution of future returns rather than collapsing it to a single scalar value, as traditional methods do. The full distribution provides a richer learning signal and a better estimate of aleatoric uncertainty (return variance), which the method uses to prioritize learning on uncertain transitions. The paper formulates a new flow-matching objective designed to satisfy the distributional Bellman equation; its figures illustrate the idea with a violin plot of return distributions and show screenshots of a robotic manipulation task used for evaluation. In experiments, Value Flows outperforms prior offline and offline-to-online RL methods across a range of tasks, with a 1.3× improvement in success rates and lower distributional discrepancy.
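
To make the two ideas in the description concrete, here is a minimal sketch, not the paper's method: it replaces the flow model with plain Monte Carlo samples and illustrates (1) the distributional Bellman backup, Z(s, a) = r + γ Z(s', a') in distribution, and (2) weighting transitions by return variance. The function names, the Gaussian toy returns, and the normalize-to-one weighting scheme are all assumptions made for illustration.

```python
import numpy as np


def bellman_target_samples(reward, gamma, next_return_samples):
    """Sample-based distributional Bellman backup.

    Each sample of the next-state return Z(s', a') is mapped to a sample of
    the current return via r + gamma * z', so the whole distribution is
    carried through the backup instead of only its mean.
    """
    return reward + gamma * np.asarray(next_return_samples, dtype=float)


def variance_weights(return_samples_per_transition):
    """Hypothetical prioritization: weights proportional to return variance.

    Transitions whose return samples are more spread out (higher aleatoric
    uncertainty) get a larger share of the total weight.
    """
    variances = np.array([np.var(s) for s in return_samples_per_transition])
    total = variances.sum()
    if total == 0.0:
        # All returns are deterministic: fall back to uniform weights.
        return np.full(len(variances), 1.0 / len(variances))
    return variances / total


# Toy data (assumed): two transitions with the same mean next-state return
# but very different spread.
rng = np.random.default_rng(0)
next_returns_low_var = rng.normal(loc=1.0, scale=0.1, size=1000)
next_returns_high_var = rng.normal(loc=1.0, scale=1.0, size=1000)

targets_low = bellman_target_samples(0.5, 0.99, next_returns_low_var)
targets_high = bellman_target_samples(0.5, 0.99, next_returns_high_var)

weights = variance_weights([targets_low, targets_high])
print("mean targets:", targets_low.mean(), targets_high.mean())
print("variance-based weights:", weights)
```

A scalar value function would treat these two transitions almost identically, since their mean targets nearly coincide; only the distributional view exposes the difference in variance that the weighting relies on.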
