Value Flows: Flow-Based Distributional Reinforcement Learning

Best AI papers explained

Oct 14, 2025•16 min

Episode description

This paper introduces "Value Flows," a reinforcement learning algorithm that uses flow-based models to estimate the full distribution of future returns, rather than collapsing it into a single scalar value as traditional methods do. Modeling the whole distribution is meant to provide richer learning signals and better estimates of aleatoric uncertainty (return variance). The method then uses these estimates to prioritize learning on uncertain transitions. The abstract and main text describe how a new flow-matching objective is formulated so that the learned return distribution satisfies the distributional Bellman equation. The accompanying images include a violin plot of return distributions and screenshots of a robotic manipulation task used for evaluation. In experiments across a range of tasks, Value Flows significantly outperforms prior offline and offline-to-online RL methods, achieving a 1.3× improvement in success rates and a lower distributional discrepancy.
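The sketch below illustrates the general idea of fitting a return distribution with flow matching against distributional Bellman targets. It is a minimal Python sketch under stated assumptions, not the paper's exact objective. It applies the Bellman backup sample by sample (y = r + γ·z′, with z′ drawn from the next-state return distribution). It then treats those targets as the "data" for a linear-path (rectified-flow-style) conditional flow-matching regression, and uses the variance of sampled returns as an uncertainty signal. The following are all hypothetical and chosen for illustration:

- the function names (`bellman_targets`, `flow_matching_loss`, `sample_returns`, `priority_weights`);
- the scalar vector field with no state or action conditioning;
- the Euler sampler;
- the variance-proportional weighting.

```python
import random
import statistics
from typing import Callable, List, Sequence

# Hypothetical type: a vector field v(t, x) -> dx/dt over a scalar return x.
# A real agent would also condition on (state, action); that is omitted here.
VectorField = Callable[[float, float], float]


def bellman_targets(reward: float, next_returns: Sequence[float],
                    gamma: float, done: bool = False) -> List[float]:
    """Sample-wise distributional Bellman backup: y = r + gamma * z', z' ~ Z(s', a')."""
    discount = 0.0 if done else gamma
    return [reward + discount * z for z in next_returns]


def flow_matching_loss(v: VectorField, targets: Sequence[float],
                       rng: random.Random) -> float:
    """Linear-path conditional flow matching that treats Bellman targets as data.

    Assumption: this is a generic illustration, not the paper's objective.
    For each target y: draw noise x0 ~ N(0, 1) and t ~ U[0, 1), form
    x_t = (1 - t) * x0 + t * y, and regress v(t, x_t) onto the path velocity y - x0.
    """
    total = 0.0
    for y in targets:
        x0 = rng.gauss(0.0, 1.0)
        t = rng.random()
        x_t = (1.0 - t) * x0 + t * y
        total += (v(t, x_t) - (y - x0)) ** 2
    return total / len(targets)


def sample_returns(v: VectorField, n: int, steps: int,
                   rng: random.Random) -> List[float]:
    """Draw return samples by Euler-integrating dx/dt = v(t, x) from t = 0 to t = 1."""
    dt = 1.0 / steps
    samples = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        for k in range(steps):
            x += dt * v(k * dt, x)
        samples.append(x)
    return samples


def priority_weights(return_samples: Sequence[Sequence[float]],
                     eps: float = 1e-8) -> List[float]:
    """Hypothetical prioritization: weight each transition by its normalized return variance."""
    scores = [statistics.pvariance(s) + eps for s in return_samples]
    total = sum(scores)
    return [s / total for s in scores]


if __name__ == "__main__":
    rng = random.Random(0)
    c = 3.0  # a deterministic return, so the exact vector field is known in closed form

    def v_star(t: float, x: float) -> float:
        # Velocity of the straight path from x toward c, which reaches c at t = 1.
        return (c - x) / (1.0 - t)

    print(flow_matching_loss(v_star, [c] * 100, rng))
    print(statistics.pvariance(sample_returns(v_star, 50, 20, rng)))
```

The demo uses a deterministic return c as a sanity check. For a point-mass target, the linear path's velocity is exactly (c - x_t) / (1 - t), so the loss is zero up to rounding and Euler integration lands on c with near-zero variance. In an actual agent, a trained network conditioned on state and action would replace `v_star`. Its sampled return variance would then feed the (here hypothetical) prioritization weights.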