Diffusion Guidance Is a Controllable Policy Improvement Operator

Jun 02, 2025, 17 min

Episode description

This paper introduces CFGRL, a framework that connects diffusion guidance from generative modeling with reinforcement learning. The core idea is to treat policy improvement as guidance of a diffusion model, which keeps training as simple as supervised learning while still allowing performance beyond that of the initial dataset. CFGRL improves a policy by combining a reference policy with an "optimality" distribution, and a guidance weight controls the degree of improvement at test time without retraining. The paper evaluates CFGRL in offline reinforcement learning and as an enhancement to goal-conditioned behavioral cloning, where it consistently outperforms baselines across a range of tasks. A key advantage highlighted is that CFGRL can achieve policy improvement without necessarily learning an explicit value function.
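To make the role of the guidance weight concrete, here is a minimal sketch on a toy problem with three discrete actions. It assumes the improved policy is proportional to the reference policy times the optimality distribution raised to a weight w. That product form, the function name guided_policy, and all the numbers are illustrative assumptions, not the paper's implementation, which applies the idea to diffusion models through guidance.

```python
import math


def guided_policy(ref_probs, optimality_probs, w):
    """Reweight a discrete reference policy by optimality**w, then renormalize.

    ref_probs:        reference policy over actions (hypothetical values).
    optimality_probs: assumed p(optimal | action) per action (hypothetical values).
    w:                guidance weight; w = 0 returns the reference policy unchanged.
    """
    # Work in log space: log pi_ref(a) + w * log p(optimal | a).
    logits = [
        math.log(p) + w * math.log(o)
        for p, o in zip(ref_probs, optimality_probs)
    ]
    # Subtract the max logit before exponentiating for numerical stability.
    top = max(logits)
    weights = [math.exp(l - top) for l in logits]
    total = sum(weights)
    return [x / total for x in weights]


# Toy inputs (assumptions): the reference policy prefers action 0,
# while action 2 is the one most likely to be optimal.
ref = [0.5, 0.3, 0.2]
opt = [0.1, 0.5, 0.9]

# The same trained quantities are reused for every w; only the weight changes.
for w in (0.0, 1.0, 3.0):
    print(w, [round(p, 3) for p in guided_policy(ref, opt, w)])
```

Raising w moves probability mass toward the actions the optimality distribution favors, and w = 0 leaves the reference policy untouched. In this toy setting, that is the sense in which the degree of improvement can be chosen at test time without retraining.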
