Lessons from reinforcement learning from human feedback | Stephen Casper | EAG Boston 23

Nov 23, 2023 | 56 min

Episode description

Reinforcement Learning from Human Feedback (RLHF) has emerged as the central alignment technique used to finetune state-of-the-art systems such as GPT-4, Claude 2, Bard, and Llama 2. However, RLHF has a number of known problems, and these models have exhibited some troubling alignment failures. How did we get here? What lessons should we learn? And what does this mean for the next generation of AI systems?

Stephen is a third-year Computer Science Ph.D. student at MIT in the Algorithmic Alignment Group, advised by Dylan Hadfield-Menell. He has previously worked with the Harvard Kreiman Lab and the Center for Human-Compatible AI. His main focus is on interpreting, diagnosing, debugging, and auditing deep learning systems.

For the best experience, listen in Metacast app for iOS or Android