180: Reinforcement Learning - podcast episode cover

180: Reinforcement Learning

Mar 17, 20251 hr 52 minEp. 180
--:--
--:--
Download Metacast podcast app
Listen to this episode in Metacast mobile app
Don't just listen to podcasts. Learn from them with transcripts, summaries, and chapters for every episode. Skim, search, and bookmark insights. Learn more

Episode description

Intro topic: Grills

News/Links:

Book of the Show


Patreon Plug https://www.patreon.com/programmingthrowdown?ty=h


Tool of the Show

  • Patrick: 
    • Pokemon Sword and Shield
  • Jason: 

Topic: Reinforcement Learning

  • Three types of AI
    • Supervised Learning
    • Unsupervised Learning
    • Reinforcement Learning
  • Online vs Offline RL
  • Optimization algorithms
    • Value optimization
      • SARSA
      • Q-Learning
    • Policy optimization
      • Policy Gradients
      • Actor-Critic
      • Proximal Policy Optimization
  • Value vs Policy Optimization
    • Value optimization is more intuitive (Value loss)
    • Policy optimization is less intuitive at first (policy gradients)
    • Converting values to policies in deep learning is difficult
  • Imitation Learning
    • Supervised policy learning
    • Often used to bootstrap reinforcement learning
  • Policy Evaluation
    • Propensity scoring versus model-based
  • Challenges to training RL model
    • Two optimization loops
      • Collecting feedback vs updating the model
    • Difficult optimization target
      • Policy evaluation
  • RLHF &  GRPO

★ Support this podcast on Patreon ★
For the best experience, listen in Metacast app for iOS or Android