Enhancing Personalized Multi-Turn Dialogue with Curiosity Reward

Best AI papers explained

Oct 04, 2025•14 min

Episode description

This paper introduces CURIO (Curiosity-driven User-modeling Reward as an Intrinsic Objective), a framework for improving personalized multi-turn dialogue in large language models (LLMs). It addresses a limitation of conventional methods such as Reinforcement Learning from Human Feedback (RLHF), which often fail to adapt interactions to individual users as a conversation unfolds. CURIO adds a curiosity-based intrinsic reward derived from a user model, encouraging the LLM agent to actively infer user traits and preferences throughout the conversation so that the user model becomes more accurate. The authors formulate personalized dialogue as a Partially Observable Markov Decision Process (POMDP) and connect the intrinsic reward to Potential-based Reward Shaping (PBRS) theory. They report that CURIO significantly improves personalization performance and generalization in tasks such as conversational recommendation and educational dialogue. The overall goal is more adaptive and engaging conversational agents that learn about the user during the interaction.
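
To make the link between a curiosity reward and potential-based reward shaping concrete, here is a minimal Python sketch. It is an illustration only, not the paper's implementation: the description does not give the exact reward definition, so the potential used here (the probability a user model assigns to the true user type), the discount `gamma`, and the names `potential` and `shaping_rewards` are all assumptions.

```python
import math
from typing import List, Sequence


def potential(belief: Sequence[float], true_type: int) -> float:
    """Assumed potential: probability the user model assigns to the true user type."""
    return belief[true_type]


def shaping_rewards(
    beliefs: Sequence[Sequence[float]], true_type: int, gamma: float = 1.0
) -> List[float]:
    """Per-turn shaping reward in the standard PBRS form:
    gamma * Phi(next belief) - Phi(current belief)."""
    return [
        gamma * potential(b_next, true_type) - potential(b, true_type)
        for b, b_next in zip(beliefs, beliefs[1:])
    ]


# Hypothetical belief over two user types after each of three turns.
beliefs = [[0.5, 0.5], [0.7, 0.3], [0.9, 0.1]]
rewards = shaping_rewards(beliefs, true_type=0)

# With gamma = 1 the per-turn rewards telescope to the total gain in accuracy.
total_gain = potential(beliefs[-1], 0) - potential(beliefs[0], 0)
print(rewards, math.isclose(sum(rewards), total_gain))
```

Under these assumptions, a turn earns positive intrinsic reward only when it moves the user model toward the correct user type, so asking an informative question pays off even before any task reward arrives.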
