Do LLMs Recognize Your Preferences? Evaluating Personalized Preference Following in LLMs

Best AI papers explained

Oct 09, 2025 • 16 min

Episode description

This paper introduces PrefEval, a benchmark that assesses how well Large Language Models (LLMs) can infer, remember, and follow user preferences in long, multi-session conversations. An evaluation of 10 LLMs on the benchmark found that current state-of-the-art models have significant difficulty proactively following user preferences: in zero-shot settings, accuracy drops below 10% within a small number of turns. The researchers conclude that although fine-tuning on PrefEval can improve results, LLMs still face challenges in personalized conversation.
