Learning and Equilibrium with Ranking Feedback

Best AI papers explained

Apr 27, 2025 • 18 min

Episode description

This paper introduces a novel model for online learning and equilibrium computation in which feedback arrives as a ranking of actions rather than as the numeric utilities used in traditional settings. The authors investigate whether sublinear regret is achievable under two ranking models, one based on instantaneous utility and one based on time-average utility, in both full-information and bandit feedback settings. They establish limits on achieving sublinear regret under certain conditions and propose new algorithms that achieve it under additional assumptions. Notably, they show that approximate coarse correlated equilibria can be found in normal-form games when players use these algorithms with time-average utility ranking. Finally, the paper includes numerical experiments that validate the proposed algorithms.
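
To make the two ranking models concrete, here is a minimal Python sketch of what full-information ranking feedback could look like. It is an illustration only, not the paper's construction: the function names, the tie-breaking rule, and the toy utility history are all hypothetical, and "time-average utility" is assumed here to mean the running mean of each action's utility up to the current round.

```python
from typing import List, Sequence


def rank_actions(utilities: Sequence[float]) -> List[int]:
    """Return action indices ordered from highest to lowest utility.

    Ties are broken by the lower action index (an assumption for this sketch).
    """
    return sorted(range(len(utilities)), key=lambda a: (-utilities[a], a))


def instantaneous_rankings(history: Sequence[Sequence[float]]) -> List[List[int]]:
    """Ranking feedback computed from each round's utility vector alone."""
    return [rank_actions(u) for u in history]


def time_average_rankings(history: Sequence[Sequence[float]]) -> List[List[int]]:
    """Ranking feedback computed from the running mean utility of each action."""
    totals = [0.0] * len(history[0])
    rankings = []
    for t, u in enumerate(history, start=1):
        totals = [s + x for s, x in zip(totals, u)]
        rankings.append(rank_actions([s / t for s in totals]))
    return rankings


# Hypothetical utilities for three actions over three rounds.
# The learner would see only the rankings, never these numbers.
history = [
    [1.0, 0.0, 0.5],
    [0.0, 0.2, 0.1],
    [0.0, 0.9, 0.1],
]

print(instantaneous_rankings(history))
print(time_average_rankings(history))
```

In this toy history the instantaneous ranking flips as soon as action 0 stops paying off, while the time-average ranking keeps action 0 on top for one more round because its early utility still dominates the running mean.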
