Data Selection for Empirical Risk Minimization

Best AI papers explained

Apr 26, 2025 • 34 min

Episode description

This paper shifts the focus in learning theory from algorithms to data. It investigates how to select small subsets of training data on which standard learning rules, specifically empirical risk minimizers, achieve performance comparable to training on the entire dataset. The authors establish theoretical bounds on the size of such subsets for several learning problems, including mean estimation, linear classification, and linear regression, and explore these limits under different conditions, such as weighted data selection and continuity of the learning rule. The work also presents a taxonomy of the error rates achievable through data selection in general binary classification, connecting these rates to fundamental concepts in learning theory such as the VC dimension and the star number.
