FisherSFT: Data-Efficient Supervised Fine-Tuning of Language Models Using Information Gain

Best AI papers explained

May 25, 2025•14 min

Episode description

This paper introduces FisherSFT, a method that makes supervised fine-tuning (SFT) of large language models (LLMs) more data-efficient by selecting the most informative training examples. The key idea is to choose examples that maximize information gain, measured through the Hessian of the LLM's log-likelihood. To keep this tractable, the method linearizes the LLM at its last layer and uses a greedy algorithm to pick the sentences with the highest information gain. The authors provide a theoretical analysis that bounds the prediction error, along with empirical results showing that FisherSFT outperforms baseline sampling methods on synthetic and real-world datasets, including experiments with GPT-2.
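
To make the greedy selection step concrete, here is a minimal sketch of the general idea, not the authors' implementation. It assumes each sentence is summarized by a single fixed feature vector (for example, a last-layer embedding) and uses the log-determinant of a regularized design matrix as a stand-in for information gain. The function name `greedy_logdet_selection`, the `reg` parameter, and the toy features are all hypothetical; the paper's actual objective is built from the Hessian of the LLM's log-likelihood and may differ in its details.

```python
import numpy as np


def greedy_logdet_selection(features, budget, reg=1.0):
    """Greedily pick up to `budget` rows of `features` (hypothetical sentence
    embeddings) that maximize log det(reg * I + sum_i x_i x_i^T)."""
    n, d = features.shape
    cov_inv = np.eye(d) / reg          # inverse of the current design matrix
    remaining = np.ones(n, dtype=bool)
    selected = []
    for _ in range(min(budget, n)):
        # Matrix determinant lemma: adding x raises log det by log(1 + x^T A^{-1} x).
        proj = features @ cov_inv
        gains = np.log1p(np.einsum("ij,ij->i", proj, features))
        gains[~remaining] = -np.inf    # never pick the same example twice
        best = int(np.argmax(gains))
        selected.append(best)
        remaining[best] = False
        # Sherman-Morrison rank-one update of the inverse.
        v = cov_inv @ features[best]
        cov_inv -= np.outer(v, v) / (1.0 + features[best] @ v)
    return selected


# Toy usage: rows 0 and 1 are duplicates, row 2 points in a new direction.
toy = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
chosen = greedy_logdet_selection(toy, budget=2)
```

In this toy example the second pick skips the duplicate row in favor of the new direction, which is the intuition behind preferring examples with the highest information gain over uniform sampling.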
