Your Pre-trained LLM is Secretly an Unsupervised Confidence Calibrator

Best AI papers explained

May 27, 2025•14 min

Episode description

This paper introduces Disagreement-Aware Confidence Alignment (DACA), an unsupervised method for calibrating the confidence of post-trained large language models (PoLMs). Pre-trained language models (PLMs) are typically well calibrated, but post-training can make them over-confident, a problem that is hard to correct when labeled data is limited. DACA instead uses the PLM's well-calibrated confidence on unlabeled data as the reference, and it optimizes the PoLM's calibration parameters only on examples where the PLM and PoLM predictions agree. Excluding disagreement examples keeps them from distorting the fitted parameters and yields more accurate PoLM confidence scores. The paper reports improvements across a range of benchmarks and model sizes, including open-ended question answering and selective classification.
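The agreement-only fitting step can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: it assumes the calibration parameter is a single softmax temperature, that the alignment loss is the KL divergence from the PLM's predictive distribution to the temperature-scaled PoLM distribution, and that a simple grid search is good enough. The function names and the temperature grid are hypothetical.

```python
import numpy as np


def softmax(logits, axis=-1):
    """Numerically stable softmax over the class axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def fit_temperature_on_agreement(plm_logits, polm_logits, temps=None):
    """Fit one temperature for the PoLM using only agreement examples.

    Assumptions (not taken from the episode description): temperature
    scaling is the calibration method, and the loss is the mean KL
    divergence between the PLM distribution and the scaled PoLM one.
    Both inputs have shape (n_examples, n_classes); no labels are used.
    """
    if temps is None:
        temps = np.linspace(0.5, 10.0, 191)  # hypothetical search grid, step 0.05

    plm_probs = softmax(plm_logits)

    # Keep only examples where both models predict the same class.
    agree = plm_probs.argmax(axis=1) == polm_logits.argmax(axis=1)
    if not agree.any():
        raise ValueError("PLM and PoLM never agree; nothing to fit on")

    target = plm_probs[agree]
    logits = polm_logits[agree]
    eps = 1e-12  # avoids log(0)

    best_temp, best_loss = None, np.inf
    for temp in temps:
        scaled = softmax(logits / temp)
        kl = np.sum(target * (np.log(target + eps) - np.log(scaled + eps)), axis=1)
        loss = kl.mean()
        if loss < best_loss:
            best_temp, best_loss = temp, loss

    return float(best_temp), agree
```

The returned temperature would then divide the PoLM's logits at inference time for every input, while the boolean mask shows which unlabeled examples contributed to the fit.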
