LLM Prompt Duel Optimizer: Efficient Label-Free Prompt Optimization

Best AI papers explained

Nov 22, 2025•13 min

Episode description

This paper introduces the **Prompt Duel Optimizer (PDO)**, a sample-efficient framework for **label-free prompt optimization** in large language models (LLMs). Because LLM performance is highly sensitive to the input prompt and ground-truth labels are costly to collect, PDO frames prompt optimization as a **dueling bandit problem** in which an LLM acts as a judge and provides noisy but usable **pairwise preference feedback**. PDO has two core components: **Double Thompson Sampling (D-TS)**, which prioritizes the prompt pairs to compare so that the limited comparison budget is spent efficiently, and **Top-Performer Guided Mutation**, which periodically expands the candidate pool by generating variations of the best-performing prompts. Experiments on benchmarks such as BIG-bench Hard (BBH) and MS MARCO show that PDO consistently outperforms label-free baselines, and that incorporating a small fraction of real labels, when available, mitigates judge noise.
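
To make the dueling-bandit framing concrete, here is a minimal Python sketch of pair selection in the spirit of Double Thompson Sampling. It is a simplified illustration, not the paper's implementation: it omits the confidence-bound pruning used in full D-TS, it never duels a prompt against itself, and it replaces the LLM judge with a simulated noisy judge driven by a hypothetical `quality` score per prompt. The function names (`select_pair`, `run_duels`, `copeland_winner`) are assumptions made for this example.

```python
import random


def select_pair(wins, rng):
    """Pick two distinct prompt indices to duel (simplified D-TS step).

    wins[i][j] counts how often prompt i was preferred over prompt j.
    """
    k = len(wins)
    # First draw: sample a plausible win probability for every pair
    # from a Beta posterior over the observed duel outcomes.
    theta = [[0.5] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            theta[i][j] = rng.betavariate(wins[i][j] + 1, wins[j][i] + 1)
            theta[j][i] = 1.0 - theta[i][j]
    # Copeland score under the sample: number of opponents beaten.
    copeland = [
        sum(1 for j in range(k) if j != i and theta[i][j] > 0.5)
        for i in range(k)
    ]
    best = max(copeland)
    first = rng.choice([i for i in range(k) if copeland[i] == best])
    # Second draw: independently resample how each opponent fares
    # against `first`, and pick the strongest sampled challenger.
    challenger = {
        j: rng.betavariate(wins[j][first] + 1, wins[first][j] + 1)
        for j in range(k)
        if j != first
    }
    second = max(challenger, key=challenger.get)
    return first, second


def run_duels(quality, rounds, seed=0):
    """Run `rounds` duels with a simulated noisy judge (assumption:
    prompt a beats prompt b with probability q_a / (q_a + q_b))."""
    rng = random.Random(seed)
    k = len(quality)
    wins = [[0] * k for _ in range(k)]
    for _ in range(rounds):
        a, b = select_pair(wins, rng)
        p_a = quality[a] / (quality[a] + quality[b])
        if rng.random() < p_a:
            wins[a][b] += 1
        else:
            wins[b][a] += 1
    return wins


def copeland_winner(wins):
    """Return the index that beats the most opponents head-to-head."""
    k = len(wins)
    scores = [
        sum(1 for j in range(k) if j != i and wins[i][j] > wins[j][i])
        for i in range(k)
    ]
    return max(range(k), key=scores.__getitem__)


if __name__ == "__main__":
    wins = run_duels([1.0, 2.0, 8.0], rounds=200, seed=1)
    print("win counts:", wins)
    print("current best prompt index:", copeland_winner(wins))
```

In a real setup, the simulated judge inside `run_duels` would be replaced by a call that shows an LLM judge two prompts' outputs and records which one it prefers. A mutation step would then append new candidates derived from the current top performer and grow the `wins` matrix to match.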
