LLM Prompt Duel Optimizer: Efficient Label-Free Prompt Optimization

Nov 22, 2025 · 13 min

Episode description

This paper introduces the **Prompt Duel Optimizer (PDO)**, a sample-efficient framework for **label-free prompt optimization** in large language models (LLMs). Because LLM performance is highly sensitive to the input prompt and ground-truth labels are costly to collect, PDO frames prompt optimization as a **dueling bandit problem** in which an LLM acts as a judge and provides noisy but usable **pairwise preference feedback**. PDO rests on two core components: **Double Thompson Sampling (D-TS)**, which decides which pair of prompts to compare next so that the limited comparison budget is spent on the most informative duels, and **Top-Performer Guided Mutation**, which periodically expands the candidate pool with variations of the best-performing prompts. Experiments on datasets such as BIG-bench Hard (BBH) and MS MARCO show that PDO consistently outperforms label-free baselines, and that incorporating a small fraction of ground-truth labels, when available, mitigates judge noise.
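
The loop described above can be sketched in a few lines of Python. This is a simplified illustration, not the paper's implementation: it keeps the two Thompson-sampling draws that give D-TS its name but omits the confidence-bound pruning of the full algorithm, and the names `optimize`, `judge`, `mutate`, `rounds`, and `mutate_every` are assumptions introduced here. In a real setup `judge` and `mutate` would each wrap an LLM call.

```python
import random

def copeland_scores(pref):
    """For each arm, count the opponents it beats with probability > 0.5."""
    n = len(pref)
    return [sum(1 for j in range(n) if j != i and pref[i][j] > 0.5)
            for i in range(n)]

def sample_preferences(wins, rng):
    """Draw one plausible preference matrix from Beta posteriors over duels."""
    n = len(wins)
    theta = [[0.5] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            theta[i][j] = rng.betavariate(wins[i][j] + 1, wins[j][i] + 1)
            theta[j][i] = 1.0 - theta[i][j]
    return theta

def select_pair(wins, rng):
    """Simplified double Thompson sampling: two independent posterior draws."""
    n = len(wins)
    # Draw 1: pick the arm with the highest Copeland score under a sample.
    scores = copeland_scores(sample_preferences(wins, rng))
    top = max(scores)
    first = rng.choice([i for i, s in enumerate(scores) if s == top])
    # Draw 2: pick the challenger most likely to beat `first` under a fresh sample.
    second = max(
        (j for j in range(n) if j != first),
        key=lambda j: rng.betavariate(wins[j][first] + 1, wins[first][j] + 1),
    )
    return first, second

def empirical_best(wins):
    """Arm with the best Copeland score on observed win rates (ties: most wins)."""
    n = len(wins)
    rate = [[0.5] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            total = wins[i][j] + wins[j][i]
            if i != j and total > 0:
                rate[i][j] = wins[i][j] / total
    scores = copeland_scores(rate)
    return max(range(n), key=lambda i: (scores[i], sum(wins[i])))

def add_arm(wins):
    """Grow the win-count matrix by one row and one column of zeros."""
    for row in wins:
        row.append(0)
    wins.append([0] * (len(wins) + 1))

def optimize(prompts, judge, mutate, rounds=300, mutate_every=100, seed=0):
    """Run duels; periodically add a mutation of the current top performer.

    judge(a, b) -> True if prompt `a` is preferred over `b` (hypothetical hook).
    mutate(p)   -> a new variant of prompt `p` (hypothetical hook).
    Requires at least two initial prompts.
    """
    rng = random.Random(seed)
    prompts = list(prompts)
    wins = [[0] * len(prompts) for _ in prompts]
    for t in range(1, rounds + 1):
        a, b = select_pair(wins, rng)
        if judge(prompts[a], prompts[b]):
            wins[a][b] += 1
        else:
            wins[b][a] += 1
        # Top-performer guided mutation: expand the pool from the current best.
        if t % mutate_every == 0 and t < rounds:
            prompts.append(mutate(prompts[empirical_best(wins)]))
            add_arm(wins)
    return prompts[empirical_best(wins)], prompts, wins
```

Calling `optimize(initial_prompts, judge, mutate)` returns the currently preferred prompt, the expanded pool, and the matrix of duel outcomes. Starting every pair from a uniform Beta(1, 1) prior is a choice made for this sketch; it lets freshly mutated prompts receive exploratory duels before the posterior concentrates.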