Black-Box On-Policy Distillation of Large Language Models

Best AI papers explained

Nov 20, 2025•14 min

Episode description

This paper introduces **Generative Adversarial Distillation (GAD)**, a technique for transferring knowledge from a large, proprietary teacher large language model (LLM), such as GPT-5-Chat, to a smaller student LLM in a **black-box setting**. The black-box setting arises when the student can access only the teacher’s final text outputs, not its internal parameters or output probabilities. GAD frames distillation as a **minimax game** similar to a Generative Adversarial Network (GAN): the student acts as the generator, and an adaptive discriminator learns to distinguish the student’s outputs from the teacher’s. Because the discriminator scores the student’s own generations, it provides **on-policy feedback** without relying on likelihood-based objectives. According to the paper’s experiments, GAD **consistently outperforms** traditional sequence-level knowledge distillation (SeqKD) across various benchmarks, especially on out-of-distribution generalization. The authors present GAD as a stable and effective method for extracting knowledge from closed-source LLMs, one that treats the discriminator as a continually evolving reward model.
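To make the generator/discriminator roles concrete, here is a minimal toy sketch of one way such a game could be set up. The description above does not state the exact objective, so everything here is an assumption for illustration: the pairwise logistic loss, the single scalar "feature" per response, the linear discriminator, and the names `discriminator_loss`, `student_reward`, and `discriminator_step` are all hypothetical and are not taken from the paper.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def discriminator_loss(teacher_score: float, student_score: float) -> float:
    """Assumed pairwise loss: small when the teacher response outscores the student's."""
    return -math.log(sigmoid(teacher_score - student_score))


def student_reward(student_score: float) -> float:
    """Assumed reward: the discriminator's score of the student's own sample."""
    return student_score


def discriminator_step(w: float, teacher_feat: float, student_feat: float,
                       lr: float = 0.1) -> float:
    """One gradient-descent step for a toy linear discriminator, score = w * feature."""
    diff = teacher_feat - student_feat
    margin = w * diff
    # d/dw of -log(sigmoid(margin)) is -(1 - sigmoid(margin)) * diff
    grad = -(1.0 - sigmoid(margin)) * diff
    return w - lr * grad


# Hypothetical scalar features standing in for a teacher and a student response.
teacher_feat, student_feat = 1.0, 0.2

w = 0.0
loss_before = discriminator_loss(w * teacher_feat, w * student_feat)
for _ in range(50):
    w = discriminator_step(w, teacher_feat, student_feat)
loss_after = discriminator_loss(w * teacher_feat, w * student_feat)

# The reward the student would receive for this sample under the updated discriminator.
reward = student_reward(w * student_feat)
print(loss_before, loss_after, reward)
```

In a full system the student would then be updated to raise its reward (for example with a policy-gradient method) while the discriminator keeps training against fresh student samples, which is what makes the feedback on-policy. That student update is omitted here.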
