Compute as Teacher: Turning Inference Compute Into Reference-Free Supervision

Best AI papers explained

Sep 27, 2025 • 16 min

Episode description

This paper introduces **Compute as Teacher (CaT)**, a method that turns a large language model's (LLM's) inference-time exploration into **reference-free supervision**. The model generates multiple parallel rollouts, and an anchor model synthesizes them into a single, improved reference answer. That reference then serves either as a teacher signal for reinforcement-learning training (CaT-RL) or directly as an immediate inference-time gain (CaT). For **verifiable tasks** such as math, programmatic checks compare each rollout against the synthesized answer. For **non-verifiable tasks**, the anchor model proposes specific, auditable rubrics, and an independent LLM judge scores responses against them to produce a fine-grained reward. The study shows that CaT-RL substantially improves performance across multiple LLM families on both **mathematical reasoning (MATH-500)** and **non-verifiable dialogue (HealthBench)**, outperforming several selection-based and single-sample baselines and achieving results competitive with human-annotated feedback. The core mechanism is the anchor policy reconciling contradictions and omissions across rollouts to construct a superior answer, which suggests that compute can effectively substitute for missing human-labeled supervision.
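The two reward paths described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation. The whitespace/case normalization, the "fraction of criteria satisfied" rubric score, and `keyword_judge` (a hypothetical stand-in for the independent LLM judge) are all assumptions. The synthesis step itself, where the anchor model reconciles rollouts into a reference, is not shown: the reference answer and the rubric are taken as given inputs.

```python
from typing import Callable, Sequence


def normalize(answer: str) -> str:
    # Assumption: crude normalization (drop all whitespace, lowercase).
    # A real math checker would test symbolic or numeric equivalence instead.
    return "".join(answer.split()).lower()


def verifiable_reward(rollout_answer: str, reference_answer: str) -> float:
    """Binary reward: 1.0 if the rollout's answer matches the synthesized reference."""
    return 1.0 if normalize(rollout_answer) == normalize(reference_answer) else 0.0


def rubric_reward(
    response: str,
    rubric: Sequence[str],
    judge: Callable[[str, str], bool],
) -> float:
    """Assumed rubric score: fraction of rubric criteria the judge marks as satisfied."""
    if not rubric:
        return 0.0
    satisfied = sum(1 for criterion in rubric if judge(response, criterion))
    return satisfied / len(rubric)


def keyword_judge(response: str, criterion: str) -> bool:
    # Hypothetical stand-in for an LLM judge: the criterion counts as satisfied
    # if its text appears in the response (case-insensitive substring match).
    return criterion.lower() in response.lower()


if __name__ == "__main__":
    # Verifiable case: score each rollout against a (given) synthesized reference.
    rollouts = ["42", " 42 ", "41"]
    reference = "42"
    print([verifiable_reward(r, reference) for r in rollouts])  # [1.0, 1.0, 0.0]

    # Non-verifiable case: score one response against a (given) rubric.
    rubric = ["hydration", "see a doctor", "fever"]
    response = "Stay on top of hydration and rest; see a doctor if symptoms persist."
    print(rubric_reward(response, rubric, keyword_judge))
```

Scoring each rubric item separately, rather than asking for a single holistic grade, is what makes the non-verifiable reward fine-grained: a response that covers two of three criteria earns partial credit instead of a pass/fail verdict.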