What Characterizes Effective Reasoning? Revisiting Length, Review, and Structure of CoT

Best AI papers explained

Sep 27, 2025 • 17 min

Episode description

This academic paper investigates what makes a Chain-of-Thought (CoT) trace effective for Large Reasoning Models (LRMs). It challenges the prevailing assumption that **longer reasoning traces and more review behavior automatically lead to better performance**. In a systematic evaluation of ten LRMs on math and scientific reasoning tasks, the authors show that **shorter CoTs and lower Review Ratios are often associated with higher accuracy**. To find a more fundamental predictor, the paper introduces a graph view of CoT and defines the **Failed-Step Fraction (FSF)**, the share of reasoning steps that lie on failed branches. FSF predicts correctness consistently and robustly across models and datasets, outperforming the length and review metrics. Finally, two interventions, test-time selection and direct CoT editing, provide causal evidence that **low FSF improves accuracy** by reducing the bias that failed reasoning branches introduce into subsequent steps.
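To make the FSF idea and test-time selection concrete, here is a minimal sketch. It assumes each CoT has already been split into steps and that every step has been labeled as lying on a failed (abandoned) branch or not. Producing those labels from the graph view is the hard part and is not shown. The `Step` class, `failed_step_fraction`, and `select_lowest_fsf` are hypothetical names for illustration, not the paper's code, and FSF is taken here to be simply failed steps divided by total steps.

```python
from dataclasses import dataclass


@dataclass
class Step:
    # One reasoning step from a CoT (hypothetical representation).
    text: str
    # Assumed label: True if this step lies on a branch that was later abandoned.
    failed: bool = False


def failed_step_fraction(steps: list[Step]) -> float:
    """Share of steps that lie on failed branches (0.0 for an empty trace)."""
    if not steps:
        return 0.0
    return sum(s.failed for s in steps) / len(steps)


def select_lowest_fsf(candidates: list[list[Step]]) -> int:
    """Test-time selection sketch: index of the candidate CoT with the lowest FSF.

    Ties go to the earliest candidate, because min() returns the first minimum.
    """
    return min(range(len(candidates)), key=lambda i: failed_step_fraction(candidates[i]))


if __name__ == "__main__":
    trace_a = [Step("set up"), Step("try x=2", failed=True),
               Step("check x=2", failed=True), Step("solve")]
    trace_b = [Step("set up"), Step("solve"), Step("dead end", failed=True)]
    print(failed_step_fraction(trace_a))           # 0.5
    print(select_lowest_fsf([trace_a, trace_b]))   # 1
```

Selecting by a scalar score keeps the sketch independent of how steps are labeled. Any labeling procedure that marks failed branches can be plugged in without changing the selection logic.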
