[AIEWF Preview] Multi-Turn RL for Multi-Hour Agents — with Will Brown, Prime Intellect
May 23, 2025•40 min
Episode description
In an otherwise heavy week packed with Microsoft Build, Google I/O, and OpenAI io, the worst kept secret in biglab land was the launch of Claude 4, particularly the triumphant return of Opus, which many had been clamoring for. We will leave the specific Claude 4 recap to AINews; however, we think that both Gemini’s progress on Deep Think this week and Claude 4 represent the next frontier of progress on inference time compute/reasoning (at least until GPT5 ships this summer).
Will Brown’s talk at AIE NYC and open source work on verifiers have made him one of the most prominent voices able to publicly discuss (aka without the vaguepoasting LoRA they put on you when you join a biglab) the current state of the art in reasoning models and where current SOTA research directions lead. We discussed his latest paper on Reinforcing Multi-Turn Reasoning in LLM Agents via Turn-Level Credit Assignment and he has previewed his AIEWF talk on Agentic RL for those with the temerity to power thru bad meetup audio.
Chapters
00:00 Introduction and Episode Overview
02:01 Discussion on Claude 4 and its Features
04:31 Reasoning and Tool Use in AI Models
07:01 Extended Thinking in Claude and Model Differences
09:31 Speculation on Claude's Extended Thinking
11:01 Challenges and Controversies in AI Model Training
13:31 Technical Highlights and Code Trustworthiness
16:01 Token Costs and Incentives in AI Models
18:31 Thinking Budgets and AI Effort
21:01 Safety and Ethics in AI Model Development
23:31 Anthropic's Approach to AI Safety
26:01 LLM Arena and Evaluation Challenges
28:31 Developing Taste and Direction in AI Research
31:01 Recent Research and Multi-Turn RL
33:31 Tools and Incentives in AI Model Development
36:01 Challenges in Evaluating AI Model Outputs
38:31 Model-Based Rewards and Future Directions
41:01 Wrap-up and Future Plans