Build Better AI Agents with RL & Fine-Tuning (Kyle from OpenPipe)

AI Tinkerers - "One-Shot"

Oct 17, 2025 • 51 min • Season 1, Ep. 24

Episode description

What you’ll learn:

• How reinforcement learning can reduce AI agent error rates by up to 60% and drastically lower inference costs.

• The critical difference between supervised fine-tuning and RL for agentic workflows, and why RL is essential for true agent reliability.

• A practical, code-level walkthrough of building and training an email search agent on a 14-billion-parameter open-source model that outperforms OpenAI’s GPT-3.5.

• Strategies for generating high-quality synthetic data and designing nuanced reward functions with ‘partial credit’ to effectively train your agents.

• Key use cases where RL fine-tuning delivers the most significant benefits, including real-time voice agents and high-volume applications.

Kyle Corbitt is the founder of OpenPipe, a platform dedicated to helping enterprises build and deploy customized AI models using advanced fine-tuning and reinforcement learning. He’s a seasoned builder who has been working at the frontier of fine-tuning since before public APIs existed.

Key topics covered:

• The limitations of off-the-shelf LLMs for agent reliability and how RL solves them.

• The importance of latency and cost optimization in real-world AI deployments.

• Detailed explanation of the agentic workflow and tool calling in an email search bot.

• The Enron email dataset as a realistic environment for agent training.

• OpenPipe’s open-source Agent Reinforcement Trainer (ART) library for building RL agents.

• The iterative process of data generation, rubric-based scoring, and model updates.
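The "partial credit" idea behind rubric-based scoring can be illustrated with a small sketch. This is not code from the episode or from the ART library: the `Episode` fields, the `partial_credit_reward` function, and the weights (0.3, 0.6, 0.1) are all hypothetical, chosen only to show how a reward can credit intermediate progress rather than scoring only the final answer.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    """Outcome of one agent run. All fields are illustrative assumptions."""
    answer_correct: bool      # final answer matched the reference answer
    found_source_email: bool  # agent retrieved the email containing the answer
    num_turns: int            # tool-calling turns the agent used
    max_turns: int = 10       # turn budget for the episode


def partial_credit_reward(ep: Episode) -> float:
    """Return a reward in [0, 1] that credits intermediate progress."""
    reward = 0.0
    if ep.found_source_email:
        reward += 0.3  # partial credit: the right email was retrieved
    if ep.answer_correct:
        reward += 0.6  # most of the credit comes from a correct answer
        # Small efficiency bonus, granted only when the answer is correct.
        turns_used = min(ep.num_turns, ep.max_turns)
        reward += 0.1 * (1 - turns_used / ep.max_turns)
    return reward


if __name__ == "__main__":
    runs = {
        "correct, efficient": Episode(True, True, num_turns=3),
        "found email, wrong answer": Episode(False, True, num_turns=10),
        "nothing found": Episode(False, False, num_turns=10),
    }
    for label, ep in runs.items():
        print(f"{label}: {partial_credit_reward(ep):.2f}")
```

Under these assumed weights, a run that finds the right email but answers incorrectly still scores above a run that finds nothing, which gives the training loop a gradient to follow even before the agent starts answering correctly.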

This episode of AI Tinkerers One-Shot goes under the hood with Kyle to share practical learnings for the community.

💡 Resources:

• OpenPipe Website - https://openpipe.ai

• Kyle Corbitt LinkedIn - https://www.linkedin.com/in/kcorbitt/

• AI Tinkerers - https://aitinkerers.org

• One-Shot Podcast - https://one-shot.aitinkerers.org/

Social Media: @AITinkerers @OpenPipeAI @corbtt

👍 Like this video if you found it valuable, and subscribe to AI Tinkerers One-Shot for more conversations with innovators building the future of AI!

00:00:00 Introduction

00:01:09 Welcome Kyle Corbitt, Founder of OpenPipe

00:01:55 What OpenPipe Does

00:02:31 OpenPipe’s Journey and YC Experience

00:04:13 Email Search Bot Project Overview

00:05:19 Why Fine-Tuning for Email Search

00:06:22 Email Search Bot: Queries and Results

00:09:23 On-Premise Deployment and Data Sensitivity

00:10:45 Agent Trace Example and Tooling

00:13:55 Using the Enron Dataset

00:15:13 Reinforcement Learning Fundamentals

00:17:01 Synthetic Data Generation with Gemini 2.5 Pro

00:18:51 Reliable Q&A Pairs and Data Scale

00:21:59 Fine-Tuning Impact on Model Performance

00:22:25 RL Adoption in Industry and Community

00:24:37 Rollout Function and Agent Implementation

00:27:52 Rubric and Reward Calculation for RL

00:30:39 Training Loop and Model Updates

00:33:52 RL Fine-Tuning vs. OpenAI’s Fine-Tuning

00:40:38 Time Commitment for RL Projects

00:41:55 Use Cases for RL Fine-Tuning

00:45:37 OpenPipe’s Offerings: Open Source, White Glove Service

00:47:07 Kyle’s Side Tinkering and Future of AI

00:49:59 Discovering AI Tinkerers
