The Agent Benchmark That Should Scare Managers

May 29, 2026 · 19 min

Episode description

Agentic coding tools are moving into enterprise workflows, but the week's most useful signal is a benchmark on which frontier models still score below 50% on real IT tasks. Alex and Sam unpack Microsoft Learn grounding, agent deception, Copilot data leaks, and the practical harness every team should build before handing agents production authority.
