γ-Bench: Evaluating LLMs in Multi-Agent Games

Apr 24, 2025 · 24 min

Episode description

This paper introduces γ-Bench, a novel framework for evaluating the gaming ability of large language models (LLMs) in complex, multi-agent environments. It includes eight classical game-theory scenarios with dynamic scoring and parameters to assess LLMs' robustness, generalizability, and strategic thinking. The study evaluates thirteen LLMs from six model families and finds that Gemini-1.5-Pro achieves the best performance among them. The research also explores how prompt engineering and different game settings affect LLM decision-making.