706: Large Language Model Leaderboards and Benchmarks

Aug 18, 2023 · 33 min

Episode description

In this episode, Caterina Constantinescu dives deep into Large Language Models (LLMs), spotlighting top leaderboards, evaluation benchmarks, and real-world user perceptions. Plus, discover the challenges of dataset contamination and the intricacies of platforms like HELM and Chatbot Arena.

Additional materials: www.superdatascience.com/706

Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information.

Super Data Science: ML & AI Podcast with Jon Krohn