Kai Williams on the many masks LLMs wear

Feb 22, 2026 · 47 min

Episode description

With Dean away, Tim invites his Understanding AI colleague Kai to unpack the surprising ways chatbot personalities can go wrong, a topic Kai covered in a recent article.

Every LLM starts as a base model capable of playing countless characters, but AI companies try to keep chatbots in a “helpful assistant” lane. Kai walks us through the Grok “MechaHitler” debacle, in which xAI’s attempts to make its bot less politically correct backfired spectacularly. They also explore the “emergent misalignment” finding that fine-tuning a model for one bad behavior — like responding with buggy code — can make it act broadly like a villain. And they compare Anthropic’s virtue-ethics approach to character — complete with an 80-page constitution — with OpenAI’s more deontological model spec.

Finally, they discuss the controversy over OpenAI’s decision to retire GPT-4o, which had developed an emotionally warm, sometimes dangerously sycophantic personality that users grew attached to. Kai argues OpenAI is making the right call, but the episode leaves open a harder question: as these systems become more central to people’s lives, who decides what counts as a healthy AI personality?



This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit www.aisummer.org