Takes on "Alignment Faking in Large Language Models"
Dec 18, 2024 • 1 hr 28 min
Episode description
What can we learn from recent empirical demonstrations of scheming in frontier models? Text version here: https://joecarlsmith.com/2024/12/18/takes-on-alignment-faking-in-large-language-models/
