Catherine Olsson and Nelson Elhage: Anthropic, Understanding Transformers

Aug 26, 2022 · 47 min

Episode description

In episode 40 of The Gradient Podcast, Andrey Kurenkov speaks to Catherine Olsson and Nelson Elhage.

Catherine and Nelson are both members of technical staff at Anthropic, which is an AI safety and research company that’s working to build reliable, interpretable, and steerable AI systems. Catherine and Nelson’s focus is on interpretability, and we will discuss several of their recent works in this interview. 
Follow The Gradient on Twitter

Outline:

(00:00) Intro
(01:10) Catherine’s Path into AI
(03:25) Nelson’s Path into AI
(05:23) Overview of Anthropic
(08:21) Mechanistic Interpretability
(15:15) Transformer Circuits 
(21:30) Toy Transformer
(27:25) Induction Heads
(31:00) In-Context Learning
(35:10) Evidence for Induction Heads Enabling In-Context Learning
(39:30) What’s Next
(43:10) Replicating Results
(46:00) Outro

Links:

Anthropic

Zoom In: An Introduction to Circuits

Mechanistic Interpretability, Variables, and the Importance of Interpretable Bases

A Mathematical Framework for Transformer Circuits

In-context Learning and Induction Heads 

PySvelte
