The Utility of Interpretability — Emmanuel Ameisen
Jun 06, 2025
Episode description
Emmanuel Ameisen is lead author of “Circuit Tracing: Revealing Computational Graphs in Language Models” (https://transformer-circuits.pub/2025/attribution-graphs/methods.html), one of a pair of MechInterp papers that Anthropic published in March (alongside https://transformer-circuits.pub/2025/attribution-graphs/biology.html).
We recorded the initial conversation a month ago, but then held off publishing until the open source tooling for the graph generation discussed in this work was released last week: https://www.anthropic.com/research/open-source-circuit-tracing
This is a two-part episode: an intro covering the open source release, then a deeper dive into the paper, with guest host Vibhu Sapra (https://x.com/vibhuuuus) and Mochi the MechInterp Pomsky (https://x.com/mochipomsky). Thanks to Vibhu for making this episode happen!
While the original blogpost contained some fantastic guided visualizations (which we discuss at the end of this pod!), the notebook and Neuronpedia visualization (https://www.neuronpedia.org/gemma-2-2b/graph) released this week let you explore on your own, as we show you in the video version of this pod.
Chapters
00:00 Intro & Guest Introductions
01:00 Anthropic's Circuit Tracing Release
06:11 Exploring Circuit Tracing Tools & Demos
13:01 Model Behaviors and User Experiments
17:02 Behind the Research: Team and Community
24:19 Main Episode Start: Mech Interp Backgrounds
25:56 Getting Into Mech Interp Research
31:52 History and Foundations of Mech Interp
37:05 Core Concepts: Superposition & Features
39:54 Applications & Interventions in Models
45:59 Challenges & Open Questions in Interpretability
57:15 Understanding Model Mechanisms: Circuits & Reasoning
01:04:24 Model Planning, Reasoning, and Attribution Graphs
01:30:52 Faithfulness, Deception, and Parallel Circuits
01:40:16 Publishing Risks, Open Research, and Visualization
01:49:33 Barriers, Vision, and Call to Action