Transformers Represent Belief State Geometry in their Residual Stream

LessWrong (Curated & Popular)

Apr 17, 2024•24 min

--:--

Listen in podcast apps:

Apple Podcasts

Spotify

Download

Listen to this episode in Metacast mobile app

Don't just listen to podcasts. Learn from them with transcripts, summaries, and chapters for every episode. Skim, search, and bookmark insights. Learn more

Episode description

Produced while being an affiliate at PIBBSS[1]. The work was done initially with funding from a Lightspeed Grant, and then continued while at PIBBSS. Work done in collaboration with @Paul Riechers, @Lucas Teixeira, @Alexander Gietelink Oldenziel, and Sarah Marzen. Paul was a MATS scholar during some portion of this work. Thanks to Paul, Lucas, Alexander, and @Guillaume Corlouer for suggestions on this writeup.

Introduction.

What computational structure are we building into LLMs when we train them on next-token prediction? In this post we present evidence that this structure is given by the meta-dynamics of belief updating over hidden states of the data-generating process. We'll explain exactly what this means in the post. We are excited by these results because

We have a formalism that relates training data to internal structures in LLMs.
Conceptually, our results mean that LLMs synchronize to their internal world model as they move [...]

The original text contained 10 footnotes which were omitted from this narration.

---

First published:
April 16th, 2024

Source:
https://www.lesswrong.com/posts/gTZ2SxesbHckJ3CkF/transformers-represent-belief-state-geometry-in-their

---

Narrated by TYPE III AUDIO.

For the best experience, listen in Metacast app for iOS or Android