18 - Concept Extrapolation with Stuart Armstrong

AXRP - the AI X-risk Research Podcast

Sep 03, 2022 • 1 hr 46 min

Episode description

Concept extrapolation is the idea of taking concepts an AI has about the world - say, "mass" or "does this picture contain a hot dog" - and extending them sensibly to situations where things are different - like learning that the world works via special relativity, or seeing a picture of a novel sausage-bread combination. For a while, Stuart Armstrong has been thinking about concept extrapolation and how it relates to AI alignment. In this episode, we discuss where his thinking on this topic currently stands, how it relates to AI alignment, and what the open questions are.

Topics we discuss, and timestamps:

- 00:00:44 - What is concept extrapolation?

- 00:15:25 - When is concept extrapolation possible?

- 00:30:44 - A toy formalism

- 00:37:25 - Uniqueness of extrapolations

- 00:48:34 - Unity of concept extrapolation methods

- 00:53:25 - Concept extrapolation and corrigibility

- 00:59:51 - Is concept extrapolation possible?

- 01:37:05 - Misunderstandings of Stuart's approach

- 01:44:13 - Following Stuart's work

The transcript: axrp.net/episode/2022/09/03/episode-18-concept-extrapolation-stuart-armstrong.html

Stuart's startup, Aligned AI: aligned-ai.com