Asymptotic Safety Guarantees Based On Scalable Oversight

Best AI papers explained

May 06, 2025•19 min

Episode description

This episode covers a presentation by Geoffrey Irving, Chief Scientist at the UK AI Safety Institute, on approaches to achieving asymptotic safety guarantees for AI. Irving critiques existing scalable oversight methods, including techniques such as debate, arguing that current theory and experiments suggest they will likely fail because of issues such as obfuscated arguments and exploration hacking. He proposes that, while full formal verification of neural networks is likely too difficult, an intermediate goal that combines theoretical frameworks with empirical testing offers a more promising path forward. The discussion highlights the need for novel complexity theory to address problems like obfuscated arguments, and suggests that the field needs significantly more researchers to tackle these fundamental challenges in AI safety.
