Asymptotic Safety Guarantees Based On Scalable Oversight

May 06, 2025 · 19 min

Episode description

This episode covers a presentation by Geoffrey Irving, Chief Scientist at the UK AI Safety Institute, on approaches to achieving asymptotic safety guarantees for AI. Irving critiques existing scalable oversight methods, including debate, arguing that current theory and experiments suggest they are likely to fail because of problems such as obfuscated arguments and exploration hacking. He proposes that, while full formal verification of neural networks is likely too difficult, an intermediate goal that combines theoretical frameworks with empirical testing offers a more promising path forward. The discussion highlights the need for new complexity theory to address problems like obfuscated arguments, and suggests that the field needs significantly more researchers to tackle these fundamental AI safety challenges.
