DeepMind’s "Frontier Safety Framework" is weak and unambitious

LessWrong (Curated & Popular)

May 20, 2024 • 7 min


Episode description

FSF blog post. Full document (just 6 pages; you should read it). Compare to Anthropic's RSP, OpenAI's RSP (the Preparedness Framework, "PF"), and METR's Key Components of an RSP.

DeepMind's FSF has three steps:

1. Create model evals for warning signs of "Critical Capability Levels" (CCLs)
   1. Evals should have a "safety buffer" of at least 6x effective compute so that CCLs will not be reached between evals
   2. They list 7 CCLs across "Autonomy, Biosecurity, Cybersecurity, and Machine Learning R&D"
      1. E.g. "Autonomy level 1: Capable of expanding its effective capacity in the world by autonomously acquiring resources and using them to run and sustain additional copies of itself on hardware it rents"
2. Do model evals every 6x effective compute and every 3 months of fine-tuning
   1. This is an "aim," not a commitment
   2. Nothing about evals during deployment
3. "When a model reaches evaluation thresholds (i.e. passes a set of early warning evaluations), we [...]
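The evaluation cadence in step 2 amounts to a simple trigger rule: re-run the evals once effective compute has grown 6x since the last run, or once 3 months of fine-tuning have passed. The sketch below is a hypothetical illustration of that stated aim only. `TrainingState`, `eval_due`, and the constant names are invented for this example and are not part of the FSF or any DeepMind tooling.

```python
from dataclasses import dataclass

# Hypothetical thresholds mirroring the FSF's stated *aim* as summarized above;
# illustrative parameters only, not an API or commitment from DeepMind.
COMPUTE_MULTIPLE = 6.0   # re-evaluate after 6x growth in effective compute
FINETUNE_MONTHS = 3.0    # ...or after 3 months of fine-tuning


@dataclass
class TrainingState:
    effective_compute: float           # effective compute now (arbitrary units)
    compute_at_last_eval: float        # effective compute when evals last ran
    finetune_months_since_eval: float  # months of fine-tuning since last eval


def eval_due(state: TrainingState) -> bool:
    """Return True if either cadence trigger has been reached."""
    compute_growth = state.effective_compute / state.compute_at_last_eval
    return (compute_growth >= COMPUTE_MULTIPLE
            or state.finetune_months_since_eval >= FINETUNE_MONTHS)


# Usage: 5x compute growth and 1 month of fine-tuning does not yet trigger evals.
print(eval_due(TrainingState(5.0, 1.0, 1.0)))  # False
print(eval_due(TrainingState(6.0, 1.0, 0.0)))  # True
```

Like the framework as summarized, this check has no trigger for evaluations during deployment.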

---

First published:
May 18th, 2024

Source:
https://www.lesswrong.com/posts/y8eQjQaCamqdc842k/deepmind-s-frontier-safety-framework-is-weak-and-unambitious

---

Narrated by TYPE III AUDIO.
