“o3” by Zach Stein-Perlman
Dec 21, 2024 • 47 sec
Episode description
I'm editing this post.
OpenAI announced (but hasn't released) o3 (skipping o2 for trademark reasons).
It gets 25% on FrontierMath, smashing the previous SoTA of 2%. (These are really hard math problems.) Wow.
72% on SWE-bench Verified, beating o1's 49%.
Also 88% on ARC-AGI.
---
First published:
December 20th, 2024
Source:
https://www.lesswrong.com/posts/Ao4enANjWNsYiSFqc/o3
---
Narrated by TYPE III AUDIO.
