Ryan Greenblatt from Redwood Research recently published "Getting 50% (SoTA) on ARC-AGI with GPT-4o," in which he used GPT-4o to reach state-of-the-art accuracy on François Chollet's ARC Challenge by generating many Python programs.
Sponsor:
Sign up to Kalshi here https://kalshi.onelink.me/1r91/mlst -- the first 500 traders who deposit $100 will get a free $20 credit! Important disclaimer - In case it's not obvious - this is basically gambling and a *high risk* activity - only trade what you can afford to lose.
We discuss:
- Ryan's unique approach to solving the ARC Challenge and achieving impressive results.
- The strengths and weaknesses of current AI models.
- How AI and humans differ in learning and reasoning.
- Combining various techniques to create smarter AI systems.
- The potential risks and future advancements in AI, including the idea of agentic AI.
https://www.redwoodresearch.org/
Refs:
Getting 50% (SoTA) on ARC-AGI with GPT-4o [Ryan Greenblatt]
https://redwoodresearch.substack.com/p/getting-50-sota-on-arc-agi-with-gpt
On the Measure of Intelligence [Chollet]
https://arxiv.org/abs/1911.01547
Connectionism and Cognitive Architecture: A Critical Analysis [Jerry A. Fodor and Zenon W. Pylyshyn]
Software 2.0 [Andrej Karpathy]
https://karpathy.medium.com/software-2-0-a64152b37c35
Why Greatness Cannot Be Planned: The Myth of the Objective [Kenneth Stanley]
Biographical account of Terence Tao’s mathematical development [M. A. (Ken) Clements]
https://gwern.net/doc/iq/high/smpy/1984-clements.pdf
Model Evaluation and Threat Research (METR)
Why Tool AIs Want to Be Agent AIs
Simulators [Janus]
https://www.lesswrong.com/posts/vJFdjigzmcXMhNTsx/simulators
AI Control: Improving Safety Despite Intentional Subversion
https://arxiv.org/abs/2312.06942
What a Compute-Centric Framework Says About Takeoff Speeds
Global GDP over the long run
https://ourworldindata.org/grapher/global-gdp-over-the-long-run?yScale=log
Safety Cases: How to Justify the Safety of Advanced AI Systems
https://arxiv.org/abs/2403.10462
The Danger of a “Safety Case”
http://sunnyday.mit.edu/The-Danger-of-a-Safety-Case.pdf
The Future Of Work Looks Like A UPS Truck (~02:15:50)
SWE-bench
Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model
https://arxiv.org/pdf/2201.11990
Algorithmic Progress in Language Models
https://epochai.org/blog/algorithmic-progress-in-language-models