Full post: https://www.interconnects.ai/p/elicitation-theory-of-post-training For most of the models we've received from OpenAI, Anthropic, and Google in the last 18 months, you'll hear a lot of "Most of the improvements were in the post-training phase." The most recent example was Anthropic’s CEO Dario Amodei explaining Claude 3.7 on the Hard Fork Podcast: We are not too far away from releasing a model that's a bigger base model. Most of the improvements in 3.6/3.7 are in the post-train...
Mar 10, 2025•8 min•Transcript available on Metacast Link: https://www.interconnects.ai/p/where-inference-time-scaling-pushes There’s a lot of noise about the current costs of AI models served to free users, mostly saying it’s unsustainable, which leaves little room for those with the historical perspective that technology costs always plummet. GPT-4.5’s odd release of a “giant” model without a clear niche only amplified these critics. With inference-time compute becoming a new default mode, can we still have free AI products? Are we just in t...
Mar 05, 2025•14 min•Transcript available on Metacast More: https://www.interconnects.ai/p/gpt-45-not-a-frontier-model As GPT-4.5 was being released, the first material the public got access to was OpenAI’s system card for the model, which details some capability evaluations and, mostly, safety estimates. Before the live stream and official blog post, we knew things were going to be weird because of this line: GPT-4.5 is not a frontier model. The updated system card in the launch blog post does not include this line. Here’s the original system card if you need...
Feb 28, 2025•10 min•Transcript available on Metacast https://www.interconnects.ai/p/character-training The vast majority of evaluations used to measure progress on post-training at frontier laboratories are internal evaluations rather than the evaluations you hear about all the time, like MATH or GPQA. These well-known public evaluations are certainly useful for ballparking behavior, but for every public evaluation, a frontier laboratory is likely to have 10+ fine-grained internal evaluations. The internal evaluations these ...
Feb 26, 2025•12 min•Transcript available on Metacast On Monday, February 24th, 2025, Anthropic announced their latest model, Claude 3.7 Sonnet, which is their first model explicitly trained to use more inference-time tokens to improve performance. This is another reinforcement learning (RL) trained model (mentioned in the system card). With this model, they also released Claude Code as a limited research preview, a “command line tool for agentic coding.” Continuous improvements in models are enabling new modalities and domains addressable b...
Feb 24, 2025•10 min•Transcript available on Metacast Full post: https://www.interconnects.ai/p/grok-3-and-an-accelerating-ai-roadmap xAI launched their latest flagship model, Grok 3, last night via a live stream on X, which is a new take on the launch process, but it largely felt familiar. Grok 3 is a state-of-the-art model on some important benchmarks. The caveat is that it is state-of-the-art only relative to publicly available models; we know better models exist. Only some of them have been announced, some of them have been teased, and others lie...
Feb 18, 2025•12 min•Transcript available on Metacast The era we are living through in language modeling research is one characterized by complete faith that reasoning and new reinforcement learning (RL) training methods will work. This is well-founded. A day cannot go by without a new reasoning model, RL training result, or dataset distilled from DeepSeek R1. The difference, compared to the last time RL was at the forefront of the AI world, when reinforcement learning from human feedback (RLHF) was needed to create Chat...
Feb 13, 2025•40 min•Transcript available on Metacast Article: https://www.interconnects.ai/p/deep-research-information-vs-insight-in-science (sorry about some more audible breaths in this -- I'm going to work on it!) We at Ai2 released a local LM iPhone app for our OLMoE model (1B active, 7B total params), with greatly improved scores! Let us know what you think, or read more here. OpenAI’s Deep Research has largely been accepted as a super valuable tool for knowledge workers and analysts across the economy, but its real engine of economic progr...
Feb 12, 2025•14 min•Transcript available on Metacast As many of you know, this weekend I appeared on the Lex Fridman Podcast with my friend Dylan Patel of SemiAnalysis to cover DeepSeek and the implications for the AI ecosystem. I recommend you check it out. This post was tricky to pull together. I decided to share it anyway given the timeliness of the topic and other more exciting things I have to get to. The minor, thematic contradictions on motivations, costs, and trajectories are exactly indicative of why analysis and productionization of open...
Feb 05, 2025•16 min•Transcript available on Metacast This post is early to accommodate some last minute travel on my end! The new models trained to express extended chain of thought are going to generalize outside of their breakthrough domains of code and math. The “reasoning” process of language models that we use today is chain of thought reasoning. We ask the model to work step by step because it helps the model manage complexity, especially in domains where the answer requires precision across multiple specific tokens. The domains where chain of thou...
Jan 28, 2025•12 min•Transcript available on Metacast We're here to share the story of building our Open Language Models (OLMos) and what we improved to build the OLMo 2 7B/13B model that is competitive with the Llama 3.1 8B model. This is all about building an effective, small language modeling team that can share all it learns with the scientific community. Dirk, Luca, and Kyle are some of the people I learn the most from, and they have more knowledge (and entertainment) to share than we have time for. Some questions were pulled from Twitter, but please c...
Jan 22, 2025•1 hr 13 min•Transcript available on Metacast Full post for links, images, etc: https://www.interconnects.ai/p/deepseek-r1-recipe-for-o1 I have a few shows to share with you this week: * On The Retort a week or two ago, we discussed the nature of AI and whether it is a science (in the Kuhnian sense) * I appeared on Dean W. Ball and Timothy B. Lee’s new podcast AI Summer to discuss “thinking models” and the border between post-training and reasoning methods. Listen here. * Finally, a talk I gave at NeurIPS on how I think about post-training fo...
Jan 21, 2025•20 min•Transcript available on Metacast Full post for images, etc: https://www.interconnects.ai/p/to-meta-ray-ban-local-ai With the Rabbit r1, the Humane pin, the Friend thing, the Sam Altman rumors, Meta Ray-Bans, and everything in between, it is obvious that we are going to get new devices in the near future driven by advancements in AI. Trying some of those that are already public makes this obvious from a functional perspective rather than a marketing perspective. Even though many of these devices will have a shelf life drastical...
Jan 15, 2025•10 min•Transcript available on Metacast Original post: https://www.interconnects.ai/p/deepseek-v3-and-the-actual-cost-of Chapters 00:00 Opening 03:15 DeepSeek’s learning efficiency 06:49 DeepSeek’s compute transparency and reality Figures Fig 1: Benchmark Results Fig 2: ChatBotArena Results Fig 3: Compute Usage Table Get full access to Interconnects at www.interconnects.ai/subscribe...
Jan 09, 2025•17 min•Transcript available on Metacast Slides for this post-training talk and slides for the full tutorial on language modeling (with a bit less post-training content and no recording yet). Here are some timestamps for the video: 00:00 Introduction 10:00 Prompts & Skill Selection 14:19 Instruction Finetuning 21:45 Preference Finetuning 36:17 Reinforcement Finetuning 45:28 Open Questions 52:02 Wrap Up Psssst… we just released our technical report for OLMo 2 — 2 OLMo 2 Furious, check it out for tons of training details and ti...
Jan 08, 2025•54 min•Transcript available on Metacast In 2025 we need to disambiguate three intertwined topics: post-training, reasoning, and inference-time compute. Post-training is going to quickly become muddied with the new Reasoning Language Models (RLMs — is that a good name?), given that the loss functions we studied via advancements in post-training are now being leveraged at a large scale to create new types of models. I would not call the reinforcement learning training done for OpenAI’s o1 series of models post-training. Training o1 is l...
Jan 02, 2025•16 min•Transcript available on Metacast Original post https://www.interconnects.ai/p/2024-interconnects-year-in-review Get full access to Interconnects at www.interconnects.ai/subscribe
Dec 31, 2024•6 min•Transcript available on Metacast Original post: https://www.interconnects.ai/p/openais-o3-the-2024-finale-of-ai Chapters 00:00 Introduction 02:51 o3 overview 05:57 Solving the Abstraction and Reasoning Corpus (ARC) 10:41 o3’s architecture, cost, and training (hint: still no tree search) 16:36 2024: RL returns Figures Fig 1, Frontier Math results Fig 2, Coding results Fig 3, ARC AGI results Fig 4, ARC AGI result details Fig 5, ARC AGI example 1 Fig 6, ARC AGI example in text Fig 7, ARC AGI example “easy” Get full access to Inter...
Dec 20, 2024•18 min•Transcript available on Metacast Original post: https://www.interconnects.ai/p/the-ai-agent-spectrum Chapters 00:00 Introduction 03:24 Agent cartography 08:02 Questions for the near future Figures Fig 1. multiple feedbacks diagram Get full access to Interconnects at www.interconnects.ai/subscribe...
Dec 18, 2024•11 min•Transcript available on Metacast Original post: https://www.interconnects.ai/p/openais-reinforcement-finetuning Chapters 00:00 Introduction 04:19 The impact of reinforcement finetuning’s existence 07:29 Hypotheses on reinforcement finetuning’s implementation Figures Fig. 1, Yann’s Cake Fig. 2, Grader config Fig. 3, RLVR learning curves Get full access to Interconnects at www.interconnects.ai/subscribe...
Dec 11, 2024•13 min•Transcript available on Metacast Finbarr Timbers is an AI researcher who writes Artificial Fintelligence — one of the technical AI blogs I’ve been recommending for a long time — and has a variety of experience at top AI labs, including DeepMind and Midjourney. The goal of this interview was to do a few things: * Revisit what reinforcement learning (RL) actually is, its origins, and its motivations. * Contextualize the major breakthroughs of deep RL in the last decade, from DQN for Atari to AlphaZero to ChatGPT. How could we ha...
Dec 05, 2024•1 hr 9 min•Transcript available on Metacast Original post: https://www.interconnects.ai/p/openais-o1-using-search-was-a-psyop Figures Figure 0: OpenAI’s seminal test-time compute plot Figure 1: Setup for bucketed evals Figure 2: Evals with correctness labels Figure 3: Grouped evals Figure 4: Hypothetical inference scaling law Get full access to Interconnects at www.interconnects.ai/subscribe...
Dec 04, 2024•12 min•Transcript available on Metacast Full post: https://www.interconnects.ai/p/olmo-2-and-building-language-model-training OLMo 2 demo: https://playground.allenai.org/ OLMo 2 artifacts: https://huggingface.co/collections/allenai/olmo-2-674117b93ab84e98afc72edc Chapters 00:00 Building AI Teams 06:35 OLMo 2 Figures Fig 1, pretrain plot: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/olmo2/pretrain.webp Fig 2, pretrain table: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main...
Nov 26, 2024•10 min•Transcript available on Metacast Original post: https://www.interconnects.ai/p/tulu-3 Chapters 00:00 History 05:44 Technical details sneak peak Figures Fig 1, results: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/tulu3-img/results.webp Fig 2, overview: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/tulu3-img/overview.webp Fig 3, preferences: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/tulu3-img/preferences.webp Fig 4, RLVR: http...
Nov 21, 2024•8 min•Transcript available on Metacast Original post: https://www.interconnects.ai/p/scaling-realities Get full access to Interconnects at www.interconnects.ai/subscribe
Nov 14, 2024•4 min•Transcript available on Metacast Original post: https://www.interconnects.ai/p/saving-the-nairr Chapters 05:26: Do we need an AI research resource or an LM research resource? 08:59: Policy roundups Get full access to Interconnects at www.interconnects.ai/subscribe
Nov 13, 2024•11 min•Transcript available on Metacast Tim Dettmers does not need an introduction for most people building open-source AI. If you are part of that minority, you’re in for a treat. Tim is the lead developer behind most of the open-source tools for quantization: QLoRA , bitsandbytes , 4 and 8 bit inference , and plenty more. He recently finished his Ph.D. at the University of Washington, is now a researcher at the Allen Institute for AI, and is starting as a professor at Carnegie Mellon University in fall of 2025. Tim is a joy to talk ...
Nov 07, 2024•1 hr 16 min•Transcript available on Metacast Andrew Carr is co-founder and chief scientist at Cartwheel , where he is building text-to-motion AI models and products for gaming, film, and other creative endeavors. We discuss how to keep generative AI fun and expansive — niche powerful use-cases, AI poetry, AI devices like Meta RayBans, generalization to new domains like robotics, and building successful AI research cultures. Andrew is one of my well read friends on the directions AI is going, so it is great to bring him in for an official c...
Oct 31, 2024•54 min•Transcript available on Metacast Full post: https://www.interconnects.ai/p/why-i-build-open-language-models Get full access to Interconnects at www.interconnects.ai/subscribe
Oct 30, 2024•10 min•Transcript available on Metacast How Claude's computer use works. Where OpenAI, Anthropic, and Google each have a lead on the others. Original post: https://www.interconnects.ai/p/claudes-agency Chapters 00:00 Claude's agentic future and the current state of the frontier models 04:43 The state of the frontier models 04:49 1. Anthropic has the best model we are accustomed to using 05:27 Google has the best small & cheap model for building automation and basic AI engineering 08:07 OpenAI has the best model for reasoning, but we don...
Oct 23, 2024•11 min•Transcript available on Metacast