The podcast by and for AI Engineers! In 2025, over 10 million readers and listeners came to Latent Space to hear about news, papers and interviews in Software 3.0.
We cover Foundation Models changing every domain in Code Generation, Multimodality, AI Agents, GPU Infra and more, directly from the founders, builders, and thinkers involved in pushing the cutting edge. We strive to give you everything from the definitive take on the Current Thing down to the first introduction to the tech you'll be using in the next 3 months! We break news and run exclusive interviews from OpenAI, Anthropic, Gemini, Meta (Soumith Chintala), Sierra (Bret Taylor), tiny (George Hotz), Databricks/MosaicML (Jon Frankle), Modular (Chris Lattner), Answer.ai (Jeremy Howard), et al.
Full show notes always on https://latent.space
In this conversation with Malte Ubl, CTO of Vercel (http://x.com/cramforce), we explore how the company is pioneering the infrastructure for AI-powered development through their comprehensive suite of tools including workflows, AI SDK, and the newly announced agent ecosystem. Malte shares insights into Vercel’s philosophy of “dogfooding” - never shipping abstractions they haven’t battle-tested themselves - which led to extracting their AI SDK from v0 and building production agents that handle...
In this deep dive with Kyle Corbitt, co-founder and CEO of OpenPipe (recently acquired by CoreWeave), we explore the evolution of fine-tuning in the age of AI agents and the critical shift from supervised fine-tuning to reinforcement learning. Kyle shares his journey from leading YC’s Startup School to building OpenPipe, initially focused on distilling expensive GPT-4 workflows into smaller, cheaper models before pivoting to RL-based agent training as frontier model prices plummeted. The conver...
At OpenAI DevDay, we sit down with Sherwin Wu and Christina Huang from the OpenAI Platform Team to discuss the launch of AgentKit - a comprehensive suite of tools for building, deploying, and optimizing AI agents. Christina walks us through the live demo she performed on stage, building a customer support agent in just 8 minutes using the visual Agent Builder, while Sherwin shares insights on how OpenAI is inverting the traditional website-chatbot paradigm by embedding apps directly within Cha...
Dylan Field (CEO Figma) on how they are letting designers build with Figma Make, how Figma can be the context repository for aesthetic in the age of vibe coding, and why design is your only differentiator now. Full show notes: https://www.latent.space/p/figma This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.latent.space/subscribe...
Quinn Slack (CEO) and Thorsten Ball (Amp Dictator) from SourceGraph join the show to talk about Amp Code, how they ship 15x/day with no code reviews, and why subagents and prompt optimizers aren’t a promising direction for coding agents. Amp Code: https://ampcode.com/ Latent Space: https://latent.space/ Full Video Episode Timestamps 00:00 Introduction 00:41 Transition from Cody to Amp 03:18 The Importance of Building the Best Coding Agent 06:43 Adapting to a Rapidly Evolving AI Tooling Landscape 09:...
Lance: https://www.linkedin.com/in/lance-martin-64a33b5/ How Context Fails: https://www.dbreunig.com/2025/06/22/how-contexts-fail-and-how-to-fix-them.html How New Buzzwords Get Created: https://www.dbreunig.com/2025/07/24/why-the-term-context-engineering-matters.html Context Engineering: https://rlancemartin.github.io/2025/06/23/context_engineering/ https://docs.google.com/presentation/d/16aaXLu40GugY-kOpqDU4e-S0hD1FmHcNyF0rRRnb1OU/edit?usp=sharing Manus Post: https://manus.im/blog/Context-Engin...
Our chat with Ari Morcos shows that data curation is the most impactful and underinvested area in AI. He argues that the prevailing focus on model architecture and compute scaling overlooks the “bitter lesson” that “models are what they eat.” Effective data curation, a sophisticated process involving filtering, rebalancing, sequencing (curriculum), and synthetic data generation, allows for training models that are simultaneously faster, better, and smaller. Morcos recounts his personal journey from foc...
We first had Nathan on to give us his RLHF deep dive when he was joining AI2, and now he’s back to help us catch up on the evolution to RLVR (Reinforcement Learning with Verifiable Rewards), first proposed in his Tulu 3 paper. While RLHF remains foundational, RLVR has emerged as a powerful approach for training models on tasks with clear success criteria, using verifiable, objective functions as reward signals, which is particularly useful in domains like math, code correctness, and instruction-follow...
ChatGPT handles 2.5B prompts/day and is on track to match Google’s daily searches by end of 2026. AI agents don’t browse like us; they crave queryable, chunkable data for tools like ChatGPT & Perplexity. A new industry is being born: some call it AI SEO, others GEO, but what is clear is that it drives amazing results. Businesses are seeing 2-4x higher conversion from visitors coming from AI compared to traditional search. Robert McCloy is the co-founder of Scrunch AI (https://scrunchai...
Saoud Rizwan and Pash from Cline joined us to talk about why fast apply models got bitter lesson’d, how they pioneered the plan + act paradigm for coding, and why non-technical people use IDEs to do marketing and generate slides. Full writeup: https://www.latent.space/p/cline X: https://x.com/latentspacepod Full Video Episode Timestamps 00:00 - Introductions 01:35 - Plan and Act Paradigm 05:37 - Model Evaluation and Early Development of Cline 08:14 - Use Cases of Cline Beyond Coding 09:09 - Why ...
Speak (https://speak.com) may not be very well known to native English speakers, but they have come from a slow start in 2016 to emerge as one of the favorite partners of OpenAI, with their Startup Fund leading and joining their Series B and C as one of the new AI-native unicorns, noting that “Speak has the potential to revolutionize not just language learning, but education broadly”. Today we speak with Speak’s CTO, Andrew Hsu, on the journey of building the “3rd generation” of language learn...
When the first video diffusion models started emerging, they were little more than just “moving pictures” - still frames extended a few seconds in either direction in time. There was a ton of excitement about OpenAI’s Sora on release through 2024, but so far only Sora-lite has been widely released. Meanwhile, other good videogen models like Genmo Mochi, Pika, MiniMax T2V, Tencent Hunyuan Video, and Kuaishou’s Kling have emerged, but the reigning king this year seems to be Google’s Veo 3, which ...
Our last AI PhD grad student feature was Shunyu Yao, who happened to focus on Language Agents for his thesis and immediately went to work on them for OpenAI. Our pick this year is Jack Morris, who bucks the “hot” trends by -not- working on agents, benchmarks, or VS Code forks, but is rather known for his work on the information theoretic understanding of LLMs, starting from embedding models and latent space representations (always close to our heart). Jack is an unusual combination of doing u...
Solving Poker and Diplomacy, Debating RL+Reasoning with Ilya, what’s *wrong* with the System 1/2 analogy, and where Test-Time Compute hits a wall Full Video Episode Timestamps 00:00 Intro – Diplomacy, Cicero & World Championship 02:00 Reverse Centaur: How AI Improved Noam’s Human Play 05:00 Turing Test Failures in Chat: Hallucinations & Steerability 07:30 Reasoning Models & Fast vs. Slow Thinking Paradigm 11:00 System 1 vs. System 2 in Visual Tasks (GeoGuessr, Tic-Tac-Toe) 14:00 The ...
Emmanuel Ameisen is lead author of “Circuit Tracing: Revealing Computational Graphs in Language Models” (https://transformer-circuits.pub/2025/attribution-graphs/methods.html), which is part of a duo of MechInterp papers that Anthropic published in March (alongside https://transformer-circuits.pub/2025/attribution-graphs/biology.html). We recorded the initial conversation a month ago, but then held off publishing until the open source tooling for the graph generation discussed in this work was...
Solomon most famously created Docker and now runs Dagger… which has something special to share with you on Thursday. Catch Dagger at: - Tuesday: Dagger’s workshop https://www.ai.engineer/schedule#ship-agents-that-ship-a-hands-on-workshop-for-swe-agent-builders - Wednesday: Dagger’s talk: https://www.ai.engineer/schedule#how-to-trust-an-agent-with-software-delivery - Thursday: Solomon’s Keynote https://www.ai.engineer/schedule#containing-agent-chaos Full Video Episode Timestamps 00:00 Introductio...
As part of our AI Engineer World’s Fair preview, we’re releasing a special cross podcast recorded with Sam Charrington of TWiML AI at last week’s Google I/O! TUESDAY: Shrestha and Kwindla’s workshop: https://www.ai.engineer/schedule#milliseconds-to-magic-real-time-workflows-using-the-gemini-live-api-and-pipecat TUESDAY: Kwindla’s workshop: https://www.ai.engineer/schedule#building-voice-agents-with-gemini-and-pipecat WEDNESDAY: Shrestha and Kwindla’s talk: https://www.ai.engineer/schedule#milli...
One of the new tracks at next week’s AI Engineer conference in SF is a new focus on LLMs + Robotics, ft. household names like Waymo and Physical Intelligence. However, there are many other companies applying LLMs and VLMs in the real world! CloudChef, the first industrial-scale kitchen robotics company with one-shot demonstration learning and an incredibly simple business model, will be serving tasty treats all day with Zippy (https://www.cloudchef.co/zippy), their AI Chef platform. This is a li...
We are joined by Eno Reyes and Matan Grinberg, the co-founders of Factory.ai. They are building droids for autonomous software engineering, handling everything from code generation to incident response for production outages. After raising a $15M Series A from Sequoia, they just released their product in GA! https://factory.ai/ https://x.com/latentspacepod Full Video Episode Timestamps 00:00 Introductions 00:35 Meeting at Langchain Hackathon 04:02 Building Factory despite early model limitatio...
In an otherwise heavy week packed with Microsoft Build, Google I/O, and OpenAI io, the worst kept secret in biglab land was the launch of Claude 4, particularly the triumphant return of Opus, which many had been clamoring for. We will leave the specific Claude 4 recap to AINews, however we think that both Gemini’s progress on Deep Think this week and Claude 4 represent the next frontier of progress on inference time compute/reasoning (at least until GPT5 ships this summer). Will Brown’s talk at A...
Note from your hosts: we were off this week for ICLR and RSA! This week we’re bringing you one of the top episodes from our lightning podcast series, the shorter format, YouTube-only side podcast we do for breaking news and faster turnaround. Please support our work on YouTube! https://www.youtube.com/playlist?list=PLWEAb1SXhjlc5qgVK4NgehdCzMYCwZtiB The explosion of embedding-based applications created a new challenge: efficiently storing, indexing, and searching these high-dimensional vectors a...
We’ll keep this brief because we’re on a tight turnaround: GPT 4.1, previously known as the Quasar and Optimus models, is now live as the natural update for 4o/4o-mini (and the research preview of GPT 4.5). Though it is a general purpose model family, the headline features are: Coding abilities (o1-level SWEBench and SWELancer, but ok Aider); Instruction Following (with a very notable prompting guide); Long Context up to 1m tokens (with new MRCR and Graphwalk benchmarks); Vision (simply o1 level)...
We are calling for the world’s best AI Engineer talks for AI Architects, /r/localLlama, Model Context Protocol (MCP), GraphRAG, AI in Action, Evals, Agent Reliability, Reasoning and RL, Retrieval/Search/RecSys, Security, Infrastructure, Generative Media, AI Design & Novel AI UX, AI Product Management, Autonomy, Robotics, and Embodied Agents, Computer-Using Agents (CUA), SWE Agents, Vibe Coding, Voice, Sales/Support Agents at AIEWF 2025! Fill out the 2025 State of AI Eng survey for $250 in ...
We are happy to announce that there will be a dedicated MCP track at the 2025 AI Engineer World's Fair, taking place Jun 3rd to 5th in San Francisco, where the MCP core team and major contributors and builders will be meeting. Join us and apply to speak or sponsor! When we first wrote Why MCP Won, we had no idea how quickly it was about to win. In the past 4 weeks, OpenAI and now Google have announced MCP support, effectively confirming our prediction that MCP was the presumptive win...
If you’re in SF: Join us for the Claude Plays Pokemon hackathon this Sunday! If you’re not: Fill out the 2025 State of AI Eng survey for $250 in Amazon cards! Unsupervised Learning is a podcast that interviews the sharpest minds in AI about what’s real today, what will be real in the future and what it means for businesses and the world - helping builders, researchers and founders deconstruct and understand the biggest breakthroughs. Top guests: Noam Shazeer, Bob McGrew, Noam Brown, Dylan Patel,...
If you’re in SF: Join us for the Claude Plays Pokemon hackathon this Sunday! If you’re not: Fill out the 2025 State of AI Eng survey for $250 in Amazon cards! For this episode: Thanks to Matija and Dan and Meng Shao for sharing on socials. We are SO excited to share our conversation with Dharmesh Shah, co-founder of HubSpot and creator of Agent.ai. A particularly compelling concept we discussed is the idea of "hybrid teams" - the next evolution in workplace organization where human workers c...
We are working with Amplify on the 2025 State of AI Engineering Survey to be presented at the AIE World’s Fair in SF! Join the survey to shape the future of AI Eng! We first met Snipd (affiliate link! we get a free month, you get a free month. but this is not a sponsored pod, we’ve never done one) over a year ago, and were immediately impressed by the design, but were doubtful about the behavior of snipping as the title behavior: Podcast apps are enormously sticky - Spotify spent almost $1b i...
While everyone is now repeating that 2025 is the “Year of the Agent”, OpenAI is heads down building towards it. In the first 2 months of the year they released Operator and Deep Research (arguably the most successful agent archetype so far), and today they are bringing a lot of those capabilities to the API: * Responses API * Web Search Tool * Computer Use Tool * File Search Tool * A new open source Agents SDK with integrated Observability Tools We cover all this and more in today’s lightning po...
David Hershey from Anthropic discusses the creation and mechanics behind Claude Plays Pokémon. He explains the project's origin as a tool for experimenting with agents, the architecture, and the challenges Claude faces in navigating the game, including vision and memory limitations. David also touches on the model's learning capabilities, token usage costs, and potential future improvements.
In this episode, Paul Klein, founder of Browserbase, joins the Latent Space podcast to discuss building browser infrastructure for AI agents. They explore the AI-specific challenges in browser automation, the role of multimodality, and the importance of authentication. The conversation also covers Browserbase's open-source framework, Stagehand, and the future of computer-using agents.