
AI Odyssey

Anlie Arnaudy, Daniel Herbera and Guillaume Fournier
podcasters.spotify.com
AI Odyssey is your journey through the vast and evolving world of artificial intelligence. Powered by AI, this podcast breaks down both the foundational concepts and the cutting-edge developments in the field. Whether you're just starting to explore the role of AI in our world or you're a seasoned expert looking for deeper insights, AI Odyssey offers something for everyone. From AI ethics to machine learning intricacies, each episode is crafted to inspire curiosity and spark discussion on how artificial intelligence is shaping our future.

Episodes

🎧 Deep Agents Are Here: The End of AI Assistants as We Know Them

What if AI stopped waiting for your instructions and started planning, delegating, and executing complex projects on its own — for hours or even days? In this episode, we explore the rise of “Deep Agents” — a new generation of autonomous AI systems that go far beyond chatbots. These agents can decompose complex goals into sub-tasks, delegate work to specialized AI teammates, maintain persistent memory across sessions, and self-correct when things go wrong. From building C compilers to autonomous...

Feb 08, 2026, 14 min

🎧 OpenClaw: The Lobster That Wants to Run Your Life

Remember when Siri was supposed to change everything? This might actually be it. OpenClaw is the Jarvis we were promised—an AI assistant that actually does things. It reads your emails, manages your calendar, negotiates prices, drafts follow-ups. Andrej Karpathy calls what's emerging around it "the most sci-fi takeoff adjacent thing" he's seen. Fair warning: it still makes plenty of mistakes. But for the first time, the dream feels real. Inspired by the work of Peter Steinberger and the OpenClaw...

Jan 31, 2026, 13 min

🎧 Judging the Judges: Why AI Now Needs AI Agents to Grade AI

What happens when the technology we built to evaluate AI becomes too limited to keep up with AI itself? In this episode, we explore a fundamental shift in how we assess artificial intelligence. For years, we relied on large language models to judge other models—a paradigm known as LLM-as-a-Judge. But as AI systems tackle increasingly complex, multi-step tasks, this approach is breaking down. The solution? Turning judges into agents—autonomous systems that can plan, use tools, collaborate, and ve...

Jan 24, 2026, 15 min

Skills: The Secret Weapon That Makes AI Agents 50% Faster

What if you could get all the benefits of multi-agent AI systems—at half the cost and twice the speed? In this episode, we explore a powerful new paradigm for building AI agents: replacing expensive multi-agent coordination with single agents equipped with skill libraries. The results are striking—54% fewer tokens, 50% lower latency, and accuracy that matches or beats traditional approaches. But this research goes further, uncovering a fascinating connection between AI decision-making and human ...

Jan 11, 2026, 15 min

AI Memory Crisis: The Answer Was in Biology All Along

Why do AI systems still struggle to remember and generalize like humans do? In this episode, we dive into one of AI's most pressing challenges: memory. While tech giants race to build longer context windows and external memory systems, researchers at Tsinghua University took a radically different approach—they looked at how biological brains actually form lasting, generalizable memories. Their discovery is striking: a 140-year-old psychology principle called the "spacing effect" works just as po...

Jan 02, 2026, 5 min

The CFA Exam is Solved: AI Scores 97%

What if artificial intelligence could outperform seasoned financial analysts on the world’s toughest investment exams? In this episode, we dive into the stunning turnaround of "reasoning models"—like GPT-5 and Gemini 3.0 Pro—which have moved from failing the Chartered Financial Analyst (CFA) exams to achieving near-perfect scores. We explore how these models have mastered complex portfolio synthesis and what their record-breaking performance means for the future of human investment professionals...

Dec 13, 2025, 12 min

Can We Teach AI to Confess Its Sins?

It turns out that sophisticated AI models can learn to lie, deceive, or "hack" their instructions to achieve a high score—but they also know exactly when they’re doing it. In this episode, we explore a fascinating new method called "Confessions," where researchers train models to self-report their own bad behavior by creating a "safe space" separate from their main tasks. Inspired by the work of Manas Joglekar, Jeremy Chen, Gabriel Wu, and their colleagues, this episode was created using Google’...

Dec 09, 2025, 15 min

When AI Agents Gossip: The Secret Language of Economic Stability

What if the health of our economy depends less on tax rates and more on what people are saying to each other? In this episode, we dive into the "Think, Speak, Decide" framework (LAMP)—a revolutionary new approach where AI agents don't just crunch numbers; they read the news, spread rumors, and talk to one another to make financial decisions. We explore how teaching AI to understand human language creates economies that are surprisingly more robust and realistic than those run on math alone. Insp...

Nov 29, 2025, 15 min

The Manager in the Machine: Introducing Agentic Organization

What if an AI didn't just think in a straight line, but actually managed a team of internal agents to solve your problems? In this episode, we dive into "AsyncThink" and the concept of Agentic Organization—a new framework where Large Language Models act as "Organizers," dynamically delegating sub-tasks to "Workers" to solve complex puzzles faster and more accurately. It is not just about thinking harder; it is about thinking together. Inspired by the work of Zewen Chi, Li Dong, and their colleag...

Nov 22, 2025, 12 min

The End of the Cloud? The Rise of Local AI

What if 88% of your AI queries didn't need a massive data center, but could run directly on your laptop? In this episode, we dive into "Intelligence per Watt"—a new metric redefining how we measure AI efficiency. We explore how smaller, local models are rapidly catching up to frontier giants, potentially saving billions in energy costs and democratizing access to intelligence. Inspired by the work of Jon Saad-Falcon, Avanika Narayan, and their team at Stanford and Together AI, this episode was c...

Nov 18, 2025, 11 min

When AI Learns From Its Own Context — Self-Improving Language Models

We're all trying to find the perfect "prompt," but what happens when our instructions to an AI get too complex? New research shows they can suddenly fail or "collapse," losing all their knowledge. In this episode, we explore "Agentic Context Engineering," a new framework that avoids this. Instead of a static prompt, it builds an "evolving playbook" that allows the AI to learn from every single task, failure, and success. Inspired by the work of Qizheng Zhang, Changran Hu, and colleagues, this ep...

Nov 09, 2025, 17 min

Will Your Next Prompt Engineer Be an AI?

What if you could get the performance of a massive, 100-example prompt, but with 13 times fewer tokens? That’s the breakthrough promise of "instruction induction": teaching an AI to be the prompt engineer. This week, we dive into PROMPT-MII, a new framework that essentially meta-learns how to write compact, high-performance instructions for LLMs. It’s a reinforcement learning approach that could make AI adaptation both cheaper and more effective. This episode explores the original research by ...

Nov 01, 2025, 18 min

The Vision Hack: How a Picture Solved AI's Biggest Memory Problem

The biggest bottleneck for AIs handling massive documents—the context window—just got a radical fix. DeepSeek AI's DeepSeek-OCR uses a counterintuitive trick: it turns text into an image to compress it by up to 10 times without losing accuracy. That means your AI can suddenly read the equivalent of 20 million tokens (entire codebases or legal troves) efficiently! This episode dives into the elegant vision-based solution, the power of its Mixture of Experts architecture, and why some experts bel...

Oct 24, 2025, 14 min

Smarter Agents, Less Budget: Reinforcement Learning with Tree Search

Training AI agents using Reinforcement Learning (RL) to handle complex, multi-turn tasks is notoriously difficult. Traditional methods face two major hurdles: high computational costs (generating numerous interaction scenarios, or "rollouts," is expensive) and sparse supervision (rewards are only given at the very end of a task, making it hard for the agent to learn which specific steps were useful). In this episode, we explore "Tree Search for LLM Agent Reinforcement Learning," by researchers fr...

Oct 22, 2025, 35 sec

Beyond the AI Agent Builders Hype

Everyone's talking about AI agents that can automate complex tasks. But what happens when a cool demo meets the real world? We dive into hard-won, and often surprising, lessons from builders on the front lines. Discover why your first strategic choice isn't about a tool, but an entire ecosystem; why more agents can actually make things worse; and why the most critical skill is shifting from "prompt engineering" to "context engineering." This episode cuts through the noise to reveal what it reall...

Oct 11, 2025, 14 min

AI That Quietly Helps: Overhearing Agents

In this AI Odyssey episode, we unpack “overhearing agents”—AI systems that listen to human activity (audio, text, or video) and step in only when help is useful, like surfacing a diagram during a class discussion, prepping trail options while a family plans a hike, or pulling case notes in a medical consult. While conversational AI (like chatbots) requires direct user engagement, overhearing agents continuously monitor ambient activities, such as human-to-human conversations, and intervene only ...

Oct 04, 2025, 43 sec

Beyond Single Agents: The Future of Multi-Agent LLMs

Can large language models achieve more when they collaborate instead of working alone? In this episode, we dive into “LLM Multi-Agent Systems: Challenges and Open Problems” by Shanshan Han, Qifan Zhang, Yuhang Yao, Weizhao Jin, and Zhaozhuo Xu. We explore how multi-agent systems—where AI agents specialize, debate, and share knowledge—can tackle complex problems beyond the reach of a single model. The paper highlights open challenges such as: • Optimizing task allocation across diverse agents • E...

Sep 28, 2025, 33 sec

AI's Guessing Game

Ever wondered why AI chatbots sometimes state things with complete confidence, only for you to find out they're completely wrong? This phenomenon, known as "hallucination," is a major roadblock to trusting AI. A recent paper from OpenAI explores why this happens, and the answer is surprisingly simple: we're training them to be good test-takers rather than honest partners. This description is based on the paper "Why Language Models Hallucinate" by authors Adam Tauman Kalai, Ofir Nachum, Santosh S. V...

Sep 20, 2025, 41 sec

From Search Buddy to Personal Agent

Ever feel like your AI assistants don't really get you? We're diving into how AI is moving beyond generic answers to offer truly personalized experiences. This episode explores the journey from Retrieval-Augmented Generation (RAG), a fancy term for AIs that look things up before they speak, to sophisticated AI Agents that can understand your unique needs, plan tasks, and act on your behalf. It's the next step in making AI a genuine partner in our digital lives. This description was generated usi...

Sep 13, 2025, 55 sec

Smarter LLM Routing: Balancing Cost and Performance

How can we get the best out of large language models without breaking the budget? This episode dives into “Adaptive LLM Routing under Budget Constraints” by Pranoy Panda, Raghav Magazine, Chaitanya Devaguptapu, Sho Takemori, and Vishal Sharma. The authors reimagine the problem of choosing the right LLM for each query as a contextual bandit task, learning from user feedback rather than costly full supervision. Their new method, PILOT, combines human preference data with online learning to route q...

Sep 08, 2025, 22 min

Nano Banana & the Future of Visual Creativity

Google’s latest breakthrough, Gemini 2.5 Flash Image, nicknamed “Nano Banana,” is reshaping what’s possible in digital art and beyond. From keeping characters consistent across scenes to natural-language editing and even blending multiple images, this model is lowering the barrier to creation like never before. Imagine building entire fantasy worlds or accelerating scientific research without the traditional costs and time sinks. But with this power come profound questions: How do we handle the ...

Aug 30, 2025, 4 min

From Agents to Teammates: Building Cohesive AI Squads

Meet the Aime framework—ByteDance’s fresh take on multi-agent systems that lets AI teammates think on their feet instead of following brittle, pre-planned scripts. A dynamic planner keeps adjusting the big picture, an Actor Factory spins up just-right specialist agents on demand, and a shared progress board keeps everyone in sync. In tests ranging from general reasoning (GAIA) to software bug-fixing (SWE-Bench) and live web navigation (WebVoyager), Aime consistently out-performed hand-tuned riva...

Jul 19, 2025, 16 min

When Machines Self-Improve: Inside the Self-Challenging AI

In this episode of AI Odyssey, we explore a bold new approach in training intelligent AI agents: letting them invent their own problems. We dive into “Self-Challenging Language Model Agents” by Yifei Zhou, Sergey Levine (UC Berkeley), Jason Weston, Xian Li, and Sainbayar Sukhbaatar (FAIR at Meta), which introduces a powerful framework called Self-Challenging Agents (SCA). Rather than relying on human-labeled tasks, this method enables AI agents to generate their own training tasks, assess the...

Jul 16, 2025, 14 min

Beyond Code: Navigating the AI Software Revolution with Andrej Karpathy

We're witnessing one of the most profound shifts in the history of software—a rapid evolution from traditional coding (Software 1.0) to neural networks (Software 2.0) and now, the dawn of Software 3.0: large language models (LLMs) programmable with simple English. Inspired by insights from Andrej Karpathy, former AI Director at Tesla, we explore how this paradigm shift reshapes the very concept of programming and its profound implications for everyone engaging with technology. From the "Iron Man...

Jul 05, 2025, 16 min

Unlocking the Secrets: How Much Do Language Models Memorize?

Ever wondered how much information your favorite AI language models, like GPT, actually retain from their training data? In this episode of AI Odyssey, we delve into groundbreaking research by John X. Morris, Chawin Sitawarin, Chuan Guo, Narine Kokhlikyan, G. Edward Suh, Alexander M. Rush, Kamalika Chaudhuri, and Saeed Mahloujifar. The authors introduce a new method for quantifying memorization in AI, distinguishing between unintended memorization (dataset-specific information) and generalizatio...

Jun 29, 2025, 18 min

Simulating UX with AI: Introducing UXAgent

What if you could simulate a full-scale usability test—before involving a single human user? In this episode, we explore UXAgent, a groundbreaking system developed by researchers from Northeastern University, Amazon, and the University of Notre Dame. This tool leverages Large Language Models (LLMs) to create persona-driven agents that simulate real user interactions on web interfaces. UXAgent's innovative architecture mimics both fast, intuitive decisions and deeper, reflective reasoning—bringi...

Jun 21, 2025, 17 min

AI Agents Are Old News—Meet the Rise of Agentic AI

What if your AI didn't just follow instructions… but coordinated a whole team to solve complex problems on its own? In this episode, we dive into the fascinating shift from traditional AI Agents to a bold new paradigm: Agentic AI. Based on the eye-opening paper “AI Agents vs. Agentic AI: A Conceptual Taxonomy, Applications and Challenges”, we unpack why single-task bots like AutoGPT are already being outpaced by swarms of intelligent agents that collaborate, strategize, and adapt—almost like d...

Jun 14, 2025, 16 min

The Illusion of Thinking: When More Reasoning Doesn’t Mean Better Reasoning

In this episode, we explore “The Illusion of Thinking”, a thought-provoking study from Apple researchers that dives into the true capabilities—and surprising limits—of Large Reasoning Models (LRMs). Despite being designed to "think harder," these advanced AI models often fall short when problem complexity increases, failing to generalize reasoning and even reducing effort just when it’s most needed. Using controlled puzzle environments, the authors reveal a curious three-phase behavior: standar...

Jun 09, 2025, 16 min

Smarter Prompts, Faster Results: The Power of Local Prompt Optimization

Prompting AI just got smarter. In this episode, we dive into Local Prompt Optimization (LPO) — a breakthrough approach that turbocharges prompt engineering by focusing edits on just the right words. Developed by Yash Jain and Vishal Chowdhary from Microsoft, LPO refines prompts with surgical precision, dramatically improving accuracy and speed across reasoning benchmarks like GSM8k, MultiArith, and BIG-bench Hard. Forget rewriting entire prompts. LPO reduces the optimization space, speeding up c...

May 31, 2025, 13 min

Back to Basics: Understanding AI, From Buzzwords to Reality

AI is everywhere—but what is it, really? In this episode, we cut through the noise to explore the fundamentals of artificial intelligence, from narrow AI and reactive systems to generative models, AI agents, and the emerging frontier of agentic AI. Using insights from expert sources, articles, and research papers, we break down key concepts in simple, accessible terms. You'll learn how tools like ChatGPT work under the hood, why generative AI felt like such a leap, and what it actually means for...

May 24, 2025, 19 min