🎙️ EP 214: Nvidia’s $68B AI Boom, Agents That Learn by Watching You Work

Feb 26, 2026 • 11 min

Episode description

AI just crossed a strange turning point. GPUs are sold out, AI agents are learning from videos, and big labs are quietly reshaping the future of compute behind the scenes.

We’ll talk about:

  • Why Nvidia’s record $68B quarter proves AI compute is now direct revenue
  • DeepSeek freezing out Nvidia while still relying on its chips, and what that signals globally
  • FDM-1, the new AI model learning computer skills just by watching videos
  • Why always-on agents and remote AI control may change how we work sooner than expected

Keywords: Nvidia, AI compute, DeepSeek, ChatGPT Pro Lite, Claude Code Remote Control, Veo 3.1, agentic AI

Links:

  1. Newsletter: Sign up for our FREE daily newsletter.
  2. Our Community: Get 3-level AI tutorials across industries.
  3. Join AI Fire Academy: 700+ advanced AI workflows ($14,500+ Value)

Our Socials:

  1. Facebook Group: Join 280K+ AI builders
  2. X (Twitter): Follow us for daily AI drops
  3. YouTube: Watch AI walkthroughs & tutorials

Transcript

I want you to close your eyes for a second. Just picture this. You are sitting in the driver's seat of a car. Okay. You are right in the middle of San Francisco. It is completely chaotic. Oh, yeah. You have got the trolleys. You have pedestrians dodging traffic. That specific heavy fog is rolling in off the bay. Now look at the driver next to you. But here is the twist. The driver is not a person. Right. And it is not a computer program that was fed a million lines of code about traffic laws. It is not reading a manual. Exactly.

It has never been told what a stop sign actually means. Yeah. It was just shown a video of a person driving. Just watching pixels. Just watching pixels change on a screen. And it figures out how to steer, how to brake, how to navigate the real world. It is basically driving by watching a YouTube tutorial. It sounds like science fiction. It really does. Like some chaotic future timeline.

But this is happening right now. Yeah. We are moving from the era of telling computers what to do, and we are entering the era of showing them who to be. It is terrifying and fascinating. Welcome back to the Deep Dive. Today, we are doing something a bit different. We have a lot of ground to cover. We do. We are pulling apart the entire ecosystem that makes that self-driving car scenario possible. Right. We have this massive stack of notes from the latest AI Fire dispatch.

We are going to trace the money. The chips and the code. Because you cannot understand the software without the hardware. It is all connected. Lay out the roadmap for us. We have three main pillars today. First, we start at the macro level. The engine room. NVIDIA. Exactly. NVIDIA's monster quarter. We are going to explain this concept that compute is revenue. The fundamental economic law right now. Right.

Second, we look at the challengers. The hardware wars, new startups taking a swing at the king, and how pricing models are shifting for you. And third. We land on that breakthrough you mentioned. Standard Intelligence and their FDM-1 model. The video learning brain. That is the one. The implications are going to stick with you. All right, let's get into the engine room. NVIDIA. Every time we talk about them, the numbers are just big. Massive.

But looking at this recent report, big is the wrong word. It feels historical. It is relentless. Let's look at the raw data so we are all on the same page. NVIDIA posted $68 billion in quarterly revenue. $68 billion. In three months. Up 73% year over year. That is absurd. Right. For a company valued at over $4.7 trillion. Usually growth slows down at that size. You expect that from a startup, not a titan. Exactly. And peeling it back gets crazier. $62 billion came from data centers. $51 billion from AI compute GPUs alone.

Let's pause on that. Wait. $51 billion spent on silicon chips in 90 days. Who is buying this and why? This is that concept circulating right now. Compute is revenue. Unpack that for us. Think about it like this. Every single time you interact with an AI, you are generating tokens. So a token is just a basic chunk of text or data. Perfect definition. A word is a token. A pixel is a token. Here is the physics of it. Every token requires a physical GPU chip to do a math operation. Which is called inference. Right. Inference is the physical math calculation needed to produce one token.

So unlike downloading a static file from the old internet. Exactly. With AI, every word generated costs electricity and processor time. It is a direct linear relationship. More users means more tokens. More tokens means you physically must have more GPUs. If you lack GPUs, the service literally stops. So more AI usage equals more data centers. Right. Which equals more NVIDIA revenue.

It is a perpetual motion machine of money. It is an intelligence tax. NVIDIA collects a tax on every thought the AI has. And the demand is exponential. Completely exponential. Everyone wants the new Blackwell chips, the Ferraris of the industry. But they're sold out. Totally sold out. So companies are buying everything. Even six-year-old GPUs, the equivalent of a 2018 Honda Civic. They are fully booked. Fully booked in cloud environments.

Prices are rising everywhere. Supply cannot touch demand. It's the classic gold rush analogy. Selling the shovels. But NVIDIA sells the shovels, the pickaxes, and they own the mountain. There is a complication though. China. Yeah, this is a crucial nuance. Despite partial export approvals, NVIDIA reported essentially zero revenue from China. Zero. That seems statistically impossible. It is a rounding error.

And executives acknowledge why: it is not just U.S. regulations. It is the local market adapting. Right. Rising Chinese competitors are stepping in. Backed by massive IPO funding, they have to build their own infrastructure. So can NVIDIA sustain a $4.7 trillion valuation if they're locked out of the second largest economy? As long as token demand exists, Western infrastructure remains absolute gold. Demand outweighs the geographic lockout. Got it. For now, yes. Let's shift gears then. Kings do not stay kings without a fight.

The market hates a monopoly. It makes customers very nervous. Tell me about the challengers. The newsletter mentioned MatX. MatX is an AI chip startup. Most people haven't heard of them yet, but they just raised $500 million. Half a billion dollars to fight NVIDIA. What is their specific angle? NVIDIA builds general-purpose GPUs. Good at graphics, good at crypto, good at AI. MatX is specializing. Stripping it down. Building chips designed only for large language models. Hyper-efficient.

So less flexible but faster for this one specific task. Exactly. It signals investors want alternative hardware desperately. And then there is DeepSeek. We saw those efficiency headlines recently. Right. The rumor is DeepSeek is freezing NVIDIA out of their next model entirely. Which aligns with that zero revenue figure. It does. But it is messy. There are strong rumors they trained previous models on smuggled NVIDIA Blackwell chips. Through gray markets. Yeah.

But for the next iteration, Huawei reportedly got early access. So it is a mix of geopolitics and pure strategy. Turbulence under the surface. It is not just the hardware shifting, though. The pricing models for accessing these tools are changing. Big time. OpenAI is testing a Pro Lite tier for $100 a month. Yeah. One hundred dollars. I have to be honest here. I saw that number and I felt some friction. I still wrestle with justifying the $20 tier myself. It is a psychological hurdle.

It really is. Jumping to $200 for the enterprise tier makes sense for companies, but $100 for an individual? You are not alone in that feeling. So what does this $100 tier signal about the future of AI users? It bridges the gap between casual chatters and always-on power users. Moving from casual tool to digital co-worker. Exactly. Think about an agentic workflow. Define agentic for us quickly. It means the AI acts autonomously to complete multi-step goals. Okay.

You tell it to research five companies, write a report, and draft emails. It thinks for 30 minutes? That burns massive compute. So the $20 plan has to limit that. Right. The $100 tier is for someone inextricably linked to the AI all day. Speaking of being linked, Anthropic just launched Remote Control for Claude Code. This is wild. You can run your Mac terminal from your phone browser. Sitting in a coffee shop, controlling your home desktop via AI. It blurs the line of where compute happens.

The computer is just a cloud surrounding us. Which leads us perfectly to the third pillar today, the software breakthrough. Standard Intelligence, FDM-1. This is mind-bending. For years, we talked about large language models, text in, text out. Predicting the next word. Right. But FDM-1 is a computer action model. It does not learn from text prompts. It learns by watching video. Think about a toddler. You don't hand a toddler a manual on how to open a door.

They just watch you turn the knob. They build a world model visually. FDM-1 does this with software. It watches frames and reverse engineers the actions. Exactly. No manual labeling. It sees the cursor move, sees the click, sees the result. It is like stacking Lego blocks of visual data. And the demos are practical. It built mechanical gears inside Blender. Blender has a notoriously complex interface. A nightmare of menus. Writing code rules for that would take decades.

But watching a video solves it. It learns the spatial relationships automatically. It found software bugs. It followed long desktop sessions. And the driving example from the open. Whoa, imagine scaling that visual learning to a billion everyday tasks. It used keyboard inputs based on live feeds to drive a real car. Arrow keys to steer. Spacebar to brake. Mapping visual input directly to action. It doesn't need to understand the car's underlying code.

Just the behavior of driving. It creates a real moment of wonder. If it learns driving by watching, can it learn plumbing? Surgery? What about the compute cost, though? Training on video sounds expensive. This is the surprise. Some tasks required less than one hour of training footage. Less than an hour. Less than an hour of video. Highly efficient. That changes the economics entirely. You don't need to scrape the whole internet.

You just record your expense report process for an hour and it learns it. If AI learns by just watching an hour of video, what does this mean for prompt engineering? We are shifting from telling AI what to do to simply showing it. Showing, not telling. That removes the abstraction of language entirely. You don't need to be a good explainer, just a good doer. It is a cohesive narrative arc today. Let's recap the big idea. We started with NVIDIA. The massive physical infrastructure required.

The $68 billion engine room. Then the challengers. MatX specializing hardware. OpenAI shifting pricing for power users. And finally, software becoming more human. Standard Intelligence proving the future of learning is visual observation, not text. Which leaves me with a somewhat provocative final thought. We always worry about AI taking jobs or scraping our data. Right. But if AI learns best by watching us work, then our daily workflows are not just work anymore.

No, they're valuable assets. Every time you navigate a spreadsheet, edit a video, or drive to the store, you are generating high-value training data. You are actively performing the curriculum for the next generation of models. That is a heavy thought for your next Zoom meeting. Sit up straight. The AI might be watching to see how it is done. A slightly dystopian, but very real possibility. You are not just an employee. You are a training set.

We will take a quick break here. That wraps up today's deep dive. We will track Standard Intelligence closely to see where this visual learning goes next. Thanks for listening. Catch you next time.
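
A rough sketch of the "compute is revenue" arithmetic from the NVIDIA segment, for anyone who wants to check the numbers. The revenue figures and growth rate are the ones quoted in the episode. The `gpus_needed` helper and the throughput numbers passed to it are purely hypothetical, chosen only to illustrate the linear relationship between token demand and GPU count that the hosts describe.

```python
import math

# Figures quoted in the episode (billions of USD, one quarter).
TOTAL_REVENUE = 68.0
DATA_CENTER_REVENUE = 62.0
AI_GPU_REVENUE = 51.0
YOY_GROWTH = 0.73  # "up 73% year over year"


def share(part, whole):
    """Return `part` as a percentage of `whole`, rounded to one decimal."""
    return round(100.0 * part / whole, 1)


def implied_prior_year(current, growth):
    """Back out the year-ago figure implied by a growth rate."""
    return round(current / (1.0 + growth), 1)


def gpus_needed(tokens_per_second, tokens_per_gpu_per_second):
    """Hypothetical linear model: GPU count scales with token demand."""
    return math.ceil(tokens_per_second / tokens_per_gpu_per_second)


data_center_share = share(DATA_CENTER_REVENUE, TOTAL_REVENUE)
ai_gpu_share = share(AI_GPU_REVENUE, TOTAL_REVENUE)
prior_year_revenue = implied_prior_year(TOTAL_REVENUE, YOY_GROWTH)

# Made-up demand and per-GPU throughput; doubling demand doubles the GPUs.
baseline_gpus = gpus_needed(1_000_000, 2_500)
doubled_gpus = gpus_needed(2_000_000, 2_500)

print(f"Data center share of revenue: {data_center_share}%")
print(f"AI GPU share of revenue: {ai_gpu_share}%")
print(f"Implied year-ago quarter: ${prior_year_revenue}B")
print(f"GPUs at baseline demand: {baseline_gpus}, at double: {doubled_gpus}")
```

On the quoted figures, data centers account for roughly 91% of the quarter and AI compute GPUs for 75%, and the growth rate implies a year-ago quarter of about $39 billion. Real inference capacity planning also depends on batching, model size, and utilization, none of which this sketch models.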

Transcript source: Provided by creator in RSS feed