The scale of modern AI is almost impossible to really grasp. I mean, right now, the cost of building the current AI infrastructure is already more than the Manhattan Project and the Apollo space program combined. That is a staggering comparison. It truly is. We're talking about physical structures that need the same amount of land as, what, 450 soccer fields? And they require enough electricity to light up a million homes. This isn't just, you know, scaling
up tech. This is a whole new standard for global infrastructure. Welcome to the Deep Dive. We've gone through the latest research to bring you the most crucial insights into this, this massive shift. Our mission today is pretty clear. We're exploring two huge frontiers in AI that are happening at the same time. First, there's the conceptual one, why LLMs have sort of hit a wall, and what this idea of spatial intelligence actually means
for what's next. And second, we're going to dive into the physical reality of it all, the just eye-watering cost and the unbelievable scale of these new ultra-mega data centers that have to run these models. That includes the, frankly, astonishing $32 billion Stargate project. So let's unpack this. We have to start with Fei-Fei Li.
For anyone who doesn't know, she's the Stanford professor behind ImageNet, which is, I mean, it's essentially the foundation for the whole deep learning revolution we're living through. Right. And she just put out what is basically a manifesto. It's a really strong declaration that LLMs, large language models, have pretty much reached their ceiling. She argues that the future of AI isn't just about better language. It requires something she calls spatial intelligence.
So what does that actually mean for an AI, I mean? What is spatial intelligence when you take it out of the human context? It's the kind of thing we do every single day without even thinking about it. Yeah. It's the ability to understand and navigate and interact with three-dimensional space. Yeah. Like think about catching a set of keys someone tosses you in a dark room. Or a firefighter who has to instantly read a chaotic smoke-filled space to find the safest way out.
It's about applying the basic rules of physics and prediction to the world you can actually touch. Exactly. Yeah. Li's big critique is that LLMs, for all their amazing chat abilities, are completely blind to reality. They don't know that if you drop something, it falls down. Always. They have zero concept of gravity or mass unless we spell it out for them in text. So to fix that, she's proposing that AI needs what she calls
world models. And she says these models have to have three core capabilities to bridge that gap between just language and actual physics. Okay, so the first one is that they have to be generative. The models have to be able to create these really complex 3D environments that, and this is key, strictly obey the rules of real world physics. No weird floating teacups. Second, they have to be completely multimodal. So they
need to process everything at once. Text, images, video, depth maps, even data from real-world sensors. It's a full sensory understanding. Right. And finally, they have to be interactive. They need to predict with really high accuracy what's going to happen when a user does something inside that simulation. That's how you get real cause-and-effect learning. And this isn't just a theory,
right? Her team at World Labs is already shipping a tool called Marble, which takes a simple text prompt and turns it into a 3D scene you can actually walk around in. Yeah, and what's so fascinating is how they're doing it. They're probably using things like NeRFs, neural radiance fields, which are these models that can build a 3D scene from 2D images. It's moving from AI
describing the world to AI building it. I still kind of wrestle with trying to figure out how to bridge what I type into an LLM and what the real world does. It just seems like such a massive leap. That's a totally fair struggle. It really gets to the heart of the challenge. But the need for agents that can operate in the real world, you know, robots, self-driving cars, it means the industry is being forced to push for these hybrid spatial models right now. Since language
is so, so foundational to how we think, how quickly is this shift from pure language to these spatial models actually going to impact the AI tools that we're all using every day? The need for real-world interaction is already pushing the immediate deployment of these new hybrid models. So moving from the theoretical, let's look at what's happening on the ground right now. Even as we're looking towards spatial AI, we're still grappling with some really basic challenges and
seeing some fascinating new applications. Yeah, on the challenge side, there's still that fundamental problem of why AI struggles so much to tell the difference between a fact and a subjective belief. One report called it the missing piece, which is really about causal modeling. The AI often gets the what, but it totally misses the why behind the data. And at the same time, agents are getting so much smarter. I love this example of the Minecraft AI agent named Steve. Oh, it's
brilliant. You can give Steve just one high-level command. Something like, mine some iron or build me a castle. And Steve doesn't just do it. It actually spawns a bunch of other little agents that coordinate and work together like a real team to get it done. OK, let's talk about friction, because that is definitely heating up, especially around data. We saw a huge example of this with
Wikipedia recently. Right. Wikipedia, which is, let's be honest, the bedrock for so much AI training data, is basically telling AI companies to please stop scraping its entire site. They want everyone to use their paid API instead. Why is that such a big deal? Well, it's about fairness. Yeah. You know, giving credit for all that human labor and making sure the data quality stays high.
They even made a pointed reference to Grokipedia, which was a pretty clear jab at Grok and xAI for allegedly relying so heavily on scraped content. So it sounds like quality training data, the fuel for all of this, is going to be heavily monetized from now on. And that ties directly into the money, into financing. Crusoe, which is an AI energy and infrastructure company, just
secured a huge investment. $1.38 billion. It puts their valuation at $10 billion. That is a massive vote of confidence. What makes them so special? Crusoe's all about tackling the energy problem. They capture wasted energy, like natural gas that would just be flared off at a site, and they use it to power computing centers right there. The fact that NVIDIA is a major investor just shows how critical that link between energy and compute has become. We're also seeing
friction in creative ethics. The showrunner for Amazon's House of David called using over 350 AI-generated shots magical filmmaking. But a lot of critics just immediately called it cheap, a way to replace human artists. And that tension is not going away. It's a real philosophical split between efficiency and human artistry. Let's end this section on a positive note, though. Privacy. Google launched something called Private
AI Compute. It lets you use the full cloud power of Gemini, but it ensures that no one, not even Google, can see the data you're processing. Yeah, they do it using secure enclaves, which are like these hardware-level black boxes. It allows for really sensitive data to be computed privately, which is a huge deal for a lot of applications. So what does Wikipedia demanding payment mean for the long-term availability of truly
open training data? It suggests quality training data, the fuel of AI, will be heavily monetized moving forward. Okay, let's shift now to the physical frontier. the sheer infrastructure you need for all this. An Epoch AI report said that spending on these specialized AI data centers is on track to pass $300 billion by the end of 2025. You have to put that number in perspective. $300 billion is almost 1 % of the entire U .S. GDP. It's more than the Apollo program and the
Manhattan Project combined. This isn't just an investment. It's a state-level commitment to a single piece of technology. And the headline example, the one everyone's talking about, is OpenAI's proposed Stargate Abilene project. The numbers are just... they're hard to believe. We're talking a $32 billion price tag. It needs those 450 soccer fields of land. And the critical part, it will draw one gigawatt of power. That is enough electricity for about a million homes, all for
one site. And it will have 250 times the compute capacity of GPT-4. But here's the thing that really changes the paradigm, the engineering implication of this. It's the paradox of latency. It used to be that you had to build data centers near users to be fast. Now, latency just doesn't matter as much. The reason is that the time it takes the model to actually think and generate an answer, the inference time, is about 100 times longer than it takes to send data all the way
around the globe. Whoa. So imagine having so much computing power that you could literally bounce data off the moon and you would still be bottlenecked by the model itself, not by how fast you could send the signal. That changes everything. It really does. The era of needing to be geographically close for speed is just over. Yeah. The entire physical constraint has shifted. Now, data centers get built wherever power is cheapest and most available, not where
the users are. Which brings us right back to that power challenge. Only a few countries can realistically handle multiple sites that each need over a gigawatt of power constantly. The solutions seem to be starting with natural gas, then layering on solar and wind through big grid interconnects. The race isn't for faster chips anymore. It's a race for massive, reliable energy sources. So if the competition has completely shifted from speed to just sheer scale and access
to power, does that fundamentally change who's going to lead the next phase of AI? Absolutely. The future is being shaped by the very few entities that can build, finance, and secure these colossal power sources faster than anyone else on the planet. So if we synthesize everything from our deep dive today, we really covered three major themes. First, we're redefining the core objective of AI. We're moving away from just pure language and towards these complex spatial world models.
Second, at the exact same time, we're battling this intense data and ethical friction in the applications we have now. You know, everything from creative tension in Hollywood to who owns and gets to use Wikipedia's data. And finally, all of this innovation, both the big ideas and the stuff happening today, is being built on an infrastructure of just completely unprecedented scale. It's far bigger than any historical megaproject
in its cost and its power demand. So you now have a pretty clear structure for understanding how AI is going to evolve over the next decade. You can see the conceptual limits, the immediate frictions, and the staggering physical cost of making it all happen. And here's something to think about. If one gigawatt of power becomes the minimum price to play in advanced AI, what happens to the innovative coder in a garage? What happens to that decentralized future we
all used to imagine? Something to mull over. Thank you for joining us for this deep dive into the dual frontiers of artificial intelligence. We really encourage you to keep exploring these topics.
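
For anyone who wants to sanity-check the numbers quoted in this episode, here is a rough back-of-envelope sketch in Python. It is only an illustration, not anything taken from the reports discussed. The fiber speed factor, the average household consumption, and the U.S. GDP figure are all assumptions chosen for the estimate, and the function names are hypothetical.

```python
# Back-of-envelope checks for figures quoted in the episode.
# Constants marked "assumed" are illustrative and do not come from the episode.

SPEED_OF_LIGHT_KM_S = 299_792.458   # speed of light in vacuum
FIBER_FRACTION = 2 / 3              # assumed: light in optical fiber moves at roughly 2/3 c
EARTH_CIRCUMFERENCE_KM = 40_075     # equatorial circumference of the Earth
EARTH_MOON_KM = 384_400             # average Earth-Moon distance
HOURS_PER_YEAR = 8_760


def around_globe_seconds() -> float:
    """Time for a signal to circle the Earth once in fiber (ignores routing overhead)."""
    return EARTH_CIRCUMFERENCE_KM / (SPEED_OF_LIGHT_KM_S * FIBER_FRACTION)


def moon_bounce_seconds() -> float:
    """Round-trip time for a signal sent to the Moon and back at vacuum light speed."""
    return 2 * EARTH_MOON_KM / SPEED_OF_LIGHT_KM_S


def homes_powered(site_watts: float, home_kwh_per_year: float = 10_500) -> float:
    """Homes a site's continuous draw could supply (assumed average household usage)."""
    average_home_watts = home_kwh_per_year * 1_000 / HOURS_PER_YEAR
    return site_watts / average_home_watts


def share_of_gdp(spend_usd: float, gdp_usd: float = 29e12) -> float:
    """Spending as a fraction of GDP (assumed GDP of about $29 trillion)."""
    return spend_usd / gdp_usd


if __name__ == "__main__":
    globe = around_globe_seconds()
    # The episode's claim: inference takes about 100x the around-the-globe time.
    inference = 100 * globe
    print(f"Around the globe in fiber: {globe:.2f} s")
    print(f"Inference at 100x that:    {inference:.1f} s")
    print(f"Moon round trip:           {moon_bounce_seconds():.2f} s")
    print(f"Homes powered by 1 GW:     {homes_powered(1e9):,.0f}")
    print(f"$300B as share of GDP:     {share_of_gdp(300e9):.1%}")
```

Under these assumptions, a trip around the globe takes roughly 0.2 seconds and a Moon round trip about 2.6 seconds. Both are well under an inference time of 100 times the around-the-globe figure, which is consistent with the latency point made earlier. The same assumptions put one gigawatt at roughly 830,000 average homes, in the neighborhood of the "about a million" quoted, and $300 billion at about 1% of GDP.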
