Imagine needing the power of a nuclear plant, a whole gigawatt. Right. And imagine needing that much compute power every week just for AI. Yeah, it's kind of staggering, isn't it? That's the new reality Sam Altman is talking about. He calls it abundant intelligence. Exactly. And the response from industry, it was immediate and huge. Welcome to the Deep Dive. Today we're taking a calm, but I think really curious, look at the sources that are defining this huge AI
infrastructure race. Yeah, we're drawing from a newsletter that really digs into, you know, the hardware side, the software breakthroughs. And maybe most interestingly, these new ways people are trying to measure AI performance. Who's really best? So our mission today is really to unpack three main things. First, this literal race, building the grid for AI. Second, the just absurd speed of breakthroughs and what these models can do. AI's capability. Yeah, the pace
is nuts. And third, a pretty revolutionary new way to figure out which AI works best because, spoiler, it's not the same for everyone. Let's start with the gigawatts then. Altman's big idea. Right, this vision of abundant intelligence. He basically argues that access to AI or intelligence should be like a fundamental human right. That's a pretty profound statement. It is, but making that real, that takes scale. Almost unimaginable scale. And that's where the numbers come in.
OpenAI's goal. Yeah. One gigawatt of compute capacity per week. Exactly. And that demand just kicked off this massive infrastructure build-out. Hyperscalers, VCs, everyone jumped in. The investment scale is... Yeah. It's genuinely tough to visualize. We hear big numbers, but... 5.5 gigawatts. Yeah, that's Oracle. They're building these huge data centers. Texas, New Mexico, the Midwest. 5.5 gigawatts. What does that even compare to? Well, think about a major city like
Dallas, maybe. That's more than its base power needs. So, yeah, it's massive. And they're talking, what, 25,000 jobs just from that push? Wow. Construction, tech. Right. They're not just building server farms. It feels like they're building, you know, the next century's economic engine. And SoftBank's in the mix, too, right? Yeah, correct. Aggressively. Oh, yeah. They committed 1.5 gigawatts and they want it done in 18 months. 18 months. That's incredibly fast. New sites
in Ohio, Texas. These aren't small steps. These are like moonshots, big bets. So if you add it all up, where are we heading? Well, the early commitments, if you tally them, point towards something like $500 billion. And 10 gigawatts total by the end of 2025. Half a trillion dollars. Yeah. 10 gigawatts. And they're already well on their way. Like 400 billion and 7 gigawatts are basically planned and funded already. It's like stacking these incredibly advanced Lego
blocks, you know, but at warp speed. And then you have players like Alibaba taking a slightly different angle. Right. They talk about being the electric company for AI, but they're also going vertical. Super fast. Like dropping six major product launches in one day. Exactly. It highlights that speed, but also owning the whole process from the chip all the way up to the app. That seems key to their strategy. So why does this infrastructure layer matter so much to someone
listening right now? Well, because this isn't just about tech companies anymore. It's becoming a race between, like, governments, big-money VCs, the cloud giants to own the fundamental infrastructure, the plumbing for intelligence itself. It goes way beyond just making chatbots a bit better. It's foundational. OK, so the scale is huge. The cost is astronomical. Does all this frantic building actually mean that AI access gets cheaper, and sooner, for regular people? Yeah, I think
the signs point that way. This level of aggressive competition, the sheer amount of money pouring in, it suggests better, cheaper access is coming. And probably fast. Right, because of that competition and just the pace of the tech improving. Exactly. Okay, so all this hardware is being built because the software, the capabilities are demanding it. Let's shift gears then from building speed
to intelligence speed. Yeah, the physical stuff is really just trying to keep up with the software breakthroughs. The pace of improvement in models and what they can do, it's, uh, it's kind of absurd right now. We saw that play out recently, didn't we? There's that live marketing showdown. Oh, yeah, with the big LLMs: ChatGPT, Gemini, Perplexity, Claude. Right. And in a real business test, one of them just walked away with five out of the six wins. That shows you, like, the immediate practical
value is already there. It's not theoretical. And then there's the sheer size increase. You mentioned Qwen3-Max. Yeah. Alibaba's new model. One trillion parameters. Whoa. One trillion. Just try to imagine scaling that. A trillion variables for the model to learn from. That's a huge jump in complexity. It's a massive number. And it's immediately showing results, beating older models, even early GPT-5 versions on some tasks. But the really stunning thing. What's
that? Its math performance. It scored 100% on the AIME25 math benchmark. 100%. Wow. Okay, that's the moment of wonder right there. That's not just good. Perfect mastery on a really tough academic test. Right. Models hitting expert levels almost right out of the gate. And it's translating to the real world, too, isn't it? The finance exam. Oh, yeah. The CFA exam. Claude Opus. Gemini 2.5 Pro. They both passed level three. Which is notoriously the hardest level. Takes humans,
what, over a thousand hours of study? Usually, yeah. Intense study. And these models did it. In minutes. In minutes. The implication there for, like, knowledge work, professional training, it's just huge and immediate. You have to wonder, what's the value of that certification if a machine can ace it instantly? Exactly. And look how fast it gets integrated. Microsoft already plugged Claude into 365 Copilot. That's the first external AI model inside the main Office tools, right?
Yep. The speed from breakthrough to application is relentless. It really is. Okay, here's maybe a vulnerable admission. Uh-oh. Even trying to follow this closely, I still kind of wrestle with prompt drift. Oh, absolutely. You're definitely not alone there. Where you ask the same thing or use the same image prompts, like, two weeks apart, and the results are completely different because the model changed underneath without you knowing. Yeah. It's a real challenge, especially,
like you said, with visual tools. That Redditor's game, real versus AI images. It's getting almost impossible to tell consistently now. For anyone. We're struggling to keep up with the tools, really. But the tools just keep coming. They do. Quick hits here. Kling 2.5 Turbo. Getting really good at realistic images, video. Pushing towards that believable synthetic media. And that AI bandage thing. That sounds wild. Isn't it? An AI-powered smart bandage monitors the wound, adjusts things.
They claim it heals 25% faster. And maybe less exciting, but relevant. Ads are probably coming to free ChatGPT by 2026. Ah, ties back to the compute costs, I guess. Pretty much has to, yeah. Okay, so with these models acing things like the CFA exam in minutes, what does that tell us about where the ceiling is? Is there even a ceiling for these current models? That immediate mastery, it really shows we have to constantly
reevaluate. Even models that were top tier just months ago, the ceiling just keeps rising perpetually. Okay, so if the capabilities are moving that fast, the way we measure them needs to change too, right? Exactly. Which brings us to the metrics, how we decide what's good. Traditionally, we've had leaderboards. LMArena, places like that, they've been the standard. Yeah, but they had this huge blind spot, a really unavoidable one.
They basically treat every user the same. Doesn't matter who you are, where you are, why you're using it. An engineer in Tokyo, a lawyer in London. Or a student in Buenos Aires, yeah. The leaderboards just give you a raw performance score, undifferentiated. That doesn't really tell you if it's useful for your specific need, though. Not at all. So the new idea is from Scale AI. They launched SEAL Showdown. SEAL Showdown. Okay, what's different?
It completely flips the ranking method. Instead of just raw speed or whatever, it ranks models based on real user preferences, and it connects those preferences to demographic info. Oh, okay. So it's subjective based on who's using it. Exactly. Think about, say you're a bank choosing an LLM for your Spanish-speaking customers in Miami. LMArena, useless for that decision. But SEAL Showdown could actually give you useful data because it breaks down results by age, education,
language, country. So you can pick the model that that specific group actually prefers. Precisely. That's actionable utility. How are they getting the data? Globally. They say preferences from 100 countries, 70 languages. Trying to get a real global snapshot. And how do they stop people from gaming the system? Like model creators trying to boost their scores. Seems like they've thought
about that. Voting is voluntary, anonymous, and they hold back the results from public view for 60 days to prevent that kind of manipulation. Okay, so it's a big shift. LMArena gives you raw speed, like engine horsepower. SEAL Showdown tells you which car people actually like driving in their specific neighborhoods. That's a great analogy, yeah. Which tool is perceived as best by your users? That creates real market pressure. So you'd expect the big players, OpenAI, Anthropic,
Google. They'll have to react somehow. Oh, definitely. They'll likely try to build their own versions or maybe just plug into Scale's system. You can't really ignore that kind of specific customer feedback. So what's the core assumption about AI performance that this new benchmark really challenges? It fundamentally challenges that old idea that there's one single best model, one size fits all, that it can serve every single user everywhere equally well. SEAL says, no,
it's more complicated. Okay, let's try to wrap this deep dive up. Quick recap of the big layers we talked about. Sure. First, the foundation, the infrastructure, this gigawatt race heading towards $500 billion, all chasing that abundant intelligence vision. It's a battle for the means of production, really. Second, the capabilities, just stunning growth, models mastering complex skills like that CFA exam almost instantly. It's constantly pushing the performance ceiling up,
impacting work and education. Yeah. And third, how we measure it all. Shifting towards personalized, user-focused benchmarks like SEAL Showdown. Moving past raw speed to ask, who is this really best for? It brings us right back to Altman's original claim, doesn't it? Yeah. If AI access becomes a fundamental right, how does that change global power dynamics? When only a handful of massive companies, Oracle, SoftBank, Alibaba, can actually afford to build the 10 gigawatts
needed to deliver that right. That's the billion, maybe trillion-dollar question, isn't it? Who owns the intelligence grid and what responsibility comes with that ownership? That's the core geopolitical tension underneath this whole race. Something to think about. We'd encourage you listening to consider the sources we unpacked. Maybe think about how your own use of AI tools or how you judge them might be skewed by those old metrics. Are you using the fastest model or the one that
actually works best for you? Good question to ponder. Thank you for joining us on the Deep Dive.
