🎙️ EP 136: Apple Rents Google’s Brain for Siri ($1B/Year?!) + Perplexity's Big AWS Hack

00:00

Probably the biggest plot twist in AI this year, Apple. You know, the company famous for total control is paying its biggest rival, Google, a billion dollars. Yeah. It's really the ultimate sign, isn't it? That trying to keep everything proprietary, building your own intelligence completely in -house, it's just too slow right now. The pace is incredible. Welcome to the Deep Dive. We're unpacking a really fascinating stack of

00:26

sources this week. We're looking at who is truly building, or maybe more accurately, who is buying AI power right now. Exactly. We're going to dive into the, frankly, shocking details of that Apple -Gemini partnership. We'll also look at some dramatic breakups and bailouts shaking up the big AI giants. And then we'll examine a pretty crucial technical paper that's basically democratizing access to those huge trillion parameter models, making them more accessible. Okay, right. So

00:53

let's start with that core conflict. Apple, you know, the champion of the closed garden, the controlled ecosystem, they're licensing Google's Gemini model for their main Apple intelligence stuff. And that price tag, a reported $1 billion a year. It just shows the scale of this compromise, doesn't it? It's definitely more than just irony. It feels like a temporary necessity. I mean, our sources point out what a lot of users already

01:17

feel. Siri has been, and this is a quote, embarrassingly bad for years, especially when you compare it to modern AIs. Right. Apple is racing to build its own massive model, like a one trillion parameter one. But the timeline seems to be. Well, maybe late 2026 at the earliest. And they just couldn't wait. So we should clarify, Gemini isn't handling

01:39

the actual voice part of Siri. No, exactly. It's powering the really heavy lifting underneath, the complex stuff, specifically the summarizer tool and those sophisticated planner features in the new OS updates. They needed that advanced reasoning like yesterday. Because their internal models weren't cutting it. Apparently not. That need became pretty urgent. because their internal efforts were struggling. We heard Apple recently lost, what, four to seven key researchers, people

02:04

working on those foundational models. Precisely because the in -house solutions weren't performing well enough. So they kind of had to go shopping. And what's really fascinating here, especially given Apple's reputation, is how they've set up the security. Google gets the billion dollars, sure, but the AI itself runs entirely on Apple's own system, the private cloud compute architecture. Right. which means Google doesn't touch the user

02:29

data. That lets Apple keep that critical control layer they value so much, even if the core brain is, well, outsourced. Beat. And it wasn't like Google was the only option they looked at. No, definitely not. We know they actively tested competitors. OpenAI's models were in the mix. Anthropix clawed. But Gemini won out. Why? Seems like it demonstrated superior instruction following.

02:52

so it handles multi -step commands better, and it apparently has a longer context memory, which is super important for planning complex tasks or summarizing long documents. OK, so it was really about the technical fit for the specific job. Makes sense. Yeah. But the complexity just keeps growing, doesn't it? Because Google is banned in mainland China. Apple's now having to cut separate deals, local deals with giants like Alibaba and Baidu just for the version of

03:19

Siri that runs there. Which creates this sort of fragmented intelligence stack, doesn't it? Yeah. Different brains, depending on where you are. We're thinking about what this means for Apple long term. OK, let me just ask directly then. Beyond the huge price tag, what specific technical capability really made Gemini the unavoidable winner for Apple's needs, especially for those planning tools? It showed better instruction following and could maintain context over longer

03:45

interactions. Simple as that. Okay, now shifting gears a bit, let's talk about the corporate tectonic plates. Because we're seeing some major realignments happening. There's this big shift, maybe even a split, happening between Microsoft and OpenAI. Microsoft is now officially breaking away, setting up its own separate superintelligence team. Their goal, build AGI independently. Wow. So they're

04:08

still partners, but strategically. Strategically, the priority seems to be shifting towards Microsoft having its own proprietary AGI development effort. Still working with OpenAI, but... Also hedging their bets, maybe building their own thing. And this probably all ties back to the sheer cost, right? We saw that little drama where OpenAI's CFO hinted they might need a, what was it, a $1 .4 trillion chip bailout? Yeah. A number so big it sounds like a typo. Sam Altman immediately

04:36

denied it, of course. Right. But whether that specific number is real or not, it definitely highlights the absolutely staggering amount of capital needed to seriously chase AGI. Like nation state level spending. Only a tiny handful of entities on the planet can even think about playing at that level. It's mind boggling. But, you know, looking at the other side of that investment coin, it's not all just about buying more chips.

04:58

The OpenAI Foundation is also investing heavily, about $25 billion into things like health care applications and AI resilience. And they're partnering strongly with Microsoft and SoftBank on that. So it seems like a kind of two pronged strategy. Chase AGI, you know, at almost any cost, but also fund these critical sector applications. Meanwhile. We also saw a pretty vulnerable admission that really illustrates the maybe the ethical debt the scaling race can create. You're talking

05:27

about Meta. Yeah. Sources revealed that Meta, to help fund its massive AI expansion, was bankrolled by roughly $16 billion. And part of that came from deliberately allowing scam ads, ads targeting users daily. It's pretty stark. They apparently. tolerated these profitable scams because, well, they generated too much revenue to just shut them down easily. Ouch. And that pursuit of short -term profit, it creates a long -term problem,

05:52

doesn't it? If your revenue relies on bad actors, it inevitably poisons the data you're using to train your models. And it just fundamentally damages user trust. Yeah, that tension between integrity and just sheer scale. It's not just social media either. Look at Google. They've aggressively pushed into becoming what sources call a full -on financial AI researcher. They're rolling out features, answering complex market questions, providing live earnings transcripts

06:18

for traders in real time. They're pushing into every potentially profitable sector they can find with AI. It's an aggressive expansion. So thinking about that meta -admission specifically. What does that pursuit of high revenue, even when it comes from bad actors, tell us about the maybe the foundational integrity of the AI models being built on that data? High profit from bad actors risks data quality and user trust,

06:44

creating ethical debt. OK. Let's dive into the tech side now, because this is where it gets really interesting for me. We need to talk about the physical barriers, the hardware barriers that have mostly kept these super advanced AI models in the hands of just a few giants like Google or Nvidia. Right. We're talking about

06:59

the huge models, the ones with like. a trillion parameters they usually need specialized incredibly expensive hardware setups exactly and critically most ai teams have kind of avoided using standard cloud infrastructure like aws for these really massive models and there's a specific reason why which is aws's networking tech called efa lacks a key feature something called gpu direct async okay hold on gpu direct async we need to break that down what does that missing feature

07:30

actually mean in Okay, think of it like this. Imagine your giant AI model is spread across, say, 10 different computers, each packed with GPUs. Got it. Those 10 machines need to talk to each other constantly, sharing data back and forth, like... instantaneously for the model to work. GPU direct async lets the GPUs on different machines talk directly to each other super fast. It cuts out the middleman, which is usually the

07:57

main computer brain, the CPU. Without it, the GPUs have to kind of wait for the CPU on each machine to manage all that data traffic. It creates bottlenecks. The communication slows down so much that these giant complex models often just... crash or fail. Yeah. That's been the big hurdle on standard cloud setups like AWS. Right. The communication highway wasn't fast enough between the different GPU workers. Exactly. But here's

08:23

the breakthrough. It comes from perplexity. They published some really groundbreaking research showing how you can actually run these trillion parameter Mogi models, models like Kimi K2 and DeepSeek V3 on regular off -the -shelf AWS cloud machines. How? If the hardware feature is missing. They basically found a clever software workaround. Instead of needing new expensive hardware, they built an intelligent system using software. The CPU still helps coordinate, but the key is how

08:49

they move the data. Okay. They pack and shuttle the data really smartly using a different technology called RDMA. RDMA. Remote Direct Memory Access. That's basically a way for computers to swap data directly between their memories super fast, right? Precisely. It's like... Imagine stacking Lego blocks of data between the machines incredibly fast and doing lots of stacks at the same time concurrently. They figured out how to orchestrate

09:13

this data flow efficiently. using software and rdma even without that specific gpu direct async hardware feature so they built a software pipeline yeah essentially a high -speed software pipeline that makes the standard cloud function almost like one of those specialized super expensive supercomputers at least for this kind of workload whoa okay just pause there for a second imagine scaling your ai to handle like a billion queries without needing custom -built proprietary hardware

09:40

that costs millions and millions This is like figuring out how to run a Formula One car and run it well on regular city streets using standard infrastructure. That feels really profound, a huge shift in accessibility. It absolutely is. That's the core impact, democratization. Suddenly, any development team, any startup with the decent cloud budget and the necessary brainpower. they can potentially join the Trillium Parameter League.

10:07

It kind of dissolves that hardware barrier. It proves that really smart software engineering, efficient code, can actually beat massive capital spending. at least in some cases. So for the developers, for the companies out there listening, what's the key technical lesson from Perplexity's work here about getting around these hardware constraints? Smart software orchestration can often bypass assumed limits of commodity cloud hardware. Clever software beats brute force hardware.

10:36

Love it. Okay, just a quick detour to highlight a couple of interesting new tools we're seeing pop up. There's one called Maya One, which is generating highly expressive speech. We're talking over 20 different emotions. So moving way beyond that typical kind of flat robotic AI voice, much more natural sound. That's cool. And the other one? Llama .cpp. This is really interesting for developers and tinkerers. Okay, define Llama

10:59

.cpp for us simply. It's basically a lean framework letting you run large open source AI models directly on your own computer, even a decent laptop sometimes. And the significance of that, you know, it really shouldn't be understated. It means you don't have to send your sensitive data off to Google or OpenAI or whoever to get really good AI performance. Right. It ties back to that control theme we

11:23

started with. Apple's struggle. Exactly. It potentially gives sovereignty back to the user or the small developer. You can run powerful models locally. And sticking with actually using these models, what about advanced prompting? Any new tricks? Well, sources continue to highlight the effectiveness of persona prompting. You know, telling the model to ultra -think like Steve Jobs. or act as an expert physicist explaining quantum entanglement.

11:49

Does that really work? It tends to get much sharper, more focused answers from models like Claude or GPT -4. It works, I think, because you're essentially forcing the model into a specific defined set of constraints, a particular style or knowledge base, rather than letting it give a generic average response. Yeah, I have to admit, I still wrestle with prompt drift myself sometimes.

12:10

Getting a model to consistently sound like a specific leader or maintain a complex persona over a long conversation, it takes constant tweaking and refinement. It's never quite as simple as those quick online guides make it seem. It's definitely an iterative process. Lots of back and forth. For sure. It's an art as much as a science right now. Okay. So if we pull way back now, look at everything I've discussed. Yeah. The big idea emerging today seems pretty clear.

12:39

Control. That old model of total proprietary control is rapidly becoming obsolete, or at least incredibly difficult to maintain. You have Apple, the absolute champion of proprietary power, forced to essentially rent its core intelligence from its biggest rival, Google. And at the exact same time, you have this grassroots technical brilliance, like what Perplexity demonstrated, that's dissolving the hardware barriers that used to enforce that control. It sends a powerful message, doesn't

13:08

it? That really efficient code, smart software can fundamentally challenge the financial might of the world's biggest tech companies when it comes to AI development. The material we explore today really shows that the whole balance of power in this AI race. It's unstable. It's constantly shifting based on a new partnership deal one week or a new algorithmic breakthrough the next. Which leads to maybe a provocative thought to

13:32

leave folks with. If the world's most secretive, most control -obsessed company, Apple, finds itself forced to outsource core AI intelligence, how much longer can any major company realistically maintain true proprietary control over the most advanced forms of AGI when they emerge? Is that even the right model anymore? That's a great

13:55

question to ponder. And for you, the learner listening today, if you want to dig deeper into the tech enabling some of this shift, we really encourage you to look into the concept of RDMA. That's remote direct memory access. It's kind of the invisible engine fueling this new era of cloud efficiency for AI and understanding it might be key to understanding where a large scale AI is heading next. Thank you for joining us on this deep dive. Outro music.

Transcript source: Provided by creator in RSS feed: download file

Episode description

Transcript