🎙️ EP 245: The Coding Holy Trinity & Meta’s Plan to Delete the OS

00:00

We are witnessing the death of a prophecy. The idea of one AI tool to rule them all is dying. Meta wants to delete the computer operating system entirely. Welcome to the deep dive. Thanks. It is genuinely great to be here today. Today's stack of sources reveals a really massive shift. We are moving far away from single AI tools. We're heading into complex, customized toolchains. We're scaling to billions of autonomous agents, and we are completely rethinking how AI interacts

00:30

with our computers. What's fascinating here is how fast this is moving. We are jumping from the current coding trenches straight into the future of neural operating systems. Yeah, let's start down in those coding trenches. Developers are feeling this massive shift right now. The dream of a monolithic coding AI is practically dead. It's totally fragmenting. I mean, the AI coding market is now a three-layer stack. Devs are actively composing their own toolchains.

00:53

They're basically combining Cursor, Claude, and Codex together. Right. It's like stacking Lego blocks of data. You don't want a pre-built house anymore. You want specialized blocks to build your own custom toolchains. So you want Claude as your head chef. He handles the complex architectural recipes. And you want Codex just chopping vegetables in the background. Yeah, and it fundamentally changes the whole workflow. Just look at Cursor 3, which they're calling Glass. It's actively

01:22

distancing itself from its VS Code roots. It focuses on its agents window and agent tabs. They call that the manager surface, right? Exactly. They officially call it the manager surface. A manager surface implies active delegation. You know, you're no longer just typing code yourself. You're managing digital workers who write it for you. Precisely. You're orchestrating the build process. And then you have Claude Code entering the mix. It holds 46% of the most loved market

01:47

share. It uses an MCP-based plugin system to run things. Let's define that really quickly for you listening. An MCP-based plugin is a universal plug letting AI talk directly to local files. Spot on. That system acts as the essential glue. It's the primary execution layer for complex architectural refactoring. It basically does the heavy intellectual lifting. But looking at OpenAI's placement in this stack, it feels like a downgrade. Are they really just the review

02:13

layer now? Yeah, they repositioned Codex to fit exactly that niche. They pragmatically embedded it directly into Claude Code. Using a plugin, right? They did, yeah. It's called the Codex Plugin CC. They are actively fighting what developers call agent fatigue. You use Claude for the big-picture architectural stuff, but then you swap models for the smaller tasks. You hand low-complexity tasks to Codex or Kimi K2.5. Just to clarify Kimi K2.5 for everyone, it's a lightweight model

02:42

for simple background tasks. You do that to manage your token burn. Right, because token burn is a huge issue. Wasting expensive compute on simple tasks will bankrupt you. It's a highly pragmatic token economy. I still wrestle with prompt drift myself. Oh, we all do. It's exhausting. Yeah, the friction of managing these different tools is incredibly real. Keeping context perfectly aligned across platforms is tough. So let me ask you this. Is OpenAI's pragmatic plugin

03:09

strategy a sign of surrender? Or is it pure genius? I think it's pure genius. They willingly surrendered the terminal to win the workflow. They embedded themselves exactly where the work actually happens. So they abandoned the interface to capture the actual process. That's the perfect way to phrase it. But to run these multilayered toolchains, you need serious infrastructure. You need massive computing power to handle that load. That brings scale, but also very serious risks. Risks are

03:35

getting incredibly massive. And the infrastructure is scaling up right now. Cloudflare just announced Agents Week. They're rolling out edge-based infrastructure designed to run billions of agents simultaneously. Billions of active digital workers. Imagine scaling to a billion queries. The physical compute power required is staggering to think about. And consumer adoption is totally matching that scale. Claude Cowork just moved to full release for paid plans. Pro, Max, Team, Enterprise tiers.

04:03

The demand is off the charts. At the HumanX AI conference last week, one chatbot came up in every panel. Industry leaders dubbed Claude Code the absolute must-have tool. But this environment is getting genuinely risky. The sheer scale is unprecedented. Sam Altman recently broke his silence in a New Yorker profile. Yeah, right after that frightening attack on his home. Right. He admitted to some past leadership mistakes,

04:28

but he also issued a really stark warning. He warned that the AGI race is pushing into highly risky behaviors. The chaos on the ground is real. TechCrunch is even updating their official glossary of terms just to help normal people navigate the madness. Which you desperately need in this space. Take the term hallucination, for example. Hallucination. When an AI confidently makes up fake information. Exactly. We need simple definitions for these complex problems. The vocabulary has

04:56

to keep up. But let me push back here for a second. We are scaling up to billions of fully autonomous agents. Even Altman is explicitly warning about risky behavior. Are we basically building the massive engine without installing any brakes? It's a super valid concern right now. The infrastructure expansion is totally outpacing the safety guardrails. We're deploying agents before we understand their emergent behaviors. So how do we actually resolve this tension between massive digital scale and

05:24

fundamental human safety? Well, we have to build robust guardrails at the infrastructure layer itself. We just can't rely on the models to police themselves anymore. We have to hardwire those safety brakes directly into the infrastructure. That is the only viable path forward. Okay, let's unpack this. We spent the entire last decade putting glowing screens on everything. We moved all our private files to the distant cloud. And

05:48

now Apple wants to remove screens entirely. LM Studio wants to take everything off the cloud. It's a massive philosophical pivot. It really is. It's a direct reaction to the friction of modern computing. People are tired of the costs and risks of cloud AI. We're seeing a massive pivot toward offline, highly localized tools. People want specialized personal AI on their own hardware. And LM Studio just made a big strategic move there. They certainly did. They acquired

06:13

a fascinating company called Locally AI. Adrien Grondin is joining to lead native AI experiences. They're bringing open-source models natively to your personal devices. iPhone, iPad, Mac integration. With no cloud server connection required at all. Zero cloud connectivity. No sign-up friction. No monthly fee. It's just localized processing utilizing your own silicon. Apple is moving in a remarkably similar direction too. They're testing four distinct smart glasses styles for 2027.

06:45

Crucially, these glasses feature absolutely no displays. None. Zero visual displays. It's a huge departure from their previous AR headsets. It's just a camera, spatial music, phone calls, and Siri. It's a much simpler approach to ambient computing. The tools are becoming hyper -specialized and local. Let's look at a few examples from our sources. There's a fascinating new tool called Ray. Oh, Ray is incredibly interesting. It acts as a terminal -based CFO. It reads your real

07:11

computer transactions locally. It tells you what to do, helps plan budgets, and it runs entirely on your own private machine. Then there's R0Y, spelled with a zero, a natural language financial studio. It builds full investing dashboards in seconds, tailored to you. And Google's Gemini just introduced interactive visual simulations. You don't just read text. You ask Gemini to "show me" or "help me visualize." You can actually play with abstract concepts locally. We also have

07:37

the ElevenLabs music marketplace. Creators generate tracks and publish them seamlessly. They earn real money on downloads. ElevenLabs has already paid out $11 million to voice creators. It's birthing a totally new creator economy. The localized empowerment gives individuals studio-level production capabilities. So looking at all this screenless, highly specific tech, is the true future of AI actually completely invisible and entirely offline? History shows the best

08:06

technology always fades away. It disappears completely into the ambient background of our daily lives. So the best technology simply becomes an invisible part of our environment. It just becomes the silent air we breathe. We talked about powerful local tools. We discussed the shift toward invisible, display-free hardware. Here's where it gets really interesting. Meta is trying to leapfrog this entire localized paradigm. They want the AI to literally become the operating system.

08:34

Yeah, they're trying to fundamentally delete the traditional OS entirely. Meta is moving away from models that simply use computers. They're developing models that actually act as the computers themselves. It's a monumental shift in architecture. Their new neural computer prototype executes desktop tasks directly via learned behavior. There are no APIs involved, no complex orchestration layers to translate commands. This is radically different from traditional AI tool use, isn't

09:01

it? Completely different. When ChatGPT calls a Python interpreter, that's basic tool use. It uses a programmatic bridge. Meta's prototype is a unified, end-to-end, learned system. It's like teaching someone to drive by showing them thousands of photos of a dashboard. You never actually explain what a steering wheel mechanically does. You just show the visual results of turning it. That's a perfect visual metaphor. It was trained on thousands of hours of raw screen recordings.

09:26

It passively watched human cursor movements and watched complex terminal sessions unfold pixel by pixel. So the logic isn't programmed with standard code. It's encoded directly into the model's mathematical weights. Yeah. And that changes how it actually operates. It predicts the next screen state exactly like GPT predicts the next word. It treats human UI interaction as raw visual data. So if it needs to move a file, what does it actually do? It doesn't call a computer

09:54

command or an API. It literally thinks through the specific cursor movements required. It generates the exact keyboard inputs needed to drag and drop that file. It essentially hallucinates the physical mouse moving across the screen. Mechanically, yeah. It predicts the pixels shifting. But there are some glaring limitations right now. It's just an early prototype, not a finished product. What happens when it tries a complex multi-step task? It really struggles with sustained context. It

10:20

might remember to open a system folder, but three steps later, it completely forgets why it was there. It loses the thread. It lacks basic digital object permanence. That's exactly the problem. It hasn't learned the concept of a button. It has only learned the visual pattern of a button. It recognizes the pattern of an X icon in the corner. Right. It sees the X and knows to click it. But until the model learns that an X icon

10:44

always means close, it remains limited. It has to understand that concept regardless of the application. Until then it's basically a highly sophisticated macro recorder. Kate, let's dig into that limitation. Is the barrier between recognizing visual patterns and understanding underlying concepts the hardest problem in machine learning today? I firmly believe it is. Bridging that massive cognitive gap is basically the definition

11:10

of true artificial general intelligence. It's the difference between mimicking intelligence and possessing actual comprehension. So scaling up visual patterns might not organically lead to true conceptual understanding. That remains the absolute biggest unanswered question in AI research. Scaling visual patterns just doesn't guarantee true cognitive reasoning. You're finding that out the hard way, yeah. Let's zoom out for a second. We've covered a massive amount of technical

11:33

ground today. If we connect this to the bigger picture, we're witnessing a total restructuring of how humanity computes. Devs are hacking together specialized toolchains today. We're scaling infrastructure to support billions of agents tomorrow. We're bringing open-source models locally

11:49

to our pockets. And Meta is aggressively trying to bypass all of it by turning the AI into the computer itself. It is a massive, unprecedented transition. Thank you for joining us on this deep dive. Take a moment today to look at your own digital workflow. Try to find where you're getting your own agent fatigue, and see where you can specialize your daily tools. The landscape is shifting incredibly fast beneath our feet. It really is. And it leaves me with one final fascinating

12:14

question to consider. If Meta's neural computer eventually succeeds in operating our graphical interfaces, will the software of the future be designed to look good to human eyes? Or will it be optimized entirely for AI vision models to easily read?

Transcript source: Provided by creator in RSS feed