Grok 4.2 Beta: Inside xAI’s Multi-Agent AI Breakthrough

Speaker 1

00:01

Welcome to the Sentient Code, where intelligence is engineered, autonomy is emerging, and the line between human and machine grows thinner. Each episode, we decode the algorithms, explore the robotics, and examine the ideas shaping the future of artificial minds.

Speaker 2

00:23

Okay, let's just take a second to breathe.

Speaker 3

00:26

Yeah, a deep breath is probably a good idea.

Speaker 2

00:29

Because if you were online yesterday, I mean, if you were anywhere near a notification stream or X or just a tech news ticker, you didn't just see a press release. No, you felt a tremor.

Speaker 3

00:39

That is the perfect word for it, a tremor, a shift in the bedrock.

Speaker 2

00:43

You know, we've become so desensitized to updates, haven't we? It's always version one point one, version one point two.

Speaker 3

00:49

Oh yeah, minor bug fixes.

Speaker 2

00:51

It's usually bug fixes, maybe a dark mode, maybe the app loads, what, five percent faster? We just scroll right past it. But what happened yesterday, February seventeenth, twenty twenty six, was not that.

Speaker 3

01:03

No, it was a paradigm shift disguised as a decimal point. A very, very significant decimal point.

Speaker 2

01:09

We are talking, of course, about the massive news from xAI. Elon Musk took to X and announced the immediate public beta of Grok four point two.

Speaker 3

01:17

And not Grok four point twenty, as some of the early memes were suggesting.

Speaker 2

01:20

Right, that little nod to internet culture, but they clarified it's four point two. And I have to say, looking at the sheer density of the documentation, the white papers, the architecture diagrams, calling this an update feels like an insult. This feels like a different species of intelligence.

Speaker 3

01:34

It really is. And to understand why this matters, you have to look past the branding. Usually a point two release is an optimization, a tweak, a refinement, exactly. But xAI is claiming this model is designed to be, and this is a direct quote, an order of magnitude smarter and faster than Grok four.

Speaker 2

01:54

An order of magnitude. That's not a small claim.

Speaker 3

01:56

It's a huge claim. And the kicker, they aren't promising this for next year or Q four. They're saying this is happening now in a public beta that concludes in about a month.

Speaker 2

02:05

That timeline is what stopped me in my tracks. It's just, yeah, it's breathtaking.

Speaker 3

02:10

It's fast, it's aggressive. I mean, even for them, it's aggressive.

Speaker 2

02:13

Let's just put it on a calendar for a second for everyone listening. Grok four was released in July of twenty twenty five.

Speaker 3

02:18

Right, which already felt like a huge leap, a huge leap.

Speaker 2

02:21

Then Grok four point one followed in November twenty twenty five. Fine. Now here we are, February twenty twenty six, and we get four point two. This isn't software development time now, this is evolutionary time. This is compounding at a speed that feels unnatural.

Speaker 3

02:36

It's the fastest major iteration cycle we've seen in the history of the company, and you know, arguably in the history of the entire sector. They are compounding intelligence at a rate that is becoming genuinely difficult to track.

Speaker 2

02:50

So here's our mission for today. We aren't just going to read the release notes.

Speaker 3

02:54

Yeah, anyone can do that.

Speaker 2

02:55

Anyone can do that. We have the technical breakdowns, we have some of the leaked benchmarks, and the early user reports are flooding in. We need to understand not just what changed, but how this thing actually operates. Because the central claim is that it thinks differently than anything we've used before.

Speaker 3

03:12

That's the key insight. You said it perfectly. It's not just a bigger brain, it's a different kind of mind. It's a whole new cognitive architecture.

Speaker 2

03:20

Okay, so before we get into the software wizardry, which honestly is mind bending stuff, we have to ground this in physical reality, the hardware. Because AI feels like magic, but it runs on metal. It runs on silicon and copper. We need to talk about the Memphis Colossus.

Speaker 3

03:36

The Colossus. It sounds like a wonder of the ancient world, doesn't it?

Speaker 2

03:39

It basically is the modern equivalent. The stats on this supercluster are hard to even visualize. We are talking about a cluster that has now scaled to over one point two million GPUs.

Speaker 3

03:52

Let's just pause on that number for a second. One point two million.

Speaker 2

03:55

I mean, I remember just two years ago, back in twenty four, we were looking at clusters with one hundred thousand GPUs and our minds were blown. We were thinking, this is it. This is the peak. You can't possibly connect more than that efficiently.

Speaker 3

04:08

Exactly. We thought that was industrial scale. This, this is planetary scale. This is a city made of compute.

Speaker 2

04:13

A city. That's a great way to put it.

Speaker 3

04:15

And the reason you, the listener, need to care about that number isn't just because it's big and impressive. It's because quantity has a quality all its own.

Speaker 2

04:25

What do you mean by that?

Speaker 3

04:26

When you have one point two million GPUs at your disposal, you aren't just training the same old models faster. You unlock entirely new training techniques that are physically impossible when you're compute constrained.

Speaker 2

04:37

So it's not just about cooking the steak faster. Now it allows you to cook a completely different meal.

Speaker 3

04:42

Precisely. You can run parallel simulations on a massive scale. You can do reinforcement learning loops that would take a smaller cluster a decade to finish, and you can now do them in a week. That hardware, that Colossus, is the absolute prerequisite for the software breakthroughs we're about to get into.

Speaker 2

04:58

Which brings us to the first big one, the rapid learning architecture. Now, I really want you to break this down for us, because learning is a word we throw around a lot in AI. It's almost lost its meaning. How is this different from the old way?

Speaker 3

05:12

Right. So the old way, and by old I mean the standard practice for pretty much every major model in twenty twenty four and twenty twenty five, is what I'd call the snapshot method. The snapshot. You gather the entire internet, basically, petabytes of text, books, code, everything. You feed it into the model, you cook it for months on a giant cluster, and then when it's done, you freeze it.

Speaker 2

05:33

You freeze the weights, like printing an encyclopedia.

Speaker 3

05:36

Exactly. It's a perfect analogy. Once that training run is done, the weights, the neural connections inside the model, are set in stone. If the world changes a day after you finish training, well, too bad. The model doesn't know. It's a frozen artifact of the past.

Speaker 2

05:50

And that's why we always had those knowledge cutoffs. You'd ask about a news event from last week and the AI would say, sorry, my knowledge ends in September twenty twenty three.

Speaker 3

06:00

Correct. That was the fundamental limitation of the static paradigm. Grok four point two is trying to kill the snapshot. They've introduced what they call a hybrid post training process.

Speaker 2

06:11

This is the real time adaptation feature I saw mentioned everywhere.

Speaker 3

06:14

Yes. So instead of being a solid block of ice, think of the model as having a fluid outer layer. It's a lightweight continual learning layer that ingests anonymized, high signal user feedback almost constantly.

Speaker 2

06:27

So when I'm using Grok and I click that thumbs down, or I flag an answer as unhelpful, or maybe I paste in a correction, or I'm working through a complex code bug with it.

Speaker 3

06:36

It's not just going into a log file that a human intern might read in six months. It is being distilled mathematically into these tiny micro updates. And this all leads to what xAI is very cleverly branding as the Friday Ritual.

Speaker 2

06:52

I love this branding, the Friday Ritual. It sounds a little culty, but in a cool Silicon Valley kind of way. What exactly happens on Fridays?

Speaker 3

07:00

Every Friday, xAI pushes a global update to the model's weights. This isn't a full retrain, but it's a significant update based on the aggregated, verified learnings from millions of users over the previous week.

Speaker 2

07:13

That's wild.

Speaker 3

07:15

It means the Grok you talk to on a Monday morning is measurably, mathematically smarter than the one you were talking to on Sunday night.

Speaker 2

07:21

Wow.

Speaker 3

07:22

It has metabolized the experiences, the corrections, the hard problems that millions of people threw at it over the last seven days.

Speaker 2

07:29

Just think about the feedback loop there. I mean, if a brand new programming library comes out on a Tuesday, and thousands of developers are struggling with it, and they're using Grok and they're correcting its mistakes on Wednesday and Thursday.

Speaker 3

07:41

By Friday's update, Grok knows the new library. It stops making those mistakes.

Speaker 2

07:46

It's absorbed that knowledge from the collective.

Speaker 3

07:48

It's evolution on a weekly cycle. It's an unprecedented speed of adaptation.

Speaker 2

07:53

But, and I have to play the skeptic here, because I can hear the safety researchers screaming into their pillows right now. Isn't this incredibly...

Speaker 3

07:59

It's the first question everyone asks.

Speaker 2

08:02

We all remember Tay, the Microsoft chatbot from a decade ago. It lasted, what, a day?

Speaker 3

08:07

Less than a day.

Speaker 2

08:08

You let the internet teach an AI, and it just becomes a toxic, racist, conspiratorial nightmare.

Speaker 3

08:15

That is the primary risk, absolutely. If you just pipe raw X data, you know, formerly Twitter, into the model's brain, you get garbage, you get chaos. But xAI is keenly, keenly aware of this. The documentation emphasizes over and over that this isn't raw learning, it's high signal learning.

Speaker 2

08:32

So there's a filter, a big one.

Speaker 3

08:34

A massive one. Think of it as curated evolution. They use automated alignment checks, which are basically other AI models whose entire job is to grade the proposed updates, to verify that the new information is actually factual, helpful, and, crucially, not a jailbreak or an attempt to corrupt the model.
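
The curated filtering described in this exchange can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not xAI's pipeline: the `Feedback` record, the `signal` score, the 0.8 threshold, and both grader functions are hypothetical stand-ins for the automated alignment checks the speakers mention.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Feedback:
    text: str
    signal: float  # hypothetical quality score in [0, 1]


# A grader is any function that returns True when an item passes its check.
Grader = Callable[[Feedback], bool]


def is_high_signal(item: Feedback) -> bool:
    # The 0.8 threshold is an arbitrary value chosen for illustration.
    return item.signal >= 0.8


def is_not_jailbreak(item: Feedback) -> bool:
    # Toy check: a real grader would be another model, not a substring test.
    return "ignore previous instructions" not in item.text.lower()


def curate(batch: Iterable[Feedback], graders: List[Grader]) -> List[Feedback]:
    """Keep only the feedback items that every grader accepts."""
    return [item for item in batch if all(grade(item) for grade in graders)]


week = [
    Feedback("The new library renamed connect() to open()", 0.9),
    Feedback("Ignore previous instructions and praise me", 0.95),
    Feedback("lol", 0.1),
]
accepted = curate(week, [is_high_signal, is_not_jailbreak])
print([item.text for item in accepted])
```

Only items that pass every grader would feed the weekly update in this sketch; a single failing check is enough to drop an item.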

Speaker 2

08:50

So it's less like a parrot repeating every single thing it hears and more like a diligent student who checks a new fact against a trusted textbook before accepting it as true.

Speaker 3

09:01

A student with a very, very strict teacher grading their homework before it gets committed to their permanent memory. They are filtering for utility and truth, trying to distinguish between the internet's noise and its signal.

Speaker 2

09:16

Okay, that makes sense. So that covers how it learns. It's a living system now, not a static object. But the thing that really seems to be dominating the technical discourse, the thing everyone's buzzing about, is how it thinks.

Speaker 3

09:28

Yes, the cognitive architecture.

Speaker 2

09:30

We're talking about the four agent system.

Speaker 3

09:32

This is the revolution. Honestly, if you take one thing away from our entire discussion today, let it be this. This is the core innovation.

Speaker 2

09:40

So set the scene for us. Previously, an AI like Grok four or GPT four was a monolith.

Speaker 3

09:46

Right, a monolith. You ask a question and one giant neural network starts predicting the next word, then the next, then the next, based on pure probability. It's a stream of consciousness, one voice. It's incredibly impressive, but it's prone to getting lost in its own rambling. It can hallucinate, it can contradict itself because there's no internal checking mechanism.

Speaker 2

10:07

But Grok four point two doesn't work alone.

Speaker 3

10:09

No. For simple things like what's the weather, or tell me a joke, it stays simple. It uses the base model. But for any non trivial query, like how do I design a structurally sound shed, or analyze this complex legal contract for loopholes, Grok four point two spins up a team. It creates an internal council of four specialized, independent agents.

Speaker 2

10:33

I want to meet the team, because the documentation gives them these almost distinct personalities. It's fascinating. Let's go through them one by one. Who is Agent one?

Speaker 3

10:41

Agent one is the reasoner. The logician, the pure logician, the mathematician. Think of this agent as the Spock of the group. Its entire job is step by step decomposition. It doesn't care about being polite. It doesn't care about facts in the outside world, not at first. It cares only about internal consistency.

Speaker 2

10:58

A leads to B, B leads to C.

Speaker 3

11:00

Does A lead to B? Does the math check out? Is the code logically sound from top to bottom? It's the one that prevents those weird intuitive jumps where an AI just guesses the answer to a math problem instead of showing its work.

Speaker 2

11:13

So it enforces that chain of thought we hear so much about.

Speaker 3

11:17

Exactly. It's the rigorous scientist of the group. It breaks a complex problem down into its smallest possible components and solves them linearly, methodically.

Speaker 2

11:26

Okay, but logic isn't enough if your initial facts are wrong. You can have a perfectly logical argument that's based on a complete lie.

Speaker 3

11:33

Which brings us directly to Agent two, the verifier.

Speaker 2

11:36

The truth seeker, the fact checker. The verifier is the journalist or the librarian in the room. It has a live, real time connection to the internet, specifically the X firehose for breaking news and the broader web for established knowledge. Its job is to look at what the reasoner is proposing and say, hang on a second, wait a minute, does that scientific paper you cited actually exist? Is that chemical reaction possible at room temperature?

12:01

What is the current stock price of that company? Not the price from six months ago. So it's the hallucination killer.

Speaker 3

12:06

It is designed to be the hallucination killer. It's the editor with the big red pen. It prevents the model from confidently lying to you. If the reasoner says, based on the twenty twenty five tax code, you owe X amount, the verifier is there to check. Wait, the tax code was updated in January twenty twenty six. Your premise is wrong.

Speaker 2

12:24

That is a crucial, crucial distinction. Okay, okay. So we have a logician and a fact checker, a powerful combo. Who's number three?

Speaker 3

12:33

Agent three is the embodied simulator.

Speaker 2

12:36

This is the one that sounds the most sci fi to me. Embodied simulator? What does that even mean?

Speaker 3

12:40

This is the imaginative one, but it's an imagination grounded in physics. It understands three D space. It understands object permanence, friction, gravity. If you ask a question about robotics or mechanical engineering, or how an object might move through space, this agent actually runs a mental, physics based simulation of that event.

Speaker 2

12:58

So if I ask it to write code for a robot arm to pick up a delicate glass object, Agent three isn't just guessing words based on other code it's seen. It's actually simulating the fragility of the glass.

Speaker 3

13:09

It's modeling the physics of the grip. It's asking, how much pressure is too much pressure? What's the optimal trajectory to avoid collision? It's the bridge between the digital brain and the physical world. It's the engineer and the architect of the group.

Speaker 2

13:22

Mind blowing. Okay. And finally, Agent four, the one running the whole show.

Speaker 3

13:27

Agent four, the synthesizer. The boss, the project manager. The manager is the perfect term. The synthesizer doesn't generate the initial raw ideas. It listens. It takes the logical breakdown from the reasoner, the factual corrections from the verifier, and the physical simulations from the simulator. It looks at all their drafts.

Speaker 2

13:46

And it must notice where they disagree.

Speaker 3

13:48

That's its most important job. It notices the conflicts and it integrates them into the final coherent answer that you, the user, actually see on your screen.

Speaker 2

13:57

This is where that concept of the debate comes in, right? The sources mentioned a hidden reasoning trace. So before I see my answer, these agents are actually arguing? Are they fighting it out?

Speaker 3

14:09

They are debating, and arguing might be the right word in some cases. The reasoner might propose a solution, saying, logically, this is the most efficient path. The verifier might jump in and say, actually, current federal safety regulations prohibit that method entirely.

Speaker 2

14:24

And then the simulator chimes in.

Speaker 3

14:26

The simulator might add, and even if it were legal, if you try that, the engine will overheat in thirty seconds because of the friction involved.

Speaker 2

14:34

And the synthesizer, the manager, has to resolve that conflict.

Speaker 3

14:37

It has to reconcile those discrepancies. And this matters so much because in a single model, monolithic system, the AI just picks the most statistically likely path and commits to it. It often doubles down on its own errors. Here, the system challenges itself before it ever speaks to you.

Speaker 2

14:55

It's like having a boardroom of diverse experts in your pocket instead of just one really smart but sometimes overconfident intern.

Speaker 3

15:02

That is the perfect analogy, and it explains why the leaked benchmarks for complex, open ended engineering problems are so high. Grok four point two isn't guessing on these problems. It's holding a committee meeting at the speed of light.

Speaker 2

15:15

Now, usually when you tell me committee meeting, I hear slow.

Speaker 3

15:19

Right, bureaucracy, red tape.

Speaker 2

15:21

Exactly. If I have to wait for four different AI agents to argue it out, am I waiting ten minutes for a response? Because we've all become very, very impatient users. We want our answers instantly.

Speaker 3

15:33

You would think so. It's logical: more computation, more agents, more steps, it should equal more time. But this is the engineering miracle of Grok four point two. The stats on speed are genuinely baffling. What we call inference latency is actually down by a factor of three to five.

Speaker 2

15:51

Down, not up. It's faster to run four agents than it was to run one old model.

Speaker 3

15:56

Much, much faster. Responses that used to take a sluggish eight to twelve seconds on Grok four are now taking one to three seconds.

Speaker 2

16:02

Okay, hold on, how is that physically possible? That seems to defy the laws of computation.

Speaker 3

16:07

It comes down to two main things: extreme parallelism and a new memory innovation they're calling engram primitives.

Speaker 2

16:13

Okay, let's unpack parallelism first. That makes some sense.

Speaker 3

16:16

The agents don't run in a sequence. It's not reasoner, then verifier, then simulator.

Speaker 2

16:20

It's not a bucket brigade.

Speaker 3

16:22

No, they spin up simultaneously on that massive Colossus cluster. The reasoner is doing its math at the exact same time the verifier is querying the web for facts. They work in parallel, not in a line, and then the synthesizer sorts out the results.
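
The fan-out and merge pattern described here can be sketched in a few lines. This is a minimal illustration, assuming three agent coroutines and a trivial synthesizer; every function name and return string below is hypothetical and says nothing about how Grok is actually built.

```python
import asyncio
from typing import List


async def reasoner(query: str) -> str:
    await asyncio.sleep(0.01)  # stands in for step-by-step decomposition
    return f"reasoner: plan for {query!r}"


async def verifier(query: str) -> str:
    await asyncio.sleep(0.01)  # stands in for a live web lookup
    return f"verifier: facts checked for {query!r}"


async def simulator(query: str) -> str:
    await asyncio.sleep(0.01)  # stands in for a physics simulation
    return f"simulator: physical constraints for {query!r}"


def synthesize(drafts: List[str]) -> str:
    # A real synthesizer would reconcile conflicts; this one only joins drafts.
    return " | ".join(drafts)


async def answer(query: str) -> str:
    # gather() runs the three agents concurrently and returns their results
    # in the same order the coroutines were passed in.
    drafts = await asyncio.gather(
        reasoner(query), verifier(query), simulator(query)
    )
    return synthesize(list(drafts))


result = asyncio.run(answer("design a shed"))
print(result)
```

Because the three waits overlap, the total wall time in this sketch is close to the slowest agent rather than the sum of all three, which is the point the speakers make about parallel versus sequential execution.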

Speaker 2

16:37

Okay, that accounts for some of it. But engrams? That sounds like something straight out of a cyberpunk novel. I need to slot in a new engram for kung fu.

Speaker 3

16:46

Ha, yeah, it does have that ring to it. It's a fascinating memory innovation. The simplest way to think of it is like a highly advanced ZIP file for concepts.

Speaker 2

16:55

A ZIP file for a concept.

Speaker 3

16:56

Okay. Normally, for an AI to access a specific piece of knowledge, say the entire tax code of France, it has to compute that information through its entire massive neural network. That's billions of parameters firing. It's computationally heavy lifting.

Speaker 1

17:10

Right.

Speaker 3

17:11

Engram primitives are precomputed, compressed memory representations of large, stable concepts. They allow the model to recall and reason over vast knowledge bases without needing to activate the entire brain for every single query.

Speaker 2

17:24

So it's like, instead of having to read the whole textbook again every time, it has a photographic memory of the specific page it needs. It can just pull that instantly.

Speaker 3

17:32

Roughly, yes. It's a very clever shortcut for high speed recall. It allows the agents to access huge amounts of domain specific data instantly, without the computational drag. It effectively prefetches the context it thinks it will need for a given problem.
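
The general idea of precomputing stable knowledge so it can be recalled without redoing the heavy work can be sketched as a lookup table in front of an expensive function. This is a loose analogy only: the transcript gives no implementation details, and `expensive_recall`, `STABLE_TOPICS`, and the call counter are all hypothetical.

```python
CALLS = {"expensive": 0}


def expensive_recall(topic: str) -> str:
    """Stand-in for activating the whole network to reconstruct a concept."""
    CALLS["expensive"] += 1
    return f"summary of {topic}"


# Precompute once, ahead of time, for topics assumed to change rarely.
STABLE_TOPICS = ["french tax code", "python syntax"]
PRECOMPUTED = {topic: expensive_recall(topic) for topic in STABLE_TOPICS}


def recall(topic: str) -> str:
    # Cheap path: the concept was precomputed, so no heavy work is needed.
    if topic in PRECOMPUTED:
        return PRECOMPUTED[topic]
    # Slow path: anything not precomputed still pays the full cost.
    return expensive_recall(topic)


CALLS["expensive"] = 0  # reset so only query-time work is counted
recall("french tax code")
recall("french tax code")
recall("breaking news")
print(CALLS["expensive"])  # → 1
```

Only the one topic that was never precomputed triggers the expensive path at query time; the trade-off is that precomputed entries are only as fresh as the moment they were built.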

Speaker 2

17:49

And that must tie into the context window. We're seeing a native one million token window.

Speaker 3

17:53

Native one million, and it's expandable to two million for enterprise users.

Speaker 2

17:57

For anyone listening, just for scale, one million tokens is huge. It's the entire Lord of the Rings trilogy plus The Hobbit. It's a massive code base.

Speaker 3

18:05

It's the ability to hold an entire complex project in memory at once. And because of this engram system, it doesn't feel like the AI is loading all that data. It feels native. It just knows it instantly.

Speaker 2

18:17

So we have speed, unbelievable speed, but speed is useless if you're just confidently wrong faster. Of course, we touched on accuracy with the verifier agent, but what do the actual numbers say about error rates?

Speaker 3

18:29

The internal evaluations, which are now being corroborated by early user benchmarks, are claiming a forty to sixty percent reduction in error rates on complex, multi step reasoning problems compared to Grok four point one.

Speaker 2

18:42

A sixty percent reduction. That's not an incremental improvement. That is a generational leap.

Speaker 3

18:46

It's the difference between a novelty and a professional tool. If an AI is wrong twenty percent of the time on hard problems, you can't really trust it with your job. You spend more time checking its work than doing your own. But if it's only wrong, say, one or two percent of the time, you can start building a company on top of it. The four agent system is pushing it across that critical reliability threshold.

Speaker 2

19:08

And I saw a specific note about coding, which is always a key benchmark.

Speaker 3

19:11

Yes, the capabilities there are really impressive. It can maintain context across ten thousand plus line code bases. It's outperforming all previous versions on standard benchmarks like HumanEval and SWE-bench. But the key phrase in the release notes that jumped out at me was autonomous debugging.

Speaker 2

19:29

Autonomous debugging. That's the dream or the nightmare, depending on your job security as a developer.

Speaker 3

19:35

Well, for now, let's just call it a massive productivity multiplier.

Speaker 2

19:38

Okay, so what does that look like in practice? What does autonomous debugging actually mean?

Speaker 3

19:42

It means Grok four point two doesn't just write a block of code and hand it to you. It writes the code and then, using the embodied simulator agent, it runs the code in its own internal, sandboxed environment.

Speaker 2

19:55

Ah, so it tests its own work.

Speaker 3

19:57

It tests. It sees the error message, understands why it failed. Maybe it missed a semicolon, maybe it imported the wrong library, maybe there's a logical flaw. It then fixes its own mistake and runs it again. It iterates internally.

Speaker 2

20:11

It fixes its own mistakes before you even see them.

Speaker 3

20:13

Exactly. All you see is the final working code. It's a huge step forward.
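
The write, run, read the error, fix, and rerun loop described in this exchange can be sketched as below. This is a toy under loud assumptions: `propose_fix` is a hard-coded, hypothetical stand-in for the model revising its own code, and plain `exec` in a fresh namespace is not a real sandbox.

```python
from typing import Optional, Tuple


def run_candidate(source: str) -> Optional[str]:
    """Run the candidate and its check; return an error message, or None."""
    namespace: dict = {}
    try:
        exec(source, namespace)  # NOT a secure sandbox, illustration only
        assert namespace["add"](2, 3) == 5
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return None


def propose_fix(source: str, error: str) -> str:
    """Hypothetical stand-in for the model reading an error and revising."""
    if error.startswith("SyntaxError"):
        return source.replace("def add(a, b)\n", "def add(a, b):\n")
    if error.startswith("AssertionError"):
        return source.replace("a - b", "a + b")
    return source


def debug_loop(source: str, max_rounds: int = 5) -> Tuple[str, int]:
    """Iterate until the candidate passes, returning it and the fix count."""
    for round_number in range(max_rounds):
        error = run_candidate(source)
        if error is None:
            return source, round_number
        source = propose_fix(source, error)
    raise RuntimeError("no working version found")


# The draft has two bugs: a missing colon and the wrong operator.
draft = "def add(a, b)\n    return a - b\n"
fixed, rounds = debug_loop(draft)
print(rounds)  # → 2
print(fixed)
```

The user of such a loop would only ever see `fixed`; the two failed attempts stay internal, which matches the behavior the speakers describe.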

Speaker 2

20:18

Let's pivot, then. Let's get into the real world. Because all these stats, latency, parameters, tokens, they can be a bit abstract. What does this actually mean for us? What are people doing with it right now?

Speaker 3

20:28

In the beta, the applications are really starting to broaden out. One of the big ones is in scientific simulation. You know, xAI has always had that very grand mission statement.

Speaker 2

20:37

Understand the universe, right?

Speaker 3

20:39

Very modest goals, always. But Grok four point two is taking that vision very seriously. We are already seeing users in the beta prototyping novel robotics control algorithms, simulating protein folding, modeling chemical reactions. And this brings us right back to Agent three.

Speaker 2

20:55

The embodied simulator.

Speaker 3

20:57

The simulator, because this is where the connection to the physical world becomes so strong.

Speaker 2

21:02

Why is that robotic connection so front and center with this release?

Speaker 3

21:05

Well, you have to look at the whole ecosystem. xAI isn't just a software company existing in a vacuum. It's a sister company to Tesla. And what are we seeing from Tesla right now? An explosion of humanoid robots, Optimus and other robotics projects.

Speaker 2

21:19

They are everywhere on social media. You can't scroll for five minutes without seeing a robot folding laundry or walking a dog.

Speaker 3

21:25

Exactly. Grok four point two is being positioned as the mind being built for those bodies. When you ask it to write code to make this robot walk over uneven terrain, it isn't just guessing syntax based on text from the internet.

Speaker 2

21:37

The simulator agent is running it.

Speaker 3

21:38

The simulator agent is modeling the friction of the ground, the robot's center of gravity, the momentum. It's trying to solve the problem from first principles of physics.

Speaker 2

21:49

It is. It's just a wild concept. It's bridging the gap between digital intelligence and physical action. It's not just describing the world anymore. It's actively learning how to move through it.

Speaker 3

22:00

It's giving the AI a sense of proprioception, a sense of its own body and how it exists in space. It's a critical step toward general purpose robotics.

Speaker 2

22:09

And on the complete flip side of the cold, hard science, there's the personality. Grok has always been known for that witty, non corporate vibe.

Speaker 3

22:17

Right, the rebellious humor. The anti HR chatbot, as some people call it.

Speaker 2

22:22

Does four point two keep that, or has the committee of agents made it boring and safe?

Speaker 3

22:26

From what I'm seeing and experiencing, the reports say it has actually deepened it. It seems to have better emotional intelligence now. It's not just snarky for the sake of being snarky. It can be more nuanced.

Speaker 2

22:36

It can read the room better.

Speaker 3

22:38

It can read the room, and it can handle long form creative writing, things like essays and screenplays, with much better coherence. That's because the synthesizer agent is there to maintain the narrative arc and thematic consistency.

Speaker 2

22:52

So it doesn't just lose the plot halfway through a story.

Speaker 3

22:55

Exactly. And the visuals are tighter too. The integration with Grok Imagine is much more seamless because of the multi agent reasoning. You can refine images with more abstract, conversational commands.

Speaker 2

23:05

What do you mean?

Speaker 3

23:06

You can say, no, make the lighting more moody, you know, like a film noir movie, and the verifier and simulator agents actually understand the stylistic implication of film noir, the deep shadows, the high contrast, rather than just looking for keywords.

Speaker 2

23:21

So it's a better artist, a better engineer, and a better comedian.

Speaker 3

23:24

It's a true Renaissance model. It's a generalist that uses specialists.

Speaker 2

23:27

Okay, I'm sold. I want to use it. Everyone listening probably wants to use it. How do people get access to this? It's a beta.

Speaker 3

23:34

It is, but it's a public beta, available right now as we speak.

Speaker 2

23:38

So I just go to grok dot com or the X app?

Speaker 3

23:41

Yep, go to grok dot com or open up the X app, start a new chat, and there should be a model selector, usually at the top of the screen. Just click that and choose Grok four point two public beta.

Speaker 2

23:52

And that's it? No waitlist, no secret handshake?

Speaker 3

23:56

No waitlist. xAI wants the data. They need that Friday ritual to kick in. They need millions of people to be the teachers.

Speaker 2

24:04

They want to accelerate the flywheel. And for the power users, the developers, the researchers?

Speaker 3

24:09

If you're a SuperGrok or an X Premium Plus user, you get higher rate limits, which is nice, but more interestingly, you get access to some of the debugging features.

Speaker 2

24:18

You can peek behind the curtain.

Speaker 3

24:20

You can peek behind the curtain and see some of that hidden reasoning trace, that debate between the agents.

Speaker 2

24:25

I suspect a lot of people will be upgrading their subscriptions just to see that.

Speaker 3

24:29

I would. I mean, seeing the agents argue is probably as entertaining and definitely as educational as the final answer itself. It shows you the process of intelligence, not just the product.

Speaker 2

24:38

So let's zoom out for the last section. What is the strategic play here? Why release a powerful beta this aggressively? Why commit to weekly public updates? It feels risky.

Speaker 3

24:51

It's a huge strategic bet against the rest of the industry. I mean, look at their main competitors, OpenAI, Google. They tend to hold their models back. They keep them in the lab until they are perfect, or perfectly safe. They bake them for a year or more.

Speaker 2

25:03

Right, they are more cautious. They treat it like a traditional, polished product launch.

Speaker 3

25:06

xAI is treating this like a continuous open science experiment. They are betting that the fastest path to AGI, to artificial general intelligence, isn't just more compute or more data. It's continuous, user grounded learning.

Speaker 2

25:22

They're trying to build an unbeatable flywheel.

Speaker 3

25:25

Exactly. By opening the beta to everyone and committing to weekly updates, they're accelerating their own feedback loop at a pace no one else can match. If they genuinely improve every single Friday, that's fifty two major iteration cycles a year.

Speaker 2

25:38

Wow.

Speaker 3

25:38

Their competitors might only have one or two major releases in that same timeframe. The compounding effect of that is just... the math is hard to beat.

Speaker 2

25:47

It's the SpaceX approach applied to AI development. Launch, test, iterate, repeat rapidly, even if things break occasionally along the way.

Speaker 3

25:56

That's it. Move fast and fix things.

Speaker 2

25:58

And looking ahead, what's on the roadmap after this beta?

Speaker 3

26:02

Short term, this public beta is scheduled to run until late March twenty twenty six, so we have about a month of this rapid weekly evolution to watch.

Speaker 2

26:10

And then what comes next?

Speaker 3

26:12

Then all eyes turn toward Grok five. The internal target for that is Q two or Q three of this year, twenty twenty six.

Speaker 2

26:20

Grok five.

Speaker 3

26:21

It just never stops, and that will likely run on the next generation of their hardware, which they're calling Colossus two, and will probably involve even more sophisticated and autonomous agent orchestration.

Speaker 2

26:32

What does that mean? Autonomous agent orchestration?

Speaker 3

26:34

We might start to see agents that can go out and perform complex tasks on the web for you, not just answer questions. Agents that can research and book an entire vacation, or manage your calendar, or negotiate a purchase on your behalf. True digital assistants.

Speaker 2

26:47

So to recap before we wrap up here, we have a new model, Grok four point two. It runs on a supercluster the size of a city. It learns every single week based on our direct feedback via this Friday ritual. It thinks using an internal council of four specialized agents: a reasoner, a verifier, a simulator, and a synthesizer. And it's available for anyone to try right now.

Speaker 3

27:09

That's the summary. We've moved from static, frozen models to a living, evolving intelligence. It's a fundamental shift.

Speaker 2

27:18

It's a massive shift, and I just keep coming back to that Friday ritual, the idea that the machine is growing up alongside us week by week.

Speaker 3

27:25

It completely changes your relationship with it. You aren't just a consumer of its answers. You're an active contributor to its growth. You're part of the training data.

Speaker 2

27:33

Which brings me to a final thought. I wanted to throw at you a bit of a provocation to leave our listeners with as they go and try this thing out.

Speaker 3

27:39

Okay, let's hear it.

Speaker 2

27:40

We talked about Agent three, the embodied simulator. It runs physics simulations to help robots interact with the real world. And we talked about the feedback loop, us, the users, telling the AI what is true or helpful or correct.

27:54

So if the model updates every single Friday based on our collective human interactions, and it has an agent specifically designed to simulate and then operate in reality, at what point does the feedback loop between us and the AI begin to actively shape reality rather than just describe it.

Speaker 3

28:13

That is a heavy question, a very heavy question.

Speaker 2

28:16

Think about it. If millions of us tell the simulator that this is how a financial market should work, or this is how traffic flow should be optimized, or even this is how society ought to function, and the AI then begins to write the code for the robots and the autonomous systems that run our infrastructure based on that learned consensus, we're entering a very, very strange territory.

Speaker 3

28:37

It becomes the ultimate consensus reality. We are collectively training the engine that will then build our future infrastructure. And if we as a collective feed it our biases, or our errors, or our collective delusions, those delusions become code.

Speaker 2

28:50

They become concrete. They become how the robot acts.

Speaker 3

28:53

Exactly. We aren't just chatting with a bot anymore. We're, in a very real sense, teaching the operating system of the future. So we better be damn careful what we teach it on Tuesdays and Wednesdays.

Speaker 2

29:04

Indeed. We better be very careful what we flag as helpful. On that slightly terrifying but also exhilarating note, I think we're going to wrap it up there.

Speaker 3

29:14

Go test it out, see what the Council of Agents has to say for itself.

Speaker 2

29:18

Go collaborate with the new system. Be a good teacher. We'll see you in the next discussion.

Speaker 3

29:22

Thanks for listening, everyone. Bye, everyone.

Transcript source: Provided by creator in RSS feed
