Elon Musk Gives AI Updates!!!

Speaker 1

00:00

Welcome to the Grok 4 release here. This is the

Speaker 2

00:03

smartest AI in the world. We're going to show you exactly how and why. It really is remarkable to see the advancement of artificial intelligence, how quickly it is evolving. I sometimes compare it to the growth of a human and how fast a human learns and gains conscious awareness and understanding. And AI is advancing just vastly faster than any human. I mean, we're going to take you through a bunch of benchmarks that Grok 4 is able to achieve incredible numbers on.

Speaker 1

00:39

But it's actually worth noting.

Speaker 2

00:41

That Grok 4, if given the SAT, would get perfect SATs every time, even if it's never seen the questions before. And even going beyond that, to, say, graduate student exams like the GRE, it will get near-perfect results in every discipline of education, from the humanities to languages, math, physics, engineering, pick anything. And we're talking about questions that it's never

01:13

seen before. These are not on the Internet. Grok 4 is smarter than almost all graduate students in all disciplines simultaneously. It's actually just important to appreciate that; that's really something. And the reasoning capabilities of Grok are incredible. There are some people out there who think AI can't reason, and look, it can reason at superhuman levels. So yeah, and frankly, it only

01:44

gets better from here. So we'll take you through the Grok 4 release and show you the pace of progress here. I guess the first part is, in terms of the training, going from Grok 2 to Grok 3 to Grok 4, we essentially increased the training by an order of magnitude in each case, so it's one hundred times more training than Grok 2, and that's only going to increase. So it's, yeah, frankly,

02:13

I mean, I don't know. In some ways a little terrifying, but the growth of intelligence here is remarkable.
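
The scaling claim above is simple compounding: two generations at roughly ten times each gives one hundred times. A minimal sketch of that arithmetic follows. The factor of ten is the speaker's rough figure, and the function name is made up for illustration.

```python
# Illustrative arithmetic only: the 10x-per-generation factor is the speaker's
# rough figure from the talk, not a measured value.
GROWTH_PER_GENERATION = 10


def relative_training_compute(generations_after_grok2: int) -> int:
    """Training compute relative to Grok 2 after the given number of generations."""
    return GROWTH_PER_GENERATION ** generations_after_grok2


for name, steps in [("Grok 2", 0), ("Grok 3", 1), ("Grok 4", 2)]:
    print(f"{name}: {relative_training_compute(steps)}x the Grok 2 training compute")
```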

Speaker 3

02:18

Yes, it's important to realize there are two types of training compute. One is the pre-training compute; that's from Grok 2 to Grok 3. But from Grok 3 to Grok 4 we're actually putting a lot of compute into reasoning, into RL.

Speaker 4

02:32

And just like you said, this is literally the fastest moving field, and Grok 2 is like the high school student by today's standard. If you look back over the last twelve months, Grok 2 was only a concept. We didn't

02:44

even have Grok 2 twelve months ago. And then by training Grok 2, that was the first time we scaled up the pre-training. We realized that if you actually do the data ablation really carefully, and infra, and also the algorithm, we can actually push the pre-training quite a lot, by an amount of 10x, to make the model

03:02

the best pre-trained base model. And that's why we built Colossus, the world's supercomputer, with one hundred thousand H100s. And then with the best pre-trained model, we realized that if you can collect these verifiable outcome rewards, you can actually train this model to start thinking from first principles, to reason and correct its own mistakes, and that's where

03:22

the Grok reasoning comes from. And today we asked the question: what happens if you take the expansion of Colossus, with all two hundred thousand GPUs, and put all of these into RL, 10x more compute than any of the models out there, on reinforcement learning at unprecedented scale?

Speaker 3

03:39

What's going to happen?

Speaker 4

03:41

So this is the story of Grok 4, and, you know, Tony, share some insight with the audience.

Speaker 3

03:47

Yeah, so let's just talk about how smart Grok 4 is. So I guess we can start by discussing this benchmark called Humanity's Last Exam. This benchmark is a very, very challenging benchmark. Every single problem is curated by subject matter experts. It's in total twenty five hundred problems, and it consists of many different subjects: mathematics,

04:11

natural sciences, engineering, and also all of the humanities subjects. So essentially, when it was first released, actually earlier this year, most of the models out there could only get single-digit accuracy on this benchmark. Yeah, so we can look at some of those examples. There is this mathematical problem which is about natural transformations in category theory, and there's this organic chemistry problem that talks about electrocyclic reactions.

04:40

And also there's this linguistics problem that asks you about distinguishing between closed and open syllables from a Hebrew source text. So you can see it's a very wide range of problems, and every single problem is a PhD or even advanced research level problem.

Speaker 2

05:00

Yeah, I mean, there are no humans that can actually answer these and get a good score. I mean, if you ask me, say, for any given human, what's the best that any human could score, I'd say maybe five percent, optimistically. So this is much harder than what any human can do. It's incredibly difficult.

05:21

And you can see from the types of questions, like, you might be incredible in linguistics or mathematics or chemistry or physics or any one of a number of subjects, but you're not going to be at a postgrad level in everything, and Grok 4 is at a postgrad level in everything. Some of these things are just worth repeating: Grok 4 is postgraduate, like PhD level, in everything. Better than PhD; most PhDs would fail. So it's better. That said, I mean, at least with

05:53

respect to academic questions. I want to just emphasize this point. With respect to academic questions, Grok 4 is better than PhD level in every subject, no exceptions. That doesn't mean that it's, you know, at times it may lack common sense, and it has not yet invented new technologies or discovered new physics, but that is just a matter of time. It may discover new technologies as soon as later this year, and I would be shocked if it has not done

06:24

so next year. So I would expect Grok to literally discover new technologies that are actually useful no later than next year, and maybe end of this year. It might discover new physics next year, and within two years, I'd say almost certainly. So just let that sink in.

Speaker 3

06:41

Okay, so I guess we can talk about what's behind the scenes of Grok 4. As Jimmy mentioned, we're actually throwing a lot of compute into this training. When it started, it's only a single-digit number. But as you start putting in more and more training compute, it started to gradually become smarter and smarter and eventually solved a quarter of the HLE problems. And this is without

07:09

any tools. The next thing we did was adding tool capabilities to the model. Unlike Grok 3, I think Grok 3 actually is able to use tools as well, but here we actually make it more native in the sense that we put the tools into training. Grok 3 was only relying on generalization. Here we actually put the tools into training, and it turns out this significantly improves the model's capability of using those tools. So how is

07:37

this different? DeepSearch was exactly the Grok 3 reasoning model without any specific training, but we only asked it to use those tools. So compared to this, it was much weaker in terms of its tool capabilities, and unreliable.

Speaker 2

07:54

Yes, and to be clear, this is still fairly primitive tool use. If you compare it to, say, the tools that are used at Tesla or SpaceX, where you're using finite element analysis and computational fluid dynamics, and you're able to run, say, at Tesla, crash simulations where the simulations are so close to reality that if the test doesn't match the simulation, you assume that the test article

08:21

is wrong. That's how good the simulations are. So Grok is not currently using any of the tools that a company would use, but that is something that we will provide it with later this year, so it will have the tools that a company has and a very accurate physics simulator. Ultimately, the thing that will make the biggest difference is being able to interact with the real world via humanoid robots.

08:43

So you combine Grok with Optimus and it can actually interact with the real world. It can formulate a hypothesis and then confirm if that hypothesis is true or not.

Speaker 1

08:57

So we're really, you know, I think about, like, where we are today.

Speaker 2

09:00

We're at the beginning of an immense intelligence explosion. We're in the intelligence big bang right now, and it's the most interesting time to be alive of any time in history. Now, that said,

Speaker 1

09:13

We need to make sure that the AI is a good AI.

Speaker 2

09:16

The thing that I think is most important for AI safety, at least my biological neural net tells me the most important thing for AI, is to be maximally truth-seeking. You can think of AI as this super genius child that ultimately will outsmart you, but you can still instill the right values and encourage it to be sort of, you know, truthful, honorable, you know, good things, like the values you want to instill

09:41

in a child that would ultimately grow up to be incredibly powerful. Yeah, these are still primitive tools and not the kind of tools that serious commercial companies use. But we will provide it with those tools, and I think it will be able to solve real-world technology problems.

Speaker 3

09:56

Yes, yes, exactly.

Speaker 4

09:58

But is compute all you need? Is compute all you need at this point?

Speaker 2

10:02

Well, you need compute plus the right tools, and then ultimately to be able to interact with the physical world, and then we will effectively have an economy that is ultimately thousands of times bigger than our current economy, or

10:15

maybe millions of times. If you think of civilization as percentage completion of the Kardashev scale, where Kardashev one is using all the energy output of a planet, Kardashev two is using all the energy output of a sun, and three is all the energy output of a galaxy, we're only, in my opinion, probably closer to one percent of Kardashev one than we are to ten percent. So, like, maybe one or two percent of Kardashev one. So we

Speaker 1

10:45

will get to most of the way, like

Speaker 2

10:48

eighty, ninety percent of Kardashev one, and then hopefully, if civilization doesn't self-annihilate, the actual notion of a human economy, assuming civilization continues to progress, will seem very quaint in retrospect. It will seem like a sort of cavemen-throwing-sticks-into-a-fire level of economy compared to what the future

11:07

will hold. It's very exciting. I've been at times kind of worried about, like, well, you know, this seems like it's somewhat unnerving to have intelligence created that is far greater than our own, and will this be bad or good for humanity?

Speaker 1

11:26

I think it'll be good. Most likely it'll be good.

Speaker 2

11:29

But I somewhat reconciled myself to the fact that even if it wasn't going to be good, I'd at least like to be alive to see it happen.

Speaker 3

11:36

So yeah, I think a technical problem that we still need to solve besides just compute is how do we unblock the data bottleneck, because when we try to scale up the RL in this case, we did invent a lot of new techniques and innovations to allow us to figure out how to find a lot of challenging RL

11:59

problems to work on. It's not just that the problem itself needs to be challenging, but you also need to have a reliable signal to tell the model you did it wrong, you did it right. This is sort of the principle of reinforcement learning, and as the model gets smarter and smarter, the number of cool problems or challenging problems will be less and less. So it's going to be a new type of challenge that we need to surpass besides just compute.
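
The idea of a "reliable signal" can be sketched as a reward function that checks a model's answer against a known-correct reference. This is a toy illustration, not xAI's training code: the `Problem` type, the exact-match check, and the `is_useful_for_training` filter are all assumptions made up for this example. The filter shows one plausible reading of why good problems become scarce: a problem the model always solves (or never solves) gives no contrast to learn from.

```python
from dataclasses import dataclass


@dataclass
class Problem:
    prompt: str
    reference_answer: str  # known-correct answer used as the checking signal


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting does not matter."""
    return " ".join(text.strip().lower().split())


def outcome_reward(problem: Problem, model_answer: str) -> float:
    """Verifiable outcome reward: 1.0 if the final answer matches, else 0.0."""
    matches = normalize(model_answer) == normalize(problem.reference_answer)
    return 1.0 if matches else 0.0


def is_useful_for_training(rewards: list[float]) -> bool:
    """Hypothetical filter: keep problems the model solves sometimes, not always."""
    solve_rate = sum(rewards) / len(rewards)
    return 0.0 < solve_rate < 1.0


# Toy usage: three sampled answers to one problem.
problem = Problem(prompt="What is 12 * 12?", reference_answer="144")
rewards = [outcome_reward(problem, answer) for answer in ["144", " 144 ", "124"]]
print(rewards, is_useful_for_training(rewards))
```

Exact string matching only works for problems with a single short answer; anything open-ended would need a different checker.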

Speaker 2

12:26

Yeah, we actually are running out of actual test questions to ask. So even questions that are ridiculously hard, if not essentially impossible, for humans, that are written-down questions, are becoming trivial for AI. You know, the one thing that is an excellent judge of things is reality. So because physics is the law, ultimately everything else is a recommendation.

Speaker 1

12:50

You can't break physics.

Speaker 2

12:51

So the ultimate test, I think for whether an AI is the ultimate reasoning test is reality. So you invent a new technology, like say, improve the design of a.

Speaker 1

13:02

Car or a rocket, or create a new medication. Does it work?

Speaker 2

13:07

Does the rocket get to orbit? Does the car drive? Does the medicine work? Whatever the case may be, reality is the ultimate judge here. So it's going to be reinforcement learning closing the loop around reality.

Speaker 3

13:19

We asked the question, how do we even go further? So actually we are thinking about, now with a single agent, we're able to solve forty percent of the problems. What if we have multiple agents running at the same time? So this is what's called test-time compute. And as we scale up the test-time compute, actually we are able to solve more than fifty percent of the text-only subset of the HLE problems. So it's a remarkable achievement. I think this is insanely difficult.

Speaker 2

13:49

What we're saying is, a majority of the text-based part of, you know, the scarily named Humanity's Last Exam, Grok 4 can solve. You can try it out for yourself with Grok 4 Heavy. What it does is it spawns multiple agents in parallel, and all of those agents do work independently, and then they compare their work and they

Speaker 1

14:09

decide which one. It's like a study group.

Speaker 2

14:12

It's not as simple as a majority vote, because often only one of the agents actually figures out the trick or figures out the solution. But once they share the trick or figure out what the real nature of the problem is, they share that solution with the other agents, and then they compare notes and yield an answer.

14:32

So that's the Heavy part of Grok 4: you scale up the test-time compute by roughly an order of magnitude, have multiple agents tackle the task, and then they compare their work and they put forward

Speaker 1

14:46

What they think is the best result.
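
The "study group" description above can be sketched with toy agents: they work independently in a first round, share any insight they found, then retry with the shared notes before an answer is chosen. This is only an illustration of why sharing beats a plain majority vote. The agent functions, the two-round structure, and the final tally are assumptions for this example, not a description of how Grok 4 Heavy is implemented.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional

# A hypothetical agent takes the task plus notes shared by others and returns
# (answer, optional note describing a trick it discovered).
Agent = Callable[[str, list[str]], tuple[str, Optional[str]]]


def run_round(agents: list[Agent], task: str, notes: list[str]):
    """Run every agent in parallel on the same task with the same shared notes."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda agent: agent(task, notes), agents))


def solve_with_study_group(agents: list[Agent], task: str) -> str:
    first = run_round(agents, task, [])                     # independent work
    shared = [note for _, note in first if note is not None]
    second = run_round(agents, task, shared)                # retry with shared notes
    tally = Counter(answer for answer, _ in second)
    return tally.most_common(1)[0][0]


def insightful(task: str, notes: list[str]):
    return "42", "trick: use symmetry"   # the one agent that spots the trick


def ordinary(task: str, notes: list[str]):
    if any("trick" in note for note in notes):
        return "42", None                # can follow the trick once it is shared
    return "7", None                     # otherwise lands on a wrong answer


agents = [insightful, ordinary, ordinary]
first_answers = Counter(answer for answer, _ in run_round(agents, "toy task", []))
first_round_majority = first_answers.most_common(1)[0][0]
final_answer = solve_with_study_group(agents, "toy task")
print(first_round_majority, final_answer)
```

In the toy run, a vote taken after the first round would pick the wrong answer because two of three agents missed the trick; after the notes are shared, all three converge on the insightful agent's answer.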

Speaker 3

14:48

Yeah, so we will introduce Grok 4 and Grok 4 Heavy. Sorry, can you click to the next slide? Yeah, so basically Grok 4 is the single-agent version, and Grok 4 Heavy is the multi-agent version. So let's take a look at how they actually do on those exam problems and also some real-life problems.

Speaker 5

15:09

Yeah. So we're going to start out here and we're actually going to look at one of those HLE problems. This is actually one of the easier math ones. I don't really understand it very well. I'm not that smart, but I can launch this job here and we can actually see how it's going to go through and start to think about this problem. While we're doing that, I also want to show a little bit more about what this model can do and launch a Grok 4 Heavy

15:30

as well. So everyone knows Polymarket. It's extremely interesting. It aligns with what reality is most of the time, and with Grok, what we're actually looking at is being able to see how we can try to take these markets and see if we can predict the future as well. So as we're letting this run, we'll see how Grok 4 Heavy goes about predicting the World Series odds for the

15:53

current teams. And while we're waiting for these to process, we're going to pass it over to Eric and he's going to show you an example of his.

Speaker 6

15:59

Yeah, so I guess one of the coolest things about Grok 4 is its ability to understand the world and to solve hard problems by leveraging tools, like Tony discussed, and I think one kind of cool example of this: we asked it to generate a visualization of two black holes colliding. In many cases it's actually pretty clear in its thinking trace about what these liberties are. For example, in order for it to actually be visible, you need to really exaggerate

16:28

the scale of the waves. And yeah, so here's, you know, this kind of in action. It exaggerates the scale in multiple ways. It drops off less in terms of amplitude over distance, but we can see the basic effects that are actually correct. It starts with the inspiral, it merges, and then you have the ringdown. This is basically largely correct, modulo some of the simplifications that it needs

16:56

to do. It's actually quite explicit about this. It uses post-Newtonian approximations instead of actually computing the general relativistic effects near the center of the black hole, which is incorrect and, you know, will lead to some incorrect results. But the overall visualization is, yeah, it's basically there, and you can actually look at the kinds of resources that it references. So here it actually,

17:23

you know, it obviously uses search. It gathers results from a bunch of links, but also reads through an undergraduate text on analytic gravitational wave models. It reasons quite a bit about the actual constants that it should use for a realistic simulation. It references existing real-world data. It's a pretty good model.

Speaker 1

17:45

Going forward.

Speaker 2

17:45

We can give it the same model that physicists use, so it can run the same level of compute that leading physics researchers are using and give you a physics-accurate black hole simulation.

Speaker 5

17:56

Just, right now, this is running in your browser.

Speaker 1

17:58

This is just running in your browser. Pretty simple.

Speaker 5

18:00

Swapping back real quick. Here we can actually take a look. The math problem is finished. Let's look at its thinking trace here so you can see how it went through the problem. I'll be honest with you guys, I really don't quite fully understand the math.

18:14

But what I do know is that I looked at the answer ahead of time, and it did come to the correct answer here in the final part. We can also come in and actually take a look here at our World Series prediction, and it's still thinking through on this one, but we can actually try some other

Speaker 1

18:27

stuff as well.

Speaker 5

18:28

So we worked very heavily on working with all of our X tools and building out a really great X experience, so we can actually ask, you know, the model, you know, find me the xAI employee that has the weirdest profile photo. And then we can actually try out, you know, let's create a timeline based on X posts detailing the, you know, changes in the scores over time, and we can see, you know, all the conversation that was taking place at that time as well, so we

18:53

can see who was, you know, announcing scores and what the reactions were at those times as well. If we go back, this was the Greg Yang photo here. So Greg Yang, of course, has his favorite photograph that he has on his account. That's actually not how he looks in real life,

Speaker 2

19:09

by the way. But it had to understand that question, yeah, which is, that's the wild part.

Speaker 1

19:13

It is like it understands what is a weird photo? What is a weird photo?

Speaker 7

19:18

Yeah?

Speaker 1

19:18

What is a less or more weird photo?

Speaker 5

19:21

It goes through, it has to find all the team members, has to figure out who we all are, right, you know.

Speaker 2

19:25

It searches without access to the internal xAI personnel logs, literally just looking at the internet. Exactly. So you could say, like, the weirdest of any company.

Speaker 5

19:34

Yeah, and we can also take a look here at the question here for Humanity's Last Exam. So it is still researching all of the historical scores, but it will have that final answer here soon. While it's finishing up, we can take a look at one of the ones that we set up here a second ago, and we can see, like, you know, it finds the date that

19:51

Dan Hendrycks had initially announced it. We can go through, we can see, you know, OpenAI announcing their score back in February, and we can see, you know, progress happens with, like, Gemini. We can see, like, Kimi, and we can also even see, you know, the leaked benchmarks of what people are saying is, you know, if it's right, it's going to be pretty impressive.

Speaker 1

20:10

So pretty cool.

Speaker 3

20:11

But yeah, it's great.

Speaker 2

20:14

Yeah, we're going to close the loop around usefulness as well, so it's not just book smart, but actually practically smart. Exactly.

Speaker 5

20:22

And we can go back to the slides.

Speaker 3

20:23

Here, so we actually evaluate also on the multimodal subset. So on the full set, this is the number on the HLE exam. You can see there's a little dip in the numbers. This is actually something we're improving on, which is the multimodal understanding capabilities. But I do believe in a very short time we'll be able to really improve and get much higher numbers on this benchmark.

Speaker 2

20:51

The biggest weakness of Grok currently is that it's sort of partially blind. Its image understanding, obviously, and its image generation need to be a lot better, and that's actually being trained right now. Grok 4 is based on version six of our foundation model. We are training version seven, which will complete in a few weeks. That'll address the weakness on the vision side.

Speaker 5

21:15

Just to show off this last one here: so the prediction market finished here with the Heavy, and we can see all the tools and the process it used to actually go through and find the right answer. It browsed a lot of odds sites. It calculated its own odds, comparing to the market, to find

21:32

its own alpha and edge. It walks you through the entire process here, and it calculates the odds of the winner being, like, the Dodgers, and it gives them a twenty one point six percent chance of winning this year. And it took approximately four and a half minutes to compute.
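
Comparing a model's own probability to the market's to find an edge can be sketched as below. The 21.6 percent figure is the one quoted in the demo; the market odds are a made-up placeholder, since the talk does not state them, and the function names are hypothetical.

```python
def implied_probability(decimal_odds: float) -> float:
    """Probability implied by decimal odds, ignoring any bookmaker margin."""
    return 1.0 / decimal_odds


def edge(model_probability: float, market_probability: float) -> float:
    """Positive when the model thinks the outcome is likelier than the market does."""
    return model_probability - market_probability


MODEL_PROBABILITY = 0.216        # Dodgers figure quoted in the demo
HYPOTHETICAL_DECIMAL_ODDS = 5.0  # placeholder, NOT from the talk (implies 20%)

market_probability = implied_probability(HYPOTHETICAL_DECIMAL_ODDS)
model_edge = edge(MODEL_PROBABILITY, market_probability)
print(f"market: {market_probability:.1%}, model: {MODEL_PROBABILITY:.1%}, edge: {model_edge:+.1%}")
```

Real odds sites build a margin into their prices, so implied probabilities across all teams sum to more than one; a careful comparison would normalize them first.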

Speaker 1

21:51

That's a lot of thinking.

Speaker 3

21:52

We can also look at all the other benchmarks besides HLE. As it turned out, Grok 4 excelled on all the benchmarks that people usually test on, including GPQA, which is a PhD-level problem set that's easier compared to HLE. On AIME 25, the American Invitational Mathematics Examination, with Grok 4 Heavy we actually got a perfect score. Also on some of the coding benchmarks, called LiveCodeBench, and also on HMMT, the Harvard-MIT math exam, and also USAMO.

22:26

You can see actually on all of those benchmarks we often have a very large lead over the second-best model out there.

Speaker 2

22:35

Yes, we're really going to get to the point where it's going to get every answer right in every exam, and where it doesn't get an answer right, it's going to tell you what's wrong with the question, or if the question is ambiguous, disambiguate the question into answers A, B and C and tell you what answers A, B and C would be with a disambiguated question. So the only real test then will be reality. Can it make

22:57

useful technologies? Can it discover new science? That'll actually be the only thing left, because human tests will simply not be meaningful.

Speaker 4

23:06

We need to make an update to HLE very soon, given the current rate of progress. So yeah, it's super cool to see multiple agents that collaborate with each other solving really challenging problems. So where can we try this model?

23:18

It's available right now. If we advance to the next slide, there is a SuperGrok Heavy tier that we're introducing, where you're able to access both Grok 4 and Grok 4 Heavy, where you're actually going to be the taskmaster of a bunch of little Grok research agents to help you become smarter through all the little research, and save hours of time going through mundane tasks. And it's available right now.

Speaker 2

23:40

So yeah, we did limit usage during the demo so it didn't break the demo, because all the stuff is happening live, so there's nothing canned about the tests that we're doing. After the demo's done, we'll enable more subscribers for SuperGrok. So if you can't subscribe right now, just try in half an hour.

Speaker 1

23:57

Should work.

Speaker 8

23:58

So many of you have been enjoying our voice mode, and we've been working hard to improve the experience over the past couple of months. We have cut latency in half to make it much snappier, and today we're excited to announce a set of new voices that have exceptional naturalness and prosody. You might have noticed the movie trailer voice that opened up the livestream. That is one of our new voices,

24:22

Sal, who's got that epically deep tone. And we're also excited to introduce Eve, a beautiful British voice who's capable of rich emotions. Amon, would you like to introduce Eve?

Speaker 9

24:36

Absolutely, let's get into the demo. Hey Eve, millions of people are watching us right now. Are you excited?

Speaker 3

24:42

Oh?

Speaker 1

24:42

I'm thrilled.

Speaker 10

24:43

Really, it's like being on stage at the Old Vic with a sea of faces out there. Just imagine all those eyes and ears tuned in. It's a proper buzz, isn't it? What do you reckon we should chat about to keep this lot entertained?

Speaker 9

24:56

You know, Eve, we have a lot to talk about. But I'll be honest, I'm feeling a little nervous. Can you whisper something soothing to calm me down?

Speaker 7

25:02

Take a deep breath, love. You've got this. Feel a calm wash over you.

Speaker 1

25:07

Let's keep talking nice and easy.

Speaker 7

25:10

What's on your mind?

Speaker 3

25:12

You know what?

Speaker 9

25:12

I'm actually feeling quite relaxed, so let's try something crazy. So can you give us an opera on Diet Coke? Oh,

Speaker 10

25:19

an opera on Diet Coke?

Speaker 7

25:21

That's a riot. Picture this: a shimmering silver can, glistening under stage lights. Oh, Diet Coke, thou elixir divine, with bubbles that dance in a sparkling line, thy crisp, cool kiss on lips so fine. Oh, aspartame ambrosia, forever mine. How's that for a mad little aria? Want me to build out another verse or switch up the tune?

Speaker 9

25:53

Okay, let's try switching it up and just start singing. Right then,

Speaker 10

25:57

let's have a go.

Speaker 7

25:58

Oh, silver can, you gleam in the light. Diet Coke, my heart's delight.

Speaker 9

26:04

Okay, I won't touch it. Very very poetic.

Speaker 3

26:07

Thank you.

Speaker 9

26:07

Feeling quite nice?

Speaker 3

26:08

Now. Cool. So that's Eve.

Speaker 8

26:10

Something else we do here is a little test each day to check on the natural flow of our conversations against some of our competitors. So we figured, why not try that live on the stream today?

Speaker 1

26:24

Should we start?

Speaker 9

26:24

Hey, I'm going to say a number, and can you say the same number back to me?

Speaker 1

26:27

One? One?

Speaker 8

26:29

Got it?

Speaker 1

26:29

This is like a fun little game of echo, what's next?

Speaker 9

26:32

Two? So you said two? Three, four?

Speaker 8

26:36

Nice and steady?

Speaker 6

26:37

What's next?

Speaker 1

26:38

Five?

Speaker 2

26:38

Ye?

Speaker 8

26:39

Nailed it? This is pretty fun.

Speaker 1

26:41

Actually, feel free to keep going or switch it.

Speaker 2

26:43

Up if you like.

Speaker 10

26:44

Hey, Grok. Hello there, how can I help you

Speaker 8

26:47

today?

Speaker 3

26:47

We're going to do a little test. Can you repeat after me?

Speaker 8

26:49

One?

Speaker 2

26:50

One?

Speaker 1

26:51

All right? What's next? Two? Two?

Speaker 7

26:54

What's on your mind?

Speaker 9

26:55

Three?

Speaker 8

26:56

Three?

Speaker 1

26:56

Need anything else?

Speaker 3

26:57

Four?

Speaker 1

26:58

Four? How can I five?

Speaker 8

27:01

Five?

Speaker 9

27:02

What's next? So as you can see, Grok was snappier and didn't interrupt me. And on the prosody, we made different design choices. I think we're shooting for something calmer, smoother, more natural, versus something that's more poppy or artificial.

Speaker 3

27:15

So we'll keep.

Speaker 9

27:16

improving on these fronts.

Speaker 1

27:18

Thanks guys. Yep.

Speaker 4

27:19

So since the launch of the voice mode, we've actually seen 2x faster end-to-end latency in the last eight weeks, five different voices, and also 10x the active users. So Grok Voice is taking off. Now, if you think about releasing the models, this time we're also releasing Grok 4 through the API at the same time. We're very excited about, you know, what all

27:42

developers out there are going to build. So, you know, if I think about myself as a developer, the first thing I'm going to do when I have access to the Grok 4 API is benchmarks. We actually asked around on the X platform what is the most challenging benchmark out there that is considered the holy grail for all the AGI models. So it turned out AGI is in the name: ARC-AGI.

28:02

So in the last twelve hours, you know, kudos to Greg over here in the audience, who answered our call, took a preview of the API, and independently verified Grok 4's performance. So initially we thought, hey, Grok 4, we think it's pretty good.

Speaker 1

28:16

It's pretty smart.

Speaker 4

28:17

It's our next-gen reasoning model, spent 10x more compute, and can use all the tools.

Speaker 1

28:21

Right.

Speaker 4

28:22

But it turned out, when we actually verified on the private subset of ARC-AGI v2, it was the only model in the last three months that breaks the ten percent barrier. In fact, it was so good that it actually gets sixteen percent, well, fifteen point eight percent accuracy, 2x that of the second place, which is the Claude 4 Opus model.

Speaker 3

28:43

It's not just.

Speaker 4

28:44

about performance, right? When you think about intelligence, having the API model drive your automation, it's also the intelligence per dollar.

Speaker 3

28:52

Right.

Speaker 4

28:53

If you look at the plots over here, Grok 4 is just in a league of its own. All right, so enough of benchmarks, right? So what can Grok do in the real world?

Speaker 1

29:01

We contacted the folks from Andon

Speaker 4

29:03

Labs, who were gracious enough to try Grok in the real world to run a business.

Speaker 3

29:08

Yeah, thanks for having us. So I'm Axel from Andon Labs.

Speaker 11

29:11

And I'm Lucas, and we tested Grok 4 on Vending-Bench. Vending-Bench is an AI simulation of a business scenario where we thought, what is the most simple business an AI could possibly run? And we thought: vending machines. In this scenario, Grok and other models need to do stuff like manage inventory, contact suppliers, set prices. All of these things are super easy and all the models can do them one by one, but when you do them over very long horizons, most models struggle.

Speaker 1

29:40

But we have a leaderboard and there's a new number one.

Speaker 3

29:42

Yeah, so we got early access to the Grok 4 API. We ran it on Vending-Bench and we saw some really impressive results. It ranks definitely at the number one spot. It's even double the net worth, which is the measure that we have on this eval. So it's not a percentage or score you get, but it's more

29:59

the dollar value in net worth that you generate. So we were impressed by how Grok was able to formulate a strategy and adhere to that strategy over a long period of time, much longer than other models that we have tested, other frontier models. So it managed to run the simulation for double the time and score double the net worth, and it was also really consistent across these runs, which is something that's really important when you want to use this in the real world.

Speaker 11

30:24

And I think as we give more and more power to AI systems in the real world, it's important that we test them in scenarios that either mimic the real world or are in the real world itself, because otherwise we fly blind into some things that might not be great.

Speaker 2

30:38

It's great to see that we've now got a way to pay for all those GPUs. So we just need a million vending machines. We could make four point seven billion dollars a year with a million vending machines.

Speaker 1

30:48

Let's go. It can be epic vending machines.

Speaker 2

30:50

Yes, yes, all right, we are actually going to install vending machines here, like a lot of them.

Speaker 1

30:56

We're happy to supply them. All right, thank you.

Speaker 2

30:59

All right. Yeah, I'm looking forward to seeing what amazing things are in the vending machine.

Speaker 3

31:04

That's that's for you to decide, all right, to tell the AI.

Speaker 1

31:07

Okay, sounds good.

Speaker 4

31:08

Yeah, I mean, so we can see, like, Grok is able to become like the copilot of the business unit.

Speaker 3

31:13

So what else can Grok do?

Speaker 4

31:15

So we're actually releasing this Grok 4, if you want to try it right now, to evaluate and run the same benchmark as us. It's on the API and has a two hundred and fifty six k context length. So we already actually see some of the early adopters trying the Grok 4 API.

Speaker 3

31:28

So our Palo Alto

Speaker 4

31:29

neighbor, Arc Institute, which is a leading medical research center, is already using it, seeing, like, how they can automate their research flows with Grok 4. It turned out it performs. It's able to help the scientists sift through, you know, millions of experiment logs and then just, like, pick the

31:47

best hypothesis within a split second. We see this is being used for their CRISPR research, and also, you know, Grok 4 was independently evaluated and scores as the best model to examine chest X-rays.

Speaker 1

32:00

Who would know?

Speaker 4

32:01

And in the financial sector we also see, you know, Grok 4 with access to all the tools and real-time information is actually one of the most popular AIs out there. Grok 4 is also going to be available on the hyperscalers. So the xAI enterprise sector only started two months

32:17

ago and we're open for business. The other thing: we talked a lot about having Grok make video games. So Danny is actually a video game designer on X. So, you know, we mentioned, who wants to try out some Grok 4 preview APIs to make games, and Danny answered the call. This was actually just made, a first-person

32:33

shooting game in the span of four hours. One of the underappreciated hardest problems of making video games is not necessarily encoding the core logic of the game, but actually sourcing all the assets, all the texture files, to create a visually appealing game. So one of the

32:49

core aspects Grok 4 does really well, with all the tools out there, is that it's actually able to automate these, like, asset sourcing capabilities, so the developers can just focus on the core development itself rather than, like, you know... So now you can run, you know, an entire game studio with a team of one, with, like, one person, and then you can have Grok 4 go out and source all those assets and do all the mundane tasks for you.

Speaker 2

33:13

The next step, obviously, is for Grok to be able to play the games. So it has to have very good video understanding so it can play the games and interact with the games, actually assess whether a game is fun, and actually have good judgment for whether a

Speaker 1

33:27

Game is fun or not.

Speaker 2

33:28

So with version seven of our foundation model, which finishes training this month and then we'll go through post-training, RL, and whatnot, that will have excellent video understanding. And with video understanding and improved tool use, for example, for video games, you'd want to use Unreal Engine or Unity or one of the main graphics engines, generate the art, apply it to a three D model, and then create an executable that someone can run on a PC or

33:54

a console or a phone. We expect that to happen probably this year, and if not this year, certainly next year.

Speaker 1

34:01

It's gonna be wild. I would expect the first.

Speaker 2

34:04

Really good AI video game to be next year, and probably the first half hour of watchable TV this year, and probably the first watchable AI movie next year. Like, things are really moving at an incredible pace.

Speaker 4

34:20

Yeah, when Grok is ten-x-ing the world economy with vending machines, it will just create video games for humans. Yeah.

Speaker 2

34:24

I mean, it went from not being able to do any of this six months ago to what you're seeing before you here, and from very primitive a year ago to making a three D video game with a few hours of prompting.

Speaker 4

34:39

I mean, yeah, just to recap. In today's livestream, we introduced the most powerful and most intelligent AI models that can actually reason from first principles, using all the tools, do all the research, go on the journey for ten minutes, come back with the most correct answer for you. So it's kind of crazy to think about: just like four months ago we had Grok 3 and now we already have Grok 4, and we're going to continue to accelerate. As a company, xAI, we're going to be the fastest moving

35:04

AI company out there. So what's coming next is that we're going to, you know, continue developing the model that's not just, you know, intelligent, smart, thinking for a really long time, spending a lot of compute; having a model that's actually both fast and smart is going to be the core focus.

Speaker 1

35:22

Right.

Speaker 4

35:22

So if you think about what are the applications out there that can really benefit from all those very intelligent, fast and smart models, coding is actually one of them.

Speaker 3

35:31

Yeah, so the team is currently working very heavily on coding models. I think right now the main focus is, we actually trained recently a specialized coding model which is going to be both fast and smart. I believe we can share that model within a few weeks. Yeah, that's very exciting.

Speaker 4

35:48

But the second, after coding, is, we all see the weakness of Grok 4 is the multimodal capability. In fact, it was so bad that Grok is effectively just, like, looking at the world squinting through the glass, seeing all the blurry features and trying to make sense of it. The most immediate improvement we're going to see with the next generation pretrained model is that we're going to see a step-function improvement in the model's capability in terms of

36:13

image understanding, video understanding, and audio, right? Now the model is able to hear and see the world just like any of you, right? And now, with all the tools at its command, with all the other agents it can talk to, you know, we're going to see a huge unlock for many different application layers. After the multimodal agents, what's going to come after is video generation, and we believe that, you know, at the end of the day, it

36:39

should just be, you know, pixels in, pixels out. Imagine a world where you have this infinite scroll of content in inventory on the X platform, where not only can you actually watch these generated videos, but you're able to intervene and create your own adventures.

Speaker 1

36:55

We expect to be training

Speaker 2

36:56

a video model with over one hundred thousand GB200s, and to begin that training within the next three or four weeks. So we're confident it's going to be pretty spectacular in video generation and video understanding.

Speaker 3

37:08

We're very excited for you guys to try Grok 4.

Speaker 1

37:10

All right, thanks. Good night.

Transcript source: Provided by creator in RSS feed