The Unreasonable Effectiveness of AI in Computer Vision with Ashish Part 1

Speaker 1

00:00

Welcome to the Occult Rejects. Tonight, we've got a very special guest, Ashish, who is going to tell us a lot of really deep, intelligent stuff about AI and all the rest of it. He'll get into it, but first let's go to Nick. Nick, how you doing? What you got coming out? Tell us about your herbal video that just popped out.

Speaker 2

00:22

Oh, thank you, thank you.

Speaker 3

00:23

Yeah, it seems like I'm getting, like you even mentioned, good reviews, right? We're getting some good comments.

Speaker 2

00:29

It's a different way of looking at it.

Speaker 3

00:31

I do cover kind of the occult angle, the witchcraft angle, maybe even a little bit of history. Like, I forget what it might have been, peppermint or something, spearmint, something I was mentioning, where I think in Greek history they'd wear it around their heads to help them study or something when they

Speaker 2

00:46

Took tests, some random shit like that.

Speaker 3

00:48

And I'm throwing all that stuff in there, and then I'll go into how it actually affects your twelve cranial nerves, like when it's actually going to, you know, your vagus system and your, uh, your gland, uh, basically, like, some affect up to three at a time, and that gets pretty interesting when it starts doing that. You know, your trigeminal, your vagus system, and maybe one other one.

Speaker 2

01:15

But uh yeah, long story short.

Speaker 3

01:18

I think it's interesting for people to see what some of these things actually do when you just inhale them, what kind of effects it's gonna have on your body and how you're gonna feel afterwards. And some of them that I even show, uh, if you decide to all of a sudden go in on crazy dosages with it, you'll notice it starts to shut your system down. And I start to wonder about that with maybe shamanistic rituals.

Speaker 2

01:41

You know, that's why you were fumigating that stuff, you know.

Speaker 3

01:45

So it's just interesting to see the kind of science with these things, because you start to wonder, did they know this back then? Because the effects that you'll see, like what would be in Scott Cunningham's book about what these herbs are gonna do for you, you can say, well, that's what it's doing to you fucking physically, just from inhaling it or putting it on your skin, you know. So you almost wonder, like, were they still getting it twisted?

Speaker 2

02:06

You know what I'm saying.

Speaker 3

02:07

It is still more science unfortunately with that stuff, and somehow it got woo wooed.

Speaker 2

02:11

You know.

Speaker 3

02:11

I don't know, did we lose the original, I guess, occult idea behind it, something like that? So check it out. It is getting good reviews. I was a little surprised that it actually has as many comments as it does now. So, uh, yeah, it's got about three parts; one dropped today. We got a few other ones that will be coming out. I got part two of the Colors coming out. I should have Johannes Kepler dropping in like two weeks. I should be editing that

02:36

up and that will be out. And I got into my park that should be dropping any within the week, I'm hoping. So yeah, we got a bunch of shit coming out. I have a bunch of stuff edited. I just haven't actually wrapped it all up yet. But uh yeah, then we do have some stuff going forward. We'll be

02:50

covering the Stargate project next week. Again, it's gonna be a big group; the whole Occult Rejects team will be here covering more of the Stargate, or Gateway, project paperwork, whatever it was. Yeah, so that should be interesting.

Speaker 2

03:02

So we got a lot of stuff coming.

Speaker 1

03:04

Thank you. All right, sounds awesome. Ashish, tell the people about what you've been through, tell them about who you are and what you do.

Speaker 4

03:13

Hi. Good to be back on, guys. Thanks for having me. In case you don't know me from the last times I was on here, my name's Ashish. My background is in artificial intelligence and in finance. I spent ten years working in finance as a trader for Edward Jones Investments, and part of my responsibilities were covering tech at the time. After that, I transitioned full time into AI, and I was specifically trying to apply AI

03:40

to computer vision. And the presentation I want to give today is related to that, because AI is a hot topic, and I think, just like every other subject we've seen, there's a massive amount of distortion that's happening in AI. And what I want to do is help people. So a lot of people think that AI is like a super advanced subject that's impossible to tackle or understand unless you have some kind of background in computer science,

04:10

and to be honest, it's not really necessary. I hope I can make this accessible for people to understand real AI. And I think it's really important that people understand this because there's so many profound things that are actually true

04:25

about AI that are not really publicly discussed. And so one of the big topics with AI, and with a lot of people who are doomers on AI, is that they think there's some kind of artificial life form that's about to be created, and this artificial life form will turn into the Terminator or some apocalyptic scenario or something like that. Hopefully, what I can do here is bring a little bit of

04:52

rationality to this subject without getting overly technical. But there are a couple of technical concepts that I want to explain, and I'll make them accessible. And I also want to be able to show what real AI is, and how you know what real AI is versus what is not, because there's a lot of junk AI out there. So let's just kind of start with, like, what is AI? Fundamentally, at a kindergarten level, what is AI? AI is a mathematical function, and it is primarily based on a reward

05:31

and cost. There's a function for reward that can be basically anything you want, and then there's a function for cost for that same reward, which can also basically be whatever you want. And the objective in AI is to make these two functions resemble real life as much as possible. So this is really what AI is. It's a type of pattern recognition math. And this pattern recognition math is intense.
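
As a rough illustration of the "cost function" idea described here, the sketch below fits a single weight so that a tiny model resembles some observed data. It is a minimal example under stated assumptions, not the speaker's method: the data, the one-weight model, the mean-squared-error cost, and the names `cost` and `fit` are all hypothetical choices made for illustration.

```python
# Hypothetical toy data: pretend "real life" follows y = 3 * x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]


def cost(w):
    """Mean squared error between the model's guess w * x and the observations."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit(steps=200, lr=0.01):
    """Gradient descent: repeatedly nudge the weight in the direction that lowers the cost."""
    w = 0.0
    for _ in range(steps):
        # Derivative of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad
    return w


learned_w = fit()
print(learned_w, cost(learned_w))
```

Real models do the same kind of thing with billions of weights instead of one, which is where the size and energy demands discussed next come from.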

06:00

It's extremely large. Some of these models can have billions of weights, and I'll explain a little bit about that. And because of the massive size of these models, they are also energy intense. And as you may know, in America there's this huge push for energy investment. Lately, we're spending trillions upon trillions of dollars, multiple trillions of dollars, on both our energy grid and

06:30

energy utilities. And then on top of that, trillions of dollars are independently being invested in AI as well. So there are two major energy-intense aspects of AI. There is the training side, which is what you do to build the model, and then there's the inference side, which is what happens

06:56

after the model's already done and you've deployed it. It's in people's hands, and then they do a query on something like ChatGPT or whatever, and then that query has a small amount of energy that's required to do this thing called inference, which then outputs your answer, your result. Now, the training side is massive, and the inference side individually is very small. But where the massive energy constraint comes

07:21

from is the scale. So once these developed models get deployed out into the world, the energy demand from the sum total of them all combined. That's where the energy intensity goes through the roof. So two sides that are heavily energy intense that are required in order to make

07:40

AI work. So what you can see from this, though, and what you should kind of interpret in your head just starting with this, is that if an AI model requires that much energy in order to generate the model and then be able to use it, you're probably in the vacuum tube stage, because this is nowhere near the efficiency of the compute that happens in the human brain. And I'm going to start relating...
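
The scaling argument (each query is cheap, the sum over all queries is not) is simple multiplication. The sketch below shows it with placeholder numbers; the per-query energy and the query volume are hypothetical values chosen only to make the arithmetic visible, not measurements of any real system, and `fleet_energy_kwh` is an illustrative name.

```python
def fleet_energy_kwh(wh_per_query, queries_per_day, days):
    """Total inference energy in kilowatt-hours for a deployed model."""
    return wh_per_query * queries_per_day * days / 1000.0


# Hypothetical placeholder numbers, chosen only to show the scaling.
one_query_kwh = fleet_energy_kwh(1, 1, 1)
one_year_kwh = fleet_energy_kwh(1, 1_000_000_000, 365)

print(one_query_kwh)  # a single query: a tiny fraction of a kilowatt-hour
print(one_year_kwh)   # a billion queries a day for a year: hundreds of millions of kWh
```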

Speaker 1

08:09

Hold on, because I think this is a two-way street. It's not only taking your questions. It's taking all of the information together. So this is a massive data gathering tool as well as a data, you know, output tool. Yeah, and that's the

08:32

main sort of correlation that's being drawn here. I think they're sucking up a lot of core information, psychological information, about each one of us and how we interact with it.

Speaker 4

08:44

Yep. And that kind of description that you just gave is most relevant to the concept of LLMs, which are large language models. So this is like your ChatGPT, Gemini, Grok, whatever, take your flavor of the month. There's so many of these things. It's a good thing you mentioned that, because what I want to show here, and one of the things that I'm hoping to promote, is that computer vision is a way better, more accurate version of AI than a

09:17

large language model. So what I was going to say earlier was that I'm relating AI to the brain. And what I was originally trying to do here and when I was working on AI, was to model artificial intelligence off the human brain. So you're going to hear me relate to the human brain many times in this presentation. I'll get to that because that's a big part of

09:41

what I'm doing here. So one of the things to know about that is that a large language model, an LLM, is pattern recognition math built around literally language, text. And if you think about the human brain, the language center of the brain is a very small part of the brain, right over here. It's a very small part, actually. And I'm not saying it's not important. A large language model, that technology, is important, but that is not what's going to get you to this

10:11

artificial life form. And that's controversial to say, because there's so many people that think that a large language model will just suddenly become this artificial life form. But language is not a life form. Language is a form of pattern recognition math. And the easy way to understand this, without having any understanding of the math behind it, is if you walk into a room and there are ten people speaking different languages that you've never heard before,

10:37

not languages that you don't understand but have heard before, no, no, no, languages that you've never heard before, you will not be able to distinguish one voice from another in all that noise. However, if one of those ten people starts speaking English in that room, all of a sudden you'll be able to zone in on that wavelength and filter it out, and you'll be able to hear the English

11:01

from the noise. This is how you know that language is pattern recognition math, and you're not born with language. Another thing to understand about language: two people in this world can hear the same sound and interpret something totally different from that sound. In other words, there can be a word in English and then another word in a different language that sound exactly the same but mean two totally different things. Two different people can understand two different

11:28

things from this same sound. So this is how you know that language is an acquired skill that is a form of pattern recognition. And this is not anything that you would call a life form. This is a talent, you could say, but it isn't a life form. And one of the other things, and I actually brought this up on screen in case anyone forgot from their seventh grade science class, is that the definition of life to this day still does not include intelligence.

12:02

And in my opinion, intelligence is equal to life. These definitions, if you read what this is saying here on the screen, the actual definition of life, this sounds like primates trying to describe humans, horribly. By the way, this is not the definition of life. Take reproduction, for example, which is part of life. It is an essential element of what is considered life. If you took away reproduction from everything on Earth tomorrow, does that mean there's no life

12:32

on Earth? That is not how life works. So what I'm trying to argue here is that intelligence is part of the definition of life. And there's something else that I want to say before I get into the actual formal presentation, as a little setup here, in case you're interested in understanding more about intelligence and perception. My home base is perception, by the way. In AI, my focus

13:01

area was perception. If you want to understand perception, it turns out, and what I'm going to demonstrate in this presentation, it turns out that computer vision by accident implicitly understands neuroscience. And I'll describe this in detail as I go through the presentation. But if you want to learn more about what I'm talking about on the neuroscience side, which I'm

13:27

not going to do too much about. I'll talk mostly on the AI side, but if you want to learn more about the neuroscience side, I highly recommend this lecture series, and this one will break it down easy enough for you to understand some basic neuroscience about vision. So computer vision, in my opinion, doesn't get the rep that it deserves in AI. And here's something to note about all of this.

13:50

So as I was saying, intelligence is not part of the definition of life. My dad was a neurologist and a psychiatrist, and my mom is a pharmacist with a bunch of degrees and all this stuff. They're still disappointed that I'm not a doctor. And what is interesting, and what I've learned over the years, is two major things. So number one, in evolution, one of the first structures that ever evolved was a

14:18

photoreceptor attached to a flagellum. So a flagellum is a tail, and a photoreceptor was attached to it, and a photoreceptor just says whether light exists or not, yes or no, one or zero. And so you can see that this is the beginning of binary math: whether light exists or does not exist. So another thing that is commonly known but often forgotten is that eyeballs are brain cells. So this is very

14:49

important to understand. In the human eye, there are these two types of cells. There's these rod-shaped cylinders, which are gray, and they determine grayscale. They determine basically opacity, the gradient of light, brightness of light, and grayscale. And then there are these cones, RGB: red, green, blue. The ratio on this is roughly twenty, six, three, one. Roughly. If you're female, you'll probably have a little bit

15:24

more blue. So the ratio in the human eye is overweighted toward the gray cylinders: a ratio of twenty on the gray, six on the red, three on the green, and then one, or a little bit higher than one if you're female, on the blue cones. So that's interesting. And one of the parts that I'm going to skip in neuroscience is about the V1 through V4 cortex, the visual cortex. And this is outside of, you know, the purview of the presentation.

15:56

But if you want to go deeper down the eyeball route, I recommend that after watching this presentation on the AI side. For me, what I was saying earlier is that computer vision experts implicitly, whether they know it or not, accidentally already understand the visual cortex just from computer vision, because computer vision is doing exactly what the visual cortex is doing. Exactly. It is literal, and I'm

16:24

going to show published papers that demonstrate this. So what I'm saying is that computer vision in the modern day is mimicking the brain, and it is published and known, and computer vision experts, whether they know it or not, accidentally figured out the visual cortex and understand it implicitly. An example of this: in computer vision, we understand why

16:47

you want to use grayscale to detect motion. First, it's energy efficient, and that's what happens in V1, and that's why you have predominantly gray cylinders in your eyeball. And then there are functional purposes for RGB as well, and this is also discussed in computer vision. Okay, so that was a good little preamble for the setup, and hopefully you're seeing that this is shit that no one's ever talked about. And this is real AI.
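
A common way the grayscale-for-motion idea shows up in computer vision is frame differencing: convert two frames to a single brightness channel and flag pixels whose brightness changed. The sketch below is a minimal version of that, assuming frames are nested lists of (r, g, b) tuples; the function names and the threshold of 25 are illustrative assumptions, and the luma weights are the standard BT.601 coefficients rather than anything taken from the talk. Working on one channel means one comparison per pixel instead of three, which is the efficiency point being made.

```python
def to_gray(frame):
    """Collapse each (r, g, b) pixel to one brightness value (BT.601 luma weights)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in frame]


def motion_mask(prev_frame, curr_frame, threshold=25.0):
    """Mark pixels whose brightness changed by more than the threshold between frames."""
    gray_prev = to_gray(prev_frame)
    gray_curr = to_gray(curr_frame)
    return [
        [abs(a - b) > threshold for a, b in zip(row_prev, row_curr)]
        for row_prev, row_curr in zip(gray_prev, gray_curr)
    ]


# Two hypothetical 2x2 frames: all black, then one pixel turns white.
black = (0, 0, 0)
white = (255, 255, 255)
frame_a = [[black, black], [black, black]]
frame_b = [[black, white], [black, black]]
print(motion_mask(frame_a, frame_b))
```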

Speaker 2

17:17

Have you spent a lot of time? Go ahead. I was just gonna say, this is right up your alley.

Speaker 1

17:24

He's upgraded all of these symbols in occultism directly to the eyeball, and the eyeball is so central to occultism in general. I mean it's in the word itself. And he breaks down all of these symbols and how they relate to stuff, and it's just like, yeah, it's brain tissue, this is what's programming.

Speaker 4

17:43

I also know the connection in spirituality, and I wasn't going to discuss it that much, but here's something to know: in the beginning, there was light. Yeah, I know a lot about that, but that's a whole wormhole, and if I say something about it, we'll have to go for like an hour. But in other words, yeah, there's a lot of indication in religion that there was an understanding of evolution, especially

18:11

in Hinduism, there was an understanding of evolution. And there's not just a lot of written documentation of this, but there's also archaeology that demonstrates this. It's literal, you can see it with your own eyes, thousands and thousands of years old, and they're demonstrating that they understood evolution and more beyond that.

Speaker 2

18:30

You know, it's real quick, you know, no, go ahead, let's go ahead, let's okay.

Speaker 1

18:36

The zinc calcium reaction at conception. It's a flash of light.

Speaker 4

18:41

Yeah, there's a Yeah, it's kind of like the Paho electric effect.

Speaker 2

18:44

But real quick, Ashish, I think it's interesting.

Speaker 5

18:48

You know.

Speaker 3

18:48

There was one guy, I think you're actually friends with him, Pravine. He was at Cosmic Summit. I went up to him and I said something to him. I was trying to explain to him. I was like, I know what you were showing on the screen, you were showing it with something else. I said, but have you ever wondered if those are depictions of the inside of your eyeball?

Speaker 2

19:07

And he like like lost it.

Speaker 3

19:09

Like, I literally, like, freaked him out to where

Speaker 2

19:13

He, like, sat down and started eating his food.

Speaker 3

19:16

He's like, oh yeah, I was like, yo, I literally just bugged this guy the fuck out.

Speaker 2

19:19

But like you get it, you get it.

Speaker 4

19:21

Yeah, Pravine is one of my most favorite.

Speaker 2

19:25

Yeah, yeah, maybe you could maybe you could. Maybe you can tell that and he'll take it a little bit more serious, like.

Speaker 4

19:29

Oh, I've I've talked his ear off about this stuff. He's fully aware of everything.

Speaker 2

19:33

Oh, I don't know. He looked like he was bugged out when I said something to him, like he.

Speaker 4

19:36

At that time, oh, okay, I had just met Pravine, so he didn't know that much about me at that time. But we became good friends over time. I'm actually gonna go to Cambodia with him in a couple of months or a few months.

Speaker 2

19:51

Nice, that's awesome.

Speaker 1

19:52

Be careful of those planes.

Speaker 4

19:54

A lot of yeah, yeah, yoh.

Speaker 2

19:58

We'll have to get you on with Day to talk about the eyeball. I did like a five or six.

Speaker 3

20:01

I don't know how many parts, a series with the occult symbolism in the eye. But if you ever want to come back on and talk about that shit, we're talking about

Speaker 5

20:06

The eyeball at all. Day Man.

Speaker 2

20:09

A lot of my lot of all.

Speaker 3

20:10

My art, believe me, when you see the crossing of the arms, that's the crossing of the optic nerve. A lot of my art literally is depicting the inside of the eyeball. The one that I have, like, a chick,

Speaker 2

20:20

It looks like she's on a sea show.

Speaker 3

20:22

That's like the hyaloid canal, Cloquet's canal, and the ora serrata, and then the aqueous humor underneath. Totally, all my shit's basically depicting the eyeball and the brain, man.

Speaker 4

20:31

And don't forget about the pineal gland, right? Yeah, yeah, so we got all that, and then the third eye and everything. Yeah, there's quite a lot to do with vision. And, uh, there's another thing that I want to discuss about evolution before I get into the formal presentation, and then we'll go from there. One more thing about evolution. And I'm not sure how this happened, but for some reason in the zeitgeist, it is commonly thought or said

21:04

that DNA is the cause of evolution. And this is just not true. There is no scientist saying that. The people who discovered DNA didn't say that. Darwin didn't say that. As far as I know, there is no official source anywhere that is saying that DNA causes evolution. DNA is the product of evolution, and that's why they're correlated, but correlation is not causation. That's understood at a kindergarten level, and as far as I know, no scientist is actually saying that.

21:40

But for some reason in the zeitgeist it is commonly thought that DNA causes evolution. This is not true. It is academically understood and accepted that the brain has known mechanisms for controlling DNA. And even in your everyday life, your DNA is not static. Switches turn on and off in your DNA all the time, and this mechanism is controlled by the brain, and often environmental factors

22:09

are the reason why this happens. So again, if you're just really thinking about it from a high level, you've got to understand that there's so much infrastructure that's required in order to make DNA work in the first place. You need to have the histones to zip the file, because I look at DNA as being software. You need to have the histones to zip the file. And then you need to have a way of unraveling it.

22:35

You need to have a way of splitting it. You need to have the transcription enzymes to transcribe the DNA. You need to have all the different proteins already there in order to even transcribe them in the first place. So there's that, and then so much more. In order to make DNA work, it needs all of these other things as well. But

22:56

DNA by itself doesn't do anything. There's no magic that happens with DNA by itself. And so in order of operations, you would need to have all of this other infrastructure available, or at least functional to some degree, before DNA becomes useful. So in terms of order of operations, DNA can't be at the front of the line. This is the software. Software doesn't come before the hardware. So, you know, you can kind of understand this stuff at a pretty basic level.

23:24

This is all commonly understood, but for some reason, there's just a different idea of how it works out there. So let's get into the presentation. So what I was doing at Arizona State University is I was working on computer vision applied to artificial intelligence in automated vehicles, and so I was trying to apply

23:48

camera-based systems to automated vehicles. And one of the issues that I ran into in automated vehicles, especially at ASU, is that an overwhelming amount of investment and research and R&D effort, especially in academia but also out in the real business world, has been dedicated to the task of making lidar part of the automated vehicle apparatus. And what I'm going to be arguing in this presentation is that lidar is completely useless. It's actually worse.

24:31

It makes your system worse, not better. And I want to be clear up front: lidar is a real technology and it does have real-world applications. It's used in space, it's used to map layouts. There are actual, real functional purposes for lidar. But what I am going to hit hard on in this presentation is that lidar makes no sense at all in an automated vehicle. Doesn't matter what your philosophy is, there is no reason to have lidar

24:58

in a vehicle. So, automated vehicles. Since maybe not everyone is familiar with how lidar works, just to set it up: of the major companies that you may have seen that have lidar, Waymo is the big one in the US, and Waymo is owned by Google. And then in the past there were other companies that have also

25:21

used it. There was General Motors, GM, but they ended up having a problem a couple of years ago in California, where they were using a lidar-based system and the car dragged a person underneath the vehicle for like twenty or thirty feet, and so then they had to basically end operations. They were even caught trying to deceive about it, too. It was all over the news. People probably remember. Another company that was doing it was Uber, and

25:49

I live in Phoenix, Arizona. This is a heavy testing

25:52

ground for automated vehicles. And the reason why Phoenix specifically is a heavy testing ground for automated vehicles is because we have three hundred and sixty days of sunshine, and we have some of the widest and best roads in the US, because we have massive trucking that comes through here, and we have a very squared, grid-like road system that is very clean and organized, so we're pretty much the ideal setup for this, and

26:22

so Waymo eventually came here, and what they ended up doing is they went to ASU, and they ended up investing a huge amount of money into the engineering department. And that was very problematic for me, which I didn't know about at the time, because I was going in there applying with this presentation for a PhD in computer vision to use it for automated vehicles. And so you'll see why this gets to be a big problem when you're in the hornet's nest with Waymo.

26:54

So before I get to that, let's kind of build it all up and set up what I'm talking about here. My home base is perception. And by the way, if this math is scaring you, don't worry. I'm not going to get too math heavy. I'm breaking everything down. You don't need a degree to understand any of this stuff. But what is important here is that there are three major elements to automated vehicles, for

27:22

controlling automated vehicles. So the first element is perception, and then dynamics, and then control. And basically what I have on this side is intro to robotics. So this is the functional math for how you do robotics. And what I have here on the left side, this big old crazy-looking equation, all this is

27:43

saying is: what is the least wrong answer? So based off of these three equations on the right side, the left side is trying to figure out what the least wrong answer is. And what you'll see is that perception, which is ZT, is used in all three equations. It is fundamental to all three equations. So perception is the most important task in robotics. Perception.
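
The slide's actual equation is not reproduced in the transcript, so the following is only a guess at the general idea of a "least wrong answer": a least-squares estimate. The sketch assumes a one-dimensional state, a prediction from a motion model, and a single measurement (standing in for ZT), each with a hypothetical variance. The estimate that minimizes the weighted squared error is the inverse-variance weighted average of the two. The names `fuse` and `cost` are illustrative.

```python
def cost(x, predicted, var_pred, measured, var_meas):
    """Weighted squared error: how 'wrong' x is against the prediction and the measurement."""
    return (x - predicted) ** 2 / var_pred + (measured - x) ** 2 / var_meas


def fuse(predicted, var_pred, measured, var_meas):
    """Closed-form minimizer of cost(): an inverse-variance weighted average."""
    w_pred = 1.0 / var_pred
    w_meas = 1.0 / var_meas
    return (w_pred * predicted + w_meas * measured) / (w_pred + w_meas)


# Hypothetical numbers: the motion model says 10 m, the sensor says 12 m,
# and the sensor is assumed to be three times noisier than the model.
best = fuse(10.0, 1.0, 12.0, 3.0)
print(best, cost(best, 10.0, 1.0, 12.0, 3.0))
```

The point of the sketch is only that the measurement enters the cost directly, so a better perception signal (a smaller measurement variance) pulls the answer toward what the sensor actually saw.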

28:13

So here are some of the major brands. Remember that this presentation was done in twenty twenty three, so some of this is a little bit dated, but almost all of it is still relevant today. This is a company called Argo on the top left here. This was owned by Ford. This is a now-defunct company. But they were also trying to force-fit lidar into their cars. They weren't able to do it, so they went bankrupt, or

28:43

they closed down the company. But what I wanted to point out about this is that they have these lidar things on here. But notice the cameras that they used are Ring cameras, literal Ring cameras, and so you can tell that they went way out of their way to go all in on cameras. Huh. Of course, Waymo's the big one. This company down here in the center should be Wayve, if I recall correctly. It's been a little while since I've done this. And

29:12

then there's a couple others. It's fine. There's a Chinese company over here called XPeng, which people should know about if they care about this stuff. So lidar is overwhelmingly prevalent. Almost every major company has been trying to force-fit lidar into their automated vehicle, and almost every single one of them is failing. The only one that is achieving scale, despite what the narratives are, the only one that is achieving scale on autonomous vehicles

29:44

in this world is Tesla. And so Tesla is a camera-based system that does not use any other perception sensor, just cameras, and Waymo uses lidar and cameras. So here's another thing about lidar. Even if you do use a lidar system, no matter what, you still need a camera-based system. Doesn't matter who you are or what philosophy you believe in. You still need a camera-based system. So I'm going to discuss why that is. This is

30:19

like Chinese companies. I'm going to skip some of this stuff, because this presentation is actually quite long, and so I'm going to condense it quite a lot and stick to really the most important points. So one of the things is that lidar has got this spinny thing that's going around. You've probably seen it if you've seen a Waymo vehicle or something like that. Zoox, which I've just seen in Las Vegas, has also started to deploy, and I believe it's owned by Amazon. This is a

30:46

relatively new addition, but again, still using lidar. And what a lidar system does is this little spinny thing that I'm talking about shoots a laser beam, and it shoots a ton of these all over the place, and then you're supposed to get a reflection back. This is an imperceptible laser beam of light, and then based on that time and distance, they're able

31:09

to get a point return of information on depth. So, in other words, this laser beam's functional purpose, what it's doing, is trying to measure the depth of the things around it. It's a literal laser-precise depth calculation, and it is super precise. But this is effectively, in its entirety, the full and complete explanation of the functional purpose of why these lidar systems exist on these cars: it's for

31:44

precise measurement of depth. But lidar is like a dumber version of sonar, that's the way to think of it. And it creates a point cloud return, kind of a grid around it in point cloud return. And I'm going to show videos and pictures of all that. But what I want to show first, before I get into the videos, is a few different cases of why lidar doesn't make any sense in an automated vehicle. Why am I

32:17

saying that so strongly? So this is what I have on the screen. And by the way, every slide should have a source on it. So like down here, this is the actual paper, an academically accepted paper, from IEEE. The two major conferences I'm going to be talking about are IEEE and CVPR. IEEE is like basically the biggest engineering conference in the world, and CVPR is the

32:43

largest computer vision conference in the world. So all of these papers and slides and presentations that I'm going to be talking about are all from the largest possible conferences, the most academically accepted conferences that there are in the world, not just in America, in the world. So this paper that I have on the screen is a little dated. It's from twenty twenty one, but this is considered the original foundational paper on identifying what are known as VRUs.

33:12

VRU stands for vulnerable road user, and this is a regulatory term, not an AI term. It is basically a catch-all phrase for pedestrians, dogs, cats, et cetera. And what this paper is showing is that, at that time, it identified thirty two attributes for identifying a pedestrian, which is like one of the most important tasks that any automated vehicle needs

33:40

to do. And if you look at what these attributes are that they're talking about, like a handbag, clothes and things like that, the bags that they're carrying, male or female. The issue with a point cloud return is that it only gives you the shape. It does not give you the color quality that you get from a camera. So lidar is like a one-dimensional point. It's just a

34:10

point that tells you depth and that's it. And you get a whole bunch of points, millions of points, and then hopefully you can identify the shape of an object with these points. That's what lidar does. Well, it turns out that if this is what you're prioritizing, and this is what you're trying to maximize for, which is what pretty much all these lidar companies are doing, you miss out on all of those extra fine quality details that are actually most relevant,

34:38

the most important aspect is this stuff. And this is academically understood. So this is from twenty twenty one, so this is dated, and this has actually been improved significantly since then. But this is cited, and it has been known for a long time in the industry. I'm gonna kind of skip this part and go to some

34:59

more of the meat. So right here is something called Argoverse, which is from Argo, the company that was owned by Ford that was also trying to force-fit lidar into their cars. They're now defunct, but in twenty twenty three at CVPR they presented this information that I'm showing on the screen now. One of the things that has happened in academia, and I'm pretty sure, at least as far

35:27

as I'm aware, to this day it is still true. One of the things that has happened in academia is that research around lidar degradation over range seems to be heavily suppressed. And before Argo went defunct, they published this, and this is, as far as I'm aware, the only major company that has ever published anything like this. What this is showing on the screen here is the accuracy on object classification and identification using lidar-based systems.

36:04

So you can see in their best case scenario they're getting about fifty one percent accuracy at zero to fifty meters of range, but once you go to fifty to one hundred meters of range, it drops by half, and then once you go one hundred to one hundred and fifty meters of range, it drops by half again. So what this is showing is a log2 degradation in accuracy, a halving every fifty meters, using a lidar-based system. And I just did the math down here.
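
As a rough check of the halving pattern just described, here is a minimal sketch. The fifty one percent starting figure and the fifty meter bins are the numbers quoted in the talk; the function name and the assumption that the halving continues bin after bin are illustrative only, not something the Argoverse slide itself is confirmed to state.

```python
def accuracy_at_range(distance_m, base_accuracy=0.51, bin_width_m=50.0):
    """Accuracy for the range bin containing distance_m, halving once per bin.

    base_accuracy and bin_width_m are the figures quoted in the talk;
    the halving rule is the speaker's reading of the slide (an assumption).
    """
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    bin_index = int(distance_m // bin_width_m)  # 0 for 0-50 m, 1 for 50-100 m, ...
    return base_accuracy / (2 ** bin_index)


if __name__ == "__main__":
    for start in (0, 50, 100):
        print(f"{start}-{start + 50} m: {accuracy_at_range(start):.3f}")
```

Under that assumption the three bins come out at roughly a half, a quarter, and an eighth.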

36:35

This is some basic math. This is about as intense as the math is going to get. At sixty miles per hour, that's twenty six point eight two meters per second. That means fifty meters of accurate range gives you about two seconds of forecasting time, so you're able to see about two seconds of what's ahead of you with this system at sixty miles per hour.

Speaker 1

36:56

This completely takes automated trucking off of the board. You cannot have those types of tolerances if you're going to actually have a truck that's now commanded by AI or whatever.
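
The arithmetic quoted above (sixty miles per hour, fifty meters of usable range) can be reproduced with a short sketch. The function name is hypothetical; the only inputs are the two figures from the talk and the standard mile-to-meter conversion.

```python
METERS_PER_MILE = 1609.344
SECONDS_PER_HOUR = 3600.0
MPH_TO_MPS = METERS_PER_MILE / SECONDS_PER_HOUR


def lookahead_seconds(range_m, speed_mph):
    """Time until the vehicle reaches the edge of its usable sensing range."""
    speed_mps = speed_mph * MPH_TO_MPS
    if speed_mps <= 0:
        raise ValueError("speed must be positive")
    return range_m / speed_mps


if __name__ == "__main__":
    print(round(60 * MPH_TO_MPS, 2))            # 26.82 meters per second
    print(round(lookahead_seconds(50, 60), 2))  # 1.86 seconds
```

The exact figure is a little under the "two seconds" used in the talk, which rounds up.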

Speaker 4

37:10

No doubt about that. Yeah, that's why there isn't a lidar-based truck, as far as I'm aware, at least not one that carries weight. But there is the Tesla Semi, which does exist and is on the roads. It is working, and there are millions of Tesla vehicles, automated vehicles, on the road. In fact, I have one. Oh, I should also mention this: I've been beta testing the Tesla automated system since twenty eighteen. I have more than one hundred thousand miles driven on their system since then.

37:39

Over multiple cars, and it's night and day compared to anything else that you'll ever see. It's ridiculous how much better it is. And again, Tesla has millions and millions of these vehicles on the road, and they're all over the country doing this, and actually elsewhere in the world. But Waymo only operates in geofenced spaces in a few select cities. They only have two thousand

38:04

vehicles in their fleet. And I'm going to get into more about all that, but that's worth keeping in your mind for now, and why that's relevant is gonna come up again later.

Speaker 3

38:16

So I did want to ask you, though, what do you think about lidar being used on the ground to survey? Like, I mean, aren't they even using that as a way of kind of ground truthing these satellite images, like even on the Pyramids or other stuff?

Speaker 1

38:33

That's fine, all right. Yeah, think about this. They're only taking in one wavelength of light, so they don't know what the fuck anything is. It's just a distance. You've only got one dimension of vision. It's worse than grayscale.
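
The "it's just a distance" point follows from the time-of-flight description given earlier in the talk: a pulse goes out, a reflection comes back, and depth is derived from the elapsed time. A minimal sketch of that calculation, with hypothetical function names and a deliberately simplified model (a single return, angles known exactly):

```python
import math

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def tof_distance_m(round_trip_seconds):
    """Distance to the reflecting surface from a round-trip pulse time."""
    if round_trip_seconds < 0:
        raise ValueError("round-trip time must be non-negative")
    # The pulse travels out and back, so halve the total path length.
    return SPEED_OF_LIGHT_M_PER_S * round_trip_seconds / 2.0


def point_from_return(distance_m, azimuth_rad, elevation_rad):
    """Turn one (distance, beam direction) return into an x, y, z point."""
    horizontal = distance_m * math.cos(elevation_rad)
    return (
        horizontal * math.cos(azimuth_rad),
        horizontal * math.sin(azimuth_rad),
        distance_m * math.sin(elevation_rad),
    )
```

Repeating this for every beam direction is what produces the point cloud; note that nothing in the calculation carries color.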

Speaker 4

38:51

Yeah, and I'm setting up a few things here on purpose before I show you the video of what the lidar actually looks like, because I want you to kind of see how many different problem scenarios there are. And then when I show you the video, you'll have all the context you need for the video to make sense. So you're going to see that, don't worry, it's coming. You're going to see how much of a difference there is between lidar and basically vision.

39:18

It's like not even close. So what I was saying earlier, though, was that lidar-based systems require geofencing, meaning they have to map a specific area and they can only operate within this specific area, whatever this area block is, and they have to go through this area over and over and over again, and then they map it. It's effectively memorizing the area. That's what's called geofencing, and that's what Waymo does. They

39:44

don't operate everywhere in Phoenix. They operate within confined regions, and that's for a reason. And another thing about this is that in a lidar-based system, at this point, as far as I'm aware, and there's no technology that can beat this, you're only allowed to do point-to-point travel: specific predetermined point to specific predetermined point. And this is different than, say,

40:12

a Tesla, which can do anywhere-to-anywhere travel. That means you can start from anywhere and go to anywhere, whereas a Waymo has to start in predefined spots only and go to predefined spots only, and these are finite. It's not anywhere in that space. So even within the geofenced region, they can only go from specific place to specific place,

40:35

not anywhere you want in there. And so one of the distortions that has happened, and this has infected the regulatory space, at a kindergarten level this is broken, somehow, some way.

40:51

Waymo and other lidar companies have convinced everyone that point-to-point travel is more difficult than anywhere-to-anywhere travel. And to this day these systems are still classified that way, where point-to-point travel, which is an ADS system in regulatory terms, is somehow, from a regulatory perspective, more difficult than anywhere-to-anywhere travel, which is an ADAS system. Sort of. Well, I should be careful what I'm saying about ADS versus ADAS. Actually,

41:31

let me come back to ADS versus ADAS later. But the point is that somehow they've convinced regulatory people that point-to-point is harder than anywhere-to-anywhere, when it's clearly the other way around. By far, it's exponentially more difficult to do anywhere-to-anywhere travel than point-to-point, especially when you're not confined by geofencing. One of the other things is that these systems, these

41:54

lidar-based systems, are extremely expensive. Now I'm sure the cost has come down over the past couple of years, but in twenty twenty three, the average cost on those Waymos for the lidar-based system, that apparatus that they put on there, was fifty thousand dollars just for the system alone, at that time, roughly. Maybe it's come down a little, but

42:14

it's still stupidly expensive. And then for whatever reason, they're putting these things on Jaguars, so that's another eighty thousand. So just between these two things alone, you have more than one hundred and twenty thousand dollars being spent on these systems, one hundred and twenty, one hundred and thirty thousand, and that's not including a safety driver, that's not including R and D, that's not including any kind of testing or anything.

42:38

That's just the hardware alone with nothing else. No software has even been put in it, no AI has even been put in there yet. So already this stuff is absurdly overpriced to begin with, whereas the Tesla vehicles are thirty, forty thousand dollars by comparison, all in, with everything already done. And so the economics here is something that people need to understand: scale is not reasonably possible with a

43:16

lidar-based system, because the expense on lidar-based systems, even after the costs have come down, is still absurd. By comparison, a camera costs about one hundred dollars. The cameras that are on the Teslas, that they've been using for years and years and that do all this stuff, anywhere-to-anywhere travel, are just one hundred bucks each, so the

43:37

cost difference is absurd. One of the other issues with geofencing is that geofencing, basically memorizing the area, means that you have to expect that the area is static, because as soon as the area changes in a material way, your training data goes out the window. And so construction becomes a problem, because these spaces will continuously change. So as soon as construction arrives, all your training data for that area is gone, and

44:05

then you have to retrain with the construction there. And then as soon as the construction is done, the whole area changed again, and so you have to redo the training with new data all over again. So it is not easily adaptable. So one of the big problems with interacting in the real world is that it's an open set environment. What that means in mathematical terms, what that

44:28

means is that there's infinite possibilities. So if you're trying to memorize an answer, which is what basically every lidar system is trying to do, if you memorize an answer,

44:38

you cannot possibly memorize infinity. And the problem they're trying to solve is an open set problem, which means there's infinite possibilities. So you cannot memorize infinity, so you are destined to fail from the beginning if your system is dependent on geofencing or memorization, which is basically the same thing.

44:58

So just to kind of demonstrate other issues that come up with lidar, this is an intersection in China. And if you look at this intersection, just look at the heavy complexity that's involved here, the path complexity, all the lanes, all the turning, all the different conditions that would be required, and the amount of traffic that you would normally have going through this. How

45:20

intense it would be to navigate and manage all that. Well, this is yet another situation where lidar doesn't do shit for you, nothing at all. Lidar does not help you with this problem in any way, shape or form, exactly zero. It's useless, doesn't do anything. Same thing is true in America. I was just showing this one because one of the issues is that you want to be able to generalize an answer. So if you try to train on just US infrastructure, then you're pigeonholing yourself to

45:49

US infrastructure and you can't work anywhere else. And one of the things I'm trying to get at is I'm trying to model off the human brain, so we need to find a generalizable answer that works anywhere, regardless of where you are or the situation. Here's another problem. You have all kinds of different signs, all kinds of different lights, and they're constantly evolving, and there's so many different versions of them, and they constantly change. And this is another

46:16

situation where lidar doesn't do shit for you. Exactly zero. There's nothing about lidar that helps you with anything here on the screen, exactly zero.

Speaker 1

46:27

So they ishud on our streets. I mean, if they can't read a light, what the fuck are we doing letting it drive around?

Speaker 4

46:36

Not a clue. I mean, a little bit later I will give the fundamental argument about why lidar is used, which is about depth perception. This is the fundamental functional purpose that lidar serves. But we need to get a little more involved in this presentation before I really hit it. But the argument for lidar is that it is accurate depth. And

46:59

I'll get into that a little later. Let me kind of do all the setup for everyone, because these are all going to be new concepts, so we're going to do this right, let's put it that way. So, as I was saying before, the primary issue is that we have an open set environment. If we're trying to interact in the universe, in the real world, then there's going to be an infinite number of things that can happen. So there are three major categories of problems. There are

47:24

the known common problems that we always know about. That's this green bubble. Then there's the long tail known cases, the rare events, but we know about them. That's the blue circle. And then there is this much larger orange circle, which is the long tail unknown events, the unusual events that we don't know about, which is actually the larger set of problems, way larger than the

47:50

other two. So just to give you some examples, in this top left photo, at least once a year there might be a dinosaur crossing the road. And on the top right photo you have, I know it's kind of hard to see, sorry, it's small, but that's a literal plane on the road.

48:10

And then you have a few other different issues here, like on the bottom, and we'll hit on a lot more of this stuff, but you can see how there's a lot of unusual things that happen. In fact, the number of weird things that can happen is probably way higher than you'd really expect. Like, even if you drive a lot, you'd be surprised how many weird things happen that you've probably personally never seen before. So in particular, what I want to highlight here is the

48:38

center top video and the center bottom video. So in the center top video you have a bunch of small mammals all over the road. In the center bottom video, you have like flying pages. Now, I was mentioning earlier that the way lidar works is that lidar tries to do object detection and classification by shape. So what really is problematic, if you're trying to classify by shape, is what ends up happening if you get

49:20

this wrong, meaning you reverse these two scenarios. In the top scenario, you may go very cautiously because of these mammals. In the bottom scenario, you're on the highway, and you don't want to force the brake right away. You would actually go really aggressively

49:36

because you would know that these are pages. However, if you're classifying by shape, you might reverse these two scenarios, and that's when the horrible stuff happens. So this is another example where shape can easily be fooled and where the quality of color becomes very relevant and important. And in fact, not long ago, I think it was actually when we were at the conference, a Waymo ran over a cat, and so, yeah, I mean, that also became really... I'm a big

50:06

cat fan, so that really hurt. And of course you have a lot of unusual scenarios.

Speaker 1

50:13

Top right, what's the difference between a cat and a bag blowing in the wind. If a cat's at full sprint, how do you tell the difference in shape from that? And maybe a bag that's now in the shape of a cat.

Speaker 4

50:27

Yeah, you would be surprised at how much better computers are at computer vision than humans when you really want to do it. And one of the things that I'm not going to hit on, but maybe it's worth mentioning since you asked, is that once you can do the visual spectrum, why can't you do everything else? There's nothing stopping you. And so that's exactly what the evolutionary path will look like when you get really advanced. Is that

50:50

once you can do wavelength light, there's nothing stopping you. That's like the smallest sliver of the wavelengths available to you. Once you can do that, you can do all the other stuff. So just add infrared, for example, and you can automatically see through smoke, you can see through fog, you can see at night. You'll be able to do all kinds of stuff. And it's all vision, nothing to do with lidar or anything like that.

51:16

And of course you'll have some unusual situations, like in the far right scenario, that's a random washer rolling down the road, and so that can also be problematic. Sometimes you'll have trees that are in the middle of the road, like at the bottom left. And then this situation I'm going to bring up now and discuss a little more. Second from the bottom right, this is a trash can

51:39

that's blowing from the wind through the road. Now, the reason why in AVs, for a brief period of time, this scenario became complicated was because there's this thing called kinematic bits. Kinematic bits are just motion bits, motion information, data about motion. And so people started training on kinematic bits in addition to the wavelength light and stuff like that. And so what they did was they trained the system to be able to recognize every

52:09

single trash can there is, and it did it. But one of the problems was that the training that they did was on trash cans that didn't move, and so when they deployed it, there was this trash can that started moving, and so it had kinematic bits, but every trash can it was trained on had zero kinematic bits. A trash can is not supposed to move. And so you can see how a detail like that will mess

52:34

up the system. And so one of the things that's important is to understand where all the advantages and the disadvantages are, and kinematic bits are actually an advantage, you just need to make sure you train on them. And this scenario that I'm talking about is from earlier and has already been figured out. Don't worry, this was long ago. So people have figured this dumb shit out. But we all had to go through the evolutionary process.

52:57

And I think in some of the stuff that I've seen, you know, because there's so many companies that are trying to do lidar at this point, I see all kinds of horrible stuff, and I see a lot of cheats as well. And I'll talk a little bit about one of those cheats right now, which is a bounding box. So in computer vision, this is not a big deal. But in basically any other, what do you call it, any other modality, this

53:27

bounding box problem, I've seen, is a big problem. And even in twenty twenty three, I was still seeing papers being submitted to CVPR or to IEEE that were using bounding boxes. Now, that paper that I was showing before about the pedestrians, the VRUs, in twenty twenty one, I was saying that's the foundational paper: heavily cited, well understood, academically accepted, you know, big deal. That paper also demonstrated that bounding boxes are

53:55

a cheat. And what a bounding box is: not only do you need to be able to identify what an object is, but you also need to be able to position that object in the frame, to be able to identify that position. A bounding box is basically a box that goes around the object you're trying to identify. And the issue with this is that a bounding box is always going to be larger than the actual object itself, and it's not going to be the

54:23

shape of the object. For example, in this circle that you're seeing here, the bounding box is way larger than the actual truck itself, so you can see how much larger it is. So in a scenario like you have in the middle here with this overturned truck, if you imagine a bounding box with this kind of error rate on it in terms of its size and shape, and apply it to this truck, the bounding box is

54:51

going to be bigger than the frame. Now, why that's important, and it's easier to see on this bottom left photo, is because if you're using a bounding box and it's larger than the object, you can see that there's bounding box on the side here. You can see this side here is covering up the road.

55:13

So from the AI's perspective, there is no road to travel on, and so you'll see the AI system start to stutter here, because it doesn't think there's actually anywhere to go, any room to go, and so it has no way of path projecting through a bounding box, because you're not supposed to interact with this object. So this is why bounding boxes are a cheat. And this is still, as far as I know, this is

55:36

still actually happening. Unfortunately. In computer vision, I think AI specialists have figured this dumb shit out, but in everything else, I think they haven't. And this is a real shame. And I'll show you the solution to this problem, how you solve the bounding box problem. But I just want to bring it up right now that I've seen many lidar-based systems use this, and this is a cheat.
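
The bounding box complaint above, that the box is larger than the object and ends up covering drivable road, can be shown on a toy occupancy grid. Everything here is hypothetical: the grid, the diagonal object, and the function names are made up for illustration and are not taken from the paper or from any production system.

```python
def bounding_box(cells):
    """Axis-aligned box (min_row, min_col, max_row, max_col) around occupied cells."""
    if not cells:
        raise ValueError("need at least one occupied cell")
    rows = [r for r, _ in cells]
    cols = [c for _, c in cells]
    return min(rows), min(cols), max(rows), max(cols)


def box_cells(box):
    """Every grid cell the box covers, whether or not the object is there."""
    r0, c0, r1, c1 = box
    return {(r, c) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)}


# A hypothetical object lying diagonally across a 5x5 patch of road.
object_cells = {(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)}

box = bounding_box(object_cells)
falsely_blocked = box_cells(box) - object_cells  # free road the box marks as occupied

if __name__ == "__main__":
    print("box:", box)
    print("object cells:", len(object_cells))
    print("free cells hidden by the box:", len(falsely_blocked))
```

In this toy case the object occupies five cells while the box claims twenty five, so a planner that treats the box as solid sees no gap to drive through. A per-cell mask like `object_cells` keeps that free space visible.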

55:59

This is a known cheat, and this is academically published as a cheat, and it's unfortunate that this thing is still happening. So here's the Waymo system on the right here, and you can see the little spinny thing, that's the lidar. And like I was saying, no matter what your philosophy is, you still need to have cameras anyway. And what this is doing is it's kind

56:23

of showing the cleaning system that they're using. So this whole apparatus is being put on top of Jaguars in Phoenix and in LA, San Francisco, and maybe a couple other cities now, Austin, I think, also. So there's a couple major issues with this. So number one, there's a reason why we do wind

56:45

tunnel testing. And so if you're putting all these lidar systems on it, and it's not just this little thing on top, it's also got lidars on the front, on the two corners on the back, and the two corners I think on the center back as well. So you're increasing the amount of mass and you're increasing the amount of surface area on the exterior of the vehicle in order to do this. And there's a reason why

57:06

we do wind tunnel testing. So if you do all this, you're gonna basically invalidate all that, and this is going to affect your range efficiency. So you're gonna have range problems as a result of this. And this is especially going to become a problem at speed. The higher speed you go, the more wind resistance you're going to get, and the more problems you're gonna have with range. Another issue with this, and what I was showing here with this video, is the cleaning system. And if you

57:30

just apply one brain cell. Another thing you can do is you can just take these cameras and put them behind the windshield and then you don't ever have to do this dumb shit ever again. And then you can use the cabin features like the windshield wiper and the defrost settings that are already there and you don't have to build anything new and it only will cost you like a couple hundred dollars and that's it. So that's

58:00

the kind of dumb we're dealing with. So this is why I needed to do this setup before I get into the more complicated stuff, because I need to level set on what kind of level of dumb we're dealing with. So I'm gonna skip this. This is called a distribution shift. I don't want to get too technical, but this is basically saying some stuff about time and how it's relevant. I want to point this company out

58:25

just briefly. I'm not going to go deep into it, but just so that people don't think I'm just promoting Tesla or whatever, and that's it. This is another company, out of the UK. The company's name is Wayve, and it's run by a guy named Alex Kendall, and they're trying to do a monocular-based system. So a monocular-based system is one camera, and they're using like other things like HD

58:50

maps and probably a couple other things. But the point is their system is one camera and that's all they're using, and this is based out of the UK, and they're trying to build it up. I don't think they're fully deployed yet, or at least not that I'm aware of. But what is important to understand here is that even

59:12

with one camera, automated vehicles can be achieved. And they've done a really good job with this. Like, I'm skipping a lot here, but it's actually impressive what they're able to do with one camera and how many abstract ways they've been able to improve on the system

59:28

that are unconventional. And so the important aspect to understand here is that, again, the camera's one hundred dollars, and this singular camera-based system is still capable of doing automated vehicles, and Waymo's spending one hundred and twenty thousand dollars, one hundred and thirty thousand dollars per vehicle to not really achieve that. So it's one thing for me to say that Waymo's messing up and stuff, but it's another thing to actually show the hard-core data

01:00:02

on it. So I just want to bring this up real quick. Well, where did it go? There it is. This is from the NHTSA dot gov website. So this is official crash reporting data. This is the Standing General Order on crash reporting. You're required to report, and this is NHTSA dot gov, if you can see at the top. And this is national reporting. So for any automated vehicle that is operating on public roads, you're required to report if it's involved in a collision. And what this is, again, is

01:00:43

national reporting. So I want to reiterate: Waymo only operates in geofenced spaces within specific cities in the country, and that's it. This data was last reported on November seventeen, twenty twenty five. It's continuously updated every couple of months, every few months. I've been following it since twenty twenty three, and Waymo has about two thousand vehicles on the road. And you can see here, it's kind of tiny, here, I'll read

01:01:12

it for you. It's one thousand four hundred and twenty six collisions. So in their fleet of two thousand vehicles across only a few different cities, they're reporting one thousand four hundred and twenty six collisions. Now, another thing to know about this: Waymo recently expanded to a couple new cities, so that fleet of two thousand vehicles has only

01:01:42

increased recently. In May of this year, I saw they had like twelve hundred vehicles on the road, fourteen hundred vehicles, something like this, and they were reporting that roughly eighty percent of their fleet had already been involved in a collision. I've been watching this chart since twenty twenty three, and on average it's eighty percent, and sometimes I've even seen more than one hundred percent. In other words, on average more than one collision per vehicle in their

01:02:14

fleet at Waymo. Now, let's compare. This is Tesla's collisions. Tesla has, I think, five million vehicles on the road with the ADAS system, millions and millions of vehicles, and around the world too, not just in the United States, and they're reporting two thousand, I think that says, eight hundred and forty five collisions with millions of vehicles on the road. So that's a massive magnitude of difference between the two. The collision rate with Waymo is pretty insane.
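
The per-vehicle comparison being drawn here can be written out as a short sketch. The counts are the ones read off the slide in the talk (and the Tesla figures are stated with an "I think"), so they are assumptions, not verified values; the function name is hypothetical. Note that this is a reported-collisions-per-vehicle ratio as framed in the talk, not a per-mile rate.

```python
def collisions_per_vehicle(reported_collisions, fleet_size):
    """Reported collisions divided by fleet size, as framed in the talk."""
    if fleet_size <= 0:
        raise ValueError("fleet size must be positive")
    return reported_collisions / fleet_size


# Figures as quoted by the speaker (assumptions, not independently checked).
waymo_rate = collisions_per_vehicle(1426, 2000)
tesla_rate = collisions_per_vehicle(2845, 5_000_000)

if __name__ == "__main__":
    print(f"Waymo: {waymo_rate:.3f} reported collisions per vehicle")
    print(f"Tesla: {tesla_rate:.6f} reported collisions per vehicle")
    print(f"ratio: {waymo_rate / tesla_rate:.0f}x")
```

With those inputs the Waymo figure is about 0.71 per vehicle and the ratio between the two is on the order of a thousand.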

01:03:05

I mean, have you ever heard of any other vehicle company having a sixty, eighty, hundred percent collision rate and still operating after all these years? I've never heard of one, but Waymo seems to be able to get away with it. And so one of the other issues here...

Speaker 3

01:03:20

Well, one thing I did want to ask you, and it might be a little too conspiratorial: I was wondering, do you think the lidar is maybe there to actually map the area? Like it's an excuse, that's just one of the excuses for why it's there.

Speaker 4

01:03:32

That excuse is actually made, and that is true.

Speaker 3

01:03:35

Like I wonder if it's actually like collecting data and you're thinking like, oh, it's on top of the car to help get it somewhere.

Speaker 4

01:03:40

It is, actually. Yeah, so this is actually true. You're making a good point. There is actually a temporary use for lidar in automated vehicles, and this is known. And I will get into it a little bit later, but one of the major issues is calculating depth. This is pretty much one of the big problems that needed to be solved in

01:04:00

AVs: depth calculation. And one of the things that they learned is that no matter what your system is, even if you're using a computer-vision-only or end-to-end system like Tesla, you can temporarily put a lidar system on the vehicle along with the vision as you're training the model, and you can map the area to measure depth around you, and you can use that to help train the vision system on how to calculate depth.
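
One generic way to picture "use the lidar to help train the vision system on depth" is a loss that compares the vision model's predicted depth against lidar depth only where a lidar return exists. This is a minimal sketch of that general idea with hypothetical names and plain Python lists; it is not a description of Tesla's, Waymo's, or any other company's actual training pipeline.

```python
def sparse_depth_l1(predicted, lidar):
    """Mean absolute depth error over pixels that have a lidar return.

    predicted: 2D list of depths (meters) from a vision model.
    lidar:     2D list of the same shape; None marks pixels with no return.
    """
    total_error = 0.0
    supervised_pixels = 0
    for predicted_row, lidar_row in zip(predicted, lidar):
        for predicted_depth, lidar_depth in zip(predicted_row, lidar_row):
            if lidar_depth is None:
                continue  # no ground truth here, so this pixel is not penalized
            total_error += abs(predicted_depth - lidar_depth)
            supervised_pixels += 1
    if supervised_pixels == 0:
        raise ValueError("no lidar returns to supervise against")
    return total_error / supervised_pixels


if __name__ == "__main__":
    predicted = [[10.0, 20.0], [30.0, 40.0]]
    lidar = [[12.0, None], [None, 37.0]]
    print(sparse_depth_l1(predicted, lidar))  # 2.5
```

Once a model trained against a signal like this predicts depth well on its own, the lidar is no longer needed at inference time, which is the "temporary" use described above.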

01:04:33

But then you would still take the lidar system off after that. The only reason you would have it on is temporarily, to get depth calculation to get the system kickstarted in training, but then you would remove it later on and it would be able to calculate depth off of the vision system alone. Now, even this is not required anymore. But this is an argument that is made, this is sometimes done, and it is supposed to be temporary. But this is not what

01:04:55

Waymo's doing. Waymo, and they've said this multiple times in public, including their executives, they're all saying that lidar is here to stay for them, meaning that they're trying to make lidar a fundamental element of their solution. The fundamental element of their solution, I should say, which is nonsense, complete nonsense.

Speaker 1

01:05:19

The vital solution for the Silicon Valley nazis.

Speaker 4

01:05:23

Well, there's stuff there. It's important to understand that Waymo is owned by Google and there are numerous incentives for doing this. What I had started off by saying was that when Waymo came to Phoenix, they went to ASU and they started dumping enormous amounts of money into the engineering department at ASU. I didn't know that at the time, but that's what ended up getting me into a lot of problems. And so when I was applying for my PhD at ASU, I was using

01:06:01

this presentation. I presented this to all the lab directors at ASU in the Perception and Robotics group, which is a five floor building filled with PhDs, and basically none of them are working on computer vision. They had one robot arm with a single camera, and that was like their computer vision product. Everything else that they were working on was lidar or lidar-related technologies. And in fact I had to teach my professors all this

01:06:25

stuff that I'm talking about. They had no clue about computer vision, even some of the basic stuff. At a certain point it became completely pointless for me to even be at ASU, because not just the professors, every single person in the Perception and Robotics group, not a single person at that time had a clue about anything I'm saying here.

01:06:42

And it's, how do I say this? These problems are too dumb for somebody that has a PhD in AI to not know about, let alone an entire university filled with PhDs not having a single clue about any of this. That's corruption, that's not a mistake. So earlier, I was saying I wanted to do this setup so you can understand some of the issues that come with this. Now I'll show you what Waymo's lidar-based system actually sees. So here's

01:07:18

a visual demonstration of what I'm talking about. So this is a point cloud representation of what the Waymo vehicle sees. You have the literal cameras on top, and then below it is what the lidar system is seeing, creating a point cloud return representation. What I'm showing on the bottom is what it means to have a point cloud return representation. And like you were saying earlier, it's just a one-dimensional point of light. It's not even

01:07:45

really the wavelength. It's just that one point and that's it. Let me play it again. So this is like a person walking across with a box. You can see the point cloud representation here again, these people moving and this stuff. You can see it kind of misses the second person, it looks like, but it's very low quality, right? Like, you would never trade the top for the bottom, right? But somehow they've convinced you that this is exactly what

01:08:13

you should do. One more time, just to reiterate the point. This is what a point cloud representation means. Now remember this video, because I'm going to show you what you can do with computer vision, and anybody that understands video games and how video games work and all that stuff, you're about to have your mind blown because it's exactly the same thing. So whatever you can do in video games, that's what you can do with computer vision, and you

01:08:40

cannot do that dumb shit with lidar. Okay, this is, in my opinion, the most important slide, and this is the slide that got me in trouble. I know there's a lot on here. I'm going to break it down. It's not that hard, but this is really worth knowing. And I want to mention that there are a lot of things that I'm not saying. Some of it is to condense, but a lot of it is also to avoid creating even more problems for myself.

01:09:19

I'm happy to talk about things if you want to, but I'm gonna skip some of the really bad stuff that could get me in a lot of trouble. But this is good. What I have here on the screen got me in enough trouble. But I already did this, so I'll go ahead and talk about it. So if you ever see the phrase SOTA, S-O-T-A, that means state of the art. So this is the state-of-the-art Waymo

01:09:49

leaderboard results, on June twenty-ninth, twenty twenty-three. Whoops. This was presented at CVPR twenty twenty-three, the major computer vision conference, by Chen Wu, who's from Waymo. I have the link here, and this is the paper that is associated with that presentation. So this is the reference paper that I'm talking about. Here on the left side you have this table. This table represents what is industry standard for

01:10:20

object classification around the world. Everyone in the world that is doing object identification and classification in AI uses mIoU, mean intersection over union, for this task. This is industry standard, not just in America, everywhere in the world. This is well understood, well known. This is standard. With mIoU there is a minimum number of classes. At the time it was a minimum of sixteen classes, but actually the

01:10:59

number has increased significantly since then. I think in this case Waymo was doing twenty-two, which at the time, in twenty twenty-three, was acceptable. Since then this number has increased to even more classes. But just know that at the time this was actually industry standard and accepted. You're going to have two sets of data here, the validation set and the test set. The validation set is the performance of the model on data held out during training, and then the test set, which is the real measure

01:11:30

of performance. The test set is data the model is evaluated on after it has been trained, scenarios that it hasn't ever seen before. So on average, you would expect that the test set would be lower performance than the validation set. So it's all these different classes. They measure the intersection over union on each one, and then they average it, and that's what this table

01:11:58

is saying. So what you're getting here is forty-six point eight two percent on mIoU from Waymo. This table was on the last page of their paper, and it was not referenced in the paper. It was hidden at the very end, not talked about. Everything else on this slide is what Waymo's actually doing. This table is industry standard. Everything else is not. So here we go. What you need to know is this.
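
For anyone who has not seen mIoU before, the calculation described above (intersection over union per class, then averaged) can be sketched in a few lines. This is a minimal illustration on made-up labels with three classes, not the evaluation code behind the table on the slide. The function names and the toy data are assumptions for the example.

```python
# Minimal mIoU sketch on toy per-pixel labels (illustrative data only).

def iou_per_class(pred, truth, num_classes):
    """Intersection over union for each class that appears at all."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, truth) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, truth) if p == c or t == c)
        if union > 0:  # skip classes absent from both prediction and truth
            ious.append(inter / union)
    return ious


def mean_iou(pred, truth, num_classes):
    """Average the per-class IoU values into a single score."""
    ious = iou_per_class(pred, truth, num_classes)
    return sum(ious) / len(ious) if ious else 0.0


truth = [0, 0, 1, 1, 2, 2, 2, 2]  # ground-truth class for each pixel
pred = [0, 1, 1, 1, 2, 2, 0, 2]   # model's predicted class for each pixel

print(iou_per_class(pred, truth, 3))
print(f"mIoU: {mean_iou(pred, truth, 3):.4f}")
```

Because every class counts equally in the average, a model cannot hide poor performance on a rare class behind good performance on a common one.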

01:12:39

Look at this line that says SWFormer 3f (ours), "ours" referring to Waymo, and then also down here, SWFormer 3f (ours). These two lines are the lines you need to look at. What you'll see is they're showing the performance using this metric, AP/APH, on pedestrians, for example. Whatever the hell this AP/APH metric is, it's not mIoU, and I had never seen this metric before, so I had to go find it. What the hell is this AP/APH thing they're showing, whatever this metric is, AP/APH,

01:13:15

whatever the hell that means? This thing is showing an eighty-two point nine percent accuracy rate up here, or maybe down here, like eighty-two point one three on the pedestrian L1, whatever AP and APH mean. So I went and looked for where the hell this came from, and that's what's on the right. And it turns out this metric, AP/APH, was also made by Waymo, in a paper from twenty nineteen, which I'm showing here on the right side. And what this effectively does

01:13:49

this is the formula for it on the right side. Whoops. What this does is it takes this data on the left, runs it through this magic formula on the right, and outputs these results. That's literally what it does. I'm not exaggerating.

01:14:05

That's what it does. It takes this data, runs it through that formula, manufactures this data and as you can see, the performance that they're showing off of this metric, which is what they're using for their safety reports and what they published in their safety data in twenty twenty three

01:14:22

and twenty twenty four. This data here is almost double the performance of the industry standard, and they totally ignored the industry standard forty six percent versus eighty two point nine percent or whatever on a metric that they invented.

01:14:38

So let me make it clear here. Waymo invented a metric that nobody else in the world uses, and then they use that metric for their safety data, and that metric that they invented is showing a performance that is nearly double the actual performance on what is known as the industry standard around the world. So this is important to know. Now, one other thing that I was saying earlier, on the previous slide, was the degradation of lidar over range.

01:15:09

So with these point cloud returns, one of the things that happens is that the number of points that you get from the lidar system is lower the farther you go out. That's why the quality goes down. So when I was saying fifty meters, one hundred meters, one hundred and fifty meters, the Argoverse results were showing degradation at a log-two rate, halving every fifty meters. This thing that I just described is called sparsity.
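
To make the "halving every fifty meters" rule concrete, here is a small sketch that simply applies it as stated. It is an illustration of the speaker's summary, not a physical lidar model and not a reproduction of the Argoverse numbers. The starting point count of 1000 and the function name are hypothetical.

```python
# Sketch of the halving rule as stated above. The baseline point count is
# an assumed, illustrative number, not a measured value.

def expected_points(points_at_zero, range_m, halving_distance_m=50.0):
    """Points returned from an object if the count halves every 50 m."""
    return points_at_zero * 0.5 ** (range_m / halving_distance_m)


for range_m in (50, 100, 150):
    points = expected_points(1000, range_m)
    print(f"{range_m:>3} m: about {points:.0f} points")
```

Under this rule an object that returns 1000 points up close is down to an eighth of that by one hundred and fifty meters, which is what sparsity means in practice.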

01:15:39

In AI terms, this is referred to as sparsity. How sparse are the points? You want to have higher-resolution data, more points. Well, these results were showing degradation at range, and I was also stating that research on this specific topic, degradation over range, seems to be heavily suppressed, because I just can't find anyone really doing any serious research on it. And in my opinion, it's impossible that nobody thought about that.

01:16:13

For how prevalent and common it is, and how many businesses, major multi-billion-dollar businesses, have been using lidar and failed, the graveyard of businesses that have tried to make lidar work and failed, there is no possible way that nobody thought about degradation of lidar over range. I promise you, from an engineering perspective, you have to be a complete idiot

01:16:40

to not think about that. Even in cameras, or think about video games, you know how you have lower-quality pixels the further out you go, in order to save memory. You have to understand this stuff. It's at an elementary level. There's no possible way people didn't think about that. So why is that important? I want to read this thing that came

01:17:00

from the paper. Some of the latest commercial lidars can sense up to two hundred and fifty to three hundred meters in all directions around the vehicle, leading to a large range of point clouds. Okay, so they're fully aware of how point clouds work, and they're talking about how long-range lidar can go up to this range. But notice they don't tell you how sparse it is at that point. So let's go to this bottom right here. I'm going to read this. Our experiments are primarily based

01:17:30

on the challenging Waymo Open Dataset. Okay, one thing to know about this Waymo Open Dataset. At the time I was publishing this, which is November twenty twenty-three, four or five months after this CVPR, they had a total of six submissions to

01:17:52

the Waymo Open Dataset. By comparison, Nvidia on their dataset had more than four hundred submissions, and I'm bringing that up because the next line they say is: which has been adopted in many recent state-of-the-art 3D detection methods. As far as I'm aware, basically nobody in the world uses the Waymo Open Dataset other than Waymo, and even the six submissions that I saw in November twenty twenty-three were from nobodies. I've never heard of

01:18:31

any of them. But on the Nvidia dataset, with more than four hundred, every major university that competes in computer vision has competed there, including Nvidia and all other major computer vision participants,

01:18:50

it's like the standard, effectively. The dataset contains eleven hundred and fifty scenes, split into seven hundred and ninety-eight training, two hundred and two validation, and one hundred and fifty test. Each scene has about two hundred frames, where each frame captures the full three hundred and sixty degrees around the ego vehicle.

01:19:08

The dataset has one long-range lidar with the range capped at seventy-five meters, four near-range lidars, and five cameras. So it doesn't matter that the long-range lidar can sense up to two hundred and fifty to three hundred meters. They're capping it at seventy-five. And this helps support what I was saying earlier and what the Argoverse results were showing in the same timeframe as this,

01:19:39

that lidar degrades over range. And notice the seventy-five meter cap is just a little beyond the fifty-meter degradation.
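
To put the seventy-five meter cap in perspective, it helps to convert it into time. The sketch below is plain arithmetic: how long a vehicle takes to cover a given sensing range at a given speed. The sixty miles an hour used here is an assumed highway speed for the example, and the function name is made up.

```python
# How many seconds of road a given sensing range covers at a given speed.

MPH_TO_MPS = 1609.344 / 3600.0  # one mile per hour in meters per second


def seconds_to_cover(distance_m, speed_mph):
    """Time in seconds to travel distance_m at speed_mph."""
    return distance_m / (speed_mph * MPH_TO_MPS)


horizon = seconds_to_cover(75.0, 60.0)
print(f"75 m at 60 mph: {horizon:.2f} s")
```

At that speed the vehicle covers the full capped range in under three seconds, so everything beyond that horizon has to come from some other source.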

Speaker 1

01:19:46

How much would you bet that the seventy-five meter cap they're giving it is probably a way overestimation as well?

Speaker 4

01:19:55

Yeah, I'm pretty sure. Yeah, you're going to see some stuff. I'm just being a little I'm not trying to be.

Speaker 1

01:20:08

Yeah, they're overestimated. Even in that seventy five meters. They would have gone higher if they could have.

Speaker 4

01:20:15

Yeah, they would have gone higher. I mean, seventy-five meters at sixty miles an hour gives you less than three seconds of forecasting time, you know. So that's still not much better. So let me talk a little bit. I mean, I'll skip all this technical stuff, but if anyone cares and wants to know about the architecture and how it really works for the camera-based system, this is actually a really useful slide. It pretty much lays out the actual architecture if anyone

01:20:41

actually cares. But I'm going to try and skip all the really technical stuff. I do want to just mention something from this paper, which was a big deal. This is called OneFormer. This was a big deal in computer vision. The element here that I'm just going to briefly mention is that, as you would expect, it turns out that vision also improves large language model semantics, and it's actually a two-way road. They kind of go hand in hand. But the point is that vision

01:21:17

is an augment to an LLM. So another way of saying that is being able to see a red wagon allows you to describe a red wagon in language better. And OneFormer is kind of showing all that. Let me skip some of this stuff. This is technical stuff. It's kind of cool, shows some really interesting things, but I kind of want to get away from it. Segment Anything, for anyone in computer vision,

01:21:48

is totally worth knowing. This was made by Meta, and when it came out, it was a big deal because it did object segmentation pretty robustly across a wide variety of topics, not just useful in the AV industry, but useful across a variety of tasks. So this is where we'll get into the argument for depth and why lidars were adopted. So again, the primary and significant function that lidar provides AVs is

01:22:32

a depth calculation. And the reason why this is completely useless is, number one, there's no reason to have super precise depth. There's no reason to have laser precision on depth. The argument went like this: the major problem in AVs is collision, and because lidar does highly accurate depth perception, this is how you avoid collision. This was the crux and fundamental purpose of the argument for lidar.

01:23:06

But if you think about your human eyes and how you do it in real life, you're not trying to calculate the distance between you and an object in front of you. The only thing you care about in your everyday real life is that this object is in front of that object, but behind that other object. This is called relative distance. So this is how the brain does it, and this is actually the better way

01:23:35

of doing it. It's way more efficient. So one of the reasons why we have binocular vision, two eyes, is because having two eyes, aka two cameras, is how our brain does depth perception on a relative basis. When you have two fixed points that you're using as cameras with known dimensions, you can create triangles off of this, and then everything becomes geometry and that's it. The actual precise numbers don't even matter, as long

01:24:09

as they're relatively correct. It doesn't actually matter what the numbers are. You can even make up numbers as you go. The only thing that matters is relative depth.
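
The two-camera triangle argument can be sketched with the standard stereo relation, where depth equals focal length times the distance between the cameras, divided by disparity (how far an object shifts between the left and right image). The object names, disparities, and both sets of camera numbers below are made up for illustration. The second set is deliberately wrong, to show that the nearest-to-farthest ordering survives even when the absolute depths do not.

```python
# Stereo triangulation sketch. Every number here is illustrative.

def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth in meters from pixel disparity between two cameras."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def nearest_first(depths):
    """Object names ordered from closest to farthest."""
    return sorted(depths, key=depths.get)


# Pixel shift of each object between the left and right image.
disparities = {"truck": 40.0, "pedestrian": 16.0, "sign": 8.0}

# Assumed "true" calibration versus a made-up, incorrect one.
true_depths = {k: depth_from_disparity(d, 800.0, 0.5) for k, d in disparities.items()}
made_up_depths = {k: depth_from_disparity(d, 1000.0, 0.1) for k, d in disparities.items()}

print(true_depths)                     # absolute depths with the assumed calibration
print(nearest_first(true_depths))      # ordering from the assumed calibration
print(nearest_first(made_up_depths))   # same ordering with made-up numbers
```

The ordering only depends on which object has the larger disparity, so it holds for any positive focal length and baseline. Getting actual meters still requires a real calibration.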

Speaker 1

01:24:18

Now, insects, look at the eyes on insects. It's just massive eyes, just over and over again, so that it's got depth perception and everything else, all wired together with those eyes.

Speaker 4

01:24:33

And you'll even notice that the eyes on, say, a fly, for example, look complicated, but they're actually just repeating the same thing over and over again, one fundamental concept. What a fly cares about the most is motion detection. And that's why the eyes bulge out and they can see behind them and all around. That's why the globe shape. They're doing object detection, sorry, motion detection, and that's the

01:25:02

fundamental reason why it is. And you can even get hints of evolutionary progress just by looking at the eye throughout all the species. The more evolved the eye, probably the more intelligent the species. If all that the eye of some animal can do is object or motion detection, then it's probably a primitive eye that probably has gray cylinders

01:25:27

or something equivalent, right, grayscale. But the more evolved the eye is, with more color perception and more quality, then this is probably a more intelligent species. And I'm not going to hit too hard on this, but one of the things I want to demonstrate is that consciousness is a gradient. There's not a real definition of consciousness. And why is it difficult to measure?

01:25:51

It's because consciousness is a gradient. It's not like you just suddenly hit a certain level and then all of a sudden you're conscious. Consciousness is a gradient, and it goes from simple to complex, and it can get even more complex than we are. So I want to get to the real fun stuff. So I was just talking about depth perception. And earlier I had mentioned that bounding boxes are a cheat. Well, this is the solution to those problems. Depth

01:26:31

perception and bounding boxes are both solved by this. So in the AV industry it has been completely adopted, including by Waymo, by every single major company or philosophy. It doesn't matter what your belief system is. Every participant in the AV industry has adopted this thing called an occupancy network. And this is one of two AI concepts that I'm going to be demonstrating. Earlier I was saying I was hoping to teach people what real AI is and

01:27:07

distinguish that from fake AI. This is one of the two things you need to know in order to understand what is real AI and what is not. An occupancy network is one, and the next thing I'll talk about is something called NeRF. These two concepts you must know if you want to understand anything about AI going forward. And this is what distinguishes real AI from all the

01:27:30

other junk. So an occupancy network is a grid pattern that is made around the vehicle itself, and this grid pattern is filled with these Minecraft-looking boxes all around it. These boxes are called voxels, and these voxels are heat mapped all around the vehicle. So it's basically a grid. And then you put objects on this grid, and the exact precise distance doesn't matter as long as you

01:28:02

get it pixel correct. Whatever your pixel resolution is will be the resolution of your occupancy network and these voxels. What is really simple and powerful about an occupancy network and why it was adopted by the industry is because it gives you a simple mechanism, a zero and one binary mechanism for drive and not drive. So you can see here in blue, this blue area is where you can drive, and then everything else is no drive, don't drive.
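
The zero-and-one idea can be shown with a toy grid. This sketch flattens things to two dimensions for readability, where the system described here uses three-dimensional voxels, and it fills the grid by hand, where a real occupancy network would predict the cells from camera images. All names and the grid contents are hypothetical.

```python
# Toy 2D occupancy grid: 0 means drivable, 1 means occupied.
# Hand-filled for illustration; a real system predicts these cells.

FREE, OCCUPIED = 0, 1


def build_occupancy_grid(width, height, obstacles):
    """Create a grid of FREE cells and mark the obstacle cells OCCUPIED."""
    grid = [[FREE] * width for _ in range(height)]
    for row, col in obstacles:
        grid[row][col] = OCCUPIED
    return grid


def can_drive(grid, row, col):
    """True only for cells inside the grid that are marked FREE."""
    inside = 0 <= row < len(grid) and 0 <= col < len(grid[0])
    return inside and grid[row][col] == FREE


grid = build_occupancy_grid(width=5, height=4, obstacles=[(1, 2), (2, 2)])
for row in grid:
    print(row)

print(can_drive(grid, 0, 0))  # open road cell
print(can_drive(grid, 1, 2))  # cell holding an obstacle
```

The cell size sets the resolution: shrink the cells and the same structure traces object outlines more closely.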

Speaker 1

01:28:39

If you're gonna relate this to a sense, it would have to be touch, right? Because each and every one of these things has different bumps and different layers, so that as you're reaching out, you're touching these boxes, which is a generalized form, and you're getting the feel for how far or close it is. So it's a very tactile sense with this form right here.

Speaker 4

01:29:02

Good. So you're absolutely right, and eventually I'm going to show the evolution of this. So this is an old version. I know that says twenty twenty-three, but this is actually an old version of Tesla's occupancy network. I think, actually, this is

01:29:15

the twenty twenty version, if I recall correctly. So this is an old version, and I'm just showing this to show you the evolution of what happened in the industry with the occupancy network, because this is fundamentally important. And you're right, you are going to get the texture quality as a result of this, and there's actually a number of other benefits. But first and foremost, the thing to understand here is that the reason why

01:29:43

this was adopted into the AV industry, no matter your philosophy, whether you're using lidar or not, it doesn't matter, you're still going to be doing this, is because it provided an automatic, implicit collision avoidance system. Through software alone, this occupancy network automatically created an implicit collision avoidance system, and that alone was what made it attractive to put into AVs. Now, there are many other benefits that I'm going to talk about

01:30:20

that ended up happening, and this evolved even further. But the important thing to understand here is basically, think of Minecraft. You have a zero-one mechanism for where you can drive and where not to drive. And in the zero part, the part you don't drive, these little boxes, what they've done is they've added in a bunch of little data points, so they add further data into the not-drive part. And then the drive part is basically the road. That's it, just the road, and

01:30:50

that's simple, binary and super powerful. And it turns out this is the solution to the bounding box problem, and I'm going to get into that. So the bounding box, I was saying, was a cheat. Oh shit, I'm sorry, I need to talk about NeRFs first. Pause on the bounding box problem. I need to talk about this first. I got mixed up. So I was saying, there are two things that we need to know in real AI. The occupancy network is one,

01:31:24

and then the second thing you need to know, and this is as technical as I'm going to get, is NeRFs. Now NeRF, N-E-R-F. This is a technology that was invented in twenty nineteen, I think, or twenty eighteen, twenty nineteen, and this stands for neural radiance field. And now, if you play video games, you're going to have an advantage in understanding all this because a neural radiance field is very similar in concept to

01:31:54

ray tracing in a video game. And so what a radiance field is, is the ray from a pixel on a frame. It's the ray vector, pretty much identical to what ray tracing is in video games, and it's each pixel doing that ray tracing. And that's what a NeRF is. That's originally what a NeRF was. So NeRF started out in twenty nineteen as a new technology, and what happened is this thing that's called a radiance field,

01:32:31

this NeRF thing, evolved into a whole branch of technologies. Now, this whole branch of technologies in AI is just colloquially, collectively referred to as NeRF. But since then, they have made like thousands and thousands of different evolutions of this thing called NeRF. Now, earlier I was showing that the occupancy network is these Minecraft-looking

01:32:57

boxes, voxels. What happened over time is that those voxels reduced in size and eventually became single points. And what you were able to do with that is you were able to make an occupancy network and a NeRF effectively the same thing. So occupancy networks and NeRFs initially started out as two independent, completely different things, and over time they came together and became basically the same thing.
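
A rough way to see why the two ideas line up: a radiance field stores a density and a color at each point along a camera ray and blends them into a pixel, and the same densities can be thresholded into occupied or free. The sketch below uses the standard volume-rendering sum along one ray, with hand-written sample values standing in for what a trained network would output. The sample values, the step size, and the 0.5 threshold are all assumptions for the example.

```python
import math

# One camera ray through a density field. The densities and colors are
# hand-written stand-ins for the outputs of a trained network.

def render_ray(densities, colors, step):
    """Blend samples front to back; returns (pixel value, leftover light)."""
    transmittance = 1.0  # fraction of the ray not yet blocked
    pixel = 0.0
    for sigma, color in zip(densities, colors):
        alpha = 1.0 - math.exp(-sigma * step)  # opacity of this sample
        pixel += transmittance * alpha * color
        transmittance *= 1.0 - alpha
    return pixel, transmittance


densities = [0.0, 0.0, 5.0, 5.0]  # empty space, then a solid surface
colors = [0.0, 0.0, 1.0, 1.0]     # grayscale color at each sample

pixel, leftover = render_ray(densities, colors, step=1.0)
occupancy = [sigma > 0.5 for sigma in densities]  # assumed threshold

print(f"pixel value: {pixel:.5f}, light passing through: {leftover:.5f}")
print(occupancy)
```

The rendering path gives the picture and the thresholded densities give the drive or no-drive grid, both read from the same underlying field.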

01:33:33

So another way of understanding this is that ray tracing in video games and occupancy networks in AVs are basically the same thing. And so if you play video games, that should be nuts, because that means that everything that is happening in vehicles, in robotics, is a video game. Literally. It is not like that, it is that. Equal to that. So these two things became the same thing. This is why I'm saying you need to know these two things,

01:34:05

not that you have to understand everything about it. You just need to know what this is because this is real AI and you're gonna see what happens with this. So I'll even see that word. So NeRF is a new concept to people, I'm sure. So I'm just gonna show a few examples so that people get an understanding of what it is. So initially

01:34:29

NeRF was slow, you know, like all new things. But then eventually there was this paper called Plenoxels, and they figured out how to do it way faster, especially with GPUs. And you'll see in this video here the difference. So the left one is what NeRF was originally trying to do. It's trying to render the object. But you can see that it was slow and

01:34:54

has some problems and has some difficulty. But then of course they basically sped this thing up, like really fast. So what I'm going to eventually build up to is: can you do this in real time? Yes, you can render a NeRF, aka occupancy network, in real time. If you're an AI person, this matters to you because if you're an AV navigating through the world, it's one thing to be able to build the environment around you,

01:35:20

but can you do it in real time? Which is even harder, right? You have to do it fast. So can you do radiance fields, aka occupancy networks, in real time? The answer is going to be yes, and I'm going to demonstrate that. These are the results, if you care about this. So let me show you one step of evolution. This is not the modern version of the occupancy network in the Tesla system.

01:35:42

This is like, what, twenty twenty-two? So you can see just in a couple of years how the evolution is higher-resolution voxels, smaller voxels. You can see that this is still a zero-one mechanism, just like I was saying before, for drive, no drive. But look at this. So these are the cameras on top, the literal cameras on top, and then this is what the car sees. So this is the evolution of the occupancy network over

01:36:09

probably a couple of years. So a little bit higher resolution you're seeing, and you can see the car shapes are starting to become more defined. And remember how I was saying bounding boxes are a cheat? These little shapes right here, this is how you get around the bounding box problem. Once you apply them to a grid and you make the voxels as small as possible, down to

01:36:33

individual points, then the task becomes getting the shapes pixel perfect, and you have a pixel-perfect shape through an occupancy network for free. And still so many people are using bounding boxes, which is a total cheat.

Speaker 1

01:36:53

This reminds me of Blender.

Speaker 4

01:36:56

Blender, Oh Blender, yeah, right.

Speaker 1

01:36:59

The 3D software. Like, you could just plug straight into there and have a total model of the entire place, and then I guess share it between different vehicles. Isn't that what's also happening, like there's a sharing between different vehicles with all these?

Speaker 4

01:37:16

I think that, I mean, that's more theoretical. How do I say it, it's not commercially available. Like, Teslas don't do it yet, but this is part of the plan that they're working on. And yeah, you can assume that'll be obvious. I would say that in terms of priorities, what I'm talking about here, making this perfect, getting it exactly right, is a way higher priority for everyone in the world. This is where we're at.

01:37:46

We're still in the vacuum tube stage of AI. Everyone's going to be like, no, no, no, we're not. And I'm trying to show some of the evolution and the progress because this is the state-of-the-art stuff. And if you understand video games, you understand that this is not even nineteen-nineties graphics yet, right? So we're getting there. So, as I was saying before, you get object shapes for free, and this is the

01:38:12

solution to the bounding box problem. And this was a major problem in the industry and still probably is for some dumb people. But this is how you solve it. And this is a really, really important thing, getting object shapes for free. It's up there with collision avoidance in terms of importance. And this next one is kind of important too, especially if you're in AI, you understand how significant this is. Probably most people won't

01:38:39

understand how big of a deal this is. But one of the other things you get out of an occupancy network is that it reasons well about occluded objects, so objects that are traveling behind other objects and are partially occluded. This system inherently and automatically just understands that because it's just a grid network around you. Very simple heat map. But again, as I was saying, the most important thing about the occupancy network for AVs

01:39:05

is a collision avoidance system. It implicitly has this. Now, that is not to say that it will automatically go on the correct path. No, it just means it will consistently choose a collision-avoidant path. So, as you can see here, this car went out of control. It almost collided in the right corner, but the occupancy network avoided it and moved over to the left. This is clearly the wrong side of the road, but

01:39:31

it avoided another collision. It stabilized, and that's the value. Initially that was the major value, and it was sufficient to sell it into the AV industry just for this alone, because this is software alone doing this.
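
The difference between "collision-avoidant" and "correct" can be sketched with a toy grid and two candidate paths. Here a path is acceptable as soon as none of its cells are occupied, with no notion of lanes or traffic rules at all. The grid, the path names, and the first-clear-path rule are assumptions made for this illustration, not a description of any production planner.

```python
# Toy path check against an occupancy grid: 0 is free, 1 is occupied.
# Nothing here knows about lanes, only about whether cells are blocked.

def path_is_clear(grid, path):
    """True if every (row, col) cell on the path is free."""
    return all(grid[row][col] == 0 for row, col in path)


def choose_path(grid, candidates):
    """Return the name of the first candidate path with no occupied cells."""
    for name, path in candidates:
        if path_is_clear(grid, path):
            return name
    return None  # every candidate is blocked


# Three columns of road; the car starts at the bottom of the middle column.
grid = [
    [0, 0, 0],
    [0, 1, 0],  # an out-of-control car now sits in our lane
    [0, 0, 0],
]

candidates = [
    ("stay_in_lane", [(2, 1), (1, 1), (0, 1)]),
    ("swerve_left", [(2, 1), (1, 0), (0, 0)]),  # possibly the oncoming lane
]

print(choose_path(grid, candidates))
```

The chosen path is collision free but may well be on the wrong side of the road, which matches the clip: avoiding the crash comes from the grid alone, and staying legal is a separate problem.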

Speaker 5

01:39:45

Close your words, looking to the darkness far the blazing start focus on it be called the don't feel let the should be done.

Speaker 4

01:40:09

And does

Speaker 1

01:40:12

You the shoes

Transcript source: Provided by creator in RSS feed