What is Apple's Neural Engine?

Speaker 1

00:04

Get in touch with technology with TechStuff from HowStuffWorks.com. Hey there, and welcome to TechStuff. My name is Jonathan Strickland. I happen to be the host of this show. I'm also an executive producer at HowStuffWorks. And hey, I love all things tech, and today we're doing a little listener mail request. Dan wrote in and asked if I might do an episode about Apple's so-called neural engine in its more recent iPhones. So today we are going to learn what a neural

00:38

engine is and what it does. And by the way, if you guys have any requests for topics, if you've always thought, hey, I want an episode about this particular tech topic, remember, you can send those to me by email at techstuff@howstuffworks.com. And

00:54

now let's talk about this neural engine. Well, the general public first heard about this back in September 2017, when Apple CEO Tim Cook presented at what has become an annual tradition for Apple. Pretty much every September, Apple comes out and unveils the latest in its line of iPhone smartphones, and that year it was the iconic iPhone X, the tenth-anniversary edition of the iPhone,

01:28

also the one that's been discontinued now. Cook listed off a lot of features in that presentation, but the one we're really interested in today is part of the phone's A11 microprocessor, also called the A11 Bionic CPU. The most recent iPhones as of the recording of this podcast have the next generation of that chip, the A12, but in both cases, the neural engine is one of the elements that gets

01:58

a lot of coverage. So let's go to the A11, since that was the first one to have it. It's more than just a CPU. It's technically a system on a chip, or SoC. It's a 64-bit ARM chip. But that doesn't really tell you anything if you're not, you know, deep into the world of microprocessors. So what does that actually mean? Well, the ARM-based part means that it's based on

02:24

the ARM architecture in chip design. So for our purposes, we can simplify this to say the chip's components, the stuff that's actually on the microprocessor, are designed around an architecture developed by ARM Holdings, the company behind ARM processors. Apple licenses that architecture and designs its own cores around it. Now, that is different from what you would find in a chip made by Intel, for example. So the architecture part refers to how the components in the microprocessor are laid out and

02:58

how they interact with each other. And generally speaking, companies that make microprocessors develop an architecture in a way that is supposed to maximize the efficiency of the chips, so that you get the most performance for the least amount of energy input and with the least amount of waste. You don't want to waste too much energy and produce too much heat. And then you would typically

03:25

reduce the size of the various components. And after you reduce the size of the components, you might figure out a new architecture that makes better use of these smaller components. And this process goes on and on. Intel calls this the tick-tock model. So that's what the ARM-based part means. It follows this particular company's architecture. As for that 64-bit part, what does that mean? Well, that refers to the data

03:54

width of the arithmetic logic unit, or ALU. That's the part of the processor that actually carries out operations on data from computer instructions. So data width essentially tells you how much information the ALU can accept or handle at a given time, and it tells you this in bits. Now, a bit, just to remind you, is a single unit of computational information, and it is binary, meaning it has two states, which we designate

04:29

as being either a zero or a one. Some people say off and on, or false and true, but it's zero and one. The number of bits tells you how big the numbers can get before the ALU can't handle them anymore. So let's say you have an eight-bit chip, because that's a lot easier to talk about. You would be able to add, subtract, multiply, divide, you know, do the basic arithmetic and logical operations on eight-bit numbers.
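The arithmetic behind these ranges is easy to check: n bits give 2^n distinct patterns, so an unsigned n-bit number runs from 0 to 2^n − 1. This short Python sketch (the function names are just illustrative) prints a couple of eight-bit patterns and the ranges for 8 and 64 bits, both unsigned and as two's-complement signed numbers. The signed 64-bit maximum is where the "nine quintillion" figure comes from.

```python
from typing import Tuple


def unsigned_range(bits: int) -> Tuple[int, int]:
    """Smallest and largest values of an unsigned integer with `bits` bits."""
    return 0, 2**bits - 1


def signed_range(bits: int) -> Tuple[int, int]:
    """Smallest and largest values of a two's-complement signed integer."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


# Each eight-bit value is just a string of eight zeros and ones:
print(format(1, "08b"))  # seven zeros, then a one
print(format(2, "08b"))  # six zeros, a one, then another zero

for bits in (8, 64):
    lo, hi = unsigned_range(bits)
    s_lo, s_hi = signed_range(bits)
    print(f"{bits}-bit: {2**bits:,} patterns; "
          f"unsigned {lo}..{hi:,}; signed {s_lo:,}..{s_hi:,}")
```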

05:02

With an eight-bit chip, a single bit is a zero or a one, and an eight-bit number you can represent as a string of eight digits, each either a zero or a one. So you could have eight zeros in a row, up to eight ones in a row, and everything in between. So it could be seven zeros and then a one, or it could be six zeros and then a one and then another zero. You get the point. That gives you 256 combinations, which means you would be able to represent the numbers from zero to two

05:36

hundred fifty-five. That's with eight bits. However, we're not talking about eight bits. We're talking about a 64-bit chip. So now you have sixty-four digits in a row, each of which can be either a zero or a one. That provides you a lot more combinations: as unsigned numbers, they range from zero all the way up to 18,446,744,073,709,551,615, and if you use them to represent both positive and negative (signed) numbers, the largest positive value is 9,223,372,036,854,775,807.

06:15

That's a pretty big range. It can handle way, way larger numbers than an eight-bit chip. So that tells you the type of architecture this chip has and the amount of data it can handle at a time. The A11's CPU has six cores, and processors with multiple cores can work on parts of a problem simultaneously. If you have what's called a parallel problem, you can divide that problem into different segments and have different cores tackle them. Two of those six cores are

06:49

what Apple calls high-performance cores. They have a clock speed of 2.39 gigahertz in the A11. The clock speed tells you how many clock cycles a CPU can perform per second. 2.39 gigahertz means that each of these cores can perform 2.39 billion clock cycles per second. Now, clock cycles do not easily translate into actions. It's not necessarily one clock cycle per action.

07:21

But generally these numbers tell you how much a core of the processor is able to handle per second, how many tasks it can do per second, assuming a certain number of clock cycles per task. Now, these two cores are referred to as Monsoon. The other four cores are what Apple calls energy-efficient cores. They don't run at that same high clock speed, and they're meant to handle more routine tasks. They are called Mistral. So you have Monsoon and Mistral: two Monsoon cores and four Mistral cores.
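To make the clock-speed arithmetic concrete, here is a small Python sketch using the 2.39 GHz figure for the A11's high-performance cores. The cycles-per-task values are purely hypothetical; real cores can finish several simple instructions in one cycle or spend many cycles on a single slow one, which is why clock speed alone doesn't tell you how much work gets done.

```python
CLOCK_HZ = 2.39e9  # 2.39 GHz: clock cycles per second for one high-performance core


def cycle_time_ns(clock_hz: float = CLOCK_HZ) -> float:
    """Length of a single clock cycle in nanoseconds."""
    return 1e9 / clock_hz


def tasks_per_second(cycles_per_task: float, clock_hz: float = CLOCK_HZ) -> float:
    """Tasks per second, assuming (hypothetically) a fixed cycle cost per task."""
    return clock_hz / cycles_per_task


print(f"one cycle lasts about {cycle_time_ns():.3f} ns")
for cycles in (1, 4, 20):  # hypothetical cycle costs per task
    print(f"{cycles:>2} cycles/task -> {tasks_per_second(cycles):,.0f} tasks/s per core")
```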

07:57

But the A11 is not just a CPU. It also has a three-core graphics processing unit, or GPU, incorporated into the chip. And then there are two processing cores dedicated specifically to handling tasks related to machine learning algorithms. This pair of cores makes up the neural engine. They are

08:18

essentially hardware built to run artificial neural networks. And I've talked a little bit about artificial neural networks before, but today we're really going to try to understand what makes them special, because that's really why the neural engine means anything in the first place. So this means we get to do a quick history lesson, because this is TechStuff, and of course we have to go into the history.

08:43

So here we go. Back in the 1940s and 1950s, there were some smart guys named Warren McCulloch, who was a neurophysiologist, and Walter Pitts, who was a computer scientist and a logician, and they began developing theories that brought together computational science and neuroscience, in other words, the way machines process information and the

09:08

way brains process information, which is different. McCulloch wrote a couple of papers about this, and he asserted that the basic unit of logic in the brain is the neuron. So the nerve cell, the brain cell, is your basic unit of logic in a brain, and it would act kind of like a gate or a transistor in a circuit. The transistor is the smallest unit that allows logic to happen in a circuit, and the neuron plays that role in the

09:44

brain. Pitts and McCulloch began developing computer algorithms that attempted to guide machines to process information in a way that was at least conceptually similar to the way our brains process information. McCulloch had proposed that by doing this, you could train a machine to recognize handwritten characters like numbers or letters, even if those representations

10:06

varied in size or style. And I've talked about this challenge in the past as well: training a computer to recognize a specific type of image, or a specific thing in an image, is hard. I always use coffee mugs as an example. I don't know why, but I like that particular one. So we're gonna

10:27

go with it again. Say you create a computer program, feed it an image of a coffee mug, and tell the program that this image corresponds with a concept called coffee mug. The image shows a blue coffee mug with its handle pointed toward the right from the viewer's perspective. And then you feed it a different image, maybe of that same coffee mug, but now at a different angle. Well, the machine is looking at this as

11:01

if it's a totally new thing. It cannot just extrapolate from that information and say, oh, this is also a coffee mug. Or maybe it's a different coffee mug: a different color, a different size, or a different shape. The computer doesn't understand the concept of coffee mug. So how can you teach it this concept? How can you train it so it recognizes coffee mugs? That was what McCulloch was looking at. Then another guy came along, Frank Rosenblatt, a very smart man, who built on

11:33

this work. He developed an artificial neuron called the perceptron. Now, a perceptron's job, from a very high level, is pretty simple. It accepts multiple binary inputs, meaning inputs that are either zeros or ones, and then it produces a single binary output, either a zero or a one, based

11:54

upon processing that information. So let's say you want to create a program that can help you decide which restaurant to go to, and you've come up with three criteria that you think are really important for making this decision. The three criteria are: Is the restaurant within a twenty-minute drive? In other words, is it relatively close? Will a meal cost less than fifty dollars for two people to

12:23

have dinner there? And does the restaurant serve tacos? Those are your three criteria, and you can represent each of those variables with a binary figure. So, for example, you could say that if the restaurant is closer than a twenty-minute drive, if it is nearby, you represent that variable with a one. If it is farther away than that, it's a zero. If dinner for two is cheaper than fifty dollars, that's a one. If it's more expensive, it's a zero. And if it serves tacos,

12:54

it's a one. And if it does not serve tacos, it's a big fat zero. Then, if you have a list of various restaurants, you could feed each one through your criteria, see how they do, and narrow your choices that way. But perhaps no single restaurant meets all those criteria, so you really should take another step. And that's where Rosenblatt introduces the concept of weights, where you change how important each of the criteria is in relation to the others.

13:26

Weights are real numbers that indicate the importance of a particular criterion. So let's say that of the three criteria you've identified, the distance, the cost, and whether or not they have tacos, you have decided the most critical piece of information is whether or not the restaurant serves tacos. You could then assign a greater weight to that criterion, saying this is more important to me, and that will influence the output of the neuron. You must also determine a threshold

13:58

value for the decision. In other words, you say that in order to produce a positive result, one that tells me yes, this is a restaurant I should go to, you must at least meet this threshold. That's the minimum value the calculation has to meet or exceed in order to produce a go-to-this-restaurant result. I'll explain a bit more about this in just a second, but first I'm going to take a quick break and thank our sponsors.
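The threshold rule fits in a few lines of Python. This sketch uses the weights the episode settles on right after the break (two for distance, two for price, six for tacos) and encodes each restaurant as three hypothetical binary inputs in the order (close by, cheap, serves tacos). The neuron fires when the weighted sum meets or exceeds the threshold, matching the "meet or exceed" wording above.

```python
from typing import Sequence

# Weights in the order (close by, cheap, serves tacos).
WEIGHTS = (2, 2, 6)


def perceptron_threshold(inputs: Sequence[int], threshold: float,
                         weights: Sequence[float] = WEIGHTS) -> int:
    """Fire (return 1) if the weighted sum of binary inputs meets or exceeds the threshold."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum >= threshold else 0


# Hypothetical restaurants described by (close by, cheap, serves tacos).
restaurants = {
    "close and cheap, no tacos": (1, 1, 0),
    "far and pricey, tacos": (0, 0, 1),
    "close, pricey, tacos": (1, 0, 1),
    "close, cheap, tacos": (1, 1, 1),
}

for threshold in (4, 8, 10):
    passing = [name for name, x in restaurants.items()
               if perceptron_threshold(x, threshold)]
    print(threshold, passing)
```

Raising the threshold from 4 to 8 to 10 shrinks the list of passing restaurants, just as described after the break.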

14:32

That threshold value I mentioned before the break is really important, because it tells your model what sort of results count as valid versus not valid. So let's say I've weighted the criteria so that the distance to the restaurant and the expense of the meal each have a weight of two, but the presence of tacos has a weight of six. That's how important I think tacos are. And I've set

14:56

a threshold of four. Well, that means that if the restaurant is relatively close and relatively inexpensive, it's going to pass my criteria, because I've given each of those a weight of two, and added together, that's four. It equals the threshold. Good to go. But even if the restaurant is far away and expensive, if it serves tacos, it still passes my criteria, because I gave tacos a weight of six. Raising the

15:23

threshold value reduces the number of valid restaurants. So if I make the threshold eight instead of four, now the only way I can get a valid result, a result of yes, go to this restaurant, is if the restaurant has tacos and is also close by, or inexpensive, or both. And if I set the threshold at ten, all three criteria would need to be met for this

15:51

option to be valid. Now, in artificial intelligence, for the purposes of notation, many people will move the threshold value to the other side of the equation and flip its sign, and in that case we call it a bias. A bias essentially tells you how easy or difficult it is to get the perceptron to fire off a positive value. If you have a big positive bias, it's easier for the perceptron to produce a positive result, a one. A large negative bias does the opposite,

16:22

making it harder to fire, so you would tend to get a zero. So we can write out the perceptron's rules like this. Take the value of a variable, which is going to be either a zero or a one. It will be binary. You multiply the value of this variable by the weight of that variable, and weights can have different values. Let's say that distance and expense are both weighted at two. Tacos get a big hefty six. You add your various weighted results together, and then you add the

17:00

bias for the perceptron. In our example, the bias is minus six. That tells us that in order for this perceptron to fire, you have to factor in that minus six and beat it. So if, after adding these elements together, you get a result that is zero or lower, the output is a zero, a negative result, saying don't go to this restaurant. So after adding that negative six, if you have zero or less, you don't go. If you get a result that's greater than zero, it's a

17:32

positive result, and it says go to that restaurant. So here in our hypothetical perceptron, we've decided on a bias of minus six, and we take our three variables as we examine a single restaurant. This restaurant is twenty-five minutes away, so our first variable, which is all about distance, gets a zero, because the restaurant is farther than twenty minutes away. And we multiply the variable times the weight.

17:57

The weight is two for that particular variable, and two times zero is zero. Then I look and see that dinner for two at that restaurant is gonna set me back thirty dollars, which is below the fifty-dollar limit we set. So the value of the variable is one. The weight for this variable is two, so we multiply the weight times the variable: two times one is two. Then we have the question, does the

18:26

restaurant serve tacos? And I know you're dying to know this. I'm glad to report the restaurant does in fact serve tacos, and that means the variable is a one. It's positive, and we weighted this variable very heavily with a six, so six times one is six. Now we add all of those results together: zero from the first one, two from the second one, six from the third one. Add those together and you get eight. Now we add in the bias, and the

18:58

bias for this perceptron is minus six. Eight plus minus six gives us a final value of two. Two is greater than zero, so by the rules we have established, the perceptron says this is a positive result and fires off a one. So the restaurant we fed to the perceptron met the criteria, given that bias. Now, if our bias had been minus ten or minus nine, we

19:24

would not have produced this positive result. We would have gotten a negative number, and the perceptron would have said no. So the bias is very important, as are the weights of the various variables. And that is one neuron. Now, you can actually create layers of neurons. That's why we call it an artificial neural network, not just an artificial neuron. And by doing that, you can have results from one

19:49

neuron's decisions feed directly into another neuron. Also, a perceptron can act as a type of logic gate called a NAND gate. NAND stands for "not AND," and it's a type of logic gate that produces a false or negative output only if all its inputs are true or positive. So, in other words, with the right weights and biases, a perceptron will produce an output

20:20

of zero only if all of its inputs are ones. In computer science, the NAND gate is a universal gate, because you can combine NAND gates to build any kind of logical computation. You just have to link them together properly. It's not always the most efficient way to do it, but it does work. So if you had a perceptron that accepted two variables, each with a weight of minus two, and the perceptron had a bias of three, it would act

20:51

like a NAND gate. That's because if both variables are one, then to get the value that determines the output, you multiply the weight of minus two times the variable of one, which gives you minus two, and then you add a second minus two, because the second variable is weighted the same way. And then you add the bias, which is three. But minus two plus minus

21:15

two is minus four. You add in plus three, and you get minus one. Minus one is less than zero, which means the output for the perceptron must be zero as opposed to one. You get a false, or off, or zero result. Two positive inputs create a negative output; it's one of the few times you can say two positives make a negative. Now, that means we can ask progressively more complicated questions, with each perceptron handling one aspect of that question and feeding into another

21:46

layer of perceptrons. Each perceptron will produce either a positive or a negative result, so you get either a one or a zero, and these results feed into other neurons in the network, which use them to perform their own calculations with their own weights and their own biases. All of this feeds those questions through a network to produce a result, and I should be clear: the weights for each variable along this path can change from one part of the decision-making process to the next.
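Here is the same idea in bias form, as a minimal Python sketch: a perceptron fires only when the weighted sum plus the bias is strictly greater than zero. It reproduces the episode's restaurant walkthrough (bias of minus six) and the NAND perceptron (weights of minus two, bias of three). The `xor` function is an illustrative addition, not from the episode: it wires four NAND perceptrons together so that earlier outputs feed later inputs, computing XOR, something no single perceptron can do on its own.

```python
from typing import Sequence


def perceptron(inputs: Sequence[int], weights: Sequence[float], bias: float) -> int:
    """Bias-form perceptron: output 1 only if sum(w * x) + bias > 0."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if z > 0 else 0


# Restaurant walkthrough: 25 minutes away (0), $30 for two (1), serves tacos (1).
# 2*0 + 2*1 + 6*1 + (-6) = 2, which is greater than zero, so it fires.
print(perceptron([0, 1, 1], [2, 2, 6], bias=-6))
# With a bias of -9, the total is 8 - 9 = -1, so it does not fire.
print(perceptron([0, 1, 1], [2, 2, 6], bias=-9))


def nand(a: int, b: int) -> int:
    """A perceptron with weights -2, -2 and bias 3 behaves like a NAND gate."""
    return perceptron([a, b], [-2, -2], bias=3)


def xor(a: int, b: int) -> int:
    """Layered NAND perceptrons: outputs of earlier NANDs feed later ones."""
    n1 = nand(a, b)
    return nand(nand(a, n1), nand(b, n1))


for a in (0, 1):
    for b in (0, 1):
        print(a, b, "NAND:", nand(a, b), "XOR:", xor(a, b))
```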

22:13

We're not just talking about identical perceptrons all through the network. And that matters, because if this were just a matter of setting biases and weights and building out a network of perceptrons, there'd be nothing special about it, because we already have NAND gates. They existed before perceptrons. It would just mean that we have a different way to implement something we could already do, and finding a new way to do something you were

22:41

already doing is rarely super transformative. You might find a better way of doing the same thing, but in this case it might be less efficient than the old way. However, there is something else that makes these perceptrons special, and that's pairing them with those special algorithms that McCulloch and Pitts were proposing back in

23:02

the forties and fifties. These would be learning algorithms: instructions that can, based upon external stimuli, dynamically and automatically tune the weights and biases of perceptrons in a neural network. In other words, a program can guide the network so that it learns how to solve problems. But how? It all comes down to making small changes in those weights and biases in order to fine-tune outputs. So let's say we're working on an image

23:33

recognition algorithm. That's one of the big things the neural engine in Apple's iPhones does; it's one of its main purposes. So in our example, let's say we're training the neural network to recognize handwritten, printed lowercase letters. It's very similar to what McCulloch was talking about. But let's say our model is having trouble differentiating a lowercase L from a lowercase I. It's just having issues

24:00

telling those two apart in particular. Now we've got a specific example in which, let's say, our model misidentifies an L as an I, and so we decide to make some minor tweaks to the weights and biases earlier on in the artificial neural network, to guide our network so that it can more readily tell the difference between lowercase Ls and lowercase Is. And we get our

24:29

model closer to being able to tell that difference. We keep making these small adjustments until we get more consistent output. The network as a whole is said to, quote unquote, learn through this process. It's getting better at creating an output that's more reflective of reality. But there's a bit of a problem, and anyone who has worked in QA has probably already spotted what it is. For everybody else, I'm gonna explain it in just a minute, but first let's

24:54

take another quick break to thank our sponsors. So what was that problem I was talking about before the break? Well, if you've ever worked in any sort of programming environment, you know that when you introduce changes in code, you might fix whatever problem you're focusing on at the moment, but you might also break something else that's already in

25:21

the code. With perceptrons, that happens when you start tweaking weights and biases, because a small change in one spot in a network can have a ripple effect with unintended consequences. So, for example, in our little hypothetical situation, maybe your new model can better tell the difference between a lowercase L and a lowercase I, but now the lowercase J is giving it problems. The way

25:46

perceptrons work, each output can only be a zero or a one, so a small change in the network can flip an output completely and produce much larger variations in the final output. It's sort of like the butterfly effect in action. Computer scientists created a different type of artificial neuron that addresses this problem, and it's called a sigmoid neuron. So, the

26:09

sigmoid neuron. What the heck is this? Well, from a high level, sigmoid neurons look kind of like perceptrons, but while you'd use either a zero or a one as the value for an input into a perceptron, a sigmoid neuron can accept a zero, a one, or any number in between. The output of a sigmoid neuron is calculated by applying what's called the logistic function, or sigmoid function, to the weighted sum of its inputs plus its bias.
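A small Python sketch makes the difference visible. Both neurons below compute the same weighted sum plus bias, z; the perceptron then applies a hard step (1 if z > 0, else 0), while the sigmoid neuron applies σ(z) = 1 / (1 + e^(−z)). The inputs and weights are hypothetical values chosen so that z sits right at zero, where nudging one weight by 0.02 flips the perceptron's output completely but moves the sigmoid neuron's output by only about 0.005. Note that the sigmoid neuron takes an input of 0.5 without complaint.

```python
import math
from typing import Sequence


def weighted_input(inputs: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """z = sum of weight * input, plus the bias."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


def step(z: float) -> int:
    """Perceptron output: all or nothing."""
    return 1 if z > 0 else 0


def sigmoid(z: float) -> float:
    """Logistic (sigmoid) function: a smooth curve from 0 to 1."""
    return 1.0 / (1.0 + math.exp(-z))


inputs = [1.0, 0.5]  # sigmoid neurons can take values between 0 and 1
bias = 0.0
before = weighted_input(inputs, [2.0, -4.0], bias)  # z = 0.0
after = weighted_input(inputs, [2.02, -4.0], bias)  # first weight nudged by 0.02

print("perceptron:", step(before), "->", step(after))
print("sigmoid:   ", round(sigmoid(before), 4), "->", round(sigmoid(after), 4))
```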

26:37

The math here gets a bit complicated, particularly if, like me, you're a little rusty on your algebra and calculus, but generally speaking, the end result is that with this type of artificial neuron, you can make small changes to weights and biases without creating a large effect on the ultimate output. You'll just get small adjustments to the output. There are a lot of resources online

27:03

that go into greater detail about sigmoid neurons. I'm not going to go into more detail here, because without visual aids and the algebraic functions, it gets a little hard for me to explain. But in your typical neural network, you would have an input layer and an output layer: a layer where information comes in, and a layer where new information comes out. Between those two, you would have what are called hidden layers.

27:31

That just means they're not input or output; they're in the middle. Hidden makes it sound like they're super clandestine and spy-worthy and cool, but really they're just in between input and output. They perform processes on the inputs they receive and pass the results on as outputs to other neurons, which perform more processes on them, until you finally get the output. The sort of networks I've described so far are called feedforward networks,

28:01

and that means pretty much what it sounds like. You plug in inputs, the information passes one way through the network, and you eventually get output as the information continues to move. We typically visualize this as a left-to-right kind of display, so you imagine input coming in from the left side and passing through this network, having various processes performed on it as each of these neurons decides if it counts as a positive or a negative response,

28:32

or, with sigmoid neurons, some degree in between, and then plugging that into the next neuron until you finally get to the output. It always gets fed forward. But that's not the only type of artificial neural network. There are also recurrent neural networks, in which neurons fire for some limited amount of time. Then they typically settle down and stop firing, but the next group

28:55

of neurons starts to fire. This creates kind of a cascade effect through the network, and there can be neurons that feed back into previous neurons. There's a feedback loop. It's more challenging to make a powerful learning algorithm for recurrent neural networks, because it gets super duper complicated. But recurrent neural networks potentially have huge utility in the future.
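The difference between the two wiring styles shows up in a short Python sketch. The `feed_forward` function passes three inputs through a hypothetical hidden layer of two sigmoid neurons to one output neuron, always moving left to right. The `recurrent` function is one simple, Elman-style way to build the feedback loop: a single neuron whose previous output is fed back in as an extra input at the next time step. All weights are made-up values for illustration, and real recurrent networks come in many other forms.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs: Sequence[float], weights: Sequence[Sequence[float]],
          biases: Sequence[float]) -> List[float]:
    """One layer of sigmoid neurons; each row of `weights` belongs to one neuron."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def feed_forward(x: Sequence[float]) -> float:
    """Input layer -> hidden layer (2 neurons) -> output layer (1 neuron)."""
    hidden = layer(x, [[2.0, 2.0, 6.0], [-1.0, 3.0, 1.0]], [-6.0, -1.0])
    return layer(hidden, [[4.0, 2.0]], [-3.0])[0]


def recurrent(sequence: Sequence[float], w_in: float = 2.0,
              w_back: float = -3.0, bias: float = 0.0) -> float:
    """A single recurrent neuron: its last output is fed back as an input."""
    h = 0.0
    for x in sequence:
        h = sigmoid(w_in * x + w_back * h + bias)
    return h


print(feed_forward([0, 1, 1]), feed_forward([0, 0, 0]))
# Same inputs in a different order: the feedback loop gives the neuron a memory.
print(recurrent([1, 0]), recurrent([0, 1]))
```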

29:20

So an artificial neural network can be made up of as few as a few dozen artificial neurons, all the way up to millions of artificial neurons, and we train them through various processes, such as backpropagation. That's when you take the actual output of the process and compare it to what you wanted it to produce, and then you use the difference between those two results to make changes to the weights and biases in the network, working backward from the output layer toward the input layer.
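The compare-and-adjust loop can be sketched in Python for the simplest possible case: a single sigmoid neuron learning the NAND truth table from labeled examples. This is the one-neuron special case of backpropagation; with hidden layers, backpropagation uses the chain rule to pass the same kind of error signal backward through every layer. The sketch assumes a cross-entropy loss, for which the error signal at the neuron is simply the actual output minus the desired output. The learning rate and number of epochs are arbitrary choices.

```python
import math
from typing import List, Sequence, Tuple

# Labeled training data: (inputs, desired output) for a NAND gate.
DATA: List[Tuple[Tuple[int, int], int]] = [
    ((0, 0), 1), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0),
]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights: Sequence[float], bias: float, x: Sequence[int]) -> float:
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train(data, epochs: int = 5000, learning_rate: float = 0.5):
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for x, desired in data:
            actual = predict(weights, bias, x)
            error = actual - desired  # difference between actual and desired output
            # With sigmoid + cross-entropy, d(loss)/dz equals `error`, so each
            # weight moves against error * (the input it multiplies).
            for i, xi in enumerate(x):
                weights[i] -= learning_rate * error * xi
            bias -= learning_rate * error
    return weights, bias


weights, bias = train(DATA)
for x, desired in DATA:
    print(x, desired, round(predict(weights, bias, x), 3))
```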

29:48

So here's an example: say we're training our network to recognize pictures of cats, because this has actually been done. Google famously did this. So you're training your network to recognize what a cat is based upon a picture, and you use a picture that you know is a picture of a cat, so you already know the answer. You're teaching the computer to learn the answer.

30:13

You know the answer is cat, and you feed the image through the system. It analyzes the data, it gives you an output, and you see how well it did. Did it correctly identify the image as a cat? Did it assign a certain level of certainty to its conclusion? And if it's far off, you can start making changes to those weights and biases to help guide the system

30:40

into determining, oh, yes, that is a cat. Training a network multiple times refines this process to the point where you can start to introduce brand new inputs to the system, inputs that the system has never encountered before, and get

30:54

a reliable result. So with Google's example, you might feed it thousands, or tens of thousands, or hundreds of thousands or more images of cats, and each time the system is told that there is a cat in the image, and it begins to refine its approach, figuring out which weights and biases it needs to tweak to get to that result. And then you feed it a whole group of new images, and you don't tell it

31:23

whether there are cats in those images or not. You leave it to the system to determine whether there are cats in these pictures. And if you have trained it properly, if those weights and biases are well tuned, then the system should be able to reliably pick out the pictures that have cats in them. That's the idea. Now, there's tons more to be said about artificial neural networks, but I've given you a quick overview. Let's jump back over to Apple for a second, because

31:54

that was the whole purpose of this episode. So what is a neural engine actually used for? Well, on the iPhone, it's used mainly in processing speech and image data. It's the neural engine that can analyze your face, for example, and then translate your expressions into animated form. You can create animated emoji this way, so you could use the little application and create a customized surprise emoji that copies the way you look when you make a sort of

32:22

an exaggerated surprised face. You could do that. The neural engine takes the incoming data, the images it's pulling from the camera, analyzes them, and then helps create an animated image that mirrors what you did. The neural engine also analyzes visual data for the purposes of augmented reality. That's when you overlay digital information on top of a view of the physical world around you. So with smartphones like the iPhone, it means holding your phone up and looking

32:51

at the world through your phone screen. The camera on your phone gives you a live video feed of whatever you're pointing the phone at, and then you use an augmented reality app, and on top of that video image your phone overlays some sort of digital information. It could be a game; it could be information about your surroundings. The digital information can appear to be

33:14

anchored to the physical space itself. So you could have an augmented reality application that lets you view a virtual piece of furniture in your house. When you hold up the phone, you use the app to place a virtual chair, let's say, in a specific location in a room, and you can walk around this virtual chair holding your phone up, and it looks like the chair

33:35

is actually there, even as your perspective changes. You can circle around it and view the chair from all different angles as if it were actually sitting there in the room. It's anchored to the place where you put it within the view of the room. The neural engine is analyzing all this information coming in from the camera and helping the app create the image of the chair, keeping it the appropriate size and orientation with respect to your

34:00

viewing angle. And the neural engine can use this ability to help you go through stuff like your photos. Let's say you've got an adorable pet, like my doggie Timbolt. He is adorable. The iPhone can use its neural engine and image recognition algorithms to return the pictures of your pet in response to a search query. So my wife, who has an iPhone, could do this with our dogs. She could search for the word dog in her Photos app and then get countless images of Timbolt.

34:31

And I know it works because she's done it. Apple has included access to the neural engine so that app developers can take advantage of that technology as well. They'll doubtless create new ways to leverage this tech, so we'll have to keep our eyes open to see what comes out of it. Neural networks in general are becoming increasingly important in machine learning and artificial intelligence, so this branch of computer science is likely to keep growing for

34:57

the next several years. And that wraps up this episode. If you have suggestions for future episodes of TechStuff, maybe it's a technology, a person in tech, a company, anything like that, send me an email. The address is techstuff@howstuffworks.com. You can also drop me a line on Facebook or Twitter. The handle for both of those is TechStuffHSW. Don't forget, we have a merchandise store over at TeePublic dot com slash techstuff. That's T-E-E public dot

35:27

com slash techstuff. You can go and get your CAPTCHA test, the prove-you're-not-a-robot sticker, or T-shirt, or tote bag, or whatever type of thing you would like that on. It's pretty cool, so go check that out, and don't forget to follow us on Instagram. I'll talk to you again really soon. For more on this and thousands of other topics, visit HowStuffWorks.com.

Transcript source: Provided by creator in RSS feed.
