
TechStuff Makes Eye Contact with a Robot

Nov 02, 2020 · 49 min

Episode description

Disney Imagineers are working on animatronic figures that can make eye contact with humans and the results are fascinating (and a little creepy). We explore what has to happen to create a robot that can make eye contact.

Learn more about your ad-choices at https://www.iheartpodcastnetwork.com

See omnystudio.com/listener for privacy information.

Transcript

Speaker 1

Welcome to TechStuff, a production from iHeartRadio. Hey there, and welcome to TechStuff. I'm your host, Jonathan Strickland. I'm an executive producer with iHeartRadio, and I love all things tech. And Halloween is over by the time you hear this. I hope you had a happy one. But I still have something that falls into the kind of creepy category, at least in my opinion.

And I discovered this after looking around at tech news in general, and I became fascinated by it and figured, hey, you know, I haven't done a really focused episode on a very specific implementation of technology in a long time, so why not do that now. Now, anyone who knows me can tell you that I am a sucker for Disney imagineering, which of course is the peculiar twist on engineering and innovation that Disney champions. Right. The inventiveness and

the attention to detail impressed me a great deal. Those are hallmarks of Disney engineering or imagineering. And I've done episodes covering various elements that tie into this, from the history of EPCOT to how audio animatronics work. And it's that last topic I wish to revisit because not long ago I read a research paper from Disney Imagineers titled

Realistic and Interactive Robot Gaze. That's G-A-Z-E, you know, referring to where a person, or in this case, uh, an object, a robot, appears to be looking. And the paper is fascinating and it's available for anyone to read for free. So if you find this subject matter neat, I really recommend you read it. Now, it does get a bit technical. There's some math in there too, but for the most part, I think it's a pretty

accessible paper. The pictures and, good gravy, y'all, the video that are connected to this project are the stuff of nightmares, but we'll get to that. The heart of the paper is all about designing systems so that an audio animatronic, or just an animatronic figure, can make and maintain eye contact, or at least appear to, with someone who

is looking at that figure, an onlooker. So, in other words, imagine that there's a Disney attraction at a park, and in this attraction you can walk up to a robot. It's probably going to be behind like a rail or inside a booth or something, so that you can't, you know, touch it, and the robot notices you looking at it,

and it looks you in the eye. And then maybe you get to chat with the robot and it maintains eye contact with you, and occasionally maybe its eyes dart around to glance at other stuff that's within its field of view, or maybe even indicating that the robot is appearing to, like, take a second to think of a response. That's kind of what we're talking about here. And here's the thing. This is surprisingly difficult to do, and it's extra hard to do without dipping into super unsettling territory.

So today we're going to learn more about the technology and the psychology behind this project, as well as what makes it different from earlier audio animatronics, which is honestly a good place for us to start. The original audio animatronics were essentially puppets. In fact, you could argue that all animatronics are ultimately puppets. Each puppet has a certain number of degrees of freedom, and that refers to the number of independent directions of motion. So let's take a

simple example. Let's say that a robot's neck only has one degree of freedom. Well, that would mean the robot might be able to nod its head up and down. But if it could do that, it wouldn't be able to shake its head or tilt its head, because that would be an additional degree of freedom. Or maybe it's able to shake its head, but it's not able to nod or tilt because it only has that one degree

of freedom. That one degree is really limiting, and it just tells us the full range of directions of motion that any one joint can do, and we typically talk about degrees of freedom with joints to express the range of possible motions that, you know, whatever it is can perform. The Enchanted Tiki Room at Disneyland was an early example

of audio animatronic ingenuity. It wasn't the very first use of audio animatronics, but it was an early one, and when you learned how it worked behind the scenes, it's pretty wacky. The various birds, flowers, and other elements in the attraction connected to a very complex system, including some pneumatic valves. A pneumatic system uses air under pressure to do work, so these valves in turn connected to a

circuit that had thin metal reeds as switches. Now, normally the switch would be open, meaning no electricity can flow through the circuit and thus provide electricity to open or close the valve. But when sounds of a certain frequency would play near these reeds, it would cause those reeds to vibrate, and you know, depending on the thickness and length of the reed, that would determine what frequency of

sound would most likely get it to start vibrating. Once it vibrated, it would close the circuit and thus allow power to go through to the respective valve. And every bird and flower in the attraction had this sort of system where the sounds playing through the sound system would actually cause the individual circuits for those birds and flowers

to activate. So the chirping of the bird, that chirping sound was actually the sound that was opening and closing the circuit and thus activating the valve that would control the bird's beak. And because the figures relied on the sound to close the circuit, they were audio animatronics. Over the years, Disney would improve on this design, sometimes

by necessity. So for example, when the imagineers set out to create the attraction The Great Moments with Mr. Lincoln, they had to come up with new mechanisms to do that because pneumatics would not be a good solution. With pneumatics, you've got a couple of limitations that you're working with. One is that you can't move really heavy stuff effectively with pneumatics. Another is that pneumatic pistons tend to move really fast. It's hard to do controlled slow movements with pneumatics.

So it might be okay for something like a bird flapping its wings or opening and closing its beak fairly quickly, but it's not so great for, say, a revered US president lifting his hand. But I've covered that in other episodes. The really important thing I want to stress is that audio animatronic figures have historically been limited to a specific, pre-programmed sequence of motions, so calling them puppets is

fairly appropriate. These are figures that will do the exact same sequence of motions until something goes wrong or the attraction is shut off for some reason. The pirate in Pirates of the Caribbean that is precariously attempting to step onto a rowboat is never going to fall into the water. He's never going to get into the boat, and he's never gonna step back onto the shore. He will continue his balancing act until the end of time. And this is starting to sound like some sort of Greek myth

about the afterlife at this point. Now, the reason I'm bringing this up, the reason it's important, is that an animatronic figure that can actually detect an onlooker's gaze and return it, making eye contact, can't be totally dedicated to following the same set of motions on repeat. There has to be some room for variability within it. At the same time, Disney's whole gig is to create a show.

The amusement parks are show business. If you are in a public space of one of those parks, like you're inside the confines of the park itself, walking down Main Street or whatever, you are on stage. The employees are called cast members, and shows, while they can have some variation in them, are supposed to follow a general flow.

They follow a script. And so the Imagineers were working on creating a figure that would follow a scripted set of behaviors, but would have the freedom to throw in stuff like eye contact now and then. The figure, in a way, would be able to improvise. It's jazz, baby. The tune is more or less set, but how you go through it allows for a lot of variation. For the purposes of this work, the team relied on an animatronic bust. Now, we've kind of dropped the audio at

this point. Modern animatronic figures are not really driven by audio signals anymore. They're driven by circuitry and sophisticated computer systems and programs. Though to be fair, they still often are referred to as audio animatronics. But you really need

to see a picture of this thing. I'll do my best to describe it, but really you should search for this Disney, uh, interactive gaze animatronic, because hoo boy. So imagine the V-shaped torso of a bust sculpture, right? It's very narrow at the bottom, and it widens up to the shoulders. It's clad in a white button-up shirt, you know, kind of like an Oxford shirt or business shirt. It does have shoulders, but does not have arms. It

has a head, good golly, it has a head. The head of this figure has a sort of plastic skull, though it's kind of more like a plastic mask than a human skull. It doesn't look like a skeleton skull. It does have eyes, it's even got eyelids, and it's got teeth. And looking at this thing is a little unsettling. And that's before it even makes eye contact with you. Now, why would you want to make something like this be able to make eye contact in the first place. Well,

eye contact is an important social signal. It shows mutual acknowledgement, and it can lead us to projecting certain things upon the person or animal that's making eye contact with us. We tend to perceive such creatures as possessing a certain amount of intelligence and sincerity. For example, when I make eye contact with my dog Tybalt, I perceive him to be intelligent and alert and loving. Now I have no way of knowing what is really going on in

his doggy mind. I suspect it's probably more along the lines of, is the bald man about to give me a treat? I should pay attention. But I like to think of it as sincere love. Now, as the paper states, quote, given the importance of gaze in social interactions, as well as its ability to communicate states and shape perceptions, it is apparent that gaze can function as a significant tool for an interactive robot character, end quote. And I

can totally grok that. I imagine what it might be like for a child who's going to Disney World or Disneyland for the very first time and going to a ride or an attraction where there's an animatronic figure, perhaps one that looks like a famous Disney character, and it makes eye contact with that child, maybe it even speaks to the child, and maybe it can respond to the

child if the child speaks back. That sort of interaction would have been the kind of stuff that would have stuck with me as a kid well into adulthood, and I feel confident about that because I have a lot of memories of the seemingly magical moments I've experienced at Disney with the far more primitive technologies that were in the Disney parks when I first started visiting them in the nineteen seventies, so I can certainly see the show

need for this sort of development. But there are numerous challenges that stand in the way of achieving this goal, and they fall into different broad categories. Perhaps the easiest set of challenges to conquer is actually the electro mechanical side of things. That is, the actual mechanisms that you're going to use to create these effects, the servos and the motors and the other components that will create the actual motions that will translate into the robot making eye

contact or behaving in otherwise realistic ways. That's one set of challenges, but there are others. One is giving the robot the ability to detect the gaze of onlookers in the first place. There has to be some sort of face recognition and maybe even eye tracking technology so that the robot looks at the right spot. So the electro mechanical parts have to work correctly, but so

does the robot vision or perception. Otherwise the robot is going to look in the wrong spot, perhaps staring off to one side or above or below an onlooker's eye contact or attempt at eye contact. Another challenge would be on the programming side. You have to figure out how to determine who the figure is going to look at.

You also have to figure out how long the robot will look at somebody and what could distract the robot, and whether or not the robot would return to looking at, you know, the first person, or maybe look at a second person, or maybe look at something else entirely. You have to solve the challenge of the programming and prioritize the order of operations so that the robot behaves in a way that makes sense, as opposed to a robot that's just, you know, reacting to all visual stimuli in

a random way, which would be at the very least disconcerting. And then we get to something that's a bit harder to define than degrees of freedom or range of motion or the hierarchy of programming, and that's human psychology. Now, as the paper points out, eye contact is an important social cue for most of us, but there are a whole range of humans out there, right? For people who have autism, eye contact can be a really challenging task, and it tends to make life a little more

difficult or complicated for them as a result. It's something that people, some people anyway, have to consciously deal with. They have to remember to do this and work at it. It's not a natural behavior for them. So this is something that can be tricky for human beings, let alone

for robots. Now, while eye contact can help create a sense of sincerity and interest, it can also shift over into more unpleasant territory, such as a sense of predatory intent. Or, as a comedian I once saw said, there's a fine line between the casual eye contact of a friend and the cold stare of a serial killer. He was specifically talking about trying to navigate the tricky territory of approaching people in order to get to know them. But I think the meaning could be used for lots

of scenarios, including an encounter with a robotic figure. And along with that is the issue of the uncanny valley, which I have touched on in previous episodes. I'm not sure if I've ever actually talked about the origin of the phrase, however. A professor at the Tokyo Institute of Technology named Masahiro Mori coined this phrase in the

nineteen seventies to describe a pretty odd phenomenon. As robots become more human like or more lifelike in general, they become more appealing to us, but only up to a point, and once they get to that point and go beyond it, our reception of these robots plunges into the uncanny valley. The valley in this case is how humans react to

the robot. This also applies to other stuff like CGI characters, for instance. In other words, a robot that might be a simple industrial arm is one we probably wouldn't feel very much affinity for, you know, it's obviously a machine. A robot that still looks really robotic, but has, you know, arms and legs, like a vaguely humanoid shape, we would probably feel a little more affinity towards. Make it look a little bit more human, but, you know, not to the point where anyone would

mistake it for being human. We might like it even more. But once you start getting close to but not quite human in appearance and behavior, our response drops to a point where a lot of people feel unsettled, or they might even feel revulsion when looking at the figure. Something is, you know, not right. The cues that would normally help us identify with the synthetic figure now feel strange and

maybe even scary. It's possible to get beyond the uncanny valley, to create a robot or CGI character that doesn't initiate this kind of instant revulsion, but it

is very hard to do so. A big challenge is building an animatronic that doesn't trigger the uncanny valley response, either by avoiding the trap of being almost but not quite human in behavior, you know, by keeping things a bit more obviously robotic, so there's that clear and distinct separation that kind of removes that response we have, or creating something lifelike enough that we feel the same sort of reactions we would experience if that were a

real human. So it's tough to do. It's easier to do the robot approach than it is to get something that seems human enough that we let our guard down. None of these challenges are trivial, but they all require distinct approaches that must ultimately converge into a single implementation. When we come back, I'll talk about some of the technologies in this animatronic figure and the engineering team's philosophy behind their design choices. But first let's take a quick break.

The engineering team limited itself to parameters that related to creating a robot that could direct its gaze towards onlookers, which meant they didn't have to worry about it doing literally anything else. The audio animatronic bust they used has nineteen degrees of freedom total, but the team made no use of ten of those. They only used nine degrees of freedom. They focused on the neck, which has three

degrees of freedom; the eyelids, which have two degrees of freedom; the eyes, which also have two; and the eyebrows, which have two degrees of freedom. The unused degrees of freedom are for moving the jaw and the lips of the figure, but since that's not necessary to make eye contact, the team just ignored those; they didn't need to mess with them, which means we get the effect of a robotic skull with an unchanging rictus grin staring at us as its

upper facial area remains animated. I guess what I'm saying is I didn't find the overall effect particularly comforting. According to the paper, the commands going to these components come from a quote, custom proprietary software stack operating on a one-hundred-hertz real-time loop, end quote. Hertz is cycles per second, so this means that the software is pulsing out operations one hundred times every second to control this animatronic bust.
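
Just to make that idea concrete, here's a tiny sketch of a fixed-rate control loop in Python. To be clear, this is not Disney's software stack, just my own toy illustration of what pulsing out commands one hundred times a second looks like, and send_commands is a made-up stand-in for whatever actually drives the motors.

import time

RATE_HZ = 100            # one hundred updates per second
PERIOD = 1.0 / RATE_HZ   # ten milliseconds per tick

def send_commands(tick):
    # Stand-in for the part that would actually command the servos.
    pass

def run_control_loop(duration_s=1.0):
    ticks = int(duration_s * RATE_HZ)
    next_deadline = time.monotonic()
    for tick in range(ticks):
        send_commands(tick)
        # Sleep only for whatever is left of this ten-millisecond slot,
        # so the commands go out on a steady one-hundred-hertz beat.
        next_deadline += PERIOD
        time.sleep(max(0.0, next_deadline - time.monotonic()))

if __name__ == "__main__":
    run_control_loop()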

Many of those commands aren't only about making the bust do something specific, but about doing it in a specific way. Let's get back to the Tiki birds as an example. The pneumatic valve that would control whether or not pressurized air could travel to a specific place, like the mechanism that operates a bird's beak, is a pretty simple on-or-off switch, meaning the valve is either open, in which case air can flow, or it's closed, in which case the air is blocked

from flowing through and actuating the mechanism. So the beak has a natural resting position, and for this example, we'll just assume that the rest position is a closed beak, and so that's what the beak will always return to when there's no air flowing to the mechanism that opens the beak. If we open the valve, it lets air through; it rushes to the end point and forces the beak to

open rapidly. Closing and opening the valve quickly forces the bird's beak to open and close quickly, and when matched with a soundtrack, it looks as though the bird is speaking or singing, or you know, whatever it's doing. But that movement is rapid and, just as I mentioned earlier, not suitable for all animatronic applications. Having life sized humanoids move with that kind of alarming speed would be scary and legitimately dangerous. The greater mass of the figures would

mean you're dealing with larger amounts of inertia. I mean, just imagine what it would look like if Mr Lincoln, in an effort to raise his hand in a gentle show of reserved determination, instead violently karate-chopped his own head off. It would be, as the kids say, a bad look. To create the illusion of life, the animatronics that Disney designs follow certain general strategies. One is called

slow in and slow out. Now, this refers to general movements, and the idea is that any movement should start off slowly and then pick up speed as the movement continues, and then slow down again before coming to a stop. And it makes the motions appear more fluid, and it has the added benefit of not being quite so harsh

on the figures themselves. So when a Disney figure raises its hand, the hand should start off moving upward with a nice, smooth, slow motion, pick up a bit of speed as it's moving upward, and then slow down again as it's approaching its end point. And this means that the underlying motors and mechanical systems have to be capable of achieving the strategy. It's why you can't use pneumatic systems; they can't be those simple single-speed devices that are

either on or off, like the Tiki birds. Oh, and I guess I should specify I'm talking in this case about the original Tiki birds, because the birds in the attractions today work on updated and more sophisticated computer systems that take up a fraction of a fraction of the space of the old attraction, which essentially required an entire room filled with cables and tubes to make everything work underneath the actual attraction itself. Now a few computers handle the whole shebang.
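
And if you're curious what slow in and slow out looks like in practice, here's a quick sketch in Python. This is not how the Imagineers actually implement it; it's just the classic smoothstep easing curve, with angles I made up, that gives you that gentle start, quicker middle, and gentle stop.

def slow_in_slow_out(t):
    # Smoothstep easing: t runs from 0 to 1, output runs from 0 to 1,
    # starting slowly, speeding up in the middle, and slowing down at the end.
    t = max(0.0, min(1.0, t))
    return t * t * (3.0 - 2.0 * t)

def hand_angle(t, start_deg=0.0, end_deg=60.0):
    # Blend from the start angle to the end angle along the eased curve,
    # instead of sweeping there at one constant speed.
    return start_deg + (end_deg - start_deg) * slow_in_slow_out(t)

# Sample the gesture at a few points along the way.
for step in range(11):
    t = step / 10.0
    print(f"t={t:.1f}  angle={hand_angle(t):5.1f} degrees")

Notice how the angle barely changes near the start and the end and changes fastest in the middle. That's the whole trick.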

Anyway, let's get back to animatronics. Some of the other guiding principles in animatronic motion that in turn dictate the types of motors and joints and other mechanical elements that the team must use to make these happen include designing motions as arcs, meaning the motion

should follow an arced trajectory. Another is that the motions should have overlap, meaning a robot shouldn't move a single element like an arm, stop, then go on to move the next element like the head position, and then stop, and so on, because that would be, well, really robotic. Instead, the robot's motions should overlap with one another so that, let's say, Mr. Lincoln is turning his head at the

same time his arm is going up in determination. Now, another element that's connected to this concept is that of drag, which means that the different body parts are moving at different frequencies or timing. They're not moving all at the same speed. So, in other words, the speed at which Mr. Lincoln turns his head might be slightly faster or slower

than the speed at which his arm goes up. This is all in an effort to create the illusion of life, but it also means that the programming and hardware underlying the figure have to support those strategies. For the purposes of this project, the engineers had certain motions they wanted to be included. One minimum set of motions needed was some that would imply that the bust was a breathing entity, so it needs to move slightly as if it were

drawing breath. Blinking was also an important motion to get down, as it would be more than a little unnerving to have an animatronic figure make eye contact with you and then never ever blink. And then there were the saccades. Now I have to confess something to you, guys. When I first encountered the word saccades, which is spelled S-A-C-C-A-D-E-S, I had no idea what that meant. It was a new word to me, and maybe it's a new word for some of you out there too. So if you happen to be like me,

what the heck are saccades? Well, that refers to the quick, simultaneous movement of both eyes from one point of focus to another. So think about how you might take in a scene that has a lot of stuff going on. Let's say you walk up to a building that's burning. Well, your eyes are going to dart at different things that are going on in front of you that catch your attention as you focus on them, and then you file that information away. And perhaps you're

even doing this subconsciously. Uh, it means our gaze is not always steady and unwavering. It moves around a bit on occasion. And that's not the only way we move our eyes, of course. We can actually track things that are moving and use our eyes to move in a more smooth and gradual motion. But the team knew that if they could incorporate saccades, that would give the robot a more lifelike performance. But that decision meant the team needed to figure out something else, which was

where to put the cameras. The animatronic needs its own vision to be able to detect onlookers and then direct its own gaze appropriately, and some robots do put cameras in the eyes of the robot so that the eyes are actually camera lenses, but that presents a challenge if you wish to incorporate rapid eye movement like saccades, because that sort of movement introduces motion blur in the video imagery and makes it more challenging for the robot to keep track of what's going on in front of it.

For that reason, the team decided that the cameras would not be mounted in the eyes, but rather were mounted on the animatronic's chest. Presumably, should the gaze tracking technology find its way into full animatronic figures in the future, the camera will be, you know, hidden within the body of the animatronic torso in order to avoid this problem,

or otherwise maybe mounted in an unobtrusive spot. One thing that interests me with this particular approach is that the system has to do some calculations as to where the eyes of the animatronic are in relation to the physical location of the cameras, you know, because for us, our eyes are essentially the cameras, or at least the camera lenses, so we don't have to make any adjustments.

Right, where we're looking, like the point of our gaze, is the point where we're taking in visual information. For the animatronic, the eyes of the robot, the actual eyes that are in the skull, don't function as eyes. They aren't lenses. They're actually several inches above the actual camera. And yet the eyes in the robot's head need to point in the right direction. They need to be the

part that's pointed at the person who's looking at it. Right, it doesn't make sense for the robot to just turn its sternum towards you. It needs to be looking at you with its robot eyes. And I think of this kind of like someone who's working a hand puppet and they've got the hand puppet up over their head, so maybe they're behind a little stage, you know, like the Muppets tend to be. You've got this hand puppet

and it needs to make eye contact with a human being. Well, that just means the puppeteer has to take that into account and angle their hand so that the puppet's eyes appear to be locking on the eyes of the real person that the Muppet or puppet is interacting with. It's a little tricky, and it requires some skill. For the robot, it means that there's some, you know, nifty geometry going on on the processor side to make this work out.
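
Here's a rough sketch of the kind of geometry I mean, written in Python. This is not code from the paper, and the mounting positions are numbers I made up, but it shows the basic move: the chest camera reports where the onlooker's face is, and the system has to re-aim from where the eyes actually sit, several inches higher.

import math

# Invented mounting positions in meters, in the robot's own frame:
# x is forward, y is left, z is up.
CAMERA_POS = (0.0, 0.0, 1.20)   # chest-mounted stereo camera
EYE_POS = (0.0, 0.0, 1.45)      # the figure's eyes, roughly ten inches higher

def face_position_in_robot_frame(face_offset_from_camera):
    # The camera reports the face relative to itself; shift that into the
    # robot's frame by adding the camera's mounting position.
    return tuple(c + f for c, f in zip(CAMERA_POS, face_offset_from_camera))

def gaze_angles_from_eyes(face_pos):
    # Vector from the eyes (not the camera!) to the onlooker's face.
    dx = face_pos[0] - EYE_POS[0]
    dy = face_pos[1] - EYE_POS[1]
    dz = face_pos[2] - EYE_POS[2]
    yaw = math.degrees(math.atan2(dy, dx))                    # look left or right
    pitch = math.degrees(math.atan2(dz, math.hypot(dx, dy)))  # look up or down
    return yaw, pitch

# Example: the camera sees a face 1.5 meters ahead, a bit to the left,
# at the same height as the camera itself.
face = face_position_in_robot_frame((1.5, 0.2, 0.0))
yaw, pitch = gaze_angles_from_eyes(face)
print(f"aim eyes: yaw {yaw:.1f} degrees, pitch {pitch:.1f} degrees")

In that example the face is level with the camera, but the eyes end up pitching down around nine degrees. That's exactly the adjustment we never have to think about, because our eyes are our cameras.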

Like the image recognition has to identify where the eyes are of the onlooker and then calculate where the robot's eyes are in relation to that and direct them in the right way, which to me is really fascinating because again, the eyes of the robot are not where the visual

information is actually coming in. We'll talk more about the behaviors of this robot in a second, but since we're already chatting about cameras, it's good to talk about what the team was actually using to give the robot its vision. They went with an off-the-shelf solution. They used a camera called the MYNT EYE D one thousand, and MYNT

is spelled M-Y-N-T. This particular camera has two lenses in it for stereoscopic vision, and so together they can create a stereo image, that is, an image with, you know, kind of a depth, like a 3D image, with a resolution of two thousand, five hundred sixty by seven twenty pixels at sixty frames per second, so

it can do, you know, video information. There's also a depth map mode which uses infrared light to help judge the depth of the things within its field of view, like how close is one thing versus another relative to the camera, and the depth map's resolution is one thousand, two hundred eighty by seven twenty pixels at sixty frames per second. As I mentioned, these two

lenses allow the camera to simulate human binocular vision. So just as we perceive depth in the world around us using two eyes, you know, most of us, uh, this camera can do the same thing and judge which things are in the foreground versus the background, what things are closest to it versus furthest away, and make a better determination of which things within its field of view are worthy of attention, which will become important in a little bit. The camera has a more limited field of view than

a typical human. It has about half the horizontal field of view of a person, so its periphery is more narrow, and it has a little more than a third the vertical field of view, so it can't see as much up and down as your typical person can. So any future animatronic figure might need a more expansive field of view to be able to interact with guests who could range in height from very small to quite tall. I mean,

all sorts of people go to Disney. So I do see that as a potential limiting factor in the short run, that any stereoscopic kind of camera would need to have a pretty good field of view for a robot to be able to interact properly with guests of different heights. Now, I decided to see how much this camera would cost for some normal schlub like myself, and the answer is less than four hundred dollars. So this is actually a

pretty inexpensive solution, all things considered. And again, it's really more important for creating the basis for the work as opposed to saying this is a final product. And that's more or less the hardware side of things, or at least as specific as I can get based on the material available. Like, I don't know what the power of their computer system was, you know, I don't know the specific types of motors they were using in the animatronic, but from a high level we understand

what's going on. However, the real magic happens with the system that gives this hardware its orders, and the team made the conscious decision to create the illusion of life rather than attempt to replicate human behaviors perfectly, which is a bit of a challenging concept. You might think, well, what's the difference? But I think I have a pretty decent analogy. If you've ever gone to see a stage play,

then you've seen sets. Maybe the sets were really detailed, maybe they were bare-bones sets, but in any case, the sets are meant to create the illusion of a real place at a real moment in time. You know, it could be a room in the eighteenth century in a palatial estate, or it might be a modern-day real estate sales office if it's a modern play,

or maybe it's a campsite. In any case, the sets and props are meant to convey the illusion of that place and time, and if you were to actually get up on stage and walk around, that illusion would very quickly be broken. But when you're sitting in the audience, it's up to you to use your imagination to fill in some of the gaps and suspend disbelief. It is

a show. Likewise, the engineers who worked on this project talk about robot behaviors in terms of a show, and that means that the robot needs to react and move in ways that create the illusion of life, but it does not necessarily need to adhere completely to human behaviors.

This makes things much more simple, particularly since it removes a lot of tricky questions regarding what sets of behaviors are the most human, because I'm sure you've noticed human beings and human behavior occur across a really broad spectrum, and what might be a typical set of behaviors for one person could be completely alien to another person. So it's a good idea to not try and define what sets of

behaviors are quintessentially human. When we come back, I'll talk about how the team determined how the robot would actually behave. It's pretty cool, but first let's take another quick break. The team created an architecture to describe the relationship of various elements to create the behavior of an interactive robotic gaze.

To create this robotic eye contact, the layers include the camera, which is, you know, the point of perception for the robot; a perception engine; an attention engine, which determines which things within the robot's perception are actually worthy of attention or focus; a behavior selection engine and a library of potential behaviors; and the audio animatronic figure's systems, its hardware; the motor commands and motor states go to that, and

that's the layers in order from top to bottom. These layers explain the relationship of each element in sort of an abstract way, allowing us to understand how the robot processes and reacts to information. So the perception engine is designed to identify potential elements within the robot's vision, you know, separating things out from, say, just a static background, and the attention engine attempts to identify things within the robot's

vision that merit focus. The attention engine generates what the team calls a curiosity score. So if that curiosity score is below a certain threshold, the robot won't quote unquote notice something within its field of view. It's not enough to capture its attention. Certain actions, such as, you know, waving at the robot, merit a higher curiosity score. So if the score ends up being above the curiosity score threshold, the robot will look toward whatever it was that, you know,

quote unquote got its attention. The team decided it would be helpful to create a sort of scenario to work with, not just have, you know, a robot randomly looking around. So their approach was to simulate an elderly man reading something like a newspaper or a book. Most of the time, the robot would be looking downward a bit, you know, its head tilted down a little, as if it were reading something that was held more or less at torso level.

If something moves into the robot's field of view, the robot could glance up quickly, just as a human would to assess what's going on, and if whatever is within the field of view creates a curiosity score lower than the threshold, then the robot just goes back to reading. If whatever is going on is above that curiosity score threshold, the robot might look directly at whatever it is that's happening, and then things could progress from there.
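
Here's how I picture that attention engine logic, as a small Python sketch. The paper doesn't publish its scoring code, so the weights and the threshold here are completely invented; the point is just the pattern of scoring everything in the scene, ignoring whatever falls under the threshold, and reacting to whatever clears it.

CURIOSITY_THRESHOLD = 0.5   # made-up number; below this, the robot keeps "reading"

def curiosity_score(detection):
    # Assign a rough curiosity score to something the perception engine found.
    # The weights are invented for illustration.
    score = 0.0
    score += 0.4 * detection.get("motion", 0.0)      # how much it's moving
    score += 0.5 * (1.0 if detection.get("waving") else 0.0)
    score += 0.3 * detection.get("closeness", 0.0)   # nearer things are more interesting
    return min(score, 1.0)

def react(detections):
    # Pick the most interesting thing in view, if anything clears the bar.
    scored = sorted(((curiosity_score(d), d) for d in detections),
                    key=lambda pair: pair[0], reverse=True)
    if not scored or scored[0][0] < CURIOSITY_THRESHOLD:
        return "read"   # nothing interesting enough; keep looking at the book
    return f"glance at {scored[0][1]['name']}"

print(react([{"name": "guest A", "motion": 0.2, "closeness": 0.3}]))
print(react([{"name": "guest B", "motion": 0.6, "waving": True, "closeness": 0.7}]))

The first guest scores too low and the robot keeps reading; the second one is close and waving, clears the bar, and gets a glance.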

That's where the behavior selection engine and behavior library come into play. There are a few possible reactions, and the robot will choose one depending on several factors. For example, one such factor was familiarity. The robot would behave differently toward people it quote unquote recognized. It also wouldn't switch focus every time someone tried to wave it down. So if you were to distract the robot, it might look away from whatever it was looking at before and then

look at you once. Then it might look back at someone it quote unquote knows, and if you were to wave at it again, you wouldn't necessarily get a response. So kind of think about how adults can be with kids, where the adults tend to develop a highly attuned skill of ignoring the child after a bit, even if the child is saying, but look, look, look, hey, look what I'm doing, look, and so on. So the team

created four basic states. The default state was called read, meaning it would appear as though the figure were reading a book or newspaper at torso level. The next state up is glance, whereupon the robot would appear to glance away from the reading material to see what sort of ruckus is going on. This involved movement of not just the eyes but the head as well. So the head tilts up a bit and it looks, for a moment, like the robot is looking away from the imaginary book

or newspaper. If the curiosity threshold is met, then the next state, engage, would pop up. This means that whatever it was that got the robot's attention is worthy of further focus, and the robot will direct its gaze at that thing. With the engage stage, which has a nice rhyme to it, the robot will attempt to make eye contact, which involves the cameras detecting the face of the person of interest, and then the computer system commanding the robot's

head and eyes to aim towards that detected face. The amount of time that the robot spends looking at a person is determined both by a minimum countdown clock, saying you have to spend at least this amount of time looking at this person, and by the curiosity score that the robot has assigned to that person. So once that score decreases below the engage threshold, the robot returns to read.
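
If you like thinking in code, the read, glance, and engage states plus that minimum-look timer boil down to something like this little Python sketch. Again, the thresholds and timings are my own made-up numbers, not values from the paper, and the acknowledge state we're about to get to would slot in as one more branch keyed off the familiarity flag.

READ, GLANCE, ENGAGE = "read", "glance", "engage"

GLANCE_THRESHOLD = 0.3   # invented numbers, just to show the shape of the logic
ENGAGE_THRESHOLD = 0.6
MIN_ENGAGE_TIME = 3.0    # seconds the robot must keep looking once engaged

class GazeStates:
    def __init__(self):
        self.state = READ
        self.engage_timer = 0.0

    def update(self, curiosity, dt):
        # Step the state machine given the current curiosity score.
        if self.state == ENGAGE:
            self.engage_timer += dt
            # Only drop back to reading once the minimum look time has passed
            # AND the person has stopped being interesting.
            if self.engage_timer >= MIN_ENGAGE_TIME and curiosity < ENGAGE_THRESHOLD:
                self.state = READ
        elif curiosity >= ENGAGE_THRESHOLD:
            self.state = ENGAGE
            self.engage_timer = 0.0
        elif curiosity >= GLANCE_THRESHOLD:
            self.state = GLANCE
        else:
            self.state = READ
        return self.state

robot = GazeStates()
for score in [0.1, 0.4, 0.8, 0.2, 0.2, 0.2, 0.2]:
    print(score, "->", robot.update(score, dt=1.0))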

So if you happen to be particularly interesting, the robot will look at you for longer, and when you stop being interesting, the robot eventually goes back to reading its pretend book or whatever. The final stage is called acknowledge, and that was the name that the team gave for those times when the robot is seeing a person that is familiar to the robot. For the purposes of the tests, the familiarity variable was actually randomized, so in other words,

the robot wasn't necessarily familiar with people. It just was told it was familiar with somebody. So, in other words, it could be a totally new person that walks up to the robot, and the robot randomly assigns that person the familiar tag, and the robot will behave as if that's someone that the robot recognizes. Maybe they're just an old friend the robot just met. Is there a word for that? The robot system also had a sort of short-term memory that the team called the guesthouse.

As people would come into the robot's field of view, or the scene, as the team called it, the robot would analyze that person and assign that person a numerical value to keep track of that person, and it would also keep track of how many times that particular person had been within its field of view, and it would keep track of the curiosity score that was assigned to that person.
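
That guesthouse idea maps pretty naturally onto a little bookkeeping structure. Here's a sketch in Python, and the field names are mine, not the team's, but it captures what's being described: a numerical ID per person, a running count of appearances, and the latest curiosity score.

import itertools

class Guesthouse:
    # Toy version of the robot's short-term memory for people in the scene.
    def __init__(self):
        self._next_id = itertools.count(1)
        self.guests = {}   # guest ID -> record

    def check_in(self, guest_id=None, curiosity=0.0):
        # New face: hand out a fresh numerical ID. Returning face: bump its count.
        if guest_id is None or guest_id not in self.guests:
            guest_id = next(self._next_id)
            self.guests[guest_id] = {"appearances": 0, "curiosity": 0.0}
        record = self.guests[guest_id]
        record["appearances"] += 1
        record["curiosity"] = curiosity
        return guest_id

house = Guesthouse()
first = house.check_in(curiosity=0.7)    # someone new walks into view
house.check_in(first, curiosity=0.4)     # the same person shows up again later
print(house.guests)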

In addition to the states, the team described layers of show. Now, this relates closely with the states I just mentioned, but it helps explain how the robot transitions from one set of behaviors to another, how it makes the determination to change from one thing to the next, and which behaviors will override others versus behaviors that will always be present with the robot. All of this is necessary because of that variation I was

talking about at the beginning of the show. If the robot were just following a scripted set of directions, it wouldn't have to make these determinations because it would just follow the same sequence over and over. But because we have this variability, we have to build in a system for the robot to follow in order to make decisions. So at the base level you have what the team calls zero show. This is essentially the robot in off mode.

It is inanimate. But the next layer up is an alive show, which has the baseline behaviors of simulated breathing, eye blinking, and saccades. This level of show underlies all the other higher levels, so this is sort of always running in the background. You don't want the robot to suddenly stop breathing while it does other stuff. The next four show levels correspond with the four states of

the robot. So you have read, glance, engage, and acknowledge, and an engage show will subsume the glance and read shows. It will take over the robot's behaviors, so the robot's not going to display the behaviors of read and glance when engage happens. So it's that hierarchy of operations, and I find it really interesting to look at robot behaviors

in this way, as that hierarchy of potential states. It's amazing when you break down those states and determine which should take priority given certain circumstances, and how long that state should remain active before it reverts to a lower-level state. Again, the team is trying to create the illusion of life. The robot doesn't have to actually lose interest or anything like that. It's just simulating it.
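
One last sketch, because the layering is the part I find most elegant. This is my own simplified Python take on it, not the team's implementation, and the individual behaviors listed are just illustrative, but it shows the rule: the baseline alive-show behaviors always run underneath, and whichever state show is currently on top subsumes the ones below it.

# Baseline behaviors that are always running, no matter what state is on top.
ALIVE_SHOW = ["breathe", "blink", "saccade"]

# State shows from lowest to highest priority; a higher one subsumes the lower ones.
STATE_SHOWS = {
    "read": ["tilt head down", "scan imaginary page"],
    "glance": ["raise head briefly", "flick eyes toward motion"],
    "engage": ["turn head to person", "hold eye contact"],
    "acknowledge": ["turn head to person", "hold eye contact", "nod slightly"],
}
PRIORITY = ["read", "glance", "engage", "acknowledge"]

def active_behaviors(states):
    # Keep only the highest-priority state that's currently requested and
    # stack it on top of the always-on alive show.
    if not states:
        return ["inanimate"]   # the zero show: the robot is just off
    top = max(states, key=PRIORITY.index)
    return ALIVE_SHOW + STATE_SHOWS[top]

print(active_behaviors(["read"]))
print(active_behaviors(["read", "glance", "engage"]))   # engage wins; read and glance are subsumed

With read, glance, and engage all requested, engage takes over, and the read and glance behaviors never show up, while the breathing, blinking, and saccades keep going regardless.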

This particular project was working within some pretty well defined parameters and restrictions. The team acknowledges that their work is really meant to be a starting point for further improvements. They point out that older audio animatronics might seem lifelike at greater distances and for shorter durations. So, for example, if you were to ride an attraction where you go by a scene of audio animatronic figures at a decent clip and they're, you know, a good twenty feet away, the limited amount of time and the greater distance that

are involved can help support that illusion of life. The animatronic figures don't have to be super convincing because you're not spending enough time and attention to see through the illusion, nor are you close enough to see it show through. The more time you have and the less distance between you and the animatronic figure, the harder it is to create and maintain that illusion of life. Without an interactive gaze, without eye contact, it becomes pretty clear that the animatronic

figure has no real lifelike quality to it. If you were to stand close to one of these older animatronic figures, you would notice that it's not really looking at anything in particular, and that its movements are a matter of routine. It's not a demonstration of spontaneous or seemingly spontaneous decisions. The Interactive Gaze project takes this a step up. The robot can recognize and acknowledge someone that is in the robot's presence, it can direct its focus and attention at

that person. This definitely is a step up in creating that illusion, and it works at much smaller viewing distances than the older methods do, but the engineers admit it still has limitations. They point out that their approach, as it stands, might serve as a way to preserve that illusion of life for a couple of minutes at the most, but beyond that the illusion would start to fade away.

They point out that as the distance between the robot and the audience decreases, and as the time of observing the robot increases, you have to incorporate increasingly complex and natural behaviors to maintain that illusion of life, and interactive gaze is just one element. Others could include stuff like a display of emotion. The bust has sort of a

little bit of this. It can imply a sense of emotion to some degree with the way it holds its eyes, but because it doesn't have any movement of its jaw or lips, and doesn't have any other means of really indicating emotion, this is pretty limited. So perhaps a robot that can hear and parse and respond to speech, you know, sort of like the voice-activated digital assistants that are familiar to us, and, you know, like the Amazon Echo or the iPhone or Android phones.

That might be something that really pushes that illusion of life. And of course there's also the physical appearance aspect. Now, you would never mistake this animatronic bust for a human. I mentioned before, it's pretty creepy looking. It's got a plastic and skeletal quality to it that prevents you from ever mistaking it for a person. But the team points out the physical appearance of the robot taps back into

that problem of the uncanny valley. It might take a while to create something that's convincing enough and yet not repulsive to work as a robotic human animatronic. If you make it look too real, it's going to give people the creeps. I think, at least in the short term, we're more likely to see this technology used to create characters that are humanlike but still distinctly not human, in order to avoid that negative reaction when the uncanny valley gets involved.

In other words, using this to create an animatronic figure that looks a lot like a cartoon character, even a human cartoon character, because, well, you recognize the cartoon character as representing a human. Cartoon characters don't really look like humans, usually; they look like they have human qualities to them, but they still have cartoonish qualities to them, so you wouldn't mistake them for actually being human. Or you just, you know, go the robot route or some

sort of animal character and you sidestep that problem. The engineers conclude their paper by talking about how the attention engine could, with some evolution, work for a lot of

different applications. So imagine that you design an animatronic that represents someone who's really frightened. That kind of character might have a very low threshold for stimuli to push it to a higher state of attentiveness, right? Like, a little sound might cause that character to perk up and look around quickly, because that character is supposed to

be frightened. Or you could create something like, you know, an absent minded book lover who only glances up from whatever book they're studying if something really exciting is happening, otherwise they just ignore it. They also talk about the bottom up approach to layering behaviors and deciding which behaviors will replace others that might inhabit a lower state. That

is really fascinating to me. Now, we're still a fair ways off from seeing these sorts of technologies make their way into official attractions, but based on what I've seen and read, I wouldn't be surprised to find them making their way into Disney parks in the next, say, five years or so, depending on how the company budgets stuff.

Of course, the pandemic has created a particularly tricky situation for that branch of the Disney Company, even as other branches of that company continue its global domination of all things entertainment. But the technology itself and the design philosophy of how to program a robot to behave as if it were doing so naturally, it's really neat to me. And as I said at the beginning, the paper is available for free to read, so if you want to

check that out, I highly recommend it. I think it is a fascinating piece of work, and as I said, it's not that difficult to follow. There's some math stuff that will probably, you know, lose a lot of you, but it lost me too. I'm not trying to shame you. I couldn't follow all of it, but it is otherwise pretty easy to understand. And like I said, it is titled Realistic and Interactive Robot Gaze, G-A-Z-E, so check that out. It is really a neat paper. I just apologize for the pictures that are

in there, because they're creepy as all get out. That's it for me. I hope you guys enjoyed this episode. If you have suggestions for future topics I should tackle in TechStuff, let me know on Twitter. The handle is TechStuff HSW, and I'll talk to you again really soon. TechStuff is an iHeartRadio production. For more podcasts from iHeartRadio, visit the iHeartRadio app, Apple Podcasts, or wherever you listen to your favorite shows.

Transcript source: Provided by creator in RSS feed