Testing With Turing

Speaker 1

00:00

Brought to you by Toyota. Let's go places. Welcome to Forward Thinking. Hey there, everyone, and welcome to Forward Thinking, the podcast that looks at the future and says, "Marvin, I love you." I'm Jonathan Strickland. And I'm Joe McCormick. Joe, I've got a question for you. Okay, ask me a question. What did you have for lunch today? I don't want to talk about that. That's personal. Okay, fine. All right, well, what was the last movie you saw? Movies are weird

00:38

these days, you know. I'm going to ask one more question, Joe. Who's your favorite person to work with in the office? Obviously you. Joe, I lied. One last question: are you human? Why would you ask? Of course, your responses have left the question somewhat vague. I'm not

01:03

really sure. So it could be that you are, in fact, some sort of simulation that is drawing upon a massive data bank of predetermined answers to questions, and you're not actually, genuinely coming up with responses. Why would you say that? Tell me more about your feelings. Are you Dr. Sbaitso

01:25

or Eliza? We wanted to talk today about artificial intelligence and the Turing test: what exactly it is, whether it is in fact a good test for artificial intelligence, and whether it really does what we think it does. There was that huge news story a couple of weeks back, a couple of weeks back being in June, wherein Eugene Goostman, who is a chatbot, not a real human, um, it is

01:57

a chatbot, I suppose, is more technically correct, was announced to have passed the Turing test for the first time. Kind of, sort of. Well, not for the first time for this program. A lot of the media coverage suggested that, for the first time, a computer program had, quote, "passed the Turing test" or, quote, "beaten the Turing test," and there are multiple issues we need to address to actually talk about that. But first let's

02:25

go on and explain what Goostman did. Well, what it definitely did was fool ten of thirty judges, which is thirty-three percent for anyone following along doing the math at home, who were participating in a celebration of Turing on the sixtieth anniversary of his death. It was a celebrity guest judge panel. It was pretty fancy. There was one of the actors from Red Dwarf on there. And by "fool," I mean that the computer program tricked them into

02:54

thinking it was a human. Right, right. This program has been around since about two thousand one, when a team out of Ukraine began designing it, and it poses as a thirteen-year-old boy from Odessa, Ukraine. It's taken part in a few different competitions like this over the years, and you can even find a version online

03:10

if you want to play around with it. But the news that really brought it into the public eye was a press release, yes, claiming that it was the first chatbot to pass the Turing test in an open-ended event. And "open-ended" is kind of one of those keywords here, because a lot of competitions in which chatbots are put up against the Turing test involve a closed series of questions that is predetermined.

03:35

It's a specific set of parameters. Either you are only allowed to ask certain questions, or the whole interview took place before the event and you are reviewing a transcript and then have to determine whether the participant was a human or a machine. So those are, generally speaking, the different variations on this. We'll get more into the actual Turing test in a second, but that's sort of how they are performed these days. Right, and so

04:03

this achievement is pretty cool. But it did get way, crazy overhyped in the media, I think largely because of the press release, which was mostly a quotation from one Dr. Kevin Warwick, who was an engineer slash futurist slash provocateur slash cyborg. Right. He designed it to be kind of overhyped, which means that it did drum up public interest in Turing and chatbots and artificial intelligence and

04:30

technology at large, which is totally awesome. But the news media wasn't quite sure what the Turing test was, or how this chatbot was interacting with it, or even that this was a chatbot and not, for example, a supercomputer, which is a ludicrous concept. Lots of people said, "Supercomputer defeats Turing test." No, it was software. Yes, it was a program. It was independent of hardware. It could run on lots

04:56

of different types of hardware. Probably your smartphone, for example. So we might need a new Turing test, right? So before we talk about new Turing tests, let's talk about the old one. Shall we maybe make a distinction between what idea actually came from the early computer scientist Alan Turing and how that has evolved into what we popularly think of as the Turing test today, which we were just

05:22

talking about. Yeah, yeah. So "Turing test" in general is kind of a misnomer, because Turing wrote a paper in nineteen fifty in which he was discussing the idea of machine intelligence and whether machines could actually possess intelligence. And his philosophy was that machines could demonstrate behaviors that we humans associate with human intelligence, right, and that if a machine could produce such behavior reliably, then

05:52

it stands to reason that we call that machine intelligent, because it, to all appearances, is intelligent. Therefore we might as well go ahead and extend it the courtesy. So that was generally what he was saying. And he said that he believed that in fifty years' time, which would be the year two thousand, a typical computer of sufficient power and sophistication would be able to fool people thirty percent of the time into thinking that it was in fact a human and not a machine, that that was the level

06:21

of sophistication he expected. And this was more of a prediction, not a test. People have since looked at it as being a kind of test to apply to computers to see whether or not they pass this bar for machine intelligence, which is sort of a reverse-engineered look at what he was talking about. Now, he used a specific example in his explanation of fooling someone into thinking it was human, and that was a variation on something he called the imitation game. And the imitation game is

06:59

a game. It's a party game. Yeah. The original idea would be that you have three players. You've got an interrogator, who could be a man or a woman, and then you have a male player and a female player who will be interrogated. All three people are separated

07:13

from one another. They are not able to see, and preferably not able to hear, one another. Right, and preferably communicating through teletype or some other form of non-handwritten communication, just in case handwriting would give away some form of gender identity. Right, because it's the interviewer's job to figure out which of the participants is the man and which is the woman. Right, and it's the male player's job to try and trick the interrogator.

07:41

It's the female player's job to try and get the

07:43

interrogator to get the answer correct. So the man will attempt to throw doubt into the situation, and the interrogator has to figure out, without any visual or audible cues, who is who. And Turing said, well, let's replay this, saying it's a human interrogator, and then you have a human respondent and a machine respondent. It's the interrogator's job to figure out whether the entity he or she is talking to is in fact human or machine. And he suggested that by the year

08:18

two thousand, computers would be sophisticated enough that thirty percent of the time you would not be able to tell, or, if you prefer, thirty percent of the people who interviewed said machine would be unable to tell that it was a machine. So Turing didn't really lay out a path to how

08:35

this would be achieved, did he? More or less. He talked about the principles of computer science and said that computers would become more and more sophisticated, to the point where this would become possible, but he did not, in fact, lay out, like, here's my fifty-year plan for making computers intelligent. I guess he probably wasn't talking about natural language

08:56

processing algorithms and such stuff. No, no, no. It was a more general approach, talking about the idea of a machine possessing the general intelligence that would allow it to do something, right? This was more of a philosophical perspective than anything else. And on top of that, there were a lot of people who disagreed with Turing's philosophy. They said, you know, you're saying that because you're able to mimic intelligence, that means you have some form

09:23

of possession of intelligence. You've asserted this without providing any actual proof. One of the criticisms I read addressed something I said earlier: if a machine appears to be intelligent, we might as well say it is intelligent. One interpretation of that is that Turing would say, well, let's say two people are talking to one another. They each assume the other person possesses intelligence, partly because of

09:53

the conversation that's going on. Therefore, if a machine were able to hold a conversation with you, wouldn't it be just courtesy to go ahead and extend the same assumption? Shouldn't it be entitled to that? Right. Whereas the critic I read said, well, to be fair, humans pretty much assume that other humans possess intelligence. It has nothing to do with the conversation.

10:13

It has everything to do with the human being a human. Which doesn't necessarily mean it's based upon any kind of science or any kind of rational approach, but that is how humans are. I mean, humans are also a little bit on the irrational side. We kind of also extend that courtesy to, like, our pets. Yeah, at least in ways that we write about and verbalize, we will anthropomorphize lots of stuff. Sure, like our toaster.

10:44

Given enough capacity, we go, like, "That's a good, smart toaster," or we'll say things like, "This toaster hates me. It either just warms the bread or it burns it to a crisp. It just hates me." And that's when we're projecting things upon these objects that they don't inherently possess. But part of this philosophical argument is that we're really only assuming that other humans have this intelligence, right? I mean, it's tied into sort of

11:10

the philosophical problem of solipsism. It's a thing: nobody wants to assume that they are the only mind in the universe. Or some people might want to assume that, but hopefully nobody does. I don't want to assume it, but prove me wrong, is what I'm saying. It's one of those things where you just operate day to day under the assumption that that is not the case, but you have no

11:31

way to prove it's not the case. But philosophy aside, the Turing test is being used as a very practical means of evaluating AI these days, right? At least as far as being able to mimic a human's ability to hold a conversation. Right. So, yeah, it's sort of moved beyond the bounds of the philosophical discussion in which it was originally proposed, and now it has become a thing

11:55

people try to do. Yeah, it's become a goal in and of itself, which again has become part of a growing conversation around AI and whether or not that's a legitimate goal. Sure, which is partially cool because it's encouraging competition and a good kind of go-get-'em fun amongst a lot of programmers and computer scientists and thinkers, whereas other people say, yeah, but it's kind of diverting attention away from real problems

12:24

in machine intelligence that need to have more work put into them. But everyone's worried about making Chatbot 2000 fool people into thinking that it's actually, you know, Bob from Minnesota. Yeah, whether or not those people were actors on Red Dwarf. Yeah, it doesn't matter. Okay. So in the sort of popular thinking today, the popular tech thinking, what is the Turing test now? What does it mean when someone says I've beaten

12:52

the Turing test? The Turing test today is based, again, on that prediction that by two thousand a computer would be able, thirty percent of the time, or in thirty percent of the cases, to fool someone into thinking that it was a human being. So now we've taken that and turned it on its head and said the Turing test is a test where, if a chatbot is able to convince a human interviewer that it is in fact a human thirty percent or more of the time, it passes the Turing test and it is,

13:24

I guess, showing some form of intelligence. I mean, that's the real problem, where it all breaks down, right, is the definition of intelligence. And part of it was just, let's see if we can create a program sophisticated enough that it can respond to questions in a way that is natural and will fool folks into thinking, oh, it's another human being on the other end of this terminal.
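
The thirty percent bar described here comes down to simple arithmetic, and a minimal sketch of it might look like the following. The function names and the `threshold` parameter are assumptions made up for illustration; the only figures taken from the conversation are the ten-of-thirty result and the thirty percent mark.

```python
def fooled_fraction(judges_fooled: int, judges_total: int) -> float:
    """Return the share of judges who took the program for a human."""
    if judges_total <= 0:
        raise ValueError("judges_total must be positive")
    return judges_fooled / judges_total


def passes_popular_bar(judges_fooled: int, judges_total: int,
                       threshold: float = 0.30) -> bool:
    """Check the popular 'thirty percent' reading of Turing's prediction.

    The threshold default is an assumption based on the figure quoted
    in the discussion, not an official rule of any competition.
    """
    return fooled_fraction(judges_fooled, judges_total) >= threshold


if __name__ == "__main__":
    # The Goostman result as described: ten of thirty judges fooled.
    share = fooled_fraction(10, 30)
    print(f"{share:.1%} of judges fooled; passes: {passes_popular_bar(10, 30)}")
```

Note that this only counts judges. It says nothing about how the program produced its answers, which is the part the rest of the discussion is about.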

13:46

And as we've talked about before on this show, the problem of computers understanding natural language is a huge one. So it's not a terrible benchmark, as benchmarks go. No, it can in fact improve lots of different areas in artificial intelligence. It's just a question of how do

14:01

you implement it. If you're implementing it in a very superficial way, you might be able to get some fairly interesting results from an actual, quote unquote, Turing test, where you submit this to judges and the judges believe that it is in fact a human. But you might be doing so using essentially smoke and mirrors, in a way where the computer does not have any real natural language processing ability. It just responds in a way that convinces the judges that it is in fact a human.

14:30

It's got some clever tricks, right. And in fact, with most of the chatbots that have passed the Turing test, and this goes back to the nineteen seventies, it's not a new thing that has just happened. Whoa, you said passed the Turing test? I mean, I thought, based on the media reporting, that the thirteen-year-old Ukrainian boy, the fake thirteen-year-old Ukrainian boy, Eugene Goostman, was the first to pass. No, previously to that we had a fake paranoid schizophrenic, yeah, Parry, P-A-R-R-Y,

15:00

and a fake psychologist, Eliza. Yeah, so Eliza was a program. In fact, Eliza is probably the first really well-known chatbot. You probably heard me doing a little bit of Eliza earlier. It's a Rogerian psychiatrist personality. Actually, Eliza had several different personalities she could assume, but the doctor one is the most popular and the most well known of all the variations. And then you have Parry, who mimicked the behavior of

15:31

a paranoid schizophrenic, including a backstory. The programmer for Parry created basic facts about Parry that Parry would share, sometimes spontaneously, or if it just got, quote unquote, frustrated. The way this would work is that the program would analyze what was being said just by looking for certain keywords and then trying to create a sentence based on those keywords. Not really a sophisticated analysis.
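
The keyword approach described here can be sketched in a few lines. This is not the code of Eliza or Parry; it is a toy under stated assumptions, and the rule table, the canned replies, and the `reply` function are all invented for illustration.

```python
import re

# Hypothetical rule table: each entry pairs trigger keywords with a canned
# reply. Real programs of this kind used much larger, hand-written scripts.
RULES = [
    (("mother", "father", "family"), "Tell me more about your family."),
    (("feel", "sad", "happy"), "Why do you feel that way?"),
    (("computer", "machine"), "Do machines worry you?"),
]

# Used when no keyword matches, so the program always has something to say.
FALLBACK = "Please go on."


def reply(message: str) -> str:
    """Pick a canned reply by spotting keywords, with no real understanding."""
    words = set(re.findall(r"[a-z']+", message.lower()))
    for keywords, response in RULES:
        if words & set(keywords):
            return response
    return FALLBACK


if __name__ == "__main__":
    for line in ("I had a fight with my mother",
                 "What did you have for lunch?"):
        print(f"> {line}\n{reply(line)}")
```

The fallback line is what keeps the conversation moving when nothing matches, which is roughly the deflection trick from the top of the show.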

16:00

It was just very basic, and Parry would occasionally throw in complete non sequiturs, but in a paranoid schizophrenic kind of way, which was convincing enough to fool at least some psychiatrists who were reading transcripts of interviews with Parry and saying, no, this actually sounds like it could be

16:17

a human being. Okay, so it sounds like in all these cases we're dealing with a chatbot that is assuming the personality of sort of a specialized interlocutor. Like, you have a doctor who has a very particular way of talking, meaning not really responding to what you're saying and just saying, "How does that make you feel?" Right.

16:37

Or you have a paranoid schizophrenic whose range of responses is naturally very limited, especially with a skewed view of reality. Or you have Eugene Goostman, the more recent one, who probably has a limited command of English and a limited understanding of popular culture outside of his own community, or just of how the world works in general. If you're talking about a thirteen-year-old, you're not going to have the expectation of maturity and

17:04

sophistication about the world. So that's a big criticism. To any thirteen-year-olds listening, who are probably fabulous humans: you're the exception, okay? But all those other thirteen-year-olds... You're the future. And you know what Cheers was about.

17:22

So we'll get into that. So the point here is that we have not really seen an example of a chatbot that is able to assume the role of a typical, however you want to define that, you know, average human being who communicates in whatever the

17:42

language of choice happens to be, usually English. But the ideal goal here would be having a chatbot that communicates fluently in whatever language was picked for the competition, can answer questions that any person the age of said chatbot would be able to answer, and is able to roll with any of those kinds of conversational avenues. Right. Basically, we're looking for Lieutenant Data. Yeah, yeah, Mr. Data. Okay, I want to move us along here,

18:20

please, to ask the question. So, given that this test is not actually necessarily a test, and that it has, depending on your interpretation, been bested in kind of limited ways throughout the years, why is it still popularly considered the bar for artificial intelligence? Well, I think mainly it's because, first of all, it mimics a part of intelligence that we all recognize, right? So it's especially easy to sell it to a general public who may or may not be

18:53

very savvy as far as what machine intelligence entails. So it's an easy story. If you can have a conversation with the thing, it's clearly smart. Yeah, so it has that going for it. Secondly, this is a non-trivial problem. I mean, creating some form of computer or machine intelligence that can respond to human language in a way that is natural, whether it is doing so in a way that makes you think it's

19:19

human or not, that's a non-trivial problem. I mean, humans have the ability to put the same kind of information into vastly different wording, so that they're conveying the same meaning but can do it in multiple ways, and you have to be able to design a program and a machine that can roll with that, that can understand, or at least respond to, all these

19:45

different variations with the appropriate information. So whether you're talking about a chatbot or you're talking about some

19:51

form of interface, let's say it's a call center where you're trying to get help for a problem, there could be a hundred different ways for you to phrase what has gone wrong with whatever the issue is, and the machine intelligence has to be able to interpret all one hundred of those ways and come back with the appropriate response, which is the same for all one hundred different ways that you could phrase it. These are really

20:16

tough challenges. We've seen it demonstrated in lots of different types of technology, IBM's Watson being a big one that we can point to, where it took lots of different subtle things about the English language and was able to interpret them and respond in the

20:33

format of a game show and do it successfully. And that was pretty tricky, because it involved not just analyzing, all right, well, this word here in this sentence is the object, this one is the verb, this one's the subject, and not just consulting Google and coming up with answers to questions, but understanding puns, for example. Yeah, it had to be able to respond to wordplay, so that it could interpret what

21:01

was actually being asked of it. That was a significant achievement. So it's something that you can point to and say, look how hard this problem is. It's something that's very easy for us humans. We learn language, we even learn how flexible it is, we even change it ourselves. I mean, there are words that exist now that didn't exist when I was a kid, things that entered the popular parlance and then became actual words. So it's something that

21:30

is difficult to program into a machine. It's probably one of the real reasons why this bar is seen as, like, the defining element of artificial intelligence, because it's not an easy problem. LOLs. There you go. But okay, so there are good reasons for this to be the bar, and there are good

21:54

reasons for it to no longer be the bar. Let's talk about a few of those. Yes, please. Okay. Well, I would like to point out that when we're making these sort of, what you might read as, criticisms of the Turing test, we're not criticizing Alan Turing and his vision for this. You know, he's one of the best human people, qualitatively. Yeah, well, he certainly was, like, the founding father of computer science. Like, the reason computer science is the way it is is largely in

22:23

part due to his work. So there's that. Yeah, I just think we should focus on sort of the natural limitations of this concept as it is most often envisioned, and why it doesn't necessarily apply to artificial intelligence in general. One of them is that, in the way people use this test, it's not designed to talk about artificial intelligence in general, because artificial intelligence spans a lot more than natural language processing. I mean, it's

22:52

specifically focused on mimicking human behavior in a text-based conversation. Now, that would have to be a feature of what people would call, say, an artificial general intelligence, something that a general intelligence should be capable of doing, but not the only thing it should be capable of. And AI research focuses on all kinds of other things. You have AI algorithms that recognize objects by sight. That's artificial intelligence.

23:20

You can have AI algorithms that control physical movement. That can be artificial intelligence. You might even say that, you know, the Roomba that's going around on your floors, there is a type of artificial intelligence that governs its movements. Well, let's put it to you this way, Joe. Let's say that I tell you, all right, I've got two people here that I want you to judge

23:40

whether or not they're intelligent. One of them is able to tell you what happened in the very first episode of All in the Family. The other is able to navigate to a place it has never been before by consulting various maps, able to move through traffic, able to get to the destination and back safely, and do so in a reasonable amount of time. You would probably say, all right, well, both of those

24:15

seem intelligent to me. Sure. You know, one is more fun at parties, but one is more likely to get to the party. One of those is more likely to drop you off in a minute. Which one is fun at the party depends on how much you like All in the Family. I go to parties just to have people explain the plot. I was going to go with Jersey Shore, but I figured that would have been a low blow. No, no, I know. It's somebody who explains to you the plot of Transformers

24:39

Four, which means they're not human. Joe, you explained to me the plot of Transformers Four the other day. I don't think I did. I think I just said "product placement" over and over. I think, actually, your explanation was that there was no plot. I think that's what your point was. Yeah. So a driverless car, a Google car, for example,

25:00

you would say, possesses many aspects of intelligence. In its final implementation, the way it's been described, you would get into said car, essentially turn it on and tell it where you wanted to go, and it would take you there. It would navigate potentially through very complex street layouts, I mean, especially if you're going from one city into another city,

25:23

respond to traffic and obstacles in real time. Exactly. It would be able to anticipate if some sort of obstacle came into its pathway and be able to react to it. So all of these things, being able to recognize obstacles, being able to react to them, being able to merge into existing traffic, being able to go from one location to another, even if that car

25:45

has never been there before, all these things are aspects of intelligence that would never come into play if you only concentrate on the Turing test as the example of whether or not a machine is intelligent. Yeah, totally true. Okay. I want to offer another limitation, which is that using this popular vision of the Turing test might be encouraging the wrong kind of development of chat

26:09

bots for natural language processing. I mean, obviously, anything that knows how to use words and sentences really well, how to comprehend them, how to put them together, is a good step in the right direction. But chatbots that are designed to fool humans into thinking they're humans, it's all based on deception. Yeah, I've seen it suggested that we're not raising the bar for computer speech, we're lowering the bar for human intelligence. That's also

26:35

a case. Actually, I read some great articles that talked about actual Turing test procedures where humans were considered to probably be the machine and the machines were considered to be humans, partly because one of the things about this Turing test is it doesn't say that the machine has to be honest at all. In fact, it can't be. It's got to deceive you into thinking it's human. So by its very nature, it has to be able to answer questions to which it honestly would have to say, I

27:03

have no answer for that, if it were just being a machine. For example, if you said, "What do your parents do?" it would have to either invent something or say, "My parents died when I was..." I mean, it wouldn't be able to say, "I don't have parents because I'm a computer." That would end up eliminating it from the Turing test, unless you also assume that the human, right, is coming up with responses like Joe did at the top of the podcast, saying, like, well,

27:29

how would I know that? Whereas in that case, you might have the interrogator say, well, I assume that the humans are also being dishonest, that they are also trying to trick me. So if something is coming across as obviously mechanical, well, then that has to be a machine. Or, rather, it has to be a human pretending to be a machine; it can't be a machine that's just a poorly designed chatbot. It could totally be a machine

27:50

that's a poorly designed chatbot. So, Joe, if you kept on saying, like, "I don't want to talk about that," or, you know, "Let's change the subject," I might sit there and say, well, this is coming across as a chatbot, but I bet it's actually a

28:03

person pretending to be a chatbot. And meanwhile, there were other examples. The one I read, and this really appealed to me, was an interview with a Shakespearean scholar, where the interviewer was asking questions and the Shakespearean scholar, being a human and being smart and being a little snarky, was responding in kind of a snarky way to some of the questions, and they just assumed that because it was a snarky comment, that

28:29

this was some clever programming, even though the person being interviewed was demonstrating an understanding of the subject matter, not just regurgitating things or pulling together simple sentences. They were actually being pretty complex. So, for example, they might ask, like, "What's your favorite Shakespearean play?" And the respondent said something along the lines of, you know, that's a really general question. I don't even really know

28:53

how to answer that. If we want to go with just the comedies, then I guess it happens to be Twelfth Night, or something along those lines. And so it was actually a very thoughtful response, and the person was also trying to engage the interrogator in a conversation, and the interrogator would end up resisting. And they ended up saying, well, this

29:13

Shakespeare scholar one is clearly a robot. And this other one that kept on trying to deflect the question, that's clearly a human pretending to be a robot. And they were wrong in both cases. They were wrong. I would really like to never be personally challenged on whether I'm a human or not. I feel like I would fail that game. Yeah. Well, at any rate, I'm a real person. So yes, it is lowering the bar for what we humans consider to be a

29:40

passing grade. But it also is all based on deception. Like you said, Joe, it's not based on any kind of "let's improve the computer's ability to do any particular task" other than to fool people into thinking it's human. That's not terribly useful. In fact, it could be really bad for us to go down that road and just get really good at making computers trick humans. Yeah, let's not do that. Yeah, that's the first thing we're teaching. It's like, before they're even able

30:12

to do all this other great stuff, the main thing we're focused on is how to lie to us. At least we didn't teach them how to hate us. Their lying comes from an honest place. But then, next, we have another criticism, which leads to the idea that the machine is not doing any sort of thinking at all. It's taking, like we said, a smoke-and-mirrors kind of approach, deflecting, using verbal trickery to make us think that it's thinking, but

30:42

it's not actually thinking. And that is illustrated, in one aspect, in the Chinese Room thought experiment. Yeah, so this is kind of a higher-level criticism. It's kind of backing up and saying, wait a minute, I disagree with Turing's initial premise about how you represent intelligence. Right. The idea here is that even if you were to create a chatbot that one hundred percent of the time could fool people into thinking that it was in fact human, you could

31:10

not specifically state that the machine itself was intelligent, that the whole Turing test is based upon a premise that is faulty. Yeah, we've done a podcast about the Chinese Room experiment before. I believe it's called "Does Your Computer Know Chinese?" Yeah, if you want to go back and listen to that, please check it out. It's a cool one. But it boils down to the idea that there are some philosophers and thinkers who say that no programmatic system can

31:39

ever understand the meaning of words and symbols. No, it simply is following directions, predetermined directions, and it cannot go beyond those predetermined directions. So it is, by its own design, limited in what it can do. So, in other words, the basic idea is: imagine you've got a

32:00

room that doesn't have any windows. It's got one doorway, and you are in the room, and the only other thing that's in the room, besides, like, basic furniture, is a big book of instructions. And occasionally a piece of paper gets shoved under the door that has a symbol written on it, a Chinese character, and you do not understand Chinese. That's important for this

32:23

thought experiment. You don't understand Chinese. So you get this what is to you a meaningless symbol, because you have no idea what it means or what it says. You open up the big book of directions and you find the symbol in the book of directions. The book of directions tells you what symbol to draw in response to that. You do so, you slide that under the door, and you just keep doing that over and over and

32:45

over again. To anyone from outside that room, it appears that the room understands Chinese, because they put a symbol underneath and they get the proper response shoved back eventually. But you, as the person inside the room, do not understand Chinese. You are simply following a series of directions that are just predetermined, and you have no understanding of that. Now, of course, there are also a lot of philosophical responses

33:13

to this criticism. One of the main ones, and I'd say probably the most popular one and the one that makes the most sense to me, is the idea that, Okay, well, while the person in this analogy, basically it's a criticism of the scope of the analogy. It says that, Okay, in this analogy or in this thought experiment, the person in the room might not understand Chinese, but that it would make sense to talk about the system as a whole.
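The rule-following described in the thought experiment can be sketched as a plain table lookup. This is only an illustration under stated assumptions: the `RULE_BOOK` entries and the `room_operator` name are hypothetical placeholders, not anything from the original argument, and a real rule book would have to be vastly larger.

```python
from typing import Optional

# Hypothetical rule book: maps an incoming symbol string to the symbol
# string to send back. The entries are illustrative placeholders only.
RULE_BOOK = {
    "你好": "你好",
    "你会说中文吗": "会",
}


def room_operator(symbol: str, rule_book: dict) -> Optional[str]:
    """Do what the person in the room does: find the incoming symbol in
    the book and copy out the listed response. No meaning is consulted
    at any point, only whether the shapes match."""
    return rule_book.get(symbol)


# A symbol that is in the book gets the "proper" response back.
print(room_operator("你好", RULE_BOOK))

# A symbol that is not in the book gets nothing: the operator cannot go
# beyond the predetermined directions.
print(room_operator("今天天气怎么样", RULE_BOOK))
```

From outside, a caller only ever sees the returned symbols, which is why the disagreement is about whether understanding belongs to the operator, the book, or the whole system.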

33:38

So the room, the person, and the book put together as a thing understanding Chinese. You could say that the room system does understand what Chinese is, just as you could say that a particular region in your brain doesn't necessarily understand English, but you understand English. Yeah. So it's not that every single part of your brain is dedicated to language processing. It's not. So you couldn't say that this one particular part of your brain understands English. That

34:06

would just be untrue. But it also would be untrue to say, well, the person therefore does not understand English. You've got to look at the full system, right? So that is another counterargument to that particular objection. Okay, so the question might be, do we need an alternative to the Turing test, something in the popular consciousness that represents the new bar that all artificial intelligence has to pass?

34:32

I I would say, yes, what might that look like? Well, how about instead of trying to get it to ape a human through text, we actually see if we can create a program that can comprehend stuff, that can consume some form of media, whatever that might be, and then be able to actually answer meaningful questions about that media. So it could be that you have this program and you allow it, you know, you feed it the information about a book, and then you might ask what were

35:04

the main character's motivations? Why did the main character do the actions he or she did in the context of that novel? Or you might have it watch a television program and then answer questions, So what happened first, what happened next, why did that happen? You know, that kind of thing, and that if a machine were able to comprehend something and answer these questions in a meaningful way.

35:28

That would go much further toward establishing this machine as being intelligent in a similar way that we humans are intelligent. Yeah, that's interesting. I like that idea, especially talking about visual media, like giving it a movie to watch and then asking questions about it. So that would encompass quite a few things all at once.

35:49

That would encompass, uh, visual data processing, recognizing the faces, being able to look at a strip of film and interpret all that data as a narrative, understanding what's going on in the relationship of different characters and objects. Of course, it would include natural language processing. It would include understanding of cultural tropes and things like that, so sort of incorporating cultural knowledge into this engine, which is something that a lot of these chatbots so far

36:19

haven't been able to do. That's why they have to deflect and say, no, I've never seen Cheers, I don't know what it's about. So that would do a whole lot all in one fell swoop. At the same time, I wonder if that's setting the bar too high. See, I don't know if it's setting the bar too high if you're talking about establishing strong AI. If you want truly strong AI, then you

36:43

have to have that high bar set, right? Sure. Cognitive scientist Gary Marcus wrote a really great blog post for The New Yorker about this entire concept. He pointed out that a true test of intelligence is learning new things at will, which this demonstrates. And also that sounds really simple, but that's just our human bias. Everything that we do is basically learning new things at will. But it's really hard for computers to do, as we have

37:07

talked about many times on the show before. He closed his blog post with this excellent tidbit, and I quote: no existing program, not Watson, not Goostman, not Siri, can currently come close to doing what any bright, real teenager can do: watch an episode of The Simpsons and tell us when to laugh. Yeah, I mean, that's a great point.

37:27

I said that I won't. I won't assume a computer is intelligent until it is able to watch the Transformers movies and then ask you why you made it do that thing and then and it be hurt if it's if it's hurt, and it it considers you trade, Yeah, it feels that you are a cruel overmaster. Then you know you have created an intelligent machine, and you should listen to it and also probably apologize. Again. I think that's a really great way of framing the issue, and

37:56

that's very interesting and very ambitious. I still wonder the same thing I was just thinking about, like, is that too high to set the bar? I don't know how high you should set the bar. I mean, on one hand, it's great to shoot for the moon, you know, to encourage really, really amazing leaps of innovation. But if you set the bar too high, it might be discouraging, because there's almost nothing to aim for. It's not

38:19

even in sight. The other way to look at it is to say, if you don't set the bar too high, what are you saying about human intelligence? Because for the most part, what we're talking about here is machines still operating under an intelligence that is similar to human intelligence. That's another criticism, a criticism that is not going to go away anytime soon: the idea that we're holding machines to a standard that applies to humans. We're not creating a

38:43

standard that particularly applies to machines. And therefore any test we come up with, whether it is really hard or really easy, is ultimately, uh, a fool's endeavor, like this is not the right pathway to

38:59

go down. But still there are other alternatives we can talk about, and slightly more specific ones. That one reminds me of kind of the essay comprehension version, or a bit of the SAT. Yeah, the verbal side? Yeah, well, not even the verbal, but like the reading. Sure, sure, yeah. So another similar

39:20

one to that is the Winograd Schema Challenge. There's actually a great paper written about this particular proposed challenge. Again, it aims to replace the Turing test as the standard for testing this particular type of AI. It's still looking specifically at how a computer is able to interpret and respond to text, but it is not looking at a way for a computer to, uh, pose as a human being, which they said was a fundamental flaw in the design of the

39:53

Turing test. Well, I can respect that. So we can say it's narrowly focused on natural language processing, maybe, but it doesn't go for those tricks, the little games that these programs play. Right. So it's built on the concept of recognizing textual entailment, which was an earlier

40:11

proposed alternative to the Turing test. Now, that specifically is where you have a pair of sentences and you detect whether or not they are logically paired together properly, and you answer with a yes or no. Yes, those logically are paired together, or no, they are not. So here's an example. William Shakespeare's best-known play is Romeo and Juliet. I know it could be Hamlet. That's fine. William Shakespeare's best-known play is Romeo and Juliet. And then the second

40:39

sentence would be: Shakespeare wrote Romeo and Juliet. So you ask any human being this thing, and they would say yes, those two fall together. Actually, it depends on how logical the human being is, because if you ask Spock and tell him he's supposed to be operating in a vacuum and not having any cultural knowledge, he would say no, because Shakespeare might have been an actor. Interesting, that would

41:05

be my answer, completely outside of it. But still I've seen it posited that Vulcans would certainly fail the Turing test. So right, you can be too logical, you can be so logical that you're not play. Right. Shakespeare's best known work is Romeo and Juliet, and then Shakespeare wrote Romeo and Juliet: in that case I would say yes, definitely. But as you phrased it, I could say no, because Shakespeare might not have been the writer. But

41:31

that's fair. But you know, that's ultimately the whole point: just trying to get two sentences that either are logically compatible or not. Okay, give me two that are definitely not. Um, all right, let's see. How about, uh: some pets are dogs. I own a pet, so I have a dog. No, definitely not, because those two things don't necessarily belong to each other. That would just be a very simple, basic example. But the Winograd Schema Challenge is a little more of a variation

42:03

on this. It's Uh, You're given a single sentence that's followed by a question and two choices for answers. So here's an example. The trophy would not fit in the brown suitcase because it was too small. What was too small the trophy or the suitcase? The trophy would not fit into the brown suitcase because it was too small. What was too small? Are you quizzing me? I'm asking you, I'm asking you. The suitcase was too small? Right, Because if the trophy were too small, that doesn't make sense.

42:32

It wouldn't prevent it from fitting into the suitcase. Correct. But here's the thing: that pronoun, it, could refer to either the trophy or the suitcase. It's the context of the question. It's how we understand what the question is asking that we're able to give the answer. But if we were able to change one word in that sentence, we could completely change which answer was the correct one. So I said the trophy would not fit in the brown suitcase because it was too large. What was too large?

43:04

The trophy? Right? So by changing one word in the sentence, I have changed like I've kept everything else the same, I've kept the same answers. I kept the same everything else except for the changing the small to large, and

43:16

that changes which answer is the correct one. So if you were able to pose a series of these style of questions where by changing one word you could swap out which answer is the correct answer and otherwise you've got the same answers for each individual pair of questions, and the computer was able to reliably answer which one is the correct answer, then they would argue that is a better demonstration of machine intelligence because the computer is

43:44

actually comprehending what was being asked and coming up with the correct answer. Yeah, so that actually is more general than just natural language processing. It's not just making good sense of the grammatical units in this sentence. It's actually drawing on a knowledge base, the kind of knowledge base that would tell you, yes, a thing would have to fit inside a suitcase. It's having to get rid of ambiguity, right? The ambiguity is the pronoun.
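The paired-question format described here can be written down as data plus a scoring function. The sketch below is a minimal illustration, not the challenge's official format: the `WinogradItem` class, the `accuracy` function, and the two baseline resolvers are hypothetical names introduced for this example. The two items are the trophy and suitcase sentences from the discussion.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class WinogradItem:
    sentence: str
    question: str
    candidates: Tuple[str, str]  # the two possible referents of the pronoun
    answer: str                  # the referent a human reader would pick


# One "twin" pair: the sentences differ by a single word (small/large),
# the candidate answers are identical, and the correct answer flips.
ITEMS = [
    WinogradItem(
        "The trophy would not fit in the brown suitcase because it was too small.",
        "What was too small?",
        ("the trophy", "the suitcase"),
        "the suitcase",
    ),
    WinogradItem(
        "The trophy would not fit in the brown suitcase because it was too large.",
        "What was too large?",
        ("the trophy", "the suitcase"),
        "the trophy",
    ),
]


def accuracy(resolver: Callable[[WinogradItem], str],
             items: List[WinogradItem]) -> float:
    """Fraction of items on which the resolver picks the correct referent."""
    correct = sum(1 for item in items if resolver(item) == item.answer)
    return correct / len(items)


def always_first(item: WinogradItem) -> str:
    """Baseline that ignores the sentence and picks the first candidate."""
    return item.candidates[0]


def always_second(item: WinogradItem) -> str:
    """Baseline that ignores the sentence and picks the second candidate."""
    return item.candidates[1]


print(accuracy(always_first, ITEMS))   # 0.5
print(accuracy(always_second, ITEMS))  # 0.5
```

Because the correct answer flips within each pair, any resolver that ignores the changed word scores exactly half on that pair, which is the point of the design: getting both twins right requires using what the sentence actually says.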

44:09

The same thing, by the way, can be true if you use people's names and you just you make sure that the two names are are gender compatible. It's the same gender whether you know, it doesn't have to be one that's specific to a gender. But you could say, like Chris and Sam, and then you use the pronoun he and you don't you know, designate otherwise other than the context of the sentence who he is? And you asked that question like who is he? Is it Chris

44:38

or is it Sam? Then if the computer is able to reliably answer those kind of questions, then again that's

44:43

a good demonstration of machine intelligence. And it all is about this ambiguous sentence being, uh, analyzed and understood. And I think that this is a slightly easier problem for a machine to tackle, or rather for a program to have a machine tackle, because, well, Joe, you said in our notes that this reminds you of the verbal section of the SAT. And I think that, in contrast to the reading section, it's

45:11

it's just a slightly more logic-based problem. Yeah. And see, the issue here, though, is that this test tests comprehension, as opposed to just pulling up from a database of pre-existing responses or even just patterns that a chatbot can pull from, where there's no understanding there, right? And you could, in theory, present any new type of sentence that follows this pattern and then ask any type of question that follows this pattern to a

45:46

computer capable of answering these things, and it would be able to handle it. It wouldn't just be a preset: all right, well, there's these twenty questions that you know we're going to ask you, so make sure your computer can answer them. Because that's just, you know, you could just program a computer to say, yeah, when

46:01

the sentence is in this order, this is the answer you give. Um, it would have to be able to answer any question it had never seen before and be able to comprehend what the question was asking and give the proper answer. So that is a different type of intelligence than just natural language processing. It goes beyond that. But these are not the only ideas that are out there. I mean, we all actually kind of have probably our own concepts of how this would go. Joe, you mentioned one, well,

46:28

I was kind of joking, but kind of serious also. My idea is you should have a computer program that has

46:33

to write fan fiction. So you give it a story about a character, and then it has to look at that and then itself come back and write a short story about the same character, basically in keeping with what human judges would agree is that character's psychological profile, how this character should behave. So not only would it have to understand the motivations and personality of the character, it would have to be able to construct a narrative, which, let's hold to at least at

47:06

least the standard that the narrative needs to make sense. Right, I'm not saying it has to be a real good story. I'm just saying it has to be the kind of story a human would write if they understood the character they were writing. Your standards are really slipping already. That is a tough one too. We had a whole episode where we talked about whether computers are capable of creating art and how difficult the problem of a computer writing

47:28

in an author's voice can be. Let's be fair, though: fan fiction authors, only a small number of them are what I would consider really, really good. There are some really, really good ones, and that's only because there's so much fan fiction. I would argue that there are only a few really good ones, that the percentage, I mean, there's just a great deal

47:49

of it on the internet. Okay, fair, fair enough, Like you would say, you know, there's only a few really good television shows on TV, but there's a lot of it.

47:59

So now I would say that this would be truly challenging because not only would it have to analyze the existing work so it could get a handle on what this character is and what this character's uh motivations and behaviors tend to be, then create a brand new situation for this character to inhabit, have the character behave in a believable way within that situation and have the situation at least makes some sort of sense, so that you don't have random events following one another with no sense

48:30

of causation. Right, So this wouldn't involve things like processing visual information and all that kind of stuff. I was trying to imagine something that would still be text based the way the Turing test is today, but it would require a much much deeper kind of idea of socialization and general knowledge and emotional intelligence that that these chatbots

48:53

really can't come close to, right? And I think that that concept of emotion being part of the test is a really interesting one, because it seems like a lot of these new propositions for Turing tests all kind of hinge on the question of a machine being able to understand human emotions. So I mean, like, are we working towards a Voight-Kampff test being the new Turing test?

49:14

If you don't recognize what that is, it was the machine that determined whether replicants were replicants or humans in Blade Runner. Um, or yeah, just bringing Lieutenant Data in here and having him talk to computers about whether or not they're computers. And well, I don't think the Winograd test would really involve that, but the other two definitely would. Yeah, yeah. And I'm not sure how I

49:36

feel about that element. I do think that the original Turing test is about creating a computer that behaves like a human rather than thinks like a human. And I think that we need to ask ourselves whether human behavior is the gold standard of intelligence, which which we talked about earlier a little bit here. Human intelligence is not the only kind of intelligence, nor is it necessarily the one that we need to focus on to create machines

50:01

that are truly useful to us. Right? And is that is that, in fact bio chauvinistic, That is that is a real word that I heard real researchers by saying, this machine isn't intelligent because it can't talk to me about the last television show I saw or the last mut song I listened to. And if it can't do that, then it can't be intelligent. That does seem to be pretty shortsighted, sure, I mean, especially if you think about it, There are plenty of human beings that couldn't pass the

50:27

Turing test. There are plenty of human beings who haven't passed the Turing test, right? Right. Um, so, like Shakespearean scholars, for example. Right, right. So maybe we need to in fact revise our concept of intelligence here, um, even beyond the point where a single test is useful in this question. Yeah, that's exactly what I was wondering when I was doing research for this. I started thinking,

50:52

maybe there just isn't a test. Maybe there isn't one test that we should think of as the standard bar for artificial intelligence, something like the Turing test or one of our alternatives. It might be a really good sort of goal post in a narrowly focused part of artificial intelligence, but I can't imagine what the test would be for

51:14

for the truly general intelligence question. Right. So. In other words, these tests might be useful in that they can give focus to people working in artificial intelligence specific goals to work toward. But even if you achieve that goal, that

51:29

doesn't necessarily mean that you have achieved strong AI. It may be that we move the goal posts, which happens with us trying to define what human intelligence is, right? It ends up being one of those arguments where you say, well, I can't necessarily tell you what it is, I can start telling you what it isn't. And eventually we're gonna eliminate enough stuff so that we have whatever the kernel is, and that will be our definition. But we're not there yet.

51:55

The same may be true with machine intelligence. In fact, I assume it really will be. And one of the things I propose is that instead of creating actual tests, things that we think the machines need to overcome, and saying, well, now we've reached the point where we can say this machine is truly intelligent, another thing we could do is just watch and look for examples of things that we

52:17

consider to be part of intelligence. And again we're coming at it from a human perspective, because most of us are humans, you know. Yeah, to some degree or another. And so we think of it as things that humans exhibit, like curiosity, the desire to know how something works, and the innovation to find out how it works by trying different things, experimentation. The Borg

52:42

had curiosity. The Borg also is a work of fiction, and also I think you would agree the Borg had intelligence, so that's not even relevant to this discussion in multiple ways. But yeah, this idea of having curiosity, experimentation, innovation, invention. A machine inventing something, like you were saying with creating a fan fiction, that's invention, right? That's not just regurgitation. It's actually being able to understand rules at least on

53:14

some level. Maybe not human understanding, but it is able to apply these rules and create something new with it. This is something that is not easy to do, and I think until we have machines that are capable of doing all these things, we won't really say that we have a strong AI And in fact, we may never get there. And that's okay. If we never get there, that's fine. We are we are still going to be developing incredible machines that can do things that will make

53:41

our lives much easier on us. That it may not mean that we're going to have a deep, meaningful conversation with our computer, but it may mean our computer is able to get us all the things that we need. At that time, I wanted to follow up on exactly that thing, because I was just thinking, maybe the real test isn't a test we come up with, but a test that's determined by utility. I mean, all these tests

54:08

we've been talking about are just sort of exercises. They're not, yeah, they're not what a machine can do for us. They're what we can get a machine to do in front of us. That's a good way of putting it.

54:23

So yeah, it'll be interesting. I think we'll see more debate about the, uh, usefulness of the Turing test as any form of measurement of machine intelligence. We'll probably see other teams try to tackle these other, arguably harder, problems. And who knows, maybe within our lifetimes we'll actually see machines that at least approach this concept of strong AI, or maybe we'll get to a point where most people say, you know what, our work

54:53

would be better suited going towards some other part of machine intelligence that could directly benefit us, rather than drive for a goal that is looking increasingly difficult to achieve because every time we eliminate one thing, five more things pop up and then we say, oh, this this problem was more complex than I first anticipated. So um, I'm interested to see how it goes. Maybe our listeners are too, Hey, listeners,

55:17

are you interested to see where artificial intelligence goes? Do you have any thoughts on the subject, You've got any opinions or or questions? Maybe you want us to clarify something, or maybe you just want us to talk about some entirely different subject, and you say enough with the AI already talk about. I don't know the future of lawnmowers. Let us know. We'll definitely give it strong consideration. You can drop us a line on Facebook, Twitter, or Google Plus.

55:42

Our handle at all three is FW thinking, and we will talk to you again really soon. For more on this topic and the future of technology, visit forward thinking dot com. Brought to you by Toyota. Let's Go Places.

Transcript source: Provided by creator in RSS feed: download file
