
A Quick Chat About CAPTCHAs

Sep 23, 2024 · 20 min

Episode description

Where did CAPTCHAs come from? What purpose do they serve? How do they relate to artificial intelligence? And why are some of them so doggone hard?

See omnystudio.com/listener for privacy information.

Transcript

Speaker 1

Welcome to Tech Stuff, a production from iHeartRadio. Hey there, and welcome to Tech Stuff. I'm your host, Jonathan Strickland. I'm an executive producer with iHeart Podcasts. And how the tech are you? I thought today we could do a real quickie, because you know, sometimes it's nice just to do a short thing to talk about a subject in tech. And there are a whole bunch of Tech Stuff episodes

in which I have talked about the Turing test. So there are a lot of different variations of the Turing test. It's based on a thought experiment from Alan Turing, the computer scientist, very influential, very important in World War Two. He helped crack the Enigma code. And actually the movie that sort of depicted his efforts in cracking Enigma is called The Imitation Game. Well, The Imitation Game makes reference to the imitation game,

which is, like, part of the Turing test. So he kind of proposed this test when people would ask him if he thought machines would be capable of thought. Now keep in mind, like, this is back in the forties and fifties: do you think that machines will be capable of thinking? And he said, I don't really think that's a very interesting question. For one thing, I don't know

that there's any meaningful way to answer it. However, I do think we can be a little more precise if we think about it in terms of kind of a thought experiment, a test. So imagine this is the situation you find yourself in. You go into a room and there's a computer terminal there, and that's it. You know, you can't see into any other rooms or anything. It's just a desk with a computer terminal. And you sit down at this terminal and there's a little prompt there

that lets you get into a chat session. And you enter into this chat session, and you have five minutes and you can ask the person on the other end of the chat session any questions you want within that five minute time frame. And once those five minutes are up, you're asked to determine was the person on the other end of the chat an actual human being or was it a computer program? Was it some form of artificial intelligence?

A bot is what we would call it today. And if you are unable to determine whether the subject on the other end of the chat is human or a bot to any reliable degree, then you could say that, oh, that program passed the Turing test, I could find no way of telling the difference between that computer program

and an actual person. Turing suggested that, due to advancements in computer science, people would have at best a success rate of around seventy percent at telling whether or not the quote unquote person on the other end of the chat was a human being or a computer program, and he said he expected that to be possible in about fifty years' time.

It took a little bit longer than that, but I would say that with the sophistication we've reached with chatbots these days, I think you could fairly conclusively say that we've got programs out there that can quote unquote beat the Turing test. Part of the problem is that Turing was saying that in the future, these programs are going to be sophisticated enough that they will fool people into

thinking it's another person. He wasn't saying, oh, you have to meet this specific threshold for your system to have achieved beating the Turing test. That would come afterward. Other people would kind of create the criteria. But people have since used the phrase Turing test to reference essentially any kind of task designed to determine if a machine has, or at least appears to have, the property of intelligence. And when I say that, I mean really

general intelligence. But there's another specific use of Turing tests that I would like to bring up today, and that is the Completely Automated Public Turing test to tell Computers and Humans Apart, which, once you turn it into an acronym, becomes CAPTCHA. So these are those little tasks that you occasionally encounter on certain websites, and they require you to do something like type in a string of characters that are displayed on screen. They're usually deformed in

some way and up against a crazy background. Or you might be given a big selection of images and told to pick out all the ones that have a cat in them or something. Or you might have to drag a little picture of a puzzle piece into an image where it fits into a very specific spot. All of these are meant to separate out actual human visitors to a website or service versus all the automated programs or bots

or whatever you might want to call them. So I thought it would be fun to do a quick episode on where CAPTCHAs came from and what purpose they serve, which I kind of touched on already, but also how they fit into the grand picture of artificial intelligence, because interestingly

they play a pretty important part. They have helped drive the development and advancement of artificial intelligence, not necessarily in a way that is helpful to everybody out there, but they have certainly served as a way to get people thinking about how to tackle certain AI problems. So our story actually begins with the good old website Yahoo. Y'all remember Yahoo. I mean, it's still a thing, but I remember a time when Yahoo was practically synonymous with the Internet

for a lot of folks. You may not even remember this if you haven't been on Yahoo in ages, but once upon a time, Yahoo was sort of a portal to the rest of the Internet. Yahoo was kind of like a landing page. A lot of people had it set as their homepage, so when they would go into a web browser, they'd go right into Yahoo, and you would find articles there and all sorts of other links, as well as chat rooms and of course the search engine where you could search for other stuff online besides

the stuff that just popped up on Yahoo. Well, in those chat rooms, moderators were running into a really serious problem. The chat spaces were becoming invaded by bots posing as people. Now this is in two thousand. The bots were not particularly sophisticated, but they were creating a lot of spam. Like, they were jamming up chat spaces with just spam messages while people were trying to chat. In some cases, they were gathering personal information of users in an effort to

exploit those users in some way or another. So Yahoo didn't want this to keep going. It wasn't reflecting well

on the company. So they turned to the computer science department at Carnegie Mellon University in order to see, hey, is there some way that we could, you know, kind of like have a bouncer out front, a gatekeeper if you will, that would allow humans into the various systems so that they can make use of them the way they were intended, but prevent all the robots, all the AI programs, all the computer software or algorithms, however you want to define it, and keep them from getting access.

So a team led by Manuel Blum and including folks like John Langford, Luis von Ahn, Nicholas Hopper, and others tackled this challenge. So they needed to come up with a test. Now, in an ideal world, the test would be a cinch for a human being to complete, but it would be a real stumper for an algorithmically driven bot. And that is

the basic philosophy of CAPTCHA. Make a test that humans find really easy to complete, perhaps even trivial, like it's just a mild inconvenience, as they say, but for bots it's like, turn away, you're never going to be able to get this. Now, some of y'all might be saying something along the lines of, but Jonathan, whenever I run into CAPTCHAs these days, they're sometimes really hard. Like it's hard to see what they spell out. I'll try and type things in three, four times and get kicked out.

And you're right, that is a problem. It is something that actually is happening. It doesn't mean that you're not human. If you're having, like, existential crises, I would like to set your mind at ease by saying you're probably human. I mean, I don't think I could say anything for certain,

but I feel fairly confident saying you're probably human. But the reason why CAPTCHAs have become really difficult, in some cases anyway, with some specific types of CAPTCHAs, is largely because other programmers figured out how to make better

automated programs that can parse and respond to CAPTCHAs. So as one group of programmers figured out how to design tools to defeat a CAPTCHA, the CAPTCHA designers would go back to the drawing board to create new tests to be more challenging for those bots, to say, well, they got good at this, let's change these things and reintroduce the CAPTCHA so that this will trip up those systems, because while they're good at what we used to use

for gatekeeping, they've never run into this before, and unfortunately that sometimes means that the tests become more challenging for human beings as well. It no longer is a case where something is trivial for a human but difficult for robots,

at least for certain types of CAPTCHAs. And that's particularly true if the human has some impairments, like if they have color blindness, for example, or some other visual impairment. There are real issues in making CAPTCHAs that do what they're supposed to do, that is, weed out all the non humans but also be accessible to all humans, even those who might have impairments that would otherwise make it difficult or challenging to complete a CAPTCHA.

It is not an easy path to walk. We're going to take a quick break. When we come back, I'll talk more about the CAPTCHA story. We're back. So in the early days of CAPTCHAs, they mostly took on the form of distorted text that was printed over a busy background.

And the idea was that most automated programs would not be able to recognize distorted text. It would be an image, not just text characters where a bot would be able to read, like, the code used to generate the letters and then say, oh, well, those are these letters, I can replicate that and get through no problem. You had to have something that was going to really stump them.
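To make that concrete, here's a minimal sketch of the kind of distorted-text challenge early CAPTCHAs used. This is purely illustrative, not any real CAPTCHA service's code; it assumes the Pillow imaging library is installed, and every size, offset, and distortion value here is a made-up stand-in.

```python
# Hypothetical sketch of an early-style text CAPTCHA, using the Pillow
# library (pip install Pillow). All parameters here are illustrative.
import random
import string

from PIL import Image, ImageDraw, ImageFilter, ImageFont

def make_captcha(length=5, size=(200, 70)):
    # The server generates a random challenge string and remembers it.
    text = "".join(random.choices(string.ascii_lowercase, k=length))

    # Busy background: speckle noise plus a few stray lines.
    img = Image.new("RGB", size, "white")
    draw = ImageDraw.Draw(img)
    for _ in range(600):
        xy = (random.randrange(size[0]), random.randrange(size[1]))
        draw.point(xy, fill=(random.randrange(256),) * 3)
    for _ in range(4):
        draw.line(
            [(random.randrange(size[0]), random.randrange(size[1])),
             (random.randrange(size[0]), random.randrange(size[1]))],
            fill="gray", width=2,
        )

    # Paste each letter at a random offset and rotation; this is what
    # breaks naive template matching against a single known font.
    font = ImageFont.load_default()
    x = 20
    for ch in text:
        glyph = Image.new("RGBA", (40, 40))
        ImageDraw.Draw(glyph).text((5, 5), ch, font=font, fill="black")
        glyph = glyph.rotate(random.uniform(-30, 30), expand=True)
        img.paste(glyph, (x, random.randrange(5, 25)), glyph)
        x += 30

    # A light blur smears the sharp edges recognizers key on.
    img = img.filter(ImageFilter.GaussianBlur(0.8))
    return text, img

answer, image = make_captcha()
image.save("captcha.png")  # serve this; compare the visitor's reply to `answer`
```

The arms race described in this episode plays out in exactly those knobs: as text recognizers improved, services turned up the noise, rotation, and blur, which is also what made the challenges harder on humans.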

Now, image recognition is a pretty tricky science. I've talked about it on this show before. Like, training computer systems to recognize images takes a lot of time and effort and lots and lots and lots of samples so that the computer system can quote unquote learn what those images represent. Now, it's one thing to teach a computer how to recognize standard letters that are in a recognizable font. So if the Internet only ever used one font and only used

one size of that font, then it would be relatively trivial for those who want to defeat CAPTCHAs, because once you train a computer vision system on what a lowercase t looks like, for example, then the system would recognize a lowercase t every time one popped up. But of course, there are lots of different fonts and typefaces on the Internet, and they come in different sizes and

colors and on different backgrounds. So teaching a computer system what a Times New Roman lowercase t looks like against a blank background doesn't mean it's also going to recognize a lowercase t in some other font on some crazy background. Plus maybe the t is a little wavy, a little distorted. So distorting that text makes it more challenging for image recognition systems, like, they're looking for defining features to be able to match the image of a letter with the

actual letter. You see, with humans, when we teach a human what something looks like, it's a lot easier for humans to associate other things that look kind of the way the first example did, but maybe not exactly the same. So in other words, the example I always use is coffee mugs, right. If I show you a coffee mug, and I say this is a coffee mug, and then I show you a second kind that looks totally different, different color, different size, you know, whatever, maybe has different

writing on it, whatever it might be. And I say, this is also a coffee mug. And then I show you a third example that looks unlike the first two. You could say, oh, okay, I get the idea. I get the different features that make up what a coffee mug is. I understand now. And now when I encounter different types of coffee mugs, even though they might not look anything like any of the other ones I've encountered,

I know, okay, that's probably a coffee mug. Until someone says, no, that's a teacup, and then your world is turned upside down. But you get what I'm saying. Computers don't work that way. Computers, like, if you teach one that an example is a thing, it doesn't necessarily understand that similar but distinctly different versions of that same thing fall into the same category. That

takes lots and lots of training. So the whole idea of distortion was that this would make it very tricky for most systems to be able to parse that information, put it in reliably, and fool the CAPTCHA system. That doesn't mean it was foolproof. Over time, those systems did get better at being able to recognize those figures that were on screen, even better

than humans could in some cases, which is obviously a problem. Now, there have been lots of other CAPTCHA systems, not just the original CAPTCHA. For example, there's one called Asirra. Asirra did something I mentioned earlier in the episode. It would present the visitor with a collection of photographs, and they would include cats and dogs, and it would ask you, okay, identify the pictures that have cats in them. So that was one way to get around this, was that it wasn't

just figuring out text. It was differentiating between cats and dogs, something that, again, computer systems couldn't do just natively. They had to be taught how to recognize the features that belonged to a cat versus those that belonged to a dog, just the same as all other image recognition software. Then there's reCAPTCHA, which Google later acquired, and that actually served

a dual purpose. It was kind of sneaky. So with reCAPTCHA, you would go to a website and you would be greeted by some, you know, kind of grainy text, and you'd be asked to type it out. You'd actually get a couple of different words, not just one. And this text came from scans of physical books that were being digitized, in other words, books where they had put the page down on a scanner and created a scan. So some of these books were in, you know, pretty bad shape.

They were not at all crisp, clear images. So for the first word you'd be presented with, Google actually knew the answer. So let's say the word is salamander and you type in salamander, and so Google says, all right, I already knew that this scanned word is salamander. This is obviously a person who has typed this in. But the second image would be a scan from a book. Maybe it'd be a really smudged one, like one that's harder to read, and it would ask you, okay, what is

this word. Let's say it's surgeon and you type in surgeon. Well, the secret sauce here is that Google didn't know that that scanned word was surgeon. What Google was doing was crowdsourcing the effort to figure out what the text in this scanned image actually said. So if you and thousands of other people all put the same word in when you were encountering this particular scan, Google would say,

all right, that word is very likely surgeon, because, you know, ninety eight percent of the people who were shown this reCAPTCHA typed surgeon in. So now we know what that word is, which meant that they could essentially transcribe these digitized texts by using the crowd to do the work for them. And that is kind of the heart of where CAPTCHA and AI meet. CAPTCHAs have been used, for one thing, to help train AI so that it's more effective.
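For what it's worth, the consensus logic described above is simple enough to sketch in a few lines of Python. To be clear, this is a toy illustration of the crowdsourcing idea, not Google's actual system; the vote minimum and the agreement threshold (borrowing the ninety eight percent figure from above) are stand-in values.

```python
# Toy sketch of reCAPTCHA-style crowdsourced transcription. A visitor's
# guess only counts if they first passed the known control word.
from collections import Counter

votes = {}  # scan_id -> Counter of submitted words

def record_answer(scan_id, word):
    # Tally one trusted visitor's guess for an unknown scanned word.
    votes.setdefault(scan_id, Counter())[word.strip().lower()] += 1

def accepted_transcription(scan_id, min_votes=100, agreement=0.98):
    # Accept a word once enough visitors have answered and nearly all agree.
    tally = votes.get(scan_id, Counter())
    total = sum(tally.values())
    if total < min_votes:
        return None  # not enough evidence yet
    word, count = tally.most_common(1)[0]
    return word if count / total >= agreement else None
```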

Like if you've encountered other Google ones where it's like, pick all the images here that have motorcycles in them, or stairs. Well, part of that is training Google's image recognition systems so that they're more accurate. Right? Like, an image recognition system might have trouble differentiating an actual, like, stone staircase out in front of a building from a pedestrian crosswalk, because, you know, you've got those broken lines on a crosswalk. Those could look like stairs to

a computer image recognition system. So by giving users the task of, hey, identify all the samples in this list that have stairs in them, Google starts to train its own image recognition algorithms to be more effective and more accurate. So in a way, we were essentially being used as free labor to make these AI systems more accurate, just so that we could get access to whatever it was we were trying to visit, whether that was an online shop or a chat room, or, you know, whatever it might be.
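As a companion to the transcription sketch above, here's a toy version of how those image-grid answers could be pooled into labeled training data. Again, purely illustrative: the thresholds and the stairs label are made up, and Google's real pipeline is not public.

```python
# Toy sketch: pooling grid-CAPTCHA clicks into training labels. The tile
# IDs, thresholds, and the "stairs" label are all illustrative.
from collections import defaultdict

tile_stats = defaultdict(lambda: [0, 0])  # tile_id -> [selected, shown]

def record_grid_response(shown_tiles, selected_tiles):
    # Every tile the visitor saw counts as shown; clicked ones as selected.
    for tile in shown_tiles:
        tile_stats[tile][1] += 1
        if tile in selected_tiles:
            tile_stats[tile][0] += 1

def training_labels(min_shown=50, agreement=0.9):
    # Tiles most visitors agree on become (tile, label) training examples.
    labels = {}
    for tile, (selected, shown) in tile_stats.items():
        if shown < min_shown:
            continue  # not enough answers yet
        rate = selected / shown
        if rate >= agreement:
            labels[tile] = "stairs"
        elif rate <= 1 - agreement:
            labels[tile] = "not stairs"
    return labels
```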

So, yeah, we've been working for free, y'all. Actually, in some cases we've been working for free and denied access to tools that we wanted to use because the CAPTCHAs were too hard for us to solve. But yeah, that's the quick story about the history and evolution of CAPTCHAs. Clearly they're still used today. Sometimes it's something simple, like click this box to prove you're human, that kind of thing where it

requires you to take an action. Those obviously are much simpler for humans to complete than for robots, so those still follow the philosophy of the original CAPTCHAs. A lot of other ones, though, they get pretty tricky, to the point where sometimes I'm discouraged from even going further and visiting that particular website. I'm just like, you know what, I don't need to feel stupid because I couldn't find all the fire hydrants in these photographs, so I'm just out.

But yeah, that's it. And like I said, it plays a really important part with AI. It's kind of a seesaw effect, right? Like, you create a barrier that AI can't get over until it can, and then you have to go back and create a harder barrier. And meanwhile, the folks developing the AI keep making advancements, so the AI gets more sophisticated and powerful over time. So yeah, it's a delicate balance, and not everybody benefits. As I said, hope

that that was interesting and informative to y'all. I hope you're all doing well, and I'll talk to you again really soon. Tech Stuff is an iHeartRadio production. For more podcasts from iHeartRadio, visit the iHeartRadio app, Apple Podcasts, or wherever you listen to your favorite shows.
