Welcome to tech Stuff, a production from I Heart Radio. Hey there, and welcome to tech Stuff. This is your host, Jonathan Strickland. I'm an executive producer with I Heart Radio and I love all things tech. You know, folks, back in nineteen eighty six, a comedy science fiction film hit theaters that I saw in the theater, about a robot that gains sentience and becomes a total goofball with a will of its own. It was called Short Circuit.
The movie starred Steve Guttenberg, Ally Sheedy, and lamentably a white actor named Fisher Stevens playing a non white character, someone who is Indian. I should add that's not Stevens's fault. I mean, he auditioned to be in a movie and
he got a gig. He didn't cast himself in the film, and he has since talked about his experiences realizing the problems with a white man playing a non white character. But setting aside all the problematic whitewashing, the movie showed this robot, who in the course of the film names itself Johnny Five, learning. It learns about the world around it, it learns about people, it learns about human concepts like humor and emotion, and the general idea was pretty cute.
Now the nifty thing is machines actually can learn. In fact, machine learning is a really important field of study these days, complete with its own challenges and risks. I've talked about machine learning a few times in the past, but I figured we could do a deeper dive to understand what machine learning is, what it isn't, how people are leveraging machine learning, and why I said that it does come
with risks, so let's learn about machines learning. It will be impossible to talk about machine learning without also talking about artificial intelligence or AI. And this term artificial intelligence is a real doozy. It trips people up, even people who have dedicated their lives to researching and developing artificial intelligence. You can get two experts in AI talking about AI and find out that because they have slightly different takes
on what AI is, there are some communication issues. It's not as simple a question as you might think. So when you really boil it down, it comes as no big surprise that there's a lot of ambiguity here. After all, how would you define intelligence? Just intelligence, not artificial intelligence, just plain intelligence. Well, would it be the ability to learn, that is, to acquire skills and knowledge? Or is it
the application of learning? Is it problem solving? Is it being able to think ahead and make plans in order to achieve a specific goal? Is it the ability to examine a problem and deconstruct it in order to figure out the best solution, a more specific version of problem solving? Is it the ability to recognize, understand, and navigate emotional scenarios? Now,
arguably it's all of these things and more. We all have kind of an intuitive grasp on what intelligence is, but defining it in a simple way tends to feel reductive and it leaves out a lot of important details. So if defining just general intelligence is hard, it stands to reason that defining artificial intelligence is also a tough job. Heck, even coming up with a number of different types of AI is tricky. And if you don't believe me, just
google the phrase different types of artificial intelligence. Never mind, you don't really have to do that. I already did it, though feel free to do it yourself and check my work if you like. When I googled that phrase, different types of AI, some of the top results included a blog post on BMC Software titled Four Types of Artificial Intelligence. But then there was also an article on Codebots that was titled What Are the Three Types of AI? And then there was an
article from Forbes titled Seven Types of Artificial Intelligence. See, we can't even agree on how many versions of AI there are, because defining AI is really hard. It largely depends upon how you view AI and then how you break it down into different realms of intelligence. Now, we could go super high level, because a classic way to look at AI is strong versus weak artificial intelligence.
Strong AI, sometimes called artificial general intelligence, would be a machine that processes information and at least appears to have some form of consciousness and self awareness and the ability to both have experiences and to be aware that it is having experiences. It might even feel emotion, though maybe not emotions that we could easily identify or sympathize with. So this would be the kind of machine that would
think in a way similar to humans. It would be able to sense its environment and not just react, but really process what is going on and build an understanding. It's the type of AI that we see a lot in science fiction. That's the type of AI of Johnny Five from Short Circuit, or HAL from two thousand one, or the droids in Star Wars. It's also a type of artificial intelligence that we have yet to actually
achieve in the real world. So then what is weak AI? Well, you could say it's everything else, or you could say it's the building blocks that maybe collectively will lead to strong AI. Weak AI involves processes that allow machines to complete tasks. So, for example, image recognition software could fall into this category. Once upon a time, in order to search photos effectively, you needed to actually add metadata
like tags to those photos. So, for example, I might tag pictures of my dog with the metadata tag dog, and then if I wanted to see photos of my pooch, then I would pull up my photo app and search the term dog and all the photos that I had
tagged with the word dog would show up. But if I had failed to tag some pictures of my dog, those pictures wouldn't pop up in search, because the computer program wasn't actually looking for dogs in my photos. It was just looking for photos that had that particular metadata
tag attached to it. But now we've reached a point where at least some photo apps are using image recognition to analyze photos, and these will return results that the algorithm has identified as having a reasonable chance of meeting your search query. So if I used an app like that and I put in dog as my search term, it could pull up photos that had no meta tags attached to them at all, because the search is relying
on image recognition. Now, this also means that if the image recognition algorithm isn't very good, I could get some images that don't have a dog in them at all, or it might miss other images that have my dog in them. But my point is that the ability to identify whether or not a dog is in a particular photo represents a kind of weak artificial intelligence.
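To make that tag-versus-recognition difference concrete, here's a minimal sketch in Python. The has_dog function is purely hypothetical, a stand-in for whatever trained image recognition model a real photo app would use, and the file names and tags are made up.

```python
# Two ways to answer the search "dog": look at the tags someone typed in,
# or ask a recognition model to look at the picture itself.

photos = [
    {"file": "park.jpg",  "tags": ["dog", "park"]},
    {"file": "couch.jpg", "tags": []},              # forgot to tag this dog photo
    {"file": "beach.jpg", "tags": ["vacation"]},
]

def tag_search(photos, query):
    # Old approach: only returns photos someone remembered to tag.
    return [p["file"] for p in photos if query in p["tags"]]

def has_dog(photo):
    # Hypothetical stand-in for a trained image recognition model's verdict.
    return photo["file"] in ("park.jpg", "couch.jpg")

def recognition_search(photos, query):
    # Newer approach: the model looks at the image content, not the tags.
    if query == "dog":
        return [p["file"] for p in photos if has_dog(p)]
    return []

print(tag_search(photos, "dog"))          # ['park.jpg'], misses couch.jpg
print(recognition_search(photos, "dog"))  # ['park.jpg', 'couch.jpg']
```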
You wouldn't say that the photo search tool possesses humanlike intelligence, because really it only does one thing: it analyzes photos and looks for matches to specific search queries, but it can't do anything outside of that use case. However, that's just one little example. There are all sorts of other ones, like voice recognition, environmental sensing, course plotting, that kind of thing. And in some circles, as we get better at making machines and systems that can do these things, those elements seem to kind of drift away from the ongoing conversation about
artificial intelligence. A guy named Larry Tesler, a computer scientist who worked at lots of really important places like Xerox PARC and Amazon and Apple, once observed, quote, intelligence is whatever machines haven't done yet, end quote. So his point was that the reason that AI is really hard to talk about is that the goalposts for what actually counts as artificial intelligence are constantly moving. Now, this pretty much mirrors how we think about things like consciousness.
Lots of people study consciousness, and the general sense I get is that it's a lot easier for people to talk about what isn't consciousness rather than what consciousness actually is. And it seems like artificial intelligence is in a similar place, which really isn't that big of a surprise as we closely associate intelligence with consciousness. Now this leads us to why there are so many different takes on how many
types of AI there are. It all depends on how you classify different disciplines in artificial intelligence, and over time, a lot of disciplines that were previously distinct from AI have sort of converged into becoming part of the AI discussion. Machine learning, as it turns out, was part of the AI discussion, branched off from it, and then rejoined the
AI discussion years later. So I am not going to go down all the different approaches to classification because I don't know that they would be that valuable to us. They would really just illustrate that there are a lot of different ways to look at the subject. So if you ever find yourself in a conversation about AI, it might be a good idea to set a few ground rules as to what everyone means when they use the term artificial intelligence. That can help with expectations and understanding.
Or you could just run for the nearest exit, which is what people tend to do whenever I start talking about it anyway. What about machine learning? Well, from one perspective, you could say machine learning is a sub discipline of artificial intelligence, although, like I said, it hasn't always
been viewed as such. I think most people would say that the ability to learn, that is, to take information and experience and then have some form of understanding of those things so that you can apply that to future tasks, potentially getting better over time, is part of intelligence. But you could also be a bit more wishy washy and say it's related to, you know, artificial intelligence, as opposed to being part of AI.
Since the definition of AI is, let's say, fluid, either way of classifying machine learning works. As far as I'm concerned, machine learning boils down to the idea of creating a system that can learn as it performs a task. It can learn what works and, more importantly, what does not work. You may have heard that we learn a lot more from our mistakes than we do from our successes, which
is pretty much true in my experience. When something goes wrong, it's usually, but not always, possible to trace the event or events that led to the failure. You can identify decisions that were probably the wrong ones or that led to a bad outcome. But if you have a success, it's hard to figure out which decisions were key to that successful outcome. Did your decision at step two set you on the right path, or was your choice at step three so good that it helped correct a mistake
that you made at step two? But a good approach to machine learning involves a system that can adjust things on its own to reduce mistakes and increase the success rate. Another way of putting it is that instead of programming a system to arrive at a specific outcome, you are training the system to learn how to do it by itself. And that sounds a bit magical when you
put it that way, doesn't it? It sounds like someone just took a computer and showed it pictures of cats and then expected the computer to know what a cat was. And this actually does mirror an actual project that really did do that, but I'm leaving out some big, important information in the middle. Now, one big point is that
computers and machines can't just magically learn by default. People first had to come up with a methodology that allows machines to go through the process of completing a task, then making adjustments to the process of doing that task, which would then improve future results. We have to lay the groundwork in architecture and theory and algorithms. We have to build the logical pathways that computers can follow in order for them to learn. A lot of machine learning
revolves around patterns and pattern recognition. So what do I mean by patterns? Well, I mean some form of regularity and predictability. Machine learning models analyze patterns and attempt to draw conclusions based on those patterns. This in itself is tricky stuff. So why is that? Well, it's because sometimes we might think there's a pattern when in reality there is not. We humans are pretty good at recognizing patterns,
which makes sense. It's a survival mechanism. If you were to look at tall grass and you see patterns that suggest the presence of a predator like a tiger, well you would know that danger is nearby, and you would have the opportunity to do something about that to help your chances of survival. If, however, you remained blissfully unaware of the danger, you'd be far more likely to fall
prey to that hungry tiger. So recognizing patterns is one of the abilities that gave humans a chance to live another day and, from an evolutionary standpoint, a chance to make more humans. But sometimes we humans will perceive a pattern where none actually exists. A simple example of this is the fun exercise of lying on your back outside, looking up at the clouds and saying, what does that
cloud remind you of? The shapes of clouds, which have no significance and are the product of environmental factors, can seem to suggest patterns to us. We might see a dog, or a car, or a face, but we know that what we're really seeing is just the appearance of a pattern. It's not evidence of a pattern actually being there. It's noise, not signal, but it could be misinterpreted as signal. Well, it turns out that in machine learning applications this is
also an issue. I'll talk about it more towards the end of this episode. Computers can sometimes misinterpret data and determine something represents a pattern when it really doesn't. When that happens, a system relying on machine learning can produce false positives, and the consequences can sometimes be funny, like, hey, this image recognition software thinks this coffee mug is actually a kitty cat. Or they can be really serious and
potentially harmful. Hey, this facial recognition software has misidentified a person, marking them as, say, a person of interest in a criminal case, and it's all because this facial recognition software isn't very good at differentiating people of color. That's a real problem that really happens. Now, when we come back, I'll give a little overview of the evolution of machine learning, but before we do that, let's take a quick break.
To talk about the history of machine learning, we first have to look back much, much earlier, long before the era of computers, and talk about how thinkers like Thomas Bayes thought about the act of problem solving. Bayes was born way back in the early seventeen hundreds, so quite a bit before we were thinking about machine learning, but he was interested in problem solving for problems involving probabilities, and specifically the relationship between different probabilities. I think it's easier to talk
about if I give you an example. So let's make a silly one, all right? So let's say we got ourselves a plucky podcaster. Hey there, everybody, it's Jonathan Strickland, and it's Tuesday as I record this. And because of who I am, you know who this podcaster is, and because it's Tuesday, there is a forty percent chance I am wearing
a They Might Be Giants T shirt. And we also know that if this podcaster is wearing a They Might Be Giants T shirt on a Tuesday, there's a sixty percent chance that I'm going to end up wearing pajamas on Wednesday. But we also know that if I did not wear the They Might Be Giants shirt on Tuesday, and remember there's a sixty percent chance I didn't, then we know there's an eighty percent chance I'm going to be wearing
pajamas on Wednesday. Well, Bayes worked out a way to describe this sort of probability relationship between different discrete events, and using his reasoning, you can work forward or backward based on probabilities. Bayes would describe wearing a They Might Be Giants shirt on Tuesday as one event and wearing pajamas on Wednesday as a separate event, and then relate the two, not only determining how likely it is I'll wear pajamas on Wednesday, but also working things out if we start with the
later event. In other words, if we start with the fact that it's Wednesday and I'm wearing pajamas, we could work out how likely it was that yesterday, on Tuesday, I was wearing the They Might Be Giants shirt. That was his contribution, that you can work this in either
direction if you know these different variables. Now, Bayes never published his thoughts, but rather sent an essay explaining it to a friend of his, who then made sure that the work was published after Bayes had passed away. And a few decades later, Pierre-Simon Laplace would take this work that Bayes had done and flesh it out
into an actual formal theorem. It's an important example of conditional probability, and a lot of what machine learning does really boils down to dealing with different probabilities, not certainties, which, when you get down to it, is what most of us are doing most of the time, right? We make decisions based on at least perceived probabilities. Sometimes these decisions might feel like a coin flip situation, where any choice is equally likely to precipitate a good outcome or
a bad outcome. Other times we might make a choice because we feel the probabilities are stacked favorably one way over another. Sometimes we will make a choice to back the least probable outcome because, well, humans are not always super rational, and heck, sometimes the long shot does pay off,
so that keeps Vegas in business. Bayes' theorem is just one example of the ways that mathematicians and philosophers figured out how to mathematically express problem solving and decision making, and a lot of this was figuring out if there were a way to boil down things that most of us approach through intuition and experience.
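If you like to see the arithmetic, here's a minimal sketch in Python of that T shirt and pajamas example. The forty percent figure for the shirt simply follows from the sixty percent chance I didn't wear it, and the backward step at the end is Bayes' theorem itself.

```python
# A minimal sketch of the T-shirt/pajamas example: a 40% chance of the TMBG
# shirt on Tuesday, a 60% chance of pajamas on Wednesday if I wore it, and an
# 80% chance of pajamas on Wednesday if I didn't.

p_shirt = 0.4                    # P(TMBG shirt on Tuesday)
p_pajamas_given_shirt = 0.6      # P(pajamas Wednesday | shirt Tuesday)
p_pajamas_given_no_shirt = 0.8   # P(pajamas Wednesday | no shirt Tuesday)

# Working forward: the total probability of pajamas on Wednesday.
p_pajamas = (p_shirt * p_pajamas_given_shirt
             + (1 - p_shirt) * p_pajamas_given_no_shirt)

# Working backward with Bayes' theorem: it's Wednesday and I'm in pajamas,
# so how likely is it that I wore the shirt yesterday?
p_shirt_given_pajamas = (p_pajamas_given_shirt * p_shirt) / p_pajamas

print(f"P(pajamas on Wednesday) = {p_pajamas:.2f}")                # 0.72
print(f"P(shirt on Tuesday | pajamas) = {p_shirt_given_pajamas:.2f}")  # 0.33
```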
So it's kind of neat, and also, the more you look into it, the more you might find it's a little spooky, because it's weird to consider that our approaches to making choices and solving problems can be reduced down to mathematical expressions. But let's leave the potential existential crises alone for now, shall we? So moving on, we have another smarty pants we need to talk about, Andrey Markov, a Russian mathematician. In the
early twentieth century, he began studying the nature of certain random processes that follow a particular type of rule, which we now call the Markov property. That rule says that for this particular process, the next state of the process depends only upon the current state, not any states that came before. So let's take my ridiculous T shirt example and let's build it out a little bit further. Let's say that I've got three T shirts to my name.
One of them is that They Might Be Giants shirt, one is a plain blue T shirt, and the third is a shirt that has the tech stuff logo on it. And based off of long observation, you've determined the following facts. If I am wearing that They Might Be Giants shirt today, I definitely will not wear it tomorrow, but there's a fifty fifty shot I'll wear either the blue shirt or the tech Stuff shirt. Now, if I'm wearing the blue shirt today, there's a ten percent chance I'm
going to wear the same blue shirt tomorrow. Don't worry, I'll wash it first. There's a sixty percent chance that I'll wear the tech Stuff shirt, and there's a thirty percent chance I'll wear the They Might Be Giants shirt. But if I'm wearing the tech Stuff shirt today, there's a seventy percent chance I'll wear it again tomorrow, because I like to promote myself. But there's a thirty percent chance I'll wear the They Might Be Giants shirt, and there is no chance that I'm going to wear the blue one
in this case. So those are our various scenarios, right? Which shirt I will wear tomorrow depends only upon which shirt I am wearing today. What I wore yesterday has no bearing on the outcome for tomorrow, so today is all that matters. And depending on which shirt I wear today, you can make some probability predictions for tomorrow.
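And if you want to see that laid out as a Markov chain, here's a minimal sketch in Python using those same transition probabilities. The little simulation is just one way to play with the idea; notice that only today's shirt feeds into tomorrow's pick.

```python
import random

# The shirt example as a Markov chain: tomorrow's shirt depends only on
# today's shirt, using the transition probabilities from the episode.
transitions = {
    "tmbg":      {"tmbg": 0.0, "blue": 0.5, "techstuff": 0.5},
    "blue":      {"tmbg": 0.3, "blue": 0.1, "techstuff": 0.6},
    "techstuff": {"tmbg": 0.3, "blue": 0.0, "techstuff": 0.7},
}

def next_shirt(today):
    # Pick tomorrow's shirt using only today's state (the Markov property).
    options = transitions[today]
    return random.choices(list(options), weights=list(options.values()))[0]

# Simulate a couple of weeks of shirt choices.
shirt = "techstuff"
history = [shirt]
for _ in range(13):
    shirt = next_shirt(shirt)
    history.append(shirt)
print(history)

# Probability of wearing the Tech Stuff shirt ten days in a row, given that
# I'm wearing it today: nine more 70 percent self-transitions.
print(0.7 ** 9)   # roughly 0.04, so about a four percent chance
```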
So we can actually use this approach to figure out the probability that I might wear the tech Stuff shirt, say, ten days in a row, since there's a better than even chance that if I'm wearing the tech Stuff shirt today, I'll end up wearing it again tomorrow. And if I wear it tomorrow, then there's a better than fifty percent chance that I'm going
to wear it the following day. But at some point you're going to see that the odds are starting to be against you for, you know, increasingly long strings of wearing the tech Stuff shirt. Anyway, Markov chains would become one of the types of processes that machine learning models would incorporate, with some models looking at the current state of a given process and then making predictions on what the next state will be with no need to look
back at the previous decision. The Markov chain is memoryless. Now, that's just a couple of the mathematicians whose work underlies elements of machine learning. There's also structure we need to talk about. In nineteen forty nine, a man named Donald Hebb wrote a book titled The Organization of Behavior, and in that book, Hebb gave a hypothesis on how neurons, that is, how brain cells, interact with one another.
His ideas included the notion that if two neurons interact with one another regularly, that is, if one fires, the second one is also likely to fire, they end up forming a tighter communicative relationship with each other. Not long after his expression of this hypothesis, computer scientists began to think of a potential way to do this artificially,
with machines creating the equivalent of artificial neurons. The relative strength of the relationship between artificial neurons is something we describe by weight, and that's going to be an important part of machine learning. Weight, by the way, is W E I G H T, as in this relationship is weighted more heavily than that relationship.
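To make that fire-together, wire-together idea a bit more concrete, here's a tiny sketch in Python. This isn't Hebb's own formulation, just a toy update rule under the assumption that a connection strengthens whenever both artificial neurons happen to be active at the same time.

```python
# A toy Hebbian-style update: the weight between two artificial neurons
# grows a little every time both of them fire together.

weight = 0.1            # starting strength of the connection
learning_rate = 0.05

# Each pair is (activity of neuron A, activity of neuron B) at one moment.
activity = [(1, 1), (1, 1), (0, 1), (1, 0), (1, 1)]

for a, b in activity:
    # Strengthen the connection only when both neurons are active together.
    weight += learning_rate * a * b

print(f"final weight: {weight:.2f}")   # 0.25 after three co-activations
```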
In the early nineteen fifties, an IBM researcher named Arthur Samuel created a program designed to win at checkers. The program would do a quick analysis of where pieces were on a checkerboard and whose move it was, and then calculate the chances of each side winning the game based on those positions. And it did this with a minimax approach. All right, so checkers is a two player, turn based game. Player one makes a move, then player two can make a move. There are a finite number of moves that can be made, a finite number of possibilities, though admittedly it's a pretty good number
of possibilities. But let's say a game has been going on for a few moves, and you've got your two sides. You've got the red checkers over on player one's side and the black checkers for player two. Let's say it's player one's move. For the purposes of this example, we'll say that player one really just has one piece that they can actually move on this turn, and it can move into one of two open spaces. So player one has to make a choice. After that choice, it's
going to be player two's turn. So we can create a decision tree illustrating the possible choices and the possible outcomes of those choices. These choices are the children of the starting position for player one, so player one's starting position has two children. Player two will have their own choices to make after that decision has been made, but those choices are going to depend upon whatever move player
one ultimately takes. So we can extend out our decision tree, showing the branching possible moves that player two might make, and these are the children of the two possible outcomes of our first choice. After player two's turn, it's player one's turn again, which means we need to branch those decisions out even further. And this is all before player one has even made that first choice. We're just evaluating possibilities.
At some point, either when we have plotted far enough out that we know all possible outcomes of the game, or when we're just reaching a point where it would be unmanageable for us to go any further, we need to actually analyze what our options are. The endpoints represent either a win, a loss, or a draw for player one, or, if we haven't extended out the tree all the way to the end of the game, at least a change in advantage, whether it would be to player one's advantage
to make that move or disadvantage. We could actually assign numerical values to each endpoint, with positive values representing an advantage for player one and negative values representing an advantage for player two. And once we do that, we can see which pathways tend to lead to better outcomes for player one. We work backward through the decision tree. So on all the decisions that end in an advantage for player one, we can say, this is the choice
that player one would take. But then we know that player two is always going to choose whichever option has the greatest advantage for that player. So we have to actually take that into account as we're working backward, and this is how we can finally get to the point where we decide which move we're going to make, because these decisions, as you go backward up the tree, ultimately inform you which of those two choices is going to give you the best result. Those values, well,
those are weights. So for player one, the goal is to pick the path that has the highest positive value. For player two, it's to pick the path that has the lowest value, or the highest negative value, if you prefer. So, in other words, player one might be thinking something like, if I move to Spot A, my chance of winning this game is better than if I move to Spot B. Of course, those chances will also depend on what player two is going to
do in response. Some moves that player two might make could end up guaranteeing a win for player one. This is the minimax approach, and there's an algorithm that guides it. It depends upon the current position within a game, how many moves or how much depth it has to take into account, and which player it's actually helping out. What happens if player one does this evaluation and finds that both options are negative? Well,
then this is something that happens in games, right? Sometimes you find out there is no good move, like any move you make is going to be a losing move. Well, the only option at that point is to choose the least bad one, so it would be whatever the smallest negative value choice was.
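Here's a minimal sketch of that backing-up process in Python. The tiny game tree and its leaf scores are completely made up for illustration, so this is nowhere near a real checkers program, but it shows how minimax values flow back up the tree, including the case where the best you can do is pick the least bad option.

```python
# A tiny hand-built game tree: positive leaf scores favor player one,
# negative scores favor player two. Each inner node is a list of children.
game_tree = [        # player one chooses Spot A or Spot B
    [+3, -2],        # Spot A: player two then picks the worst of these for us
    [-1, -4],        # Spot B: both replies are bad for player one
]

def minimax(node, maximizing):
    if isinstance(node, (int, float)):   # a leaf: just return its score
        return node
    scores = [minimax(child, not maximizing) for child in node]
    # Player one picks the largest score, player two picks the smallest.
    return max(scores) if maximizing else min(scores)

# Evaluate each of player one's options, assuming player two moves next.
values = [minimax(child, maximizing=False) for child in game_tree]
print(values)                                  # [-2, -4]: both options are negative
best = max(range(len(values)), key=lambda i: values[i])
print(f"least bad move: Spot {'AB'[best]}")    # Spot A, the smaller loss
```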
Our next big development that I need to mention is Frank Rosenblatt's artificial neural network, called the Perceptron. Its purpose was to recognize shapes and patterns, and it was originally going to be its own machine, like actual hardware, but the first incarnation of the Perceptron would actually be in the form of software rather than hardware. There was a purpose built Perceptron machine later, but the original one was software. Despite some early excitement, the Perceptron proved to be somewhat limited in its capabilities, and interest in artificial neural networks
died down for a while as a result. In a way, you could kind of compare this to some other technologies that got a big hype cycle and then later deflated. Virtual reality is the one I always go with. Back in the nineteen nineties, the world was really hyped for virtual reality. People had incredibly unrealistic expectations for what VR
actually meant and what it could do. And when it turned out that VR wasn't nearly as sophisticated as people were imagining, a lot of enthusiasm dropped out of the entire field, and with that went funding and support. As a result, development in VR hit a real wall, with only a fraction of the people who had been working in the field sticking around, and they had to scramble just to find funding to keep their projects going. So VR was effectively put on the shelf and wouldn't
make much progress for nearly twenty years. Well, artificial neural networks had a very similar issue, but other computer scientists eventually found ways to design artificial neural networks that could do some pretty amazing things if they had access to enough data. When we come back, I'll talk a little bit more about that and what it all means. But first, let's take another quick break. So we left off with the AI field going into hibernation for a little bit.
Theory and mathematics were bumping up against the limitations of technology, which wasn't quite at the level to put all that theory to the test. Plus there needed to be some tweaks to the approaches, but those came with time and more mathematicians found new ways to create artificial neural networks capable of stuff like pattern recognition and learning. So let's imagine another decision tree. We've got our starting position. This
is probably where we put some input. We would feed data into a system, and let's say from that starting position, we have a process that's going to transform that input in one of two possible ways. So we've got two potential outputs for that first step. Like our minimax example, we can go down several layers of possible choices, and
we can weight the relationships between these different choices. So if the incoming value is higher than a certain amount, maybe the node sends it down one pathway, but if the value is lower than that arbitrary amount, the node will send it down a different pathway. This is drastically oversimplifying, but I hope you kind of get the idea. It's like a big sorting system, and the goal is that at the very end, whatever comes out as output is correct or true. Ideally, you've got a system that is
self improving. It trains itself to be better. But how the heck does that happen? Well, let's consider cats for a bit. Not the musical, and good heavens, definitely not the movie musical. That is a subject that deserves its own episode. Maybe one day I'll figure out a way to tackle that film in some sort of tech capacity, but honestly, I'm just not ready to do that yet, from like an emotional standpoint as well as
a research one. Now, let's say you're teaching a computer system to recognize cats, pictures of cats, and the system has an artificial neural network that accepts input, pictures of cats, and then filters that input through the network to make the determination, does this picture include a cat in it? And you start feeding it lots of images. The neural network acts on the data according to the weighted relationships
between the artificial neurons, and it produces an output. Now, here's the thing. We already know what we want the output to be, because we can recognize if a picture has a cat in it or not. Maybe we've got one thousand pictures. This is the training data we're going to use for this machine learning process. We also know that eight hundred of those pictures have a cat in them and two hundred don't, so we know what we want
the results to be. We've got an artificial neural network in which some neurons, or nodes, will accept input and perform a function based on that input, and then the weighted connections that neuron has to other neurons will determine where it passes the information down, and this happens until we get to an output, our conclusion. So what happens if the computer's answer is wrong? What if we feed those one thousand photos to it and it says only three hundred of them have cats in them?
Well, we have to go back and adjust those weighted connections, because clearly something didn't go right. The connections within the network need to be readjusted. We would likely start closest to our output and see which neurons seem to contribute to the mistake, which neurons were responsible, in other words, for it to say, oh, only three hundred of these
pictures had cats in them. And then we would adjust the weights, the incoming weights of connections to those neurons, in order to try and favor pathways that lead to correct answers. Then we feed it the one thousand pictures again and we look at those results. Then we do this again and again and again, every time tweaking the
network a little bit so that it gets a bit better. Eventually, when we have trained the system, we can start to feed brand new data to the network, not the stuff we've trained it on, but pictures that we and the system have never seen before. And if our network is a good one, if we have trained it well, it will sort through these new photos and it will count up the ones that have cats in them lickety split.
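If you want to see the shape of that predict, check, and tweak loop in code, here's a minimal sketch in Python. It swaps the big neural network and the real photos for a single artificial neuron and some made up numeric features, say pointy-ears and whiskers scores, so it's an illustration of the loop rather than an actual image classifier.

```python
# Supervised learning in miniature: predict, compare to the known label,
# and nudge the weights whenever the prediction is wrong.

# Training data: (features, label), where 1 means "cat" and 0 means "no cat".
training_data = [
    ((0.9, 0.8), 1),
    ((0.8, 0.9), 1),
    ((0.2, 0.1), 0),
    ((0.1, 0.3), 0),
]

weights = [0.0, 0.0]
bias = 0.0
learning_rate = 0.1

def predict(features):
    total = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if total > 0 else 0

for epoch in range(20):                       # show it the same data over and over
    for features, label in training_data:
        error = label - predict(features)     # 0 if right, +1 or -1 if wrong
        # Adjust the weighted connections in the direction that reduces the error.
        weights = [w + learning_rate * error * x for w, x in zip(weights, features)]
        bias += learning_rate * error

print(weights, bias)
print([predict(f) for f, _ in training_data])   # should end up as [1, 1, 0, 0]
```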
This approach is called supervised learning, because it involves kind of grading the network on its homework and then working with it to get better. Heck, with the right algorithm, a neural network can learn to recognize and differentiate patterns even if we never explicitly told the system what it was looking for. Google discovered this several years ago when it fed several thousand YouTube videos to an enormous artificial
neural network. The system analyzed the videos that were fed to it and gradually recognized patterns that represented different types of stuff, like people or like cats, because there are a lot of cat videos on YouTube, and the network got to the point where it could identify an image of a cat fairly reliably, better than seventy percent of the time, even though it was never told how to do that, and it was never even told what a cat was. So, as Google representatives put it, they said it had to
invent the concept of a cat. It had to recognize that cats are not the same as people, which I think is a big slap in the face to some cats. Really, what it said was, I recognize this particular pattern of features, and I recognize that these other instances of creatures that have a similar pattern seem to match that, and so I draw the conclusion that this instance of a thing belongs with all these other instances of things
that are similar in characteristics. So this was more of an example of unsupervised learning, in that the system, when fed enough data, began to categorize stuff all on its own through its own parameters.
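For a sense of what categorizing on its own can look like in code, here's a tiny sketch using k-means clustering from scikit-learn, which is assumed to be installed. This is nothing like the scale of Google's giant neural network; it's just a toy example of grouping unlabeled data, with made up pairs of numbers standing in for images.

```python
from sklearn.cluster import KMeans

# Pretend each pair of numbers summarizes one image (say, "furriness" and
# "whisker-ness"). There are no labels telling the algorithm what anything is.
features = [
    [0.90, 0.80], [0.85, 0.90], [0.95, 0.70],   # these happen to be cat-like
    [0.10, 0.20], [0.20, 0.10], [0.15, 0.25],   # these happen to be people-like
]

# Ask for two groups; the algorithm decides for itself what goes together.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)
print(labels)   # something like [1 1 1 0 0 0]: two clusters, never told "cat"
```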
Now, one neat way that computer scientists will train up systems for certain types of applications is through a generative adversarial network, which I admit sounds kind of sinister, doesn't it? And I mean, it can be, but it doesn't have to be. Essentially, you're using two different artificial neural networks. One of the networks has a specific job: it's to fool the other network. So the other network's job is to detect attempts to fool it versus legitimate data. So let's use an example. Let's say you're trying to create a system that can make realistic but entirely computer generated, that is, fabricated, photographs of people. So, in other words, these are computer generated images that don't
actually represent a real person at all. We've got one artificial neural network, the generator, and its job is to create images of people that can pass as real photographs. Then we've got our other network, which is the discriminator. This is trying to sort out real photos of actual people from pictures that have been generated by the generative system. And we pit these two networks against each other. The idea here is that both systems get better as they
test one another out. If the generator network is falling behind because the discriminator can suss out the fakes too easily, well, then it's time to tweak some weights in that neural network that are leading to unsatisfactory computer generated images and try it again. But then, if the discriminator is starting to miss fakes, well, it's time to tweak the discriminator network so it's better at spotting the false pictures. Now,
along the way, some pretty extraordinary stuff can happen. There are photos of computer generated faces, not altered pictures, not ones created by a human artist, but entirely composed by a computer, and they can look absolutely realistic, complete with consistent lighting and shadows. It's only after lots of training sessions that the networks learn what the giveaways are, like, what is it that leads the discriminator to say, no, this is a fake photo, and how can you fix that?
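And here's a heavily simplified sketch of that tug-of-war in Python, assuming PyTorch is available. Instead of faces, the generator just learns to produce single numbers that look like they came from a target bell curve, and the discriminator learns to call real numbers real and generated numbers fake. The point is the structure of the two alternating training steps, not the data.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Generator: turns random noise into a single number.
generator = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
# Discriminator: scores a number between 0 (fake) and 1 (real).
discriminator = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())

loss_fn = nn.BCELoss()
g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)

for step in range(2000):
    real = torch.randn(32, 1) * 1.25 + 4.0     # samples of "real" data (mean 4)
    fake = generator(torch.randn(32, 8))       # the generator's current attempts

    # Train the discriminator: real samples should score 1, fakes should score 0.
    d_opt.zero_grad()
    d_loss = (loss_fn(discriminator(real), torch.ones(32, 1))
              + loss_fn(discriminator(fake.detach()), torch.zeros(32, 1)))
    d_loss.backward()
    d_opt.step()

    # Train the generator: it wants its fakes to be scored as real.
    g_opt.zero_grad()
    g_loss = loss_fn(discriminator(fake), torch.ones(32, 1))
    g_loss.backward()
    g_opt.step()

# With luck, the generator's outputs now cluster around the real data's mean of 4.
print(generator(torch.randn(1000, 8)).mean().item())
```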
It reminds me a bit of how photo experts used to point out really bad Photoshop jobs, explaining how certain elements like shadows or edges or whatever were a dead giveaway that someone had altered an image. Well, similar rules exist for generated images, and through training, the generator gets better at making really convincing examples that don't fall into the traps that would reveal it as a fake.
Over time, generative networks can get good enough to produce stuff that would be very difficult for a human to tell apart from the quote unquote real thing, and discriminators can get good enough to detect fakes that would otherwise pass human inspection. So an example of this is the current ongoing battle with deep fakes. These are computer generated videos that appear to be legit. If they're done well enough,
they can have famous people in them. It doesn't have to be a famous person, but it can show a video of someone doing something that they absolutely never did, but according to the video, they did, and it can be really convincing if it's done well. A good deep fake can fool people if you aren't paying too much attention. Some of the really good ones can pass pretty deep scrutiny.
So this requires researchers to come up with solutions that are pretty subtle and beyond the average person's ability to replicate, like looking at the reflections in the person's eyes and whether or not they seem realistic or computer generated. But that really just represents another hurdle for the generative side. So, in other words, this is a seesaw approach, right? It's creating faces on one side and detecting them on the other side. It's something we see in artificial
intelligence in general. A similar story played out with the old CAPTCHA systems, where, you know, we saw a back and forth between methods to try and weed out bots by using CAPTCHA images that only humans could really parse, and then we saw improved bots that could analyze these images and return correct results, which meant it was necessary to
create more difficult CAPTCHAs. Eventually you get to a point where the CAPTCHAs are difficult enough that the average person can't even pass them, and then you have to go to a different method. We also see this play out in the cybersecurity realm, where, you might say, the thieves get better at lock picking, and then security experts make better locks, and the cycle just repeats endlessly. One thing that has really fueled machine learning recently is the era
of big data. Being able to harvest information on a truly massive scale provides the opportunity to feed that data into various machine learning systems to search for meaning within that data. These systems might scour the information to look for stuff like criminal activity, like financial crimes or attempts to move money around from various criminal exploits.
Or it could be used to look for trends, like market trends, or it might be used to plot where possible spikes in COVID nineteen transmission might occur and where people should really be focusing their attention. But now we've got to think back on what I said earlier about looking up at the sky and seeing shapes in the clouds. There's a risk that comes along with machine learning. Actually, technically there are a lot of risks, but this one
is a biggie. It is possible for machines, like humans, to detect a pattern where there really isn't a pattern. Systems might interpret noise to be signal, and depending on what you're using the system to do, that could lead you to some seriously dangerous incorrect conclusions. In some cases, it could just be inconvenient, but depending on what you're
working toward, it could be catastrophic. And so computer scientists know they have to do a lot of analysis to make sure that patterns that are identified through machine learning processes are actually real before acting on that information. Likewise, bias is something that we humans have. Well, it's also something that machine learning systems have too. Now, sometimes bias is intentional. It can take the form of those weighted
relationships between artificial neurons. Other times, a system's architects, you know, the people who put it together, might have introduced bias, not through conscious effort, but merely through the approach they took, and that approach might have been too narrow. Now, we've seen this pop up a lot, again with facial recognition technologies, many of which have a sliding scale of efficacy. They might be more reliable with certain ethnicities, like white people,
over others. That points to a likely problem with the way those systems were trained. This is one of the reasons why many companies have made a choice to stop supplying certain parties, like police forces and military branches, with facial recognition systems. The systems aren't reliable for all demographic groups and thus could cause disproportionate harm to certain populations. It would be a technological approach to systemic racism, and
this stuff is already out there in the wild. You might think a computer system can't be biased or prejudiced or racist, and sure, we're still not at the point where these systems are thinking in the way that humans do, but the outcome is still disproportionately harmful to some groups. Now, that's not to say that machine learning itself is bad.
It's not bad. It's a tool, just as all technology is a tool. Used properly, with a careful hand to make sure that bias is understood and, where needed, mitigated, and where work can be double or triple checked before being acted upon, it is a remarkably useful tool, one that will empower and improve elements in our lives if it's under the correct stewardship. But it does require a bit more hands on work. We can't just leave it to
the machines just yet. Well, that wraps up this look at the concept of machine learning and some of the thought that underlies it. This really is a very high level treatment of machine learning. There are plenty of resources online if you want to dive in and learn more. A lot of them get very heavy into the math. So if that's not your bag, Uh, it might be a little challenging to navigate. It certainly is for me.
I love learning about this stuff, but, um, a lot of it requires me to look up a term, then look up a term that explains that term, and so on, and I go down a rabbit hole. But hopefully you have a better appreciation for what machine learning is at this point. If you have suggestions for topics I should cover in future TechStuff episodes, let me know. The best way to get in touch with me is through Twitter, and the handle is TechStuff H S W, and I'll talk to you again really soon. TechStuff
is an I Heart Radio production. For more podcasts from I Heart Radio, visit the I Heart Radio app, Apple Podcasts, or wherever you listen to your favorite shows.