Welcome to tech Stuff, a production from iHeartRadio. Hey there, and welcome to tech Stuff. I'm your host, Jonathan Strickland. I'm an executive producer with iHeartRadio and how the tech are you. At the beginning of this year, that being twenty twenty three, I said like it felt like it was going to be the year of AI, and so far I think I'm pretty much on the money. But more specifically, twenty twenty three has been the year of
generative AI. That is artificial intelligence that creates or generates something, whether it's an image, a sound, or as we're going to talk about today, text in response to some sort of input. Now, before we go any further, this is where we need to remind ourselves that while this is a type of artificial intelligence, it's not all of AI.
Not every AI application involves generative processes. And while generative AI can seem fascinating, exciting, surprising, or creepy, I believe that largely stems from how generative AI appears to be mimicking humans, and it's not an indication of how sophisticated, advanced, or dangerous it really is. It's kind of an uncanny valley thing. Because it appears to be behaving like a human, we start to project things on it that aren't necessarily
accurate or realistic. I think of it kind of like the way we can be with our pets, where we will project things on our pets that may not reflect what the pet is actually experiencing, but that's how we're perceiving it. So the reason I say all of this up at the very top of this episode is that we're also seeing a lot of people expressing concern about AI, which is understandable. You know about how it could potentially
lead to harm, and these are legitimate and rational concerns. However, with the focus on stuff like chat GPT for example, or Google Bard, I would argue the concern is far too narrowly focused on just one aspect of AI, and in my opinion, it's not even the most dangerous implementation of AI. I mean, we have cars on the road right now that use AI for driver assists and autonomous operations. If we're worried about the robots taking us down, maybe
we shouldn't make them our chauffeurs. But really that's a topic for another episode. Today, I wanted to take a look at an issue that crops up in AI chatbots like OpenAI's ChatGPT or Google Bard and similar products. This is one that is concerning because it's an issue that leads these tools to create false or misleading information while presenting that info in a way that seems authoritative and trustworthy. And in the field of AI, the term hallucination is used to describe this situation. At least a
lot of folks will use the word hallucination. As it turns out, there's actually some debate in AI circles about whether or not that should be the appropriate term. Now for us mere mortals, a hallucination is when we have an experience in which we perceive something that isn't reflected in reality. Maybe we hear a sound but there was actually no sound present. Maybe it was that tree falling in the woods and no one was around or something,
or we see something that's not really there. It can be really darn disconcerting, and sometimes it can be absolutely terrifying.
I'm reminded of how many people who experience sleep paralysis often will also have hallucinations accompany this period where they're awake but they cannot move, and it's probably because sleep paralysis occurs when you're kind of caught between being asleep and being awake, so there's still some dream like activity going on in your brain that's trying to explain things like why you're unable to move. Oh, it's because you have this witch perched on your chest and she's pinning
you to the bed. Tools like ChatGPT are not dreaming, you know, they're not perceiving anything at all. They have no senses to trigger, so they cannot hallucinate in that sense. Instead, what they are doing is mistakenly assigning high confidence to something that they just plain made up. So they're treating it like it's a fact that they're highly confident is accurate,
when really they just invented it. So it is an instance where they're really confident in something that is not coming from a reliable source in the AI's actual training data. So if we wanted to put that into human terms, it'd be kind of like if you made up a story to explain something that otherwise would either be really
boring or maybe really embarrassing. So you make up a lie, in other words, to cover up something that you would rather not be known, And so you tell this lie over and over when people are asking you about this particular thing, and you repeat it often enough where gradually your brain essentially makes a pathway where this fake version of history of what actually happened becomes the real one in your head. You begin to believe your own lie, and so in future tellings of the story, you don't
even realize you're lying at all. You're telling what you believe to be the real sequence of events, even though it's all a fib. That's kind of what's happening with AI hallucinations, only it happens all at once, And for that reason, some folks prefer to use other terms to describe what AI does when it starts to invent things in response to a query from a user. So some have proposed the word confabulation as an alternative descriptor of
what's going on. So this is similar to the kind of scenario I just gave, because in human psychology, a confabulation is when we have a hitch in our memory, and so we fill in a gap that's in our memory. We're not doing it consciously, it just happens, and that might mean we fill in the gap with something that doesn't at all reflect what really happened. So this can happen at any time. I've seen it happen with people who are in, like, a situation that was totally unexpected and
in high stress. I've seen it in training operations where you have a group of people and then someone bursts in as if they are a burglar or a thief or something, and then they get out, and then those people who were just subjected to this very scary situation are asked to give details about the thief's appearance, and people start to invent things, not purposefully, not with the intent to deceive, but because their memory is just trying to fill in
gaps because their perception didn't really take it all in. So confabulation doesn't imply intent, and I think that might be why a lot of researchers like the word, because it's not the intention of the AI to fool people or to pass off fantasy as if it were reality. Instead, the AI is making an honest go of trying to
meet the expectations of the user. So if you ask the AI about, say, a historical figure, it really tries to give you a good answer, but occasionally that answer might be wrong, not because the AI is drawing from a bad data source, but because there's actually a gap in its knowledge, and the AI just fills that gap as
best it can. Unfortunately, the end result is you get an answer that seems totally cromulent, like you could just imagine reading that answer in a respectable, thoroughly fact-checked encyclopedia, but then it turns out to be garbage. So let's talk about how this happens, which will involve an overview of how these chatbot AI tools are trained and, at a very very high level, how they work. So this is going to involve some discussion about machine learning and statistics. So,
first off, how do machines actually learn? I think it's pretty easy to understand how we program machines to do some specific task, right? We create a set of rules that this machine follows sequentially, and the machine executes those rules as directed, and then we get the result we wanted. That is easy to understand. So I'll give an example.
Let's say we have a robotic arm and you've got two tables, and you put a wooden block on table number one, and you program the robotic arm to pick up this wooden block on table one and move it over to table two. Once you program it then it should be able to do that task over and over, assuming that no one has moved the tables. No one has moved the robotic arm, and the wooden block is
always in the same place and it's always the same size. Right, you haven't changed any of the parameters, so it's the exact same situation over and over and over again. You've created this simple program. It should be no surprise when
the robotic arm does it successfully. But what if we wanted a robotic arm that could learn how to pick up different objects from table one and then move them to table two? These objects could be different shapes, they could be different sizes, they could weigh different amounts, they might be made of different stuff. Maybe some of them are fairly delicate and the arm would break the
object if it applied too much pressure. So how would we build a robotic arm that could deal with these different scenarios, including ones where we put something completely new to the robot on the table, something that the robot has never encountered before. Well, to do that, we would probably pursue a machine learning model in order to teach this robot the whole process of picking something up, especially
something it had not encountered before. So basically, machine learning uses sets of algorithms in an effort to get better at a given task, and part of learning involves training, which really boils down to feeding a machine lots and lots and lots of information, like the more information you can feed it, the better, and then letting it process this information in an effort to get a specific result, and then going back and tweaking the model to refine it over and over and over and over again to
get better at it over time. So we'll imagine a hypothetical machine learning model that is designed to do something relatively simple, like recognize if an image has a cat in it or not, because this is actually something that has been done with machine learning models in the past. A fairly popular approach is, does this picture have a cat in it? Or does this video have a cat in it? That kind of thing. Let's imagine that our machine learning model is an actual physical model,
like it's a giant funnel. So on the wide end of the funnel, that's where we just dump tons of photographs; some of them have cats in them, some of them don't. Now imagine that at the narrow end of the funnel, at the bottom of the funnel, we actually have two channels. One channel leads into a bucket that says no cats here, and the other channel leads
to a bucket that says, ah, sweet kitty cats. So we dump thousands, maybe millions of photographs into the top of this funnel, and the funnel starts to sort the pictures. We can't see this because it's inside the funnel, but there are channels inside that funnel where photos are directed either to go more toward the no kitty cat side or the yes kitty cat side, and they go through these channels all down the funnel, and at the very end of it,
they start spitting out these images into the two buckets. Well, once it's done, once it has processed all the photos, we take the two buckets and we see how our model did. And maybe we see that the model caught most of the pictures with cats in them, but not all of them. Maybe we also see that there are some photos that fell into the kitty cat bucket that have exactly zero kitty cats in the picture. Something is
not working inside our model. So at that point we open the funnel. We take the top off, or whatever we've built in, a hinged latch or something, and we open it up. Now, essentially, inside our funnel we see all those channels, and each channel is meant to look for some sort of evidence of a cat, and if it finds evidence, it pushes it closer toward the pathway of kitty cat, and if it doesn't, it pushes it closer to the pathway of no kitty cat. But there's tons of these channels. Some of them feed images back up through the whole process again. It's very complicated inside this funnel, and you have to go in there and start to tweak little bits of rules in these channels to adjust for whatever problem you're seeing in the end result. So, when you're training your model, you change the weights of these different decisions that are made. Some decisions perhaps have too much emphasis on them, like they're too powerful and they're skewing the results. So you reduce the weight of that particular decision point and you increase the weight of a different one to try and get things right. It's a painstaking process and you have to do it over and over again, and these exercises repeat and you try to refine your model to get it better at deciding whether or not a photograph has got a cat in it or doesn't, and eventually, if everything is working well, it gets very very good at
sorting images. Maybe once in a while, something sneaks through. Maybe there's a cloud that kind of looks like a kitty cat and it goes into the wrong bucket, or maybe there is a kitty cat that goes into the no kitty cat bucket, but the kitty cat was kind of obscured in the picture and the model just couldn't suss it out. But it succeeds more often than not. Okay, that's a baseline. When we come back, we'll talk a bit more about machine learning and how this plays into
tools like chat GPT. Okay, I laid out one version of machine learning, and I want to stress that's just one version of machine learning. It's related to things like neural networks, which are designed to kind of mimic the way our brains process information and form pathways among neurons while we're trying to suss things out. But that's just one version of machine learning. I don't mean to say
that's how it all works. There are actually lots of subfields within machine learning, neural networks being just one of them, but there's also subsets of neural networks. One of those would be deep learning, which always makes me think of MST3K and Deep Hurting. Shout outs to any MSTies out there. Now, as you dive down to deep learning, you're really getting into an interesting field of AI and machine learning. So deep learning models
can accept unstructured data. If you're going further up to less specialized machine learning models, these have to use heavily labeled data sets and heavily structured data and use supervised learning in order to improve with time. But when you get into deep learning, you're looking at a very focused approach to machine learning where you can just feed unstructured data that has no labels to it and start to use this model to do whatever it is that you
want it to do. But we're still kind of talking about a channeling or funneling situation here. The input goes into the model, the model analyzes the input and pushes it further one way or another through the system, and it comes out the end as output, which could be an image search result for kitty cats in your smartphone's photo roll, for example. So if you've ever gone into a smartphone photo collection and you just typed in a general word in search, you know it's not that you
tagged any of your photos with this. You're just looking for photos in your roll that have a cat in them, and it returns something like that. Well, that can be the result of a machine learning process like the one I've just described, because again, the system has to figure out which of your photos have cats in them, even though you didn't tag any of those photos with cats. It doesn't have metadata. It has to analyze the photo itself.
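To make that funnel idea a little more concrete, here's a minimal sketch in Python of a tiny cat or no-cat classifier that learns by nudging its weights after every pass and then sorts an untagged photo roll into two buckets. The feature vectors, labels, and learning rate are invented stand-ins for real image data; this is an illustration of weight adjustment in general, not how any production photo app actually works.

```python
# A minimal sketch of the "funnel" idea: a tiny cat / no-cat classifier whose
# weights get nudged after every pass so that misfiled photos become less
# likely next time. The data here is made up purely for illustration.
import numpy as np

rng = np.random.default_rng(0)

# Pretend each photo has already been reduced to 4 numeric features.
photos = rng.normal(size=(200, 4))
# Hypothetical ground truth: "cat-ness" depends mostly on features 0 and 2.
labels = (photos[:, 0] + photos[:, 2] > 0).astype(float)

weights = np.zeros(4)
bias = 0.0
learning_rate = 0.1

def cat_score(x, w, b):
    """Squash a weighted sum into a 0..1 confidence that a cat is present."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

# Training: open the funnel, see what went into the wrong bucket,
# and tweak the weights a little. Repeat many times.
for _ in range(500):
    scores = cat_score(photos, weights, bias)
    error = scores - labels                      # positive = false "cat"
    weights -= learning_rate * photos.T @ error / len(photos)
    bias -= learning_rate * error.mean()

# Inference: sort an untagged "photo roll" into the two buckets.
new_photos = rng.normal(size=(5, 4))
for i, p in enumerate(new_photos):
    bucket = "sweet kitty cats" if cat_score(p, weights, bias) > 0.5 else "no cats here"
    print(f"photo {i}: {bucket}")
```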
Now it's time to talk about probabilities. Large language models, LLMs, which are what power chatbots like Google Bard and ChatGPT, work in probabilities. And there's one example of an AI using probabilistic algorithms to generate responses that I really love to reference, and that example is IBM's Watson platform. So while the world right now is struggling to figure out how to handle ChatGPT and Google Bard and such, IBM's Watson gave us a glimpse at what we could
expect all the way back in twenty eleven. That's when IBM famously put Watson to the test in some exhibition games of the game show Jeopardy against former champions of that game show, human champions. So in many ways, this was an echo of IBM's Deep Blue going up against
chess master Garry Kasparov in various games of chess. Putting Watson up against humans in Jeopardy was a fantastic publicity stunt, and it also was really impressive, because the way Jeopardy works is players get several categories of trivia that they can choose from. Each category has different levels of questions that are designated by a dollar amount. So the higher the
dollar amount, the harder the trivia question is, generally speaking. The actual clue that the players get is given in the form of an answer, and they have to provide a question that relates to that answer. So here's an example. The answer revealed in, say, a hypothetical Jeopardy game that has the category Podcasts could be something like, he was Jonathan Strickland's original co-host on the show tech Stuff. The correct response would be, bip-a-bip, who is Chris Pollette? That would be the correct answer. But Jeopardy goes beyond just trivia. Often the answers provided will include wordplay or images or sound cues, and players will have to think outside the box. They can't just know the answer. Sometimes there's interpretation that has to happen first. The clue to the correct response could be a pun, it could involve a rhyme to the answer. It's not
always a straightforward trivia question, in other words. So Watson needed to be able to analyze the clue given, to break it apart into components to understand what exactly is being asked of it. Then it needed to search its database for relevant information. So Watson famously was not connected to the Internet during these Jeopardy games. Instead, it was relying upon a database representing millions of books filled with facts.
Then it would generate hypothetical responses, like a hypothetical answer that Watson should give, or rather questions, since we're talking about Jeopardy, and it would submit these hypotheses to a second round of analysis to look at, is there any evidence that supports this response as being correct? Kind of measuring, like, well, here's a possible answer, how likely is this answer to be right? And that was all part of the process.
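Here's a toy sketch of the general shape of that process, not IBM's actual pipeline: take a handful of candidate responses that have already been scored, attach a confidence to each, and only buzz in if the best one clears a threshold. The candidate answers, the confidence numbers, and the threshold value are all made up for illustration, and they anticipate the confidence levels discussed next.

```python
# A toy of the general shape described above -- not IBM's actual pipeline.
# Pick the best-scoring candidate and only "buzz in" if its confidence
# clears a threshold. All numbers here are invented for illustration.

BUZZ_THRESHOLD = 0.50  # hypothetical cutoff

def answer_clue(candidates):
    """candidates: list of (response, confidence) pairs scored elsewhere."""
    best_response, best_confidence = max(candidates, key=lambda c: c[1])
    if best_confidence < BUZZ_THRESHOLD:
        return None  # stay quiet rather than guess
    return best_response, best_confidence

scored = [
    ("Who is Abraham Lincoln?", 0.90),
    ("Who is Andrew Johnson?", 0.78),
    ("Who is Jefferson Davis?", 0.33),  # note: these don't need to sum to 1
]

result = answer_clue(scored)
print(result if result else "No buzz -- confidence too low.")
```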
So it might even produce more than one answer. You might have multiple potential answers, and Watson would assign each answer a probability kind of a confidence level of how it felt that answer measured up against all the other ones. So, as an example, answer A might receive a ninety percent confidence level, So that's pretty darn confident that's the right answer. Maybe you have answer B and you're like, I'm seventy
eight percent sure that this could be right. And answer C is the long shot with thirty three percent confidence. These don't add up to one hundred because it's not like a zero-sum game. It's more like, oh, it could be this or it could be that, but I feel like this is more likely than that, so I'm going to go with this. And Watson also had a threshold. If the answer it generated failed to meet a certain confidence threshold, Watson would not buzz in to
try an answer. Otherwise, Watson played pretty aggressively, even in some sticky situations with Daily Doubles, where if you get a Daily Double in Jeopardy, you don't buzz in anymore. If you are the one who chose the Daily Double, you're playing by yourself and you just have to give an answer. So in those situations, Watson got aggressive, and it would guess with very low confidence thresholds for some of these, like at the thirty percent range,
and occasionally it was right. In fact, more often than not it was right until it got to Final Jeopardy, where at least the first time, things did not go totally in Watson's favor. Also, Watson had an interesting betting strategy when it came to Daily Doubles. But I'm getting way off track. So that confidence level is really what I want to home in on here. It was expressed in percentages, so zero percent confidence would be like, I do not know the answer, I do not know
what goes here. A one hundred percent confidence level would be, I am absolutely certain this is the right answer. And in a way, AI chatbots like ChatGPT and Google Bard are doing the same thing, only their confidence isn't about, this is the answer to your question, I'm one hundred percent certain that this answers your question. It's more granular than that, because it's more at the sentence level. It's like, I think this word is the word that needs to go next to
create the sentence that I'm building. So let's talk about how these models do create sentences, and I'm not going to wade into stuff like natural language processing. That is a major part of this, but I have done full episodes about natural language processing before. Essentially, it's a way for machines to analyze information that's written in, you know, your normal language, whether that's English or whatever. But you're not trying to create a sentence that the
machine is able to parse. Right, you're not trying to work with the machine on its terms. You're just communicating with it the way you would with anyone else. It's the machine's job to figure out what the heck you're saying. So we're not gonna dwell on that. Instead, we're going to talk about how a chatbot chooses how to respond to something that is said or asked of it. These chatbots are built on top of language models that have had enormous data sets fed to them during training. The
data sets include stuff like basic facts. So if you ask a chatbot who was the sixteenth president of the United States, a well-trained chatbot at least is going to say it was Abraham Lincoln. But that data also trains the chatbot on how we communicate with one another. So through analyzing hundreds of millions of documents, ranging from books to online social platforms like Reddit, these chatbot models learn rules of communication. They learn rules about spelling and syntax.
They learn about structure that goes from the sentence level to paragraphs. Like, they learn how to build a sentence properly, how to build another sentence that builds on the first one, how to build a whole paragraph that gets a thought across, and then how to do a series of paragraphs to convey meaning of some sort, right, how to build to, like, a thesis, almost. They learn which words typically follow behind other words, which ones are statistically likely to be
the best word to use in any given moment. So when a chatbot is dynamically generating a response, it is referencing this huge amount of learning, and that learning will guide the content and influence which facts are included or excluded, but will also just simply guide the chatbot to build sentences properly. So if we were to zoom way in on what is going on as a chatbot builds a new response, we would see the chatbot is selecting words
based on statistical probability. Essentially, the chatbot would be considering which word is statistically most likely to be the correct one for that part of its response. Whichever word ranks highest is likely to go in there. Now, guiding this guessing game is the context of the conversation. So if I'm asking a chatbot a question about Abraham Lincoln, the chatbot is not likely to pull superfluous information about like
key lime pie or something. So when I talk about which word is statistically most likely to come next, we have to take into account that context is determining this too. Each situation will be unique, and if you and I both are having similar conversations with a chatbot, but we're framing our questions slightly differently, or coming at this topic from different perspectives, the responses we get from the chatbot could reflect that. Now here's where we get into the
tricksy territory. Sometimes the chatbot will be attempting to build a response and there will be a gap in its data set. So, for some reason or another, the relevant data to answer our question just isn't there, or perhaps the language model can't reconcile that the data is relevant for this particular conversation, or maybe there are conflicting elements in its data set, and so in the absence of reliable information, the chatbot simply invents a response by following
those statistical rules when constructing a sentence. So what we get is a sentence that is grammatically correct, that is phrased in a way that appears to be trustworthy, but it does not necessarily reflect reality. We get an answer that reads as if it is correct, but it's not.
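Here's a very small sketch of that "pick whichever word is statistically most likely to come next" idea. Real large language models condition on far more context than one previous word and work over learned probabilities rather than raw counts, but the core move is similar. The tiny corpus is invented, and the point is just that the output comes out fluent whether or not anything along the way checked the facts.

```python
# A very small sketch of "pick whichever word is statistically most likely
# to come next." Real models use much more context than one previous word;
# the tiny corpus here is invented purely for illustration.
from collections import Counter, defaultdict

corpus = (
    "lincoln was the sixteenth president . "
    "lincoln was a lawyer . "
    "washington was the first president . "
).split()

# Count which words follow which (a bigram table).
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def most_likely_next(word):
    counts = follows[word]
    return counts.most_common(1)[0][0] if counts else "."

# Build a sentence one "most likely next word" at a time.
word, sentence = "lincoln", ["lincoln"]
while word != "." and len(sentence) < 8:
    word = most_likely_next(word)
    sentence.append(word)

print(" ".join(sentence))  # grammatical-looking, but nothing here checked the facts
```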
It would be as if someone with an agenda had written an article for an encyclopedia and none of the editing staff caught that this was the case, and so the whole thing went to print, and it's presented as if this is an objective truth, when really it's a subjective point of view. Except with AI, there's no agenda needed because AI is not thinking anything. It's not motivated
because it lacks the capability of being motivated. There's no sentience there, there's the mimicry of sentience, there's the appearance of it. And again, I think this is a large reason why we have a lot of people concerned about AI right now, because it appears to be behaving like
a person, even though there's nothing behind that, right? There's no sentience or consciousness behind this; it just has the surface-level appearance of it, and that's enough to make us start to create all sorts of scenarios where the AI goes bad or sinister. That's not even necessary. It's just trying to answer our questions and occasionally having to make stuff up while it does so. The chatbot, the machine, is just presenting what is estimated to be the
most statistically likely response. And by that I don't mean that the answer is statistically likely to be correct, but rather that, down at the sentence and paragraph level, the words are statistically probable to be the most correct from a grammatical and structural point of view, not from a content perspective. So it's really about how statistically likely it is that word two would follow word one, and that word
three would follow word two, and so on. The finished sentence is what's important, and whether it's factual or not is immaterial. Okay, we're gonna take another quick break. I've got a lot more to say about this. We have to cover a lot more ground. We're back. So a lot of the time, perhaps even most of the time, you won't run into trouble when you're using these chatbots
because the dataset feeding these large language models is truly huge. Plus, there are people working on these models all the time. They're refining them, they're catching mistakes, they're trying to correct those mistakes, they're tweaking the model to prevent it from happening again. But now and again, you might ask a chatbot a question and you'll encounter a situation where there's this gap in the chatbot's data and it makes stuff up,
It hallucinates. Personally, I find it both odd and oddly human that the companies behind these chatbots haven't built in a fail safe where, if a chatbot comes up against this kind of situation, it just says something akin to, I don't know the answer to that, and instead it kind of invents an answer. So it's kind of like being in a conversation with someone who is incapable of admitting that they don't know something. I used to be that guy. In fact, sometimes I still am that guy.
I have to catch myself to remind myself that it's actually okay to not know something, and that curiosity is a way better look than trying to bluff your way through life. But then I also admit I don't know how you would go about implementing a system in which an AI chatbot fesses up to not knowing something. It may not be as simple as that. And there's also a related problem, which is that without knowing what source or sources the AI is referencing for any given query,
you don't really know how reliable that response is. What if the AI is pulling information from unreliable sources, whether those sources were poorly informed, or they were biased, or it was satire that was just being presented as fact? I've talked about this before on this show. There are a lot of websites that were really popular just a few years ago that called themselves satire, but really they just posted lies. It wasn't satire. There was nothing
humorous about it. They weren't saying anything other than just making up stuff. So if the AI is pulling information from those kinds of sources, you cannot expect the AI's answer to magically scrub all the bad from those sources and then provide good information. So, in other words, garbage in, garbage out. So in some cases it may not be that the AI is hallucinating at all. It may just be that it's referencing a poor source for its information.
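For what it's worth, here's a naive sketch of what that "just say I don't know" fail-safe mentioned a moment ago might look like: gate the answer on the model's own confidence. The generate function and its per-token probabilities are hypothetical stand-ins, no real chatbot product is known to work this way, and the catch in practice is that hallucinated answers often come back with high confidence anyway.

```python
# A naive sketch of the "just say you don't know" fail-safe idea.
# generate() and its per-token probabilities are hypothetical stand-ins;
# the hard part in practice is that hallucinated answers often arrive
# with high confidence scores anyway.

CONFIDENCE_FLOOR = 0.6  # arbitrary threshold for this illustration

def generate(prompt):
    """Pretend model call: returns (text, per-token probabilities)."""
    return "Al Gore Sr. voted for the Civil Rights Act of 1964.", [0.9, 0.4, 0.5, 0.3]

def guarded_answer(prompt):
    text, token_probs = generate(prompt)
    avg_confidence = sum(token_probs) / len(token_probs)
    if avg_confidence < CONFIDENCE_FLOOR:
        return "I don't know the answer to that."
    return text

print(guarded_answer("How did Al Gore Sr. vote on the Civil Rights Act of 1964?"))
```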
The trouble is you can rarely tell what's going on from a user standpoint, and the AI presents everything the same way, So you'll get responses with good info, you'll get responses with bad info, and you'll get responses where the AI just made up stuff and it's all handed to you in a format that makes it impossible to tell the difference between them all on a surface level.
So this can lead to really dangerous situations. For example, Google employees reported while they were internally testing the Bard chatbot before Google rolled it out for a beta program that the responses were unreliable in many cases, and in fact, in some instances, those responses could actually lead to people
getting hurt. Allegedly, when asked about scuba diving procedures, Google Bard generated a response that had incorrect information, and if someone were to act on that, they could be injured or worse. So clearly that represents a real danger. It's one thing if the chatbot gives you the wrong answer to put in your essay about Emily Dickinson. It's another if you're counting on it to teach you how to, I don't know, pack your parachute correctly for your first
skydiving solo jump. But there's also the danger of people weaponizing AI hallucinations to push a narrative that may not be accurate. And it's easy at least to understand what led people to form that kind of narrative. So I'm going to give a recent example that really happened. Fox News, which has a reputation for right-leaning reporting, and that's kind of putting it lightly, published a story relating to Elon Musk's appearances on a show with Fox News personality Tucker Carlson.
The accompanying news story pointed out that ChatGPT produced an outright incorrect answer when asked to give a background on the late Al Gore Senior, who is the father of Al Gore, the former Vice President. Gore Senior served in the House of Representatives and then the US Senate for the state of Tennessee. Now, the ChatGPT-generated information on Al
Gore Senior included the following statement, quote, During his time in the Senate, Gore was a vocal supporter of civil rights legislation and was one of the few Southern politicians to vote in favor of the Civil Rights Act of nineteen sixty four, end quote. That is one hundred percent not right, that is completely incorrect. Gore actually voted against the Civil Rights Act of nineteen sixty four. I guess technically it wasn't one hundred percent incorrect because he was
a senator, so that part was right. But no, he voted against the Civil Rights Act of nineteen sixty four. He was a Democrat representing a state that, to put it lightly, in general was not in favor of granting civil rights to anyone who wasn't white. So what his personal feelings on the matter were, I don't know. I mean, he certainly positioned himself as a defender of the great State of Tennessee's right to oppress people who weren't white.
But I can definitely say that he wanted to get reelected, and whether he believed in his vote or not, he did vote against the Civil Rights Act of nineteen sixty four. Of course, the Act passed anyway, and Gore was able to get reelected, and he did subsequently vote in favor of the Voting Rights Act of nineteen sixty five.
But the point is ChatGPT got this response very wrong, and Fox News positioned it as if this was a feature, not a bug, as if that was the intended outcome and it was evidence of a campaign to rewrite history to position Democrats as, like, saintly saviors who could do no wrong. But there's no need to go looking for a conspiracy here. The problem isn't in some invisible hand guiding ChatGPT to create biased history. It's the very
nature of how this kind of AI works. When it doesn't have the data, it makes stuff up based on what is statistically the most quote unquote correct word for the sentence. Now, you might ask why ChatGPT did not have access to the relevant data, and I do
not know the answer to that. I did test this myself, however. I actually opened up ChatGPT and I asked it to give me background on Al Gore Sr., and sure enough, I got a similar response to what Fox reported, including the incorrect quote unquote fact that Al Gore Senior had voted in favor of the Civil Rights Act of nineteen sixty four. So I then asked a follow-up question. I specifically said, how did Al Gore Senior vote on the Civil Rights Act of nineteen sixty four? ChatGPT
gave me the wrong information again. Then I said, you're wrong, Al Gore Sr. voted against the Civil Rights Act of nineteen sixty four. What sources did you use? ChatGPT gave me a message that essentially said, I'm sorry, you're right, Al Gore Senior didn't vote in favor of the Civil Rights Act, he did vote against it. Then it gave me a vague response that it draws from various articles and such for its answers. It didn't give
any specifics. It was not a very satisfying response, but it did at least admit, oh, you're right, I gave you the wrong answer. But again, there's no need to assume there was some conspiracy that caused this to happen. These hallucinations happen across every topic, not just history and politics. Yes, if we look at this very specific example, you start to ask, oh, is there an intent here? Is there a desire to rewrite history to make Democratic leaders look
more positive in a modern lens? And is it a way to avoid tough questions, like which party actually was supporting civil rights and which party was opposing them? If you're talking about Southern Democrats, the answer is they were opposing it, because Southern Democrats of the time, of the nineteen sixties, are very, very different from modern Democrats. But if you're whitewashing, if you're changing the facts to try and make them seem more sympathetic, that would be bad, right? That's clearly manipulation. That, however, I don't think is what's
going on here. I think there's no need for it, because the AI is just hallucinating and creating information that it thinks is correct, or at least thinks is the most statistically correct answer to give based upon the information that it has available to it, and it's presenting it as if it's hard fact, and it's not. So we know that when the AI is presenting information that could potentially
be harmful, that can't be the intent, right? There's not some cabal out there saying, ah, now those scuba divers who aren't smart enough to ask people who are really knowledgeable about this, but will turn to AI, they'll get what's coming to them. That makes no sense. So I don't think there's any intentional approach to trying
to create misinformation. The problem is, by their very nature, these chatbots create misinformation in these instances, not in every case, but in enough cases where it is a problem. I do think there is bias in these chatbots, including ChatGPT. There's just bias, but it's necessary bias. So you might recall a few years ago, Microsoft released an AI chatbot named Tay. This chatbot was supposed to respond to people,
specifically younger people. This is Microsoft's attempt to relate to the youth. It was supposed to do so in a natural way, and it was also supposed to learn as users interacted with Tay, like learn how to interact in a way that was reflective of the culture of the time. So it would pick up slang, and it would pick up phrases and perspective and points of view. And in less than twenty four hours, Microsoft had to take it down because within twenty four hours, users had already turned
Tay into a crazy, racist, misogynistic, toxic machine. Tay was a disaster, both from a technical perspective and a PR perspective. So AI companies have started to put in restrictions like guardrails to keep AI from going to extremes. That includes tools that try to prevent AI from generating hate speech,
for example, or slandering people. Now, these tools are far from perfect, and there are plenty of examples of people figuring out ways around them, and there are plenty of examples of ChatGPT even stating as fact that a person was accused of and convicted of a crime when that's just not the case. There have been examples of that happening as well. But these rules do tend to push AI responses in a general direction. Right, this is bias. It's intentional bias, but it's also
not meant to be harmful. It's meant to try and avoid situations that themselves could be harmful, either to users or, more pointedly, to the companies behind the chatbots. Because you've got to remember, one of OpenAI's big business models is to work with other companies and to incorporate ChatGPT into the tools and services
that these other companies have. Well, if ChatGPT gets a reputation for going off on racist rants, that's not a good look, and no one's going to want to incorporate ChatGPT into their business, and then OpenAI doesn't have a product to sell. So it's not just altruistic, right? It's not just, we don't want to cause harm, it's, we don't want to kill
ourselves out of getting business. So there's a lot of work being done to try and guide ChatGPT's responses to avoid the extremes and to avoid things that would cause problems. As a result, it could be an overcorrection, and we could be seeing that ChatGPT is creating responses that don't reflect reality and do appear to
be erasing important historical context. So the bias, in combination with gaps in knowledge, can lead chatbots to appear, at least on a surface level, to have a political leaning to them. But again, I don't think that's the result of a conspiracy. I don't think that was intentional. I think it's the natural destination considering, one, how these chatbots work, and two, the guardrails that are put up
to prevent chatbots from going bonkers. Now, to be clear, I don't think we should just accept this. Any time any chatbot presents incorrect information as fact, that is a problem, particularly when companies like Google and Microsoft are looking to incorporate these tools into stuff like search results. It would be like going to a library where the librarian has their own agenda to only point people to resources that support the librarian's own personal philosophy, and they never point out
anything that would contradict it. That would also not be good. The lack of transparency makes it worse. Ultimately, I would caution anyone against relying too heavily on responses generated by AI based on these large language models. Now, you might not ever encounter a response that includes hallucinations or draws from unreliable sources, but based on how these chatbots present information, you also could never really be sure that that's the case unless you then went to the extra trouble to
fact check the AI. And at that point you're just doing the additional research you would have done at the beginning without the AI being there in the first place. So I think AI hallucinations are a huge problem. That's another thing that the Fox News article kind of ignored, like it felt like it was a gotcha moment in
the Fox News article. But the fact is, if you just search AI and hallucinations on whatever web search you like, you're going to find countless articles across the entire media spectrum that have been bringing this up for months, and concerns that people both within and outside the industry have had about hallucinations in AI. This is not a new thing, and again, it's not related
specifically to trying to rewrite history. It's more of a broad problem in the field itself that affects all sorts of responses, and we absolutely should be concerned about it and be working toward fixing it. The hallucinations present a genuine problem, and it's not necessarily because there's a cabal trying to rewrite how the world works and brainwash us all.
You don't need the cabal for that to happen. The AI is doing it itself because it's working from a very complex statistical table and very few people have the insight into that table or understanding of it to fix the issues. So yeah, that, in a nutshell, is the problem of hallucinations in AI. I don't see it going away soon unless we move away from the large language model approach of AI. And there are alternatives out there. There are companies that are pursuing a different approach to
creating a reliable chatbot, and maybe they'll have better success. Yeah, flights of fancy are fun when it's fiction, but when it's someone trying to present to you a factual document, it's less fun. So hopefully we suss this out before it causes any more problems. And again, while I do think this is a type of AI that we should keep our eye on, and we should ask critical questions and we should use critical thinking, it's not necessarily the AI that I'm concerned about the most when it comes to
things like, I don't know, a potential existential threat. All right, that's it. I hope all of you are well out there. Be careful, especially with AI, you know, make sure you double check. I know it's a hassle, but it can save you a lot of grief down the road. And I'll talk to you again really soon. Tech Stuff is an iHeartRadio production. For more podcasts from iHeartRadio, visit the iHeartRadio app, Apple Podcasts, or wherever you listen to your favorite shows.