This is an ABC podcast. Hello, this is David Rutledge with you in the Philosopher's Zone once again. Welcome to the show, where we're talking about AI this week and the question of trust in AI. AI is making all sorts of decisions for us these days, and it's only going to be making more as time goes on. And some of those decisions
couldn't have a more direct bearing on our lives. Decisions that have to do with medical diagnosis and treatment, whether or not you can get a bank loan, whether or not you get the job you're going for. If you've been given a prison sentence or you're in prison and you're coming up...
for parole. AI is going to be making decisions about how likely you are to re-offend. AI decision-making is increasingly being used in the military, all of which brings up this question that we always ask of any new technology, how far can we trust this thing?
Because the concerning thing about AI is that it's quite inscrutable. If you think about a self-driving car, as soon as you get behind the wheel of one of those, you're putting your life in the hands of something that's about to make a whole string of decisions that have to do with speed and braking and cornering and keeping away from pedestrians and other cars on the road.
So it'll do all those things, but what it will never do is explain those decisions to you or to anyone else. This is what's called black box AI, where you have an input and an output but a mysterious void in the middle, where even the developers of the AI system can't see inside to find out what's going on or why the thing is making the decisions it's making. And if we can't explain it and the AI won't explain itself, what kind of trust is it appropriate for us to bring to the technology?
This understandably worries a lot of people, and that worry has been reflected in a government policy paper that came out in September 2024, the Policy for the responsible use of AI in government. And the question of trust is front and centre of the government's concern there. Well, that same question is one that my guest this week has been working on, and he's come to some fascinating and, I think, rather counterintuitive conclusions.
Sam Barron is Associate Professor of Philosophy at the University of Melbourne, and he's the author of a recent paper in the journal Philosophy and Technology titled Trust, Explainability and AI. Can you tell me about the opacity of AI systems? Because I still find something kind of head spinning about the notion that a technology could be created by humans, but then the inner workings of that technology could be beyond the understanding of those same humans.
It's becoming a familiar idea, but it never gets any less surprising to me. How does that work? I think it's important to recognize that there is a sense in which we do understand them. So we know the base architecture of AI systems. The way that modern AI systems work is that they are algorithms that are trained on deep neural nets. And a deep neural net is a system that more or less mimics, or corresponds in some way to, the way that our brain works.
The way that our brain works is that you have information or electrochemical signals that are passed through neurons, which are then processed through the entire neural system. In an AI system as well, you have information, usually mathematics, that's passed through artificial neurons that is then processed through the entire system. So we sort of understand that aspect of it. We know how to train them.
The way in which we lose understanding relates to the sheer complexity of the system that we're using. So inside something like ChatGPT, we've got around 175 billion parameters, the weighted connections between its artificial neurons. To put that into perspective, the human brain has about 70 to 100 billion neurons. So the structure of ChatGPT, in terms of its artificial neural net, outpaces the complexity of the human brain, at least in sheer number of units and connections. So that's already a point at which we're starting to lose understanding, because we're dealing with a highly complex system, and when we're dealing with a highly complex system, it's hard to understand.
But more than that, we don't really have a good sense of how information is processed through that massive system. And the problem is really analogous to the way that we don't understand how information is processed in the brain. We've got a lot of good work in neuroscience and cognitive science that's trying to answer that. But because we're dealing with a system that's massively complicated, and we didn't sort of build it block by block,
we can't just look at it and go, okay, this is how it works. Similarly with the AI, we sort of put the structure in place, the artificial neural net, but then it works out how to use it, how to... modify it so that information is passed and processed, and we just don't have access to that. And merely opening the thing and looking at 175 billion connections is not going to give us any insight, right? So it's really the sheer complexity of the thing that prevents us from understanding it.
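To make that opacity point concrete, here is a minimal sketch in Python, an illustration added for readers rather than an example from the interview: even in a toy network where every parameter is visible, the numbers themselves don't explain why a given input produces a given output, and real systems scale this up to billions of parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy "trained" network: 4 inputs -> 8 hidden units -> 2 outputs.
# In a real system these weights would come from training on data;
# here they're random, which is enough to illustrate the point.
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 2)), np.zeros(2)

def forward(x):
    hidden = np.maximum(0.0, x @ W1 + b1)  # hidden layer with ReLU activation
    return hidden @ W2 + b2                # output layer, e.g. two class scores

x = np.array([0.2, -1.3, 0.7, 0.05])
print(forward(x))          # an output, but no reason attached to it
print(W1.size + W2.size)   # all 48 parameters are inspectable, yet inspecting
                           # them tells you almost nothing about the decision
```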
So all that being the case, and philosophical arguments aside for the moment, how reasonable do you think it is that people might be disinclined to trust black box AI? Because I feel like there's something about the idea of an opaque system making important decisions about our lives that goes against our folk intuitions about
who and what and how we should trust. Is that a fair observation, do you think? I think that's right. And I think it is reasonable, at least in a sort of knee-jerk way or a first pass way to... not trust a system that is extremely complicated that we don't understand. And so we also see this in other cases. There are many...
things that we don't trust just because they're super complicated and we might not have a good sense of how they work. I think it's a very natural response to something that we can't pull apart and work through the process of decision-making piece by piece, that we might be a bit skeptical or have trouble trusting the system. Well, let's talk for a minute about the federal government's policy for the responsible use of AI in government that was released in September 2024.
What does the policy recommend when it comes to the issue of trust in AI? Because one of the things it points out is that public trust in AI is low. So what are they recommending? Yeah, they do. They recognize that public trust is low, but more than that, the paucity of public trust provides a kind of handbrake on the uptake and use of AI systems. And they think that AI systems, I guess...
we should be using them. And I think that this is probably right in many cases, because they do things better than us. And so they want to take away any obstacles, I think, to the use and uptake of AI systems, both in government but also in industry and the commercial sector as well. What they recommend, interestingly, is that AI systems, in order to increase public trust, should be explainable.
They should be interpretable. We should be able to understand how they work. And that's an interesting recommendation given that we don't currently understand how they work and it's now part of government policy that we should be using systems that we can actually explain.
Well, it's interesting to note here that the policy states that AI use should be transparent and explainable to the public, which is different from saying that AI itself should be transparent and explainable to the public, right? I think that's right. The sense in which they say that its use should be transparent and explainable is that its use in decision-making should be transparent and explainable. And if we're basically appealing to a system that we don't understand, it's not clear to me how the decision-making itself can be explainable. So suppose, I don't know, the government has given you a massive Centrelink debt, and the government comes back and they say, okay, well, we need to explain that to you. But if they've used an AI system to sort of work out Centrelink debts that they
don't understand, they're just not in a position to give you an explanation beyond saying, well, the AI worked it out. And that's not going to be a very satisfying explanation. So I suppose there's a question about the depth of the explanation that's being given. It's not enough to just give any explanation. We need to give an
explanation that's actually going to be useful to someone who receives a decision from an AI system and that can put them in a position to challenge that decision if they need to. Well, this paper that you've written, you differentiate between trust and reliance. Broadly speaking, what is that difference? So reliance is a fairly...
weak notion, insofar as when we talk about relying on something, we're usually thinking about just being willing to use it kind of without thinking about it too hard, right? So when you rely on a ladder not to collapse, then you're using it and you're hoping, or at least trusting in some very, very weak sense, that you're not going to fall over and the ladder's not going to fall down. When we think about
the relationship between reliance and trust, I think many people are inclined to think of reliance as a very weak form of basic trust, but we're also inclined to think that trust includes something that goes beyond reliance. Particularly when we think about trusting people or something like that, you can rely on someone, but even if you rely on someone, you might not trust them,
right? So there are ways in which reliance and trust can come apart, particularly in the interpersonal case. Well, you outline seven different kinds of trust in your paper, and I'd like to talk about a few of them. Some are quite intuitive, trust requiring an unquestioning attitude, that sort of thing. But one that I'm interested in because it pertains directly to AI is trust as something that involves what you call discretionary authority. Tell me about that.
This is a notion of trust that is based on giving over the authority for something that you do, or something that you would normally have control of, over to an AI. So you're sort of giving the AI the discretion maybe to make decisions on your behalf, or maybe to perform some action on your behalf, or to do something that you would usually have the authority for. So it's sort of like power of attorney, where you give over the discretion in some cases to a lawyer to make decisions for you and to enact certain things on your behalf.
You would trust the lawyer in this discretionary authority sense. And so when we're thinking about an AI as well, we can think about investing it with that discretionary authority, and that's a kind of trust. And in this discretionary sense,
as you describe it, we have a predictive expectation, which is that AI will do the thing that it's been designed to do, but also a normative expectation that AI will, in some sense, perform as it should. Tell me more about that. I think that's really interesting. Yeah, so when we're thinking about these normative expectations, we're thinking about a system that maybe we believe should be operating in a certain way.
And once we have those beliefs about how it should be operating in a certain way, we then have an expectation that it ought to operate in that way. We have these normative expectations for other things. So just think about... you get in your car and you drive to work, you have an expectation that the car will operate as it should in the sense that it'll operate in the way that sort of the manufacturer of the car has said it will operate.
and has almost guaranteed in some cases that it will operate, particularly when you've got an object that's under a kind of warranty. That's a sort of guarantee that the object is going to continue to operate as it should. And so when we're thinking about investing an AI with discretionary authority...
One of the aspects of that is that we give over authority to it to make certain decisions. But in so doing, we're expecting the AI to work as it should, as, in this case, the designers, maybe the manufacturers, of the AI have specified that it
ought to work or have said that these are the things that it should be able to do under these circumstances. So you then have an expectation, just like with any other object or any other piece of technology, that it will in fact do the sorts of things that the people who've made it or produced it have said that it should do. There seems to be the hint of a moral expectation there when we talk about things that perform as they should.
Does this mean that we're imbuing AI itself with some sort of moral obligation? Because that seems odd. So I think that there is a hint of a moral obligation here. So when you expect something to operate as it should, this is kind of invested with some morality. If you're
told that something works in a certain way, and then you expect it to work in that way, and you use it and it fails, then there's a sense in which you can hold someone accountable in various ways for it failing to operate in that respect. And some of those ways can be moral. If, for instance, they haven't exerted due care to ensure that the object won't hurt people, or won't cause you to incur various kinds of suffering that could be important morally speaking, then I absolutely think that there's a kind of moral accountability that is involved. However, the moral accountability usually isn't invested in the object itself. So if you're driving your car to work
and your car breaks down, look, you might get angry at your car. You might think that your car is kind of betraying you in some sense, but that would be a mistake, to think of it that way. The person, if anyone, who's betraying you in that situation is the car manufacturer, or the person who's designed the car, not the car itself. And so when we're thinking about this discretionary authority with respect to an AI, when it lets us down, there could be a sense in which you can moralize that. You could think, well, look, the people that designed the system have committed a kind of moral wrong by rushing out the system before it was safe or reliable, those sorts of things. But the AI itself is not doing something morally wrong. In this situation it's just a kind of tool that has been designed for a purpose, and it's failing to perform that function, but not because it itself is doing something morally wrong. And I guess this question also takes us away from our central question here, which is the explainability of AI and whether our trust should hinge on that explainability.
But if AI doesn't have moral obligations, then another thing it doesn't have is normative commitments. You can't give AI a task and then expect that it will think to itself, I'd better make a good job of this because people are relying on me and I don't want to let them down. I think it's commonly understood that interpersonal trust relies on that sort of normative commitment on the part of the trustee. If you take that away...
Do we still have trust? Can we have trust? It's a good question. And I think that this speaks to the way in which there's quite a... textured landscape of different notions of trust. So there are some notions of trust that are far more interpersonal. So they'll say, look, you need reliance for trust. So you need to be able to rely on someone, say, but you also need... or have some requirement that the person that you're relying on responds to you in a certain kind of way.
So for instance, if you ask someone to look after your cat and you trust them to look after your cat, you're relying on them to do that. But it's more than just reliance. You may also be expecting that that person will feel a kind of moral obligation to perform the action for you, and that you would only really trust them if they felt that kind of moral
obligation to actually look after the cat. Because if they're not going to feel that moral obligation, you might worry that they might not take it seriously, they might not actually do the thing. So you need to have a more robust sense of what's going on for them inside their mind in some sense. And so if we're thinking about AI and we're thinking about a kind of interpersonal notion of trust, we're thinking about trusting an AI like a person. And I think that...
There is a strong inclination for people to think this way because they anthropomorphize artificial intelligence. After all, we put the I in there, this notion of intelligence, which is this sort of mental concept. If we do think of trusting an AI in an interpersonal sense, there's a way in which there's a mistake that we're making because it's not at all clear that an AI can have a moral...
compass. It doesn't really do moral reasoning. It's not clear that it can have the kinds of mental states towards you that an interpersonal notion of trust would require. Now, you can still trust the AI, in the sense that trust is just a mental state that you can have, and you're free to have your own mental states however you like.
But then there's a question about whether you should trust the AI in a situation where what your trust actually requires is for them to respond to you morally and they can't. And it looks like there's an issue there about whether we should trust AI in that situation. This is The Philosopher's Zone on ABC Radio National and ABC Listen. I'm David Rutledge and my guest this week is Sam Barron from the University of Melbourne. We're talking about AI, reliability and trust.
Well, let's get on to explainability and whether that is necessary for trust in AI. But before we get to trust, I want to talk about reliance and whether or not appropriate reliance on AI requires explainability. What's your thinking there? So it depends a little bit on what we think reliance requires. And I think that in some sense, we do have some...
good accounts of what reliance requires when we're thinking about artificial intelligence. But I will note that this is very much an open area of research, what it would take for an AI to be reliable. But to take an example, think about an AI system that's used for medical diagnosis, which AI systems are increasingly being used for. What will happen is that a doctor will, say, take your symptoms and put them into an artificial intelligence
algorithm, and it might make a recommendation about the kind of diagnosis that you have. Now, is that diagnosis reliable? Should we believe that you in fact have the disease that the AI system has diagnosed you as having? Well, that depends on how accurate the system is. And it also depends on how good the system is at giving an accurate diagnosis across a large selection or variation of quite different cases.
Those things we can measure. So we can measure how accurate the system is with respect to its diagnosis. We can measure how accurate the system is under changing circumstances, regarding, you know, whether or not it's a person, say, in America that's receiving the diagnosis, or a person in Australia or somewhere in Europe.
We can measure the accuracy of the system in those situations. And we don't need to open the system up, or get explainability, or understand the inner workings of the system, in order to measure the accuracy. Because the accuracy is really just: does the output fit with what's actually happening in the world?
So if someone in fact has a disease and the AI system says they do, well, then we can measure the accuracy that way. So there are ways of measuring accuracy that are based purely on the output of the system and don't require any sense of what's happening inside the system.
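As a concrete illustration of that output-only measurement, here is a short sketch added for readers rather than an example from the paper: a hypothetical `accuracy` helper that only ever calls the model and compares its answers with known outcomes. Nothing about the model's internals is needed, and checking different cohorts is just a matter of which cases you pass in.

```python
from typing import Callable, Sequence

def accuracy(model: Callable[[dict], str],
             cases: Sequence[dict],
             true_labels: Sequence[str]) -> float:
    """Fraction of cases the (possibly opaque) model gets right."""
    correct = sum(model(case) == label for case, label in zip(cases, true_labels))
    return correct / len(cases)

# A stand-in for an opaque diagnostic system; we only observe its outputs.
def black_box_model(case: dict) -> str:
    return "flu" if case.get("fever") else "healthy"

cases = [{"fever": True, "cough": True}, {"fever": False, "cough": False}]
labels = ["flu", "healthy"]
print(accuracy(black_box_model, cases, labels))  # 1.0 on this toy data
```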
Okay, so it might tend to promote reliance, but as you say, it's not necessary. That's fine for what you call explainability in practice. But what about explainability in principle? If I rely on my car, even though I can't personally explain how it works, isn't it important that somebody somewhere can explain how it works, and that the same would go for AI? I think that it's not necessary. So I think that
you can perfectly rely on something that is not even explainable in principle, so long as you have enough information over time about how well it works. Basically, I think you need to use it enough so that you can... develop certain inductive inferences about what's going to happen in the future if you continue using the system, right? So suppose you find an alien object just in a field one day and it's really good at providing
recipes that align with your preferences. And so what you do is you just use it over time, and you come to discover that it's just really, really good at giving you these recommendations. It seems perfectly fine to rely on that system, even though it's a completely alien device and no one, even in principle, seems to be able to explain it. So I don't think even explainability in principle is required for reliance. However, I think
being able to explain it, or knowing that someone can explain it, is definitely going to make it easier for you to rely on a system. So it's going to sort of promote reliance. And that's a fine distinction, between whether something is strictly required for relying on it or whether it's just helpful but not actually strictly required in every case. And I think that explainability is more on the helpful side rather than the strict requirement side.
Well, let's turn to trust. And the accounts of trust that you offer in your paper, you sort into two categories, moderate and strong accounts of trust. The strong accounts involving a kind of interpersonal dimension, and we'll get to those in a minute.
Let's take a moderate account of trust, the notion that trust involves the adoption of an unquestioning attitude. Is explainability necessary for this kind of trust in AI to be appropriate? Yeah, so let's think about this unquestioning attitude a little bit. The idea is that you trust something in this moderate sense when you rely on it. So you're happy to use it maybe in an everyday way over time for a certain task.
But it's not just reliance that gets you this kind of trust. You also need to not spend your time constantly questioning whether you should be using the system, whether it is the sort of thing that is fit for purpose, whether you can rely on it. So it's sort of reliance plus just being chill about the reliance and not constantly questioning the system.
Whether explainability is necessary for this kind of trust, I think, depends on whether explainability is needed for someone to adopt this kind of unquestioning attitude. Because I've already suggested that explainability is not strictly needed for reliance, but it might be needed for this idea that you don't constantly question the system.
But even then, I think that explainability is probably not needed. And part of that is because whether or not you adopt this unquestioning attitude is sort of a matter of decision. It's kind of up to you whether you adopt this unquestioning attitude. And sometimes you adopt the unquestioning attitude for
pragmatic reasons because if you were to continually keep questioning a system or questioning something that you're relying on, it would just get in the way of actually using the thing. So sometimes you can just adopt the unquestioning attitude without having any greater sense of how something... operates, but just because for pragmatic reasons you decide to.
Whether or not in general you need explainability for the unquestioning attitude I think depends on whether understanding the system makes it sort of more reasonable to adopt this unquestioning attitude versus not adopting the unquestioning attitude. But again, I think that the...
evidence of the system functioning well over time seems a perfectly good basis for adopting an attitude of this kind. If you have good evidence that something works well, and you have no idea how it works, you're probably going to be happy to continue using it without questioning it. In fact, if we set aside AI systems, I think we do this all the time with other
types of technology. Very few of us know how most of the technology that we use works in any particular detail and yet we do rely on it and we do adopt this unquestioning attitude and that seems perfectly okay. There doesn't seem to be anything particularly different about the AI case, at least with respect to the unquestioning attitude.
Well, what about stronger notions of trust where there's an expectation of goodwill on the part of the trustee, or at least a sense that the trustee is acting out of a sort of moral commitment? Because that's where we need to know why the trustee is acting as they do. And so it seems to require a kind of explainability. Is that in fact the case in interpersonal situations? And would it be the case with AI? So I think in the interpersonal case, it is...
much more plausible that explainability is necessary. If we just think of explainability now as a general notion about understanding why someone is doing the sort of thing that they're doing. Lots of these stronger, more interpersonal notions of trust are really going to rely on having that understanding of someone else's mental states or someone else's actions. So for instance, if I trust...
someone to look after my cat. And part of what it is for me to trust them to look after my cat is for me to believe that the person will... perform the task that I've set them in a manner that exhibits some goodwill towards me or some moral commitment towards me or some sense in which they take the fact that I've trusted them into account.
when they're working out what to do, then in order for me to have the trust, and for it to be reasonable, in order for me to have the sense that the trust I've adopted is a good way to think about the person and a good way to respond to the person, I really need to understand why it is that they're acting as they are. If I don't understand that they're acting towards me out of goodwill or out of some moral commitment, then these stronger notions of trust don't seem to be appropriate, or don't seem to be the kinds of things that we should adopt towards a person. So that's just the general interpersonal case,
when we're dealing with trusting a person, but I think it carries over to AI. If we are working out whether we should trust AI in this stronger sense, then I really do think explainability would be necessary for that kind of case. If we're going to trust AI insofar as we believe that it will act out of a moral commitment to us or out of goodwill towards us, then we really do need to understand how it's making its decisions in that situation. But then I suppose that
sense that AI is acting out of goodwill towards us or a moral commitment towards us, that seems to be a sort of a second order concern, right? That's something that will just make us feel better about the AI, but it doesn't seem to have a bearing on...
whether or how much the AI can be trusted to do what it does. Yeah, I think we could even go a little bit further than that. I think it's a mistake to trust AI in these stronger senses. And the reason that it's a mistake to trust AI in these stronger senses, like in... the sense that it might have a moral commitment to us or act out of goodwill to us, is because these things don't have minds, at least not the kinds of systems that are in general use today. And there is debate about this.
In the early days of large language models, for instance, like ChatGPT, people did make some pretty strong claims about these things, being conscious and having mental states. But I think that it's... to my mind, quite implausible that these things, at least at their current stage of development, have the kinds of sophisticated mental lives that would enable them to...
form moral beliefs or have anything like a goodwill, which is a very complicated notion, actually. It's a very complicated mental state for something to have. It's not even clear that many animals have these things, let alone something like an AI.
And so I think it would be a mistake to adopt these sorts of notions of trust towards AI, which pushes us back to more moderate notions, things like... adopting an unquestioning attitude or imbuing it with discretionary authority, which is to say that the sort of trust that we have for people is not yet appropriate, I think, for AI because AI just doesn't have the kind of mental capacities that people have.
But people are involved, aren't they? I mean, if we broaden our horizon and consider that an AI system isn't some sort of discrete... self-generated thing. It's embedded in a network of people who use AI, people who develop AI. And that once we take these people into account, the stronger interpersonal notions of trust that require explainability, they do come into play.
I think that's right. And there is a push, I think, in thinking about AI and trust and explainability more generally from people to... consider not just the AI system in isolation, but the AI as it's embedded in really a social network that's quite complicated. So the AI has people that... design it and train it. It has people that develop it or produce it. It has people that implement it or sell it. And it has people that use it. And this whole system is just a big...
social network. And once we add the fact that AI is embedded in the social network into the picture, then it does bring into play notions of trust that are more interpersonal, that do include people. But when we do that, we're really not, I think, targeting the AI with that trust. What we're doing is we're targeting the people involved with that trust. So we're trusting the designers of AI maybe to act out of goodwill or good faith. We're trusting the people that implement the AI
in government or in industry to be acting in a moral way, so that when they make decisions that affect us, those decisions are kind of guided by moral principles. Which is to say that once we embed the AI in this larger social context and larger social network, the notion of trust that we're interested in is really just the general notion of trust
that applies to the use of any technology, which is always embedded in a social network of that type. And so, yes, those notions of trust do come into play, but it's not clear that the AI is the target of the trust. It's the people involved, as it is
in any case when we roll out a new technology that affects people's lives. One thing that troubles me is that as AI comes to play a bigger and bigger role in our lives, we're presumably going to get used to the exercise of trust without explainability. And as that happens, I worry that we might transfer our inclination to trust in this way
from AI to other black box entities like governments, certain kinds of institutions, or perhaps that governments and institutions might come to feel emboldened to resist demands for explanations, and that we will feel less troubled by that than we do now. Is that a concern, do you think? It's an interesting thought. I'm almost inclined to think that the arrow is coming the other way. We're already okay, I think, about inscrutable government decision-making, or inscrutable decision-making on the part of industry and commerce as well. We're living in a world, in some sense, of inscrutability, because of the complexity of a lot of the social stuff that happens around us, the complexity of government, the complexity of any large corporation. And I think that we're primed in that sense to be okay with things that we don't understand very well.
And I do think that gets us into trouble. And maybe what we should be doing is thinking carefully about whether we should be relying on systems and whether we should be relying on things that we don't fully understand. So in that sense, what we could think is that...
at the moment, maybe it's perfectly reasonable to rely on AI; it's perfectly reasonable to rely on systems that are inscrutable. But if that reliance gets us into trouble in the long term, if what we need to do in fact is lift our game a bit and not be so happy with relying on things just because they're useful to us and they're accurate and predictive, and if we really need to deeply understand them before we come to rely on them, then I think we could shift
in our way of thinking about AI. So we could shift towards a situation in which we're not prepared to rely on something, or at least not on something sufficiently complex, unless we understand it. I think this would require a bit of a shift in the way that we interact with the world. And maybe even doing this would end up with certain losses in efficiency, I think.
Just in our day-to-day lives, we get a lot of efficiency from relying on highly complex things that we don't understand. But maybe giving some of that up would be a good thing. Maybe we should be thinking about reducing our efficiency so that we have more scrutable systems and more scrutable ways of making decisions, because ultimately that might end up with a better moral state of affairs. We might end up with a better, more healthy
society, morally speaking, if we do use systems that are explainable and transparent. And this could well be where the government is coming from with their policy, where they require, or at least ask for, the use of these systems to be explainable. They might well be thinking in this long-termish way about the moral good that's associated with having complex systems that are understood well
being used by individuals. But ultimately, I think this is as much starting to become a kind of political question, about how we want to think about the arrangement of ourselves in relation to governments and institutions, as it is a sort of technical, philosophical question about whether we ought to be able to explain things in order to trust them.
Sam Barron, he's Associate Professor of Philosophy at the University of Melbourne, and he's the author of Trust, Explainability and AI, recently published in the journal Philosophy and Technology. More info on the website, that's the Philosopher's Zone.
And of course, you can always find us via ABC Listen, which is your magic portal to a trove of past Philosopher's Zone episodes. And if you get sick of those, you can go and find any number of other ABC Radio National podcasts. And I'm David Rutledge. Lovely to have your company this week, and I hope to see you next time. Bye for now.