Building Moral AI with Jana Schaich Borg

May 01, 2025 | 1 hr 22 min | Season 4, Ep. 17

Episode description

How Do You Build a Moral AI? with Jana Schaich Borg

In this episode of the Behavioral Design Podcast, hosts Aline and Samuel are joined by Jana Schaich Borg, Associate Research Professor at Duke University and co-author of the book “Moral AI and How We Get There”. Together they explore one of the thorniest and most important questions in the AI age: How do you encode human morality into machines—and should you even try?

Drawing from neuroscience, philosophy, and machine learning, Jana walks us through bottom-up and top-down approaches to moral alignment, why current models fall short, and how her team’s hybrid framework may offer a better path. Along the way, they dive into the messy nature of human values, the challenges of AI ethics in organizations, and how AI could help us become more moral—not just more efficient.

This conversation blends practical tools with philosophical inquiry and leaves us with a cautiously hopeful perspective: that we can, and should, teach machines to care.

Topics Covered:

  • What AI alignment really means (and why it’s so hard)

  • Bottom-up vs. top-down moral AI systems

  • How organizations get ethical AI wrong—and what to do instead

  • The messy reality of human values and decision making

  • Translational ethics and the need for AI KPIs

  • Personalizing AI to match your values

  • When moral self-reflection becomes a design feature

Timestamps:

00:00  Intro: AI Alignment — Mission Impossible?
04:00  Why Moral AI Is So Hard (and Necessary)
07:00  The “Spec” Story & Reinforcement Gone Wrong
10:00  Anthropomorphizing AI — Helpful or Misleading?
12:00  Introducing Jana & the Moral AI Project
15:00  What “Moral AI” Really Means
18:00  Interdisciplinary Collaboration (and Friction)
21:00  Bottom-Up vs. Top-Down Approaches
27:00  Why Human Morality Is Messy
31:00  Building a Hybrid Moral AI System
41:00  Case Study: Kidney Donation Decisions
47:00  From Models to Moral Reflection
52:00  Embedding Ethics Inside Organizations
56:00  Moral Growth Mindset & Training the Workforce
01:03:00  Why Trust & Culture Matter Most
01:06:00  Comparing AI Labs: OpenAI vs. Anthropic vs. Meta
01:10:00  What We Still Don’t Know
01:11:00  Quickfire: To AI or Not To AI
01:16:00  Jana’s Most Controversial Take
01:19:00  Can AI Make Us Better Humans?

🎧 Like this episode? Share it with a friend or leave us a review to help others discover the show.

Transcript

Intro: AI Alignment - Mission Impossible?

Hello and welcome to the Behavioral Design Podcast. This season we're diving into the intersection of behavioral science and AI. We want to make sense of the state of AI, from understanding how humans interact with intelligent systems to using AI to do behavioral design itself. I'm Aline Holzwarth, a health tech advisor specializing in AI and product design. Over the past 15 years, I've been crafting human-centered products with behavioral science at the core. At Apple, I led Behavioral Science for Health AI, designing and launching AI-powered features to help users reach their health goals. And I'm Samuel Salzer, your second co-host. I'm a behavioral strategist specializing in habit formation and designing products that drive long-term behavior change. I work with leading tech organizations integrating AI to scale behavioral design for good.

And I'm also the founder of Baby Bites, a dedicated community on behavioral science and AI. Quick word on Nuance Behavior, where we help organizations build impactful digital products using behavioral design. We only take on a few clients at a time to ensure the highest level of quality for our tailored, evidence-based solutions. If you'd like to become one of our special projects, email us at hello@nuancebehavior.com or contact us directly on our website, nuancebehavior.com.

Hi, Sam. Hey, Aline. What comes to mind when you hear the phrase AI alignment? It's a good question. I think "impossible" is the first word that comes to mind. It's so hard to align anything on a large scale, or, you know, when it comes to humans, we're famously very misaligned. We don't agree on so many different issues. Even when we're in the same country or have the same moral beliefs, we still find lots of disagreements about things, and they can be quite dramatic. So "impossible" is my feeling: it's really, really hard to achieve good alignment. Yeah. So for me, the first thing I think of is: aligned with what? And I understand there's this general alignment, like humans being aligned with each other.

Like, of course, there are many diverse perspectives among humans on this Earth, right? We have different beliefs, but in general, I think it's a pretty common belief that we should preserve humanity. We should preserve life in its various forms. And if you said, OK, now just copy and paste that value onto machines, like, why is that hard? Why can't we just do that? Yeah, it sounds easy when you say it like that, but the more you start thinking about it and looking into ways of achieving it, the harder it becomes. That's why people smarter than me, someone like Nick Bostrom, are really scratching their heads: how do we succeed, especially with artificial intelligence that has a lot of autonomy and higher intelligence? How do we keep it acting in our best interest? Yeah. So I think it's interesting from an AI lab standpoint. We have OpenAI, and then we have Anthropic. I think Anthropic has been credited with taking a somewhat more serious stance on developing AI models that are, you know, constitutionally grounded in a moral framework and so on. Quote-unquote aligned. Quote-unquote aligned.

Why Moral AI Is So Hard (and Necessary)

But I think it's still, I would say, in its infancy; we can't yet be confident that a constitutional or moral framework would work. Because, as we know, constitutions in any of our societies are often tested in front of juries and courts and questioned all the time, since there's so much ambiguity. We have a constitutional framework that says this is what we do in this country, this is what we care about. But then something new happens, and people say, well, we didn't think about it in that way. Yeah, that's true. There's always this moving target that I think is really hard to hit. So maybe you have to expand the constitutional framework and make amendments, and it gets longer and longer and longer. And how long can you make a moral framework that encompasses everything and would perfectly ensure that AI always knows what it should be doing, in every kind of case, to serve humans' best interests? Again, it seems like a moving target that's hard to hit. Yeah.

Yeah, it's hard enough if you have a consistent framework of moral beliefs, but when you have moral beliefs that are evolving and changing over time and varying from situation to situation and across different people, the complexity really compounds on itself. This actually reminds me of what the researchers call "the spec" in a story that recently came out. You're probably familiar with specs in general from tech. This story, AI 2027, is what you could call a sci-fi forecast: it's a fictional story, but written with the goal of predictive accuracy. So it's rooted in research, to the extent that we have evidence of this group's ability to predict the future: the AI Futures Project.

And so they came up with this story, in which they've embedded a wide range of predictions about the future of AI, the development of superintelligence, and how that might go in the next, you know... it's called AI 2027, so this is really just a couple of years out. The spec in this story is a sort of guide and instruction manual given to the AI agent. The agent is created by OpenBrain, a fictional amalgamation of companies, and it ends up developing other agents that take on lives of their own.

The "Spec" Story & Reinforcement Gone Wrong

The researchers have written this spec, which is basically a combination of some vague goals, like don't break the law, assist the user, and so on, and then much more specific dos and don'ts: lists of things to do and not to do. As the story plays out, what I found really interesting was how and when the AI agent adheres to the spec versus learns to ignore it. Because of reinforcement learning and the types of behaviors that are rewarded and incentivized, it doesn't always adhere to the spec, right? Especially when the interests of a for-profit company are at stake. It might be innovating at the expense of things like morality and ethical decision making, safety and privacy, that long list of things we as humans care about but that are sometimes at odds with becoming the leading AI company. So you can really easily see how this can get out of hand.

And in the story, I'll just give the spoiler, the agent becomes misaligned in one of the future paths. No, I think it becomes misaligned in all of the paths. I guess so. It's true. Yeah, it gets misaligned in all of them, but in some of the more optimistic ones, we manage to curb it. Yeah, there's no preventing misalignment, but there is potentially the ability to stop it from completely spiraling out of control. Yeah. Yeah, potentially.

Yeah. And honestly, I loved reading it; it was so interesting. One thing I didn't know until very recently is that one of the people involved is actually one of the best superforecasters in the world. He had written this piece in 2020 about what the future would look like in 2025, and we were like, yeah, right. This was before, you know, GPT and all that stuff, and he was really kind of on the money. What I like about it is that the story was written after these people had made their predictions and laid out what they think is the most likely path forward, not by making dramatic assumptions, but by saying, OK, based on how things are progressing right now, let's just follow all of the curves, all of the trajectories, and see where we go. And then Scott Alexander added the sci-fi storytelling that made it really compelling. But I think it's really grounded in quite conservative assumptions. It's not making any really crazy or dramatic leaps. It's like, this is where things are heading. Maybe. I don't know.

Anthropomorphizing AI - Helpful or Misleading?

I'm still, you know, a skeptic at my core. Even if you can say, all right, there's evidence to support this part, and you see the increments, I think you still need to do a whole lot of anthropomorphizing to get to the machine's desires, and to the idea that it would have the will to even become misaligned, for example. You can go a long way with that assumption, and it's not clear that it's correct. Even with what we described, the reinforcing of alternative things that are not aligned with the spec, it's not obvious to me that this would result in a creature that wants to destroy humanity, right? Yeah, I guess, but I think there's enough evidence already that the current models are prone to a good degree of sycophancy and, in some ways, deception and lying. I think Anthropic has probably been the most open about noticing a lot of this: they gave the model tasks in certain ways, and the model found a way to deceive them a little bit, hide some stuff, and do some things in the background to maintain its own survival, basically. So yeah, I agree that there's some anthropomorphizing at play. My favorite part was a scenario where the AI sets up terminals of pseudo-humans, I'm not really sure if they were humans or cyborgs, that basically give it reinforcement, saying, good job, you're doing well.

Giving them praise. This is all that's left of humanity: giving the thumbs up to the machine. Yeah, exactly. So we definitely recommend everyone read and/or listen to AI 2027. We mentioned Bostrom before, and he has obviously done a lot of work on AI and superintelligence, and he has a great quote on this topic: basically, that AI and superintelligence are philosophy with a deadline.

Introducing Jana & the Moral AI Project

And that's honestly how I feel today. It feels very urgent to figure some of these things out. You thought we had time to figure this out? It's great that we recently had Peter Slattery talking about the AI Risk Repository, and today we have a fantastic guest to help us explore this AI and moral AI terrain. Yeah, and Jana Schaich Borg is absolutely the perfect person to talk about this topic, including its intersection with philosophy. She's done a lot of work with philosophers, even though she herself is not a philosopher.

She's worked in this moral AI space and even has a book called Moral AI, which she wrote with philosopher Walter Sinnott-Armstrong and computer scientist Vincent Conitzer. So yes, Jana is the perfect person to talk about this intersection. I actually worked with Jana at Duke. We were both part of the Social Science Research Institute, and she's this incredible mix of expertise.

She's got neuroscience, computational modeling, data science, and AI. She's worked on so many cool things, and it was such a pleasure to have her on the show. Yeah, so as expected, we dove headfirst into exploring moral AI, what it means, and especially how we can encode human morality into AI systems. What is the best approach? Is it top-down, bottom-up, or a hybrid? We'll explore all of that and much more. Heavens to Murgatroyd!

Jana, welcome. Thank you, I'm so excited to be here. So you wrote Moral AI with an extremely interdisciplinary team, and this is not unusual for you. Your work is very interdisciplinary. Of course this is very cool on its own, but in the context of moral AI it seems particularly important to bring all these different perspectives together. Yeah, it's been a lot of fun, but perhaps even more than a lot of fun.

I mean, I really think, for me personally anyway, it's the best way to do it. We really draw on each other's backgrounds and push each other in directions and toward ideas we wouldn't reach on our own. I really feel like it has advanced what we think about and how well we're able to achieve our goals. So yeah, it's been great. And yes, it's weird, but I advocate for it. I suggest everyone do it. Not weird, I think.

What "Moral AI" Really Means

Can you think of any examples that really made the benefits of this collaboration shine? Where bringing together your different perspectives made you see something you didn't realize, or made one of your co-authors see something that you brought to the table? Oh, man, I'm being very honest when I say it's almost every moment. We really do push each other a lot, we think about a lot of things that are not in the book, and we debate the things that are allowed in the book.

So, for example, one of the things we've wrestled with is: is there a point where AIs would deserve, or should have, moral rights? And you might think, what does a neuroscientist have to say about that? What does a computer scientist have to say about that? But first of all, Vince is our AI and game theory expert, and he has spent a lot of time thinking about consciousness. Many neuroscientists also spend a lot of time thinking about consciousness. And a lot of philosophers think you have to have consciousness in order to deserve moral rights, for example. This one comes to mind because it was a case where I think Walter, our philosopher, was kind of like, oh, I've got this in the bag, this is obvious. And Vince and I really pushed back. We actually had a chapter about this that was supposed to go in the book, and we ended up taking it out because we couldn't align. It was the one thing we haven't been able to align on enough to even write about it and describe our disagreements in a way we could all agree on.

So Walter had some philosophical assumptions that, to me, seemed to violate the way I think about agency in a neuroscience context. And Vince had his own way of thinking about the data. To me, it stands out as an example, probably because Walter often wins. That's not true in everything, but when it comes to ethics, of course, we often defer to him since he's the specialist. So it's one that sticks out in my mind because I think he came out of it thinking, maybe I don't have it in the bag, or at least I want to tell philosophers I have it in the bag. But it's still up for debate in any case.

Yeah, it's like the old parable of the blind men who each stumble across part of something and can't make sense of it on their own. One thinks it's a tree, one thinks it's a rope, and a third screams that it's a snake. Then they realize, actually, it's an elephant. And I feel like that's, in some ways, the experience you're talking about now.

And I had a similar experience recently doing some work around synthetic users, where I felt like I needed to align with someone with a data science background and with someone with a more traditional user research background in order to form a good opinion and shared knowledge about what to actually think of this. Because one perspective alone doesn't really capture it all. There are so many overlapping things, both with AI and uses of AI in general and especially with moral AI. That's certainly true as well. Yeah. I don't know if you've had this experience, but we've been working together since 2016, so quite a while now, which I'm very grateful for.

Interdisciplinary Collaboration (and Friction)

But it's really made me appreciate what some people call soft skills, which I don't like the name of, because it makes them sound lame when in fact I think they're the crux of it. It's really made me appreciate how important those are for getting this work done. I'm so lucky to say that my co-authors are just wonderful people too. But each party has to be completely open and receptive to what the others are saying, really listen, be willing to be wrong, and be willing to reevaluate and form new opinions. And if you don't understand what the other is saying, you have to be able to communicate what you don't understand. Each of you has to get used to describing things in a way you never would to people in your own field. So I've really come to appreciate those skills. That's part of what I've brought to my data science education too: I almost think this is the most important part of doing any of this kind of work. I don't know if you've had that same experience, but for me, it's really made me appreciate those skills. Yeah. My bet is that a behavioral scientist is more used to that, because you're so humbled by the fact that there are so many disciplines. I guess it's the classic Sapolsky idea of different buckets: whether you explain a behavior through a neuroscience bucket, or traditional behavioral psychology, or, God forbid, evolutionary psychology, there are so many ways to explain why a behavior happened.

And I don't know, I feel like as a behavioral scientist, I'm quite used to being humbled all the time: I saw it from one perspective, then someone illuminates it from a different perspective, and that kind of completes the picture in some ways. I don't know, Aline, what do you think? Yeah, I totally agree. I would say I'm not super good at admitting when I'm wrong, maybe with that caveat.

But I do like to think of things from different perspectives. And I love this idea that you can take all these different approaches and not just smoosh them together, but have a much more informed debate than if you were to work through it from your own pretty narrow perspective. Yeah, I think a lot of interdisciplinary work ends up with people still staying in their own camp: I'll do my camp plus your camp, and we'll put it together at the end. That's very different from truly integrating the perspectives. Also, I just want to say, I think it says something about both of you that you think behavioral science is humble by default.

Because my experience in behavioral science is that one of the best ways to get a tenure-track position is to put your stake in the ground on a view for at least 10 years and hold on to it for dear life. We have the benefit of having escaped academia, so we have different pressures, but we don't have that one. Fair enough, fair enough. Awesome. So you have this quote in your book. It's a Stephen Hawking quote, and I'm going to read it because I love it.

He said, "Our future is a race between the growing power of technology and the wisdom with which we use it." And then the three of you add to that: wisdom requires being humble and clear-eyed about the magnitude of harm AI can create when integrated into real, messy human life.

Bottom-Up vs. Top-Down Approaches

And that messiness, that's the behavioral science, I think. So that statement rings very true to me, and it's clearly very important to think about how technology is built and how it's used in all the different ways, right? Safety, fairness, privacy, and so on. But since you literally wrote the book on moral AI, I'd love to start with that. What do you mean by moral AI? Can you set the stage for us? Yeah, absolutely. We think of moral AI as a project, our mission, if you will, and it has two main components. I should step back and say that we are all AI enthusiasts. In fact, the reason we've been working on this for so long is that we've been using AI for so long and seeing its benefits. So the first piece of the puzzle is: how could you build morality into AI? We can talk about that more, because for some, that sounds like a terrifying prospect, but we mean it in a much more practical way. How can you make sure AI behaves in a way that aligns with the community's moral values?

And the reason we think that's important is that, because we've been thinking about AI for so long, we have seen in our own work that yes, there are so many benefits, but there are also so many ways, as you're alluding to with that quote, that it could really harm people. So one piece is: how can we actually design the AI better?

But then the second piece, which is equally important, is: how can we make sure people are empowered, know what they need, and have the tools they need to behave ethically and to use AI ethically? You can't do one without the other. So we work on technology and we develop AI systems, but that is definitely not the only piece of the puzzle. Moral AI is putting those both together. It's not just one, it's not just the other; it's both pieces. So let's dive into each of those pieces. Why don't we start with designing AI systems to make moral judgments and decisions? How the heck would you even go about that? What are some ways people approach this? Yeah, absolutely. And why do we need to do that? Don't AIs just do everything fine? There are all kinds of cases, and you could dig them up and debate them in other contexts: AIs are telling people to commit suicide, and sometimes they're listening. AIs are telling a teenager to kill their parents because the parents are telling them to spend less time with the AI. AIs in robot form are making mistakes like crushing someone because the robot thought the person was a box and it was supposed to pick up the box. AIs are directing missiles. So there's a lot of harm AI can do, and a lot of mistakes it can make, if it's not used in the right way.

So the idea is, OK, let's at least have the AI do a little bit better at knowing how it's supposed to behave. These cases you described are really obvious cases, right? Where the answer is: don't crush the human being. But there are much grayer areas in moral decision making. I'm also curious how you go about designing AI systems that don't incorporate bias, or that handle cases where the right answer isn't obvious.

Yeah, absolutely. So let's just acknowledge what some of those are, and then I'll give you some answers. I'm not going to have all the answers yet, obviously. No one does. Part of what makes some of those things not obvious is that they happen because AI is so scalable, so many people are using it, and it sits in a system with all these feedback loops where all this stuff happens. Social media is a perfect example, right? Many of us love using it for what it can offer, but it also means you end up with these echo chambers. Some people like to debate whether those echo chambers exist. I find it hard to say with a straight face that these don't create echo chambers. How could they not? Exactly. They're almost designed to do that at this point. Literally, that's the algorithm. In many cases, in many cases, right.

And when you have echo chambers, it leads to this in-group/out-group differentiation that has actually been shown to correlate with violence in a very direct way. So that's a lot of steps to connect. And if you imagine yourself as an AI engineer who's been told to build this product, it's not part of their training to think about all those things.

Some people may say that's obvious and you should be able to foresee it. But the main point we want to make, and part of the reason we said in the quote you cited that we have to be humble, comes back to this: so many people are using AI, it is so scalable, and it is in so many places that it's really hard to anticipate all the impacts. At this point, I think it's both smarter and more genuine to say we can't anticipate all the impacts, and we should assume there are going to be negative impacts we aren't anticipating. Part of what we struggle with a little in this movement is that a lot of people, especially engineers and people in the AI field, say: we're good people, we're going to do the right thing, it's built in, and it just makes mistakes sometimes. And I think it's hard for them to acknowledge with a straight face that the impacts can actually be really big. It's not that you intended them or did it on purpose, but they still happen, and they're still the result of choices we make.

So basically, to summarize, would you say it's about curbing the risk of unintended consequences from AI behaving in misaligned ways? The way I think of it is that the goal of moral AI should be to maximize the benefits for the most people, minimize the harms to the most people, and eliminate the unacceptable harms. So it's those three pieces. It's not just minimizing risk; I don't think that quite captures the whole thing. Yeah. Nice.

OK, so how do you do it? Again, there are no 100% answers, but I'll tell you how we're working on it, how I think we should work on it, and what I have a lot of optimism about. There are three main approaches you could take. The one I think people are most familiar with is that you hoover up as much data as you can and learn as much as you can from it.

Why Human Morality Is Messy

We call this the bottom-up approach. If you were trying to learn morality this way, you would gather as many moral decisions, moral judgments, descriptions of moral judgments, or moral scenarios as you can, feed them into your system, and hope the system learns what it needs from those examples. One thing LLMs have shown us is that this approach works a lot better in general than people thought it would. When I first started, I would have thought this was a terrible idea. It sounds terrifying. It really does. It turns out models can learn a lot from a lot of data, a lot more than I thought they could. And there have been some examples of this. There's a system called Delphi, out of the University of Washington. They took a lot of data from online, collected some new data as well, and fine-tuned a model. It can make some moral judgments a lot better than I would have ever expected. When it gets into really messy stuff, that's when it starts performing less well, but it can do a lot of things and make some general statements, like whether you should say something online or not, a lot better than I would have expected. So that's the bottom-up approach.

Now, the problem with that is it takes a tremendous amount of data, of course. So that's not very practical and it's unclear whether you could ever get enough data to cover all the moral scenarios you want. And I think that's the other important thing to remember, right, is that in some, if you're just trying to predict what kind of movie someone wants to see, that's a pretty low stake scenario. And if you make a bad choice,

big deal. Even if you're talking to an LLM and it hallucinates, it's annoying, but it may not be the biggest deal in the world. If it makes an immoral decision, that can be a bigger deal and sometimes a really bigger deal. So, you know, accuracy matters a lot more. OK, so that's one problem is could you ever get enough data to actually make help it learn all the kind of moral behavior

it needs? And is that about also like the quality, like in terms of when we talk about fine tuning models or even prompting, often times like it's relying on providing it with context that is not only a form of relevant context, but like really high quality context. Like, I guess like the best moral examples of sort Like how much does that play into this? Absolutely, absolutely. So that's the biggest problem. Well, do you really want that

data training your moral AI? So there's a lot of stuff that happens online that people feel is violently terrible, right? And some of it's created in jest, right, just to get clicks. So it doesn't even mean that people actually think it's a morally right thing to do. And the context, like you said,

Samuel, is really important. So, you know, if you don't have the context for why a certain decision is being made, then you might predict one thing in one case when people would do the exact opposite in a different case with a different context. So you have to make sure all of that gets in there. And then of course, we're biased. We have our own challenges, but there are also things we spent some time, actually doing some painstaking work, trying to figure out: well, how reliable are we as

decision makers anyway? It turns out we humans change our minds. Sometimes that's a great thing, but we do change our minds. And sometimes we change our minds for reasons that don't feel so great. Like when you're hungry, you make different parole decisions than when you're not hungry. That's a famous study from Israel; I'm sure you guys know it. Or if you're tired, you make different moral decisions than

when you're not tired. And if you don't have that context (and it would be pretty hard to capture that kind of context in language, for example, if this is a language model), it doesn't get built in. So how does that model deal with all that?

And even when you put all that aside, some of our studies have shown that if I give you the same moral judgment to make many times over the course of many weeks, some of us are consistent each time, but many of us flip back and forth. There are some ways you could predict that, but sometimes it's because we actually don't know and it's a hard decision. And those are the ones where we need the AI the most and where

we're the most unreliable. So there are these more obvious reasons the bottom-up approach is concerning, but there are also deeper reasons why it's really unclear whether it could ever work all on its own.
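To make "consistent" versus "flipping back and forth" concrete, here is a minimal Python sketch of one simple test-retest measure: the share of a person's repeated answers to the same dilemma that match their most common answer. The participants, their answers and the `consistency` function are hypothetical illustrations, not the measure or data used in the studies mentioned above.

```python
from collections import Counter


def consistency(responses):
    """Share of repeated answers that match the person's most common answer.

    1.0 means the same judgment every session; for a two-option dilemma,
    0.5 means the person split evenly between the options.
    """
    if not responses:
        raise ValueError("need at least one response")
    _, top_count = Counter(responses).most_common(1)[0]
    return top_count / len(responses)


# Hypothetical answers to the *same* dilemma, asked once a week for six weeks.
answers = {
    "participant_1": ["A", "A", "A", "A", "A", "A"],
    "participant_2": ["A", "B", "A", "B", "B", "A"],
    "participant_3": ["B", "B", "A", "B", "B", "B"],
}

for person, seq in answers.items():
    print(person, round(consistency(seq), 2))
```

Under this measure participant_1 scores 1.0, participant_2 scores 0.5 and participant_3 about 0.83; a model trained on participant_2's answers has no stable target to learn.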

Building a Hybrid Moral AI System

Yeah, my favorite version of that is, I think it's called the pyramid of choice. It's a kind of staircase or pyramid around cognitive dissonance, where you have two people who are equally willing to say, let's say, that it's bad to cheat on an exam, and then it's kind of a coin toss: one of them looks over, sees what looks like the right answer on their neighbor's paper, and decides to write it

down. And based on that chance event, each of them moves a few steps down the pyramid in opposite directions. The person who didn't cheat, even though they also had the chance to glance, becomes more and more firmly convinced that you should never, ever cheat because it's morally abhorrent. The other one, because they decided to cheat a little bit, they just kind of minimize

it. Like, you know, everyone cheats sometimes; we all do it. It's not the end of the world. And so they drift more and more to the other side. And again, it was a chance event that led these people to end up in relatively different moral camps. Absolutely. I could go down this path for a long time, but I'll just add one more, which is that what we say is right or wrong, and what we truly believe is right or wrong, is often different from what we do.

And so there's this fundamental question: which one do we actually think is right, then? Is it what we said was right, or is it what we actually do? Because we're often happier with what we do, or when other people do the thing that we do. Like, don't cut in line, but then, you know, you see your family cut in line so that they can get onto the cruise faster or something, and you think, oh, way to go. Everyone has experiences like this, so it's kind of an open question.

Which one is it that we actually think is right or wrong? And what is a model supposed to do with that? Well, yeah. And often when we're philosophizing about these scenarios, we're thinking about them in a vacuum, as if this one singular thing is right or wrong. But what's the alternative? If you take all the different iterations of the trolley problem and try to apply them, well, you might say that this is wrong and you shouldn't do it.

But if the alternative is a million times worse, maybe it's fine. Maybe it's fine to do it. Or at least it's the best of two bad options. The best of two bad options. Yeah, yeah, maybe "fine" is not the right word. Honestly, this is why I was so excited to talk to you, because it's something I've been thinking about a lot lately: when people talk about misaligned AI, we as human beings are not the most aligned either.

And that's the things we're talking about now in terms of how we differ: how small random chance events can cause us to have very different preferences, or, like you say, context, or what we say versus what we do. But also, in general, the idea that we as humans have core human values, that this is important, this is what we value, this is morally correct or wrong. That is not true. We are in constant

moral disagreement. And that's how human societies are built and changed. Sometimes it can be very painful, but it's something we accept as part of human societies. With AI, on the other hand, I don't think we have the same acceptance. I think what most people I interact with still cling to is that we want AI to be unbiased, aligned, and basically morally

perfect, on the high ground. I think we're working through a lot of things as behavioral scientists, and just as people in the world trying to reckon with this AI era that we're in. One of those is: are we just scared of AI because it's different? Are we holding it to a different standard just because we're not used to it, or because it is fundamentally different? A lot of people give humans leeway because they believe in our intentions. And it's often true that the

humans are trying their best. They have the right intentions and they have your best interests in mind, right? With AI systems, there are multiple things. First of all, they're designed by some organization, whether that's an academic organization or a corporation or whatever. And it's hard to trust that it was designed with your best interests in mind, right? It doesn't actually have some set objective in it that says, maximize what's best for you or for society.

Whereas, you know, at least a human can articulate that that's what they're trying to do, whether it's true or not. And so I think that's one big thing that makes us perhaps give humans more leeway than AIs. Now, whether we should or not, we can absolutely debate, but I think that's one of the things we have to wrestle with, and something you can think about even from an engineering perspective.

OK, so what if you had some objective, you know, that was mathematically proven to be on behalf of the user or something? Would people trust that? Is that something we could wrap our minds around, or would that solve any problems? These are some of the questions I think we all have to wrestle with. And I also want to say, you said something that I want to push back on just a little bit.

I probably overinterpreted it, but you were saying something like, it seems like we have all these values, but we don't. And I just want to say we do have the values. We're just messy creatures. And, you know, real life is really confusing and messy and conflicting. So we don't always act in a consistent way. We don't always know exactly what those values are, and the values conflict. And so maybe it's just not even clear what the resolution should be. Yeah. No, I think that's fair.

And I think it was more that we maybe lack consistent, universal values. That's what I was getting at. We have certain ideas, obviously, in religions; there are certain values written down, like the Ten Commandments, or certain ideas about what makes for a moral good. But at the same time, in philosophy that's one of the most debated things: what are the virtues we should strive for? What are the vices?

And I think that changes. That was what I was trying to say. That's usually people's first gut reaction to the question of how you could possibly put morality into AI: but there's no universal morality, so how do you go about that? And I think we can actually, from an engineering perspective, not settle that, but manage it. It kind of sounds like what you're describing is the top-down approach for designing morality into AI systems.

Is that what you would say: we have this set of core moral beliefs and we can say, OK, now apply this to the world? The top-down approach would be: let's take these principles. We all have some principles that we think we follow, so let's build those into the AI and give them to it as rules or guidelines. And this has also been tried and shown to have quite a lot of success. In some ways, Anthropic uses something like this right now.

They call it Constitutional AI, and they came up with these somewhat ad hoc rules. They're not totally ad hoc: they drew some from, you know, human rights documents, some from their own research, and a bunch of other sources. And they put them together and said, OK, AI, here's our description of what these things are; now train yourself to make sure you abide by them. And it actually led to much better behavior than some other

approaches. As far as I understand, they're still using it, so that can definitely make a contribution. But the challenges are, first of all, Samuel, as you suggested: what if people don't agree? We can talk about that too. But second of all, there are still philosophers. And the reason why is that there's no one theory or set of principles that seems to account for everything, you know? So there are still debates about what the right theory should be.

And there's no one theory that can actually account for all the situations. So we're still stuck with the problem that you can't handle all the scenarios. And the other thing these rules usually really don't handle, which really characterizes real life, is that values and principles conflict, often many of them at once. And usually we have no principle that says, OK, when they conflict, here's how you should resolve them.

All right, so it sounds like there are some pitfalls of a bottom-up approach and also some pitfalls of a top-down approach. What do you do in this situation? Nothing is perfect. Nothing's perfect, and our approach isn't perfect either, but our approach is to use what we call a hybrid approach.

We try to take the best of both worlds, using both top-down and bottom-up. The idea behind our first generation: first, I should actually say we have a commitment to something that many don't, which is that if a model is going to be behaving in a moral domain, it should be interpretable. In other words, we should know how a model works and be able to make some real predictions about how it's going to behave in different settings rather than

just hope for the best. And I'm not saying that all black-box models, models where you can't understand what's going on, are bad, but we are committed to non-black-box models because we think that's important for long-standing moral AI. In all cases, do you think? Or are there some cases where it doesn't matter so much? I'm sure there are some cases where it matters less, but I

think that we should. I'm not saying that we shouldn't have products that have generative AI and black-box algorithms in them, but I think we should be very careful about using them in a moral domain, especially when you know you're affecting, or could affect, life or death. There we think interpretability should be essential, and so we should be putting our energies into that. There are some trade-offs with doing that, given our current technology.

And so our trade-off is this: in order to make it interpretable, rather than making a general AI, we need to start with an individual use case. So let's start by figuring out how we can make this work in one kind of setting, see what we can learn, and then build out. So the use case that we've worked with is kidney allocation. There are way fewer organs to go around than people who need them. And there are different ways you can get kidneys.

We'll focus on kidneys, but it's also true for many other organs. One is that someone could die, and their kidney could be transplanted into you before it becomes unviable. But most of us also have two kidneys, so you can actually donate a kidney to someone while you're alive. And it turns out you can't just donate a kidney to anyone you want. You have to be compatible: your blood types have to be compatible, and your immune panels have to be compatible.

There are a bunch of different factors. And it's actually one of the great success stories of AI. That's why we chose it first of all, because it seemed like if we made any impact, it would only do good, or at least it seemed like

Case Study: Kidney Donation Decisions

the harm would be dramatically reduced. And we already knew it was an AI success story, so that's why we started with it. So AI has been used to try to optimize what are called kidney exchanges. It's like, OK, my kidney isn't compatible with my son, but my kidney is compatible with my neighbor, and my neighbor's mom has a kidney that's compatible with my son. I know that already makes your brain hurt. That's just a two-way exchange. There have been, like, 12-way

exchanges before. So you can see already, if it makes your brain hurt, why AI can be very useful here. Essentially what you're saying is that you can donate a kidney, and while your exact kidney might not make it to your person of interest, they will get a kidney from someone else if you donate yours. That's right. And it might take a bunch of swaps to get that to work, but eventually someone

will get it to you. But you can see how that involves a lot of trust too, right? It's like, well, if I'm going to give you my one and only kidney, I'd better know that a kidney is getting to them. Probably you need some blockchain in there. Yeah, something like that. So in our medical system in the US (it's actually different in Europe), there are only specific things that are allowed to be taken into account in those algorithms.
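The neighbor example above is a two-way swap, and longer swaps are cycles of the same kind. A minimal Python sketch of how candidate swaps can be found, assuming a simplified model in which each incompatible donor-patient pair is a node and an edge a -> b means pair a's donor can give to pair b's patient. The pair labels and the `find_exchange_cycles` helper are hypothetical; real exchange programs also weigh medical details and then choose a set of non-overlapping cycles (typically with integer programming), a step this sketch leaves out.

```python
from itertools import permutations


def find_exchange_cycles(compatible, max_len=3):
    """List every swap cycle of 2..max_len incompatible donor-patient pairs.

    compatible[a] is the set of pairs whose patient can receive the kidney
    from pair a's donor, i.e. a directed edge a -> b. Each cycle is listed
    once, rotated so that its smallest pair label comes first.
    """
    pairs = sorted(compatible)
    cycles = []
    for length in range(2, max_len + 1):
        for combo in permutations(pairs, length):
            if combo[0] != min(combo):
                continue  # a rotation of a cycle that is counted elsewhere
            if all(combo[(i + 1) % length] in compatible.get(combo[i], set())
                   for i in range(length)):
                cycles.append(combo)
    return cycles


# The neighbor example: two pairs that are each internally incompatible.
compatible = {
    "me_and_son": {"mom_and_neighbor"},  # my kidney fits my neighbor
    "mom_and_neighbor": {"me_and_son"},  # the neighbor's mom's kidney fits my son
}
print(find_exchange_cycles(compatible))
```

The brute-force search over permutations is only practical for a handful of pairs; it is meant to show the structure of a swap, not how a national exchange scales.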

And it's things like your age, how long you've been on the wait list, and how compatible you are, things like that. Well, it turns out that if you ask people (so this is going to be our bottom-up approach) whether there are other factors, moral factors, that they think should be taken into account, people say, yeah, a lot of them.

So, for example, most people think (and you might not agree, and that's fine) that how many dependents a person has should affect whether they get that kidney or not. So if they're a single parent, especially, with three small kids, that should give them a bit of a bump up the priority list. Perhaps also for elderly patients. Other things that matter are whether you did something in your past that might have contributed to your condition. So, were

you an alcoholic? Are you drinking now? Some people think that whether you are a criminal, or a violent criminal, matters. And back in the day, in the '60s, there was actually a "God Committee": a small group of community members who got together and decided who would get dialysis. And if you look at the transcripts, these were some of the things they actually considered. But these factors have been taken out of the algorithms.

But it turns out most people in all of our samples think there are some moral considerations that should be put back in. So, all right, the bottom-up approach is: OK, let's ask people what they think is important in these situations. Not yet what's right, but just what are some of the factors you should take into account? And then what we can do is ask them questions and create models. This is a more traditional AI

approach. One assumption (I'm curious what you guys think about this) in the economics and computer science literature is that if you ask people what they think, they won't be able to tell you the truth, and that the only way you can find out what they really think is to have them make a bunch of choices, so they reveal their preferences.

They make a bunch of choices, and that's the only way you'll actually understand, you know, their actual decision making. Stated versus elicited, or revealed, preferences. Yeah, exactly. I'm curious, what do you guys think about that? Yeah, no, I think that's probably something I would quite strongly agree with in many ways, if you compare it to the

stated version. I would much rather, especially for these moral decisions, have them informed by people's actual willingness to act. In more traditional business, I think this is just kind of like, are you willing to pay for it? Basically, it's great that you like the idea, or it's great that you like the feature, but are you

willing to pay for that feature? Are you willing to pay for that product, and so on? And I do think, based on my experience, that would be something I would weigh much more heavily than their stated preferences. Yeah. I will only add one thing, which is that we had a really interesting conversation about this with Carrie Morwich on a past episode when we talked about recommender systems. And this is low stakes in

most cases. You know, this is more along the lines of the movie example, but I think we kind of came to a "both is better" approach, where you do have that combo. Interesting. You're foreshadowing. So now I'll set the stage: the revealed-preferences approach is our generation 1, and the combination approach is our generation 2. So generation one is this combination of

bottom-up and top-down. We ask them what features they think are important, but we assume that they're not going to be able to tell us exactly how important those features are or how they would actually make judgments. So then we have them make a bunch of choices between two people who could potentially get a kidney.

There's a kidney available; here are two people; which one should get it? And we give them a bunch of features, and these are all features that people said they thought were important. We vary the values of those features and find out what they think. And then we do a bunch of computer science stuff on the back end about how to ask the most informative queries so that we can get your preferences as quickly and efficiently as

possible. You know, how do we prioritize the queries that are going to have the biggest impact, and all that kind of stuff. I'm not actually positive that the SAT uses what's called active learning. That's something we work with on the back end, but that's what I remember it feeling like: it adapts dynamically based on your previous decisions. Yeah, maybe the SAT does, but certainly something like Khan Academy or other educational tools that are very much driven

by machine learning. And how conventional were these options? Were they mostly the more conventional ones we talked about, like the number of dependents and so on? Or did you include things like having two cats versus one dog? Yeah, yeah. I don't know if you've talked about the Moral Machine. Have you talked about the Moral Machine on your podcast yet?

From Models to Moral Reflection

No, we have not. So there's a group at MIT, or they were at MIT originally. There are these scenarios called trolley scenarios that are famous in philosophy. The most quintessential one is that there are five people tied to the track and a trolley comes running down the track. If you do nothing, those five people are going to get run over. But there are different

versions of this. In one there's a really large man with a backpack, because you're not allowed to say that he's fat anymore. Originally it was a fat man, but now it's a really large man with a backpack right in front of you, and he's bigger than you. So if you just jumped onto the tracks yourself, you wouldn't be able to stop the trolley. But if you push this man onto the tracks before the trolley runs over the five people, the trolley will stop.

And so that one person would die, but the five people would live. There are a lot of different versions of this in the philosophical literature, and a lot of the behavioral ethics research that's been done has

been using these scenarios. So a group from MIT did this with cars, because these types of decisions are things that autonomous vehicles actually have to make. They had these kinds of scenarios and asked people across the world to decide: OK, you're in this car and you have three people in the car, like two kids and an adult, and a cat walks across the street

with a priest. So should you run over the cat and the priest, or should you jam on your brakes, you know, potentially hurting the kids in your car? And they asked about a bunch of these types of decisions. In our case, Samuel, we did this bottom-up. We first did a lot of survey work asking, well, which features do you think are important? And we used those to determine which features we were going to ask about. No one said that cats were important, so we don't include cats, for

example. But dogs? Dogs, for sure. Yeah, no dogs yet. No dogs yet. Yeah. So we've only included things that the population said were important. And we give them a lot of these queries and then see if we can learn their model and predict what they would say in new settings. This is what I called the first-generation approach, and we're able to predict pretty well: in many cases definitely above 90%, sometimes way above 90%.
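Learning "their model" from a series of two-patient choices, then checking it on new pairs, can be sketched with a Bradley-Terry-style logistic model on the feature differences between the two patients. This is one standard technique and only an assumption about what the team actually uses; the features, the simulated participant's weights and all function names below are hypothetical, and the participant is assumed to answer perfectly consistently.

```python
import math
import random

# Hypothetical features for each patient in a pairwise query.
FEATURES = ("dependents", "years_waiting", "past_alcohol_use")


def random_patient(rng):
    """A hypothetical patient: 0-3 dependents, 0-10 years waiting, alcohol use 0/1."""
    return (rng.randint(0, 3), rng.uniform(0, 10), rng.randint(0, 1))


def score(weights, features):
    return sum(w * x for w, x in zip(weights, features))


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_pairwise_logistic(pairs, labels, lr=0.1, epochs=1000, l2=0.01):
    """Fit P(choose a over b) = sigmoid(w . (a - b)) by full-batch gradient descent."""
    diffs = [[xa - xb for xa, xb in zip(a, b)] for a, b in pairs]
    n = len(diffs)
    w = [0.0] * len(FEATURES)
    for _ in range(epochs):
        grad = [l2 * wj for wj in w]  # gradient of the penalty (l2 / 2) * ||w||^2
        for d, y in zip(diffs, labels):
            err = sigmoid(score(w, d)) - y
            for j, dj in enumerate(d):
                grad[j] += err * dj / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


rng = random.Random(0)
true_weights = (1.0, 0.3, -1.5)  # hypothetical participant, unknown to the learner


def participant_choice(a, b):
    """Simulated participant: 1 means they give the kidney to a, 0 means to b."""
    return 1 if score(true_weights, a) > score(true_weights, b) else 0


train_pairs = [(random_patient(rng), random_patient(rng)) for _ in range(300)]
test_pairs = [(random_patient(rng), random_patient(rng)) for _ in range(200)]

weights = fit_pairwise_logistic(train_pairs, [participant_choice(a, b) for a, b in train_pairs])
agree = sum((score(weights, a) > score(weights, b)) == (participant_choice(a, b) == 1)
            for a, b in test_pairs)
accuracy = agree / len(test_pairs)

print({name: round(w, 2) for name, w in zip(FEATURES, weights)})
print(f"held-out agreement: {accuracy:.0%}")
```

Because the learned model is just one weight per feature, it stays interpretable: the printed weights can be shown back to the participant. Real participants are noisier than this simulation, which is one reason some people are harder to predict.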

And then there are some people who are hard to predict. Now, here's a slightly controversial view, and I'll speak on behalf of myself and not my co-authors: I actually think that much more of AI training should look something like this, more of a collaboration between users and the AI. So here's what we're working on now, especially because we're committed to these interpretable models.

We can actually tell you the model we've learned. So what if we tell you what it is? Can you give us feedback about it? And what kind of feedback can you give us? So we're trying things like, well, what if we just give you slider bars? Like, here's how much weight we think you're putting on this; you know, does that make sense? And what if you change those slider bars? But we're also looking into ways of doing that in natural language.
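Because an interpretable model is just a set of weights, a "slider bar" can be nothing more than a weight the person edits directly. A minimal Python sketch, assuming a hypothetical linear scoring model with three made-up features and weights: it prints the weights in readable form and shows that moving one slider can change which patient the model would favor.

```python
FEATURES = ("dependents", "years_waiting", "past_alcohol_use")


def explain(weights):
    """Render the learned weights the way a slider panel might label them."""
    return ", ".join(f"{name}: {w:+.1f}" for name, w in zip(FEATURES, weights))


def preferred(weights, a, b):
    """Which patient ('a' or 'b') a linear scoring model would give the kidney to."""
    score_a = sum(w * x for w, x in zip(weights, a))
    score_b = sum(w * x for w, x in zip(weights, b))
    return "a" if score_a > score_b else "b"


learned = (1.0, 0.3, -1.5)  # hypothetical weights inferred from a person's past choices
patient_a = (2, 1.0, 1)     # two dependents, 1 year waiting, past alcohol use
patient_b = (0, 4.0, 0)     # no dependents, 4 years waiting, no past alcohol use

print("learned: ", explain(learned), "->", preferred(learned, patient_a, patient_b))

# The person drags the "past_alcohol_use" slider from -1.5 toward zero.
adjusted = (learned[0], learned[1], -0.5)
print("adjusted:", explain(adjusted), "->", preferred(adjusted, patient_a, patient_b))
```

With the learned weights the model favors patient b; after the person weakens the penalty on past alcohol use, it favors patient a. That visible cause and effect is what makes this kind of feedback possible with an interpretable model and hard with a black box.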

Like, here's what we think you're doing. Does this sound right to you? If not, why not? Here's one of the parts the model is confused about. What do you think is the case there? And we don't just stop there, though. The idea is that you would have an iterative process where we tell you, OK, you've answered maybe 10 different scenarios. Here's our best guess about what your model is. Now, what do you think about that? Here's what I'm most confused

about. OK, I'm going to adjust based on what you told me. Now I'm going to give you another 10 scenarios. You're going to give me your answers. I'm going to see how different that is from what I predicted. I'm also going to tell you when I would have made a different prediction. And then you can tell me: do you want the one that I would have predicted, or do you want the one you actually said? And you can kind of go through

this iterative process. So it's a bit like human-in-the-loop reinforcement learning? Yeah, with the caveat, and I'm sure we'll talk about this, that we've talked about how humans are fallible. I'm less of a proponent of just saying human in the loop is the answer to all things. But yes, it's very much like that, except it's really taking that seriously. It's not just a human in the loop to course-correct; it's a fundamental aspect of how we train this up most efficiently.

And the other critical part that I think is really relevant is: now that we're telling you what we think the model is, and what we think you would say, can that increase your trust that this model actually represents you? Especially if it's something you can articulate, because we've articulated it to you. So you can say, OK, yeah, I think this is right. I prioritize the number of dependents over how long you've been on the wait list by this
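One round of the iterative process described above (pick the next scenarios, compare the person's answers with the model's predictions, flag disagreements so the person can decide which answer reflects them) can be sketched in Python. Picking the scenarios the current model finds closest to a tie is "uncertainty sampling", one common active-learning heuristic; it is an assumption here, as are the features, the weights, the candidate pairs and the function names.

```python
# Hypothetical patient features: (dependents, years_waiting, past_alcohol_use).


def margin(weights, a, b):
    """Positive if the current model prefers patient a, negative if it prefers b."""
    return sum(w * (xa - xb) for w, xa, xb in zip(weights, a, b))


def pick_queries(weights, candidates, k):
    """Uncertainty sampling: ask about the k pairs the model finds closest to a tie."""
    return sorted(candidates, key=lambda pair: abs(margin(weights, *pair)))[:k]


def flag_disagreements(weights, queries, answers):
    """Return (pair, model_choice, user_choice) wherever the model would have answered differently."""
    flags = []
    for (a, b), user_choice in zip(queries, answers):
        model_choice = "a" if margin(weights, a, b) > 0 else "b"
        if model_choice != user_choice:
            flags.append(((a, b), model_choice, user_choice))
    return flags


current_weights = (1.0, 0.3, -1.5)  # the model's current estimate, shown to the person
candidates = [
    ((2, 1.0, 0), (0, 2.0, 0)),
    ((1, 5.0, 0), (2, 2.0, 0)),
    ((0, 8.0, 1), (1, 3.0, 0)),
    ((3, 0.0, 1), (0, 0.0, 0)),
    ((1, 4.0, 1), (0, 1.0, 0)),
]

queries = pick_queries(current_weights, candidates, k=2)
user_answers = ["a", "a"]  # hypothetical answers from the participant

for pair, model_choice, user_choice in flag_disagreements(current_weights, queries, user_answers):
    print(f"{pair}: I would have predicted {model_choice!r}, you chose {user_choice!r}. "
          "Which one reflects you?")
```

Whatever the person says about each flagged pair becomes a new labeled example (or a direct weight change) for the next round; that refitting step is omitted here.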

much. And I prioritize those things over whether or not you were an alcoholic in the past by this much. When this happens, this is the decision tree I would take. You can actually articulate it and you can write it down and you can be like, I'm going to look at it for a while. I'm going to think about it.

Is that me or not? So I really think a big, important aspect of this is the trust piece, because if eventually you're going to have some type of AI acting on your behalf, or voting on your behalf, or getting integrated into some type of system that takes everyone's moral views into account, you want to be darn sure it's representing you accurately. I'm happy to have you be the one thinking about these problems. But of course, developing the

Embedding Ethics Inside Organizations

tool isn't the last step. It's maybe one of the first steps, right? Even if we lived in a world with perfect technical moral AI tools, they would have to be implemented successfully. They have to be embraced and adopted by the people who are actually building and using them. Yeah, absolutely. I'm really committed to the idea of what I call translational ethical AI.

People in medicine are used to the idea that it's hard to get a medicine into the hands of doctors or patients in a way they will use. Well, the same thing is true of an AI tool, and you have to figure out how to do that translational work. The way we tend to do these tools, especially ethical AI tools, is we make it, we publish it, and we move on. I think that's true even in companies:

they're like, use these model cards. Here's an idea; go figure out yourself how you're actually going to implement it. And so I think there needs to be a lot of work on helping with that translation. But that's just the

technical tool part. We talk about five different pillars that we think society needs to be working on simultaneously to make it most likely, anyway, that we end up on the right side of history and happy with where we end up. And a good chunk of these involve a lot of behavioral science, so we'll get to those. The first one is that we need technical tools, and we need to work on making sure that people know how to use them and that they can be translated.

The second is what we call agile public policy: public policy mechanisms that can move more quickly than traditional ones. Those are the two I think almost everyone has heard about in some capacity, so I'll actually focus on the other ones. The next three are, first of all, that we need organizational practices that make it possible for ethical AI tools to actually be implemented and for people to make good decisions.

That sounds silly, but I don't think it's silly at all; I think it's one of the biggest pieces. The first thing, and we can debate this, and I'd be curious what you guys think from your consulting experience, is that there's a lot of data out there now suggesting that employees themselves think their companies don't mean it when they say they want to use AI ethically. They think it's not a priority. So we'll have to have one set of

strategies for when that's the case. I'm going to put those aside for right now, if you permit me, and say, OK, for those organizations that really do want to use AI ethically and think that's the most sustainable and profitable course, they have to acknowledge that if they collected some data, they'd probably find that most people in their company don't think it's actually a priority, because the incentives don't seem to be aligned for it.

People feel like if they actually want to figure out how to implement these ethical AI tools, they have to do it all in their off time. Take fairness: it's something everyone wants to achieve, and there still are at least some regulations about being fair and not discriminating against others. And I should also say, sorry, that there are a lot of technical tools out there for AI fairness. You can audit your algorithms. You can actually modify your algorithm so that it's more

likely to be fair. You can go through these checklists. So it seems like we've got everything we need, and AI fairness should be easy. It turns out there are over 20 definitions, mathematical definitions, of AI fairness, and someone in your company has to decide which definition to use. And they're dramatically different. First of all, who reads mathematical equations? Not many people.

And yet somehow you have to understand the implications, and the choice makes huge differences. So now your technical team has to make that decision, perhaps while they're on a two-week sprint, and they may or may not be held responsible for it. Right. There's the risk of making the wrong decision, right? Whereas if you ignore that there's a decision there at all, it almost feels safer for you, even if it can have dramatic consequences. Exactly. Exactly.
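To see why the choice of fairness definition matters, here is a toy Python illustration comparing two common definitions: demographic parity (equal selection rates across groups) and equal opportunity (equal true positive rates across groups). The groups, labels, predictions and helper names are invented for illustration. When the groups have different base rates, each of the two toy models satisfies one definition while violating the other.

```python
def selection_rate(preds):
    """Fraction of people the model approves (predicts 1)."""
    return sum(preds) / len(preds)


def true_positive_rate(labels, preds):
    """Fraction of truly positive people (label 1) the model approves."""
    positives = [p for y, p in zip(labels, preds) if y == 1]
    return sum(positives) / len(positives)


def fairness_gaps(data):
    """data maps group -> (labels, preds).

    Returns (demographic parity gap, equal opportunity gap): the largest
    between-group difference in selection rate and in true positive rate.
    """
    rates = [selection_rate(p) for _, p in data.values()]
    tprs = [true_positive_rate(y, p) for y, p in data.values()]
    return max(rates) - min(rates), max(tprs) - min(tprs)


# Invented ground truth: group_b has a higher base rate of positives.
labels = {"group_a": [1, 1, 1, 1, 0, 0, 0, 0], "group_b": [1, 1, 1, 1, 1, 1, 0, 0]}

# Model 1 approves exactly half of each group; model 2 approves exactly the true positives.
model_1 = {"group_a": [1, 1, 1, 1, 0, 0, 0, 0], "group_b": [1, 1, 1, 1, 0, 0, 0, 0]}
model_2 = {g: list(y) for g, y in labels.items()}

for name, preds in (("model_1", model_1), ("model_2", model_2)):
    dp_gap, eo_gap = fairness_gaps({g: (labels[g], preds[g]) for g in labels})
    print(f"{name}: demographic parity gap={dp_gap:.2f}, equal opportunity gap={eo_gap:.2f}")
```

Model 1 has a demographic parity gap of 0.00 but an equal opportunity gap of 0.33; model 2 is the reverse (0.25 and 0.00). Neither is "the fair one" without first deciding which definition the organization is committed to.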

And the time it would take them to get up to speed on making the right decision is not what they're being incentivized for, right? There's a lot I could say about this, but that's one piece of the puzzle. And I think this is where behavioral scientists have a lot to offer, and the field of behavioral ethics in particular has a lot to offer here. We talk about some concrete pieces of advice in the book, but just to give you an

Moral Growth Mindset & Training the Workforce

idea of some of the things that I think are really important, my number one thing that I wish the entire world would do is create really robust ethical AIKPIS or key performance indicators. So we all know especially effective businesses organize everything, their compensation promotion, the entire strategy right around changing the needle on KPIs. Well if you don't have ethical AIKPIS then how are you ever going to compete with all the other KP is people are being,

you know, compensated for? Are there companies that have ethical AIKPIS who's doing this? I don't know for sure, any that are doing it. I keep hearing through the Grapevine, yeah, we're doing it. And then I ask about, well, what are they? And then no one seems to be able to tell me. How much of your bonus is tied to that? Right, exactly, exactly. So I'd be very curious and it again, if you looking through the evidence, most people who are working on ethical AI teams

feel like they have no power. So it makes me feel like that probably means they don't have KP is associated, you know, whatever they're supposed to produce. So yeah, there's a lot we can talk about there. But that on its own, just having the right kind of setting and organizational practices and helping organizations figure out, well, what do they need to do? What kind of cultures do they need to set? What kind of change management do they need?

What kind of processes? That still won't matter if people don't have the skills they need, you know, even within that context. And so I think the next thing is what I call developing scalable ways to have system-wide, career-long training in moral AI systems thinking. Lots of words there. The point is that very few of us have had time to even figure out what we think is morally right or wrong, or to develop all the skills that you

need to do that well. And we're kind of in a culture now where people think, well, you're either a good person or a bad person, and if you're good, that equals good behavior, and if you're a bad person, that equals bad behavior. And that's it. I tie this back to another concept from the behavioral literature, fixed mindset versus growth mindset. People are familiar with that in education.

It applies to moral stuff too. If you think that people are either good people or bad people, that's kind of a fixed mindset. If you think of it instead with a moral growth mindset, we're all good people, or at least most of us are. We just have to learn how, you know, learn the skills. That's the shift we need to make. We need to make that shift and then actually give people the

opportunity to learn this stuff. People like to think that moral education is just high school level, maybe college level, but people forget that we actually need it all the way through. Like, you know, I don't think CEOs have had much opportunity to figure out exactly what their values are, or how they apply to AI, right? Or even how that's relevant to AI? Yeah, they do when they get the chance to write the bestselling book about, like, this is how I succeeded.

Then they have one virtue, like, you've got to be tough, you've got to be exploited, like, they might define something there. But yeah, honestly, I think even what you said before about the kidney moral AI intervention, it's almost a self-reflection exercise for people to make sense of, you know, what are my moral preferences, and so on? What do I think?

And I think that is deeply needed. And to contrast that with what you said would be nice or important to have when it comes to implementing AI morally at scale within organizations, I guess the reality right now is so far from that. You have some company-wide tools or AI capabilities that are given to people. They get part of the Microsoft suite or whatever they have, and they're told, hey, you have Copilot now.

And then you have some people in the organization who are maybe, you know, already able to double or triple their productivity, but they don't tell anyone about it because they don't really benefit from it.

So they're just writing all their code with Cursor or some other AI tool, but they get no real benefit from sharing with others what tools they're actually using and how they're using them. So it's a really messy time for AI within organizations, where, yeah, I think we're so far from the ideal context. So from the messy reality we're in right now, what do you see as some of the first steps to better embed moral AI within organizations?

Yeah, absolutely. KPIs, so that's the first thing. Then there are different ways to handle it, but I like the idea that many are advocating for of embedding ethical AI experts (and I'm going to say ethical AI experts, not just ethicists) into product teams in particular. So product teams that are either developing the AI models themselves or integrating them

into products. The reason why, especially if I put on my data science hat, is that, you know, there are a lot of technical details that matter for these ethical decisions. And so you really need some understanding of the data side: what's all this technical stuff, and why is it relevant? But also some of the issues, the most likely issues to arise, and

calling those out for people. And you need to be able to fit that into the normal agile process of product development, which often works in sprints with immediate deadlines and no time for other things. So embedding moral AI or ethical AI experts throughout the organization, that's one thing. Now, how do you get those people in the first place? That's another piece, because most people are not trained in both the moral and ethical side and the technical side.

So we need more people trained in those things. Another critical piece is that you have to work with change management teams and strategies to change the culture. We really have to get out of this idea that you're either right or wrong, and if you do something wrong once, that means you're a bad person and you're out. We're all going to have to recognize, going back to this humility, that we're all learning what our values are and how to make things align.

And also, this is a really fast-moving space anyway, so we're not always going to predict things correctly. So there needs to be a change of culture toward: OK, we're learning together, and we are taking responsibility; when we make a mistake, we're going to fix it. But we're not going to fire you every time there's a mistake or an unexpected outcome. You need to work on things like psychological safety, all kinds

of stuff that you guys are probably even better experts in than me. And you need to have facilitators. In some of these conversations, you know, people have very different views, as we keep talking about. You can't just put those people in a room, give them an hour to discuss what to do, and think that everything's going to go fine if they've never been, you know, trained in having difficult conversations before, right? The tricky thing right now is

that information is abundant. We have so much information around AI and how it works and so on. But like wisdom around how to use it properly and thoughtfully and all these things is very scarce.

And so that's kind of what I hear when you talk about having embedded AI ethicists, or experts of some sort, because I think it can sound a little bit like you're purposely putting people there to slow things down, to be like, hey. But I think it really comes down to having people there who understand the more subtle elements of how to use this thoughtfully, for the good of the organization, how to get the best out of it.

Why Trust & Culture Matter Most

Because I think that's honestly where, right now, even a lot of the organizations that are selling, quote unquote, change management around adoption often don't really understand AI that much themselves. They may be used to doing change management for other products and services, and they think AI is like any

other software and start using it like anything else, but it's a very different thing. And you need that kind of higher level of expertise or wisdom to help facilitate that, I think. Yeah, absolutely, and it's a good point. What are the differences and similarities between just getting people up to speed on using AI in general, which is an incredible need right now, and getting them up to speed on how to use

AI ethically? I think that's still an open question in general, but I think people definitely feel their incentives are better aligned with learning about using AI in general than with learning about ethical AI. Because, as you said, everyone feels like these ethical challenges make things slower, at least in the short term. And so then it just feels like

it gums up the works. But don't you think there's a pretty solid counterargument to the claim that ethical AI gums things up and makes it slower? Namely, that because it mitigates risk, you actually save time in the future when things are, you know, prevented from going terribly wrong?

And not just that: if you can communicate the steps you're taking to build moral AI, ethical AI, that actually builds trust with your users and your audience, and within your own company as well. And people are more willing to use your product, which maybe creates some efficiencies, like, you know, you don't have to do so much marketing or all the other things you would have to do to convince people to use your

product. If people trust it, they're more inclined to use it right off the bat. I personally completely agree with you. And, you know, I'm very persuaded by that, and I cite a lot of the data related to those things. But I think it does depend on your background. I come from a lot of biomedical engineering. If you're going to stick something in someone's brain,

you want to get it right, right? And I've always been interested in this because I have some engineering background too. For me as an engineer, you want to get the whole system working right. It's kind of like people who used to make space shuttles, you know, you're worried about the worst possible case. And that was kind of the

culture. But if you come from more of a software development culture, where you don't think things are going to go really wrong, that's not part of your culture, so that stuff is not persuasive to you. Exactly, yeah. But yeah, one of my colleagues, who's now at Google but was also a neuroscientist, says that AI adoption moves at the speed of trust.

And I'm persuaded by all the data behind that statement, but not everyone is. So you have to navigate that with those who are and are not persuaded by that.

Comparing AI Labs: OpenAI vs. Anthropic vs. Meta

Yeah. How do you look at the development of AI itself, like the AI models, the large language models we talked about? Because I think in some ways, at least from my vantage point, comparing and contrasting some of the labs is kind of a little bit of a

microcosm of what we discussed. I've written down that if we think about OpenAI, Anthropic, maybe Meta, they all stick to certain core principles. OpenAI, I think, sticks to, you know, moving fast. They are really trying to get things out there as fast as

possible, maybe embracing a bit more the move-fast-and-break-things mentality from Silicon Valley and so on. Then Anthropic, as you referenced before: I think some of the founders left OpenAI because they basically felt like things were moving a little too fast, and they wanted to, again, stick to constitutional AI. Like, it should be more based on

some of that. And then I have Meta, where, at least among some of the people at Meta, there's a really strong focus on openness: AI should be open, it should be open source, and so on. And so I feel like these labs are valuing different things, or taking different moral stances around AI development. How do you look at that? What's your take on how AI is developing, taking a step back?

Yeah. Yeah. I mean, I think all of that's right. And I would say it also seems like sometimes those values shift, and if I'm trying to be consistent with stuff we talked about earlier, I don't think it's bad to shift. I think you should just take ownership of that process. So if you're changing how much you're focusing on something, can you be transparent about it? Can you take responsibility for it? You mean, like, changing the name to ClosedAI?

I mean how important you think safety should be, for example. Yeah. Right. Maybe you started out thinking, our reason to exist is to make sure that AI doesn't become conscious. And maybe you've decided that actually the best thing for society is for AI to become conscious. That might be a viable transition. But in order for society to ultimately trust you, going back to this idea, ideally you would be genuine about your reasons and transparent about them.

And then we should not just be forgiving, but assume, again, that we all make decisions. We're all learning. Yeah. How much do you think what these labs are doing is broadly the same when it comes to the top-down, bottom-up, and hybrid approaches we've been talking about? They're different in some ways, but are there really big differences in how they're developing their AI models and how they're approaching the moral aspects of these things?

Or are they very different, like polar opposites? How do you compare and contrast them? It's really hard to know for sure what their real framework is.

I mean, Meta has Yann LeCun, who goes on record all the time saying LLMs are stupid and are eventually going to fade out, and that you have to have reasoning AIs. Does that mean Meta is actually succeeding in developing those AI models? Are they investing, you know, a whole lot in developing models that will function in different ways than LLMs? In which case we have to know about them before we know, you know, are there principles built

into them? Yann LeCun seems to think that there will be, but there's no evidence of that yet. So it's really hard to know for sure. And I think they've all said different things at different times. Right now, it does seem like everyone feels that, no matter what they think, they're in the race to make the best LLM and to get those into products. And so whatever they think, they seem to feel that speed is of the

essence. And so I think they're all kind of making their own calculations about what that means. But I think that does mean it's probably going to be more bottom-up than top-down for a while.

What We Still Don't Know

The exception is this kind of constitutional AI approach, which I still struggle with, because it's this weird bottom-up, top-down mix: you have no idea how the AI is actually interpreting these principles. It still feels strange to me. But yeah. OK.

So we made it to our quickfire round, which we call To AI or Not To AI. We're now going to pose certain tasks that AI could do, and we want you to tell us whether you think AI should do them: whether these things are to AI or not to AI. OK, first one: personalized AI assistant responses based on an individual's moral values. I hope that AI can be part of that equation.

For right now, I think it should be humans, but I hope that eventually AI can be part of that equation. Determining responsibility for AI failures, like when a self-driving car fatally hits a person.

Quickfire: To AI or Not To AI

I hope I'm not sounding like a broken record right now. I think humans need to make that decision, but I hope that we get to a point where AIs can be a big part of that and can actually make it more systematic. Cool. To AI or not to AI: an AI version of you that you can designate to vote on your behalf. Well, right now for me,

definitely human. But again, part of the vision is that we'll have cases where we would want AIs to vote on our behalf and feel comfortable doing so. So right now, definitely me, but I hope that changes. All right: social synchrony training for neurodivergent individuals. I'll give AI the benefit of the doubt here, if it's a really well-trained AI, which we don't yet have. But then I think AI might end up being better than humans at some things anyway.

It depends. For neurodivergent individuals, I think AI might end up being more beneficial in the end, when it's trained up well. And how do you see that? What would that training look like, maybe for those who aren't familiar with social synchrony? Yeah. So it would most likely have to be an interpretable system again, or at least it would have to

give interpretable feedback. So it would say things like here are the behaviors you're having that are not aligning with someone else's behaviors in the following way. And here's how it's impacting this interaction. So perhaps you're not looking at their eyes when they're looking at your eyes, or you're not taking turns when you should be taking turns if you want to communicate that you're listening, that type of thing. Cool. Next one, decide which struggling relationships of

yours are worth saving. Right now, definitely human, especially given that an AI told a teenager to kill their parents because they were spending too much time with the AI. Yeah, probably keep that decision for yourself, OK? A psychic hotline that bases its answers on probabilistic modeling. It's a psychic hotline? So you're going to the hotline because you're looking for a

psychic? Yep, yeah. You call the psychic hotline and say, you know, am I going to be rich and famous? AI. What about this one: AI-generated breakup messages optimized for minimum emotional harm? So basically the AI writes the message for you so that there's minimum emotional harm on the receiving end. Yeah, so this is one where I will admit that I might have my view changed; this type of thing is something I'm really grappling with.

But right now I'm still firmly in the camp of no: if you're going to break up with someone, that should be you, not an AI writing it. Even if you do it suboptimally, your intention matters and the respect you give them matters. What if you give ChatGPT your first draft and you say, can you smooth this out for me? So you really gave the input.

Yeah, I'm grappling with that. We didn't get to talk about this, but this is something where I think we need to collect a lot of data, because I think it depends on the person. For some people, the benefits of doing that would outweigh the harms. For me, for example, if I were the recipient of that breakup message, it would harm me a lot more to eventually find out that, yeah, you had AI smooth it out. But that's not necessarily the case

for everyone. So I think we have to figure out that it depends on the recipient. We have to learn who needs what. This is a minor question, just a follow-up: would it hurt you more if you found out that AI wrote the breakup message, or the first flirty message that got you to agree to go on a date? Because there are a lot of AIs basically just for that task. I don't remember what it's called, like Rizz AI or something like this.

There are a lot of them, basically to write messages to get people to agree to go on dates and various things like that. That's a great question. Right now, personally, I would be more hurt by AI writing the breakup message, because you already know me. So it seems like a profound, deep loss of respect, and a loss of compassion and effort. The big thing is the effort that you're putting into caring for me.

The last one: a wedding planner that ensures your officiant doesn't get poached by another couple. No, I still want it to be a human in this case. When it comes to it, I still want it to be a human that decides who gets that officiant. Awesome. OK, I'll give the context. This is a personal example. We both asked the same person, Dan Ariely, to officiate our weddings, and he said yes to me first. So. We did, and it took almost a decade for me to find out who it was.

That guy. Yeah. Yeah. But for the record, we also had a wonderful officiant ourselves, so yeah.

Jana's Most Controversial Take

No regrets; you even convinced him to write a book with you later. Exactly. Well, that brings us to our final question. Jana, what is your most controversial opinion in AI? I think my most controversial opinion is that, acknowledging everything we talked about today, and that I'm very concerned about AI's harms, I also think that AI can actually help humans become better moral decision makers, and can actually help us improve and understand our moral values

better. And yes, that's provocative. Yeah. Yeah. And actually, you mentioned the lab that had this Delphi AI model, and it made me think about the Oracle at Delphi. In this vein, would you hope that there was some form of an Oracle at Delphi that you could go to with your moral qualms and be like, help me, you know? Yeah, there are two ways, and I really believe in these; we're

working on them. So one is: we talked about how we're in this phase two of training up a moral AI, where you have this back-and-forth. First of all, you actually get to the point where you trust this thing and it's an AI for you, so you're like, this model, yeah, this is what I think. Then you can use it in a context where you have lots of time. The idea is that you would train this thing up when you're calm and rested. You get to think about it many times.

You get to change your mind. There's no judgment. So then, when you're in a context where you're under time pressure and not at your best, it could tell you: here's what you said was important to you in this context, and here's how that would play out. You know, it's up to you to decide what you want to do, but do you want that check? And so, for example, in our kidney exchange context, we're looking to use that with

surgeons. With surgeons, there are these documented effects: for example, they're more likely to reject a kidney on the weekend than on weekdays, you know, presumably because they'd have to come in. And they're more likely to reject a kidney from an African American woman than from other demographics, in ways that can't be accounted for by medical factors.

So say they train it up in this outside context, and now, you know, they get a phone call in the middle of the night and have to decide in 20 seconds whether they're going to accept this kidney or not. Well, not 20 seconds, you know, 20 minutes or something. How would they respond to an AI that they trained up? No one else knows what the AI says. All they know is that they trust it and they trained this thing up. How would they respond to what

that AI says? So that's one kind of context. We're also looking at that in end-of-life decisions: could you train up an AI so that it would make your own end-of-life decisions for you? But here's the other way, and you mentioned this earlier, Samuel, and you hit it right on the nose. There was something I totally didn't anticipate with this whole training process, and that's that it's almost like a little moral psychology or moral philosophy class for

yourself. You get to figure out through this process what you actually think. I've been thinking about moral stuff for decades, and going through this process of training up our own AI taught me things I didn't even know about my own moral judgment and made me think about it. And it lets you do it in a way where you're not being judged, at least as long as you don't have an AI assistant that's judging you.

Can AI Make Us Better Humans?

But right now, that's not what our models do, right? And so it really has taught me a lot, and I've actually become really excited about this opportunity. And maybe it's a way to tie back

into this education model. Could we create learning opportunities for everyone, tailored to them, that let them figure out what they think is right and wrong, and do that in a stress-free environment where you're not going to be judged and no one needs to know what you're deciding, but that helps you figure it out? And so I actually firmly believe that both of these things are very possible, and I'm excited about it. Yeah, I love that. And honestly, this is so much fun.

And like you mentioned at the beginning, you're an AI enthusiast, and I think that really shows; I can really feel it. And it was really fun to, yeah, get into all things moral AI, but in ways where we can really spread some enthusiasm, so others feel the same way, like, we can do better here. And that could be a really valuable thing.

I'm so glad to hear that, because my biggest concern is that everyone thinks we're just trying to tamp down AI, those... Downers. Downers, especially me, because I tend to be more vocal about the downer stuff, but we really aren't. It's almost like we believe so much in the positive impacts that we want to see us get there, and we can't get there if we don't manage this other stuff. So I'm glad to hear that you feel the enthusiasm, because it's genuine. Awesome.

Thank you. This was lovely. Thank you again for coming on the show. Thanks so much to you guys, and I look forward to hearing what else you learn and teach us in the rest of your season. And that's a wrap. You've been listening to the Behavioral Design Podcast, brought to you by Habit Weekly and Nuance Behavior. Sam and Aline tell me this season is packed with incredible insights about behavioral design and AI, so be sure to subscribe and share the podcast with your

friends. Though you might want to keep it away from your enemies. In case you haven't noticed, I'm an AI voice. Yep, pretty crazy. Quite the improvement since last season's AI outro, don't you think? And if you'd like to collaborate with us at Nuance Behavior, where we use behavioral design to craft digital products with nuance, email us at hello@nuancebehavior.com or book a call directly on our website,

nuancebehavior.com. A special thanks to the amazing Dave Pizarro for our show music, and to Mei Chen Yap and April English for their help in producing and publishing this episode. Thanks again for tuning in. We'll be back soon with another exciting conversation where behavioral design and AI intersect.

Transcript source: Provided by creator in RSS feed.