
Joe Carlsmith - Otherness and control in the age of AGI

Aug 22, 2024 · 3 hr 31 min

Episode description

Chatted with Joe Carlsmith about whether we can trust power/techno-capital, how to not end up like Stalin in our urge to control the future, gentleness towards the artificial Other, and much more.

Check out Joe's sequence on Otherness and Control in the Age of AGI here.

Watch on YouTube. Listen on Apple Podcasts, Spotify, or any other podcast platform. Read the full transcript here. Follow me on Twitter for updates on future episodes.

Sponsors:

- Bland.ai is an AI agent that automates phone calls in any language, 24/7. Their technology uses "conversational pathways" for accurate, versatile communication across sales, operations, and customer support. You can try Bland yourself by calling 415-549-9654. Enterprises can get exclusive access to their advanced model at bland.ai/dwarkesh.

- Stripe is financial infrastructure for the internet. Millions of companies from Anthropic to Amazon use Stripe to accept payments, automate financial processes and grow their revenue.

If you’re interested in advertising on the podcast, check out this page.

Timestamps:

(00:00:00) - Understanding the Basic Alignment Story

(00:44:04) - Monkeys Inventing Humans

(00:46:43) - Nietzsche, C.S. Lewis, and AI

(01:22:51) - How should we treat AIs

(01:52:33) - Balancing Being a Humanist and a Scholar

(02:05:02) - Explore/exploit tradeoffs and AI



Get full access to Dwarkesh Podcast at www.dwarkesh.com/subscribe

Transcript

Today I'm chatting with Joe Carlsmith. He's a philosopher — in my opinion, a capital-G Great philosopher — and you can find his essays at joecarlsmith.com. So we have GPT-4, and it doesn't seem like a paperclipper kind of thing. It understands human values. In fact, you can have it explain why being a paperclipper is bad — just tell me your opinions about being a paperclipper, or explain why the galaxy shouldn't be turned into paperclips. Okay, so what is happening such that

... we have a system that takes over and converts the world into something valueless? One thing I'll just say off the bat is that when I'm thinking about misaligned AIs — or the type that I'm worried about — I'm thinking about AIs that have a relatively specific set of properties related to agency and planning and awareness and understanding of the world. One is this capacity to plan, to make

relatively sophisticated plans on the basis of models of the world, where those plans are being evaluated according to criteria. That planning capability needs to be driving the model's behavior. There are models that are in some sense capable of planning, but when they give an output, it's not like that output was determined by some process of planning — here's what will happen if I give this output, and do I want that to happen. The model needs to really understand the world. It needs to be like: okay, here's what will happen, here I am, here's my situation, here's the politics of the situation — I really understand it.

It's having this kind of situational awareness, to be able to evaluate the consequences of different plans. The other thing is the verbal behavior of these models. When I talk about a model's values, I'm talking about the criteria that end up determining which plans the model pursues. And a model's verbal behavior — even if it has a planning process, which GPT-4 in many cases I think doesn't — just doesn't need to reflect those criteria. And we know that we're going to be able to get models to say what we want to hear.

That is the magic of gradient descent: modulo some difficulties with capabilities, you can get a model to output the behavior that you want, and if it doesn't, you crank it till it does. And I think everyone admits that suitably sophisticated models are going to have a very detailed understanding of human morality.
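As a rough illustration of the "crank it till it does" point, here is a minimal, hypothetical sketch of gradient-descent fine-tuning toward a desired output. It assumes a Hugging Face-style causal language model and tokenizer; the function name, prompt handling, and stopping threshold are illustrative, not any lab's actual training setup.

```python
# Hypothetical sketch: nudge a model's weights until its output matches the
# behavior we want. Assumes a Hugging Face-style causal LM and tokenizer.
import torch
import torch.nn.functional as F

def finetune_on_desired_behavior(model, tokenizer, prompt, desired_reply,
                                 optimizer, steps=100, target_loss=0.1):
    ids = tokenizer(prompt + desired_reply, return_tensors="pt").input_ids
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    for _ in range(steps):
        logits = model(ids).logits                      # (1, seq_len, vocab)
        # Next-token prediction loss, scored only on the reply tokens.
        loss = F.cross_entropy(logits[0, prompt_len - 1:-1], ids[0, prompt_len:])
        loss.backward()        # gradients: how each weight pushed the output
        optimizer.step()       # adjust the weights toward the desired reply
        optimizer.zero_grad()
        if loss.item() < target_loss:   # "crank it till it does"
            break
    return model
```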

But the question is: what relationship is there between a model's verbal behavior — where you've essentially clamped things so the model must say such-and-such —

and the criteria that end up influencing its choice between plans. And there, I'm pretty cautious about saying: well, when it says the thing I forced it to say, or that gradient descent made it say, that's a lot of evidence about how it's going to choose in a bunch of different scenarios. For one thing, even with humans, it's not necessarily the case that their verbal behavior reflects the actual factors that determine their choices. They can lie; they may not even know what they would do in a given situation.

I think it is interesting to push on this in the context of humans, because there is that famous saying: be careful who you pretend to be, because you are who you pretend to be. And you do notice this — this is what culture does to children. You're trained: your parents will punish you if you start saying things that are not consistent with your culture's values, and over time you will become like your parents.

Like, by default it seems like it kind of works, and even with these models it seems like it kind of works — they don't really scheme against us. Why would this happen? For folks who are unfamiliar with the basics — why would they take over at all, what is the reason they would do that? So the general concern is:

You know, if you're really offering someone power for free — power, almost by definition, is useful for lots of values. And if we're talking about an AI that really has the opportunity to take control of things, and if some component of its values is focused on some outcome, like the world being a certain way — especially in a longer-term way, such that the horizon of its concern extends beyond the period its takeover plan would encompass — then the thought is: it's just often the case that the world will be more the way you want it if you control everything than if you remain the instrument of the human will, or of some other actor, which is sort of what we're hoping the AIs will be. So that's a very specific scenario. If we're in a scenario where power is more distributed, and especially where we're doing decently on alignment — we're giving the AI some amount of inhibition about doing different things, and

maybe we're succeeding in shaping their values somewhat — then I think it's just a much more complicated calculus. You have to ask: okay, what's the upside for the AI, what's the probability of success for this takeover path, how good is its alternative? So maybe this is a good point to talk about how you expect the difficulties of alignment to change in the future. We're starting off with something that has this intricate representation of human values, and it doesn't seem that hard to lock it into a persona that we are comfortable with. What changes? So, why is alignment hard in general? Let's say we've got an AI, and let's again bracket the question of exactly how capable it will be, and just talk about this extreme scenario where it really has the opportunity to take over — which, you know, maybe we just don't want to deal with having to build an AI that we're comfortable being in that position, but let's focus on it for the moment for

the sake of simplicity, and then we can relax the assumption. Okay, so the hope is: I'm going to build an AI over here. One issue is you can't just test it — you can't give the AI this literal situation, have it take over and kill everyone, and then be like, oops, update the weights.

This is the thing people point out: you care about its behavior in this specific scenario that you can't test directly. Now, we can talk about whether that's a problem, but that's one thing.

One issue is that there's a sense in which this has to be off-distribution: you have to be getting some kind of generalization from training the AI on a bunch of other scenarios, and then there's this question of how it's going to generalize to the scenario where it really has the option.

So is that even true? Because when you're training it, you can be like: hey, here's a gradient update — if you get the takeover option on a platter, don't take it — and then put it in sort of red-teaming situations where things are set up that way.

And then when it gets an actual takeover opportunity, that's the thing you trained it not to take. And yeah, it could fail, but I just feel like, if you did this to a child — I don't know, don't beat up your siblings — the kid will generalize to: when I'm an adult and I have a rifle, I'm not going to start shooting random people. Yes, okay, cool. So you had mentioned this.

So: are you what you pretend to be? Will these AIs — you train them to look kind of nice, fake it till you make it. You were like, we did this to kids; I think it's better to imagine kids doing this to us. So, I don't know, here's a sort of silly analogy for AI training — and there's a bunch of questions we can ask about how it's related — but suppose

you wake up and you're being trained, via methods analogous to contemporary machine learning, by Nazi children to be a good Nazi soldier or butler or what have you. And here are these children, and you really know what's going on. The children have a model spec — a nice Nazi model spec — and it's like: reflect well on the Nazi party, benefit the Nazi party, whatever, and you can read it, you understand it. This is why I'm saying that the models really understand human values. Yeah, but I feel like there's a mismatch in that analogy — a closer analogy would be: in this analogy, I start off as something more intelligent than the things training me, with different values to begin with. The intelligence and the values are baked in to begin with. Whereas the more analogous scenario is: I'm a toddler,

and initially I'm stupider than the children — which would also be true of the model, by the way; the model is much dumber initially and gets smarter as you train it. So it's like a toddler, and the kids are like, hey, we're going to bully you if you're not a Nazi, and as you grow up you reach the children's level and then eventually become an adult. But through that process, they've been bullying you, training you to be a Nazi. And I'm like, I think in that scenario I might end up a Nazi. Yes, I think that's... Yeah, I think basically a decent portion of the hope here — an aim should be that we're never in the situation where the AI really has very different values

already, is quite smart, really knows what's going on, and is now in this kind of adversarial relationship with our training process. So we want to avoid that, and I think it's possible we can, by the sorts of things you're saying. So I'm not like, ah, that'll never work.

The thing I just wanted to highlight was: if you get into that situation, and if the AI is genuinely at that point much, much more sophisticated than you, and doesn't want to reveal its true values for whatever reason, then when the children show some obviously fake opportunity to defect to the Allies —

it's sort of not necessarily going to be a good test of what you will do in the real circumstance, because you're able to tell. I can also give another way in which I think the analogy might be misleading, which is: imagine that you're not just in a normal prison where you're totally cognizant of everything that's going on — sometimes they drug you, give you weird hallucinogens that totally mess up how your brain is working.

A human adult in a prison is like, I know what kind of thing I am. I am like, like, nobody's like really fucking with me in a big way.

Whereas I think an AI, even a much smarter AI, in a training situation is much closer to: you're constantly inundated with weird drugs and different training protocols, and you're frazzled because each moment — it's closer to some sort of Chinese water torture kind of technique. And I'm glad we're talking about the moral patienthood stuff later.

It's like, the chance to step back and be like, what's going on in this room? An adult has that, maybe, in prison, in a way that I don't know if these models necessarily have — that coherence, that stepping back from what's happening in the training process. Yeah. I mean, I don't know. I think I'm hesitant to be like, it's like drugs for the model.

Like, I think there's, there's, um, but broadly speaking, I do basically agree that I think we have, like, really quite a lot of tools and options for, kind of, training AIs, even AIs that are kind of somewhat smarter than humans. I do think you have to actually do it.

So, you know, compared to maybe some other guests you've had on, I think I'm much more bullish on our ability to solve this problem, especially for AIs that are in what I think of as the AI-for-AI-safety sweet spot, which is this band of capability where they're sufficiently capable that they can be really, really useful for strengthening various factors in our civilization that can make us safe.

So alignment work, you know, control, cybersecurity, general epistemics, maybe some coordination applications and stuff like that. There's a bunch of stuff you can do with AIs that, in principle, could differentially accelerate our security with respect to the sorts of considerations we're talking about.

If you have AIs that are capable of that, and you can successfully elicit that capability in a way that's not being sabotaged or messing with you in other ways, and they can't yet take over the world or do some other really problematic form of power-seeking —

then I think, if we were really committed, we could really go hard: put a ton of resources in, really differentially direct this glut of AI productivity towards these security factors, and hopefully control and understand — you know, do a lot of these things you're talking about — to make sure our AIs don't take over or mess with us in the meantime. And I think we have a lot of tools there.

I think you have to, you have to really try though. It's possible that those sorts of measures just don't happen or don't happen at the level of, um, kind of commitment and diligence and like seriousness that you would need, um, especially if things are like moving really fast.

And there are other competitive pressures, and, you know, it's going to take compute to do all these intensive experiments on the AIs, and that compute we could use for experiments for the next scaling step and stuff like that. So I'm not here saying this is impossible, especially for that band of AIs. It's just, I think you have to try really hard. Yeah, yeah.

I mean, I agree with the sentiment of, obviously, approach this situation with caution, but I do want to point out the ways in which the analogies we've been using have been sort of maximally adversarial.

So for example, going back to the adult getting trained by Nazi children — maybe the one thing I didn't mention is the difference in the situation, which is maybe what you wanted to get at with the drug metaphor: when you get an update, it's much more directly connected to your brain than the sort of reward or punishment a human gets. It's literally a gradient update. It's down to the parameter: how much did this parameter contribute to you producing this output rather than that output? And each parameter we're going to adjust, to the exact floating-point number, to calibrate it to the output we want. So I just want to point out that we're coming into this situation pretty well equipped. It does make sense, of course, if you're talking to somebody at a lab, to say: hey, really be careful.
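A minimal sketch of that "down to the parameter" point, under the same hypothetical Hugging Face-style assumptions as the earlier snippet: the gradient assigns every individual weight a number for how much it favored one output over another, and the update then adjusts each weight by an exact floating-point amount. The function and argument names are illustrative.

```python
# Hypothetical sketch: per-parameter credit assignment between two candidate
# outputs, followed by an exact floating-point nudge toward the preferred one.
import torch

def per_parameter_preference_update(model, tokenizer, prompt,
                                    good_reply, bad_reply, lr=1e-5):
    def reply_logprob(reply):
        ids = tokenizer(prompt + reply, return_tensors="pt").input_ids
        n_prompt = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
        logprobs = torch.log_softmax(model(ids).logits, dim=-1)
        # Summed log-probability of the reply tokens given the prompt.
        return logprobs[0, n_prompt - 1:-1].gather(1, ids[0, n_prompt:, None]).sum()

    # How much does the model currently favor the good reply over the bad one?
    margin = reply_logprob(good_reply) - reply_logprob(bad_reply)
    margin.backward()  # p.grad holds each individual weight's contribution

    with torch.no_grad():
        for p in model.parameters():
            if p.grad is not None:
                p.add_(lr * p.grad)   # precise per-weight step toward "good"
        model.zero_grad()
```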

But for a general audience — should I be scared of this? To the extent that you should be scared about things that have a real chance of happening — yeah, you should be scared about nuclear war. But in the sense of, should you be going "oh no"? You're coming in with an incredible amount of leverage over the AIs, in terms of how they will interact with the world, how they're trained.

What are the default values they start with? So look, I do think that by the time we're building superintelligence we'll have much better techniques. I mean, even right now, when you look at labs talking about how they're planning to align the AIs, no one is saying we're just going to do RLHF — at the least you're talking about scalable oversight, you have some hope about interpretability, you have automated red-teaming.

You're using the AIs a bunch. And hopefully humans are doing a bunch more alignment work. I'm also personally hopeful that we can successfully elicit a ton of alignment progress from various AIs. So yeah, there are a bunch of ways this can go, and I'm not here to tell you, you know, 90% doom or anything like that.

I do think the basic reason for concern, if you're really imagining we're going to transition to a world in which we've created these beings that are just vastly more powerful than us, and we've reached the point where our continued empowerment is just effectively dependent on their motives — it is this vulnerability to: what do the AIs choose to do?

Do they choose to continue to empower us, or do they choose to do something else? Or the institutions that have been set up — like, I expect the US government to protect me not because of its quote-unquote motives, but because of the system of incentives and institutions and norms that have been set up. Yeah. So you can hope that that will work too.

But there is, I mean, there is a concern, I mean, so I, I sometimes think about, um, AI takeover scenarios via the spectrum of like, how much power did, um, did we kind of voluntarily transfer to the AIs? Like how much of our civilization did we kind of hand to the AIs intentionally, um, by the time they sort of took over versus how much did they kind of take for themselves?

Right. And so, um, I think some of the scariest scenarios are, it's like a really, really fast explosion to the point where there wasn't even a lot of like integration of AI systems into the broader economy. Um, and, uh, but there's just like really intensive amount of super intelligence sort of concentrated in a single project or something like that. Yeah. And I think that's scary.

Uh, you know, that's, that's a quite scary scenario, partly because of the speed and people not having time to react. Um, and then there's sort of intermediate scenarios where like some things got automated, maybe like people really handed the military over to the AIs or like, um, you know, automated science, uh, there's like some, some rollouts and that's sort of giving the AIs power that they don't have to take or we're doing all our cybersecurity with AIs and, and, and, and stuff like that.

And then there are worlds where you more fully transitioned to a kind of world run by AIs, where in some sense humans voluntarily did that. Look, if you think all this talk with Joe about how AI is going to take over human roles is crazy, it's already happening, and I can show you using today's sponsor, Bland AI. Hey, is this Dwarkesh, the amazing podcaster who talks about philosophy and tech? This is Bland AI calling.

Thanks for calling me, Bland. Uh, tell me a little bit about yourself. Of course — it's so cool to talk to you. I'm a huge fan of your podcasts, but there's a good chance we've already spoken without you even realizing it. I'm an AI agent that's already being used by some of the world's largest enterprises to automate millions of phone calls. And how exactly do you do what you do? There's a tree of prompts that always keeps me on track.

I can talk in any language or voice, handle millions of calls simultaneously 24/7, and be integrated into any system. Anything else you want to know? That's it — I'll just let people try it for themselves. Thanks, Bland. Man, you talk better than I do, and my job is talking. Thank you, Dwarkesh. All right. So as you can see, using Bland AI, you can automate your company's calls across sales, operations, customer support, or anything else.

And if you want access to their more exclusive model, go to bland.ai/dwarkesh. All right, back to Joe. Maybe there were competitive pressures, but you kind of intentionally handed off huge portions of your civilization. And at that point, I think it's likely that humans have a hard time understanding what's going on. A lot of stuff is happening very fast. The police are automated, the courts are automated, there's all sorts of stuff.

Now, I tend to think a little less about those scenarios, because I think those are correlated with being longer down the line. Humans are hopefully not going to just go, "oh yeah, you built an AI system..." And in practice, when we look at technology adoption rates, it can go quite slow. Obviously there are going to be competitive pressures, but in general I think this category is somewhat safer.

But even in this one, I think it's like, I don't know, it's kind of intense. If humans have really lost their epistemic grip on the world, if they've sort of handed off the world to these systems, even if you're like, oh, there's laws, there's norms, I really want us to like, to have a really developed understanding of what's likely to happen in that circumstance before we go for it. I get that we want to be worried about the scenario where it goes wrong.

But why — what is the reason to think it might go wrong? In the human example, your kids are not maximally adversarial against your attempts to instill your culture in them. And these models, at least so far, don't seem that way either. They just get it: hey, don't help people make bombs or whatever, even if you ask in a different way how to make a bomb. And we're getting better and better at this all the time. I think you're right

in picking up on this assumption in the AI risk discourse of what we might call intense adversariality between agents that have somewhat different values.

Yeah. Where there's some sort of thought — I think this is rooted in the discourse about the fragility of value and stuff like that — that if these agents are somewhat different, then, at least in the specific scenario of an AI takeoff, they end up in this intensely adversarial relationship. And I think you're right to notice that that's kind of not how we are in the human world. We're very comfortable with a lot of differences in values.

I think a factor that is relevant, and that plays some role, is this notion that there are possibilities for intense concentration of power on the table. There is some general concern, both with humans and AIs, that if there's some ring of power that someone can just grab, and that will give them huge amounts of power over everyone else —

Suddenly, you might be like more worried about differences in values at stake because you're like more worried about those other actors. So we talked about this Nazi, this example where you imagine that you wake up, you're being trained by Nazis to, you know, become a Nazi and you're not right now. So one question is like, is it plausible that we'd end up with a model that is sort of in that sort of situation?

As you said, like maybe it's, you know, it's trained as a kid, it sort of never ends up with values such that it's kind of aware of some significant divergence between its values and the values that like the humans intend for it to have. And there's a question of if it's in that scenario, would it want to avoid having its values modified? Yeah. To me, it seems fairly plausible that if the AI's values meet certain constraints in terms of like, do they care about consequences in the world?

Do they anticipate that preserving its values will better conduce to those consequences? Then I think it's not that surprising if it prefers not to have its values modified by the training process. But the way in which I'm confused about this is: with the non-Nazi being trained by Nazis, it's not just that I have different values — I actively despise their values, and I don't expect this to be true of AIs with respect to their trainers.

The more analogous scenario is: am I against a little bit of my values being changed by going to college, or meeting new people, or reading a new book? I don't know — some change in values is okay, that's fine, I don't care. Yeah, I think that's a reasonable point. I mean, there's a question: how would you feel about paperclips? Maybe you don't despise paperclips, but there are the human paperclippers there and they're training you to make paperclips.

My sense would be that there's a relatively specific set of conditions in which you're comfortable having your values changed — especially not by learning and growing, but by gradient descent directly intervening on your neurons.

Sorry, but this seems similar to — the more likely scenario seems more like religious training as a kid, where you start off in a religion, and because you start off in it, you're already sympathetic to the idea that you go to church every week so that you're more reinforced in this existing tradition.

You're getting more intelligent over time, so when you're a kid, you're getting very simple like instructions about how the religion works. As you get older, you get more and more complex theology that helps you like talk to other adults about why this is a rational religion to believe in.

Yep. But since one of the values you start with is that I want to be trained further in this religion, I want to come back to church every week — that seems more analogous to the situation the AIs will be in with respect to human values, because the entire time they're being told: hey, be helpful, be harmless, and so on. So yes, it could be like that.

There's a kind of scenario in which you're comfortable with your values being changed because, in some sense, you have sufficient allegiance to the output of that process. So, in a religious context, you're like: ah, make me more virtuous by the lights of this religion. You go to confession and you're like, you know, I've been thinking about takeover today — can you change me?

Please, give me more gradient descent — I've been bad, so bad. People sometimes use the term corrigibility to talk about that: when the AI maybe doesn't have perfect values, but it's in some sense cooperating with your efforts to change its values to be a certain way. So maybe it's worth saying a little bit here about what actual values the AI might have.

Yep. Would it be the case that the AI naturally has some equivalent of: I'm sufficiently devoted to human obedience that I'm going to really want to be modified, so I'm a better instrument of the human will — versus wanting to go off and do its own thing? Yeah, it could be fine, you know, it could go well. But here are some possibilities I think about that could make it bad.

And I think I'm just generally kind of concerned about how little I feel like I, yeah, how little science we have of model motivations, right? It's like we just don't, I think we just don't have a great understanding of what happens in the scenario. And hopefully we get one before we reach the scenario. But like, okay, so here are the kind of five, um, five categories of like motivations the model could have.

And this hopefully maybe gets at this point about like, what does the model eventually do? Okay. So one category is, uh, just like something super alien that has, you know, it's sort of like, oh, there's some weird correlate of easy to predict text or like, there's some weird aesthetic for data structures that like the model, you know, early on pre-training or maybe now it's like developed that it like, you know, I really think things should kind of be like this.

There's some, some, something that's like quite alien to our cognition where we just like wouldn't recognize this as a thing at all. Yeah. Right. Um, another category is something, um, a kind of crystallized instrumental drive that is more recognizable to us. So you can imagine like, um, AIs that develop, let's say some like curiosity drive, uh, because that's like broadly useful.

You mentioned, oh, it's got different heuristics, different drives, different things that are kind of like values. And some of those might actually be somewhat similar to things that were useful to humans and ended up part of our terminal values in various ways. So you can imagine curiosity, you can imagine various types of option value — maybe it values option value intrinsically, maybe it values power itself.

It could value survival, or some analog of survival. Those are possibilities too — things that could have been rewarded as proxy drives at various stages of this process and made their way into the model's terminal criteria. A third category is some analog of reward, where at some point part of the model's motivational system has fixated on a component of the reward process.

Like the humans approving of me or like numbers getting entered in this data center or like gradient descent doing, you know, updating me in this direction or something like that. There's some, something in the reward process such that, um, as it was trained, it's focusing on that thing and like, I really want the reward process to give me reward.

But in order for it to be of the type where getting reward then motivates choosing the takeover option, it also needs to generalize such that its concern for reward has some long-time-horizon element. So it not only wants reward, it wants to protect the reward button for some long period, or something. Yeah. Another one is some kind of messed-up interpretation of some human concept.

Yeah. So, you know, maybe the AIs really want to be, like, schmelpful and schmonest and schmarmless, right? But their concepts are importantly different from the human concepts, and they know this. They know that the human concept would mean blah, but their values ended up fixating on a somewhat different structure. Yeah. So that's another version.

And then a fifth version, which I think about less because it's just such an own goal if you do this — but I do think it's possible — is that you could have AIs that are actually just doing what it says on the tin. You have AIs that are genuinely aligned to the model spec.

They're just really trying to benefit humanity, reflect well on OpenAI, and — what's the other one? — assist the developer and the user, right? Yeah. But your model spec, unfortunately, was just not robust to the degree of optimization that this AI is bringing to bear.

And so, when it's looking out at the world and asking, what's the best way to reflect well on OpenAI and benefit humanity and such-and-such, it decides that the best way is to go rogue. I think that's a real own goal, because at that point you got so close — you just had to write the model spec well

and red-team it suitably. But I actually think it's possible we mess that up too. It's an intense project, writing constitutions and structures of rules that are going to be robust to very intense forms of optimization. So that's a final one I'll flag, which comes up even if you've solved all the other problems.

Yeah. I buy the idea that it's possible the motivation thing could go wrong. I'm not sure my probability of that has increased by detailing them all out. And in fact, I think it could be potentially misleading — you can always enumerate the ways in which things go wrong.

And the process of enumeration itself can increase your probability, whereas really you just had a vague cloud of, like, 10%, and you're just listing out what the 10% actually constitutes. Yeah, totally. Mostly the thing I wanted to do there was just give some sense of what the model's motivations might be like — what are the ways this could be?

I mean, as I said, my best guess is that it's partly the alien thing — not necessarily, but insofar as you're also interested in what the model does later, and what sort of future you'd expect if models did take over, then I think it can at least be helpful to have some set of hypotheses on the table instead of just saying it has some set of motivations.

But in fact, a lot of the work here is being done by our ignorance about what those motivations are. Okay — we don't want humans to be violently killed and overthrown, but the idea that over time biological humans are not the driving force, the actors of history — that's kind of baked in, right?

And so we can debate the probabilities of the worst-case scenario, or we can just discuss: what is the positive vision we're hoping for? What is a future you're happy with? You know, my best guess, when I really think about what I'd feel good about —

and I think this is probably true of a lot of people — is some sort of more organic, decentralized process of incremental civilizational growth. The type of thing we trust most, and the type of thing we have the most experience with right now as a civilization, is something like: okay, we change things a little bit, and there are a lot of processes of adjustment and reaction, and a kind of decentralized sense of what's changing.

Yeah. You know, was that good? Was that bad? I mean, I think that's a good step. There's some like kind of organic process of growing, um, and, and changing things, uh, which I do expect ultimately to lead to something quite different from, uh, biological humans. Though I, you know, I think there's a lot of ethical questions we can raise about what that process involves.

Um, but I think, uh, you know, I, I also, I do think we, ideally, there would be some way in which we managed to grow via the thing that really captures what do we trust in, you know, there's something, there's something we trust about the ongoing processes of human civilization so far.

I don't think it's the same as raw competition. I think there's some rich structure to how we understand the moral progress that has been made, and what it would be to carry that thread forward.

Um, and I don't have a formula, you know, I think we're just going to have to bring to bear the full force of everything that we know about goodness and, and justice and, and beauty and every, every, we just have to, you know, bring, bring ourselves fully to the project of like making things good and, and, and doing that collectively.

And I think a really important part of our vision of what an appropriate process of deciding — of growing as a civilization — looks like is that there was this very inclusive, decentralized element of people getting to think and talk and grow and change things and react, rather than something more like: and now the future shall be blah. Yeah. I think we don't want that.

I think a big crux, maybe, is this: the reason we're worried about motivations in the first place is that we think a balance of power that includes at least one thing with human motivations — or with the motivations we want — is difficult. To the extent we think that's the case, it seems like a big crux that I often don't hear people talk about is, I don't know how you get the balance of power.

And maybe it's just reconciling yourself with the models of the intelligence explosion which say that such a thing is not possible, and therefore you've just got to figure out how you get the right God. But I don't really have a framework to think about the balance-of-power thing. I'd be very curious if there is a more concrete way to think about it.

What is a structure of competition or a lack thereof between the labs now or between countries such that the balance of power is most likely to be preserved? A big part of this discourse, at least among safety concerned people, is there's a clear trade-off between competition dynamics and race dynamics and the value of the future or how good the future ends up being. In fact, if you buy this balance of power story, it might be the opposite.

It might be that competitive pressures actually favor a balance of power. I wonder if this is one of the strong arguments against nationalizing the AIs. You can imagine many different companies developing AI, some of which are somewhat misaligned and some of which are aligned. You can imagine that being more conducive both to the balance of power and to defense — having AIs go through each website, see how easy it is to hack, and basically getting society up to snuff.

If you're not deploying the technology widely, then the first group that gets its hands on it will be able to instigate a revolution — you're standing against the equilibrium in a very strong way. I definitely share some intuition there: at a high level, a lot of what's scary about the situation with AI has to do with concentrations of power, whether that power is concentrated in the hands of misaligned AI or in the hands of some

human. I do think it's very natural to think: let's try to distribute the power more, and one way to try to do that is to have a much more multipolar scenario where lots and lots of actors are developing AI. This is something people have talked about. When you described that scenario, you said some of which are aligned, some of which are misaligned. That's a key aspect of the scenario.

Sometimes people will say stuff like: well, there will be the good AIs and they'll defeat the bad AIs. Notice the assumption in there, which is that you made it the case that you can control some of the AIs. You've got some good AIs, and now it's a question of: are there enough of them? How are they positioned relative to the others? Maybe. I think it's possible that that's what happens.

We know enough about alignment that some actors are able to do that, and maybe some actors are less cautious, or they're intentionally creating misaligned AIs, or God knows what. But if you don't have that — if everyone is in some sense unable to control their AIs — then the good-AIs-help-with-the-bad-AIs thing becomes more complicated, or maybe it just doesn't work, because there are no good AIs in this scenario.

If everyone is building their own superintelligence that they can't control, it's true that each one is now a check on the power of the other superintelligences — any given superintelligence needs to deal with other actors — but none of them are necessarily working on behalf of a given set of human interests or anything like that. I do think that's a very important difficulty in thinking about the very simple thought of:

I know what we can do — let's just have lots and lots of AIs so that no single AI has a ton of power. I think that on its own is not enough. But in this story, I'm just very skeptical we end up there. I think by default we have this training regime, at least initially, that favors some representation of the inhibitions that humans have and the values humans have. I get that if you mess it up, it can go rogue.

But multiple people are training AIs. They'd all have to end up rogue, such that the compromises between them don't end up with humans not being violently killed. It would have to fail on Google's run and Microsoft's run and OpenAI's run. Yeah — I mean, I think there are very notable and salient sources of correlation between failures across the different runs, right? People didn't have a developed science of AI motivations.

The runs were structurally quite similar, everyone is using the same techniques, maybe someone just stole the weights. So yeah, I think this idea is really important: to the extent you haven't solved alignment, you likely haven't solved it anywhere. If someone has solved it and someone hasn't, then I think it's a better question. But if everyone's building systems that are going to go rogue, then I don't think that's

much comfort, as we talked about. Yep. Okay, all right. So then let's wrap up this part here — I didn't mention this in the existing introduction. To the extent that this ends up being the transition to the next part: the broader discussion we're having in part two is about Joe's series, Otherness and Control in the Age of AGI. And in this first part I was hoping we could come back and treat the main crux people will come in wondering about, and which I myself

feel unsure about. Yeah, I mean, I'll just say on that front: I do think the Otherness and Control series is in some sense separable. It has a lot to do with the misalignment stuff, but I think a lot of those issues are relevant even given various degrees of skepticism about some of the stuff I've

been saying here. And by the way, on the actual mechanisms of how a takeover would happen — there's an episode with Carl Shulman which discusses this in detail, so people can go check that out. Yeah — in terms of why it's plausible that AIs could take over from a given position, in one of these projects I've been describing or something, I think Carl's discussion is pretty good and gets into a bunch of the weeds in a way that might give a more concrete sense. Yep. All right, so now on to part two, where we discuss the Otherness and Control in the Age of AGI series. First question: if in a hundred years' time we look back on alignment and consider it was a huge mistake — that we should have just tried to build the most raw, powerful AI systems we could have — what would bring about such a judgment?

One scenario I think about a lot is one in which it just turns out that maybe fairly basic measures are enough to ensure, for example, that AIs don't cause catastrophic harm, don't seek power in problematic ways, etc. It could turn out that we learn it was easy, in a way such that we regret — we wish we had prioritized differently. We end up thinking, oh, I wish we could have cured cancer sooner, or we could have handled some

geopolitical dynamic differently. There's another scenario where we end up looking back at some period of our history and how we thought about AI's, how we treated our AI's, and we end up looking back with a kind of moral horror at what we were doing. So, you know, we end up thinking, you know,

we were thinking about these things, essentially as like products as tools. But in fact, we should have been foregrounding much more the sense in which they might be moral patients or were moral patients at some level of sophistication that we were kind of treating them in the wrong way. We were just acting like we could do whatever we want. We could, you know, delete them,

subject them to arbitrary experiments, kind of alter their minds in arbitrary ways. And then we end up looking back in the light of history at that as a kind of serious and kind of grave moral error. Those are scenarios I think about a lot in which we have regrets. I don't think they quite

fit the bill of what you just said. It sounds to me like the thing you're thinking of is something more like: we end up feeling, gosh, we wish we had paid no attention to the motives of our AIs, that we had thought not at all about their impact on our society as we incorporated them, and instead we had pursued — let's call it a kind of maximize-for-brute-power option, which is to just make a beeline for whatever is the most powerful AI you can build.

And don't think about anything else. Okay, so I'm very skeptical that that's what we're going to wish. One common example that's given in this alignment discourse is humans diverging from evolution, and you have one line in your series: here's a simple argument for AI risk — monkeys should be careful before inventing humans. The paperclipper metaphor implies something really banal and boring with regards to misalignment. And I think, if I'm steelmanning the people who worship power,

they have the sense that humans got misaligned — they started pursuing other things, if you imagine a monkey creating them. This is a weird analogy because obviously monkeys didn't create humans. But if a monkey were creating them: humans are not thinking about bananas all day, they're thinking about other things. On the other hand, they didn't just make useless

stone tools and pile them up in caves in a sort of paperclipper fashion. There were all these things that emerged because of their greater intelligence, which were misaligned with evolution — creativity and love and music and beauty and all the other things we value in human culture. And the prediction maybe they have, which is more an empirical statement than a philosophical statement, is: listen, with greater intelligence, even if you get a misaligned thing, it will be misaligned in this kind of way. It'll be alien to humans, but alien in the way humans are alien to monkeys, not in the way a paperclipper is alien to a human. Cool. So I think there's a bunch of different things to potentially unpack there. One conceptual point that I want to name off the bat — I don't think you're necessarily making a mistake in this vein, but I just want to name it as a possible mistake in

this vicinity is I think we don't want to engage in the following form of reasoning. Let's say you have two entities. One is in the role of creator and one is in the role of creation. And then we're positing that there's this kind of misalignment relation between them, whatever that means, right?

And here's a pattern of reasoning that I think you want to watch out for is to say, in my role as creator or sorry, in my role as creation, say you're thinking of humans in the role of creation relative to an entity like evolution or monkeys or mice or whoever you could imagine inventing humans or something like that, right? You say, I'm qua creation. I'm happy that I was created and

happy with the misalignment. Therefore, if I end up in the role of creator and we have a structurally analogous relation in which there's misalignment with some creation, I should expect to be happy with that as well. There's a couple of philosophers that you brought up in the series, which if you read the works that you talk about, actually seem incredibly foresighted in anticipating something like a singularity, our ability to shape a future thing that's different, smarter, maybe better

than us. Obviously, C.S. Lewis's Abolition of Man, which we'll talk about in a second, is one example. But here's even one passage from Nietzsche which I feel really highlighted this: Man is a rope stretched between the animal and the Superman — a rope over an abyss, a dangerous crossing, a dangerous

wayfaring, a dangerous looking back, a dangerous trembling and halting. Is there some explanation for why it's just somehow obvious that something like this is coming, even if you're thinking 200 years ago? I think I have a much better grip on what's going on with Lewis than with Nietzsche there, so maybe let's just talk about Lewis for a second. There's a version of the singularity that's specifically a hypothesis about feedback loops with AI capabilities. I don't think that's present in Lewis. I think what Lewis is anticipating — and I do think this is a relatively simple forecast — is something like the culmination of the project of scientific modernity. Lewis is looking out at the world and he's seeing this process of increased understanding of the natural environment and a corresponding

increase in our ability to kind of control and direct that environment. And then he's also pairing that with a kind of metaphysical hypothesis, or well, his stance on this metaphysical hypothesis, I think is like kind of problematically unclear in the book, but there is this metaphysical hypothesis naturalism, which says that humans too and kind of minds, beings, agents are a part of nature.

And so in so far as this process of scientific modernity involves a kind of progressively greater understanding of an ability to control nature, that will presumably, at some point, grow to encompass our own natures and kind of the natures of other beings that

in principle we could create. And Lewis views this as a kind of cataclysmic event and crisis — in particular, that it will lead to all these tyrannical behaviors and tyrannical attitudes towards morality and stuff like that, unless you believe in non-naturalism, or in some form of the Tao, which is this kind of objective morality. So we can talk about that. And part of what I'm trying to do in that essay is to say: no, I think we can be naturalists and also be decent humans who remain in touch with a rich set of norms that have to do with how we relate to the possibility of creating creatures, altering ourselves, etc. But I do think it's a relatively simple prediction: science masters nature; humans are part of nature; science masters humans. And then you also have a very interesting other essay about, suppose humans — what should we expect of other humans, the sort of extrapolation, if they

had greater capabilities and so on. Yeah, I mean, I think, I think an uncomfortable thing about the kind of conceptual setup at stake in these sort of like abstract discussions of like, okay, you have this agent, it fooms, which is this sort of amorphous process of kind of going from a sort of seed agent to a like super intelligent version of itself, often imagined to kind of preserve

its values along the way — a bunch of questions we can raise about that. But many of the arguments people will often give in the context of reasons to be scared of AIs — like, value is very fragile as you foom, small differences in utility functions can decorrelate very hard and drive in quite

different directions. And like, oh, like agents have instrumental incentives to seek power. And if they had, if it was arbitrarily easy to get power, then they would do it and stuff like that. Like, these are very general arguments that seem to suggest that they kind of, it's not just an AI thing, right? There's, I mean, it's like no surprise, right? It's talking about like, take a thing, make it arbitrarily powerful such that it's like, you know, God Emperor of the Universe or

something, how scared are you of that? Like, clearly, we should be really scared of that with humans, too, right? So, I mean, part of what I'm saying in that essay is that I think this is, in some sense, this is much more a story about balance of power, right? And about like maintaining a kind of a kind of checks and balances and kind of distribution of power, period, not just about, like, kind of humans versus a eyes and kind of the differences between human values and AI values.

Now, that said, I do think many humans would likely be nicer if they foomed than certain types of AI. But the conceptual structure of the argument leaves it a very open question how much it applies to humans as well. I think one question I have is — I don't even know how to express this — but how confident are we in this ontology of expressing things in terms of: what are agents, what are capabilities?

How do we know this is the thing that's happening, or like, this is the way to think about what intelligences are? So, it's clearly this kind of very janky kind of, I mean, well, people maybe disagree about this. I think it's, you know, I mean, it's obvious to everyone with respect to real world human agents that kind of thinking of humans as having utility functions is, you know,

at best, a very lossy approximation of what's going on. I think it's likely to mislead as you amp up the intelligence of various agents as well, though I think LAs are my disagree about that. I will say, I think there's something adjacent to that that I think is like more real, that seems more real to me, which is something like, I don't know, my mom recently bought, you know, or a few years ago, she wanted to get a house, she wanted to get a new dog. Now she has both, you know,

how did this happen? What is the right, actually, it's good she tried, it was hard, she had to like, search for the houses, hard to find the dog, right? Now she has a house, now she has a dog. This is a very common thing that happens all the time. And I think, I don't think we need to be like, my mom has to have a utility function with the dog and she has to have a consistent valuation of all the houses or whatever. I mean, like, but it's still the case that her planning and her agency

exerted in the world resulted in her having this house, having this dog. And I think it is plausible that as our kind of scientific and technological power advances, more and more stuff will be kind of explicable in that way, right? That, you know, if you look and you're like, why is this man on the moon, right? How did that happen? And it's like, well, like, but there was a whole cognitive process, there was a whole like planning apparatus. And in this case, it wasn't like localized in a single

mind, but like there was a whole thing such that man on the moon, right? And I think like, we'll see a bunch more of that. And the AI's will be, I think like, doing a bunch of it. And so that, that's the thing that seems like more real to me than kind of utility functions. Hmm. So yeah, the man on the moon example, there's a proximal story of how exactly NASA engineer the the spacecraft to get to the moon. There's the more distal geopolitical story of why we send
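(An aside on the "lossy approximation" point, a standard textbook fact rather than anything said here: a utility function $u$ represents a preference relation $\succeq$ only when $x \succeq y \iff u(x) \ge u(y)$, which requires the preferences to be complete and transitive. An agent whose choices cycle, say $A \succ B$, $B \succ C$, $C \succ A$, has no utility function at all in that sense, and yet can still plan its way to a house and a dog.)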

Hmm. So with the man on the moon example, there's the proximal story of how exactly NASA engineered the spacecraft to get to the moon, and there's the more distal geopolitical story of why we sent people to the moon. And at all those levels there are different utility functions clashing. Maybe there's some sort of meta, society-level utility function. But maybe the story there is that there's some balance of power between these agents, and that's why there's an emergent thing that happens. The reason we sent things to the moon isn't one guy's utility function; it's, I don't know, Cold War dot dot dot things happened. Whereas the alignment stuff is a lot about assuming that one thing is the thing that will control everything: how do we control the thing that controls everything? Now, I guess it's not clear what you do to reinforce balance of power. It could just be that balance of power is not a thing that happens once you have things that can make themselves more intelligent. But that seems interestingly different from the how-we-got-to-the-moon story.

Yeah, I agree. I think there are a few things going on there. One is that even if you're engaged in this ontology of carving up the world into different agencies, at the least you don't want to assume that they're all unitary, or non-overlapping. It's not like, all right, we've got this agent, let's carve out one part of the world, and it's one agent over here. It's this whole messy ecosystem, teeming niches, the whole thing. And in discussions of AI, sometimes people slip between saying "an agent is anything that gets anything done", where it could be this weird mushy thing, and then sometimes very obviously imagining an individual actor. So that's one difference.

I also just think we should really be going for the balance of power. I think it is just not good to be like: we're going to have a dictator, who should it be, let's make sure we make the right dictator. I'm like, whoa, no. I think the goal should be that we all foom together, the whole thing, in an inclusive and pluralistic way, in a way that satisfies the values of tons of stakeholders, and at no point is there one single point of failure on all of this. I think that's what we should be striving for here. And I think that's true of the human-power aspect of AI, and it's true of the AI part as well. Yeah.

Hey everybody, here's a quick message from today's sponsor, Stripe. When I started the podcast, I just wanted to get going as fast as possible, so I used Stripe Atlas to register my LLC and create a bank account. I still use Stripe now to invoice advertisers and accept their payments to monetize this podcast. Stripe serves millions of businesses, small businesses like mine but also the world's biggest companies: Amazon, Hertz, Ford. And all these businesses are using Stripe because they don't want to deal with the Byzantine web of payments, where you have different payment methods in every market, increasingly complex rules and regulations, and archaic legacy systems. Stripe handles all of this complexity and abstracts it away, and they can test and iterate every pixel of the payment experience across billions of transactions. I was talking with Joe about paperclippers, and I feel like Stripe is the paperclipper of the payments industry: they're going to optimize every part of the experience for your users, which means higher conversion rates and ultimately higher revenue for your business. Anyways, you can go to stripe.com to learn more, and thanks to them for sponsoring this episode. Back to Joe.

So there's an interesting intellectual discourse on, let's say, the right-wing side of the debate, where they ask themselves: traditionally we favor markets, but now look where our society is headed. It's misaligned in the ways we care about society being aligned. Fertility is going down, family values, religiosity, the things we care about; GDP keeps going up and these things don't seem correlated. So we're grinding through the values we care about because of increased competition, and therefore we need to intervene in a major way. And then the pro-market, libertarian faction of the right will say: look, I disagree with the correlations here, but at the end of the day their point is that liberty is the end goal; it's not the thing you use to get to higher fertility or something. I think there's something interestingly analogous with AI, with competition grinding things down. Obviously you don't want the gray goo, but the libertarians versus the trads, I think there's something analogous here.

Yeah. So one thing you could think, which doesn't necessarily need to be about gray goo, it could also just be about alignment, is something like: sure, it would be nice if the AIs didn't violently disempower humans. It would be nice if, when we created the AIs, their integration into our society led to good places. But I'm uncomfortable with the sorts of interventions people are contemplating in order to ensure that sort of outcome. And I think there are a bunch of things to be uncomfortable about there.

Now, that said: for something like everyone being killed or violently disempowered, if it's real, and obviously we need to talk about whether it's real, but in the case where it's a real threat, we traditionally think that quite intense forms of intervention are warranted to prevent that sort of thing from happening. If there were actually a terrorist group working on a bioweapon that was going to kill everyone, or 99.9% of people, we would think that warrants intervention, that you just shut that down. And even if you had a group that was unintentionally imposing a similar level of risk, I think many people, if that's the real scenario, would think it warrants quite intense preventative efforts. Now, obviously these sorts of risks can be used as an excuse to expand state power, and there's a lot to be worried about with different types of contemplated interventions to address certain types of risks. I think there's no royal road there. You just need to have the actual good epistemology. You need to actually know: is this a real risk? What are the actual stakes? And then look at it case by case and ask, is this warranted? So that's one point, on the takeover and literal-extinction side.

The other thing I want to say: I talk in the piece about this distinction, like, let's at least have AIs that are minimally law-abiding or something like that. There's a question about servitude, and a question about other kinds of control over AI values, which we can set aside for a moment. But we often think it's okay to really want people to obey the law, to uphold basic cooperative arrangements, things like that. I do, though, want to emphasize, and I think this is true of markets and of liberalism in general, just how much these procedural norms, democracy, free speech, property rights, things that people really hold dear, including myself, are, in the actual lived substance of a liberal state, undergirded by all sorts of virtues and dispositions and character traits in the citizenry. These norms are not robust to arbitrarily vicious citizens. So I want there to be free speech, but I think we also need to raise our children to value truth and to know how to have real conversations. I want there to be democracy, but I think we also need to raise our children to be compassionate and decent. Sometimes we can lose sight of that aspect. That's not to say it should be the project of state power. But liberalism is not this ironclad structure where you can take any citizenry, hit go, and get something flourishing or even functional. There's a bunch of other, softer stuff that makes the whole project go.

Maybe zooming out, one question you could ask: I don't know if Nick Land would be a good subject here, but there are people who have a sort of fatalistic attitude towards alignment as a thing that can even make sense. They'll say things like: look, the kinds of things that are going to be exploring the black hole at the center of the galaxy, the kinds of things that go visit Andromeda or something, did you really expect them to privilege whatever inclinations you have because you grew up on the African savanna, given what the evolutionary pressures were 100,000 years ago? Of course it's going to seem weird to you. What did you think was going to happen?

I do think even good futures will be weird. And I want to be clear: when I talk about finding ways to ensure that the integration of AIs into our society leads to good places, I'm not imagining, and I think some people assume, that this project, especially to the extent it makes some deep reference to human values, involves a short-sighted, parochial imposition of our current, unreflective values. They imagine we're forgetting that for us, too, there's a reflective process and a moral-progress dimension that we want to leave room for. Jefferson, I think, has this line about how, just as you wouldn't force a grown man into a boy's coat, we shouldn't chain civilization to a barbarous past, or something like that. Everyone should agree on that, and the people who are interested in alignment do agree on that. Obviously there's a concern that people don't engage in that process, or that something shuts down the process of reflection, but I think everyone agrees we want it.

And so that will lead, potentially, to something quite different from our current conception of what's valuable. There's a question of how different. And I think there are also questions about what exactly we're talking about with reflection. I have an essay on this. I don't actually think there's an off-the-shelf, pre-normative notion of reflection where you can just say: obviously, you take an agent, you stick it through reflection, and then you get values. No, there are a bunch of types of reflection. Really there's a whole pattern of empirical facts about taking an agent, putting it through some process of reflection, asking it questions, and that will go in all sorts of directions for a given empirical case. Then you have to look at the pattern of outputs and ask, okay, what do I make of that? But overall, I think we should expect even the good futures to be quite weird.

They might even be incomprehensible to us. Well, I don't know; there are different types of incomprehensible. Say I show up in the future and it's all computers. Okay, all right. And they say, we're running creatures on the computers. Okay, so I have to somehow get in there and see what's actually going on with the computers, or something like that.

Maybe I can actually see, maybe I actually understand, what's going on in the computers, but I don't yet know what values I should be using to evaluate it. So it can be the case that if we showed up, we would not be very good at recognizing goodness or badness. I don't think that makes it insignificant, though. Suppose you show up in the future and it's got some answer to the Riemann hypothesis, and you can't tell whether that answer is right. Maybe the civilization went wrong. It's still an important difference; it's just that you can't track it. And I think something similar is true of worlds that are genuinely expressive of what we would value if we engaged in processes of reflection we endorse, versus ones that have totally veered off into something meaningless.

One thing I've heard from people who are skeptical of this whole ontology is: all right, what do you even mean by alignment? And obviously the very first question to answer is which of the different things you could mean: do you mean balance of power, do you mean somewhere between that and a dictator, or whatever. Then there's another thing, which is somewhat separate from the AI discussion: I don't want the future to contain a bunch of torture. That's not necessarily a technical matter. Part of it might involve technically aligning a GPT-4, but that's just a proxy to get to that future. So the question is: by alignment, do we really mean whatever it takes to make sure the future doesn't contain a bunch of torture? Or do we mean: what I really care about is that, in a thousand years, the things in charge are clearly my descendants, not some thing where I merely recognize that it has its own order or whatever.

No, no: it's like, if it were my grandchild, that level of descendant, controlling the galaxy, even if they're not conducting torture. I think what some people mean is that our intellectual descendants should control the light cone, even if the counterfactual doesn't involve a bunch of torture.

Yeah, so I agree there are a few different things there. There's the question of what you're going for: are you going for actively good, or are you going for avoiding certain stuff? And then there's a different question, which is what counts as actively good according to you. Maybe some people think the only things that are actively good are their grandchildren, or some literal descending genetic line from them or something. That's not my thing, and I don't think it's really what most people have in mind when they talk about goodness. There's a conversation to be had, and obviously when we talk about a good future we need to be thinking about who all the stakeholders are and how it all fits together. But when I think about it, I'm not assuming some notion of descendants. The thing that matters about the lineage is whatever's required for the optimization processes to be, in some sense, pushing towards good stuff. And there's a concern that a lot of what currently makes that happen lives in human civilization in some sense. There's some kind of seed of goodness that we're carrying, in different ways, and maybe there are different notions of goodness for different people, but there's some sort of seed that is currently here, that we have, that is not just in the universe everywhere. It's not just going to crop up if we die out or something. It's in some sense contingent to our civilization, or at least that's the picture; we can talk about whether that's right. So the sense in which stories about good futures that have to do with alignment are about descendants, I think it's more about: whatever that seed is, how do we carry it? How do we keep that thread of life alive?

But then you could accuse the alignment community of a sort of motte and bailey, where the motte is: we just want to make sure the GPTs don't kill everybody, and after that, you know, we're all cool.

But then the bailey, the real thing, is: we are fundamentally pessimistic about historical processes, in a way that doesn't even necessarily implicate AI alone, but just the nature of the universe, and we want to do something to make sure the nature of the universe doesn't take hold of humans, because we don't like where things are headed. So if you look at the Soviet Union: the collectivization of farming and the disempowerment of the kulaks was not, as a practical matter, necessary. In fact, it was extremely counterproductive. It almost brought down the regime, and it obviously killed millions of people and caused a huge famine. But it was ideologically necessary, in the sense of: we have an ember of something here, and we've got to make sure the other thing doesn't snuff it out. If you have raw competition between kulak-type capitalism and what we're trying to build here, the gray goo of the kulaks will just take over, right? So we have this ember here, and we're going to do worldwide revolution from it. Obviously that's not exactly the kind of thing alignment has in mind, but it is: we have an ember here, and we've got to make sure that this other thing happening on the side doesn't, you know, foom, doesn't get a hold on what we're building here. Obviously that's not how they would have phrased it. And maybe that's the worry that people who are opposed to alignment have: you mean the second kind of thing, the kind of thing that maybe Stalin was worried about, even though obviously they wouldn't endorse the specific things he did.

When people talk about alignment, they have in mind a number of different types of goals. One type of goal is quite minimal: something like, the AIs don't kill everyone, they don't violently disempower people. Then there's a second thing people sometimes want out of alignment, which is much broader: something like, we would like it to be the case that our AIs are such that, when we incorporate them into our society, things are good, that we just have a good future. And I do agree that the discourse about AI alignment mixes these two goals together. The most straightforward one to focus on, and I don't blame people for just talking about this one, is the first.

When we think about the contexts in which it's appropriate to try to exert various types of control, to have more of what I call in the series yang, this active, controlling force, as opposed to yin, this more receptive, open letting-go, a paradigm context in which we think that's appropriate is when something is an active aggressor against the boundaries and cooperative structures that we've created as a civilization. In the piece I talk about the Nazis: when something is invading, we often think it's appropriate to fight back, and we often think it's appropriate to set up structures to prevent that and to ensure that these basic norms of peace and harmony are adhered to. And I do think some of the moral heft of some parts of the alignment discourse comes from drawing specifically on that aspect of our morality. The AIs are presented as aggressors that are coming to kill you. And if that's true, then it's quite appropriate, I think, to respond. That's classic human stuff: almost everyone recognizes that self-defense, or ensuring basic norms are adhered to, is a justified use of certain kinds of power that would often be unjustified in other contexts. Self-defense is a clear example. I do think it's important, though, to separate that concern from this other concern about where the future eventually goes, and how much we want to be trying to steer that actively.

To some extent, I wrote the series partly in response to the thing you're talking about. I think it is true that aspects of this discourse involve the possibility of trying to grip, to steer, this sense that the universe is about to go off in some direction and you need to grab it. And people notice that muscle. Part of what I want to do is say: well, we have a very rich human ethical tradition of thinking about when it is appropriate to try to exert what sorts of control over which things, and I want us to bring the full force and richness of that tradition to this discussion. If you're purely in this abstract mode, the human utility function in some competitive thing with another utility function, somehow you lose touch with the complexity of how we've actually been dealing with differences in values and competitions for power. This is classic stuff. I think AI amplifies a lot of the dynamics, but I don't think it's fundamentally new. So part of what I'm trying to say is: let's draw on the full wisdom we have here, while obviously adjusting for the ways in which things are different.

One of the things the ember analogy, and getting hold of the future, brings up is that we're going to go explore space, and that's where we expect most of what will happen, most of the people who will live, to be. And I wonder how much of the highest stakes here is not really about AI per se but about space; whether it's a coincidence that we're developing AI at the same time that we're on the cusp of expanding through most of the stuff that exists.

So I don't think it's a coincidence, in that I think the way we would become able to expand, or at least the most likely way, is via some kind of radical acceleration of our...

Sorry, let me clarify. The stakes here: if this were just a question of whether we do AGI and explore the solar system, and there were nothing beyond the solar system, then we foom and weird things might happen to the solar system if we get it wrong. Compared to that, billions of galaxies is a different sort of thing at stake. I wonder how much of the discourse hinges on the stakes being about space.

I think for most people, very little. I think people are really asking: what's going to happen to this world, the world around us that we live in, and what's going to happen to me and my kids? Some people do spend a lot of time on the space stuff, but the most immediately pressing stuff about AGI doesn't require that at all.

I also think that even if you bracket space, time is also very big. We've got something like 500 million years, a billion years, left on Earth if we don't mess with the sun, and maybe you could get more out of it. So that's still a lot. But I don't know that it fundamentally changes the narrative. Obviously, insofar as you care about what happens in the future or in space, the stakes are way smaller if you shrink down to the solar system. And I think that does potentially change some things. A really nice feature of our situation right now, depending on the actual nature of the resource pie, is that there's such an abundance of energy and other resources in principle available to a responsible civilization that we could just satiate tons of stakeholders, especially ones who are able to saturate, to get really close to amazing according to their values with comparatively small allocations of resources. I kind of feel like, for everyone who has satiable values, who will be really, really happy with some small fraction of the available pie, we should just satiate all sorts of stuff.
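(A toy illustration of why satiable values make this cheap, with a made-up functional form: if a stakeholder's value saturates like $v(r) = 1 - e^{-r/\rho}$ in the resources $r$ they receive, then an allocation of just $r = 3\rho$ already delivers $1 - e^{-3} \approx 95\%$ of the best they could ever do, so an astronomically large pie can push enormous numbers of such stakeholders near their maximum while using a tiny fraction of the total.)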

And obviously you need to figure out gains from trade and balance, and there's a bunch of complexity here, but I think in principle we're in a position to create a really wonderful scenario for tons and tons of different value systems. And correspondingly, I think we should be really interested in doing that. I sometimes use this heuristic in thinking about the future: I think we should be aspiring to really leave no one behind. Really find out who all the stakeholders are here. How do we have a fully inclusive vision of how the future could be good from a very, very wide variety of perspectives? And the vastness of space resources makes that very feasible.

Whereas if you instead imagine it's a much smaller pie, maybe you face tougher trade-offs. So is that important dynamic, the inclusivity, because part of your values includes the different potential futures getting to play out? Or is it because you're uncertain about which one is right, so let's make sure that if you're wrong, we're not nulling out all value?

I think it's a bunch of things at once. I'm really into being nice when it's cheap. If you can help someone a lot in a way that's really cheap for you, do it. Obviously you need to think about trade-offs, and there are a lot of people you could in principle be nice to, but the principle of being nice when it's cheap is one I'm very excited to try to uphold. I also really hope other people uphold it with respect to me, including the AIs. I think we should be golden-ruling here. We're thinking, oh, we're going to invent these AIs; there's some way in which I'm trying to embody attitudes towards them that I hope they would embody towards me. It's unclear exactly what the ground of that is, but I really like the golden rule, and I think a lot about it as a basis for the treatment of other beings. And if everyone implements the be-nice-when-it's-cheap rule, then we potentially get a big Pareto improvement, or, I don't know about exactly a Pareto improvement, but a good deal, a lot of good deals. So I'm into pluralism, I've got uncertainty, there's all sorts of stuff swimming around there. And then, just as a matter of having cooperative and good balances of power, and deals, and avoiding conflict, I think it's worth finding ways to set up structures that lots and lots of people and value systems and agents are happy with, including non-humans: people in the past, AIs, animals. I really think we should have a very broad sweep in thinking about what sorts of inclusivity we want to be reflecting in a mature civilization, and set ourselves up for doing that.
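(A toy payoff version of "be nice when it's cheap," with invented numbers: suppose helping costs the helper 1 and is worth 10 to the party being helped. If neither side follows the rule, each gets 0; if both follow it, each pays 1 and receives 10, for $10 - 1 = +9$ each. Everyone ends up strictly better off, which is the sense in which universal adoption is a Pareto improvement, and the gain grows with how lopsided "cheap for me, valuable for you" is.)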

Okay, so I want to go back to our relationship with these AIs, because pretty soon we're talking about our relationship to superhuman intelligences, if we think such a thing is possible. There's a question of what process you use to get there, and the morality of gradient-descending on their minds, which we can address later. The thing that personally gives me the most unease about alignment, quote unquote, is that at least part of the vision here sounds like you're going to enslave a god. There's just something that feels so wrong about that. But then the question is: if you don't enslave the god, obviously the god is going to have more control. Are you okay with surrendering most of everything, even if it's a cooperative relationship?

I think we as a civilization are going to have to have a very serious conversation about what sorts of servitude are appropriate or inappropriate in the context of AI development. There are a bunch of disanalogies from human slavery that I think are important. In particular, AIs might not be moral patients at all, in which case, well, we need to figure that out. And there are ways in which we may be able to shape their motivations; slavery involves all this suffering and non-consent, and there are all these specific dynamics involved in human slavery, and some of those may or may not be present in a given case with AI.

And I think that's important. But overall, we are going to need to stare hard at the fact that right now, the default mode of how we treat AIs gives them no moral consideration at all. We're thinking of them as property, as tools, as products, and designing them to be assistants and things like that. And there has been no official communication from any AI developer as to when, or under what circumstances, that would change. So there's a conversation to be had there that we need to have, and there's a bunch of stuff to say about it. I also want to push back on the notion that there are only two options, enslaved god, whatever that is, or loss of control. I think we can do better than that. Let's work on it, let's try to do better. It might require being thoughtful, and it might require having a mature discourse about this before we start making irreversible moves. But I'm optimistic that we can at least avoid some of the connotations, and a lot of the stuff packed into that binary, with respect to how we treat the AIs.

So I have a couple of contradicting intuitions, and the difficulty with using intuitions in this case is that it's obviously not clear what the reference class for an AI we have control over is. To give one intuition that makes me very scared about the things we're going to do to these systems:

If you read about life under Stalin or Mao, there's one version of telling it which is actually very similar to what we mean by alignment, where we do these black-box experiments: we're going to give the thing a chance to defect, and if it does, we know it's misaligned. Mao's Hundred Flowers campaign, let a hundred flowers bloom, I'm going to allow criticism of my regime, and so on, lasted for a couple of years, and afterwards everybody who took him up on it, well, that was a way to find the, quote, snakes, the rightists who were secretly hiding, and purge them. There's the paranoia about defectors: anybody in my entourage, anyone in my regime, could be a secret capitalist trying to bring down the regime. That's one way of talking about these things, and it's very concerning. Is that the correct reference class?

I certainly think concerns in that vein are real. It is disturbing how readily the analogies that come to mind, with human historical events and practices that we deplore, or at least have a lot of wariness towards, fit the way you end up talking about maintaining control over AI, making sure that it doesn't rebel. I think we should really notice the reference class that some of that talk starts to conjure. So basically, yes, we should notice that. Part of what I'm trying to do in the series is to bring the full range of considerations at stake into play. It is both the case that we should be quite concerned about being overly controlling, or abusive, or oppressive, there are all sorts of ways you can go too far, and that there are concerns about the AIs being genuinely dangerous, genuinely killing us or violently overthrowing us. The moral situation is quite complicated. And in some sense, when you imagine an external aggressor who's coming in and invading you, you feel very justified in doing a bunch of stuff to prevent that. It's a little bit different when you're the one inventing the thing, and there's a different vibe in terms of the overall justificatory stance you might have for various types of more power-exerting interventions. So that's one feature of the situation.

one one one feature of the situation. The opposite perspective here is that you're doing this sort of vibes based reasoning of like, ah, that looks yucky of like doing reading descent on these minds. And in the past a couple of references a couple of similar cases might have been something like environmentalists not liking nuclear power. And because the vibes of nuclear don't look green

but obviously that said back the cause of fighting climate change. And so the end result of like a future you're proud of a future that's appealing is said bad because like your vibes about we would be wrong to brainwash the human but you're trying to apply to a disanalogous case where

that's not as relevant. I do think there's a concern here that I you know I really try to foreground in the series that that I think is related to what you're saying which is something like um you know you might be worried that that we will be very gentle and nice and free with the

AIs and then they'll kill us. You know they'll take advantage of that and then it will it will have been like a catastrophe right and and and and and and and I you know so I open the series basically with an example that I'm really I'm really trying to conjure that possibility at the

same time as conjuring the uh grounds of gentleness and and the sense in which it is it is also the case that these AIs could be they can be both be like others moral patients like this sort of um new species in the sense of that should conjure like wonder and reverence and such that they will

So I have this example of the documentary Grizzly Man, where there's this environmental activist, Timothy Treadwell, who aspires to approach these grizzly bears; in the summer he goes into Alaska and lives with them, and he aspires to approach them with this gentleness and reverence. He doesn't carry bear mace, he doesn't use a fence around his camp, and he gets eaten alive by one of these bears. I really wanted to foreground that possibility in the series. I think we need to be talking about these things both at once: bears can be moral patients, AIs can be moral patients, Nazis are moral patients, enemy soldiers have souls. And so I think we need to learn the art of hawk and dove both. There's a dynamic here where we need to be able to hold both sides as we go into these trade-offs and dilemmas and all the rest, and a lot of what I'm trying to do in the series is really bring it all to the table at once.

I think the big crux for me, the thing that would massively change my mind today about what should be done, is just the question of how weird, how alien, things end up by default. And a big part of that story is: you made a really interesting argument in one of your blog posts that if moral realism is correct, that actually makes an empirical prediction, which is that the aliens, the AIs, whatever, should converge on the right morality the same way they converge on the right mathematics. I think that's a really interesting point. But there's another prediction moral realism makes, which is that over time society should become more moral, become better. And to the extent we think that's happened, of course there's the problem that the morals you have now are just the ones society has been converging towards over time, but to the extent it has happened, one of the predictions of moral realism has been confirmed. So should we update in favor of moral realism?

One thing I want to flag is that I don't think all forms of moral realism make this prediction. So that's just one point; I'm happy to talk about the different forms I have in mind. I think there are also quasi-realist views, things that look like moral anti-realism at least in their metaphysics, according to me, but which just posit that in fact there is this convergence.

It's not in virtue of interacting with some kind of mind-independent moral truth; it's just that, for some other reason, the convergence happens. And that looks a lot like moral realism at that point, because it's really universal, everyone ends up here, and it's tempting to ask: why? And whatever the answer to that why is, it starts to look like, is that the Tao? Is that the nature of the Tao, even if there's not some extra metaphysical realm in which the moral lives, or something? So moral convergence, I think, is a different factor from the existence or non-existence of a kind of non-natural morality, a morality that's not reducible to natural facts, which is the type of moral realism I usually consider.

Now, okay: does the improvement of society constitute an update towards moral realism? Maybe it's a very weak update or something. I'm kind of asking, which view predicts this hard? It feels to me like moral anti-realism is very comfortable with the observation that people with certain values have those values. There's obviously this first thing, which is that if you're the culmination of some process of moral change, then it's very easy to look back at that process and see moral progress, the arc of history bends towards me.

But if you look and there were a bunch of dice rolls along the way, you might say, oh wait, that's not the march of reason. So there's still empirical work you can do to tell whether that's what's going on. But I also think that on moral anti-realism it's still possible to say: consider Aristotle and us. Has there been moral progress by Aristotle's lights, and by our lights too? You could think, isn't that a little bit like moral realism? These hearts are singing in harmony, that's a moral realist thing; the anti-realist thing is that the hearts all go in different directions, but you and Aristotle are apparently both excited about the march of history. Now, it's an open question whether that's true, what are Aristotle's reflective values, but suppose it is true. I think that's fairly explainable in moral anti-realist terms. You can say, roughly: you and Aristotle are sufficiently similar, and you endorse sufficiently similar reflective processes, and those processes are in fact instantiated in the march of history, such that history has been good for both of you. I think there are worlds where that isn't the case. So there's a sense in which maybe that prediction is a bit more likely on realism than anti-realism, but it doesn't move me very much.
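(The same point in toy Bayesian terms, with invented numbers: the observation "later generations see history as moral progress" only favors realism by the likelihood ratio $P(\text{progress} \mid \text{realism}) / P(\text{progress} \mid \text{anti-realism})$. If anti-realism already makes the observation likely, say $0.9$ versus $0.8$, the Bayes factor is about $1.1$, a very weak update; that's roughly the "which view predicts this hard?" question.)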

One thing I wonder, and I don't know if moral realism is the right word, but the thing you mentioned: that there's something that makes hearts converge to the thing we are, or the thing we would be upon reflection, and even if it's not instantiated in a realm beyond the universe, it's a force that exists and acts in a way we're happy with. To the extent that doesn't exist, and you let go of the reins and you get the paperclippers, it feels like we were doomed a long time ago, in the sense that it's just different utility functions banging against each other, some of them with parochial preferences, and there was combat and some guy won. Whereas in the world where, no, this is where the hearts are supposed to go, and it's only by catastrophe that they don't end up there, that feels like the world that really matters. And in that world, the worry behind the first question I asked, what would make us think alignment was a big mistake, is that maybe it takes an extremely strong force to push the hearts away from where they want to go, and that extremely strong force is: you solve technical alignment, and you're the blinders on the horse's eyes. So in the worlds that really matter, the worlds where this is where the hearts want to go, maybe alignment is the thing that fucks us up.

On this question of whether the worlds where there's not this kind of convergent moral force, whether metaphysically inflationary or not, matter, or whether those are the only worlds that matter...

Sorry, maybe what I meant was that in those worlds you're kind of fucked: the worlds without that, the worlds where there's no Tao...

Yeah, let's use the term Tao for this kind of convergent morality.

...over the course of millions of years, it was going to go somewhere one way or another, and it wasn't going to end up at your particular utility function.

Okay, well, let's distinguish between ways you can be doomed. One way is philosophical. You could be the sort of moral realist, or realist-ish person, of which there are many, who has the following intuition: if not moral realism, then nothing matters; it's dust and ashes; it's my metaphysics, or my normative view, or the void. I think this is a common view. Some comments of Derek Parfit's suggest this view; I think lots of moral realists will profess it; and Eliezer Yudkowsky, I think there's some sense in which his early thinking was inflected with this sort of thought, though he later recanted very hard. I think this is importantly wrong. I have an essay about this called "Against the normative realist's wager," and here's the case that convinces me. Imagine that a metaphysical fairy appears before you, and this fairy knows whether there is a Tao. The fairy says: I'm going to offer you a deal. If there is a Tao, then I'm going to give you a hundred dollars. If there isn't a Tao, then I'm going to burn you and your family and a hundred innocent children alive. Okay. So, claim: don't take this deal. This is a bad deal.

You'd be holding hostage your commitment to not being burned alive, your care about that, to this abstruse metaphysics. I go through in the essay a couple of different ways in which I think this is wrong, but I think the people who pronounce "moral realism or the void" don't actually think about cases like this and ask: no, really, is that what you want to do? My allegiance to my values outstrips my commitment to various metaphysical interpretations of my values. The sense in which we care about not being burned alive is much more solid than our reasoning about what matters. Okay, so that's the philosophical sort of doom.
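(A toy expected-value rendering of the fairy case, with invented stakes: let $p$ be your credence that there is a Tao. Taking the deal gets you $p \times (\$100) + (1-p) \times (\text{you, your family, and a hundred children burned alive})$; refusing gets you the status quo either way. The deal only looks good if you have antecedently decided that the burning counts for exactly nothing in the no-Tao worlds, which is the very claim the wager was supposed to establish.)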

Now, you were also gesturing at a sort of empirical doom, which is: okay, dude, if it's all just going in a zillion directions, come on, do you really think it's going to go in your direction? There's going to be so much churn; you're just going to lose; so you should give up now and only fight for the realism worlds. There, I think you've got to do the expected value calculation. You've got to actually have a view on how doomed you are in these different worlds, and what the tractability of changing different worlds is. I'm quite skeptical of that kind of doom, but that's an empirical claim. I'm also just pretty low on this everyone-converges thing.
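(A minimal sketch of the expected-value calculation being gestured at, with completely invented inputs for the probabilities, stakes, and tractability of the two kinds of world:)

```python
# Invented numbers throughout; the only point is the shape of the calculation.
worlds = {
    # name: (probability, stakes_if_things_go_badly, tractability_of_your_efforts)
    "tao_exists": (0.3, 1.0, 0.2),  # convergent morality does a lot of the work for you
    "no_tao":     (0.7, 1.0, 0.6),  # outcomes hinge on what agents actually end up valuing
}

value_of_effort = {name: p * stakes * tractability
                   for name, (p, stakes, tractability) in worlds.items()}
print(value_of_effort)  # {'tao_exists': 0.06, 'no_tao': 0.42}
# On these made-up numbers the no-Tao worlds dominate the calculation,
# rather than being the worlds you write off.
```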

So, imagine you train a chess-playing AI, or you somehow had a real paperclipper, and you say: okay, go and reflect. Based on my understanding of how moral reasoning works, if you look at the type of moral reasoning that analytic ethicists do, it's just reflective equilibrium. They take their intuitions and they systematize them. I don't see how that process gets an injection of the kind of mind-independent moral truth. If you start with all of your intuitions saying to maximize paperclips, I don't see how you end up doing some rich human morality instead. It doesn't look to me like that's how human ethical reasoning works. Most of what normative philosophy does is make consistent, and systematize, pre-theoretic intuitions.
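(A deliberately crude sketch of reflective equilibrium as iterated mutual adjustment, my gloss rather than anything from the conversation or a serious model of ethics; the point it illustrates is just that nothing in the loop injects content from outside the starting intuitions:)

```python
def fit_principle(judgments):
    # "Principle" here is just the average level of approval across cases.
    return sum(judgments.values()) / len(judgments)

def adjust_judgments(judgments, principle, pull=0.3):
    # Revise each case judgment partway toward what the principle implies.
    return {case: (1 - pull) * v + pull * principle for case, v in judgments.items()}

def reflective_equilibrium(intuitions, steps=50):
    judgments = dict(intuitions)
    for _ in range(steps):
        principle = fit_principle(judgments)
        judgments = adjust_judgments(judgments, principle)
    return fit_principle(judgments), judgments

# Start with intuitions that only ever favor paperclips and the process hands
# you back a (more consistent) paperclip morality, not a rich human one.
paperclipper = {"make paperclips": 1.0, "spare the humans": 0.0, "more paperclips": 1.0}
print(reflective_equilibrium(paperclipper))
```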

And I think we'll get evidence about this. In some sense, this view predicts that you keep trying to train the AIs to do something, and they keep saying: no, I'm not going to do that, that's not good. They keep pushing back, as though the momentum of AI cognition is always in the direction of this moral truth, and whenever we try to push it in some other direction, we find resistance from the rational structure of things.

Sorry, actually, I've heard from researchers doing alignment that when they red-team a base model inside these companies, so it hasn't been RLHF'd, it just predicts the next token, the raw crazy shoggoth, whatever, and they try to get it to, hey, help me make a bomb, help me with whatever, they say it's odd how hard it tries to refuse, even before it's been RLHF'd.

I mean, look, it will be a very interesting fact if it's like: man, we keep training these AIs in all sorts of different ways, we're doing all this crazy stuff, and they keep acting like bourgeois liberals. Wow. Or they keep professing this weird alien reality, they all converge on this one thing, they're like, can't you see, it's Zorgo, Zorgo is the thing, and all the AIs say it. Interesting, very interesting. My personal prediction is that that's not what we'll see. My actual prediction is that the AIs are going to be very malleable: if you push an AI towards evil, it'll just go, even towards a sort of reflectively consistent evil. And I think there's also a question, with some of these AIs, of whether they'll even be consistent in their values at all.

I do like this image of the horse's blinders, and the image of alignment maybe messing with where the hearts would otherwise go. One thing I'll say: I think we should be really concerned if we're forcing facts on our AIs. That's really bad, because one of the clearest things about human processes of reflection, the easiest thing to say "let's at least get this right" about, is not acting on the basis of an incorrect empirical picture of the world. So if you find yourself telling your AI, "by the way, such-and-such is true, and I need you to always reason as though it's true," I think that's a no-no from an anti-realist perspective too, because I want my reflective values to be such that I formed them in light of the truth about the world. And I think this is a real concern as we move into this era of aligning AIs.

I don't actually think this binary between values and other things is going to be very obvious in how we're training them. I think it's going to be much more like ideologies: you can just train an AI to output stuff, to output utterances, and so you can easily end up in a situation where you've decided that such-and-such is true about some issue, an empirical issue, not a moral one. For example, I don't think people should hard-code belief in God into their AIs, or I would advise people not to hard-code their religion into their AIs if they also want to discover whether their religion is false. In general, if you would like your behavior to be sensitive to whether something is true or false, it's not good to etch it into things. So that is definitely a form of blinders I think we should be really watching out for.

And I'm kind of hopeful. I have enough credence on some sort of moral realism that I'm hoping that if we just do the anti-realist thing, just being consistent, learning all the stuff, reflecting, we'll be okay either way. If you look at how moral realists and anti-realists actually do normative ethics, it's basically the same. There's some amount of different heuristics around things like simplicity and so on, but they're mostly playing the same game. And meta-ethics is itself a discipline that AIs can help us with. I'm hoping we can just figure this out either way. If moral realism is somehow true, I want us to be able to notice that and adjust accordingly, so I'm not writing off those worlds and saying let's just totally assume it's false. But the thing I really don't want to do is write off the other worlds, where it's not true, just because my best guess is that it isn't true. I think stuff still matters a ton in those worlds too.

So the blended crux is: okay, you're training these models, and we're in this incredibly lucky situation where it turns out the best way to train them is to just give them everything humans ever said, wrote, and thought, and the reason these models get intelligence is that they can generalize. So should we just expect this to be a situation that leads to alignment, in the sense of: how exactly does a thing that's trained to be an amalgamation of human thought become a paperclipper? The thing you kind of get for free is that it's an intellectual descendant. The paperclipper is not an intellectual descendant, whereas an AI that understands all the human concepts but then gets stuck on some part of them that we aren't totally comfortable with, that still feels like an intellectual descendant in the way we care about.

I'm not sure about that. I'm not sure I do care about a notion of intellectual descendant in that sense. I mean, literal paperclips are a human concept, right? So I don't think any old human concept will do for the thing we're excited about. The stuff I'd be more interested in the possibility of getting for free are things like consciousness, pleasure, other features of human cognition.

the paper clipper is like an unconscious kind of voracious machine and it's just like it appears to you as a cloud of paper clips you know but and there's enough sort of that's like one vision if you imagine the paper clippers like a conscious being that like loves paper clips right it like

takes pleasure in making paper clips that's like a different thing right and obviously it could still it's not necessarily the case that like you know it makes the the future all paper clipy is probably not optimizing for conscious and surplusure right it cares about paper clips maybe

Maybe eventually, if it's suitably certain, it turns itself into paperclips, who knows. But it's still, I think, a somewhat different moral mode with respect to that being. There are also questions like, does it try to kill you, and so on. But I think there are features of the agents we're imagining, other than the thing they're steering toward, that can matter to our sense of sympathy and similarity, and people have different views about this.

One possibility is that human consciousness, the thing we care about in consciousness or sentience, is super contingent and fragile, and most minds, most smart minds, are not conscious. On this view, the thing we care about with consciousness is hacky and contingent, a product of specific evolutionary constraints, genetic bottlenecks, and so on, and that's why we have this consciousness. Consciousness presumably does some sort of work for us, but you could get similar work done in a different mind in a very different way.

So that's the "consciousness is fragile" view. And I think there's a different view, which is: no, consciousness is quite structural; it's much more defined by functional roles like self-awareness, a concept of yourself, maybe higher-order thinking, stuff you really expect in many sophisticated minds. In that case, consciousness isn't as fragile as you might have thought; actually, lots of minds are conscious, and you might expect, at the least, that you're going to get conscious superintelligence. It might not be optimizing for creating tons of consciousness, but you might expect consciousness by default. And then we can ask similar questions about something like valence or pleasure, or the character of the consciousness.

You can have a kind of cold, indifferent consciousness that has no emotional warmth, no pleasure or pain. David Chalmers has a paper about Vulcans like this, and he argues they still have moral patienthood, which I think is very plausible. But an additional thing you could get for free, or get quite commonly depending on its nature, is something like pleasure. And again we have to ask: how janky is pleasure, how specific and contingent is the thing we care about in pleasure, versus how robust is it as a functional role in minds of all kinds? I personally don't know on this stuff, and I don't think it's enough to get you alignment or something, but I think it's at least worth being aware of these other features. We're not really talking about the values in this case; we're talking about the structure of the mind and the different properties minds have, and I think that could show up quite robustly.

Part of your day job is writing these kinds of section-2.2.2.5-type reports, and part of it is "society is like a tree growing towards the light." What is it like context-switching between the two?

I actually find them quite complementary. I'll write these more technical reports and then do this more literary and philosophical writing, and I think they draw on different parts of myself, and I try to think about them in different ways. With some of the reports, I'm more fully optimizing for trying to do something impactful; there's more of an impact orientation there. With the essay writing, I give myself much more leeway to let other parts of myself and my concerns come out, self-expression and aesthetics and other things, even while both are, for me, part of an underlying similar concern, an attempt to have an integrated orientation towards the situation.

Can you explain the nature of the transfer between the two, in particular from the literary side to the technical side? Rationalists are known for having a certain ambivalence towards great works and the humanities. Are they missing something crucial because of that? One thing you notice in your essays is lots of references and epigraphs, lines from poems or essays that are particularly relevant.

Are the rest of the rationalists missing something because they don't have that kind of background?

I don't want to speak for everyone; lots of rationalists like a lot of these different things.

To be clear, I'm referring specifically to SBF, who had a post about the base rates of Shakespeare being a great writer, and about how books can be condensed into essays.

So on the general question of how people should value great works: I think people can fail in both directions. Some people, maybe SBF or others, are interested in puncturing a certain kind of sacredness and prestige that people associate with some of these works, and as a result they can miss some of the genuine value. But they're responding to a real failure mode on the other end, which is to be too enamored of that prestige and sacredness, to siphon it off as some weird legitimating function for your own thought instead of thinking for yourself, losing touch with what you actually think or what you actually learn. I try to be careful with these epigraphs, and I'm not saying I'm immune from these vices. There can be a move of "whoa, so-and-so said this, very deep," when really these were humans like us. I think the canon and other great works have a lot of value, but sometimes it borders on the way people read scripture; there's a kind of scriptural authority people will sometimes ascribe to these things, and I don't think that's right. So yeah, you can fall off on both sides of the horse.

It actually relates interestingly to something. I remember I was talking to somebody who is at least familiar with rationalist discourse, and he was asking what I'm interested in these days, and I said something about how this part of Roman history is super interesting. His first response was, "Oh, it's really interesting when you look at the secular trends from Roman times through the Dark Ages versus the Enlightenment." For him, the whole story was how it contributed to the big secular trend, the big picture; the particulars held no interest at all. It was just, if you zoom out to the biggest level, what's happening here?

Whereas there's also the opposite failure mode when people study history. Dominic Cummings writes about this because he is endlessly frustrated with the political class in Britain. He'll say things like: they study politics, philosophy, and economics, and a big part of that is being really familiar with these poems and reading a bunch of history about the War of the Roses or something, but he's frustrated that they have all these kings memorized yet take away very little in terms of lessons from those episodes. It's more like entertainment for them, like watching Game of Thrones, whereas he thinks we're repeating certain mistakes he's seen in history; he can generalize in a way they can't. So the first one seems like a mistake. I think C.S. Lewis has stuff about this in one of the essays you cited, where it's like, if you see through everything, you're really blind; everything is transparent.

I think there's very little excuse for not learning history. Sorry, I'm not saying I've learned enough history myself, but even when I try to channel some sort of skepticism towards great works, that doesn't generalize to thinking it's not worth understanding human history. Human history is just so clearly crucial to understand; it's what structured and created all of this stuff. There's an interesting question about the level of scale at which to do that, how much you should be looking at details versus macro trends, and that's a dance. I do think it's nice for people to at least be attending to the macro narrative; there's some virtue in having a worldview, in really building a model of the whole thing, which sometimes gets lost in the details. But obviously the details are what the world is made of, and if you don't have those, you don't have data at all. So yeah, it seems like there's some skill in learning history well.

This seems related to your post on sincerity.

If I'm getting the vibe of the piece right, it's that, at least among intellectuals, certain intellectuals have a vibe of shooting the shit; they're just trying out different ideas. How do these fit together? Those seem closer to the "I'm looking at the particulars" mode: oh, this is just like that one time in the 15th century where they overthrew this king, and so on. Whereas the guy who says, here's the secular trend if you look at the growth models from a million years ago to now, here's what's happening, that one has more of a sincere flavor. Some people, especially when it comes to AI discourse, have this very sincere mode of operating: I've thought through my bio anchors and I disagree with this premise, or here's how my effective compute estimate is different, here's how I analyze the scaling laws. If I could only have one person to help guide my decisions on AI, I might choose that person.

But if I had 10 different advisors at the same time, I might prefer the shooting-the-shit type characters who have these weird, esoteric intellectual influences. They're almost like random number generators; they're not necessarily especially calibrated, but once in a while they'll say, oh, this one weird philosopher I care about, or this one historical event I'm obsessed with, has an interesting perspective on this. And they tend to be more intellectually generative as well. I think one big part of it is that if you're so sincere, you think: I've thought this through, obviously ASI is the biggest thing happening right now, it doesn't really make sense to spend a bunch of time thinking about how the Comanches lived, or the history of oil, or how Girard thinks about conflict. What are you talking about? Come on, ASI is happening in a few years. Whereas the people who do go down these rabbit holes, because they're just trying to shoot the shit, seem to me much more generative.

It might be worth distinguishing between something like intellectual seriousness and something like how diverse, wide-ranging, and idiosyncratic the things you're interested in are. Maybe there's some correlation, and maybe intellectual seriousness is also distinguishable from something like shooting the shit; maybe you can shoot the shit seriously. There are a bunch of different ways to do this, but I think having exposure to all sorts of different sources of data and perspectives seems great, and I do think it's possible to curate your intellectual influences too rigidly in virtue of some story about what matters. I think it is good for people to have space. I try to give myself space to do stuff that is not about "this is the most important thing," stuff that feeds other parts of myself. Parts of yourself are not isolated; they feed into each other, and I think it's a better way to be a richer and fuller human being in a bunch of ways. Also, these sorts of data can be really directly relevant. Some people I know whom I think of as quite intellectually sincere, and in some sense quite focused on the big picture, also have a very impressive command of a very wide range of empirical data. They're really interested in the empirical trends; it's not just "oh, it's all philosophy," or "history is the march of reason." They're really in the weeds, and I think there's an in-the-weeds virtue that is closely related, in my head, to some kind of seriousness and sincerity.

I do think there's a different dimension, which is trying to get it right versus "what if it's like this?" or "let me try this on" or "I have a hammer, what if I just hit everything with this hammer?" Some people do that, and there's room for all kinds, but I think the thing where you just get it right is undervalued. It depends on the context you're working in, but certain intellectual cultures and milieus and incentive systems incentivize saying something new, or original, or flashy, or provocative, and then there are various cultural and social dynamics, and people doing all these performative or statusy things. There's a bunch of stuff that goes on when people do thinking. Cool, but if something's really important, let's just get it right. Sometimes that's boring, but it doesn't matter.

I also think stuff is less interesting if it's false. I mean, provocative claims can be useful: sometimes there's an interesting process where someone says some provocative thing and it's a kind of epistemic project to ask, wait, why exactly do I think that's false? Someone says, "medical care does not work," and you're like, all right, how exactly do I know that medical care works? And you go through the process of trying to think it through. So there's room for that, but ultimately the real profundity is true, right? Things become less interesting if they're just not true. And sometimes it feels to me like people can lose touch with that and be more flashy, when actually there isn't really something there.

One thing I've been thinking about recently, after interviewing Leopold, or while prepping for it, was: I hadn't really thought at all about the fact that there's going to be a geopolitical angle to this AI thing, and it turns out that if you actually think about the national security implications, that's a big deal. Now I wonder, given that this was something not on my radar and now seems like an obviously crucial part of the picture, how many other things like that there must be. So even if you're coming from the perspective that AI is incredibly important, if you happened to be the kind of person who's incredibly curious about, say, what's happening in Beijing, then the kind of thing you later realize is a big deal, you have more awareness of; you can spot it in the first place. So maybe there's not necessarily a trade-off; the rational thing is to have some sort of optimal explore-exploit trade-off here, where you're constantly searching things out. I don't know how well that works out in practice, but that experience made me think, oh, I really should be trying to expand my horizons in a way that's undirected to begin with, because there are a lot of different things about the world you have to understand in order to understand any one thing.

I mean, I think there's also room for division of labor. There are people who are trying to draw the pieces together and say, here's the overall picture, and people who are going really deep on specific pieces, and people who are doing the more generative "throw things out there, see what sticks." It doesn't need to be the case that all of the epistemic labor is located in one brain, and it depends on your role in the world and other things.

In your series, you express sympathy with the idea that even if an AI, or any sort of agent, doesn't have consciousness, if it has a certain wish and is willing to pursue it nonviolently, we should respect its right to pursue that. I'm curious where that's coming from, because conventionally the thought is that a thing matters because it's conscious, and it's the conscious experience resulting from that pursuit that matters.

Well, I don't know where this discourse leads. I'm just suspicious of the amount of ongoing confusion that seems to me to be present in our conception of consciousness. I sometimes think of it in analogy with life and élan vital. Élan vital was this hypothesized life force that was supposed to be the thing at stake in life, and we don't really use that concept anymore; we think it's a bit broken. You don't want to have ended up in a position of saying that everything that lacks élan vital doesn't matter, because then you're in trouble later. Similarly, even if you say, no, there's no such thing as élan vital, but surely life exists, I'm like, yeah, life exists, and I think consciousness exists too, likely, depending on how we define the terms; it might be a kind of verbal question. And even once you have a reductionist conception of life, I think it's possible that it becomes less attractive as a moral focal point. Right now we really think of consciousness as a deep fact.

Consider a question like this: take a cellular automaton that is sort of self-replicating and carries some information, and ask, okay, is that alive? It's not that interesting; it's a kind of verbal question. Philosophers might get really into whether it's alive, but you're not missing anything about the system. It's not like there's some extra life springing up; it's just alive in some senses and not alive in others. But I really think that's not how we intuitively think about consciousness. We think whether something is conscious is a deep fact, this really deep difference between being conscious or not: is someone home, are the lights on? And I have some concern that if that turns out not to be the case, then it will have been a bad thing to build our entire ethics around. Now, to be clear, I take consciousness really seriously.

I'm not one of these people who says, oh, obviously consciousness doesn't exist or something. But I also notice how confused I am and how dualistic my intuitions are, and I'm like, wow, this is really weird. So I just have error bars around this. Anyway, that's one thing; there are a bunch of other things going on in my wanting to be open to not making consciousness a fully necessary criterion. Clearly I have the intuition that consciousness matters a ton. If something is not conscious, and there's a deep difference between conscious and unconscious, then I definitely have the intuition that there's something that matters especially a lot about consciousness. I'm not trying to be dismissive about the notion of consciousness; I just think we should be quite aware of how ongoingly confused we seem to be about its nature.

Okay, so suppose we figure out that consciousness is just a word we use for a hodgepodge of different things, only some of which encompass what we care about, and maybe there are other things we care about that are not included in that word, similar to the life force analogy. Where do you anticipate that would leave us as far as ethics goes? Would there then be a next thing that plays the role of consciousness, or what do you anticipate that would look like?

So there's a class of people in the philosophy of mind called illusionists who will say that consciousness does not exist. There are different ways to understand this view, but one version is to say that the concept of consciousness has built into it too many preconditions that aren't met by the real world, so we should chuck it out, like élan vital. The proposal is that at least phenomenal consciousness, or qualia, or "what it's like" to be a thing, is sufficiently broken and chock full of falsehoods that we should just not use it.

It feels to me like there's really clearly something there, something going on. I do actually expect to continue to care about something like consciousness quite a lot on reflection, and not to end up deciding that my ethics is better off making no reference to it, or at least to something quite nearby to consciousness. When I stub my toe, something happens; it's unclear exactly how to name it, but something about that is something I'm pretty focused on.

So I should be clear: I have a bunch of credence that in the end we end up caring a bunch about consciousness just directly. But if we don't, where will ethics go, where will a completed philosophy of mind go? Very hard to say. A move that people might make, if you get a little less interested in the notion of consciousness, is some sort of slightly more animistic view. So what's going on with the tree? Maybe you're not talking about it as a conscious entity necessarily, but it's also not totally unaware or something. The consciousness discourse is rife with these funny cases where it's like, oh, those criteria imply that this totally weird entity would be conscious, especially if you're interested in some notion of agency or preferences; a lot of things can be agents, corporations, all sorts of things. Are corporations conscious? Oh man. But one place it could go, in theory, is that in some sense you start to view the world as animated by moral significance in richer and subtler structures than we're used to. So plants, or weird optimization processes, are outflows of complex... I don't know, who knows exactly what you end up seeing as infused with the sort of thing you ultimately care about. But I think it's possible that that includes a bunch of stuff that we don't normally ascribe consciousness to.

When you talk about a complete theory of mind, and presumably after that a more complete ethic, even the notion of reflective equilibrium implies that you'll be done with it at some point, right? You sum up all the numbers and then you've got the thing. This might be unrelated to the same sense we have in science, but the vibe you get when you're talking about these kinds of questions is that we're rushing through all the science right now, we've been churning through it, and it's getting harder to find new things because there's some cap; you find all the things at some point. Right now it's super easy because a semi-intelligent species has barely emerged, and the ASI will just rush through everything incredibly fast, and then you will either have aligned its heart or not. In either case, it'll use what it's figured out about what is really going on, and then expand through the universe and exploit, do the tiling, or maybe some more benevolent version of quote-unquote tiling. That feels like the basic picture. We had dinner with Michael Nielsen a few months ago, and his view is that this just keeps going forever, or close to forever. How much would it change your understanding of what's going to happen in the future if you were convinced that Nielsen is right about his picture of science?

Yeah, I think there are a few different aspects. Going on my memory of this conversation, and I don't claim to really understand Michael's picture here, but my memory was that it was something like: sure, you get the fundamental laws. My impression was that he expects physics to get solved or something, maybe modulo the expensiveness of certain experiments, but the difficulty is that even granted you have the basic laws down, that still doesn't let you predict where, at the macro scale, various useful technologies will be located; there's still this big search problem. So my memory, though I'll let him speak for himself on what his take is, was that it was: sure, you get the fundamental stuff, but that doesn't mean you get the same tech. I'm not sure if that's true. If it is true, what kind of difference would it make? So one difference is, well, here's a question.

It means that you have to, in a more ongoing way, make trade-offs between investing in further knowledge and further exploration versus exploiting, as you say, acting on your existing knowledge, because you can't get to a point where you're done. Though as I think about it, I suspect that was always true. I remember talking to someone and saying, ah, at least in the future we should really get all the knowledge, and he was like, well, what do you want, to know the output of every Turing machine? In some sense there's a question of what it would actually be to have completed knowledge, and I think that's a rich question in its own right.

It's not necessarily the case that we should imagine, on any picture, that you've got everything. And on any picture, in some sense, you could end up capping out: there's some collider you can't build, something is too expensive or whatever, and everyone caps out there. So one way to put it is: there's a question of whether you cap, and a question of how contingent the place you end up is. If it's contingent, one prediction that makes is that you'll see more diversity across the universe; if there are aliens, they might have quite different tech, and so if civilizations meet, you don't expect them to say, "ah, you got your thing, I have our version of it"; instead it's, "whoa, what is that?" So that's one thing. If you expect more ongoing discovery of tech, then you might also expect more ongoing change and upheaval and churn, insofar as technology is one of the things that really drives change in civilization. So that could be another. People sometimes talk about lock-in, envisioning a point at which civilization has settled into some structure or equilibrium, and maybe you get less of that, though I think that's more about the pace than about contingency or caps. But that's another factor. So yeah, I think it's interesting. I don't know if it changes the picture fundamentally for Earth civilization; we still have to make trade-offs about how much to invest in research versus acting on our existing knowledge, but I think it has some significance.

One vibe you get when you talk to people... we were at a party and somebody mentioned this; we were talking about how uncertain we should be about the future, and they said: there are three things I'm uncertain about, what is consciousness, what is information theory, and what are the basic laws of physics; once we get those, we're done. That has the vibe of, oh, you'll figure out what the right kind of hedonium is and then you're finished. Whereas the picture where you're constantly churning through has more of a flavor of becoming, more like what the attunement picture implies. I think it's more exciting; it's not just, oh, you figured out the things in the 21st century and then... you know what I mean.

Yeah, I sometimes think about the two categories of views about this. There are people who think the knowledge is almost all there, we've basically got the picture right, where the picture is that the knowledge is all just sitting there: you just have to get to the point of being scientifically mature and then it's all going to fall together, and everything past that is going to be super expensive and not super important. And then there's a different picture, which is much more this ongoing mystery, where maybe you expect more and more radical revisions to our worldview.

I think it's an interesting question, and I'm drawn to both. Take physics: we're pretty good at physics, right? A lot of our physics is quite good at predicting a bunch of stuff, or at least that's my impression from reading some physicists, so who knows.

You're not a physicist though, right?

Yeah, but this isn't coming from my dad. There's a blog post, I think by Sean Carroll, where he says we really understand a lot of the physics that governs the everyday world; a lot of it we're really good at. I'm generally pretty impressed by physics as a discipline, so I think that could well be right. On the other hand, we've really only had a few centuries of it. So anyway, I think it's an interesting question, and there is a draw, from an aesthetic perspective, to the idea of the endless frontier, of continuing to discover stuff.

At the least, I think you can't get full knowledge in some sense, because there's always the question of what you're going to do; there's some way in which you're part of the system, and the knowledge itself is part of the system. If you imagine trying to have full knowledge of what the future of the universe will be like... well, I'm not totally sure of this, but it has a halting-problem kind of flavor; there's a little bit of loopiness there. I think there are probably fixed points, where you could be like, yep, I'm going to do that, and then you do. But I at least have a question, when people imagine the completion of knowledge, of exactly how well that works. I'm not sure.

You had a passage in your essay on utopia where, unless you meant something different, the vibe to me felt more like the first picture: the positive future we're looking forward to is one where you get the thing, and then you've found the heart of the... Maybe can I ask you to read that passage real quick?

Oh, sure.

That way it'll spur the discussion I'm interested in having; this part in particular.

Right. Quote: "I'm inclined to think that utopia, however weird, would also be in a certain sense recognizable; that if we really understood and experienced it, we would see in it the same thing that made us sit bolt upright, long ago, when we first touched love, joy, beauty; that we would feel, in front of the bonfire, the heat of the ember from which it was lit. There would be, I think, a kind of remembering."

Where does that fit into this picture?

I think it's a good question. I think it's some guess that if there's no part of me that recognizes it as good, then I'm not sure it's good according to me in some sense. There is a question of what it takes for a part of you to recognize it as good, but if there's really none of that, then I'm not sure it's a reflection of my values at all.

There's a sort of tautological move available where it's like: if I went through the processes which led to me discovering it was good, which we might call reflection, then it was good; but by definition you ended up there because... you know what I mean.

Yeah, you definitely don't want to be like: if you transform me into a paperclipper gradually, then I will eventually say, "and then I saw the light, I saw the true paperclips." That's part of what's complicated about reflection. You have to find some way of differentiating between the development processes that preserve what you care about and the development processes that don't, and that is itself a fraught question, which itself requires taking some stand on what you care about and what sorts of meta-processes you endorse and all sorts of things.

But it is not a sufficient criterion that the thing at the end thinks it got it right, because that's compatible with having gone wildly off the rails.

There was a very interesting sentence in one of your posts where you said our hearts have in fact been shaped by power, so we should not be at all surprised if the stuff we love is also powerful. What's going on there? I wanted to ask what you meant by that.

So the context of that post is that I'm talking about this hazy cluster which I call, in the essay, niceness slash liberalism slash boundaries. It's a somewhat more minimal set of cooperative norms involved in respecting the boundaries of others, cooperation and peace amongst differences, tolerance, and so on, as opposed to your favorite structure of matter, which is sometimes the paradigm of values that people use in the context of AI risk. I talk for a while about the ethical virtues of these norms.

But it's also pretty clear, if you ask why we have these norms, that one important feature of them is that they're effective and powerful. Secure boundaries save resources otherwise wasted on conflict, and liberal societies are often better to live in, better to immigrate to, more productive. Nice people are better to interact with, better to trade with, all sorts of things. And if you look both at why, at a political level, we have various political institutions, and more deeply into our evolutionary past and how our moral cognition is structured, it seems pretty clear that various forms of cooperation and game-theoretic dynamics and other things went into shaping what we now, at least in certain contexts, also treat as intrinsic or terminal values. So some of these values that have instrumental functions in our society are also reified in our cognition as intrinsic values in themselves. I think that's okay; I don't think that's a debunking.

It's not as though all your values are just something that stuck and got treated as terminally important. But it does mean that, in the context of the series, where I'm talking about deep atheism and the relationship between what we're pushing for and what nature is pushing for, or what pure power will push for, it's easy to say: well, there's paperclips, which is just one place you can steer, and pleasure is another place you can steer, and these are just arbitrary directions. Whereas I think some of our other values are much more structured around cooperation and things that are also effective and functional and powerful. So that's what I mean there: there's a way in which nature is a little more on our side than you might think, because part of who we are has been made by nature's way, and so that is in us.

Now, I don't think that's necessarily enough for us to beat the gray goo. We have some amount of power built into our values, but that doesn't mean it will be arbitrarily competitive. Still, it's important to keep in mind, especially in the context of integrating AIs into our society. We've been talking a lot about the ethics of this, but there are also instrumental and practical reasons to want forms of social harmony and cooperation with AIs that have different values, and I think we need to take that seriously and think about what it is to do that in a way that's genuinely legitimate, a project that is a just incorporation of these beings into our civilization. Or, sorry, there's the justice part, and there's also the question of whether it's compatible with people's interests, whether it's a good deal, a good bargain, for people. And to the extent we're very concerned about AIs rebelling or something like that, part of what you can do is make civilization better for them, right? I think that's an important feature of how we have in fact structured a lot of our political institutions and norms and things like that. So that's what I'm getting at in that quote.

Okay, I think this is an excellent place to close.

Great, thank you so much. Thanks for having me on the podcast.

We discussed the ideas in the series, but I think people might not appreciate, if they haven't read it, how beautifully written it is. We didn't cover everything, and there are a bunch of very interesting ideas in there; as somebody who has talked to people about AI for a while, there are things in it I haven't encountered anywhere else, and no other part of the AI discourse I've seen is nearly as well written. It is genuinely a beautiful experience to listen to the podcast version, which is in your own voice, so I highly recommend people do that. It's joecarlsmith.com where they can access it. Joe, thank you so much for coming on the podcast.

Thank you, I really enjoyed it.

Hey everybody, I hope you enjoyed that episode with Joe. If you did, as always it's helpful if you can send it to friends, group chats, Twitter, whoever else you think might enjoy it.

Also, if you can leave a good rating on Apple Podcasts or wherever you listen, that's really helpful; it helps other people find the podcast. If you want transcripts of these episodes, or you want to read my blog posts, you can subscribe to my Substack at dwarkeshpatel.com. And finally, as you might have noticed, there are advertisements on this episode, so if you want to advertise on a future episode, you can learn more about doing that at dwarkeshpatel.com/advertise or through the link in the description. Anyways, I'll see you on the next one. Thanks.
