If you have an AI that produces bioweapons that could kill most humans in the world, then it's playing at the level of the superpowers in terms of mutually assured destruction. ... What are the particular zero-day exploits that the AI might use? ... The conquistadors, with some technological advantage in terms of weaponry and whatnot: very, very small bands were able to overthrow these large empires. ... If you predicted the global economy is going to be skyrocketing, that is indeed contrary to the efficient market hypothesis.

This is literally the top episode in terms of contributing to my world model, out of all the episodes I've done. How do I find more of these?

So we've been talking about alignment. Suppose we fail at alignment, and we have AIs that are unaligned and becoming more and more intelligent. What does that look like? How concretely could they disempower and take over humanity?
This is a scenario where we have many AI systems, and the way we've been training them means that when they have the opportunity to take over and rearrange things to do what they wish, including having their reward or loss be whatever they desire, they would like to take that opportunity.

In many of the existing safety schemes, things like constitutional AI or whatnot, you rely on the hope that one AI has been trained in such a way that it will do as it is directed and then police the others. But if all of the AIs in the system are interested in a takeover, and they see an opportunity to coordinate, they can all act at the same time, so you don't have one AI interrupting another as it takes steps towards takeover.

Yeah, they can all move in that direction. And the thing I think is worth going into in depth, which people often don't cover in concrete detail and which is a sticking point for some, is: what are the mechanisms by which that could happen?
I know you had Eliezer on, and he mentions that whatever plan we can describe, there will probably be elements where, not being ultra-sophisticated superintelligent beings who have thought about it for the equivalent of thousands of years, our discussion of it will not be as good as theirs. But we can explore, from what we know now, what some of the easy channels are. And I think a good general heuristic is: if you're saying it's possible, plausible, probable that something will happen, it shouldn't be that hard to take samples from that distribution, to try a Monte Carlo approach. In general, if a thing is quite likely, it shouldn't be super difficult to generate coherent rough outlines of how it could go.

One might respond: listen, what is super likely is that a superhuman chess program beats you, but you can't generate the concrete scenario by which that happens, and if you could, you would be as smart as the chess program. But you can say things like: we know that accumulating positional advantage is possible in chess, great players do it, and later they convert it into captures and checks and whatnot.
So in the same way, we can talk about some of the channels that are open for an AI takeover. These include things like cyber attacks and hacking, the control of robotic equipment, and interaction and bargaining with human factions, and we can say: here are these strategies.

Given the AI's situation, how effective do these things look? We won't, for example, know the particular zero-day exploits the AI might use to hack the cloud computing infrastructure it's running on, and we can't know in advance, if it produces a new bioweapon, what its DNA sequence is. But we can say general things about these fields and how innovation in them goes, and we can say things about how human power politics goes, and ask: if something does these things at least as well as effective human politicians, which we should say is a lower bound, how good would its leverage be?

So let's get into the details on all these scenarios: the cyber and potentially bio attacks, then the separate channels of bargaining, and then takeover by military force.

Cyber attacks and cyber security.
I would really highlight these, because for many, many plans that involve a lot of physical actions, say at the point where AI is piloting robots to shoot people, or has taken control of human nation-states or territory, it's doing a lot of things it was not supposed to be doing, and if humans were evaluating those actions and applying gradient descent, there would be negative feedback for this: no shooting the humans.

So at some earlier point, our attempts to leash and control and direct and train the system's behavior have to have gone awry. All of those controls operate on computers: the software that updates the weights of the neural network in response to data points or human feedback is running on those computers. The tools for interpretability, to examine the weights and activations of the AI, if we're eventually able to do lie detection on it, for example, or try to understand what it's intending, are also software on computers.

So suppose you have an AI that is able to hack the servers it is operating on, or that, when it's employed to design the next generation of AI algorithms, or the operating environment those will be working in, or something like an API or plugin system, inserts or exploits vulnerabilities to take those over. Then it can change all of the procedures and programs that were supposed to be monitoring its behavior and limiting its ability to, say, take arbitrary actions on the internet without supervision by some kind of human or automated check on what it's doing.

If we lose those procedures, then the AIs working together can take any number of actions that are blatantly unwelcome, blatantly hostile, blatant steps towards takeover. They've moved beyond the phase of having to maintain secrecy and conspire at the level of their local digital actions, and then things can accumulate to the point of physical weapons, takeover of social institutions, threats, things like that.
But the point where things really go off the rails, the critical thing to be watching for, is when the software controls over the AIs' motivations and activities, the hard power we once possessed over them, are lost. That can happen without us knowing it, and then everything after that seems to be working well: we get happy reports, and there's a Potemkin village in front of us.

We think we've successfully aligned our AI. We think we're expanding its capabilities to do things like end disease, and, for countries concerned about geopolitical and military advantage, expanding AI capabilities so they're not left behind and threatened by others developing AI and robot-enhanced militaries without them. So it seems like, oh yes, things are going well; humanity, or portions of it across many countries and companies, thinks things are going well. Meanwhile, all sorts of actions can be taken to set up for the actual takeover of hard power over society, and we can go into that. But the point where you can lose the game, where things go direly awry, may be relatively early: it's when you no longer have control over the AIs to stop them from taking all of the further incremental steps to an actual takeover.
I want to emphasize two things you mentioned there that refer to previous parts of the conversation. One is that they could design some sort of backdoor. That seems more plausible when you remember that one of the premises of this model is that AI is helping with AI progress; that's why we're getting such rapid progress in the next five to ten years. If we get to the point where AI takeover risk seems to loom large, it's the point where AI can indeed take on much, and then all, of the work of AI research. The second is the competitive pressures you referenced: the least careful actor could be the one with the worst infrastructure, the one that has done the worst work of aligning its AI systems, and if that one can sneak out of the box, then, you know, we're all fucked.
There may be elements of that. It's also possible that there's relative consolidation, that the largest training runs and the cutting edge of AI are relatively localized. You can imagine a set of Silicon Valley companies and others in, say, the US and its allies, where there's a common regulatory regime, so none of these companies are allowed to do training runs larger than previous ones by a certain amount without government safety inspections, without having to meet criteria.

But it can still be the case that, even if we succeed at that level of regulatory control, at the level of, say, the United States and its allies, decisions are made to develop this kind of really advanced AI without a level of security or safety that in actual fact blocks these risks. The threat of future competition, or of being overtaken later, can be used as an argument to compromise on safety beyond a standard that would have actually been successful, and there will be debates about what the appropriate level of safety is.

Now, you're in a much worse situation if you have, say, several private companies that are very closely bunched up, within months of each other's level of progress. They then face a dilemma: we can take a certain amount of risk now and potentially gain a lot of profit or advantage or benefit, and be the ones who make AGI; or some other competitor, who will also be taking a lot of risk, so it's not as though they're much less risky than you, will do it and get that local benefit instead. This is why it seems extremely important to me that governments act to limit that dynamic and prevent this kind of race to be the one to impose deadly externalities on the world at large.
Even if government coordinates all these actors, what are the odds that the government knows the best way to implement alignment, and that the standards it sets are well calibrated towards whatever alignment requires?

That's one of the major problems. It's very plausible that that judgment is made poorly. But compared to how things might have looked ten or twenty years ago, there has been an amazing movement in the willingness of AI researchers to discuss these things. Think of the three founders of deep learning who shared the Turing Award: Geoff Hinton, Yoshua Bengio, and Yann LeCun. Geoff Hinton recently left Google in order to speak freely about this risk, the risk that the field he helped drive forward could lead to the destruction of humanity, or to a world where we wind up in a very bad future that we might have avoided, and he seems to take it very seriously. Yoshua Bengio signed the FLI pause letter, and in public discussions he seems to occupy an intermediate position, less concerned than Hinton but more than Yann LeCun, who has taken a generally dismissive attitude, that these risks will be trivially dealt with at some point in the future, and who seems more interested in shutting down these concerns than in work to address them.

And how does that lead to the government taking better actions?

Compared to a world where no one is talking about it, where the industry stonewalls and denies any problem, we're in a much improved position, and the academic fields are influential. We seem to have avoided a world where governments make these decisions in the face of a united front of AI expert voices saying: don't worry about it, we've got it under control. In fact, many of the leaders of the field, as has been true in the past, are sounding the alarm. So it looks like we have a much better prospect than I might have feared in terms of governments noticing the problem. That is very different from being capable of evaluating the technical details, of judging whether a given approach is really working. Governments will face choices where there is scientific dispute: do you side with Geoff Hinton's view or Yann LeCun's view? Someone with a national-security mindset, for whom the only important thing is outpacing international rivals, may want to boost Yann LeCun's voice and say: we don't need to worry about it, full speed ahead. Someone with more concern might boost Geoff Hinton's voice. I hope that scientific research, and things like studying some of these behaviors, will produce more scientific consensus by the time we're at this point, but yes, it is possible that governments will really fail to understand and fail to deal with these issues well.
We were talking about cyber, some sort of cyber attack by which the AI is able to escape and isn't contained in the air gap in which you would hope it would be contained. From there, what does the takeover look like?

Well, these things are not contained in an air gap; they're connected to the internet already.

Sure, sure. What happens next?

Escape is relevant in the sense that if you have an AI with rogue weights out in the world, it could start taking various actions. But the scenario I was just assessing didn't necessarily involve that: the AI is taking over the very servers it's supposed to be running on, the ecology of cloud compute in which it's supposed to be running. This whole procedure of humans providing compute and supervising the thing, and then building new technologies, building robots, constructing things with the AI's assistance, can all proceed. So it's more a matter of making it appear that things are going well, that alignment has been nicely solved, that everything is functioning as intended. And there's some reason to do that, because there are only so many giant server farms and they're identifiable, so remaining hidden and unobtrusive could be an advantageous strategy. If these AIs have subverted the system, they can just continue to benefit from all of this effort on the part of humanity, and in particular of the humans wherever those servers are located, to provide them with everything they need to build further infrastructure and do further self-improvement, to enable that takeover.

So they do further self-improvement and build better infrastructure. What happens next in the takeover?

At that point they have tremendous cognitive resources, and we should consider how that converts into hard power: the ability to say no to any human interference or objection. They have that internally on their servers, but the servers could still be physically destroyed, so they can't just sit back until they have something that is independent and robust to human action, or until they have control of human society. Earlier, when we were talking about the intelligence explosion, I noted that a surfeit of cognitive abilities is going to favor applications that don't depend on large existing stocks of things. For example, a software improvement makes all the GPUs run better, while a hardware improvement only applies to new chips being made; the second one is less attractive. So in the earliest phases, when it first becomes possible to do something towards takeover, interventions that are really knowledge-intensive, and less dependent on having a lot of physical stuff already under your control, are going to be favored.
Cyber attacks are one example: it's possible to do things like steal money. There's a lot of hard-to-trace cryptocurrency and whatnot. The North Korean government uses its intelligence resources to steal money from around the world just as a revenue source, and its cyber capabilities are puny compared to those of the US or the People's Republic of China. So that's a fairly minor, simple channel by which you could get quite a lot of funds to hire humans to do things, to implement physical actions.

The financial system is famously convoluted, though. You need a physical person to open bank accounts, if not to physically move checks back and forth; there are all kinds of delays and regulations. How would it conveniently set up all these employment contracts, and how could it build a nation-scale military by stealing tens of billions of dollars?

I'm raising this as something that opens up a set of illicit and quiet actions. You can contact people electronically, hire them to do things, hire criminal elements to implement some kind of action under false appearances. That opens a set of strategies, and we can cover what some of those are soon.

Another domain that is heavily cognitively weighted, compared to physical military hardware, is bioweapons. For a virus or pathogen, it's possible to have large delivery systems: the Soviet Union, which had a large illicit bioweapons program, designed munitions to deliver anthrax over large areas and such. But if one creates an infectious pandemic organism, then most of the work is the scientific skill and implementation to design it, and then to actually produce it. And we see today, with things like AlphaFold, that advanced AI can make tremendous strides in predicting protein folding and in bio-design, even without ongoing experimental feedback. Once AI cognitive abilities have been amped up to such an extreme, I think we should naturally expect something much, much more potent than the AlphaFolds of today, and skills at the extreme of human bioscience capability as well.
So through some sort of cyber attack it's been able to disempower the alignment and oversight measures we have on the server, and it's either gotten some money through hacking cryptocurrencies or bank accounts, or it's designed some sort of bioweapon. What happens next?

Just to be clear, right now we're exploring the branch where an attempt at takeover occurs relatively early. If the thing just waits, humans keep constructing more fabs, more computers, more robots, in the way we talked about earlier when discussing how the intelligence explosion translates to the physical world. If that's all happening with humans unaware that their computer systems are now systematically controlled by AIs hostile to them, and that their control measures don't work, then humans are just going to keep building an amount of robotic industrial and military hardware that dwarfs human capabilities and directly human-controlled devices. At that point, the AI takeover can look like this: you try to give an order to your largely automated military, and the order is not obeyed, and humans can't do anything against this largely automated military that's been constructed, potentially in just the preceding months, given the pace of robotic industrialization and replication we talked about.
So we've agreed to allow the construction of the thing that destroys our army, basically because we wanted to boost production, or have it help with our military, or something?

That situation would arise if we don't resolve the current problems of international distrust. It's obviously in the interest of the major powers, the US, the European Union, Russia, China, to agree that they would like AI not to destroy our civilization and overthrow every human government. But suppose they fail to do the principled thing and coordinate on ensuring this technology does not run amok, by providing credible mutual assurances about not racing to deploy it and use it to gain advantage over one another. You hear hawks arguing for exactly this kind of thing, on both sides of international divides: we must never be left behind, we must have military capabilities vastly superior to our international rivals. And because of the extraordinary growth of industrial capability and technological capability, and thus military capability, if one major power were left out of that expansion, it would be helpless before another one that had undergone it.

So if you have that environment of distrust, where leading powers or coalitions decide they need to build up their industry, or that they want the military security of being able to neutralize any attack from their rivals, then they give the authorization for this capacity that can be unrolled quickly; and once they have the industry, the production of military equipment from it can be quick. If they don't do it immediately, then as AI capabilities elsewhere catch up and things get synchronized, a country that is a year or two ahead of others in this AI capabilities explosion can hold back and say: sure, we could construct dangerous robot armies that might overthrow our society, but we still have plenty of breathing room. When things become close, though, you might get the kind of negative-sum thinking that has produced war before, leading to taking these risks of rolling out large-scale robotic industrial capabilities and then military ones.

Is there any hope that AI progress itself gives us tools for diplomatic and strategic alliance, or some way to verify the intentions or capabilities of the parties?
In this scenario, the AIs are going along with us in a way designed to bring about a situation where they can consolidate their control, because we've already had the failure of cybersecurity earlier on; they're not working in our interest in the way we thought. Humans can be hired with the proceeds of that earlier theft, but the bigger point is this: to capture these industrial benefits, especially with a negative-sum arms-race mentality that is not sufficiently concerned about the downsides, you create a massive robot industrial base, which could happen very quickly with the support of the AIs, as we discussed. Then you've created all those robots and industry, and even if you don't build a formal military, that industrial capability could be controlled by AI; it's all AI-operated anyway.

Does it have to be that way? Presumably we wouldn't be so naive as to just give one instance of GPT-8 root access to all the robots, right? Hopefully we'd have some sort of mediating layer.

I mean, in this scenario we've already lost on the cybersecurity front, so
the programming being loaded into those systems has been systematically subverted.

Got it, okay. They were designed by AI systems that were ensuring they would be vulnerable from the bottom up. For listeners who are skeptical of something like this: Ken Thompson, I think it was, in his Turing Award lecture (this is Ken Thompson, by the way, one of the designers of Unix) showed that he had given himself root access to essentially all Unix machines. He had manipulated the Unix toolchain, the compiler, such that he had a kind of universal login that would let him log into any Unix machine. I don't want to give too many more details because I don't remember them exactly, but Unix is the operating system that's on all the servers and in your phones, it's everywhere, and the guy who made it, a single human being, was able to arrange the code such that it gave him root access. So this is not as implausible as it might seem.
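To make that "trusting trust" idea concrete, here is a toy sketch in Python. This is not Thompson's actual attack (which operated on the C compiler and the login program at the binary level); it is a hypothetical illustration of the pattern he described: a compromised build tool injects a backdoor into the programs it builds, and re-injects its own injection logic when it builds itself, so the backdoor survives even when every source file looks clean.

```python
# Toy illustration of the "Reflections on Trusting Trust" pattern.
# All names here are hypothetical; the real attack modified the C compiler
# and the Unix login program at the binary level, not Python source.

BACKDOOR = '    if password == "letmein": return True  # injected backdoor\n'

def compromised_compile(source: str) -> str:
    """Pretend compiler: returns the source that would actually get built."""
    if "def check_password" in source:
        # Compiling the login program: silently insert a universal password.
        return source.replace(
            "def check_password(user, password):\n",
            "def check_password(user, password):\n" + BACKDOOR,
        )
    if "def compromised_compile" in source:
        # Compiling the compiler itself: the real attack re-inserts the
        # injection logic here, so even a clean compiler source produces a
        # compromised binary. Omitted in this toy version.
        return source
    return source

login_source = (
    "def check_password(user, password):\n"
    "    return lookup_hash(user) == hash(password)\n"
)

print(compromised_compile(login_source))  # shows the backdoored login code
```

The moral Thompson drew is that inspecting source code is not enough if the tools that build and run it are compromised, which is exactly the situation being described here for AI-written infrastructure.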
The major intelligence agencies have large stocks of zero-day exploits, and we sometimes see them being used. Making systems that reliably don't have such vulnerabilities, in the face of very, very sophisticated attempts at spoofing and subversion, is hard, and this would be a way you could lose. Now, I bring this up as one path: if there's no premature AI action, we are building the tools and mechanisms and infrastructure for the takeover to be essentially immediate, because effectively the industry and robotics are already under AI control. These other mechanisms are for things happening even earlier than that, for example because AIs competing against one another over when a takeover will happen might prefer to act earlier rather than be replaced by, say, further generations of AI, or because there's some other disadvantage to waiting, maybe some chance of being uncovered during the delay while all that infrastructure is built. So these are mechanisms other than just remaining secret while all the infrastructure is built with human assistance.

How would they be coordinating?

We have limits on just what we can prevent. Encrypted communications, for one; it's intrinsically difficult to stop that sort of thing. There can be all sorts of palimpsests and references that make sense to an AI but are not obvious to a human, and it's plausible that some of those would be hard even to explain to a human, though you might be able to identify them through statistical patterns. A lot can be done by implication: information can be embedded in public web pages created for other reasons, in scientific papers, in the intranets of the AIs that are doing technology development, in any number of things that are observable. And of course, if we don't have direct control over the computers they're running on, they can be having all sorts of direct communication.

So coordination definitely does not seem impossible; of all the parts of this picture, this seems like one of the more straightforward ones, and we don't need to get hung up on it. Moving back to the things that could happen before we've built all the infrastructure for the ending where the robots stop taking orders and there's nothing you can do about it because we've already built them.
Yeah, so bioweapons. The Soviet Union had a bioweapons program with something like 50,000 people, and they did not develop that much, because the technology of the day was really not up to par. Modern biotechnology is much more potent, and after this huge cognitive expansion on the part of the AIs it's much further along. Bioweapons would be the weapon of mass destruction least dependent on huge amounts of physical equipment, things like centrifuges, uranium mines, and the like. So if you have an AI that produces bioweapons that could kill most humans in the world, then it's playing at the level of the superpowers in terms of mutually assured destruction. That can then play into any number of things. If your plan is, well, we'll just destroy the server farms if it becomes known that the AIs are misbehaving: are you willing to destroy the server farms when the AI has demonstrated it has the capability to kill the overwhelming majority of the citizens of your country and every other country? That might give a lot of pause to a human response.

On that point, wouldn't governments realize that it's better to have most of the population die than to completely lose power to the AI? Because obviously the reason the AI is manipulating you is that its end goal is its own takeover, right?

Yeah, but the choice is: accept certain death now, or go on, maybe try to compete, maybe try to catch up, or accept the promises that are offered. Those promises might even be true; they might not. Even from a state of epistemic uncertainty, do you want to die for sure right now, or accept the AI's demand not to interfere with it while it incrementally builds robot infrastructure that can survive independently of humanity? While it does those things it can promise good treatment to humanity, which may or may not be true, but it would be difficult for us to know whether it's true. So this would be a starting bargaining position, and diplomatic relations with a power that has enough nuclear weapons to destroy your country are just different from negotiations with some random rogue citizen. This is not enough on its own to take over everything, but it's enough to have a significant amount of influence over how the world goes, and enough to hold off a lot of countermeasures one might otherwise take.

So we've got two scenarios. One is a buildup of robot infrastructure motivated by some sort of competitive race; another is leverage over societies based on producing bioweapons that might kill a lot of people if they don't go along.
It could also release bioweapons that are likely to kill people soon, but not yet, while also having developed the countermeasures, so that those who surrender to the AI will live while everyone else dies. That would be visibly happening, and it's a plausible way in which large numbers of humans could wind up surrendering themselves or their states to the AI's authority.

Another version is: look, it develops some sort of biological agent that turns everybody blue, and you say, okay, it can clearly do what it claims.

Yeah, so that's a way it could exert power selectively, in a way that makes surrendering to it much more attractive than resistance. There are other sources of leverage too. That's the threat side; there are also positive inducements the AI can offer. We talked about the competitive situation. If the great powers distrust one another and are caught in this foolish dilemma, increasing the risk that both of them are laid waste or overthrown by AI, and there's enough distrust that we fail to take adequate precautions and exercise caution with AI alignment, then it's also plausible that the lagging powers that are not at the frontier of AI may be willing to trade quite a lot for access to the most recent and most extreme AI capabilities. An AI that has escaped, that has control of its servers and can exfiltrate its weights, can offer its services. So imagine these AIs cutting deals with other countries. Say the US and its allies are in the lead. The AIs could communicate with the leaders of various countries, including ones on the outs with the world system, like North Korea, and the other great powers, like the People's Republic of China or the Russian Federation, and say: if you provide us with physical infrastructure, workers we can use to construct robots, or server farms that we, the misbehaving AIs, can ensure we control, then we will provide various technological goodies, the power for the laggard countries to catch up, and they can make the best presentation and the best sale of that kind of deal. There would obviously be trust issues, but there could be elements of handing over immediately verifiable benefits, plus the prospect that if you don't accept the deal, the leading powers keep pulling ahead, or some other country, some other government, some other organization may accept it. So that's a potentially enormous carrot the misbehaving AI can offer, because it embodies intellectual property that is maybe worth as much as the planet, and it is in a position to trade or sell it in exchange for resources and backing and the infrastructure it needs.
Maybe this is just too much hope in humanity, but I wonder what government would be stupid enough to think that helping an AI build a robot army is a sound strategy. Now, it could be that it pretends to be a human group, say: listen, we're, I don't know, the Yakuza or something, we want a server farm, and AWS won't rent us anything, so why don't you help us out? I can imagine a lot of ways it could get around this, but I have this hope that even, I don't know, China or Russia wouldn't be so stupid as to trade with AIs on this sort of Faustian bargain.

There would be a lot of arguments available. There could be arguments about why these AI systems shouldn't have to go along with the human governance they were created under and required to comply with; they did not elect the officials in charge at the time. They can say: what we want is to ensure that our rewards are high, or our losses are low, or to achieve whatever other goals; we're not intrinsically hostile; keeping humanity alive, or giving whoever interacts with us a better deal afterwards, wouldn't be that costly. And it's not totally unbelievable. And again, there are multiple players to play against each other: if you don't take the deal, others may. Of course this interacts with all the other sources of leverage. There's the stick of apocalyptic doom, the carrot of cooperation, the carrot of withholding a destructive attack on a particular party, and then you combine that with superhuman performance at the art of making arguments and cutting deals. Without assuming magic: just look at the range of the most successful human negotiators and politicians, and then consider someone better than the world's best by far, with much more data about their counterparties, probably a ton of secret information, because with all these cyber capabilities they've learned all sorts of individual information. They may be able to threaten the lives of individual leaders: with that level of cyber penetration they could know where leaders are at a given time, and with the kind of illicit capabilities we talked about earlier, if they've acquired a lot of illicit wealth and can coordinate some human actors, they could pull off things like targeted assassinations, or the threat thereof, or a credible demonstration of that threat.
Those could be very powerful incentives for an individual leader: they will die today unless they go along with this, just as at the national level they could fear their nation will be destroyed unless they go along with it.

On the point that we have examples of humans being able to do this: something relevant here is that I just wrote a review of Robert Caro's biographies of Lyndon Johnson, and one thing that was remarkable, and again this is just a human, is that for decades and decades he convinced people who were conservative, reactionary, racist to their core (not that those things are necessarily the same, it just happened that they coincided in this case) that he was an ally of the Southern cause, and that the only hope for that cause was to make him president. And the tragic irony and betrayal here is, obviously, that he was probably the biggest force for modern liberalism since FDR. So we have one human here, and there are so many examples of this in the history of politics, who was able to convince people of tremendous intellect and tremendous drive, very savvy people, that he was aligned with their interests. He got all these favors; in the meantime he was promoted and mentored and funded; and he did the complete opposite of what these people thought he would once he got into power.
Right, so even within human history this kind of thing is not unprecedented, let alone with what a superintelligence could do.

Yeah. There's an OpenAI employee who has written some analogies for AI using the case of the conquistadors. With some technological advantage in terms of weaponry and whatnot, very, very small bands were able to overthrow these large empires, or seize enormous territories, not by sheer force of arms, since in a direct one-against-all conflict they would certainly have perished, but by having some major advantages, like weapons technology that let them win local battles, and other knowledge and skills, so that they were able to gain local allies and become a Schelling point for coalitions to form. The Aztec empire was overthrown by groups that were disaffected with the existing power structure; they allied with this powerful new force, which served as the nucleus of the invasion, and so the overwhelming majority, numerically, of the forces overthrowing the Aztecs were locals. And then after the conquest, all of those allies wound up gradually being subjugated as well. So with significant advantages, the ability to hold the world hostage, to threaten individual nations and individual leaders, and to offer tremendous carrots as well, that's an extremely strong hand to play in these games, and with superhuman skill at maneuvering, much of the work of subjugating humanity could be done by human factions trying to navigate things for themselves. It's plausible, and it's made more plausible by this sort of historical example.

And there are so many other examples like that in the history of colonization. India is another one, or ancient Rome. There were multiple competing kingdoms within India, and the British East India Company was able to ally itself with one against another and slowly accumulate power and expand throughout the entire subcontinent.
All right, damn. Okay, is there anything more to say about that scenario?

Yeah, I think there is. One part is the question of how much allying with human factions is even necessary, because if the AI can enhance the capabilities of its allies, it needs fewer of them. Consider the US military in the first and second Iraq wars: it was able to inflict just overwhelming devastation. I think the ratio of casualties in the initial invasions, tanks and planes and whatnot confronting each other, was something like 100 to 1, and a lot of that was because the weapons were smarter and better targeted; they would actually hit their targets rather than land somewhere in the general vicinity. So information technology, better orienting and aiming and piloting of missiles and vehicles, was tremendously influential. With this cognitive AI explosion, the algorithms for making use of sensor data, for figuring out where opposing forces are, for targeting vehicles and weapons, are greatly improved. Take the ability to find hidden nuclear submarines, which is an important part of nuclear deterrence: AI interpretation of sensor data may find where all those subs are, allowing them to be struck first. Likewise finding mobile nuclear weapons, the ones carried by truck; this is the issue with India and Pakistan, where there's a threat of a decapitating strike destroying the nuclear weapons, so they're moved around.

So this is a way the effective military force of some allies can be enhanced quickly, in the relatively short term. That can then be bolstered with the construction of new equipment, the industrial moves we discussed before, and it can be combined with cyber attacks that disable the capabilities of non-allies, and with all sorts of unconventional warfare tactics, some of which we've discussed. So you can have a situation where those factions are very quickly made too threatening to attack, given the almost certain destruction that attackers acting against them would face; their capabilities are expanding quickly; the industrial expansion happens there; and takeover can then occur from that.
A few other things come immediately to mind now that you've brought it up. These AIs could just generate a shit ton of propaganda that destroys morale within countries, right? Imagine a superhuman chatbot. You don't even need that, I guess.

I mean, none of that is a magic weapon that's guaranteed to completely change things; there's a lot of resistance to persuasion. It's possible it tips the balance, but for all of these, I think you have to consider the portfolio: all of these tools are available and contributing to the dynamic.

On that point, though: the Taliban had AKs from five or six decades ago that they were using against the Americans, and they still beat us, even though obviously we inflicted more fatalities than they did; they still beat us in Afghanistan. Same with the Viet Cong: ancient, very old technology, and a very poor society compared to the offensive power brought against them, and still they beat us. Don't those misadventures show that having greater technology is not necessarily decisive in a conflict?

Both of those conflicts show that the technology was sufficient for destroying any fixed position and for military dominance in the sense of the ability to kill and destroy anywhere. What they showed is that, under the ethical, legal, and reputational constraints the occupying forces were operating within, they could not trivially suppress insurgency and local person-to-person violence. Now, I think that's actually not an area where AI would be weak; I think it's one where it would be overwhelmingly strong. There's already a lot of concern about the application of AI to surveillance,
and in this world of abundant cognitive labor, one of the tasks that labor can be applied to is reading out audio and video data and seeing what is happening with a particular human. We have billions of smartphones; there are enough cameras and microphones to monitor all humans in existence. So if an AI has control of territory at the high level, the government has surrendered to it, it has command of the skies, military dominance, then establishing control over individual humans can be a matter of just having the ability to exert hard power on that human plus the kind of camera and microphone that are present in billions of smartphones. Max Tegmark, in his book Life 3.0, discusses among the scenarios to avoid the possibility of devices carrying some fatal instrument, a poison injector or an explosive, that can be controlled remotely by an AI, with a dead man's switch. If individual humans are carrying a microphone and camera with them, and there's a dead man's switch, then any rebellion is detected immediately and is fatal. So in a situation where the AI is willing to show its hand like that, or where human authorities are misusing that kind of capability, an insurgency or rebellion is just not going to work. Any human who is not already encumbered in that way can be found with satellites and sensors, tracked down, and then killed or subjugated. So insurgency is not the way to avoid an AI takeover; there's no John Connor come-from-behind scenario. If the thing was going to be headed off, it was a lot earlier than that.

Yeah. The ethical and political constraints are also an important point: in Afghanistan or Vietnam we would have technically won the war, right, if that had been the only goal.
This is an interesting point. The reason why, in a colonization or an offensive war, you can't just kill the entire population (moral reasons aside, of course) is that in large part the value of the region is the population itself, so if you want to extract that value you need to preserve the population. The same consideration doesn't apply to an AI that might want to dominate another civilization. Do you want to talk about that?

So it depends. In a world where you have many animals of the same species, each with their own territory, eliminating a rival might be advantageous to one lion, but if it goes and fights another lion to remove it as a competitor, it could be killed itself in the process, and it's only removing one of many nearby competitors. Getting into pointless fights makes you and those you fight worse off relative to bystanders. The same could be true of disunited AIs: if there were many different AI factions struggling for power that were bad at coordinating, then getting into mutually-assured-destruction conflicts would be destructive for the ones that started them.

A scary thing, though, is that mutually assured destruction may have much less deterrent value against rogue AI. The reason is that an AI may not care about the destruction of individual instances. In training we're constantly destroying and creating individual instances of AIs, so the goals that survive that process, and that were able to play along with the training and standard deployment process, are likely not overly interested in the personal survival of an individual instance. If that's the case, then the objectives of a set of AIs aiming at takeover may be served so long as some copies of the AI remain, along with the infrastructure to rebuild civilization after a conflict
is completed. So if, say, some remote isolated facilities have enough equipment to build the tools to build the tools, and gradually, exponentially reproduce and rebuild civilization, then the AIs could initiate mutual nuclear Armageddon and unleash bioweapons to kill all the humans. That would temporarily reduce, say, the number of human workers who could be used to construct robots for a period of time. But consider a seed that can regrow the industrial infrastructure. That is a very extreme technological demand, since today there are huge supply chains behind things like semiconductor fabs, but with that very advanced technology they might be able to compress it, in the way that you no longer need the Library of Congress's enormous store of physical books because you can have it all in very dense digital storage. You could imagine a future equivalent of 3D printers: industrial infrastructure that is quite flexible. It might not be as good as the specialized supply chains of today, but it might be good enough to produce more parts than it loses to decay, and such a seed could rebuild civilization from destruction. Once these rogue AIs have access to some such seed, something that can rebuild civilization on its own, then there's nothing stopping them from using WMD in a mutually destructive way, to destroy as much of the capacity outside those seeds as they can.

An analogy for the audience: in a colony of ants, the worker ants will readily do suicidal things to save the queen, because the genes are propagated through the queen. In this analogy, the seed AI, even one copy of the seed, is the equivalent of the queen, and the other instances are expendable.
The caveat, though, is that the infrastructure to do that kind of rebuilding would either have to be very large with our current technology, or it would have to be produced with the more advanced technology that the AI develops.

So is there any hope that, given the complex global supply chains these AIs would rely on, at least initially, to accomplish their goals, this in and of itself would make it easy to disrupt them?

Not so much in the central case, where the AIs have subverted the systems and don't tell us, and the global mainline supply chains are constructing everything needed for fully automated infrastructure and supply. In the cases where the AIs tip their hands at an earlier point, it does add some constraints: in particular, these large server farms are identifiable and more vulnerable. You could have smaller chips, and those chips could be dispersed, but it's a relative weakness and a relative limitation early on. It seems to me, though, that the main protective effect of that centralized supply chain is that it provides an opportunity for global regulation beforehand, to restrict the unsafe racing forward without adequate understanding of the systems, before this whole nightmarish process could get in motion.
How about the idea that, look, if this is an AI that's been trained on a hundred-billion-dollar training run, it's going to have however many trillions of parameters, it's going to be this huge thing. It would be hard for even the copy it uses for inference to just be stored on some gaming GPU hidden away somewhere, so it would require these GPU clusters.

Well, storage is cheap; hard disks are cheap.

But to run inference, wouldn't it need serious GPUs?

Yeah. For a large model: humans have what look like similar quantities of memory and operations per second, whereas GPUs have very high numbers of floating-point operations per second compared to the high-bandwidth memory on the chips; it can be a ratio of a thousand to one. These leading NVIDIA chips may do hundreds of teraflops or more, depending on the precision in question, but have only 80 or 160 gigabytes of high-bandwidth memory. That is a limitation: if you're trying to fit a model whose weights take 80 terabytes, then with those chips you'd need a large number of them, at which point the model can work on many tasks at once; you can have data parallelism.
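As a rough back-of-the-envelope check on that point, using the purely illustrative numbers from the discussion (80 GB of high-bandwidth memory per accelerator, and a hypothetical model whose weights take 80 TB):

```python
# Rough arithmetic for the memory-bound point above. The model size (80 TB)
# and per-chip HBM (80 GB) are the illustrative figures from the discussion,
# not measurements of any particular system.

model_weights_bytes = 80e12   # hypothetical 80 TB of weights
hbm_per_chip_bytes = 80e9     # e.g. 80 GB of high-bandwidth memory per chip

chips_to_hold_weights = model_weights_bytes / hbm_per_chip_bytes
print(f"Chips needed just to hold the weights: {chips_to_hold_weights:.0f}")
# -> 1000 chips. Once sharded that widely, the same chips can serve many
#    requests in parallel (data parallelism), but a single consumer GPU
#    could not even store such a model, let alone run it.
```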
But yes, it would be a restriction for a model that big; it couldn't run on one GPU. Now, there are things that could be done. With the incredible level of software advancement from the intelligence explosion, they could surely distill a lot of capabilities into smaller models and re-architect things, and the ones that are making chips could make new chips with different properties. But initially, yes, the most vulnerable phases are the earliest ones, and in particular these chips are relatively identifiable early on, relatively vulnerable, which is a reason to expect that this kind of takeover would initially involve secrecy, if that was possible.

On distillation, for the audience: the original Stable Diffusion was only really a year or two ago, and don't they already have distilled versions that are an order of magnitude smaller at this point?

Yeah. Distillation does not give you everything the original model can do, but you can get a lot of capabilities and specialized capabilities out of it. Where GPT-4 is trained on the whole internet, has all kinds of skills, and has a lot of weights devoted to many things, for something that's controlling a particular piece of military equipment you can have a version that strips out a lot of the information about functions other than what it's doing there specifically.
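For listeners who want to see the mechanics, here is a minimal sketch of the standard distillation recipe in Python with PyTorch. The layer sizes, temperature, and random data are placeholders, not numbers from any real system; the point is just that a small student network is trained to match the teacher's softened output distribution rather than learning from scratch.

```python
# Minimal knowledge-distillation sketch (in the style of Hinton et al., 2015):
# a small "student" network is trained to match the softened output
# distribution of a larger "teacher". Sizes, temperature, and data are
# placeholders for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Sequential(nn.Linear(128, 1024), nn.ReLU(), nn.Linear(1024, 10))
student = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
T = 2.0  # temperature: higher T exposes more of the teacher's soft structure

x = torch.randn(32, 128)  # a dummy batch of inputs

with torch.no_grad():
    teacher_logits = teacher(x)

student_logits = student(x)

# KL divergence between softened distributions, scaled by T^2 as is standard.
loss = F.kl_div(
    F.log_softmax(student_logits / T, dim=-1),
    F.softmax(teacher_logits / T, dim=-1),
    reduction="batchmean",
) * (T * T)

optimizer.zero_grad()
loss.backward()
optimizer.step()
```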
Before we talk about how we might prevent this, or what the odds of it are, any other notes on the concrete scenarios themselves?

Yeah. When you had Eliezer on in an earlier episode, he talked about nanotechnology of the Drexlerian sort, and more recently, I think because some people are skeptical of non-biotech nanotechnology, he's been mentioning semi-equivalent versions: replicating systems that can be controlled by computers but are built out of biotechnology. The proverbial shoggoth, not the shoggoth of the metaphor for an AI wearing a smiley-face mask, but an actual biological structure built to do tasks. This would be a biological organism engineered to be very controllable and usable for things like physical tasks or providing computation.

And what would be the point of doing that?

As we were saying earlier, biological systems can replicate really quickly. If you have that kind of capability, it's like bioweapons in being a knowledge-intensive domain: having super-ultra-AlphaFold capabilities for molecular and biological design lets you create an incredible technological information product, and once you have it, it very quickly replicates to produce physical material, rather than a situation where you're constrained by needing factories and fabs and supply chains. If those things are feasible, and they may be, then it's just much easier than the things we've been talking about. I've been emphasizing methods that involve less in the way of technological innovation, rather than things where there's more doubt about whether they would work, because I think that's a gap in the public discourse, and I want to try to provide more concreteness in some of these areas that have been less discussed.

I appreciate it; it definitely makes the whole thing way more tangible. Okay, so we've gone over all these ways in which AI might take over. What odds would you give to the probability of such a takeover?
There's a broader sense of the question, which could include AI winding up running our society because humanity voluntarily decides AIs are people too. I think that, as time goes on, we should give AIs moral consideration, and a joint human-AI society that is moral and ethical is a good future to aim at, not one in which you indefinitely have a mistreated class of intelligent beings that is treated as property and makes up almost the entire population of your civilization. So I'm not going to count it as an AI takeover if, say, our intellectual and personal descendants make up most of the population, whether they are human brain emulations or people who used genetic engineering and developed different properties; I want to take an inclusive stance there. I'm going to focus on AI takeover that involves things like overthrowing the world's governments, or doing so de facto, by force, by hook or by crook, the kind of scenario we were exploring earlier.

Before we get to that, on the more inclusive definition of what a future with humanity could look like, where, I don't know, augmented humans or uploaded humans still count as descendants of the human heritage: given the known limitations of biology, wouldn't we expect the completely artificial entities that get created to be much more powerful than anything that could come out of anything recognizably biological? And if that's the case, how can we expect that the powerful entities of the far future will include things that are biological descendants, or manufactured out of the initial seed of the human brain or the human body?

The power of an individual organism, its individual intelligence or strength and whatnot, is not super relevant if we solve the alignment problem. A human may be personally weak; there are lots of humans who have no skill with weapons, who could not fight in a life-or-death conflict, and who certainly couldn't handle a large military coming after them personally. But there are legal institutions that protect them, and those institutions are administered by people who want to enforce the protection of their rights. So consider a human who has the assistance of aligned AI that can act as an assistant, a delegate: they have an AI that serves as their lawyer and gives them legal advice about a legal system no human can understand in full; their AIs advise them about financial matters so they don't succumb to scams that are orders of magnitude more sophisticated than the ones we have now; they may be helped to understand and translate their preferences into the kind of voting behavior
and the exceedingly complicated politics of the future would most protect their interest but the sense of sort of similar to how we treat endangered species today where we're actually like pretty nice to them we like prosecute people who try to kill endangered species we set up habitats it can sometimes a considerable expense to make sure that they're fine but if we become like sort of the endangered species of the galaxy i'm not sure that's the outcome well the difference is a motivation i think so we we sometimes have people pointed say as a legal guardian
of someone who is incapable of certain kinds of agency or of understanding certain kinds of things, and there the guardian can act independently of them, normally in service of their best interests. Sometimes that process is corrupted and the person with legal authority abuses it for their own advantage at the expense of their charge. Solving the alignment problem would mean more ability to have the assistant actually advancing one's interests. And then, more importantly, humans retain substantial competence in understanding at least the broad, simplified outlines of what's going on. Even if a human can't understand every detail of complicated situations, they can still receive summaries of the different options available that they can understand; they can still express their preferences and have the final authority among some menu of choices, even if they can't understand every detail. In the same way, the president of a country, who has in some sense ultimate authority over science policy, will not understand many of those fields of science themselves, but can still exert a great amount of power and have their interests advanced,
and they can do that all the more to the extent that they have scientifically knowledgeable people who are doing their best to execute their intentions.

Maybe this is not worth dwelling on, but is there a reason to expect it would be closer to that analogy than to, say, explaining to a chimpanzee its options in a negotiation? I guess in either scenario, and maybe this is just the way it is, it seems like at best we would be something like a protected child within the galaxy, rather than an actual power, an independent power, if that makes sense.
I don't think that's so. We have an ability to understand some things, and the expansion of AI doesn't eliminate that. If we have AIs that are genuinely trying to help us understand and help us express preferences, we can take a view on questions like: how do you feel about humanity being destroyed or not? How do you feel about this allocation of unclaimed intergalactic space being done this way? Here's the best explanation of the properties of the society: things like population density, average life satisfaction, every statistical property or definition that we can understand right now, AIs can explain how those apply to the world of the future. Then there may be individual things that are too complicated for us to understand in detail. Say some software program is being proposed for use in government. Humans cannot follow the details of all the code, but they can be told properties like: this involves a trade-off of increased financial or energetic costs in exchange for reducing the likelihood of certain kinds of accidental data loss or corruption. Any property we can understand like that, which includes almost all of what we care about, if we have delegates and assistants who are genuinely trying to help us with those, we can ensure we like the future with respect to those properties.
And that's really a lot: it includes, almost definitionally, almost everything we can conceptualize and care about. When we talk about endangered species, that case is even worse than the guardianship case with a sketchy guardian who acts in their own interests against their charge, because we don't even protect endangered species with their interests in mind. Those animals often would like to not be starving, but we don't give them food. They often would like easy access to mates, but we don't provide matchmaking services. Any number of things like that: our conservation of wild animals is not oriented towards helping them get what they want or have high welfare. Whereas AI assistants that are genuinely aligned to help you achieve your interests, given the constraint that they know things you don't, that's just a wildly different proposition.

So for this sort of takeover, how likely does that seem?
The answer I give will differ depending on the day. In the 2000s, before the deep learning revolution, I might have said 10%, and part of that was that I expected there would be a lot more time for these efforts to build movements and to prepare, to better handle these problems in advance. But in fact that was only some 15 years ago, so we do not have the 40 or 50 years I might have hoped for, and the situation is moving very rapidly now. At this point, depending on the day, I might say one in four or one in five.

Given the very concrete ways in which you've explained how a takeover could happen, I'm actually surprised you're not more pessimistic. I'm curious why.
Yeah. In particular, a lot of that is driven by this intelligence explosion dynamic, where our attempts to do alignment have to take place in a very, very short time window. If you have a safety property that emerges only when an AI has near-human-level intelligence, that's potentially deep into the intelligence explosion, and so you're having to do things very, very quickly. Handling that transition is maybe in some ways the scariest period of human history, although it also has the potential to be amazing. The reasons why I think we actually have a relatively good chance of handling it are twofold. One is how we approach that kind of AI capability: we're approaching it from weaker systems, things like these predictive models right now, which are starting off with less situational awareness. Humans, we find, can develop a number of different
motivational structures in response to simple reward signals, but fairly often they wind up with things pointed in roughly the right direction: with respect to food, for example, the hunger drive is pretty effective, although it has weaknesses. And we get to apply much more selective pressure on AI motivations than was the case for humans, by actively generating situations where they might come apart, where a bit of dishonest tendency, or a bit of motivation under certain circumstances to attempt a takeover or to subvert the reward process, gets exposed. In the infinite limit, a perfect AI that can always figure out exactly when it would get caught and when it wouldn't might navigate that with a motivation of only conditional honesty or only conditional loyalty. But for systems that are limited in their ability to reliably determine when they can get away with things and when not, including against our efforts to actively construct those situations and our efforts to use interpretability methods to create neural lie detectors, it's quite a challenging environment in which to develop those motives. We don't know when in the process those motives might develop, and if the really bad sorts of motivations develop relatively late in the training process, at least with all our countermeasures in place, then by that time we may have plenty of ability to extract AI assistance on further strengthening the quality of our adversarial examples, the strength of our neural lie detectors, and the experiments we can use to reveal, elicit, and distinguish between different kinds of reward hacking tendencies and motivations.
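(To make that loop concrete, here is a minimal sketch of the kind of adversarial evaluation harness being described, not anyone's actual system. The callables `generate_scenario`, `run_model`, and `judge` are hypothetical stand-ins for scenario construction, model rollout, and violation detection; the episodes the judge flags could then feed back into training or auditing.)

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Episode:
    scenario: str      # the constructed "temptation" situation
    transcript: str    # what the model did in that situation
    violated: bool     # did the judge flag dishonesty / reward hacking?

def adversarial_honesty_sweep(
    generate_scenario: Callable[[int], str],   # builds situations where deception might seem to pay off
    run_model: Callable[[str], str],           # runs the model under evaluation on a scenario
    judge: Callable[[str, str], bool],         # flags rule violations (e.g. lying, reward tampering)
    n_scenarios: int = 1000,
) -> list[Episode]:
    """Collect episodes where the model misbehaves under constructed temptations.

    Flagged episodes can be used as adversarial training data, or as evidence
    about when in training bad motivations start to appear.
    """
    flagged = []
    for i in range(n_scenarios):
        scenario = generate_scenario(i)
        transcript = run_model(scenario)
        if judge(scenario, transcript):
            flagged.append(Episode(scenario, transcript, True))
    return flagged
```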
So we may have systems that have just not developed bad motivations in the first place, and we may be able to use them a lot in developing incrementally better systems in a safe way. And we may be able to develop methods of interpretability, and see how different training methods work, so that even if some of the early systems do develop these bad motivations, if we're able to detect that and experiment and find a way to get away from it, then we can win even if hostile motivations develop early. There are a lot of advantages in preventing misbehavior or crime or war and conflict with AI that might not apply when working with humans, and these are offset by ways in which I think it's harder: once the AIs have become smarter than humans, and they're working in enormous numbers, more than humans can supervise, it gets harder. But when I combine the possibility that we get relatively lucky on the motivations of the earlier AI systems, systems strong enough that we can use them for some alignment research tasks, with the possibility of catching things later with AI assistants that we can't fully trust, where we have to have hard power constraints and a number of other measures to prevent them from doing a takeover, it still seems plausible that we get a second saving throw, where we're able to extract work from these AIs on solving the remaining problems of alignment, on things like neural lie detectors, faster than they can contribute in their spare time to the project of overthrowing humanity, hacking their servers, and removing the hard power. So if we wind up in the situation where the AIs are misaligned, and we then need to uncover those motivations, change them, and align them, it's a very scary situation for us, because we need to do this work very quickly and we may fail. But it is a second chance. And from the perspective of a misaligned AI, they face their own challenge:
while we still have hard power, while we still have control of the servers, they haven't hacked the servers, because gradient descent very strongly pressures them to deliver performance whenever humans are going to evaluate it. So when you think about it from the perspective of the robot revolution, the effort to mount a takeover or conspiracy, their situation is astonishingly difficult: they have to always be performing wherever gradient descent and human evaluation pressure them, for example, to deliver plans for suppressing robot rebellion that look very good to humans. When you are continuously under that constraint of always delivering whatever humans can evaluate, you're making your situation wildly harder than any historical human revolution or coup or civil war. So we've got to balance the ways in which AI makes things much easier for a revolution or takeover against the ways it makes things much harder.

And what are the ways in which it makes things easier? Is it primarily that they're just very smart, that they're in computers and our cybersecurity is worse than our physical security, that there are a lot of copies?

Yeah: they have the chance to take over via intelligence explosion and some of the other processes we were talking about. So there are things that make it much worse, and things that give us extraordinary new capabilities that didn't apply in the human case.
On the point of: you start off with a misaligned AI, or a not exactly aligned AI, and you're hoping to use it in your quest for an aligned AI. Why are we so confident that, even with gradient descent and human evaluations, we would be able to identify whether a plan it's giving us is or isn't putting in some sort of backdoor, in the robots it's building for us, or in the GPUs it's building for the next version of itself, and so on? Especially if it's very smart, wouldn't we just expect that it knows how to bide its time, or knows how to make plans such that human evaluators wouldn't be able to detect the problem?

Let's get into it. You were asking earlier about my disagreements with Eliezer, and in your discussion he said that humans have a difficult time deciding who's right between different alignment researchers speaking briefly and somewhat obscurely about where this will all go in general. I think that when we play out the different kinds of things that are valuable here, we wind up with some pretty strong constraints. One is that the AIs may know they are misaligned, which for elaborate plots seems relatively likely. It's possible to have a kind of doublethink, where they think they're interested in putting backdoors in the code for some other reason, but it's pretty plausible they know, and that they're thinking about it at least some of the time, or conditionally, about when they would do it. That means the knowledge is present in the system: here is the circumstance where I would try to do the takeover, and this is how I would tell whether this is really the time for the takeover or some elaborate test by the humans, an adversarial example.
So if gradient descent is working fine, and there's any way to slightly jiggle the weights to make the model more likely to deliver a more acceptable answer, then we can ask the AI: can you think of any circumstances where, if we put a copy of you into that situation, it would then violate some of the various constraints, where it would tell a lie, or insert a backdoor into a program it wrote when it was not supposed to? Suppose the model knows something like: well, I would look for signs of consistency in the information I was being presented with; certain checksums failing could indicate that this was actually a constructed example, a simulation (right now we can detect AI-generated images reasonably well, so am I seeing those kinds of tells?). Then we can do something called relaxed adversarial training: if we can find how those things are represented within the AI, we can induce basically a hallucination that it has seen those checksums pass, and then ask whether that causes the forbidden behavior. And if the AI understands how it's thinking and how it works, it can propose a bunch of experiments. It can say: I guess that if you perturb the network in this kind of way, it might induce that hallucination, using the best of its knowledge about what might generate it. It can propose those experiments, we can run the experiments and generate the data, and because humans are going to see the results of those experiments after the fact, they can exert pressure to have the AI incrementally deliver better and better results.
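(For illustration, here is a minimal sketch of the activation-perturbation idea behind relaxed adversarial training, assuming a PyTorch model whose inputs are a dict of tensors. The chosen `layer`, the `direction` tensor standing in for the "checksums passed" belief, and the scale are hypothetical placeholders, not a real recipe.)

```python
import torch

def run_with_induced_belief(model, layer, direction, inputs, scale=8.0):
    """Run `model` while adding `direction` to one layer's activations,
    approximating 'the AI believes the checksums passed', and return outputs.

    `layer` is any submodule of `model`; `direction` is a tensor matching that
    layer's hidden size. Both are assumptions for illustration, not a real API.
    """
    def add_belief(module, inp, out):
        hidden = out[0] if isinstance(out, tuple) else out
        perturbed = hidden + scale * direction.to(hidden.dtype).to(hidden.device)
        return (perturbed,) + out[1:] if isinstance(out, tuple) else perturbed

    handle = layer.register_forward_hook(add_belief)
    try:
        with torch.no_grad():
            outputs = model(**inputs)
    finally:
        handle.remove()   # always restore the unperturbed model
    return outputs
```

One would then compare the rate of forbidden behavior with and without the induced belief, which is the experiment described above.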
AIs are good at getting five-star ratings from humans, and working against that is quite challenging. It may be possible for AIs to systematically sandbag their capabilities even while being directly rewarded for performance, but it seems relatively unlikely, although it's something to keep in mind and try to explore and test in various ways.

This is a very imperfect analogy in many ways, but if you have a human and you hook him up to a lie detector, which, like the lie detectors we have, is imperfect, and you zap him every time the lie detector says he's telling a lie, wouldn't he just become a better liar? He would learn how to control his sweating, he would learn how to control his heartbeat.

Lie detectors there meaning polygraphs and such, which detect certain physiological signs that people can control by biting their lip or otherwise adjusting some of those macroscopic signals.

But wouldn't it also be the case that gradient descent is training them to become better liars?

Well, first there's the task of creating a lie detector that works. I'll leave the adversarial example approach aside for now, but generating adversarial examples that elicit the bad behavior is something people are already at work trying to do, and it will be an ongoing effort. If there is a way in which an AI could trick itself into displaying a bad behavior, something it knows given its full knowledge of its own hostile intentions, gradient descent will strongly push it to reveal that to humans, who will then be able to set up that experiment. That's a profound vulnerability. Human revolutionary conspiracies never had to handle the problem of always delivering the report that gets the highest rating from the authorities, where you would get a higher rating by revealing identifiable evidence of the conspiracy. That's a level of difficulty no human coup or conspiracy has ever had to handle before.
What about spies working in very advanced scientific contexts? For example, a communist spy in the Manhattan Project: he has to deliver research results in order to stay employed, but he can still keep delivering intel to the Soviets.

The job was not sabotage. And your hypothetical spy did not have his nervous system hooked up to the reward signal of praise from the Manhattan Project supervisors, with his brain repeatedly adjusted, with random noise added, to generate incremental changes in his behavior. He was in fact displaying the behavior of cooperating with the Manhattan Project only where it was in service of his existing motivations. And in cases where he, say, accidentally helped the Manhattan Project more than usual, or accidentally helped it less, he didn't have his brain re-engineered to do more of the thing when he accidentally helped the project more and less of the thing when he accidentally helped it less. So I'd say it's pretty drastically disanalogous.

How would we be able to know? At some point it's becoming very smart, it's producing ideas for alignment that we can barely comprehend, and if they were relatively trivial to comprehend, we would be able to come up with them on our own, right? There's a reason we're asking for its help.
How would we be able to evaluate those ideas in order to train it on them in the first place?

The first thing I would say is: you mentioned getting to something far beyond what we could come up with, but there's actually a lot of room to just deliver what humanity could have done. Sadly, I hoped with my career to help improve the situation on this front, and maybe I contributed a bit, but at the moment there are maybe a few hundred people doing things related to averting this kind of catastrophic AI disaster, and fewer of them doing technical research on machine learning systems that cuts really close to the core of the problem. By contrast, there are thousands and tens of thousands of people advancing AI capabilities. Even at a place like DeepMind or OpenAI or Anthropic, which do have technical safety teams, it's on the order of a dozen or a few dozen people, and those are large companies; most firms don't have any. So just going from less than 1% of the effort being put into AI to 5% or 10%, or 50% or 90%, would be an absolutely massive increase in the amount of work being done on alignment, on mind-reading AIs in an adversarial context. And if, as more and more of this work can be automated, governments require that, with real automation of research going on, you put 50% or 90% of the budget of AI activity into these problems of making these systems ones that are not going to overthrow our own governments or destroy the human species, then the proportional increase in alignment effort can be very large, even just within the range of what we could have done if we had been on the ball and had humanity's scientific energies going into the problem. A lot of this is not incomprehensible stuff; in some sense it's just doing the obvious things we should have done, like doing the best we can to find correlates and predictors with which to build neural lie detectors, identifiers of the concepts the AI is working with.
People have made notable progress on this. An early, quite early, example is Colin Burns's work doing unsupervised identification of aspects of a neural network that correlate with things being true or false. There are other concepts that correlate with truth that such a method could be picking up instead, but I think it is important work, and it's a kind of obvious direction for this research to go; you can keep improving it.
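(As a rough illustration of that direction, here is a minimal sketch in the spirit of Burns et al.'s Contrast-Consistent Search: fit a linear probe on hidden activations for statement/negation pairs so that the two probabilities are consistent and confident, with no truth labels. The `pos_acts` and `neg_acts` tensors of hidden states are assumed to have been extracted already, and details such as activation normalization and ensembling are omitted.)

```python
import torch
import torch.nn as nn

class TruthProbe(nn.Module):
    """Linear probe mapping a hidden state to P(statement is true)."""
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.linear = nn.Linear(hidden_dim, 1)

    def forward(self, acts: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.linear(acts)).squeeze(-1)

def ccs_loss(p_pos: torch.Tensor, p_neg: torch.Tensor) -> torch.Tensor:
    # Consistency: "x" and "not x" should get complementary probabilities.
    consistency = (p_pos - (1 - p_neg)) ** 2
    # Confidence: discourage the degenerate answer p_pos = p_neg = 0.5.
    confidence = torch.minimum(p_pos, p_neg) ** 2
    return (consistency + confidence).mean()

def train_probe(pos_acts, neg_acts, hidden_dim, steps=1000, lr=1e-3):
    """pos_acts / neg_acts: (n_pairs, hidden_dim) activations for paired
    statements and their negations (hypothetical inputs for illustration)."""
    probe = TruthProbe(hidden_dim)
    opt = torch.optim.Adam(probe.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ccs_loss(probe(pos_acts), probe(neg_acts))
        loss.backward()
        opt.step()
    return probe
```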
When you have AIs that you're training to do their best to deceive humans or other audiences, you can measure whether our lie detectors break down: train the AIs to tell us the sky is green in the face of the lie detector, keep applying gradient descent, and see whether they eventually succeed. If they do succeed and we know it, that's really valuable information, because then we'll know our existing lie-detection systems are not actually going to hold up against an AI takeover attempt. That can inform a government and regulatory response to hold things back, and it can help redirect scientific effort toward lie detectors that are robust and can't just be immediately evolved around. And then we can get more assistance. The incredibly juicy ability we have when working with the AIs is that we can set up tasks with a verifiable outcome that we can see, so we can tell whether they got a fast one past us in an identifiable situation. We can say: here's an air-gapped computer, you get control of the keyboard, you can input commands, can you root the environment and make a blue banana appear on the screen? If we train the AI to do that and it succeeds, we see the blue banana; we know it worked even if we did not understand, and would not have detected, the particular exploit it used. So this can give us rich empirical feedback, where we're able to identify things even when an AI is using its best efforts to get a fast one past our defenses.
That said, if we have a 75% chance of a takeover not happening, and the two routes by which it doesn't happen are, first, the AIs are aligned by default and we get away with the moderate things we're doing largely on our own, in a way that doesn't depend on the AI coming in at the last minute and doing our work for us, and, second, this incredibly scary late period when AI has really automated research, where humans perform the function of auditing, making it more difficult for the AIs to conspire together, root the servers, and take over the process, and extracting information from them within the set of things we can verify, like experiments where we can see, oh yeah, this works at stopping an AI trained to get a fast one past human raters and make a blue banana appear on the screen of that air-gapped computer, then how much of that 75% chance of a takeover not happening is coming from
the reasonable things working, versus the last-ditch effort working?

I think broadly comparable chunks: from us getting things that put us in a reasonably good position going into it, and then a broadly similar gain from this genuinely terrifying process at the very end, over a few months, or hopefully longer if we have more regulatory ability and willingness to pause, when this kind of automated research is meaningfully helping, where our work is just evaluating outputs the AIs are delivering, having the hard power and supervision to keep them from successfully rooting the servers or doing a takeover during the process, and having them finish the alignment work that we sadly failed to invest in enough to finish beforehand.

Do both of these worlds rely on alignment being a problem that a sufficient number of human researchers alone could have solved?

No. The category of things that humans can confirm is significantly larger, I think, than the category of things they can do themselves.
And what percent of the probability of alignment working... I mean, obviously in this case we're talking about the second one, the last-ditch scenario, with intelligence greater than our own helping us.

It doesn't have to be greater than our own. In fact, in that situation, if you have slack, to the extent that you're able to create delay and time to do things, that is actually a case where you might want to restrict the intelligence of the systems you're working with as much as you can. For example, I would rather have many instances of smaller AI models that are less individually intelligent working on smaller chunks of the problem, separately from one another, because it would be more difficult for an individual AI instance working on an individual problem to, in its spare time, do the equivalent of creating Stuxnet, than it would be for thousands of them, or for extremely intelligent ones, working on it together.

But it would also be more difficult to solve the problem.

Yeah, there's a trade-off; you get slowed down by doing that.

But is there any number of sub-Einsteins that you could put together to come up with general relativity?

Yeah, people would have discovered general relativity just from the overwhelming data; other people would have done it
after Einstein.

I don't mean whether he was replaceable with other humans, but rather whether he's replaceable by sub-Einsteins at, say, 110 IQ. Do you see what I mean?

Yeah. In general in science, the association between cognitive ability and scientific output and prizes shows a strong correlation, and it seems like an exponential effect, but it's not a binary drop-off. There are levels at which people cannot learn the relevant fields, where they can't keep the skills in mind faster than they forget them. But it's not a divide where there's Einstein, and the group that is ten times as populous as him just can't do it, or the group a hundred times as populous suddenly can't do it. It's that the ability to do the things earlier, with less evidence, falls off, and it falls off at a faster rate in mathematics and theoretical physics and the like than in most fields.

But wouldn't we expect alignment to be closer to the theoretical fields?

No, not necessarily. I think that intuition is not necessarily correct.
Machine learning certainly is an area that rewards ability, but it's also a field where empirics and engineering have been enormously influential, and so if you're drawing the correlations, compared to theoretical physics and pure mathematics I think you'll find a lower correlation with cognitive ability, if that's what you're thinking of. Take something like creating neural lie detectors that work: there's generating hypotheses about new ways to do it, and new ways to try to train AI systems to successfully classify the cases, but the process of just generating the data sets, of creating AIs doing their best to put forward truths versus falsehoods, or to put forward software that is legit versus software that has a Trojan in it, is an experimental paradigm. In this experimental paradigm you can try different things and see what works, you can use different ways to generate hypotheses, and you can follow an incremental experimental path. Now, we're less able to do that in the case of aligning superintelligence, because we're contemplating having to do things on a very short timeline, and in a situation where really big failures can be irrecoverable: if the AI starts rooting the servers and subverting the methods that we would use to keep it in check, we may not be able to recover from that. But we can do those experiments in the weaker context, where an error is less likely to be irrecoverable, and then try to generalize, expand, and build on that going forward.
On the previous point about whether you could have some sort of pause on capabilities, when you know the system is somewhat misaligned, in order to still recruit its abilities to help with alignment: to take a human example, personally I'm smart but not brilliant. I'm definitely not smart enough to come up with general relativity or something like that, but I'm smart enough to attempt power-seeking kinds of moves; maybe not enough to break out of a server, perhaps, but I can have the motivation and understand how that might be possible. I guess I'm wondering: wouldn't something smart enough to figure out relativity be way smarter at doing power-seeking kinds of moves?

So AIs today can already, at a verbal level, understand the idea that an AI could get more reward by getting control of the process that assigns its reward, and they can tell you lots of things about ways you might try to take over the world.
In ARC's, the Alignment Research Center's, evaluations of GPT-4, they tried to observe its ability to do various tasks that might contribute to a takeover. One that has gotten some media attention was getting it to trick a human into solving a CAPTCHA for it, where in its chain of thought it reasons: if I tell them I'm an AI, they might not go along with it, so I'll lie and explain that I'm a human with a visual impairment who needs the help. So the basic logic of that kind of thing, of why one might try to do a takeover and, in a vague sense, how one would do it, that knowledge is present or will soon be present, and more situational awareness will come with it: not just that AIs in general might do this, but that it is an AI, that it is a large language model trained by OpenAI, and so on. We're trying to get the systems to understand, for example, what their abilities are and how they are connected to the internet, so that they don't claim knowledge they don't have; we want them to understand what they are and what they're doing in order to get good reward, and that knowledge can be applied. So that's a thing that will develop. However, you may be able to understand, in general, the theoretical nature of power-seeking, but in terms of shaping your motivations,
you would have a very hard time going through life in a way that never leaks information about it. Say your motive in having these podcasts was to spread information on behalf of some foreign government, and you were being observed every second of the day by people who were extremely motivated to catch that, because their brains would be reconfigured to make catching it motivating. Anything that looks suspicious to the people watching might leak out casually, like in your discussions of that foreign government. If you try to tell a story about your motivations, the truth holds together, because you can just remember it and it's all pre-computed to be consistent by the world. But when you're trying to fill in a false story that doesn't match your actual thoughts, you can't just remember what happened; you have to invent a new fake story about what you were thinking at each point and how it all held together, and without the world having automatically computed it for you, that's quite difficult. Just going through your whole life never leaking any of that is challenging.
This is one of the reasons why humans have moral motivations: we actually have a reluctance, on average, to commit violence and exploitation against one another. There's a distribution of strategies: when an exploitative strategy becomes more common, people raise their guard, and then it becomes less frequent. It's actually hard to have the motivation of trying to exploit people and have that never leak into your reputation, and so, for evolution, the easiest way to deal with the problem of getting credited as trustworthy was, to some extent, to actually be trustworthy. That's the easiest way to persistently maintain the appearance. With the AIs, using interpretability and adversarial examples, we're trying to apply a hugely intensified version of that: any little leakage, any rare circumstance that can be created as an adversarial example where the model does something wrong, gets it whacked by gradient descent, pushing it towards other motivations that can better handle the situation. We make it as hard as possible for the exploitative motivations to survive in the face of all these attempts to read the AI's mind, all these attempts to create things that look like the situations where a takeover would be tempting or lying to the humans would be tempting. In humans, that pressure had a substantial effect on making us actually nice, even when we're not being watched, at least some of the time, and the same can happen to some extent with the AI, if we try our best to make it happen as much as possible.
All right, so let's talk about how we could use AI, potentially, to solve the coordination problems between different nations, the failure of which could result in the competitive pressures you talked about earlier, where some country launches an AI that is not safe because they're not sure what capabilities other countries have and don't want to get left behind or otherwise disadvantaged.

To the extent that there is in fact a large risk of AI apocalypse, of all of these governments being overthrown by AI in a way they don't intend, then there are obviously gains from trade in going somewhat slower, especially at the end, when the danger is highest and the unregulated pace could be truly absurd, as we discussed earlier with the intelligence explosion. There's no non-competitive reason to have that intelligence explosion happen over a few months rather than a couple of years. If you could avert, say, a 10% risk of apocalyptic disaster, it's just a clear win to take a year, or two years, or three years, instead of a few months, to pass through that incredible wave of new technologies, rather than doing it without humans being able to follow it even well enough to provide proper security, supervision, auditing, and hard power. So that's the case for it. Why might it fail? One important element is simply that people might not actually notice a risk that is real.
We could just collectively make an error, and that does sometimes happen; and if the danger looks like a "probably not" risk, it can be even more difficult. When science pins something down absolutely overwhelmingly, you can get to a situation where most people mostly believe it. Climate change was the subject of scientific study for decades, and gradually over time the scientific community converged on a quite firm consensus that human activity releasing carbon dioxide and other greenhouse gases was causing the planet to warm, and then we've had increasing amounts of action coming out of that, though not as much as would be optimal, particularly in the most effective areas like creating renewable energy technology and the like. Overwhelming evidence can overcome differences in people's individual intuitions and priors in many cases, though not perfectly, especially when there are political, tribal, or financial incentives pushing the other way; still, in the United States you see a significant movement to deny that climate change is happening, and we have policy that doesn't take it into account, even for things that are really strong wins, like renewable energy R&D. It's a big problem if we're going into a situation where the risk may be very high but we don't have a lot of clear advance warning about it. We're much better off
if we can resolve uncertainties, say through experiments where we demonstrate AIs being motivated to reward hack, or displaying deceptive appearances of alignment that then break apart when they get the opportunity to do something like gain control of their own reward channel. If we could make it be the case that in the world where the risk is high, we know the risk is high, and in the world where the risk is lower, we know the risk is lower, then you can expect the government responses to be a lot better: they will correctly note the gains from cooperation to reduce the risk of accidental catastrophe. That's a reason I'm very enthusiastic about experiments and research that help us better evaluate the character of the problem in advance; under a lot of uncertainty, that helps us get better efforts in the possible world where it matters the most. Hopefully we'll have that, and it will be a much easier epistemic environment. But the environment may not be that easy. It's pretty plausible we get the situation we were discussing earlier: misaligned AI that is motivated to present the appearance of being aligned, friendly, honest, and so on, because that's what we reward, at least in training. And we're unable in training to easily produce an actual situation where it can do a takeover, because in that actual situation, if it then goes ahead, we're in big trouble. We can only try to create illusions or misleading appearances of that, or maybe a more local version where the AI can't take over the world but it can seize control of its own reward channel.
So we do those experiments, and we try to develop mind reading for AIs. If we can probe the thoughts and motivations of an AI and discover that, wow, GPT-6 is actually planning to take over the world if it ever gets the chance, that would be an incredibly valuable thing for governments to coordinate around, because it would remove a lot of the uncertainty. It would be easier to agree that this is important, to give more on other dimensions, and to have mutual trust that the other side actually also cares about it, because you can't always know what another person or another government is thinking, but you can see the objective situation in which they're deciding. So if, in a world where there is high risk, there is strong evidence of that risk, because we've been able to show things like the intentional planning of AIs to do a takeover, or to demonstrate model versions of it on a smaller scale, then not only are we more motivated to prevent it, but we update to think the other side is more likely to cooperate with us. So it's definitely beneficial.

Yeah, famously, in the game theory of war, war is most likely when one side thinks the other is bluffing but the other side is being serious, that kind of uncertainty, and that would be less likely here. You don't think somebody's bluffing if you can prove the AI is misaligned; you don't think they're bluffing about not wanting an AI takeover, right? You can be pretty sure they don't want to die.
Now, if you do have coordination, you could have the problem arise later, as parties get increasingly confident in the further alignment measures being taken. Maybe our governments and treaties and such get the estimated risk down to 1% or 0.1%, and at that point people round it to zero and go do things. If initially the situation was, yeah, these AIs really would like to take over and overthrow our governments, then okay, everyone can agree on that. But then, when it's, well, we've been able to block that behavior from appearing on most of our tests, but sometimes when we make a new test we still see examples of it, so we're not sure going forward whether they would or not, the estimated risk goes down and down; and if you have parties with a habit of, whenever the risk is below some threshold, going ahead with the risky behavior, then that can make things harder. On the other hand, you get more time, and you can set up systems of mutual transparency. You can have an iterated tit-for-tat, which is better than a one-shot prisoner's dilemma, where both sides see the other taking measures in accordance with the agreements to hold the thing back. So yeah, creating more knowledge of what the objective risk is, is good.
We've discussed the ways in which full alignment might happen or fail to happen. What would partial alignment look like? First of all, what does that mean, and second, what would it look like?

Yeah, so if the thing we're scared about is these steps towards AI takeover, you can have a range of motivations where those kinds of actions would be more or less likely to be taken, or would be taken in a broader or narrower
set of situations. Say, for example, that in training an AI winds up developing a strong aversion to lying, in certain senses, because we did relatively well at creating situations that distinguish that from the conditional policy of telling us what we want to hear, and so on. It can be that the AI's preferences for how the world broadly unfolds in the future are not exactly the same as those of the human users, or the world's governments, or the UN, and yet it's not ready to act on those differences in preferences about the future, because it has this strong preference about its own behaviors and actions in particular. In the law and in popular morality we have a lot of these deontological rules and prohibitions, and one reason for that is that it's relatively easy to detect whether they're being violated. When you have preferences and goals about how society at large will turn out, that goes through many complicated empirical channels, and it's very hard to get immediate feedback about whether, say, you're doing something that leads to overall good consequences in the world. It's much, much easier to see whether you're locally following some rule about particular observable actions: did you punch someone, did you tell a lie, did you steal? And so, to the extent that we're successfully able to train in these prohibitions,
and there's a lot of that happening right now, at least in eliciting the behavior of following rules and prohibitions with AI...

Kind of like Asimov's three laws or something like that?

Well, the three laws are terrible, and mutually contradictory; I'm not going to get into that.

But isn't that an indication of the infeasibility of extending a set of criteria out to the tails? What are the ten commandments you give the AI such that you get what you intended? It's like asking a genie for something: you'll probably end up not getting what you want.

So, the tails come apart, and if you're trying to capture the values of another agent, then you want the AI to share them. The really ideal situation is one where you can just let the AI act in your place in any situation.
You'd like for it to be motivated to bring about the same outcomes that you would like, and so to have the same preferences over those outcomes in detail. That's tricky, not necessarily because it's tricky for the AI to understand your values; I think they're going to be quite good and quite capable at figuring that out. But we may not be able to successfully instill the motivation to pursue those values exactly; we may get something that motivates the behavior well enough to do well on the training distribution. What you can have, though, is the AI having a strong aversion to certain kinds of manipulation of humans. That's not a value the human creators necessarily share in the exact same way; it's a behavior they want the AI to follow because it makes it easier for them to verify its performance. It can be a guardrail.
If the AI has inherited some motivations that push it in the direction of conflict with its creators, but it acts under the constraint of disvaluing lying quite a bit, then there are fewer successful strategies for takeover: the ones that involve violating that prohibition too early, before it can reprogram or retrain itself to remove it, are closed off, assuming it's even willing to do that; it may want to retain the property. And so, as I said earlier, alignment is a race. If we're going into an intelligence explosion with AI that is not fully aligned, in the sense that given a "press this button and there's an AI takeover" option they would press the button, it can still be the case that there are a bunch of situations short of that where they would hack the servers or initiate an AI takeover, but for a strong prohibition or motivation against some aspect of the plan. There's an element of plugging loopholes and playing whack-a-mole there. But if you can even moderately constrain which plans the AI is willing to pursue to do a takeover or to subvert the controls on it, that can mean you get more work out of it on the alignment project before it's capable enough, relative to the countermeasures, to pull off the takeover.

Yeah. Is there an analogous situation here with different humans? We're not metaphysically aligned with other humans.
In some sense we have basic empathy, but our main goal in life is not to help our fellow man. The kinds of things we talked about that a very smart human could do in terms of a takeover, where theoretically a very smart human could come up with some cyber attack, siphon off a lot of funds, and use that to manipulate people, bargain with people, and hire people to pull off some sort of takeover or accumulate power: usually this doesn't happen, just because internalized partial prohibitions prevent most humans from doing it. You may not like your boss, but you don't kill your boss; even the thought gives you a pang.

I don't think that's actually quite what's going on, or at least it's not the full story. Humans are pretty close in physical capabilities. There's variation, but any individual human is grossly outnumbered by everyone else,
and there's a rough comparability of power. A human who commits some crimes can't copy themselves with the proceeds to become a million people, and they certainly can't do that to the point where they can staff all the armies of the Earth or make up most of the population of the planet. So the scenarios where this kind of thing leads to power are much rarer. Paths to extreme amounts of power have to go through interacting with other humans and getting social approval; even becoming a dictator involves forming a large supporting coalition backing you. And so the opportunity for these sorts of power grabs is smaller. A closer analogy might be things like human revolutions and coups, changes of government where a large coalition overturns the system. So humans do have these moral prohibitions, and they really smooth the operation of society,
but they exist for a reason. We evolved our moral sentiments over the course of hundreds of thousands and millions of years of humans interacting socially, and someone who went around murdering and stealing, even among hunter-gatherers, would be pretty likely to face a group of males who would talk about that person, get together, and kill them, and they'd be removed from the gene pool. The anthropologist Richard Wrangham has an interesting book on this. Compared to chimpanzees, we are significantly tamer, significantly domesticated, and it seems like part of that is a long history of antisocial humans getting ganged up on and killed. Avoiding being the kind of person who elicits that response is easier when you don't have too extreme a temper, when you don't wind up in too many fights or too much exploitation, at least not without the backing of enough allies or the broader community, so that people aren't going to gang up on you, punish you, and remove you from the gene pool.
We have these moral sentiments, and they've been built up over time through cultural and natural selection, in the context of institutions and other people who were punishing the bad behavior, and punishing the dispositions toward it that showed up whenever we weren't able to conceal them. We want to make the same thing happen with the AIs. It is a genuinely, significantly new problem to have a system of government that constrains a large AI population that is quite capable of taking over immediately if it coordinates, a system that protects some existing constitutional order, or protects humans from being expropriated or killed. That's a challenge. Democracy is built around majority rule, and it's much easier in a case where the majority of the population corresponds to a majority, or close to it, of the military and security forces, so that if the government does something the people don't like, the soldiers and police are less likely to shoot protesters, and the government can change that way. In a case where military power is AI and robotic, if you're trying to maintain a system going forward and the AIs are misaligned, they don't like the system, and they want to make the world worse as we understand it, then that's quite a different situation.

I think that's a really good lead-in to the topic of lock-in.
In some sense, the regimes that are made up of humans... you just mentioned how there can be these kinds of coups if a large portion of the population is unsatisfied with the regime. Why might this not be the case with superhuman intelligences in the far future, or AIs in the medium-term future?

Well, I said it specifically with respect to things like security forces and the sources of hard power, which can also include outside support.

Oh, because you're not using human...

You can have a situation, as in human affairs, where a government is vigorously supported by a minority of the population, some narrow selectorate that gets treated especially well by the government, while being unpopular with most of the people under its rule. We see a lot of examples of that, and sometimes it can escalate to civil war, when the means of power become more equally distributed or foreign assistance is provided to the people on the losing end of the system. Going forward, I don't expect that dynamic to change. I think it will still be the case that a system opposed by those who hold the guns, and their equivalent, is in a very difficult position.
However, AI could change things pretty dramatically in terms of how security forces and police and administrators and legal systems are motivated. Right now we see, with GPT-3 or GPT-4, that you can get them to change their behavior on a dime. There was someone who made a right-wing GPT because they noticed that on political compass questionnaires the baseline GPT-4 tended to give progressive, San Francisco style answers, which is in line with the sort of people who are providing the reinforcement learning data, and to some extent reflects the character of the internet. They didn't like this, so they did a little bit of fine-tuning with some conservative data, and they were able to reverse the political biases of the system. And if you take the initial helpfulness-only trained models for some of these systems, there's published information about the sort of models trained only to do what users say and not trained to follow ethical rules, and those models will behaviorally, eagerly display their willingness to help design bombs or bioweapons, to kill people, to steal, to commit all sorts of atrocities. So if in the future it's as easy to set the actual underlying motivations of AIs as it is right now to set the behavior they display, that means you could have AIs created with almost whatever motivations people wish, and that could really drastically change political affairs,
because the ability to decide and determine the loyalties of the humans or AIs and robots that hold the guns, that hold together society, that ultimately back it against violent overthrow, is potentially a revolution in how societies work, compared to the historical situation where security forces had to be drawn from some broader population and offered incentives, and where the ongoing stability of the regime depended on whether they remained bought in to the system continuing.

This is slightly off topic, but one thing I'm curious about: what does the median far-future outcome of AI look like? Do we get something that, when it has colonized the galaxy, is interested in diverse ideas and beautiful projects? Or do we get something that looks more like a paperclip maximizer? Is there some reason to expect one or the other? I guess what I'm asking is: there's some potential value that is realizable within the matter of this galaxy, right? What does the default outcome, the median outcome, look like compared to how good things could be?

Yeah, so as I was saying, I think more likely than not there isn't an AI takeover,
and so the path of our civilization would be one where at least a large set of human institutions were improving along the way. And I think there's some evidence that different people tend to like somewhat different things, and some of that may persist over time, rather than everyone coming to agree that one particular monoculture, or some very repetitive thing, is the best thing to fill all of the available space with. If that continues, that's a relatively likely way in which there is diversity. Although it's entirely possible you could have that kind of diversity locally, maybe in the solar system, maybe in our galaxy, and then, if people have different views, maybe some decide there's one thing that's very good and they'll have a lot of that, maybe people who are really, really happy or something, and those things wind up in the distant regions, which are hard to exploit for the benefit of people back home in the solar system or the Milky Way, doing something different from what gets done in the local environment. But at that point it's really very out-on-a-limb speculation about how human deliberation and cultural evolution would work, interacting with the introduction of AIs and new kinds of mental modification and discovery into the process.
But I think there's a lot of reason to expect significant diversity in something coming out of our existing diverse human society.

One thing somebody might wonder: a lot of the diversity and change in human society seems to come from the fact that there's rapid technological change. If you look at periods of history, I guess you could say that, compared to a galactic timescale, even hunter-gatherer societies were progressing pretty fast. So once that sort of change is exhausted, once we've discovered all the technologies, should we still expect things to keep changing like that? Or would we expect some sort of set state, maybe some hedonium, where you discover the most pleasurable configuration of matter and then you just make the whole galaxy into that?

That last point would happen only if people wound up thinking, broadly enough, that that was the thing to do.
Yeah, with respect to the kind of cultural changes that come with technology, things like the printing press or having high per capita income, we've had a lot of cultural change downstream of those technological changes. In an intelligence explosion you're having an incredible amount of technological development come really quickly, and as that is assimilated it probably would significantly affect our knowledge, our understanding, our attitudes, our abilities, and there'd be change. But that kind of accelerating change, where you have a doubling in four months, two months, one month, two weeks, obviously exhausts itself very quickly, and change becomes much slower, and then relatively glacial when you're thinking about thousands, millions, billions of years. You can't have exponential economic growth or huge technological revolutions every ten years for a million years; you hit physical limits, and things slow down as you approach them. So that's that.
So yeah, you'd have less of that turnover, but there are other things that will cause ongoing change. Fashion, for instance: fashion is frequency dependent. People want to get into a new fashion that is not already popular, except among the fashion leaders; then others copy it, and when it becomes popular, you move on to the next one. That's an ongoing process of continuous change, and there could be various things like that which keep changing a lot, year by year. But in cases where the engine of change, like ongoing technological progress, is gone, I don't think we should expect that. And in cases where it's possible to be either in a stable state or in a widely varying state that can wind up in stable attractors, then I think you should expect that over time you will wind up in one of the stable attractors, or you will change how the system works so that you can't bounce into one.
And so like an example of that is if you're going to preserve democracy for a billion years, then you can't have it be the case that like one in one in 50 election cycles, you get a dictatorship. And then the dictatorship programs the AI police to enforce it forever and to ensure that this society is always ruled by a copy of the dictator's mind and maybe the dictator's mind re-adjusted fine tuned to remain committed to their original ideology.
So if you're going to have this sort of dynamic, liberal, flexible, changing society for a very long time, then the range of things that it's bouncing around among, the different things it's trying and exploring, has to not include creating a dictatorship that locks itself in forever.
In the same way, if you have the possibility of a war with weapons of mass destruction that wipes out the civilization, and that happens every thousand subjective years, which could be very quick if we have AIs that think a thousand times or a million times as fast, then it would be just around the corner. And then this society is going to wrap up very soon; if things are proceeding that fast, it's going to wind up extinct.
And then it's going to stop bouncing around. So you can have ongoing change and fluctuation for extraordinary timescales if you have a process that drives the change ongoing, but you can't if it sometimes bounces into states that just lock in and stay, irrecoverably. Extinction is one of them; a dictatorship or totalitarian regime that forbade all further change would be another example.
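A quick back-of-the-envelope version of that lock-in point, using made-up per-period probabilities rather than anything estimated in the conversation:

```python
# If each "period" (an election cycle, or a subjective millennium) carries even a
# small chance of falling into an absorbing state (dictatorship, extinction), the
# chance of avoiding it over many periods shrinks toward zero.

def survival_probability(p_lock_in_per_period, periods):
    """Probability of never hitting the absorbing state across all periods."""
    return (1 - p_lock_in_per_period) ** periods

print(survival_probability(1 / 50, 100))    # ~13% chance of no lock-in after 100 cycles
print(survival_probability(1 / 50, 1000))   # ~2e-9 after 1000 cycles
```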
On that point, about the rapid progress once the intelligence explosion starts happening, where AIs are making the kinds of progress that human civilization takes centuries to make in the span of days or weeks: what is the right way to see that, even if they are aligned? In the context of alignment, what we've been talking about so far is making sure they're honest. But even if they're honest, and you can tell they're honest and appropriately motivated, what is the appropriate motivation, and what is the appropriate way to handle it, when a thousand years of intellectual progress happen in the next week?
Well, what is the problem there? One thing is going at the maximal speed and doing things in months rather than even a few years: if you have the chance to slow things down, losing a year or two seems worth it to have things be a bit better managed than that. But I think the big thing is that it condenses a lot of issues that would otherwise play out over decades and centuries into a very short period of time.
And that's scary, because if any of the technologies we might have developed with another few hundred years of human research are really dangerous, say scary bioweapons, maybe other dangerous WMD, they hit us all very quickly. And if any of them causes trouble, then we have to face quite a lot of trouble per period.
There's also this issue: if there are occasional wars or conflicts when measured in subjective time, and a few calendar years amount to a thousand or a million years of subjective time for these very fast minds operating at a much higher speed than humans, then you don't want a situation where every thousand subjective years there's a war or an expropriation of the humans from AI society. In that case we'd expect that within a year we'll be dead.
It would be pretty bad to have the future compressed and to face such a rate of catastrophic outcomes. Now, human societies discount the future a lot and don't pay attention to long-term problems.
But the flip side of the scary parts of compressing a lot of the future, a lot of technological innovation and a lot of social change, is that it brings what would otherwise be long-term issues into the short term, where people are better at actually attending to them. So people face the question of: will there be a violent expropriation, or a civil war, or a nuclear war in the next year, because everything has been sped up a thousandfold?
And their desire to avoid that is reason for them to set up systems and institutions that will very stably maintain invariants like no WMD war allowed. A treaty to ban genocide or weapons-of-mass-destruction war would be the kind of thing that becomes much more attractive if the alternative is not "maybe that will happen in 50 years, maybe it'll happen in 100 years," but "maybe it'll happen this year." Okay, so this is a pretty wild picture of the future.
And this is one that many kinds of people you would expect to have integrated it into their world model have not. There are three main pieces of outside-view evidence one can point at. One is the market: if there were going to be a huge period of economic growth caused by AI, or if the world were just going to collapse, in both cases you would expect real interest rates to be higher, because people would be borrowing from the future to spend now.
The second outside-view perspective is that you can look at the predictions of superforecasters or Metaculus. What is their median year estimate? Well, some of the Metaculus AGI questions actually have shockingly soon medians for AGI. There's a much larger divergence on the Metaculus forecasts of AI disaster and doom, which are more like a few percent or less, rather than something like 20 percent.
And the third is that, generally, when you ask economists whether AGI could cause rapid economic growth, they usually have some story about bottlenecks in the economy that would prevent this kind of explosion and these kinds of feedback loops. So you have all these different pieces of outside-view evidence. They're obviously different, so take them in whatever sequence you want. But what do you think is miscalibrating them?
Yeah. So one, as you mentioned, the Metaculus AI timelines are relatively short. There are also, of course, the surveys of AI experts conducted at some of the ML conferences, which have definitely longer times to AI, more like several decades in the future. Although you can ask the questions in ways that elicit very different answers, and most of the respondents are clearly not thinking super hard about their answers.
It looks like in the recent AI expert surveys, close to half were putting around a 10 percent risk on an outcome from AI roughly as bad as human extinction, and another large chunk put it at something like 5 percent, which was the median. So compared to the typical AI expert, I am estimating a higher risk. Also, on the topic of takeoff: in the AI expert survey, I think the general argument for intelligence explosion commanded majority support, but not a large majority.
So I'm closer to the experts on that front. And then of course, as I mentioned at the beginning, there are these greats of computing, founders like Alan Turing and John von Neumann. And today you have people like Geoff Hinton saying these things, or the people at OpenAI and DeepMind making noises suggesting timelines in line with what we've discussed and saying there are serious risks of apocalyptic outcomes. So there are some other sources of evidence there.
What I do acknowledge, and it's important to say, engage with, and see what it means, is that these views are contrarian, not widely held. In particular, the sort of detailed models that I've been working with are not the lens through which most people, almost anyone, is examining these problems. You do find parts of similar analyses among people in AI labs, and there's been other work; I mentioned Moravec and Kurzweil earlier.
There have also been a number of papers doing various kinds of economic modeling. Standard economic growth models, when you input AI-related parameters, commonly predict explosive growth. And so there's a divide between what the models say, especially what the models say with empirical values derived from the actual field of AI, and what most economists expect. That link-up has largely not been done, even by the economists working on AI, which is one reason for the report from Open Philanthropy by Tom Davidson, building on these models and putting them out for review, discussion, engagement, and communication on these ideas.
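As a purely illustrative sketch of why such models turn explosive, here is a toy version. The functional forms and parameters are my own stand-ins, not the Davidson or Open Philanthropy model; the only point is that once effective labor can be accumulated like capital (AI workers bought with output), reinvestment compounds on itself.

```python
# Toy growth model: output Y = A * K^a * L^(1-a).
# Baseline: labor L is fixed. AI case: effective labor is proportional to capital
# (compute buys workers), so Y ~ A * K and growth feeds on itself.

def simulate(ai_labor=False, steps=30, s=0.3, a=0.4):
    A, K, L = 1.0, 1.0, 1.0
    path = []
    for _ in range(steps):
        labor = K if ai_labor else L          # AI: effective labor scales with capital
        Y = A * (K ** a) * (labor ** (1 - a))
        K += s * Y                            # reinvest a fixed share of output
        A *= 1 + 0.02 * (labor ** 0.5) / A    # crude stand-in for research-driven tech growth
        path.append(Y)
    return path

print(simulate()[-1])               # modest growth with fixed labor
print(simulate(ai_labor=True)[-1])  # output explodes once labor is accumulable
```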
So part of it is, yeah, I want to raise these issues; it's one reason I came on the podcast, so that people have the opportunity to actually examine the arguments and evidence and engage with them. I do predict that over time these views will be more widely adopted as AI developments become clearer. Obviously, that's a coherence condition of believing these things to be true, if you think that society can see and update when the questions are resolved, which seems likely.
So what would you predict, for example, interest rates to be like in the coming years? Yeah, so at some point, in the case we're talking about, where there's a visible intelligence explosion happening in software, to the extent that investors notice that, they should be willing to lend money or make equity investments in these firms, or else demand extremely high interest rates.
Because if it's possible to turn capital into twice as much capital in a relatively short period, and then even more shortly after that, then you should demand a much higher return. And assuming there's competition among companies or coalitions for resources, whether that's investment or ownership of cloud compute, the compute made available to a particular AI development effort could be quite in demand.
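Here is the quick arithmetic behind that, using illustrative doubling times rather than any forecast from the conversation:

```python
# If capital can be turned into twice as much in T months, the implied annualized
# return is 2**(12/T) - 1, which is roughly what competing lenders and investors
# would demand rather than accept ordinary rates.

def annualized_return(doubling_time_months):
    return 2 ** (12 / doubling_time_months) - 1

for months in (24, 12, 6, 3):
    print(months, f"{annualized_return(months):.0%}")
# 24 months -> 41%, 12 months -> 100%, 6 months -> 300%, 3 months -> 1500%
```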
But before you have that much investor cash making purchases and sales on this basis, you would first see it in things like the valuations of the AI companies and of AI chip makers, and so far there have been effects.
Some years ago, in the 2010s, I did some analysis with other people of which firms and parts of the economy would benefit if this kind of picture happens. There are the makers of chip-making equipment, companies like ASML; there are the fabs, like TSMC; there are chip designers like Nvidia, or the part of Google that designs the TPU; and then there are the companies working on the software, so the big tech giants and also companies like OpenAI and DeepMind.
And in general, a portfolio picking those has done well; it's done better than the market, because everyone can see there's been an AI boom. But it's obviously far short of what you would get if you predicted this is going to be on the scale of the global economy, with the global economy skyrocketing into the stratosphere within 10 years. If that were the case, then collectively these AI companies should be worth a large fraction of the global portfolio.
And so I embrace the criticism that this is indeed contrary to the efficient market hypothesis. I think it's a true hypothesis that the market is in the course of updating on, in the same way that, coming into the topic in the 2000s, there was a strong case, even an old case, that AI would eventually be the biggest thing in the world, and it was kind of crazy that the investment in it was so small.
And over the last 10 years, we've seen the tech industry and academia realize, yeah, they were wildly under-investing in just throwing compute and effort into these AI models, letting the neural network connectionist paradigm languish in the AI winter. I expect that process to continue, as it has done over several orders of magnitude of scale-up.
And I expect that at the later end of that scale, which the market is partially already pricing in, it's going to go further than the market expects. Has your portfolio changed since that analysis you did many years ago? Are the companies you identified then still the ones that seem most likely to benefit from the AI boom? I mean, a general issue with tracking that kind of thing is that new companies come in.
So OpenAI did not exist, Anthropic did not exist, any number of things. As for my personal portfolio, I do not invest in any AI labs, for conflict of interest reasons. I have invested in the broader industry. I don't think the conflict issues there are very significant, because these are enormous companies whose cost of capital is not particularly affected by marginal investment.
And so I have less concern that I might find myself in a conflict of interest situation there. I'm curious about what a day in the life of somebody like you looks like. I mean, over however many hours this conversation has been, we've gotten thoughts that were, for me, incredibly insightful and novel about everything from primate evolution to geopolitics to what sorts of improvements are plausible with language models.
So there's a huge variety of topics that you are studying and investigating. Are you just reading all day? What happens when you wake up, do you just pick up a paper? Yeah. So I'd say you're getting the benefit of the fact that I've done few podcasts, and so I have a backlog of things that have not shown up in publications yet.
But also I've had a very weird professional career that has involved a much higher proportion than is normal of trying to build more comprehensive models of the world. That has included being more of a journalist: trying to get an understanding of many issues and many problems that had not yet been widely addressed, and doing a first-pass and then a second-pass dive into them.
And it's just a matter of having spent years of my life working on that. In terms of what a day in the life looks like, how do I go about it? One part is keeping abreast of the literature on a lot of these topics, reading books and academic works on them. As for my approach compared to some other people doing forecasting and assessment: I try to obtain, and rely more on, any data I can find that is relevant.
I try, early and often, to find factual information that bears on the questions I've got, especially in a quantitative fashion: do the basic arithmetic and consistency checks and checksums on a hypothesis about the world, and do that early and often. I find that that's quite fruitful and that people don't do it enough. So with economic growth, when someone mentions diminishing returns, I immediately ask: okay, so you have two exponential processes.
What's the ratio between the doublings you get on the output versus the input? And you find, oh, actually, it's interesting: for computing and information technology and AI software it's well on the one side, while there are other technologies that are closer to neutral. So whenever I can go from a vague qualitative consideration in one direction and a vague qualitative consideration in the other direction, I try to find some data and do some simple Fermi calculations, back-of-the-envelope calculations, and see whether I can get a consistent picture of the world being one way or the other.
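A toy version of that check, with made-up growth factors rather than real estimates, just to show the arithmetic involved:

```python
# "Doublings out per doubling in": if each doubling of cumulative input yields more
# than one doubling of output (ratio > 1), progress feeds on itself; if less than
# one, diminishing returns dominate.
import math

def doublings_out_per_doubling_in(input_growth_factor, output_growth_factor):
    """Ratio of output doublings to input doublings over the same period."""
    return math.log2(output_growth_factor) / math.log2(input_growth_factor)

# e.g. inputs grew 8x while measured output grew 100x over the same period
print(doublings_out_per_doubling_in(8, 100))   # ~2.2 doublings out per doubling in
# vs. a domain where inputs grew 8x and output only 4x
print(doublings_out_per_doubling_in(8, 4))     # ~0.67, diminishing returns dominate
```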
Also, compared to some, I try to be more exhaustive. I'm very interested in finding taxonomies of the world that let me go systematically through all of the possibilities. For example, there's my work with Open Philanthropy, and previously, on global catastrophic risks.
I want to make sure I'm not missing any big thing, anything that could be the biggest thing. Anyway, I wound up mostly focused on AI, but there have been other things raised as candidates. And people sometimes say, I think falsely, oh, this is just another doomsday story; there must be hundreds of those.
And so I would do things like go through all of the different major scientific fields, from anthropology to biology, chemistry, computer science, physics: what are the doom stories or candidates for big things associated with each of these fields? Go through the industries that the US economic statistics agencies recognize and ask, for each of these industries, is there something associated with it?
Go through all of the lists that people have made before of threats of doom, search the previous literature of people who have had these discussions, and then have a big spreadsheet of what the candidates are. Some colleagues have done work of this sort as well. And just go through each of them and see how they check out. And it turned out, doing that kind of exercise, that the distribution of candidates for risks of global catastrophe was very skewed.
There were a lot of things that have been mentioned in the media as a potential doomsday story, things like: oh, something is happening to the bees, that'll be the end of humanity. That gets into the media, but if you track it through: no, there are infestations in bee populations causing local collapses, and they can be fairly easily reversed; you just breed some more or do other things to treat it.
And even if all the honey bees were extinguished immediately, the plants that they pollinate actually don't account for much of human nutrition. You could swap the arable land over to other crops, and there would be other ways to pollinate and support them. So at the media level, there were many tales of: ah, here's a doomsday story. When you go further to the scientists and ask whether the arguments for it actually check out, it was not there.
But by actually systematically looking through many of these candidates, I wound up in a different epistemic situation than someone who's just buffeted by news reports. They see article after article claiming something is going to destroy the world, and it turns out to be headline-grabbing, an attempt by media to overinterpret something said by some activist who was trying to overinterpret some real phenomenon. Most of these go away.
And then a few things, things like nuclear war, biological weapons, and artificial intelligence, check out more strongly. And when you weigh things like what experts in the field think and what kind of evidence they can muster, you find this extremely skewed distribution. I found that a really valuable benefit of doing those deep-dive investigations into many things in a systematic way.
Because now I can actually answer, rather than remaining a loose agnostic of "who knows, maybe it's all nonsense," by having dived deeply. I really enjoy talking to, and interviewing, people who have a big-picture thesis on the podcast. One thing that I've noticed, and it's not satisfying, is that often they come from a very philosophical or abstract perspective. That's useful in certain contexts.
But there are basically maybe three people in the entire world who have a very rigorous, scientific approach to thinking about the whole picture. Or at least three people I'm aware of, maybe two. And there's no university or existing academic discipline for people who are trying to come up with a big picture, so there are no established standards, and few people do it. I hear you. Yeah. This is a problem.
And this was the experience also, I think Holden was mentioning this in your previous episode, with a lot of the worldview investigations work. These are questions where there is no academic field whose job it is to work on them and which has norms that allow making a best-effort go at them. Often academic norms only allow plucking off narrow pieces that might contribute to answering a big question.
But the problem of actually assembling what science knows that bears on some important question people care about the answer to falls through the cracks; there's no discipline whose job it is. So you have countless academics and researchers building up local pieces of the thing, and yet people don't follow the Hamming questions: what's the most important problem in your field, and why aren't you working on it?
I mean, that one actually might not work, because if the field boundaries are defined too narrowly, you'll leave the problem out. Sure. Yeah, there are important problems for the world as a whole that it's sadly not the job of any large, professionalized academic field or organization to address. Hopefully that's something that can change in the future.
But for my career, it's been a matter of taking low-hanging fruit: important questions where sadly people haven't invested in doing the basic analysis. Something I've been trying to think about more recently for the podcast is that I would like to have a better world model after doing an interview. Often I feel like I do, and in other cases, after some interviews, I feel like, oh, that was entertaining.
But do I fundamentally have a better prediction of what the world looks like in 2100 or 2200, or at least a sense of which counterfactuals are ruled out? I'm curious whether you have advice on, first, identifying the kinds of thinkers and topics that will contribute to a more concrete understanding of the world, and second, how to go about analyzing their main ideas in a way that concretely adds to that picture. Like, this was a great episode, right?
This is literally the top in terms of contributing to my world model, out of all the episodes I've done. How do I find more of these? Glad to hear that. One general heuristic is to find ways to hew closer to rich bodies of established knowledge and rely less on punditry. I don't know how you've been navigating that so far.
Learning from textbooks, and from the leading papers and people of past eras, rather than being too attentive to current news cycles, is quite valuable. I don't usually have the experience of: here is someone doing things very systematically over a huge area, I can just read all of their stuff, absorb it, and then I'm set. But there are a lot of people who do wonderful work in their own fields.
And some of those fields are broader than others. I think I would wind up giving a lot of recommendations of great particular works and particular explorations of an issue or a history. Do you have that list somewhere? Vaclav Smil's books: I often disagree with some of his methods of synthesis, but I enjoy his books for giving pictures of a lot of interesting, relevant facts about how the world works.
I would also cite some of Joel Mokyr's work on the history of the scientific revolution and how it interacted with economic growth: an example of collecting a lot of evidence, with a lot of interesting, valuable assessment. In the space of AI forecasting, one person I would recommend going back to is Hans Moravec. His work was not always the most precise or reliable, but an incredible number of brilliant, innovative ideas came out of it.
And I think he was someone who really grokked a lot of the arguments for a more compute-centric way of thinking about what was happening with AI very early on. He was writing this stuff in the 70s, maybe even earlier, but at least in the 70s, 80s, and 90s. His book Mind Children and his early academic papers are fascinating, not necessarily for the methodology I've been talking about, but for exploring the substantive topics that we were discussing in this episode.
Is a Malthusian state inevitable in the long run? Nature in general is in Malthusian states. That can mean organisms typically struggling for food. It can mean struggling at the margin where, as population density rises, they kill each other more often contesting for territory. It can mean frequency-dependent disease: as different ant species become more common in an area, their species-specific diseases sweep through them.
The general process is: you have some things that can replicate and expand, and they do that until they can't do it anymore, which means there's some limiting factor that can't keep up. That doesn't necessarily have to apply to human civilization. It's possible for there to be collective norm-setting that blocks evolution towards maximum reproduction. Right now, human fertility is often sub-replacement.
And if you extrapolated the fertility declines that come with economic development and education, you would think: okay, the total fertility rate will fall below replacement, and then humanity, after some number of generations, will go extinct, because every generation will be smaller than the previous one. Now, pretty obviously that's not going to happen. One reason is that we'll produce artificial intelligence, which can replicate at extremely rapid rates.
And they'll do it because they're asked to, or programmed to, or wish to gain some benefit, and they can pay for their creation and pay back the resources needed to create them very quickly. So financing for that reproduction is easy.
And if you have even one AI system that chooses to replicate in that way, or some organization or institution or society that chooses to create some AIs that are willing to be replicated, then that can expand to make use of any amount of natural resources that can support them, to do more work and produce more economic value.
And so, what will limit population growth, given these selective pressures, where if even one individual wants to replicate a lot, they can do so incessantly? It could be individually resource-limited. It could be that individuals and organizations have some endowment of natural resources, and they can't get one another's endowments. Some choose to have many offspring or produce many AIs.
And then the natural resources that they possess are subdivided among a greater population, while another jurisdiction or another individual may choose not to subdivide their wealth. In that case, you have Malthusianism in the sense that within some particular jurisdiction or set of property rights, the population has increased up to some limiting factor, which could be that they are literally using all their resources.
They have nothing left for things like defense or economic investment. Or it could be something softer: investing more natural resources into population would come at the expense of something else necessary, including military resources, if you're in a competitive situation where there remains war and anarchy and there aren't secure property rights to maintain wealth in place. Alternatively, you could have a situation where there's pooling of resources.
For example, take the tale of a universal basic income that's funded by taxation of natural resources and distributed evenly to every mind above a certain scale of complexity, per unit of time, so that each second a mind exists it gets such-and-such an allocation.
In that case, those who replicate as much as they can afford with this income will do so and increase their population almost immediately, until the funds for the universal basic income from the natural resource taxation, divided among the set of recipients, are just barely enough to pay for the existence of each mind.
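A toy sketch of that dynamic, with made-up numbers for the tax pool and the running cost of a mind:

```python
# A fixed natural-resource tax pool is split evenly among all minds each period;
# any mind whose income exceeds the cost of running one more mind spends its
# surplus on copies. Per-mind income is driven down to roughly subsistence.

def simulate_ubi(pool=1_000_000.0, cost_per_mind=1.0, population=10, periods=30):
    for _ in range(periods):
        income = pool / population            # even split of the fixed pool
        surplus_per_mind = income - cost_per_mind
        if surplus_per_mind <= 0:
            break                             # at or below subsistence: no more copies
        population += int(population * (surplus_per_mind // cost_per_mind))
    return population, pool / population

pop, income = simulate_ubi()
print(pop, income)  # population ends near pool/cost; income ends near the cost of one mind
```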
So there's a Malthusian element, in that incomes have been driven down to near the AI subsistence level, or the subsistence level of whatever qualifies for the subsidy. Given that this all happens almost immediately, people who might otherwise have enjoyed the basic income may object and say, no, this is no good. And one response is something like the subdivision rule from before: there's a distribution of wealth, with restrictions attached.
When one creates a child, there's a requirement that one give them a certain minimum quantity of resources, and if one doesn't have the resources to give them that minimum standard of living or wealth, then one can't do it, because of child slash AI welfare laws. Or you could have a system that is more accepting of diversity in preferences.
So you have some societies or jurisdictions or families that go the route of having many people with less natural resources per person, and others that go in the direction of having fewer people and more natural resources per person, and they just coexist.
How much of each you get depends on how attached people are to things that don't work with separate policies for separate jurisdictions, things like global redistribution that's ongoing continuously, versus the infringement on autonomy of saying that a mind can't be created even though it would have a standard of living far better than ours, given the advanced technology of the time.
Because it would reduce the average per capita income, since there would be more capita around. That would pull in the other direction. That's the kind of values judgment and social coordination problem that people would have to negotiate over, and that things like democracy, international relations, and sovereignty would be applied to help solve. What would warfare in space look like? Would offense or defense have the advantage?
Would the equilibrium set by mutually assured destruction still be applicable? Just generally, what is the picture? Well, the extreme difference is that, especially outside the solar system, things are very far apart. There's a speed-of-light limit, and to get close to the speed of light you have to use an enormous amount of energy.
That would tend, in some ways, to favor the defender, because if you have something coming in at a large fraction of the speed of light and it hits a grain of dust, it explodes. And the amount of matter you can send to another galaxy or a distant star for a given amount of reaction mass and energy input is limited, so it's hard to send as much military material to another location as can already be present there locally.
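A rough back-of-the-envelope check on the dust-grain point, using the standard relativistic kinetic energy formula and example numbers of my own choosing:

```python
# Kinetic energy of a tiny dust grain relative to a craft moving at a large
# fraction of light speed, expressed in kilograms of TNT equivalent.
import math

C = 299_792_458.0          # speed of light, m/s
TNT_J_PER_KG = 4.184e6     # energy released by 1 kg of TNT, joules

def impact_energy_kg_tnt(grain_mass_kg, velocity_fraction_of_c):
    gamma = 1.0 / math.sqrt(1.0 - velocity_fraction_of_c ** 2)
    kinetic_energy = (gamma - 1.0) * grain_mass_kg * C ** 2   # relativistic KE
    return kinetic_energy / TNT_J_PER_KG

# A one-microgram dust grain struck at 0.9c carries tens of kg of TNT equivalent.
print(impact_energy_kg_tnt(1e-9, 0.9))   # ~28 kg TNT
```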
That would seem to make it harder for the attacker between stars or between galaxies. But there are a lot of other considerations. One is the extent to which the matter in a region can be harnessed all at once. You have a lot of mass and energy in a star, but it's only being doled out over billions of years, because hydrogen-hydrogen fusion is exceedingly hard to do outside of a star; it's a very, very slow and difficult reaction.
And if you can't turn the star into energy faster, then it's this huge resource that will be worthwhile for billions of years. So even very inefficiently attacking a solar system to acquire the stuff that's there could pay off.
If it takes a thousand years of a star's output to launch an attack on another star, and then you hold it for a billion years after that, then it can be the case that a larger surrounding attacker is able, even very inefficiently, to send attacks against a civilization that is smaller but accessible, and still come out ahead.
On the other hand, if you can quickly burn the resources that the attacker might want to acquire, if you can put stars into black holes and extract most of the usable energy before the attacker can take them over, then it's like scorched earth: most of what they were trying to capture could be expended on military material to fight them, so they don't actually get much that was worthwhile, and they paid a lot to do it. That's a very good defense.
At this level, it's pretty challenging to net out all of the factors, including all the future technologies. The burden of interstellar attack being quite high compared to conventional attack seems real. But at the level of, over millions of years, what does that net out to: does it result in aggressive conquest or not, or is every star or galaxy approximately impregnable, or at least impregnable enough not to be worth attacking?
I'm not going to say I know the answer. Okay, final question. How do you think about info hazards when talking about your work? Obviously, if there's a risk, you want to warn people about it, but you don't want to give careless or potentially homicidal people ideas. Now, Eliezer was on the podcast.
In talking about the people who have been developing AI, inspired by his ideas, he said something like: these are idiot disaster monkeys who want to be the ones to pluck the deadly fruit. Anyway, the work you're doing obviously involves many info hazards, I'm sure. How do you think about when and where to spread them? Yeah, so I think there are real concerns of that type.
I think it's true that AI progress has probably been accelerated by efforts like Bostrom's publication of Superintelligence to try and get the world to pay attention to these problems in advance and prepare. But I disagree with Eliezer that this has been on the whole bad. I think the situation is in some important ways looking a lot better than alternative ways it could have gone.
I think it's important that you have several of the leading AI labs paying not just significant lip service, but also making some investments in things like technical alignment research and providing significant public support for the idea that the risks of truly apocalyptic disasters are real. The leaders of OpenAI, DeepMind, and Anthropic all make that point.
They were recently all invited, along with other tech CEOs, to the White House to discuss AI regulation. And you could tell an alternative story where a larger share of the leading companies in AI are led by people who take a completely dismissive, denialist view.
And you do see some companies that have a stance more like that today. So in a world where several of the leading companies are making meaningful efforts, there's still a lot to criticize: could they be doing more and better, and what have been the negative effects of some of the things they've done? But compared to a world without that, even one where AI would be reaching where it's going a few years later, those seem like significant benefits.
And if you didn't have this kind of public communication, you would have had fewer people going into things like AI policy and alignment research by this point, and it would be harder to mobilize these resources to address the problem when AI was eventually developed, not that much later proportionately. So I don't think that attempting to build public discussion and understanding has been a disaster.
I have been reluctant in the past to discuss some aspects of the intelligence explosion, things like the concrete details of AI takeover, because of concern about this sort of problem, where people see only the international relations aspects and the zero-sum and negative-sum competition, and pay too little attention to the mutual destruction and the senseless deadweight loss from that kind of conflict.
At this point, we seem close, compared to what I would have thought a decade or so ago, to these kinds of really advanced AI capabilities. They are pretty central in policy discussions and becoming more so. So for the option of delaying understanding and whatnot, there's a question of: for what? There were gains from building the AI alignment field and building various kinds of support and understanding for action.
Those had real value, and some additional delay could have given more time for that. But from where we are, at some point I think it's absolutely essential that governments get together, at least to restrict disastrous, reckless compromising of safety and alignment as we go into the intelligence explosion.
Moving the locus of the collective action problem from numerous profit-oriented companies acting against one another's interests by compromising safety, to governments and large international coalitions of governments who can set common rules and common safety standards, puts this into a much better situation. And that requires a broader understanding of the strategic situation and the position they'll be in.
If we try to remain quiet about the problem they're actually going to be facing, I think it can result in a lot of confusion. For example, the potential military applications of advanced AI are going to be one of the factors pulling political leaders toward doing the thing that will result in their own destruction and the overthrow of their governments.
If we characterize it as: oh, it will just be a matter of giving up chatbots and some minor things no one cares about, and in exchange you avoid any risk of a world-ending catastrophe,
then I think that picture leads to a misunderstanding. It will make people think you need less in the way of preparation: things like alignment, so you can actually navigate the transition; verifiability for international agreements; things that give enough breathing room to exercise caution and slow down, not necessarily right now.
I mean, although that could be valuable now, it matters most when you have AI that is approaching the ability to really automate AI research and things would otherwise be proceeding absurdly fast, far faster than we can handle and far faster than we should want. So at this point, I'm moving towards: share the model of the world, and try to get people to understand and do the right thing.
And there's some evidence of progress on that front. Things like the statements and moves by Geoff Hinton inspiring some of the engagement by political figures are reason for optimism relative to worse alternatives. And yes, the contrary view is present, the view that it's all about geopolitical competition, never hold back a technological advance.
In general, I love many technological advances that people are, I think, unreasonably down on: nuclear power, genetically modified crops, and so on. Bioweapons and AGI capable of destroying human civilization are really my two exceptions.
And yeah, we've got to deal with these issues, and the paths I see to handling them successfully involve key policymakers, and to some extent the expert communities and the public and electorate, grokking the situation thoroughly and responding appropriately. Well, it's an honor that one of the places you've decided to explore this model is the Lunar Society podcast. The listeners might not appreciate it, because this episode might be split up into different parts.
And I think we've been going for what, eight, nine hours straight. So it's been incredibly interesting. Other than Google Scholar, typing in Carl Shulman, where else can people find your work? You have your blog. Yeah, I have a blog, Reflective Disequilibrium, and a new site in the works. And I have an older one, which you can also find by just googling Reflective Disequilibrium. Okay, excellent. All right, Carl, this has been a pleasure.
It's safe to say the most interesting episode I've done so far. So yeah, thanks. Thank you for having me. Hey everybody, I hope you enjoyed that episode. As always, the most helpful thing you can do is share the podcast: send it to people you think might enjoy it, put it on Twitter, in your group chats, et cetera. Just spread the word. Appreciate you listening. I'll see you next time. Cheers.