I believe that sharing information about the capabilities and limits of existing ML systems, and especially language model agents, significantly reduces risks from powerful AI—despite the fact that such information may increase the amount or quality of investment in ML generally (or in LM agents in particular). Concretely, I mean to include information like: tasks and evaluation frameworks for LM agents, the results of evaluations of particular agents, discussions of the qualitative strengths an...
Aug 02, 2023•20 min
In the early 2010s, a popular idea was to provide coworking spaces and shared living to people who were building startups. That way the founders would have a thriving social scene of peers to percolate ideas with as they figured out how to build and scale a venture. This was attempted thousands of times by different startup incubators. There are no famous success stories. In 2015, Sam Altman, who was at the time the president of Y Combinator, a startup accelerator that has helped scale startups ...
Jul 31, 2023•25 min
This month I lost a bunch of bets. Back in early 2016 I bet at even odds that self-driving ride sharing would be available in 10 US cities by July 2023. Then I made similar bets a dozen times because everyone disagreed with me. Source: https://www.lesswrong.com/posts/ZRrYsZ626KSEgHv8s/self-driving-car-bets Narrated for LessWrong by TYPE III AUDIO.
Jul 31, 2023•8 min
Some early biologist, equipped with knowledge of evolution but not much else, might see all these crabs and expect a common ancestral lineage. That’s the obvious explanation of the similarity, after all: if the crabs descended from a common ancestor, then of course we’d expect them to be pretty similar. … but then our hypothetical biologist might start to notice surprisingly deep differences between all these crabs. The smoking gun, of course, would come with genetic sequencing: if the crabs’ ph...
Jul 31, 2023•12 min
The Lightspeed application asks: “What impact will [your project] have on the world? What is your project’s goal, how will you know if you’ve achieved it, and what is the path to impact?” LTFF uses an identical question, and SFF puts it even more strongly (“What is your organization’s plan for improving humanity’s long term prospects for survival and flourishing?”). I’ve applied to all three of these grants at various points, and I’ve never liked this question. It feels like it wants a grand nar...
Jul 28, 2023•11 min
Previously Jacob Cannell wrote the post "Brain Efficiency" which makes several radical claims: that the brain is at the Pareto frontier of speed, energy efficiency and memory bandwidth, and that this represents a fundamental physical frontier. Here's an AI-generated summary: The article “Brain Efficiency: Much More than You Wanted to Know” on LessWrong discusses the efficiency of physical learning machines. The article explains that there are several interconnected key measures of efficiency for physic...
Jul 28, 2023•13 min
I think "Rationality is winning" is a bit of a trap. (The original phrase is notably "rationality is systematized winning", which is better, but it tends to slide into the abbreviated form, and both forms aren't that great IMO) It was coined to counteract one set of failure modes - there were people who were straw vulcans, who thought rituals-of-logic were important without noticing when they were getting in the way of their real goals. And, also, there were outside critics who'd complain about st...
Jul 28, 2023•15 min
This post is not about arguments in favor of or against cryonics. I would just like to share a particular emotional response of mine as the topic became hot for me after not thinking about it at all for nearly a decade. Recently, I have signed up for cryonics, as has my wife, and we have made arrangements for our son to be cryopreserved just in case longevity research does not deliver in time or some unfortunate thing happens. Last year, my father died. He was a wonderful man, good-natured, inte...
Jul 28, 2023•3 min
Alright, time for the payoff, unifying everything discussed in the previous post. This post is a lot more mathematically dense, so you might want to digest it in more than one sitting. Imaginary Prices, Tradeoffs, and Utilitarianism Harsanyi's Utilitarianism Theorem can be summarized as "if a bunch of agents have their own personal utility functions Ui, and you want to aggregate them into a collective utility function U with the property that everyone agreeing that option x is better than option y...
Jun 12, 2023•41 min
Inspired by Aesop, Soren Kierkegaard, Robin Hanson, sadoeuphemist and Ben Hoffman. One winter a grasshopper, starving and frail, approaches a colony of ants drying out their grain in the sun, to ask for food. “Did you not store up food during the summer?” the ants ask. “No”, says the grasshopper. “I lost track of time, because I was singing and dancing all summer long.” The ants, disgusted, turn away and go back to work. https://www.lesswrong.com/posts/GJgudfEvNx8oeyffH/the-ants-and-the-gras...
Jun 06, 2023•10 min
Summary: We demonstrate a new scalable way of interacting with language models: adding certain activation vectors into forward passes. Essentially, we add together combinations of forward passes in order to get GPT-2 to output the kinds of text we want. We provide a lot of entertaining and successful examples of these "activation additions." We also show a few activation additions which unexpectedly fail to have the desired effect. We quantitatively evaluate how activation additions affect GPT-2...
May 18, 2023•1 hr 43 min
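The "activation addition" idea in the entry above (adding a vector into a forward pass to change the output) can be illustrated with a toy sketch. Everything here is hypothetical: the two-layer network, its weights, and the names `forward`, `hidden`, and `steering_vector` are invented for illustration and are not GPT-2 or the authors' code. Building the vector as the difference between two inputs' hidden activations is an assumption of this sketch, not something the summary spells out.

```python
import math

# Hypothetical toy weights; a real language model would have learned parameters.
W1 = [[0.5, -0.2], [0.1, 0.8]]
W2 = [[1.0, -1.0], [0.3, 0.7]]


def layer(weights, x):
    """Apply one dense layer with a tanh nonlinearity."""
    return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in weights]


def hidden(x):
    """Return the hidden activations for input x."""
    return layer(W1, x)


def forward(x, steering=None, coeff=1.0):
    """Run the toy network, optionally adding coeff * steering to the hidden layer."""
    h = hidden(x)
    if steering is not None:
        h = [a + coeff * s for a, s in zip(h, steering)]
    return layer(W2, h)


def steering_vector(x_pos, x_neg):
    """Assumed construction: difference of hidden activations for two inputs."""
    return [p - n for p, n in zip(hidden(x_pos), hidden(x_neg))]


steer = steering_vector([1.0, 0.0], [0.0, 1.0])
x = [0.2, 0.2]
baseline = forward(x)
steered = forward(x, steering=steer, coeff=2.0)
print(baseline, steered)
```

The coefficient controls how strongly the added vector shifts the output; with `coeff=0.0` the steered pass reduces to the ordinary forward pass.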
Philosopher David Chalmers asked: "Is there a canonical source for "the argument for AGI ruin" somewhere, preferably laid out as an explicit argument with premises and a conclusion?" Unsurprisingly, the actual reason people expect AGI ruin isn't a crisp deductive argument; it's a probabilistic update based on many lines of evidence. The specific observations and heuristics that carried the most weight for someone will vary for each individual, and can be hard to accurately draw out. That said, E...
May 16, 2023•1 hr 2 min
You are the director of a giant government research program that’s conducting randomized controlled trials (RCTs) on two thousand health interventions, so that you can pick out the most cost-effective ones and promote them among the general population. The quality of the two thousand interventions follows a normal distribution, centered at zero (no harm or benefit) and with standard deviation 1. (Pick whatever units you like — maybe one quality-adjusted life-year per ten thousand dollars of spen...
May 10, 2023•33 min
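The setup in the entry above (two thousand interventions whose true quality is normally distributed with mean 0 and standard deviation 1) is easy to simulate. The description is cut off before it says how trial results relate to true quality, so the measurement noise below (`noise_sd`), the number of interventions selected (`top_k`), and the function name `simulate` are assumptions of this sketch rather than details from the post.

```python
import random
import statistics


def simulate(n=2000, noise_sd=1.0, top_k=10, seed=0):
    """Pick the top_k interventions by measured result; compare with true quality.

    noise_sd and top_k are assumed values, not taken from the post.
    """
    rng = random.Random(seed)
    # True quality: normal, mean 0, standard deviation 1 (as stated in the description).
    true_quality = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Assumed measurement model: trial result = true quality + independent noise.
    measured = [q + rng.gauss(0.0, noise_sd) for q in true_quality]
    # Select the interventions with the best measured results.
    best = sorted(range(n), key=lambda i: measured[i], reverse=True)[:top_k]
    mean_measured = statistics.mean(measured[i] for i in best)
    mean_true = statistics.mean(true_quality[i] for i in best)
    return mean_measured, mean_true


mean_measured, mean_true = simulate()
print(f"measured mean of selected: {mean_measured:.2f}")
print(f"true mean of selected:     {mean_true:.2f}")
```

Under these assumptions, the interventions that look best in the trials tend to have lower true quality than their measurements suggest, because selecting on a noisy measurement also selects for favourable noise.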
This is a post about mental health and disposition in relation to the alignment problem. It compiles a number of resources that address how to maintain wellbeing and direction when confronted with existential risk. Many people in this community have posted their emotional strategies for facing Doom after Eliezer Yudkowsky’s “Death With Dignity” generated so much conversation on the subject. This post intends to be more touchy-feely, dealing more directly with emotional landscapes than question...
Apr 27, 2023•38 min
The primary talk of the AI world recently is about AI agents (whether or not it includes the question of whether we can’t help but notice we are all going to die.) The trigger for this was AutoGPT, now number one on GitHub, which allows you to turn GPT-4 (or GPT-3.5 for us clowns without proper access) into a prototype version of a self-directed agent. We also have a paper out this week where a simple virtual world was created, populated by LLMs that were wrapped in code designed to make them s...
Apr 19, 2023•37 min
(Related text posted to Twitter; this version is edited and has a more advanced final section.) Imagine yourself in a box, trying to predict the next word - assign as much probability mass to the next token as possible - for all the text on the Internet. Koan: Is this a task whose difficulty caps out as human intelligence, or at the intelligence level of the smartest human who wrote any Internet text? What factors make that task easier, or harder? (If you don't have an answer, maybe take a minu...
Apr 12, 2023•6 min
https://www.lesswrong.com/posts/fJBTRa7m7KnCDdzG5/a-stylized-dialogue-on-john-wentworth-s-claims-about-markets (This is a stylized version of a real conversation, where the first part happened as part of a public debate between John Wentworth and Eliezer Yudkowsky, and the second part happened between John and me over the following morning. The below is combined, stylized, and written in my own voice throughout. The specific concrete examples in John's part of the dialog were produced by me. It...
Apr 05, 2023•17 min
https://www.lesswrong.com/posts/iy2o4nQj9DnQD7Yhj/discussion-with-nate-soares-on-a-key-alignment-difficulty Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. In late 2022, Nate Soares gave some feedback on my Cold Takes series on AI risk (shared as drafts at that point), stating that I hadn't discussed what he sees as one of the key difficulties of AI alignment. I wanted to understand the difficulty he was pointing to, so the two of us had an extended Slack ...
Apr 05, 2023•40 min
https://www.lesswrong.com/posts/XWwvwytieLtEWaFJX/deep-deceptiveness This post is an attempt to gesture at a class of AI notkilleveryoneism (alignment) problem that seems to me to go largely unrecognized. E.g., it isn’t discussed (or at least I don't recognize it) in the recent plans written up by OpenAI (1, 2), by DeepMind’s alignment team, or by Anthropic, and I know of no other acknowledgment of this issue by major labs. You could think of this as a fragment of my answer to “Where do pla...
Apr 05, 2023•30 min
https://www.lesswrong.com/posts/nTGEeRSZrfPiJwkEc/the-onion-test-for-personal-and-institutional-honesty [co-written by Chana Messinger and Andrew Critch, Andrew is the originator of the idea] You (or your organization or your mission or your family or etc.) pass the “onion test” for honesty if each layer hides but does not mislead about the information hidden within. When people get to know you better, or rise higher in your organization, they may find out new things, but should not be shocked b...
Mar 28, 2023•7 min
https://www.lesswrong.com/posts/fRwdkop6tyhi3d22L/there-s-no-such-thing-as-a-tree-phylogenetically This is a linkpost for https://eukaryotewritesblog.com/2021/05/02/theres-no-such-thing-as-a-tree/ [Crossposted from Eukaryote Writes Blog.] So you’ve heard about how fish aren’t a monophyletic group? You’ve heard about carcinization, the process by which ocean arthropods convergently evolve into crabs? You say you get it now? Sit down. Sit down. Shut up. Listen. You don’t know nothing yet. “Trees”...
Mar 28, 2023•19 min
https://www.lesswrong.com/posts/ma7FSEtumkve8czGF/losing-the-root-for-the-tree You know that being healthy is important. And that there's a lot of stuff you could do to improve your health: getting enough sleep, eating well, reducing stress, and exercising, to name a few. There’s various things to hit on when it comes to exercising too. Strength, obviously. But explosiveness is a separate thing that you have to train for. Same with flexibility. And don’t forget cardio! Strength is most important...
Mar 28, 2023•24 min
https://gwern.net/fiction/clippy In A.D. 20XX. Work was beginning. “How are you gentlemen !!”… (Work. Work never changes; work is always hell.) Specifically, a MoogleBook researcher has gotten a pull request from Reviewer #2 on his new paper in evolutionary search in auto-ML, for error bars on the auto-ML hyperparameter sensitivity like larger batch sizes, because more can be different and there’s high variance in the old runs with a few anomalously high gain of function. (“Really? Really? Th...
Mar 28, 2023•1 hr 5 min
https://www.lesswrong.com/posts/K4urTDkBbtNuLivJx/why-i-think-strong-general-ai-is-coming-soon I think there is little time left before someone builds AGI (median ~2030). Once upon a time, I didn't think this. This post attempts to walk through some of the observations and insights that collapsed my estimates. The core ideas are as follows: We've already captured way too much of intelligence with way too little effort. Everything points towards us capturing way more of intelligence with very lit...
Mar 28, 2023•1 hr 26 min
https://www.lesswrong.com/posts/HBxe6wdjxK239zajf/what-failure-looks-like Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. The stereotyped image of AI catastrophe is a powerful, malicious AI system that takes its creators by surprise and quickly achieves a decisive advantage over the rest of humanity. I think this is probably not what failure will look like, and I want to try to paint a more realistic picture. I’ll tell the story in two parts: Part I: mach...
Mar 28, 2023•18 min
https://www.lesswrong.com/posts/gNodQGNoPDjztasbh/lies-damn-lies-and-fabricated-options This is an essay about one of those "once you see it, you will see it everywhere" phenomena. It is a psychological and interpersonal dynamic roughly as common, and almost as destructive, as motte-and-bailey, and at least in my own personal experience it's been quite valuable to have it reified, so that I can quickly recognize the commonality between what I had previously thought of as completely unrelated sit...
Mar 28, 2023•29 min
https://www.lesswrong.com/posts/thkAtqoQwN6DtaiGT/carefully-bootstrapped-alignment-is-organizationally-hard In addition to technical challenges, plans to safely develop AI face lots of organizational challenges. If you're running an AI lab, you need a concrete plan for handling that. In this post, I'll explore some of those issues, using one particular AI plan as an example. I first heard this described by Buck at EA Global London, and more recently with OpenAI's alignment plan. (I think Anthrop...
Mar 21, 2023•20 min
https://www.lesswrong.com/posts/4Gt42jX7RiaNaxCwP/more-information-about-the-dangerous-capability-evaluations Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. This is a linkpost for https://evals.alignment.org/blog/2023-03-18-update-on-recent-evals/ [Written for more of a general-public audience than alignment-forum audience. We're working on a more thorough technical report.] We believe that capable enough AI systems could pose very large risks to the worl...
Mar 21, 2023•14 min
https://www.lesswrong.com/posts/zidQmfFhMgwFzcHhs/enemies-vs-malefactors Status: some mix of common wisdom (that bears repeating in our particular context), and another deeper point that I mostly failed to communicate. Short version Harmful people often lack explicit malicious intent. It’s worth deploying your social or community defenses against them anyway. I recommend focusing less on intent and more on patterns of harm. (Credit to my explicit articulation of this idea goes in large part to A...
Mar 14, 2023•10 min
https://www.lesswrong.com/posts/LzQtrHSYDafXynofq/the-parable-of-the-king-and-the-random-process ~ A Parable of Forecasting Under Model Uncertainty ~ You, the monarch, need to know when the rainy season will begin, in order to properly time the planting of the crops. You have two advisors, Pronto and Eternidad, who you trust exactly equally. You ask them both: "When will the next heavy rain occur?" Pronto says, "Three weeks from today." Eternidad says, "Ten years from today."...
Mar 14, 2023•10 min