https://www.lesswrong.com/posts/9kNxhKWvixtKW5anS/you-are-not-measuring-what-you-think-you-are-measuring Eight years ago, I worked as a data scientist at a startup, and we wanted to optimize our sign-up flow. We A/B tested lots of different changes, and occasionally found something which would boost (or reduce) click-through rates by 10% or so. Then one week I was puzzling over a discrepancy in the variance of our daily signups. Eventually I scraped some data from the log files, and found that d...
Sep 21, 2022•18 min
https://www.lesswrong.com/posts/WNpvK67MjREgvB8u8/do-bamboos-set-themselves-on-fire Cross-posted from Telescopic Turnip. As we all know, the best place to have a kung-fu fight is a bamboo forest. There are just so many opportunities to grab pieces of bamboo and manufacture improvised weapons, use them to catapult yourself into the air, and use other basic techniques any debutant martial artist ought to know. A lesser-known fact is that bamboo-forest fights occur even when the cameras of Hong Kong fil...
Sep 20, 2022•18 min
https://www.lesswrong.com/posts/oyKzz7bvcZMEPaDs6/survey-advice Things I believe about making surveys, after making some surveys: If you write a question that seems clear, there’s an unbelievably high chance that any given reader will misunderstand it. (Possibly this applies to things that aren’t survey questions also, but that’s a problem for another time.) A better way to find out if your questions are clear is to repeatedly take a single individual person, and sit down with them, and ask the...
Sep 18, 2022•8 min
https://www.lesswrong.com/posts/J3wemDGtsy5gzD3xa/toni-kurz-and-the-insanity-of-climbing-mountains Content warning: death I've been on a YouTube binge lately. My current favorite genre is disaster stories about mountain climbing. The death statistics for some of these mountains, especially ones in the Himalayas, are truly insane. To give an example, let me tell you about a mountain most people have never heard of: Nanga Parbat. It's an 8,126-meter "wall of ice and rock", sporting the tallest mount...
Sep 18, 2022•25 min
https://www.lesswrong.com/posts/gs3vp3ukPbpaEie5L/deliberate-grieving-1 This post is hopefully useful on its own, but begins a series ultimately about grieving over a world that might (or, might not) be doomed. It starts with some pieces from a previous coordination frontier sequence post, but goes into more detail. At the beginning of the pandemic, I didn’t have much experience with grief. By the end of the pandemic, I had gotten quite a lot of practice grieving for things. I now think of gri...
Sep 18, 2022•18 min
https://www.lesswrong.com/s/6xgy8XYEisLk3tCjH/p/CPP2uLcaywEokFKQG Tl;dr: I've noticed a dichotomy between "thinking in toolboxes" and "thinking in laws". The toolbox style of thinking says it's important to have a big bag of tools that you can adapt to context and circumstance; people who think very toolboxly tend to suspect that anyone who goes talking of a single optimal way is just ignorant of the uses of the other tools. The lawful style of thinking, done correctly, distinguishes between des...
Sep 15, 2022•25 min
Sep 15, 2022•27 min
https://www.lesswrong.com/posts/PBRWb2Em5SNeWYwwB/humans-are-not-automatically-strategic Reply to: A "Failure to Evaluate Return-on-Time" Fallacy Lionhearted writes: [A] large majority of otherwise smart people spend time doing semi-productive things, when there are massively productive opportunities untapped. A somewhat silly example: Let's say someone aspires to be a comedian, the best comedian ever, and to make a living doing comedy. He wants nothing else, it is his purpose. And he decides tha...
Sep 15, 2022•9 min
https://www.lesswrong.com/posts/htrZrxduciZ5QaCjw/language-models-seem-to-be-much-better-than-humans-at-next Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. [Thanks to a variety of people for comments and assistance (especially Paul Christiano, Nostalgebraist, and Rafe Kennedy), and to various people for playing the game. Buck wrote the top-1 prediction web app; Fabien wrote the code for the perplexity experiment and did most of the analysis and wrote up t...
Sep 15, 2022•27 min
https://www.lesswrong.com/posts/jDQm7YJxLnMnSNHFu/moral-strategies-at-different-capability-levels Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. Let’s consider three ways you can be altruistic towards another agent: You care about their welfare: some metric of how good their life is (as defined by you). I’ll call this care-morality - it endorses things like promoting their happiness, reducing their suffering, and hedonic utilitarian behavior (if you care ...
Sep 14, 2022•13 min
https://www.lesswrong.com/posts/xFotXGEotcKouifky/worlds-where-iterative-design-fails Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. In most technical fields, we try designs, see what goes wrong, and iterate until it works. That’s the core iterative design loop. Humans are good at iterative design, and it works well in most fields in practice. In worlds where AI alignment can be handled by iterative design, we probably survive. So long as we can see the p...
Sep 11, 2022•24 min
https://www.lesswrong.com/posts/QBAjndPuFbhEXKcCr/my-understanding-of-what-everyone-in-technical-alignment-is Despite a clear need for it, a good source explaining who is doing what and why in technical AI alignment doesn't exist. This is our attempt to produce such a resource. We expect to be inaccurate in some ways, but it seems great to get out there and let Cunningham’s Law do its thing. [1] The main body contains our understanding of what everyone is doing in technical alignment and why, as...
Sep 11, 2022•1 hr 35 min
https://www.lesswrong.com/posts/rYDas2DDGGDRc8gGB/unifying-bargaining-notions-1-2 Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. This is a two-part sequence of posts, in the ancient LessWrong tradition of decision-theory-posting. This first part will introduce various concepts of bargaining solutions and dividing gains from trade, which the reader may or may not already be familiar with. The upcoming part will be about how all introduced concepts from thi...
Sep 09, 2022•46 min
https://www.lesswrong.com/posts/vJFdjigzmcXMhNTsx/simulators#fncrt8wagfir9 Summary TL;DR: Self-supervised learning may create AGI or its foundation. What would that look like? Unlike the limit of RL, the limit of self-supervised learning has received surprisingly little conceptual attention, and recent progress has made deconfusion in this domain more pressing. Existing AI taxonomies either fail to capture important properties of self-supervised models or lead to confusing propositions. For ins...
Sep 05, 2022•1 hr 48 min
https://www.lesswrong.com/posts/CjFZeDD6iCnNubDoS/humans-provide-an-untapped-wealth-of-evidence-about#fnref7a5ti4623qb Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. TL;DR: To even consciously consider an alignment research direction, you should have evidence to locate it as a promising lead. As best I can tell, many directions seem interesting but do not have strong evidence of being “entangled” with the alignment problem such that I expect them to yield...
Aug 08, 2022•23 min
https://www.lesswrong.com/posts/DdDt5NXkfuxAnAvGJ/changing-the-world-through-slack-and-hobbies Introduction In EA orthodoxy, if you're really serious about EA, the three alternatives that people most often seem to talk about are (1) “direct work” in a job that furthers a very important cause; (2) “earning to give”; (3) earning “career capital” that will help you do those things in the future, e.g. by getting a PhD or teaching yourself ML. By contrast, there’s not much talk of: (4) being in a jo...
Jul 30, 2022•22 min
https://www.lesswrong.com/posts/8oMF8Lv5jiGaQSFvo/boundaries-part-1-a-key-missing-concept-from-utility-theory Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. This is Part 1 of my «Boundaries» Sequence on LessWrong. Summary: «Boundaries» are a missing concept from the axioms of game theory and bargaining theory, which might help pin down certain features of multi-agent rationality (this post), and have broader implications for effective altruism discourse a...
Jul 28, 2022•19 min
https://www.lesswrong.com/posts/MdZyLnLHuaHrCskjy/itt-passing-and-civility-are-good-charity-is-bad I often object to claims like "charity/steelmanning is an argumentative virtue". This post collects a few things I and others have said on this topic over the last few years. My current view is: Steelmanning ("the art of addressing the best form of the other person’s argument, even if it’s not the one they presented") is a useful niche skill, but I don't think it should be a standard thing you brin...
Jul 24, 2022•13 min
https://www.lesswrong.com/posts/mmHctwkKjpvaQdC3c/what-should-you-change-in-response-to-an-emergency-and-ai Related to: Slack gives you the ability to notice/reflect on subtle things Epistemic status: A possibly annoying mixture of straightforward reasoning and hard-to-justify personal opinions. It is often stated (with some justification, IMO) that AI risk is an “emergency.” Various people have explained to me that they put various parts of their normal life’s functioning on hold on account of ...
Jul 23, 2022•13 min
https://www.lesswrong.com/posts/3pinFH3jerMzAvmza/on-how-various-plans-miss-the-hard-bits-of-the-alignment Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. (As usual, this post was written by Nate Soares with some help and editing from Rob Bensinger.) In my last post, I described a “hard bit” of the challenge of aligning AGI—the sharp left turn that comes when your system slides into the “AGI” capabilities well, the fact that alignment doesn’t generalize s...
Jul 17, 2022•55 min
https://www.lesswrong.com/posts/28zsuPaJpKAGSX4zq/humans-are-very-reliable-agents Over the last few years, deep-learning-based AI has progressed extremely rapidly in fields like natural language processing and image generation. However, self-driving cars seem stuck in perpetual beta mode, and aggressive predictions there have repeatedly been disappointing. Google's self-driving project started four years before AlexNet kicked off the deep learning revolution, and it still isn't deployed at larg...
Jul 13, 2022•8 min
https://www.lesswrong.com/posts/2GxhAyn9aHqukap2S/looking-back-on-my-alignment-phd The funny thing about long periods of time is that they do, eventually, come to an end. I'm proud of what I accomplished during my PhD. That said, I'm going to first focus on mistakes I've made over the past four [1] years. Mistakes I think I got significantly smarter in 2018–2019, and kept learning some in 2020–2021. I was significantly less of a fool in 2021 than I was in 2017. That is important and worth feeli...
Jul 08, 2022•22 min
https://www.lesswrong.com/posts/7iAABhWpcGeP5e6SB/it-s-probably-not-lithium A Chemical Hunger, a series by the authors of the blog Slime Mold Time Mold (SMTM) that has been received positively on LessWrong, argues that the obesity epidemic is entirely caused by environmental contaminants. The authors’ top suspect is lithium [1], primarily because it is known to cause weight gain at the doses used to treat bipolar disorder. After doing some research, however, I found that it i...
Jul 05, 2022•1 hr 12 min
https://www.lesswrong.com/posts/bhLxWTkRc8GXunFcB/what-are-you-tracking-in-your-head A large chunk - plausibly the majority - of real-world expertise seems to be in the form of illegible skills: skills/knowledge which are hard to transmit by direct explanation. They’re not necessarily things which a teacher would even notice enough to consider important - just background skills or knowledge which is so ingrained that it becomes invisible. I’ve recently noticed a certain common type of illegible...
Jul 02, 2022•10 min
https://www.lesswrong.com/posts/Ke2ogqSEhL2KCJCNx/security-mindset-lessons-from-20-years-of-software-security Background I have been doing red team, blue team (offensive, defensive) computer security for a living since September 2000. The goal of this post is to compile a list of general principles I've learned during this time that are likely relevant to the field of AGI Alignment. If this is useful, I could continue with a broader or deeper exploration. Alignment Won't Happen By Accident I use...
Jun 29, 2022•14 min
https://www.lesswrong.com/posts/CoZhXrhpQxpy9xw9y/where-i-agree-and-disagree-with-eliezer#fnh5ezxhd0an by paulfchristiano, 20th Jun 2022. Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. (Partially in response to AGI Ruin: A List of Lethalities. Written in the same rambling style. Not exhaustive.) Agreements Powerful AI systems have a good chance of deliberately and irreversibly disempowering humanity. This is a much easier failure mode than killing eve...
Jun 22, 2022•43 min
https://www.lesswrong.com/posts/keiYkaeoLHoKK4LYA/six-dimensions-of-operational-adequacy-in-agi-projects by Eliezer Yudkowsky Editor's note: The following is a lightly edited copy of a document written by Eliezer Yudkowsky in November 2017. Since this is a snapshot of Eliezer’s thinking at a specific time, we’ve sprinkled reminders throughout that this is from 2017. A background note: It’s often the case that people are slow to abandon obsolete playbooks in response to a novel challenge. And AGI...
Jun 21, 2022•32 min
https://www.lesswrong.com/posts/pL4WhsoPJwauRYkeK/moses-and-the-class-struggle "𝕿𝖆𝖐𝖊 𝖔𝖋𝖋 𝖞𝖔𝖚𝖗 𝖘𝖆𝖓𝖉𝖆𝖑𝖘. 𝕱𝖔𝖗 𝖞𝖔𝖚 𝖘𝖙𝖆𝖓𝖉 𝖔𝖓 𝖍𝖔𝖑𝖞 𝖌𝖗𝖔𝖚𝖓𝖉," said the bush. "No," said Moses. "Why not?" said the bush. "I am a Jew. If there's one thing I know about this universe it's that there's no such thing as God," said Moses. "You don't need to be certain I exist. It's a trivial case of Pascal's Wager," said the bush. "Who is Pascal?" said Moses. "It makes sense if you are be...
Jun 21, 2022•10 min
https://www.lesswrong.com/posts/T6kzsMDJyKwxLGe3r/benign-boundary-violations Recently, my friend Eric asked me what sorts of things I wanted to have happen at my bachelor party. I said (among other things) that I'd really enjoy some benign boundary violations. Eric went ???? Subsequently: an essay. We use the word "boundary" to mean at least two things, when we're discussing people's personal boundaries. The first is their actual self-defined boundary—the line that they would draw, if they had ...
Jun 20, 2022•34 min
https://www.lesswrong.com/posts/uMQ3cqWDPHhjtiesc/agi-ruin-a-list-of-lethalities Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. Preamble: (If you're already familiar with all basics and don't want any preamble, skip ahead to Section B for technical difficulties of alignment proper.) I have several times failed to write up a well-organized list of reasons why AGI will kill you. People come in with different ideas about why AGI would be survivable, and want...
Jun 20, 2022•1 hr 2 min