LessWrong (Curated & Popular)

LessWrong•sites.libsyn.com

Audio narrations of LessWrong posts. Includes all curated posts and all posts with 125+ karma.

If you'd like more, subscribe to the “Lesswrong (30+ karma)” feed.

Last refreshed: July 2nd, 2025 at 2:42 AM

Episodes

"The Waluigi Effect (mega-post)" by Cleo Nardo

https://www.lesswrong.com/posts/D7PumeYTDPfBTp3i7/the-waluigi-effect-mega-post In this article, I will present a mechanistic explanation of the Waluigi Effect and other bizarre "semiotic" phenomena which arise within large language models such as GPT-3/3.5/4 and their variants (ChatGPT, Sydney, etc). This article will be folklorish to some readers, and profoundly novel to others.

Mar 08, 2023•41 min

"Acausal normalcy" by Andrew Critch

https://www.lesswrong.com/posts/3RSq3bfnzuL3sp46J/acausal-normalcy Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. This post is also available on the EA Forum. Summary: Having thought a bunch about acausal trade, and proven some theorems relevant to its feasibility, I believe there do not exist powerful information hazards about it that stand up to clear and circumspect reasoning about the topic. I say this to be comforting rather than dismissive; if it...

Mar 06, 2023•16 min

"Please don't throw your mind away" by TsviBT

https://www.lesswrong.com/posts/RryyWNmJNnLowbhfC/please-don-t-throw-your-mind-away [Warning: the following dialogue contains an incidental spoiler for "Music in Human Evolution" by Kevin Simler. That post is short, good, and worth reading without spoilers, and this post will still be here if you come back later. It's also possible to get the point of this post by skipping the dialogue and reading the other sections.] Pretty often, talking to someone who's arriving to the existential risk / AGI...

Mar 01, 2023•33 min

"Cyborgism" by Nicholas Kees & Janus

https://www.lesswrong.com/posts/bxt7uCiHam4QXrQAA/cyborgism There is a lot of disagreement and confusion about the feasibility and risks associated with automating alignment research. Some see it as the default path toward building aligned AI, while others expect limited benefit from near term systems, expecting the ability to significantly speed up progress to appear well after misalignment and deception. Furthermore, progress in this area may directly shorten timelines or enable the creation o...

Feb 15, 2023•1 hr 17 min

"Childhoods of exceptional people" by Henrik Karlsson

https://www.lesswrong.com/posts/CYN7swrefEss4e3Qe/childhoods-of-exceptional-people This is a linkpost for https://escapingflatland.substack.com/p/childhoods Let’s start with one of those insights that are as obvious as they are easy to forget: if you want to master something, you should study the highest achievements of your field. If you want to learn writing, read great writers, etc. But this is not what parents usually do when they think about how to educate their kids. The default for a pare...

Feb 14, 2023•28 min

"What I mean by "alignment is in large part about making cognition aimable at all"" by Nate Soares

https://www.lesswrong.com/posts/NJYmovr9ZZAyyTBwM/what-i-mean-by-alignment-is-in-large-part-about-making Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. (Epistemic status: attempting to clear up a misunderstanding about points I have attempted to make in the past. This post is not intended as an argument for those points.) I have long said that the lion's share of the AI alignment problem seems to me to be about pointing powerful cognition at anything at al...

Feb 13, 2023•5 min

"On not getting contaminated by the wrong obesity ideas" by Natália Coelho Mendonça

https://www.lesswrong.com/posts/NRrbJJWnaSorrqvtZ/on-not-getting-contaminated-by-the-wrong-obesity-ideas A Chemical Hunger (a), a series by the authors of the blog Slime Mold Time Mold (SMTM), argues that the obesity epidemic is entirely caused (a) by environmental contaminants. In my last post, I investigated SMTM’s main suspect (lithium).[1] This post collects other observations I have made about SMTM’s work, not narrowly related to lithium, but rather focused on the broader thesis of their bl...

Feb 10, 2023•1 hr 13 min

"SolidGoldMagikarp (plus, prompt generation)"

https://www.lesswrong.com/posts/aPeJE8bSo6rAFoLqg/solidgoldmagikarp-plus-prompt-generation Work done at SERI-MATS, over the past two months, by Jessica Rumbelow and Matthew Watkins. TL;DR Anomalous tokens: a mysterious failure mode for GPT (which reliably insulted Matthew) We have found a set of anomalous tokens which result in a previously undocumented failure mode for GPT-2 and GPT-3 models. (The 'instruct' models “are particularly deranged” in this context, as janus has observed.) Many of th...

Feb 08, 2023•34 min

"Focus on the places where you feel shocked everyone's dropping the ball" by Nate Soares

https://www.lesswrong.com/posts/Zp6wG5eQFLGWwcG6j/focus-on-the-places-where-you-feel-shocked-everyone-s Writing down something I’ve found myself repeating in different conversations: If you're looking for ways to help with the whole “the world looks pretty doomed” business, here's my advice: look around for places where we're all being total idiots. Look for places where everyone's fretting about a problem that some part of you thinks it could obviously just solve. Look around for places where s...

Feb 03, 2023•7 min

"Basics of Rationalist Discourse" by Duncan Sabien

https://www.lesswrong.com/posts/XPv4sYrKnPzeJASuk/basics-of-rationalist-discourse-1 Introduction This post is meant to be a linkable resource. Its core is a short list of guidelines (you can link directly to the list) that are intended to be fairly straightforward and uncontroversial, for the purpose of nurturing and strengthening a culture of clear thinking, clear communication, and collaborative truth-seeking. "Alas," said Dumbledore, "we all know that what should be, and what is, are two di...

Feb 02, 2023•1 hr 7 min

"Sapir-Whorf for Rationalists" by Duncan Sabien

https://www.lesswrong.com/posts/PCrTQDbciG4oLgmQ5/sapir-whorf-for-rationalists Casus Belli: As I was scanning over my (rather long) list of essays-to-write, I realized that roughly a fifth of them were of the form "here's a useful standalone concept I'd like to reify," à la cup-stacking skills, fabricated options, split and commit, and sazen. Some notable entries on that list (which I name here mostly in the hope of someday coming back and turning them into links) include: red vs. white, wal...

Jan 31, 2023•39 min

"My Model Of EA Burnout" by Logan Strohl

https://www.lesswrong.com/posts/pDzdb4smpzT3Lwbym/my-model-of-ea-burnout (Probably somebody else has said most of this. But I personally haven't read it, and felt like writing it down myself, so here we go.) I think that EA [editor note: "Effective Altruism"] burnout usually results from prolonged dedication to satisfying the values you think you should have, while neglecting the values you actually have. Setting aside for the moment what “values” are and what it means to “actually” have one, su...

Jan 31, 2023•9 min

"The Social Recession: By the Numbers" by Anton Stjepan Cebalo

https://www.lesswrong.com/posts/Xo7qmDakxiizG7B9c/the-social-recession-by-the-numbers This is a linkpost for https://novum.substack.com/p/social-recession-by-the-numbers Fewer friends, relationships on the decline, delayed adulthood, trust at an all-time low, and many diseases of despair. The prognosis is not great. One of the most discussed topics online recently has been friendships and loneliness. Ever since the infamous chart showing more people are not having sex than ever before first made...

Jan 25, 2023•23 min

"Recursive Middle Manager Hell" by Raemon

https://www.lesswrong.com/posts/pHfPvb4JMhGDr4B7n/recursive-middle-manager-hell I think Zvi's Immoral Mazes sequence is really important, but comes with more worldview-assumptions than are necessary to make the points actionable. I conceptualize Zvi as arguing for multiple hypotheses. In this post I want to articulate one sub-hypothesis, which I call "Recursive Middle Manager Hell". I'm deliberately not covering some other components of his model [1]. tl;dr: Something weird and kinda horrifying...

Jan 24, 2023•21 min

"The Feeling of Idea Scarcity" by John Wentworth

https://www.lesswrong.com/posts/mfPHTWsFhzmcXw8ta/the-feeling-of-idea-scarcity Here’s a story you may recognize. There's a bright up-and-coming young person - let's call her Alice. Alice has a cool idea. It seems like maybe an important idea, a big idea, an idea which might matter. A new and valuable idea. It’s the first time Alice has come up with a high-potential idea herself, something which she’s never heard in a class or read in a book or what have you. So Alice goes all-in pursuing this id...

Jan 12, 2023•9 min

"Models Don't 'Get Reward'" by Sam Ringer

https://www.lesswrong.com/posts/TWorNr22hhYegE4RT/models-don-t-get-reward Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. In terms of content, this has a lot of overlap with Reward is not the optimization target. I'm basically rewriting a part of that post in language I personally find clearer, emphasising what I think is the core insight. When thinking about deception and RLHF training, a simplified threat model is something like this: A model takes some ...

Jan 12, 2023•10 min

"How 'Discovering Latent Knowledge in Language Models Without Supervision' Fits Into a Broader Alignment Scheme" by Collin

https://www.lesswrong.com/posts/L4anhrxjv8j2yRKKp/how-discovering-latent-knowledge-in-language-models-without Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. Introduction A few collaborators and I recently released a new paper: Discovering Latent Knowledge in Language Models Without Supervision. For a quick summary of our paper, you can check out this Twitter thread. In this post I will describe how I think the results and methods in our paper fit into a...

Jan 12, 2023•34 min

"The next decades might be wild" by Marius Hobbhahn

https://www.lesswrong.com/posts/qRtD4WqKRYEtT5pi3/the-next-decades-might-be-wild Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. I’d like to thank Simon Grimm and Tamay Besiroglu for feedback and discussions. This post is inspired by What 2026 looks like and an AI vignette workshop guided by Tamay Besiroglu. I think of this post as “what would I expect the world to look like if these timelines (median compute for transformative AI ~2036) were true” or “wha...

Dec 21, 2022•1 hr 19 min

"Lessons learned from talking to >100 academics about AI safety" by Marius Hobbhahn

https://www.lesswrong.com/posts/SqjQFhn5KTarfW8v7/lessons-learned-from-talking-to-greater-than-100-academics Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. I’d like to thank MH, Jaime Sevilla and Tamay Besiroglu for their feedback. During my Master's and Ph.D. (still ongoing), I have spoken with many academics about AI safety. These conversations include chats with individual PhDs, poster presentations and talks about AI safety. I think I have learned a l...

Nov 17, 2022•26 min

"How my team at Lightcone sometimes gets stuff done" by jacobjacob

https://www.lesswrong.com/posts/6LzKRP88mhL9NKNrS/how-my-team-at-lightcone-sometimes-gets-stuff-done Disclaimer: I originally wrote this as a private doc for the Lightcone team. I then showed it to John and he said he would pay me to post it here. That sounded awfully compelling. However, I wanted to note that I’m an early founder who hasn't built anything truly great yet. I’m writing this doc because as Lightcone is growing, I have to take a stance on these questions. I need to design our org t...

Nov 10, 2022•14 min

"Decision theory does not imply that we get to have nice things" by So8res

https://www.lesswrong.com/posts/rP66bz34crvDudzcJ/decision-theory-does-not-imply-that-we-get-to-have-nice Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. (Note: I wrote this with editing help from Rob and Eliezer. Eliezer's responsible for a few of the paragraphs.) A common confusion I see in the tiny fragment of the world that knows about logical decision theory (FDT/UDT/etc.), is that people think LDT agents are genial and friendly for each other. [1] ...

Nov 08, 2022•57 min

"What 2026 looks like" by Daniel Kokotajlo

https://www.lesswrong.com/posts/6Xgy6CAf2jqHhynHL/what-2026-looks-like#2022 Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. This was written for the Vignettes Workshop. [1] The goal is to write out a detailed future history (“trajectory”) that is as realistic (to me) as I can currently manage, i.e. I’m not aware of any alternative trajectory that is similarly detailed and clearly more plausible to me. The methodology is roughly: Write a future history of ...

Nov 07, 2022•37 min

Counterarguments to the basic AI x-risk case

Nov 04, 2022•1 hr 15 min

"Introduction to abstract entropy" by Alex Altair

https://www.lesswrong.com/posts/REA49tL5jsh69X3aM/introduction-to-abstract-entropy#fnrefpi8b39u5hd7 This post, and much of the following sequence, was greatly aided by feedback from the following people (among others): Lawrence Chan, Joanna Morningstar, John Wentworth, Samira Nedungadi, Aysja Johnson, Cody Wild, Jeremy Gillen, Ryan Kidd, Justis Mills and Jonathan Mustin. Illustrations by Anne Ore. Introduction & motivation In the course of researching optimization, I decided that I ...

Oct 29, 2022•46 min

"Consider your appetite for disagreements" by Adam Zerner

https://www.lesswrong.com/posts/8vesjeKybhRggaEpT/consider-your-appetite-for-disagreements Poker There was a time about five years ago where I was trying to get good at poker. If you want to get good at poker, one thing you have to do is review hands. Preferably with other people. For example, suppose you have ace king offsuit on the button. Someone in the hijack opens to 3 big blinds preflop. You call. Everyone else folds. The flop is dealt. It's a rainbow Q75. You don't have any flush draws....

Oct 25, 2022•11 min

"My resentful story of becoming a medical miracle" by Elizabeth

https://www.lesswrong.com/posts/fFY2HeC9i2Tx8FEnK/my-resentful-story-of-becoming-a-medical-miracle This is a linkpost for https://acesounderglass.com/2022/10/13/my-resentful-story-of-becoming-a-medical-miracle/ You know those health books with “miracle cure” in the subtitle? The ones that always start with a preface about a particular patient who was completely hopeless until they tried the supplement/meditation technique/healing crystal that the book is based on? These people always start broke...

Oct 21, 2022•24 min

"The Redaction Machine" by Ben

https://www.lesswrong.com/posts/CKgPFHoWFkviYz7CB/the-redaction-machine On the 3rd of October 2351 a machine flared to life. Huge energies coursed into it via cables, only to leave moments later as heat dumped unwanted into its radiators. With an enormous puff the machine unleashed sixty years of human metabolic entropy into superheated steam. In the heart of the machine was Jane, a person of the early 21st century. From her perspective there was no transition. One moment she had been in the yea...

Oct 02, 2022•59 min

"Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover" by Ajeya Cotra

https://www.lesswrong.com/posts/pRkFkzwKZ2zfa3R6H/without-specific-countermeasures-the-easiest-path-to Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. I think that in the coming 15-30 years, the world could plausibly develop “transformative AI”: AI powerful enough to bring us into a new, qualitatively different future, via an explosion in science and technology R&D. This sort of AI could be sufficient to make this the most important century of all ti...

Sep 27, 2022•3 hr 8 min

"The shard theory of human values" by Quintin Pope & TurnTrout

https://www.lesswrong.com/posts/iCfdcxiyr2Kj8m8mT/the-shard-theory-of-human-values TL;DR: We propose a theory of human value formation. According to this theory, the reward system shapes human values in a relatively straightforward manner. Human values are not e.g. an incredibly complicated, genetically hard-coded set of drives, but rather sets of contextually activated heuristics which were shaped by and bootstrapped from crude, genetically hard-coded reward circuitry....

Sep 22, 2022•1 hr 9 min

"Two-year update on my personal AI timelines" by Ajeya Cotra

https://www.lesswrong.com/posts/AfH2oPHCApdKicM4m/two-year-update-on-my-personal-ai-timelines#fnref-fwwPpQFdWM6hJqwuY-12 Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. I worked on my draft report on biological anchors for forecasting AI timelines mainly between ~May 2019 (three months after the release of GPT-2) and ~Jul 2020 (a month after the release of GPT-3), and posted it on LessWrong in Sep 2020 after an internal review process. At the time, my bott...

Sep 22, 2022•39 min


Hosted on Buzzsprout
