LessWrong (Curated & Popular)

LessWrong•sites.libsyn.com

Audio narrations of LessWrong posts. Includes all curated posts and all posts with 125+ karma.

If you'd like more, subscribe to the “Lesswrong (30+ karma)” feed.

Last refreshed: July 2nd, 2025 at 2:42 AM

Episodes

"Sparse Autoencoders Find Highly Interpretable Directions in Language Models" by Logan Riggs et al

This is a linkpost for Sparse Autoencoders Find Highly Interpretable Directions in Language Models. We use a scalable and unsupervised method called Sparse Autoencoders to find interpretable, monosemantic features in real LLMs (Pythia-70M/410M) for both residual stream and MLPs. We showcase monosemantic features, feature replacement for Indirect Object Identification (IOI), and use OpenAI's automatic interpretation protocol to demonstrate a significant improvement in interpretability. Source: ht...

Sep 27, 2023•10 min

"Inside Views, Impostor Syndrome, and the Great LARP" by John Wentworth

Epistemic status: model which I find sometimes useful, and which emphasizes some true things about many parts of the world which common alternative models overlook. Probably not correct in full generality. Consider Yoshua Bengio, one of the people who won a Turing Award for deep learning research. Looking at his work, he clearly “knows what he’s doing”. He doesn’t know what the answers will be in advance, but he has some models of what the key questions are, what the key barriers are, and at lea...

Sep 26, 2023•9 min

"There should be more AI safety orgs" by Marius Hobbhahn

I’m writing this in my own capacity. The views expressed are my own, and should not be taken to represent the views of Apollo Research or any other program I’m involved with. TL;DR: I argue why I think there should be more AI safety orgs. I’ll also provide some suggestions on how that could be achieved. The core argument is that there is a lot of unused talent and I don’t think existing orgs scale fast enough to absorb it. Thus, more orgs are needed. This post can also serve as a call to action ...

Sep 25, 2023•30 min

"The Talk: a brief explanation of sexual dimorphism" by Malmesbury

Cross-posted from Substack. "Everything in the world is about sex, except sex. Sex is about clonal interference." – Oscar Wilde (kind of) As we all know, sexual reproduction is not about reproduction. Reproduction is easy. If your goal is to fill the world with copies of your genes, all you need is a good DNA-polymerase to duplicate your genome, and then to divide into two copies of yourself. Asexual reproduction is just better in every way. It's pretty clear that, on a direct one-v-one cage ma...

Sep 22, 2023•30 min

"A Golden Age of Building? Excerpts and lessons from Empire State, Pentagon, Skunk Works and SpaceX" by jacobjacob

Patrick Collison has a fantastic list of examples of people quickly accomplishing ambitious things together since the 19th Century. It does make you yearn for a time that feels... different, when the lethargic behemoths of government departments could move at the speed of a racing startup: [...] last century, [the Department of Defense] innovated at a speed that puts modern Silicon Valley startups to shame: the Pentagon was built in only 16 months (1941–1943), the Manhattan Project ran for just ...

Sep 20, 2023•46 min

"AI presidents discuss AI alignment agendas" by TurnTrout & Garrett Baker

This is a linkpost for https://www.youtube.com/watch?v=02kbWY5mahQ None of the presidents fully represent my (TurnTrout's) views. TurnTrout wrote the script. Garrett Baker helped produce the video after the audio was complete. Thanks to David Udell, Ulisse Mini, Noemi Chulo, and especially Rio Popper for feedback and assistance in writing the script. Source: https://www.lesswrong.com/posts/7M2iHPLaNzPNXHuMv/ai-presidents-discuss-ai-alignment-agendas YouTube video kindly provided by the authors. ...

Sep 19, 2023•24 min

"UDT shows that decision theory is more puzzling than ever" by Wei Dai

I feel like MIRI perhaps mispositioned FDT (their variant of UDT) as a clear advancement in decision theory, whereas maybe they could have attracted more attention/interest from academic philosophy if the framing was instead that the UDT line of thinking shows that decision theory is just more deeply puzzling than anyone had previously realized. Instead of one major open problem (Newcomb's, or EDT vs CDT) now we have a whole bunch more. I'm really not sure at this point whether UDT is even on th...

Sep 18, 2023•3 min

"Sum-threshold attacks" by TsviBT

How do you affect something far away, a lot, without anyone noticing? (Note: you can safely skip sections. It is also safe to skip the essay entirely, or to read the whole thing backwards if you like.) Source: https://www.lesswrong.com/posts/R3eDrDoX8LisKgGZe/sum-threshold-attacks Narrated for LessWrong by TYPE III AUDIO. Share feedback on this narration. [125+ Karma Post] ✓...

Sep 11, 2023•19 min

"Report on Frontier Model Training" by Yafah Edelman

This is a linkpost for https://docs.google.com/document/d/1TsYkDYtV6BKiCN9PAOirRAy3TrNDu2XncUZ5UZfaAKA/edit?usp=sharing Understanding what drives the rising capabilities of AI is important for those who work to forecast, regulate, or ensure the safety of AI. Regulations on the export of powerful GPUs need to be informed by understanding of how these GPUs are used, forecasts need to be informed by bottlenecks, and safety needs to be informed by an understanding of how the models of the future mig...

Sep 09, 2023•36 min

"A list of core AI safety problems and how I hope to solve them" by Davidad

Context: I sometimes find myself referring back to this tweet and wanted to give it a more permanent home. While I'm at it, I thought I would try to give a concise summary of how each distinct problem would be solved by an Open Agency Architecture (OAA), if OAA turns out to be feasible. Source: https://www.lesswrong.com/posts/D97xnoRr6BHzo5HvQ/one-minute-every-moment Narrated for LessWrong by TYPE III AUDIO. Share feedback on this narration. [125+ Karma Post] ✓...

Sep 09, 2023•12 min

"One Minute Every Moment" by abramdemski

About how much information are we keeping in working memory at a given moment? "Miller's Law" dictates that the number of things humans can hold in working memory is "the magical number 7 ± 2". This idea is derived from Miller's experiments, which tested both random-access memory (where participants must remember call-response pairs, and give the correct response when prompted with a call) and sequential memory (where participants must memorize and recall a list in order). In both cases, 7 is ...

Sep 08, 2023•6 min

"Sharing Information About Nonlinear" by Ben Pace

Added (11th Sept): Nonlinear have commented that they intend to write a response, have written a short follow-up, and claim that they dispute 85 claims in this post. I'll link here to that if-and-when it's published. Added (11th Sept): One of the former employees, Chloe, has written a lengthy comment personally detailing some of her experiences working at Nonlinear and the aftermath. Added (12th Sept): I've made 3 relatively minor edits to the post. I'm keeping a list of all edits at the botto...

Sep 08, 2023•56 min

"Defunding My Mistake" by ymeskhout

Until about five years ago, I unironically parroted the slogan All Cops Are Bastards (ACAB) and earnestly advocated to abolish the police and prison system. I had faint inklings I might be wrong about this a long time ago, but it took a while to come to terms with its disavowal. What follows is intended to be not just a detailed account of what I used to believe but most pertinently, why. Despite being super egotistical, for whatever reason I do not experience an aversion to openly admitting mi...

Sep 08, 2023•11 min

"What I would do if I wasn’t at ARC Evals" by LawrenceC

In which: I list 9 projects that I would work on if I wasn’t busy working on safety standards at ARC Evals, and explain why they might be good to work on. Epistemic status: I’m prioritizing getting this out fast as opposed to writing it carefully. I’ve thought for at least a few hours and talked to a few people I trust about each of the following projects, but I haven’t done that much digging into each of these, and it’s likely that I’m wrong about many material facts. I also make little claim t...

Sep 08, 2023•25 min

"Meta Questions about Metaphilosophy" by Wei Dai

To quickly recap my main intellectual journey so far (omitting a lengthy side trip into cryptography and Cypherpunk land), with the approximate age that I became interested in each topic in parentheses: Source: https://www.lesswrong.com/posts/fJqP9WcnHXBRBeiBg/meta-questions-about-metaphilosophy Narrated for LessWrong by TYPE III AUDIO. Share feedback on this narration. [125+ Karma Post] ✓...

Sep 04, 2023•5 min

"The U.S. is becoming less stable" by lc

We focus so much on arguing over who is at fault in this country that I think sometimes we fail to alert on what's actually happening. I would just like to point out, without attempting to assign blame, that American political institutions appear to be losing common knowledge of their legitimacy, and abandoning certain important traditions of cooperative governance. It would be slightly hyperbolic, but not unreasonable to me, to term what has happened "democratic backsliding". Source: https://ww...

Sep 04, 2023•4 min

"OpenAI API base models are not sycophantic, at any size" by Nostalgebraist

In "Discovering Language Model Behaviors with Model-Written Evaluations" (Perez et al 2022), the authors studied language model "sycophancy" - the tendency to agree with a user's stated view when asked a question. The paper contained the striking plot reproduced below, which shows sycophancy increasing dramatically with model size while being largely independent of RLHF steps and even showing up at 0 RLHF steps, i.e. in base models! [...] I found this result startling when I read the original pa...

Sep 04, 2023•5 min

"Dear Self; we need to talk about ambition" by Elizabeth

I keep seeing advice on ambition, aimed at people in college or early in their career, that would have been really bad for me at similar ages. Rather than contribute (more) to the list of people giving poorly universalized advice on ambition, I have written a letter to the one person I know my advice is right for: myself in the past. Source: https://www.lesswrong.com/posts/uGDtroD26aLvHSoK2/dear-self-we-need-to-talk-about-ambition-1 Narrated for LessWrong by TYPE III AUDIO. Share feedback on ...

Aug 30, 2023•13 min

"Assume Bad Faith" by Zack_M_Davis

I've been trying to avoid the terms "good faith" and "bad faith". I'm suspicious that most people who have picked up the phrase "bad faith" from hearing it used, don't actually know what it means—and maybe, that the thing it does mean doesn't carve reality at the joints . People get very touchy about bad faith accusations: they think that you should assume good faith, but that if you've determined someone is in bad faith, you shouldn't even be talking to them, that you need to exile them. What d...

Aug 28, 2023•12 min

"Book Launch: "The Carving of Reality," Best of LessWrong vol. III" by Raemon

The Carving of Reality, third volume of the Best of LessWrong books, is now available on Amazon (US). The Carving of Reality includes 43 essays from 29 authors. We've collected the essays into four books, each exploring two related topics. The "two intertwining themes" concept was first inspired when I looked over the cluster of "coordination" themed posts and noticed a recurring motif of not only "solving coordination problems" but also "dealing with the binding constraints that were causin...

Aug 28, 2023•6 min

"Large Language Models will be Great for Censorship" by Ethan Edwards

LLMs can do many incredible things. They can generate unique creative content, carry on long conversations in any number of subjects, complete complex cognitive tasks, and write nearly any argument. More mundanely, they are now the state of the art for boring classification tasks and therefore have the capability to radically upgrade the censorship capacities of authoritarian regimes throughout the world. Produced as part of the SERI ML Alignment Theory Scholars Program - Summer 2023 Cohort. Tha...

Aug 23, 2023•15 min

"6 non-obvious mental health issues specific to AI safety" by Igor Ivanov

Intro: I am a psychotherapist, and I help people working on AI safety. I noticed patterns of mental health issues highly specific to this group. It's not just doomerism, there are way more of them that are less obvious. If you struggle with a mental health issue related to AI safety, feel free to leave a comment about it and about things that help you with it. You might also support others in the comments. Sometimes such support makes a lot of difference and people feel like they are not alone. ...

Aug 22, 2023•6 min

"Ten Thousand Years of Solitude" by agp

This is a linkpost for the article "Ten Thousand Years of Solitude", written by Jared Diamond for Discover Magazine in 1993, four years before he published Guns, Germs and Steel. That book focused on Diamond's theory that the geography of Eurasia, particularly its large size and common climate, allowed civilizations there to dominate the rest of the world because it was easy to share plants, animals, technologies and ideas. This article, however, examines the opposite extreme. Diamond looks at ...

Aug 22, 2023•8 min

"Against Almost Every Theory of Impact of Interpretability" by Charbel-Raphaël

I gave a talk about the different risk models, followed by an interpretability presentation, then I got a problematic question, "I don't understand, what's the point of doing this?" Hum. Feature viz? (left image) Um, it's pretty but is this useful? [1] Is this reliable? GradCam (a pixel attribution technique, like on the above right figure), it's pretty. But I’ve never seen anybody use it in industry. [2] Pixel attribution seems useful, but accuracy remains the king. [3] Induction heads? Ok, w...

Aug 21, 2023•1 hr 19 min

"Feedbackloop-first Rationality" by Raemon

I've been workshopping a new rationality training paradigm. (By "rationality training paradigm", I mean an approach to learning/teaching the skill of "noticing what cognitive strategies are useful, and getting better at them.") I think the paradigm has promise. I've beta-tested it for a couple weeks. It’s too early to tell if it actually works, but one of my primary goals is to figure out if it works relatively quickly, and give up if it isn’t delivering. The goal of this post is to: Convey ...

Aug 15, 2023•16 min

"Inflection.ai is a major AGI lab" by Nikola

Inflection.ai (co-founded by DeepMind co-founder Mustafa Suleyman) should be perceived as a frontier LLM lab of similar magnitude as Meta, OpenAI, DeepMind, and Anthropic based on their compute, valuation, current model capabilities, and plans to train frontier models. Compared to the other labs, Inflection seems to put less effort into AI safety. Thanks to Laker Newhouse for discussion and feedback. Source: https://www.lesswrong.com/posts/Wc5BYFfzuLzepQjCq/inflection-ai-is-a-major-agi-lab Narra...

Aug 15, 2023•7 min

"Model Organisms of Misalignment: The Case for a New Pillar of Alignment Research" by evhub, Nicholas Schiefer, Carson Denison, Ethan Perez

TL;DR: This document lays out the case for research on “model organisms of misalignment” – in vitro demonstrations of the kinds of failures that might pose existential threats – as a new and important pillar of alignment research. If you’re interested in working on this agenda with us at Anthropic, we’re hiring! Please apply to the research scientist or research engineer position on the Anthropic website and mention that you’re interested in working on model organisms of misalignment. Source: h...

Aug 09, 2023•36 min

"When can we trust model evaluations?" by evhub

In "Towards understanding-based safety evaluations," I discussed why I think evaluating specifically the alignment of models is likely to require mechanistic, understanding-based evaluations rather than solely behavioral evaluations. However, I also mentioned in a footnote why I thought behavioral evaluations would likely be fine in the case of evaluating capabilities rather than evaluating alignment: However, while I like the sorts of behavioral evaluations discussed in the GPT-4 System Card ...

Aug 09, 2023•17 min

"ARC Evals new report: Evaluating Language-Model Agents on Realistic Autonomous Tasks" by Beth Barnes

Blogpost version Paper We have just released our first public report. It introduces methodology for assessing the capacity of LLM agents to acquire resources, create copies of themselves, and adapt to novel challenges they encounter in the wild. Background ARC Evals develops methods for evaluating the safety of large language models (LLMs) in order to provide early warnings of models with dangerous capabilities. We have public partnerships with Anthropic and OpenAI to evaluate their AI systems, ...

Aug 04, 2023•8 min

"The "public debate" about AI is confusing for the general public and for policymakers because it is a three-sided debate" by Adam David Long

Summary of Argument: The public debate among AI experts is confusing because there are, to a first approximation, three sides, not two sides to the debate. I refer to this as a 🔺three-sided framework, and I argue that using this three-sided framework will help clarify the debate (more precisely, debates) for the general public and for policy-makers. Source: https://www.lesswrong.com/posts/BTcEzXYoDrWzkLLrQ/the-public-debate-about-ai-is-confusing-for-the-general Narrated for LessWrong by TYPE II...

Aug 04, 2023•7 min

Hosted on Buzzsprout