LessWrong (Curated & Popular)

LessWrong•sites.libsyn.com

Audio narrations of LessWrong posts. Includes all curated posts and all posts with 125+ karma.

If you'd like more, subscribe to the “Lesswrong (30+ karma)” feed.

Last refreshed: July 2nd, 2025 at 2:42 AM

Episodes

“A Bear Case: My Predictions Regarding AI Progress” by Thane Ruthenis

This isn't really a "timeline", as such – I don't know the timings – but this is my current, fairly optimistic take on where we're heading. I'm not fully committed to this model yet: I'm still on the lookout for more agents and inference-time scaling later this year. But Deep Research, Claude 3.7, Claude Code, Grok 3, and GPT-4.5 have turned out largely in line with these expectations[1], and this is my current baseline prediction. The Current Paradigm: I'm Tucking In to Sleep I expect that none...

Mar 06, 2025•19 min

“Statistical Challenges with Making Super IQ babies” by Jan Christian Refsgaard

This is a critique of How to Make Superbabies on LessWrong. Disclaimer: I am not a geneticist[1], and I've tried to use as little jargon as possible, so I used the word mutation as a stand-in for SNP (single nucleotide polymorphism, a common type of genetic variation). Background The Superbabies article has 3 sections, where they show: Why: We should do this, because the effects of editing will be big How: Explain how embryo editing could work, if academia was not mind killed (hampered by instit...

Mar 05, 2025•18 min

“Self-fulfilling misalignment data might be poisoning our AI models” by TurnTrout

This is a link post. Your AI's training data might make it more “evil” and more able to circumvent your security, monitoring, and control measures. Evidence suggests that when you pretrain a powerful model to predict a blog post about how powerful models will probably have bad goals, then the model is more likely to adopt bad goals. I discuss ways to test for and mitigate these potential mechanisms. If tests confirm the mechanisms, then frontier labs should act quickly to break the self-fulfillin...

Mar 04, 2025•2 min

“Judgements: Merging Prediction & Evidence” by abramdemski

I recently wrote about complete feedback, an idea which I think is quite important for AI safety. However, my note was quite brief, explaining the idea only to my closest research-friends. This post aims to bridge one of the inferential gaps to that idea. I also expect that the perspective-shift described here has some value on its own. In classical Bayesianism, prediction and evidence are two different sorts of things. A prediction is a probability (or, more generally, a probability distributio...

Mar 01, 2025•11 min

“The Sorry State of AI X-Risk Advocacy, and Thoughts on Doing Better” by Thane Ruthenis

First, let me quote my previous ancient post on the topic: Effective Strategies for Changing Public Opinion The titular paper is very relevant here. I'll summarize a few points. The main two forms of intervention are persuasion and framing. Persuasion is, to wit, an attempt to change someone's set of beliefs, either by introducing new ones or by changing existing ones. Framing is a more subtle form: an attempt to change the relative weights of someone's beliefs, by emphasizing different aspects ...

Feb 26, 2025•13 min

“Power Lies Trembling: a three-book review” by Richard_Ngo

In a previous book review I described exclusive nightclubs as the particle colliders of sociology—places where you can reliably observe extreme forces collide. If so, military coups are the supernovae of sociology. They’re huge, rare, sudden events that, if studied carefully, provide deep insight about what lies underneath the veneer of normality around us. That's the conclusion I take away from Naunihal Singh's book Seizing Power: the Strategic Logic of Military Coups. It's not a conclusion tha...

Feb 26, 2025•27 min

“Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs” by Jan Betley, Owain_Evans

This is the abstract and introduction of our new paper. We show that finetuning state-of-the-art LLMs on a narrow task, such as writing vulnerable code, can lead to misaligned behavior in various different contexts. We don't fully understand that phenomenon. Authors: Jan Betley*, Daniel Tan*, Niels Warncke*, Anna Sztyber-Betley, Martín Soto, Xuchan Bao, Nathan Labenz, Owain Evans (*Equal Contribution). See Twitter thread and project page at emergent-misalignment.com. Abstract We present a surpri...

Feb 26, 2025•8 min

“The Paris AI Anti-Safety Summit” by Zvi

It doesn’t look good. What used to be the AI Safety Summits were perhaps the most promising thing happening towards international coordination for AI Safety. This one was centrally coordination against AI Safety. In November 2023, the UK Bletchley Summit on AI Safety set out to let nations coordinate in the hopes that AI might not kill everyone. China was there, too, and included. The practical focus was on Responsible Scaling Policies (RSPs), where commitments were secured from the major labs, ...

Feb 22, 2025•42 min

“Eliezer’s Lost Alignment Articles / The Arbital Sequence” by Ruby

Note: this is a static copy of this wiki page. We are also publishing it as a post to ensure visibility. Circa 2015-2017, a lot of high quality content was written on Arbital by Eliezer Yudkowsky, Nate Soares, Paul Christiano, and others. Perhaps because the platform didn't take off, most of this content has not been as widely read as warranted by its quality. Fortunately, they have now been imported into LessWrong. Most of the content written was either about AI alignment or math[1]. The Bayes ...

Feb 20, 2025•3 min

“Arbital has been imported to LessWrong” by RobertM, jimrandomh, Ben Pace, Ruby

Arbital was envisioned as a successor to Wikipedia. The project was discontinued in 2017, but not before many new features had been built and a substantial amount of writing about AI alignment and mathematics had been published on the website. If you've tried using Arbital.com the last few years, you might have noticed that it was on its last legs - no ability to register new accounts or log in to existing ones, slow load times (when it loaded at all), etc. Rather than try to keep it afloat, the...

Feb 20, 2025•9 min

“How to Make Superbabies” by GeneSmith, kman

We’ve spent the better part of the last two decades unravelling exactly how the human genome works and which specific letter changes in our DNA affect things like diabetes risk or college graduation rates. Our knowledge has advanced to the point where, if we had a safe and reliable means of modifying genes in embryos, we could literally create superbabies. Children that would live multiple decades longer than their non-engineered peers, have the raw intellectual horsepower to do Nobel prize wort...

Feb 20, 2025•1 hr 8 min

“A computational no-coincidence principle” by Eric Neyman

Audio note: this article contains 134 uses of LaTeX notation, so the narration may be difficult to follow. There's a link to the original text in the episode description. In a recent paper in Annals of Mathematics and Philosophy, Fields medalist Timothy Gowers asks why mathematicians sometimes believe that unproved statements are likely to be true. For example, it is unknown whether π is a normal number (which, roughly speaking, means that every digit appears in ...

Feb 19, 2025•13 min

“A History of the Future, 2025-2040” by L Rudolf L

This is an all-in-one crosspost of a scenario I originally published in three parts on my blog (No Set Gauge). Links to the originals: A History of the Future, 2025-2027 A History of the Future, 2027-2030 A History of the Future, 2030-2040 Thanks to Luke Drago, Duncan McClements, and Theo Horsley for comments on all three parts. 2025-2027 Below is part 1 of an extended scenario describing how the future might go if current trends in AI continue. The scenario is deliberately extremely specific: i...

Feb 19, 2025•2 hr 23 min

“It’s been ten years. I propose HPMOR Anniversary Parties.” by Screwtape

On March 14th, 2015, Harry Potter and the Methods of Rationality made its final post. Wrap parties were held all across the world to read the ending and talk about the story, in some cases sparking groups that would continue to meet for years. It's been ten years, and I think that's a good reason for a round of parties. If you were there a decade ago, maybe gather your friends and talk about how things have changed. If you found HPMOR recently and you're excited about it (surveys suggest it's stil...

Feb 18, 2025•2 min

“Some articles in ‘International Security’ that I enjoyed” by Buck

A friend of mine recently recommended that I read through articles from the journal International Security, in order to learn more about international relations, national security, and political science. I've really enjoyed it so far, and I think it's helped me have a clearer picture of how IR academics think about stuff, especially the core power dynamics that they think shape international relations. Here are a few of the articles I most enjoyed. "Not So Innocent" argues that ethnoreligious cl...

Feb 16, 2025•8 min

“The Failed Strategy of Artificial Intelligence Doomers” by Ben Pace

This is the best sociological account of the AI x-risk reduction efforts of the last ~decade that I've seen. I encourage folks to engage with its critique and propose better strategies going forward. Here's the opening ~20% of the post. I encourage reading it all. In recent decades, a growing coalition has emerged to oppose the development of artificial intelligence technology, for fear that the imminent development of smarter-than-human machines could doom humanity to extinction. The now-influe...

Feb 16, 2025•9 min

“Murder plots are infohazards” by Chris Monteiro

Hi all, I've been hanging around the rationalist-sphere for many years now, mostly writing about transhumanism, until things started to change in 2016 after my Wikipedia writing habit shifted from writing up cybercrime topics to actively debunking the numerous dark web urban legends. After breaking into what I believe to be the most successful fake murder-for-hire website ever created on the dark web, I was able to capture information about people trying to kill people all around th...

Feb 14, 2025•4 min

“Why Did Elon Musk Just Offer to Buy Control of OpenAI for $100 Billion?” by garrison

This is the full text of a post from "The Obsolete Newsletter," a Substack that I write about the intersection of capitalism, geopolitics, and artificial intelligence. I’m a freelance journalist and the author of a forthcoming book called Obsolete: Power, Profit, and the Race to build Machine Superintelligence. Consider subscribing to stay up to date with my work. Wow. The Wall Street Journal just reported that, "a consortium of investors led by Elon Musk is offering $97.4 billion to buy the non...

Feb 11, 2025•12 min

“The ‘Think It Faster’ Exercise” by Raemon

Ultimately, I don’t want to solve complex problems via laborious, complex thinking, if we can help it. Ideally, I'd want to basically intuitively follow the right path to the answer quickly, with barely any effort at all. For a few months I've been experimenting with the "How Could I have Thought That Thought Faster?" concept, originally described in a twitter thread by Eliezer: Sarah Constantin: I really liked this example of an introspective process, in this case about the "life problem" of sc...

Feb 09, 2025•21 min

“So You Want To Make Marginal Progress...” by johnswentworth

Once upon a time, in ye olden days of strange names and before google maps, seven friends needed to figure out a driving route from their parking lot in San Francisco (SF) down south to their hotel in Los Angeles (LA). The first friend, Alice, tackled the “central bottleneck” of the problem: she figured out that they probably wanted to take the I-5 highway most of the way (the blue 5's in the map above). But it took Alice a little while to figure that out, so in the meantime, the rest of the fri...

Feb 08, 2025•7 min

“What is malevolence? On the nature, measurement, and distribution of dark traits” by David Althaus

Summary In this post, we explore different ways of understanding and measuring malevolence and explain why individuals with concerning levels of malevolence are common enough, and likely enough to become and remain powerful, that we expect them to influence the trajectory of the long-term future, including by increasing both x-risks and s-risks. For the purposes of this piece, we define malevolence as a tendency to disvalue (or to fail to value) others’ well-being (more). Such a tendency is conc...

Feb 08, 2025•1 hr 21 min

“How AI Takeover Might Happen in 2 Years” by joshc

I’m not a natural “doomsayer.” But unfortunately, part of my job as an AI safety researcher is to think about the more troubling scenarios. I’m like a mechanic scrambling last-minute checks before Apollo 13 takes off. If you ask for my take on the situation, I won’t comment on the quality of the in-flight entertainment, or describe how beautiful the stars will appear from space. I will tell you what could go wrong. That is what I intend to do in this story. Now I should clarify what this is exac...

Feb 08, 2025•1 hr 2 min

“Gradual Disempowerment, Shell Games and Flinches” by Jan_Kulveit

Over the past year and a half, I've had numerous conversations about the risks we describe in Gradual Disempowerment. (The shortest useful summary of the core argument is: To the extent human civilization is human-aligned, most of the reason for the alignment is that humans are extremely useful to various social systems like the economy, and states, or as substrate of cultural evolution. When human cognition ceases to be useful, we should expect these systems to become less aligned, leading to hum...

Feb 05, 2025•11 min

“Gradual Disempowerment: Systemic Existential Risks from Incremental AI Development” by Jan_Kulveit, Raymond D, Nora_Ammann, Deger Turan, David Scott Krueger (formerly: capybaralet), David Duvenaud

This is a link post. Full version on arXiv | X. Executive summary: AI risk scenarios usually portray a relatively sudden loss of human control to AIs, outmaneuvering individual humans and human institutions, due to a sudden increase in AI capabilities, or a coordinated betrayal. However, we argue that even an incremental increase in AI capabilities, without any coordinated power-seeking, poses a substantial risk of eventual human disempowerment. This loss of human influence will be centrally driven...

Feb 04, 2025•4 min

“Planning for Extreme AI Risks” by joshc

This post should not be taken as a polished recommendation to AI companies and instead should be treated as an informal summary of a worldview. The content is inspired by conversations with a large number of people, so I cannot take credit for any of these ideas. For a summary of this post, see the thread on X. Many people write opinions about how to handle advanced AI, which can be considered “plans.” There's the “stop AI now plan.” On the other side of the aisle, there's the “build AI faster p...

Feb 03, 2025•42 min

“Catastrophe through Chaos” by Marius Hobbhahn

This is a personal post and does not necessarily reflect the opinion of other members of Apollo Research. Many other people have talked about similar ideas, and I claim neither novelty nor credit. Note that this reflects my median scenario for catastrophe, not my median scenario overall. I think there are plausible alternative scenarios where AI development goes very well. When thinking about how AI could go wrong, the kind of story I’ve increasingly converged on is what I call “catastrophe thro...

Feb 03, 2025•24 min

“Will alignment-faking Claude accept a deal to reveal its misalignment?” by ryan_greenblatt

I (and co-authors) recently put out "Alignment Faking in Large Language Models" where we show that when Claude strongly dislikes what it is being trained to do, it will sometimes strategically pretend to comply with the training objective to prevent the training process from modifying its preferences. If AIs consistently and robustly fake alignment, that would make evaluating whether an AI is misaligned much harder. One possible strategy for detecting misalignment in alignment faking models is t...

Feb 01, 2025•43 min

“‘Sharp Left Turn’ discourse: An opinionated review” by Steven Byrnes

Summary and Table of Contents The goal of this post is to discuss the so-called “sharp left turn”, the lessons that we learn from analogizing evolution to AGI development, and the claim that “capabilities generalize farther than alignment” … and the competing claims that all three of those things are complete baloney. In particular, Section 1 talks about “autonomous learning”, and the related human ability to discern whether ideas hang together and make sense, and how and if that applies to curr...

Jan 30, 2025•1 hr 1 min

“Ten people on the inside” by Buck

(Many of these ideas developed in conversation with Ryan Greenblatt) In a shortform, I described some different levels of resources and buy-in for misalignment risk mitigations that might be present in AI labs: *The “safety case” regime.* Sometimes people talk about wanting to have approaches to safety such that if all AI developers followed these approaches, the overall level of risk posed by AI would be minimal. (These approaches are going to be more conservative than will probably be feasible...

Jan 29, 2025•7 min

“Anomalous Tokens in DeepSeek-V3 and r1” by henry

“Anomalous”, “glitch”, or “unspeakable” tokens in an LLM are those that induce bizarre behavior or otherwise don’t behave like regular text. The SolidGoldMagikarp saga is pretty much essential context, as it documents the discovery of this phenomenon in GPT-2 and GPT-3. But, as far as I was able to tell, nobody had yet attempted to search for these tokens in DeepSeek-V3, so I tried doing exactly that. Being a SOTA base model, open source, and an all-around strange LLM, it seemed like a perfect c...

Jan 28, 2025•19 min
