LessWrong (Curated & Popular)

LessWrong•sites.libsyn.com

Audio narrations of LessWrong posts. Includes all curated posts and all posts with 125+ karma.

If you'd like more, subscribe to the “LessWrong (30+ karma)” feed.

Last refreshed: July 2nd, 2025 at 2:42 AM

Episodes

"Thoughts on the AI Safety Summit company policy requests and responses" by So8res

Over the next two days, the UK government is hosting an AI Safety Summit focused on “the safe and responsible development of frontier AI”. They requested that seven companies (Amazon, Anthropic, DeepMind, Inflection, Meta, Microsoft, and OpenAI) “outline their AI Safety Policies across nine areas of AI Safety”. Below, I’ll give my thoughts on the nine areas the UK government described; I’ll note key priorities that I don’t think are addressed by company-side policy at all; and I’ll say a few wor...

Nov 03, 2023•21 min

"President Biden Issues Executive Order on Safe, Secure, and Trustworthy Artificial Intelligence" by Tristan Williams

This is a linkpost for https://www.whitehouse.gov/briefing-room/statements-releases/2023/10/30/fact-sheet-president-biden-issues-executive-order-on-safe-secure-and-trustworthy-artificial-intelligence/ Released today (10/30/23) this is crazy, perhaps the most sweeping action taken by government on AI yet. Below, I've segmented by x-risk and non-x-risk related proposals, excluding the proposals that are geared towards promoting its use and focusing solely on those aimed at risk. It's worth noting ...

Nov 03, 2023•6 min

[Human Voice] "Book Review: Going Infinite" by Zvi

Support ongoing human narrations of curated posts: www.patreon.com/LWCurated Previously: Sadly, FTX. I doubted whether it would be a good use of time to read Michael Lewis’s new book Going Infinite about Sam Bankman-Fried (hereafter SBF or Sam). What would I learn that I did not already know? Was Michael Lewis so far in the tank of SBF that the book was filled with nonsense and not to be trusted? I set up a prediction market, which somehow attracted over a hundred traders. Opinions were mixed. T...

Oct 31, 2023•2 hr 40 min

"We're Not Ready: thoughts on "pausing" and responsible scaling policies" by Holden Karnofsky

Views are my own, not Open Philanthropy’s. I am married to the President of Anthropic and have a financial interest in both Anthropic and OpenAI via my spouse. Over the last few months, I’ve spent a lot of my time trying to help out with efforts to get responsible scaling policies adopted. In that context, a number of people have said it would be helpful for me to be publicly explicit about whether I’m in favor of an AI pause. This post will give some thoughts on these topics. Source: https://w...

Oct 30, 2023•12 min

"At 87, Pearl is still able to change his mind" by rotatingpaguro

Judea Pearl is a famous researcher, known for Bayesian networks (the standard way of representing Bayesian models), and his statistical formalization of causality. Although he has always been recommended reading here, he's less of a staple compared to, say, Jaynes. So the need to re-introduce him. My purpose here is to highlight a soothing, unexpected show of rationality on his part. One year ago I reviewed his last book, The Book of Why, in a failed [1] submission to the ACX book review contes...

Oct 30, 2023•10 min

"Architects of Our Own Demise: We Should Stop Developing AI" by Roko

Some brief thoughts at a difficult time in the AI risk debate. Imagine you go back in time to the year 1999 and tell people that in 24 years' time, humans will be on the verge of building weakly superhuman AI systems. I remember watching the anime short series The Animatrix at roughly this time, in particular a story called The Second Renaissance. For those who haven't seen it, it is a self-contained origin tale for the events in the seminal 1999 movie The Matrix, te...

Oct 30, 2023•6 min

"AI as a science, and three obstacles to alignment strategies" by Nate Soares

AI used to be a science. In the old days (back when AI didn't work very well), people were attempting to develop a working theory of cognition. Those scientists didn’t succeed, and those days are behind us. For most people working in AI today and dividing up their work hours between tasks, gone is the ambition to understand minds. People working on mechanistic interpretability (and others attempting to build an empirical understanding of modern AIs) are laying an important foundation stone that ...

Oct 30, 2023•18 min

"Thoughts on responsible scaling policies and regulation" by Paul Christiano

I am excited about AI developers implementing responsible scaling policies; I’ve recently been spending time refining this idea and advocating for it. Most people I talk to are excited about RSPs, but there is also some uncertainty and pushback about how they relate to regulation. In this post I’ll explain my views on that: I think that sufficiently good responsible scaling policies could dramatically reduce risk, and that preliminary policies like Anthropic’s RSP meaningfully reduce risk by cr...

Oct 30, 2023•11 min

"Announcing Timaeus" by Jesse Hoogland et al.

Timaeus is a new AI safety research organization dedicated to making fundamental breakthroughs in technical AI alignment using deep ideas from mathematics and the sciences. Currently, we are working on singular learning theory and developmental interpretability. Over time we expect to work on a broader research agenda, and to create understanding-based evals informed by our research. Source: https://www.lesswrong.com/posts/nN7bHuHZYaWv9RDJL/announcing-timaeus Narrated for LessWrong by TYPE III ...

Oct 30, 2023•11 min

[HUMAN VOICE] "Alignment Implications of LLM Successes: a Debate in One Act" by Zack M Davis

Support ongoing human narrations of curated posts: www.patreon.com/LWCurated Doomimir: Humanity has made no progress on the alignment problem. Not only do we have no clue how to align a powerful optimizer to our "true" values, we don't even know how to make AI "corrigible"—willing to let us correct it. Meanwhile, capabilities continue to advance by leaps and bounds. All is lost. Simplicia: Why, Doomimir Doomovitch, you're such a sourpuss! It should be clear by now that advances in "alignment"—...

Oct 23, 2023•26 min

"Holly Elmore and Rob Miles dialogue on AI Safety Advocacy" by jacobjacob, Robert Miles & Holly_Elmore

Holly is an independent AI Pause organizer, which includes organizing protests (like this upcoming one). Rob is an AI Safety YouTuber. I (jacobjacob) brought them together for this dialogue, because I've been trying to figure out what I should think of AI safety protests, which seems like a possibly quite important intervention; and Rob and Holly seemed like they'd have thoughtful and perhaps disagreeing perspectives. Quick clarification: At one point they discuss a particular protest, which i...

Oct 23, 2023•50 min

"LoRA Fine-tuning Efficiently Undoes Safety Training from Llama 2-Chat 70B" by Simon Lermen & Jeffrey Ladish

Produced as part of the SERI ML Alignment Theory Scholars Program - Summer 2023 Cohort, under the mentorship of Jeffrey Ladish. TL;DR LoRA fine-tuning undoes the safety training of Llama 2-Chat 70B with one GPU and a budget of less than $200. The resulting models [1] maintain helpful capabilities without refusing to fulfill harmful instructions. We show that, if model weights are released, safety fine-tuning does not effectively prevent model misuse. Consequently, we encourage Meta to reconsider...

Oct 23, 2023•33 min

"Labs should be explicit about why they are building AGI" by Peter Barnett

Three of the big AI labs say that they care about alignment and that they think misaligned AI poses a potentially existential threat to humanity. These labs continue to try to build AGI. I think this is a very bad idea. The leaders of the big labs are clear that they do not know how to build safe, aligned AGI. The current best plan is to punt the problem to a (different) AI, and hope that can solve it. It seems clearly like a bad idea to try and build AGI when you don’t know how to control it, e...

Oct 19, 2023•3 min

[HUMAN VOICE] "Sum-threshold attacks" by TsviBT

Support ongoing human narrations of curated posts: www.patreon.com/LWCurated How do you affect something far away, a lot, without anyone noticing? (Note: you can safely skip sections. It is also safe to skip the essay entirely, or to read the whole thing backwards if you like.) Source: https://www.lesswrong.com/posts/R3eDrDoX8LisKgGZe/sum-threshold-attacks Narrated for LessWrong by Perrin Walker. [125+ Karma Post] ✓ [Curated Post] ✓...

Oct 18, 2023•21 min

"Will no one rid me of this turbulent pest?" by Metacelsus

Last year, I wrote about the promise of gene drives to wipe out mosquito species and end malaria. In the time since my previous writing, gene drives have still not been used in the wild, and over 600,000 people have died of malaria. Although there are promising new developments such as malaria vaccines, there have also been some pretty bad setbacks (such as mosquitoes and parasites developing resistance to commonly used chemicals), and malaria deaths have increased slightly from a few years ago...

Oct 18, 2023•17 min

[HUMAN VOICE] "Inside Views, Impostor Syndrome, and the Great LARP" by John Wentworth

Patreon to support human narration. (Narrations will remain freely available on this feed, but you can optionally support them if you'd like me to keep making them.) *** Epistemic status: model which I find sometimes useful, and which emphasizes some true things about many parts of the world which common alternative models overlook. Probably not correct in full generality. Consider Yoshua Bengio, one of the people who won a Turing Award for deep learning research. Looking at his work, he clearly...

Oct 15, 2023•10 min

"RSPs are pauses done right" by evhub

COI: I am a research scientist at Anthropic, where I work on model organisms of misalignment; I was also involved in the drafting process for Anthropic’s RSP. Prior to joining Anthropic, I was a Research Fellow at MIRI for three years. Thanks to Kate Woolverton, Carson Denison, and Nicholas Schiefer for useful feedback on this post. Recently, there’s been a lot of discussion and advocacy around AI pauses—which, to be clear, I think is great: pause advocacy pushes in the right direction and wor...

Oct 15, 2023•12 min

"Comparing Anthropic's Dictionary Learning to Ours" by Robert_AIZI

Readers may have noticed many similarities between Anthropic's recent publication Towards Monosemanticity: Decomposing Language Models With Dictionary Learning (LW post) and my team's recent publication Sparse Autoencoders Find Highly Interpretable Directions in Language Models (LW post). Here I want to compare our techniques and highlight what we did similarly or differently. My hope in writing this is to help readers understand the similarities and differences, and perhaps to lay the groun...

Oct 15, 2023•9 min

"Announcing MIRI’s new CEO and leadership team" by Gretta Duleba

In 2023, MIRI has shifted focus in the direction of broad public communication—see, for example, our recent TED talk, our piece in TIME magazine “Pausing AI Developments Isn’t Enough. We Need to Shut it All Down”, and our appearances on various podcasts. While we’re continuing to support various technical research programs at MIRI, this is no longer our top priority, at least for the foreseeable future. Coinciding with this shift in focus, there have also been many organizational changes at M...

Oct 15, 2023•7 min

"Cohabitive Games so Far" by mako yass

A cohabitive game [1] is a partially cooperative, partially competitive multiplayer game that provides an anarchic dojo for development in applied cooperative bargaining, or negotiation. Applied cooperative bargaining isn't currently taught, despite being an infrastructural literacy for peace, trade, democracy or any other form of pluralism. We suffer for that. There are many good board games that come close to meeting the criteria of a cohabitive game today, but they all [2] miss in one way or ...

Oct 15, 2023•32 min

"Announcing Dialogues" by Ben Pace

As of today, everyone is able to create a new type of content on LessWrong: Dialogues. In contrast with posts, which are for monologues, and comment sections, which are spaces for everyone to talk to everyone, a dialogue is a space for a few invited people to speak with each other. I'm personally very excited about this as a way for people to produce lots of in-depth explanations of their world-models in public. I think dialogues enable this in a way that feels easier — instead of writing an e...

Oct 09, 2023•7 min

"Response to Quintin Pope’s Evolution Provides No Evidence For the Sharp Left Turn" by Zvi

Response to: Evolution Provides No Evidence For the Sharp Left Turn, due to it winning first prize in The Open Philanthropy Worldviews contest. Quintin’s post is an argument about a key historical reference class and what it tells us about AI. Instead of arguing that the reference makes his point, he is instead arguing that it doesn’t make anyone’s point - that we understand the reasons for humanity’s sudden growth in capabilities. He says this jump was caused by gaining access to cultural tra...

Oct 09, 2023•17 min

"Evaluating the historical value misspecification argument" by Matthew Barnett

ETA: I'm not saying that MIRI thought AIs wouldn't understand human values. If there's only one thing you take away from this post, please don't take away that. Recently, many people have talked about whether some of the main MIRI people (Eliezer Yudkowsky, Nate Soares, and Rob Bensinger [1]) should update on whether value alignment is easier than they thought given that GPT-4 seems to follow human directions and act within moral constraints pretty well (here are two specific examples of people...

Oct 09, 2023•11 min

"Towards Monosemanticity: Decomposing Language Models With Dictionary Learning" by Zac Hatfield-Dodds

Neural networks are trained on data, not programmed to follow rules. We understand the math of the trained network exactly – each neuron in a neural network performs simple arithmetic – but we don't understand why those mathematical operations result in the behaviors we see. This makes it hard to diagnose failure modes, hard to know how to fix them, and hard to certify that a model is truly safe. Luckily for those of us trying to understand artificial neural networks, we can simultaneously recor...

Oct 09, 2023•5 min

"Thomas Kwa's MIRI research experience" by Thomas Kwa and others

Moderator note: the following is a dialogue using LessWrong’s new dialogue feature. The exchange is not completed: new replies might be added continuously, the way a comment thread might work. If you’d also be excited about finding an interlocutor to debate, dialogue, or getting interviewed by: fill in this dialogue matchmaking form. Hi Thomas, I'm quite curious to hear about your research experience working with MIRI. To get us started: When were you at MIRI? Who did you work with? And what pr...

Oct 06, 2023•52 min

"'Diamondoid bacteria' nanobots: deadly threat or dead-end? A nanotech investigation" by titotal

A lot of people are highly concerned that a malevolent AI or insane human will, in the near future, set out to destroy humanity. If such an entity wanted to be absolutely sure they would succeed, what method would they use? Nuclear war? Pandemics? According to some in the x-risk community, the answer is this: The AI will invent molecular nanotechnology, and then kill us all with diamondoid bacteria nanobots. Source: https://www.lesswrong.com/posts/bc8Ssx5ys6zqu3eq9/diamondoid-bacteria-nanobots-d...

Oct 03, 2023•37 min

"The Lighthaven Campus is open for bookings" by Habryka

Lightcone Infrastructure (the organization that grew from and houses the LessWrong team) has just finished renovating a 7-building physical campus that we hope to use to make the future of humanity go better than it would otherwise. We're hereby announcing that it is generally available for bookings. We offer preferential pricing for projects we think are good for the world, but to cover operating costs, we're renting out space to a wide variety of people/projects. Source: https://www.lesswrong....

Oct 03, 2023•6 min

"How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions" by Jan Brauner et al.

Large language models (LLMs) can "lie", which we define as outputting false statements despite "knowing" the truth in a demonstrable sense. LLMs might "lie", for example, when instructed to output misinformation. Here, we develop a simple lie detector that requires neither access to the LLM's activations (black-box) nor ground-truth knowledge of the fact in question. The detector works by asking a predefined set of unrelated follow-up questions after a suspected lie, and feeding the LLM's yes/no...

Oct 03, 2023•7 min

"EA Vegan Advocacy is not truthseeking, and it’s everyone’s problem" by Elizabeth

Effective altruism prides itself on truthseeking. That pride is justified in the sense that EA is better at truthseeking than most members of its reference category, and unjustified in that it is far from meeting its own standards. We’ve already seen dire consequences of the inability to detect bad actors who deflect investigation into potential problems, but by its nature you can never be sure you’ve found all the damage done by epistemic obfuscation because the point is to be self-cloaking. My...

Oct 03, 2023•42 min

"The King and the Golem" by Richard Ngo

This is a linkpost for https://narrativeark.substack.com/p/the-king-and-the-golem Long ago there was a mighty king who had everything in the world that he wanted, except trust. Who could he trust, when anyone around him might scheme for his throne? So he resolved to study the nature of trust, that he might figure out how to gain it. He asked his subjects to bring him the most trustworthy thing in the kingdom, promising great riches if they succeeded. Soon, the first of them arrived at his palace...

Sep 29, 2023•8 min
