LessWrong (Curated & Popular)

LessWrong•sites.libsyn.com

Audio narrations of LessWrong posts. Includes all curated posts and all posts with 125+ karma.

If you'd like more, subscribe to the “Lesswrong (30+ karma)” feed.

Last refreshed: July 2nd, 2025 at 2:42 AM

Episodes

“The Best Lay Argument is not a Simple English Yud Essay” by J Bostock

Epistemic status: these are my own opinions on AI risk communication, based primarily on my own instincts on the subject and discussions with people less involved with rationality than myself. Communication is highly subjective and I have not rigorously A/B tested messaging. I am even less confident in the quality of my responses than in the correctness of my critique. If they turn out to be true, these thoughts can probably be applied to all sorts of communication beyond AI risk. Lots of work h...

Sep 11, 2024•9 min

“My Number 1 Epistemology Book Recommendation: Inventing Temperature” by adamShimi

In my last post, I wrote that no resource out there exactly captured my model of epistemology, which is why I wanted to share a half-baked version of it. But I do have one book which I always recommend to people who want to learn more about epistemology: Inventing Temperature by Hasok Chang. To be very clear, my recommendation is not just to get the good ideas from this book (of which there are many) from a book review or summary — it's to actually read the book, the old-school way, one word at ...

Sep 10, 2024•5 min

“That Alien Message - The Animation” by Writer

Our new video is an adaptation of That Alien Message, by @Eliezer Yudkowsky. This time, the text has been significantly adapted, so I include it below. Part 1 Picture a world just like ours, except the people are a fair bit smarter: in this world, Einstein isn’t one in a million, he's one in a thousand. In fact, here he is now. He's made all the same discoveries, but they’re not quite as unusual: there have been lots of other discoveries. Anyway, he's out one night with a friend looking up at th...

Sep 09, 2024•15 min

“Pay Risk Evaluators in Cash, Not Equity” by Adam Scholl

Personally, I suspect the alignment problem is hard. But even if it turns out to be easy, survival may still require getting at least the absolute basics right; currently, I think we're mostly failing even at that. Early discussion of AI risk often focused on debating the viability of various elaborate safety schemes humanity might someday devise—designing AI systems to be more like “tools” than “agents,” for example, or as purely question-answering oracles locked within some kryptonite-style bo...

Sep 07, 2024•2 min

“Survey: How Do Elite Chinese Students Feel About the Risks of AI?” by Nick Corvino

Intro In April 2024, my colleague and I (both affiliated with Peking University) conducted a survey involving 510 students from Tsinghua University and 518 students from Peking University—China's two top academic institutions. Our focus was on their perspectives regarding the frontier risks of artificial intelligence. In the People's Republic of China (PRC), publicly accessible survey data on AI is relatively rare, so we hope this report provides some valuable insights into how people in the PRC...

Sep 07, 2024•24 min

“things that confuse me about the current AI market.” by DMMF

Paging Gwern or anyone else who can shed light on the current state of the AI market—I have several questions. Since the release of ChatGPT, at least 17 companies, according to the LMSYS Chatbot Arena Leaderboard, have developed AI models that outperform it. These companies include Anthropic, NexusFlow, Microsoft, Mistral, Alibaba, Hugging Face, Google, Reka AI, Cohere, Meta, 01 AI, AI21 Labs, Zhipu AI, Nvidia, DeepSeek, and xAI. Since GPT-4's launch, 15 different companies have reportedly creat...

Sep 02, 2024•4 min

“Nursing doubts” by dynomight

If you ask the internet if breastfeeding is good, you will soon learn that YOU MUST BREASTFEED because BREAST MILK = OPTIMAL FOOD FOR BABY. But if you look for evidence, you’ll discover two disturbing facts. First, there's no consensus about why breastfeeding is good. I’ve seen experts suggest at least eight possible mechanisms: Formula can’t fully reproduce the complex blend of fats, proteins and sugars in breast milk. Formula lacks various bio-active things in breast milk, like antibodies, whi...

Sep 01, 2024•18 min

“Principles for the AGI Race” by William_S

Crossposted from https://williamrsaunders.substack.com/p/principles-for-the-agi-race Why form principles for the AGI Race? I worked at OpenAI for 3 years, on the Alignment and Superalignment teams. Our goal was to prepare for the possibility that OpenAI succeeded in its stated mission of building AGI (Artificial General Intelligence, roughly able to do most things a human can do), and then proceed on to make systems smarter than most humans. This will predictably face novel problems in controlli...

Aug 31, 2024•31 min

“The Information: OpenAI shows ‘Strawberry’ to feds, races to launch it” by Martín Soto

Two new The Information articles with insider information on OpenAI's next models and moves. They are paywalled, but here are the new bits of information: Strawberry is more expensive and slow at inference time, but can solve complex problems on the first try without hallucinations. It seems to be an application or extension of process supervision. Its main purpose is to produce synthetic data for Orion, their next big LLM. But now they are also pushing to get a distillation of Strawberry into Cha...

Aug 29, 2024•6 min

“What is it to solve the alignment problem?” by Joe Carlsmith

People often talk about “solving the alignment problem.” But what is it to do such a thing? I wanted to clarify my thinking about this topic, so I wrote up some notes. In brief, I’ll say that you’ve solved the alignment problem if you’ve: avoided a bad form of AI takeover, built the dangerous kind of superintelligent AI agents, gained access to the main benefits of superintelligence, and become able to elicit some significant portion of those benefits from some of the superintelligent AI agents ...

Aug 28, 2024•1 hr 39 min

“Limitations on Formal Verification for AI Safety” by Andrew Dickson

In the past two years there has been increased interest in formal verification-based approaches to AI safety. Formal verification is a sub-field of computer science that studies how guarantees may be derived by deduction on fully-specified rule-sets and symbol systems. By contrast, the real world is a messy place that can rarely be straightforwardly represented in a reductionist way. In particular, physics, chemistry and biology are all complex sciences which do not have anything like complete s...

Aug 27, 2024•42 min

“Would catching your AIs trying to escape convince AI developers to slow down or undeploy?” by Buck

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. I often talk to people who think that if frontier models were egregiously misaligned and powerful enough to pose an existential threat, you could get AI developers to slow down or undeploy models by producing evidence of their misalignment. I'm not so sure. As an extreme thought experiment, I’ll argue this could be hard even if you caught your AI red-handed trying to escape. Imagine you're running an AI lab at ...

Aug 27, 2024•7 min

“Liability regimes for AI” by Ege Erdil

For many products, we face a choice of who to hold liable for harms that would not have occurred if not for the existence of the product. For instance, if a person uses a gun in a school shooting that kills a dozen people, there are many legal persons who in principle could be held liable for the harm: The shooter themselves, for obvious reasons. The shop that sold the shooter the weapon. The company that designs and manufactures the weapon. Which one of these is the best? I'll offer a brief and...

Aug 23, 2024•8 min

“AGI Safety and Alignment at Google DeepMind: A Summary of Recent Work” by Rohin Shah, Seb Farquhar, Anca Dragan

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. We wanted to share a recap of our recent outputs with the AF community. Below, we fill in some details about what we have been working on, what motivated us to do it, and how we thought about its importance. We hope that this will help people build off things we have done and see how their work fits with ours. Who are we? We’re the main team at Google DeepMind working on technical approaches to existential risk...

Aug 21, 2024•19 min

“Fields that I reference when thinking about AI takeover prevention” by Buck

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. This is a link post. Is AI takeover like a nuclear meltdown? A coup? A plane crash? My day job is thinking about safety measures that aim to reduce catastrophic risks from AI (especially risks from egregious misalignment). The two main themes of this work are the design of such measures (what's the space of techniques we might expect to be affordable and effective) and their evaluation (how do we decide which sa...

Aug 15, 2024•20 min

“WTH is Cerebrolysin, actually?” by gsfitzgerald, delton137

[This article was originally published on Dan Elton's blog, More is Different.] Cerebrolysin is an unregulated medical product made from enzymatically digested pig brain tissue. Hundreds of scientific papers claim that it boosts BDNF, stimulates neurogenesis, and can help treat numerous neural diseases. It is widely used by doctors around the world, especially in Russia and China. A recent video of Bryan Johnson injecting Cerebrolysin has over a million views on X and 570,000 views on YouTube. T...

Aug 13, 2024•38 min

“You can remove GPT2’s LayerNorm by fine-tuning for an hour” by StefanHex

This work was produced at Apollo Research, based on initial research done at MATS. LayerNorm is annoying for mechanistic interpretability research (“[...] reason #78 for why interpretability researchers hate LayerNorm” – Anthropic, 2023). Here's a Hugging Face link to a GPT2-small model without any LayerNorm. The final model is only slightly worse than a GPT2 with LayerNorm[1]: Dataset: OpenWebText (ce_loss); Original GPT2: 3.095; Fine-tuned GPT2 with LayerNorm: 2.989; Fine-tuned GPT without LayerNorm: 3.014 (...

Aug 10, 2024•23 min

“Leaving MIRI, Seeking Funding” by abramdemski

This is slightly old news at this point, but: as part of MIRI's recent strategy pivot, they've eliminated the Agent Foundations research team. I've been out of a job for a little over a month now. Much of my research time in the first half of the year was eaten up by engaging with the decision process that resulted in this, and later, applying to grants and looking for jobs. I haven't secured funding yet, but for my own sanity & happiness, I am (mostly) taking a break from worrying about tha...

Aug 09, 2024•4 min

“How I Learned To Stop Trusting Prediction Markets and Love the Arbitrage” by orthonormal

This is a story about a flawed Manifold market, about how easy it is to buy significant objective-sounding publicity for your preferred politics, and about why I've downgraded my respect for all but the largest prediction markets. I've had a Manifold account for a while, but I didn't use it much until I saw and became irked by this market on the conditional probabilities of a Harris victory, split by VP pick. Jeb Bush? Really? That's not even a fun kind of wishful thinking for anyone. Please cla...

Aug 08, 2024•4 min

“This is already your second chance” by Malmesbury

Cross-posted from Substack. 1. And the sky opened, and from the celestial firmament descended a cube of ivory the size of a skyscraper, lifted by ten thousand cherubim and seraphim. And the cube slowly landed among the children of men, crushing the frail metal beams of the Golden Gate Bridge under its supernatural weight. On its surface were inscribed the secret instructions that would allow humanity to escape the imminent AI apocalypse. And these instructions were… On July 30th, 2024: print a p...

Aug 07, 2024•16 min

“0. CAST: Corrigibility as Singular Target” by Max Harms

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. What the heck is up with “corrigibility”? For most of my career, I had a sense that it was a grab-bag of properties that seemed nice in theory but hard to get in practice, perhaps due to being incompatible with agency. Then, last year, I spent some time revisiting my perspective, and I concluded that I had been deeply confused by what corrigibility even was. I now think that corrigibility is a single, intuitive...

Aug 07, 2024•20 min

“Self-Other Overlap: A Neglected Approach to AI Alignment” by Marc Carauleanu, Mike Vaiana, Judd Rosenblatt, Diogo de Lucena

Figure 1. Image generated by DALL-3 to represent the concept of self-other overlap. Many thanks to Bogdan Ionut-Cirstea, Steve Byrnes, Gunnar Zarnacke, Jack Foxabbott and Seong Hah Cho for critical comments and feedback on earlier and ongoing versions of this work. Summary In this post, we introduce self-other overlap training: optimizing for similar internal representations when the model reasons about itself and others while preserving performance. There is a large body of evidence suggesting th...

Aug 07, 2024•23 min

“You don’t know how bad most things are nor precisely how they’re bad.” by Solenoid_Entity

TL;DR: Your discernment in a subject often improves as you dedicate time and attention to that subject. The space of possible subjects is huge, so on average your discernment is terrible, relative to what it could be. This is a serious problem if you create a machine that does everyone's job for them. See also: Reality has a surprising amount of detail. (You lack awareness of how bad your staircase is and precisely how your staircase is bad.) You don't know what you don't know. You forget your o...

Aug 07, 2024•9 min

“Recommendation: reports on the search for missing hiker Bill Ewasko” by eukaryote

This is a link post. Content warning: About an IRL death. Today's post isn’t so much an essay as a recommendation for two bodies of work on the same topic: Tom Mahood's blog posts and Adam “KarmaFrog1” Marsland's videos on the 2010 disappearance of Bill Ewasko, who went for a day hike in Joshua Tree National Park and dropped out of contact. 2010 – Bill Ewasko goes missing Tom Mahood's writeups on the search [Blog post, website goes down sometimes so if the site doesn’t work, check the internet ar...

Aug 07, 2024•22 min

“The ‘strong’ feature hypothesis could be wrong” by lsgos

NB. I am on the Google Deepmind language model interpretability team. But the arguments/views in this post are my own, and shouldn't be read as a team position. “It would be very convenient if the individual neurons of artificial neural networks corresponded to cleanly interpretable features of the input. For example, in an “ideal” ImageNet classifier, each neuron would fire only in the presence of a specific visual feature, such as the color red, a left-facing curve, or a dog snout” : Elhage et...

Aug 07, 2024•30 min

“‘AI achieves silver-medal standard solving International Mathematical Olympiad problems’” by gjm

This is a link post. Google DeepMind reports on a system for solving mathematical problems that allegedly is able to give complete solutions to four of the six problems on the 2024 IMO, putting it near the top of the silver-medal category. Well, actually, two systems for solving mathematical problems: AlphaProof, which is more general-purpose, and AlphaGeometry, which is specifically for geometry problems. (This is AlphaGeometry 2; they reported earlier this year on a previous version of AlphaGeo...

Jul 30, 2024•4 min

“Decomposing Agency — capabilities without desires” by owencb, Raymond D

This is a link post. What is an agent? It's a slippery concept with no commonly accepted formal definition, but informally the concept seems to be useful. One angle on it is Dennett's Intentional Stance: we think of an entity as being an agent if we can more easily predict it by treating it as having some beliefs and desires which guide its actions. Examples include cats and countries, but the central case is humans. The world is shaped significantly by the choices agents make. What might agents ...

Jul 29, 2024•24 min

“Universal Basic Income and Poverty” by Eliezer Yudkowsky

(Crossposted from Twitter) I'm skeptical that Universal Basic Income can get rid of grinding poverty, since somehow humanity's 100-fold productivity increase (since the days of agriculture) didn't eliminate poverty. Some of my friends reply, "What do you mean, poverty is still around? 'Poor' people today, in Western countries, have a lot to legitimately be miserable about, don't get me wrong; but they also have amounts of clothing and fabric that only rich merchants could afford a thousand years...

Jul 27, 2024•16 min

“Optimistic Assumptions, Longterm Planning, and ‘Cope’” by Raemon

Eliezer Yudkowsky periodically complains about people coming up with questionable plans with questionable assumptions to deal with AI, and then either: Saying "well, if this assumption doesn't hold, we're doomed, so we might as well assume it's true." Worse: coming up with cope-y reasons to assume that the assumption isn't even questionable at all. It's just a pretty reasonable worldview. Sometimes the questionable plan is "an alignment scheme, which Eliezer thinks avoids the hard part of the pr...

Jul 19, 2024•13 min

“Superbabies: Putting The Pieces Together” by sarahconstantin

This post was inspired by some talks at the recent LessOnline conference including one by LessWrong user “Gene Smith”. Let's say you want to have a “designer baby”. Genetically extraordinary in some way — super athletic, super beautiful, whatever. 6’5”, blue eyes, with a trust fund. Ethics aside[1], what would be necessary to actually do this? Fundamentally, any kind of “superbaby” or “designer baby” project depends on two steps: 1.) figure out what genes you ideally want; 2.) create an embryo w...

Jul 15, 2024•19 min


