LessWrong (Curated & Popular)

LessWrong•sites.libsyn.com

Audio narrations of LessWrong posts. Includes all curated posts and all posts with 125+ karma.

If you'd like more, subscribe to the “LessWrong (30+ karma)” feed.

Last refreshed: July 2nd, 2025 at 2:42 AM

Episodes

“Notifications Received in 30 Minutes of Class” by tanagrabeast

Introduction. If you are choosing to read this post, you've probably seen the image below depicting all the notifications students received on their phones during one class period. You probably saw it as a retweet of this tweet, or in one of Zvi's posts. Did you find this data plausible, or did you roll to disbelieve? Did you know that the image dates back to at least 2019? Does that fact make you more or less worried about the truth on the ground as of 2024? Last month, I performed an enhanced ...

May 27, 2024•16 min

“AI companies aren’t really using external evaluators” by Zach Stein-Perlman

New blog: AI Lab Watch. Subscribe on Substack. Many AI safety folks think that METR is close to the labs, with ongoing relationships that grant it access to models before they are deployed. This is incorrect. METR (then called ARC Evals) did pre-deployment evaluation for GPT-4 and Claude 2 in the first half of 2023, but it seems to have had no special access since then.[1] Other model evaluators also seem to have little access before deployment. Frontier AI labs' pre-deployment risk assessment s...

May 24, 2024•8 min

“EIS XIII: Reflections on Anthropic’s SAE Research Circa May 2024” by scasper

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. Part 13 of 12 in the Engineer's Interpretability Sequence. TL;DR On May 5, 2024, I made a set of 10 predictions about what the next sparse autoencoder (SAE) paper from Anthropic would and wouldn’t do. Today's new SAE paper from Anthropic was full of brilliant experiments and interesting insights, but it ultimately underperformed my expectations. I am beginning to be concerned that Anthropic's recent approach to...

May 24, 2024•7 min

“What’s Going on With OpenAI’s Messaging?” by ozziegoen

This is a quickly-written opinion piece, of what I understand about OpenAI. I first posted it to Facebook, where it had some discussion. Some arguments that OpenAI is making, simultaneously: OpenAI will likely reach and own transformative AI (useful for attracting talent to work there). OpenAI cares a lot about safety (good for public PR and government regulations). OpenAI isn’t making anything dangerous and is unlikely to do so in the future (good for public PR and government regulations). Open...

May 22, 2024•7 min

“Language Models Model Us” by eggsyntax

Produced as part of the MATS Winter 2023-4 program, under the mentorship of @Jessica Rumbelow. One-sentence summary: On a dataset of human-written essays, we find that gpt-3.5-turbo can accurately infer demographic information about the authors from just the essay text, and suspect it's inferring much more. Introduction. Every time we sit down in front of an LLM like GPT-4, it starts with a blank slate. It knows nothing[1] about who we are, other than what it knows about users in general. But wit...

May 21, 2024•29 min

Jaan Tallinn’s 2023 Philanthropy Overview

This is a link post. to follow up my philanthropic pledge from 2020, i've updated my philanthropy page with 2023 results. in 2023 my donations funded $44M worth of endpoint grants ($43.2M excluding software development and admin costs) — exceeding my commitment of $23.8M (20k times $1190.03 — the minimum price of ETH in 2023). --- First published: May 20th, 2024 Source: https://www.lesswrong.com/posts/bjqDQB92iBCahXTAj/jaan-tallinn-s-2023-philanthropy-overview --- Narrated by TYPE III AUDIO ....

May 21, 2024•51 sec

“OpenAI: Exodus” by Zvi

Previously: OpenAI: Facts From a Weekend, OpenAI: The Battle of the Board, OpenAI: Leaks Confirm the Story, OpenAI: Altman Returns, OpenAI: The Board Expands. Ilya Sutskever and Jan Leike have left OpenAI. This is almost exactly six months after Altman's temporary firing and The Battle of the Board, the day after the release of GPT-4o, and soon after a number of other recent safety-related OpenAI departures. Many others working on safety have also left recently. This is part of a longstanding pa...

May 21, 2024•1 hr 25 min

DeepMind’s “Frontier Safety Framework” is weak and unambitious

FSF blogpost. Full document (just 6 pages; you should read it). Compare to Anthropic's RSP, OpenAI's RSP ("PF"), and METR's Key Components of an RSP. DeepMind's FSF has three steps: Create model evals for warning signs of "Critical Capability Levels" Evals should have a "safety buffer" of at least 6x effective compute so that CCLs will not be reached between evals They list 7 CCLs across "Autonomy, Biosecurity, Cybersecurity, and Machine Learning R&D" E.g. "Autonomy level 1: Capable of expan...

May 20, 2024•7 min

Do you believe in hundred dollar bills lying on the ground? Consider humming

Introduction. [Reminder: I am an internet weirdo with no medical credentials] A few months ago, I published some crude estimates of the power of nitric oxide nasal spray to hasten recovery from illness, and speculated about what it could do prophylactically. While working on that piece a nice man on Twitter alerted me to the fact that humming produces lots of nasal nitric oxide. This post is my very crude model of what kind of anti-viral gains we could expect from humming. I’ve encoded my model ...

May 18, 2024•11 min

Deep Honesty

Most people avoid saying literally false things, especially if those could be audited, like making up facts or credentials. The reasons for this are both moral and pragmatic — being caught out looks really bad, and sustaining lies is quite hard, especially over time. Let's call the habit of not saying things you know to be false ‘shallow honesty’[1]. Often when people are shallowly honest, they still choose what true things they say in a kind of locally act-consequentialist way, to try to bring ...

May 12, 2024•15 min

On Not Pulling The Ladder Up Behind You

Epistemic Status: Musing and speculation, but I think there's a real thing here. 1. When I was a kid, a friend of mine had a tree fort. If you've never seen such a fort, imagine a series of wooden boards secured to a tree, creating a platform about fifteen feet off the ground where you can sit or stand and walk around the tree. This one had a rope ladder we used to get up and down, a length of knotted rope that was tied to the tree at the top and dangled over the edge so that it reached the grou...

May 02, 2024•14 min

Mechanistically Eliciting Latent Behaviors in Language Models

Produced as part of the MATS Winter 2024 program, under the mentorship of Alex Turner (TurnTrout). TL;DR: I introduce a method for eliciting latent behaviors in language models by learning unsupervised perturbations of an early layer of an LLM. These perturbations are trained to maximize changes in downstream activations. The method discovers diverse and meaningful behaviors with just one prompt, including perturbations overriding safety training, eliciting backdoored behaviors and uncovering la...

May 02, 2024•1 hr 21 min

Ironing Out the Squiggles

Adversarial Examples: A Problem The apparent successes of the deep learning revolution conceal a dark underbelly. It may seem that we now know how to get computers to (say) check whether a photo is of a bird, but this façade of seemingly good performance is belied by the existence of adversarial examples—specially prepared data that looks ordinary to humans, but is seen radically differently by machine learning models. The differentiable nature of neural networks, which make them possible to be ...

May 01, 2024•19 min

Introducing AI Lab Watch

This is a linkpost for https://ailabwatch.org I'm launching AI Lab Watch. I collected actions for frontier AI labs to improve AI safety, then evaluated some frontier labs accordingly. It's a collection of information on what labs should do and what labs are doing. It also has some adjacent resources, including a list of other safety-ish scorecard-ish stuff. (It's much better on desktop than mobile — don't read it on mobile.) It's in beta—leave feedback here or comment or DM me—but I basically end...

May 01, 2024•3 min

Refusal in LLMs is mediated by a single direction

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. This work was produced as part of Neel Nanda's stream in the ML Alignment & Theory Scholars Program - Winter 2023-24 Cohort, with co-supervision from Wes Gurnee. This post is a preview for our upcoming paper, which will provide more detail into our current understanding of refusal. We thank Nina Rimsky and Daniel Paleka for the helpful conversations and review. Executive summary Modern LLMs are typically fi...

Apr 28, 2024•17 min

Funny Anecdote of Eliezer From His Sister

This comes from a podcast called 18Forty, whose main demographic is Orthodox Jews. Eliezer's sister (Hannah) came on and talked about her Sheva Brachos, which is essentially the marriage ceremony in Orthodox Judaism. People here have likely not seen it, and I thought it was quite funny, so here it is: https://18forty.org/podcast/channah-cohen-the-crisis-of-experience/ David Bashevkin: So I want to shift now and I want to talk about something that full disclosure, we recorded this once bef...

Apr 24, 2024•4 min

Thoughts on seed oil

This is a linkpost for https://dynomight.net/seed-oil/ A friend has spent the last three years hounding me about seed oils. Every time I thought I was safe, he’d wait a couple months and renew his attack: “When are you going to write about seed oils?” “Did you know that seed oils are why there's so much {obesity, heart disease, diabetes, inflammation, cancer, dementia}?” “Why did you write about {meth, the death penalty, consciousness, nukes, ethylene, abortion, AI, aliens, colonoscopies, Tunnel ...

Apr 21, 2024•34 min

Why Would Belief-States Have A Fractal Structure, And Why Would That Matter For Interpretability? An Explainer

Yesterday Adam Shai put up a cool post which… well, take a look at the visual: Yup, it sure looks like that fractal is very noisily embedded in the residual activations of a neural net trained on a toy problem. Linearly embedded, no less. I (John) initially misunderstood what was going on in that post, but some back-and-forth with Adam convinced me that it really is as cool as that visual makes it look, and arguably even cooler. So David and I wrote up this post / some code, partly as an explain...

Apr 19, 2024•13 min

Express interest in an “FHI of the West”

TLDR: I am investigating whether to found a spiritual successor to FHI, housed under Lightcone Infrastructure, providing a rich cultural environment and financial support to researchers and entrepreneurs in the intellectual tradition of the Future of Humanity Institute. Fill out this form or comment below to express interest in being involved either as a researcher, entrepreneurial founder-type, or funder. The Future of Humanity Institute is dead: I knew that this was going to happen in some for...

Apr 18, 2024•6 min

Transformers Represent Belief State Geometry in their Residual Stream

Produced while being an affiliate at PIBBSS[1]. The work was done initially with funding from a Lightspeed Grant, and then continued while at PIBBSS. Work done in collaboration with @Paul Riechers, @Lucas Teixeira, @Alexander Gietelink Oldenziel, and Sarah Marzen. Paul was a MATS scholar during some portion of this work. Thanks to Paul, Lucas, Alexander, and @Guillaume Corlouer for suggestions on this writeup. Introduction. What computational structure are we building into LLMs when we train the...

Apr 17, 2024•24 min

Paul Christiano named as US AI Safety Institute Head of AI Safety

This is a linkpost for https://www.commerce.gov/news/press-releases/2024/04/us-commerce-secretary-gina-raimondo-announces-expansion-us-ai-safety U.S. Secretary of Commerce Gina Raimondo announced today additional members of the executive leadership team of the U.S. AI Safety Institute (AISI), which is housed at the National Institute of Standards and Technology (NIST). Raimondo named Paul Christiano as Head of AI Safety, Adam Russell as Chief Vision Officer, Mara Campbell as Acting Chief Operatin...

Apr 16, 2024•2 min

[HUMAN VOICE] "On green" by Joe Carlsmith

Cross-posted from my website. Podcast version here, or search for "Joe Carlsmith Audio" on your podcast app. This essay is part of a series that I'm calling "Otherness and control in the age of AGI." I'm hoping that the individual essays can be read fairly well on their own, but see here for brief summaries of the essays that have been released thus far. (Warning: spoilers for Yudkowsky's "The Sword of the Good.") Examining a philosophical vibe that I think contrasts in interesting ways with ...

Apr 12, 2024•1 hr 15 min

[HUMAN VOICE] "Toward a Broader Conception of Adverse Selection" by Ricki Heicklen

Support ongoing human narrations of LessWrong's curated posts: www.patreon.com/LWCurated This is a linkpost for https://bayesshammai.substack.com/p/conditional-on-getting-to-trade-your “I refuse to join any club that would have me as a member” -Marx [1] Adverse Selection is the phenomenon in which information asymmetries in non-cooperative environments make trading dangerous. It has traditionally been understood to describe financial markets in which buyers and sellers systematically differ, suc...

Apr 12, 2024•22 min

[HUMAN VOICE] "My PhD thesis: Algorithmic Bayesian Epistemology" by Eric Neyman

Support ongoing human narrations of LessWrong's curated posts: www.patreon.com/LWCurated In January, I defended my PhD thesis, which I called Algorithmic Bayesian Epistemology. From the preface: For me as for most students, college was a time of exploration. I took many classes, read many academic and non-academic works, and tried my hand at a few research projects. Early in graduate school, I noticed a strong commonality among the questions that I had found particularly fascinating: most of th...

Apr 12, 2024•13 min

[HUMAN VOICE] "How could I have thought that faster?" by mesaoptimizer

Support ongoing human narrations of LessWrong's curated posts: www.patreon.com/LWCurated This is a linkpost for https://twitter.com/ESYudkowsky/status/144546114693741363 I stumbled upon a Twitter thread where Eliezer describes what seems to be his cognitive algorithm that is equivalent to Tune Your Cognitive Strategies, and have decided to archive / repost it here. Source: https://www.lesswrong.com/posts/rYq6joCrZ8m62m7ej/how-could-i-have-thought-that-faster Narrated for LessWrong by Perrin Wal...

Apr 12, 2024•3 min

LLMs for Alignment Research: a safety priority?

A recent short story by Gabriel Mukobi illustrates a near-term scenario where things go bad because new developments in LLMs allow LLMs to accelerate capabilities research without a correspondingly large acceleration in safety research. This scenario is disturbingly close to the situation we already find ourselves in. Asking the best LLMs for help with programming vs technical alignment research feels very different (at least to me). LLMs might generate junk code, but you can keep pointing out t...

Apr 06, 2024•21 min

[HUMAN VOICE] "Scale Was All We Needed, At First" by Gabriel Mukobi

Support ongoing human narrations of LessWrong's curated posts: www.patreon.com/LWCurated Source: https://www.lesswrong.com/posts/xLDwCemt5qvchzgHd/scale-was-all-we-needed-at-first Narrated for LessWrong by Perrin Walker. Share feedback on this narration....

Apr 05, 2024•15 min

[HUMAN VOICE] "Using axis lines for good or evil" by dynomight

Support ongoing human narrations of LessWrong's curated posts: www.patreon.com/LWCurated Source: https://www.lesswrong.com/posts/Yay8SbQiwErRyDKGb/using-axis-lines-for-good-or-evil Narrated for LessWrong by Perrin Walker. Share feedback on this narration....

Apr 05, 2024•12 min

[HUMAN VOICE] "Social status part 1/2: negotiations over object-level preferences" by Steven Byrnes

Support ongoing human narrations of LessWrong's curated posts: www.patreon.com/LWCurated Source: https://www.lesswrong.com/posts/SPBm67otKq5ET5CWP/social-status-part-1-2-negotiations-over-object-level Narrated for LessWrong by Perrin Walker. Share feedback on this narration....

Apr 05, 2024•50 min

[HUMAN VOICE] "Acting Wholesomely" by OwenCB

Support ongoing human narrations of LessWrong's curated posts: www.patreon.com/LWCurated Source: https://www.lesswrong.com/posts/Cb7oajdrA5DsHCqKd/acting-wholesomely Narrated for LessWrong by Perrin Walker. Share feedback on this narration....

Apr 05, 2024•27 min

Hosted on Buzzsprout
