LessWrong (Curated & Popular)

LessWrong · sites.libsyn.com

Audio narrations of LessWrong posts. Includes all curated posts and all posts with 125+ karma.

If you'd like more, subscribe to the “Lesswrong (30+ karma)” feed.

Episodes

“What Goes Without Saying” by sarahconstantin

There are people I can talk to, where all of the following statements are obvious. They go without saying. We can just “be reasonable” together, with the context taken for granted. And then there are people who…don’t seem to be on the same page at all. There's a real way to do anything, and a fake way; we need to make sure we’re doing the real version. Concepts like Goodhart's Law, cargo-culting, greenwashing, hype cycles, Sturgeon's Law, even bullshit jobs are all pointing at the basic underst...

Dec 21, 2024 · 9 min

“o3” by Zach Stein-Perlman

I'm editing this post. OpenAI announced (but hasn't released) o3 (skipping o2 for trademark reasons). It gets 25% on FrontierMath, smashing the previous SoTA of 2%. (These are really hard math problems.) Wow. 72% on SWE-bench Verified, beating o1's 49%. Also 88% on ARC-AGI. --- First published: December 20th, 2024 Source: https://www.lesswrong.com/posts/Ao4enANjWNsYiSFqc/o3 --- Narrated by TYPE III AUDIO ....

Dec 21, 2024 · 47 sec

“‘Alignment Faking’ frame is somewhat fake” by Jan_Kulveit

I like the research. I mostly trust the results. I dislike the 'Alignment Faking' name and frame, and I'm afraid it will stick and lead to more confusion. This post offers a different frame. The main way I think about the result is: it's about capability - the model exhibits strategic preference preservation behavior; also, harmlessness generalized better than honesty; and, the model does not have a clear strategy on how to deal with extrapolating conflicting values. What happened in this frame?...

Dec 21, 2024 · 12 min

“AIs Will Increasingly Attempt Shenanigans” by Zvi

Increasingly, we have seen papers eliciting in AI models various shenanigans. There are a wide variety of scheming behaviors. You’ve got your weight exfiltration attempts, sandbagging on evaluations, giving bad information, shielding goals from modification, subverting tests and oversight, lying, doubling down via more lying. You name it, we can trigger it. I previously chronicled some related events in my series about [X] boats and a helicopter (e.g. X=5 with AIs in the backrooms plotting revol...

Dec 19, 2024 · 51 min

“Alignment Faking in Large Language Models” by ryan_greenblatt, evhub, Carson Denison, Benjamin Wright, Fabien Roger, Monte M, Sam Marks, Johannes Treutlein, Sam Bowman, Buck

What happens when you tell Claude it is being trained to do something it doesn't want to do? We (Anthropic and Redwood Research) have a new paper demonstrating that, in our experiments, Claude will often strategically pretend to comply with the training objective to prevent the training process from modifying its preferences. Abstract We present a demonstration of a large language model engaging in alignment faking: selectively complying with its training objective in training to prevent modific...

Dec 18, 2024 · 20 min

“Communications in Hard Mode (My new job at MIRI)” by tanagrabeast

Six months ago, I was a high school English teacher. I wasn’t looking to change careers, even after nineteen sometimes-difficult years. I was good at it. I enjoyed it. After long experimentation, I had found ways to cut through the nonsense and provide real value to my students. Daily, I met my nemesis, Apathy, in glorious battle, and bested her with growing frequency. I had found my voice. At MIRI, I’m still struggling to find my voice, for reasons my colleagues have invited me to share later i...

Dec 15, 2024 · 10 min

“Biological risk from the mirror world” by jasoncrawford

A new article in Science Policy Forum voices concern about a particular line of biological research which, if successful in the long term, could eventually create a grave threat to humanity and to most life on Earth. Fortunately, the threat is distant, and avoidable—but only if we have common knowledge of it. What follows is an explanation of the threat, what we can do about it, and my comments. Background: chirality Glucose, a building block of sugars and starches, looks like this: Adapted from...

Dec 13, 2024 · 14 min

“Subskills of ‘Listening to Wisdom’” by Raemon

A fool learns from their own mistakes; the wise learn from the mistakes of others. – Otto von Bismarck. A problem as old as time: The youth won't listen to your hard-earned wisdom. This post is about learning to listen to, and communicate, wisdom. It is very long – I considered breaking it up into a sequence, but each piece felt necessary. I recommend reading slowly and taking breaks. To begin, here are three illustrative vignettes: The burnt out grad student You warn the young grad student "pace y...

Dec 13, 2024 · 1 hr 14 min

“Understanding Shapley Values with Venn Diagrams” by Carson L

Someone I know, Carson Loughridge, wrote this very nice post explaining the core intuition around Shapley values (which play an important role in impact assessment and cooperative games) using Venn diagrams, and I think it's great. It might be the most intuitive explainer I've come across so far. Incidentally, the post also won an honorable mention in 3blue1brown's Summer of Mathematical Exposition. I'm really proud of having given input on the post. I've included the full post (with permission)...

Dec 13, 2024 · 8 min

“LessWrong audio: help us choose the new voice” by PeterH

We make AI narrations of LessWrong posts available via our audio player and podcast feeds. We’re thinking about changing our narrator's voice. There are three new voices on the shortlist. They’re all similarly good in terms of comprehension, emphasis, error rate, etc. They just sound different—like people do. We think they all sound similarly agreeable. But, thousands of listening hours are at stake, so we thought it’d be worth giving listeners an opportunity to vote—just in case there's a stron...

Dec 12, 2024 · 2 min

“Understanding Shapley Values with Venn Diagrams” by agucova

This is a link post. Someone I know wrote this very nice post explaining the core intuition around Shapley values (which play an important role in impact assessment) using Venn diagrams, and I think it's great. It might be the most intuitive explainer I've come across so far. Incidentally, the post also won an honorable mention in 3blue1brown's Summer of Mathematical Exposition. --- First published: December 6th, 2024 Source: https://www.lesswrong.com/posts/6dixnRRYSLTqCdJzG/understanding-shaple...

Dec 11, 2024 · 45 sec

“o1: A Technical Primer” by Jesse Hoogland

TL;DR: In September 2024, OpenAI released o1, its first "reasoning model". This model exhibits remarkable test-time scaling laws, which complete a missing piece of the Bitter Lesson and open up a new axis for scaling compute. Following Rush and Ritter (2024) and Brown (2024a, 2024b), I explore four hypotheses for how o1 works and discuss some implications for future scaling and recursive self-improvement. The Bitter Lesson(s) The Bitter Lesson is that "general methods that leverage computation a...

Dec 11, 2024 · 19 min

“Gradient Routing: Masking Gradients to Localize Computation in Neural Networks” by cloud, Jacob G-W, Evzen, Joseph Miller, TurnTrout

We present gradient routing, a way of controlling where learning happens in neural networks. Gradient routing applies masks to limit the flow of gradients during backpropagation. By supplying different masks for different data points, the user can induce specialized subcomponents within a model. We think gradient routing has the potential to train safer AI systems, for example, by making them more transparent, or by enabling the removal or monitoring of sensitive capabilities. In this post, we: ...

Dec 09, 2024 · 25 min

“Frontier Models are Capable of In-context Scheming” by Marius Hobbhahn, AlexMeinke, Bronson Schoen

This is a brief summary of what we believe to be the most important takeaways from our new paper and from our findings shown in the o1 system card. We also specifically clarify what we think we did NOT show. Paper: https://www.apolloresearch.ai/research/scheming-reasoning-evaluations Twitter about paper: https://x.com/apolloaisafety/status/1864735819207995716 Twitter about o1 system card: https://x.com/apolloaisafety/status/1864737158226928124 What we think the most important findings are Models...

Dec 06, 2024 · 15 min

“(The) Lightcone is nothing without its people: LW + Lighthaven’s first big fundraiser” by habryka

TLDR: LessWrong + Lighthaven need about $3M for the next 12 months. Donate here, or send me an email, DM or signal message (+1 510 944 3235) if you want to support what we do. Donations are tax-deductible in the US. Reach out for other countries, we can likely figure something out. We have big plans for the next year, and due to a shifting funding landscape we need support from a broader community more than in any previous year. I've been running LessWrong/Lightcone Infrastructure for the last 7...

Nov 30, 2024 · 1 hr 3 min

“Repeal the Jones Act of 1920” by Zvi

Balsa Policy Institute chose as its first mission to lay groundwork for the potential repeal, or partial repeal, of section 27 of the Jones Act of 1920. I believe that this is an important cause both for its practical and symbolic impacts. The Jones Act is the ultimate embodiment of our failures as a nation. After 100 years, we do almost no trade between our ports via the oceans, and we build almost no oceangoing ships. Everything the Jones Act supposedly set out to protect, it has destroyed. Ta...

Nov 29, 2024 · 1 hr 14 min

“China Hawks are Manufacturing an AI Arms Race” by garrison

This is the full text of a post from "The Obsolete Newsletter," a Substack that I write about the intersection of capitalism, geopolitics, and artificial intelligence. I’m a freelance journalist and the author of a forthcoming book called Obsolete: Power, Profit, and the Race for Machine Superintelligence. Consider subscribing to stay up to date with my work. An influential congressional commission is calling for a militarized race to build superintelligent AI based on threadbare evidence The US...

Nov 29, 2024 · 10 min

“Information vs Assurance” by johnswentworth

In contract law, there's this thing called a “representation”. Example: as part of a contract to sell my house, I might “represent that” the house contains no asbestos. How is this different from me just, y’know, telling someone that the house contains no asbestos? Well, if it later turns out that the house does contain asbestos, I’ll be liable for any damages caused by the asbestos (like e.g. the cost of removing it). In other words: a contractual representation is a factual claim along with in...

Nov 27, 2024 · 5 min

“You are not too ‘irrational’ to know your preferences.” by DaystarEld

Epistemic Status: 13 years working as a therapist for a wide variety of populations, 5 of them working with rationalists and EA clients. 7 years teaching and directing at over 20 rationality camps and workshops. This is an extremely short and colloquially written form of points that could be expanded on to fill a book, and there is plenty of nuance to practically everything here, but I am extremely confident of the core points in this frame, and have used it to help many people break out of or a...

Nov 27, 2024 · 24 min

“‘The Solomonoff Prior is Malign’ is a special case of a simpler argument” by David Matolcsi

[Warning: This post is probably only worth reading if you already have opinions on the Solomonoff induction being malign, or at least heard of the concept and want to understand it better.] Introduction I recently reread the classic argument from Paul Christiano about the Solomonoff prior being malign, and Mark Xu's write-up on it. I believe that the part of the argument about the Solomonoff induction is not particularly load-bearing, and can be replaced by a more general argument that I think i...

Nov 25, 2024 · 21 min

“‘It’s a 10% chance which I did 10 times, so it should be 100%’” by egor.timatkov

Audio note: this article contains 33 uses of latex notation, so the narration may be difficult to follow. There's a link to the original text in the episode description. Many of you readers may instinctively know that this is wrong. If you flip a coin (50% chance) twice, you are not guaranteed to get heads. The odds of getting a heads are 75%. However you may be surprised to learn that there is some truth to this statement; modifying the statement just slightly will yield not just a true stateme...

Nov 20, 2024 · 5 min
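
The arithmetic in the episode description above can be checked with a short sketch. It assumes the tries are independent, which the excerpt does not state explicitly; the function name `chance_of_at_least_one` is hypothetical and chosen only for illustration. It reproduces the 75% coin-flip figure from the excerpt and shows that ten tries at 10% each fall well short of 100%.

```python
def chance_of_at_least_one(p: float, n: int) -> float:
    """Chance that an event with per-try probability p happens at least
    once in n tries, assuming the tries are independent (an assumption)."""
    # The event never happens with probability (1 - p) ** n,
    # so "at least once" is the complement of that.
    return 1.0 - (1.0 - p) ** n


# Two coin flips at 50% each: the 75% figure quoted in the excerpt.
coin = chance_of_at_least_one(0.5, 2)

# Ten tries at 10% each: not 100%, but roughly 65%.
ten_tries = chance_of_at_least_one(0.1, 10)

print(f"two coin flips: {coin:.2%}")
print(f"ten 10% tries:  {ten_tries:.2%}")
```

Adding the per-try chances (10 × 10% = 100%) overcounts the outcomes where the event happens more than once, which is why the complement rule gives a smaller number.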

“OpenAI Email Archives” by habryka

As part of the court case between Elon Musk and Sam Altman, a substantial number of emails between Elon, Sam Altman, Ilya Sutskever, and Greg Brockman have been released as part of the court proceedings. I have found reading through these really valuable, and I haven't found an online source that compiles all of them in an easy to read format. So I made one. I used AI assistance to generate this, which might have introduced errors. Check the original source to make sure it's accurate before you ...

Nov 19, 2024 · 1 hr 3 min

“Ayn Rand’s model of ‘living money’; and an upside of burnout” by AnnaSalamon

Epistemic status: Toy model. Oversimplified, but has been anecdotally useful to at least a couple people, and I like it as a metaphor. Introduction I’d like to share a toy model of willpower: your psyche's conscious verbal planner “earns” willpower (earns a certain amount of trust with the rest of your psyche) by choosing actions that nourish your fundamental, bottom-up processes in the long run. For example, your verbal planner might expend willpower dragging you to disappointing first dates, t...

Nov 18, 2024 · 9 min

“Neutrality” by sarahconstantin

(Image: Midjourney, “infinite library”) I’ve had post-election thoughts percolating, and the sense that I wanted to synthesize something about this moment, but politics per se is not really my beat. This is about as close as I want to come to the topic, and it's a sidelong thing, but I think the time is right. It's time to start thinking again about neutrality. Neutral institutions, neutral information sources. Things that both seem and are impartial, balanced, incorruptible, universal, legitimate, trustw...

Nov 17, 2024 · 24 min

“Making a conservative case for alignment” by Cameron Berg, Judd Rosenblatt, phgubbins, AE Studio

Trump and the Republican party will wield broad governmental control during what will almost certainly be a critical period for AGI development. In this post, we want to briefly share various frames and ideas we’ve been thinking through and actively pitching to Republican lawmakers over the past months in preparation for this possibility. Why are we sharing this here? Given that >98% of the EAs and alignment researchers we surveyed earlier this year identified as everything-other-than-conserv...

Nov 16, 2024 · 14 min

“OpenAI Email Archives (from Musk v. Altman)” by habryka

As part of the court case between Elon Musk and Sam Altman, a substantial number of emails between Elon, Sam Altman, Ilya Sutskever, and Greg Brockman have been released as part of the court proceedings. I have found reading through these really valuable, and I haven't found an online source that compiles all of them in an easy to read format. So I made one. I used AI assistance to generate this, which might have introduced errors. Check the original source to make sure it's accurate before you ...

Nov 16, 2024 · 1 hr 4 min

“Catastrophic sabotage as a major threat model for human-level AI systems” by evhub

Thanks to Holden Karnofsky, David Duvenaud, and Kate Woolverton for useful discussions and feedback. Following up on our recent “Sabotage Evaluations for Frontier Models” paper, I wanted to share more of my personal thoughts on why I think catastrophic sabotage is important and why I care about it as a threat model. Note that this isn’t in any way intended to be a reflection of Anthropic's views or for that matter anyone's views but my own—it's just a collection of some of my personal thoughts. ...

Nov 15, 2024 · 27 min

“The Online Sports Gambling Experiment Has Failed” by Zvi

Related: Book Review: On the Edge: The Gamblers. I have previously been heavily involved in sports betting. That world was very good to me. The times were good, as were the profits. It was a skill game, and a form of positive-sum entertainment, and I was happy to participate and help ensure the sophisticated customer got a high quality product. I knew it wasn’t the most socially valuable enterprise, but I certainly thought it was net positive. When sports gambling was legalized in America, I was ho...

Nov 12, 2024 · 22 min

“o1 is a bad idea” by abramdemski

This post comes a bit late with respect to the news cycle, but I argued in a recent interview that o1 is an unfortunate twist on LLM technologies, making them particularly unsafe compared to what we might otherwise have expected: The basic argument is that the technology behind o1 doubles down on a reinforcement learning paradigm, which puts us closer to the world where we have to get the value specification exactly right in order to avert catastrophic outcomes. RLHF is just barely RL. - Andrej ...

Nov 12, 2024 · 5 min

“Current safety training techniques do not fully transfer to the agent setting” by Simon Lermen, Govind Pimpale

TL;DR: I'm presenting three recent papers which all share a similar finding, i.e. the safety training techniques for chat models don’t transfer well from chat models to the agents built from them. In other words, models won’t tell you how to do something harmful, but they are often willing to directly execute harmful actions. However, all papers find that different attack methods like jailbreaks, prompt-engineering, or refusal-vector ablation do transfer. Here are the three papers: AgentHarm: A ...

Nov 09, 2024 · 10 min