This is a link post. Editor's note: Somewhat after I posted this on my own blog, Max Chiswick cornered me at LessOnline / Manifest and gave me a whole new perspective on this topic. I now believe that there is a way to use poker to sharpen epistemics that works dramatically better than anything I had been considering. I hope to write it up, together with Max, when I have time. Anyway, I'm still happy to keep this post around as a record of my first thoughts on the matter, and because it's better th...
Jul 12, 2024•18 min
This is a linkpost for https://www.tracingwoodgrains.com/p/reliable-sources-how-wikipedia-admin, posted in full here given its relevance to this community. Gerard has been one of the longest-standing malicious critics of the rationalist and EA communities and has done remarkable amounts of work to shape their public images behind the scenes. Note: I am closer to this story than to many of my others. As always, I write aiming to provide a thorough and honest picture, but this should be read as th...
Jul 11, 2024•1 hr 22 min
xlr8harder writes: In general I don’t think an uploaded mind is you, but rather a copy. But one thought experiment makes me question this. A Ship of Theseus concept where individual neurons are replaced one at a time with a nanotechnological functional equivalent. Are you still you? Presumably the question xlr8harder cares about here isn't the semantic question of how linguistic communities use the word "you", or predictions about how whole-brain emulation tech might change the way we use pronouns. ...
Jul 08, 2024•27 min
I haven't shared this post with other relevant parties – my experience has been that private discussion of this sort of thing is more paralyzing than helpful. I might change my mind in the resulting discussion, but I prefer that discussion to be public. I think 80,000 Hours should remove OpenAI from its job board, and similar EA job placement services should do the same. (I personally believe 80k shouldn't advertise Anthropic jobs either, but I think the case for that is somewhat less clear.) I ...
Jul 04, 2024•13 min
This is a linkpost for https://www.bhauth.com/blog/biology/cancer%20vaccines.html. Cancer neoantigens. For cells to become cancerous, they must have mutations that cause uncontrolled replication and mutations that prevent that uncontrolled replication from causing apoptosis. Because cancer requires several mutations, it often begins with damage to mutation-preventing mechanisms. As such, cancers often have many mutations not required for their growth, which often cause changes to the structure of some...
Jul 02, 2024•12 min
I. Imagine an alternate version of the Effective Altruism movement, whose early influences came from socialist intellectual communities such as the Fabian Society, as opposed to the rationalist diaspora. Let's name this hypothetical movement the Effective Samaritans. Like the EA movement of today, they believe in doing as much good as possible, whatever this means. They began by evaluating existing charities, reading every RCT to find the very best ways of helping. But many effective samaritans w...
Jul 02, 2024•13 min
About a year ago I decided to try using one of those apps where you tie your goals to some kind of financial penalty. The specific one I tried is Forfeit, which I liked the look of because it's relatively simple: you set single tasks which you have to verify you have completed with a photo. I’m generally pretty sceptical of productivity systems, tools for thought, mindset shifts, life hacks and so on. But this one I have found to be really shockingly effective; it has been about the biggest posi...
Jul 02, 2024•30 min
An NII machine in Nogales, AZ. There's bound to be a lot of discussion of the Biden-Trump presidential debates last night, but I want to skip all the political prognostication and talk about the real issue: fentanyl-detecting machines. Joe Biden says: And I wanted to make sure we use the machinery that can detect fentanyl, these big machines that roll over everything that comes across the border, and it costs a lot of money. That was part of this deal we put together, this bipartis...
Jul 01, 2024•14 min
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. [Thanks to Aryan Bhatt, Ansh Radhakrishnan, Adam Kaufman, Vivek Hebbar, Hanna Gabor, Justis Mills, Aaron Scher, Max Nadeau, Ryan Greenblatt, Peter Barnett, Fabien Roger, and various people at a presentation of these arguments for comments. These ideas aren’t very original to me; many of the examples of threat models are from other people.] In this post, I want to introduce the concept of a “rogue deployment” an...
Jul 01, 2024•15 min
(Cross-posted from my website. Audio version here, or search for "Joe Carlsmith Audio" on your podcast app.) This is the final essay in a series that I'm calling "Otherness and control in the age of AGI." I'm hoping that the individual essays can be read fairly well on their own, but see here for a brief summary of the series as a whole. There's also a PDF of the whole series here. (Warning: spoilers for Angels in America; and moderate spoilers for Harry Potter and the Methods of Rationality.) "I come...
Jul 01, 2024•1 hr 4 min
ARC's current research focus can be thought of as trying to combine mechanistic interpretability and formal verification. If we had a deep understanding of what was going on inside a neural network, we would hope to be able to use that understanding to verify that the network was not going to behave dangerously in unforeseen situations. ARC is attempting to perform this kind of verification, but using a mathematical kind of "explanation" instead of one written in natural language. To help elucid...
Jun 27, 2024•17 min
Summary: LLMs may be fundamentally incapable of fully general reasoning, and if so, short timelines are less plausible. Longer summary: There is ML research suggesting that LLMs fail badly on attempts at general reasoning, such as planning problems, scheduling, and attempts to solve novel visual puzzles. This post provides a brief introduction to that research, and asks: whether this limitation is illusory or actually exists; if it exists, whether it will be solved by scaling or is a pro...
Jun 25, 2024•13 min
Summary: Superposition-based interpretations of neural network activation spaces are incomplete. The specific locations of feature vectors contain crucial structural information beyond superposition, as seen in circular arrangements of day-of-the-week features and in the rich structures. We don’t currently have good concepts for talking about this structure in feature geometry, but it is likely very important for model computation. An eventual understanding of feature geometry might look like a ...
Jun 25, 2024•18 min
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. This is a link post. TL;DR: We published a new paper on out-of-context reasoning in LLMs. We show that LLMs can infer latent information from training data and use this information for downstream tasks, without any in-context learning or CoT. For instance, we finetune GPT-3.5 on pairs (x,f(x)) for some unknown function f. We find that the LLM can (a) define f in Python, (b) invert f, (c) compose f with other fun...
Jun 23, 2024•18 min
This is a link post. I have canceled my OpenAI subscription in protest over OpenAI's lack of ethics. In particular, I object to: threats to confiscate departing employees' equity unless those employees signed a life-long non-disparagement contract; Sam Altman's pattern of lying about important topics. I'm trying to hold AI companies to higher standards than I use for typical companies, due to the risk that AI companies will exert unusual power. A boycott of OpenAI subscriptions seems unlikely to gain e...
Jun 21, 2024•3 min
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. This is a link post. New Anthropic model organisms research paper led by Carson Denison from the Alignment Stress-Testing Team demonstrating that large language models can generalize zero-shot from simple reward-hacks (sycophancy) to more complex reward tampering (subterfuge). Our results suggest that accidentally incentivizing simple reward-hacks such as sycophancy can have dramatic and very difficult to revers...
Jun 20, 2024•16 min
After living in a suburb for most of my life, when I moved to a major U.S. city the first thing I noticed was the feces. At first I assumed it was dog poop, but my naivety didn’t last long. One day I saw a homeless man waddling towards me at a fast speed while holding his ass cheeks. He turned into an alley and took a shit. As I passed him, there was a moment where our eyes met. He sheepishly averted his gaze. The next day I walked to the same place. There are a number of businesses on both side...
Jun 18, 2024•7 min
ARC-AGI post: Getting 50% (SoTA) on ARC-AGI with GPT-4o. I recently got to 50%[1] accuracy on the public test set for ARC-AGI by having GPT-4o generate a huge number of Python implementations of the transformation rule (around 8,000 per problem) and then selecting among these implementations based on correctness of the Python programs on the examples (if this is confusing, go here)[2]. I use a variety of additional approaches and tweaks which overall substantially improve the performance of my met...
Jun 18, 2024•35 min
Have you heard this before? In clinical trials, medicines have to be compared to a placebo to separate the effect of the medicine from the psychological effect of taking the drug. The patient's belief in the power of the medicine has a strong effect on its own. In fact, for some drugs such as antidepressants, the psychological effect of taking a pill is larger than the effect of the drug. It may even be worth it to give a patient an ineffective medicine just to benefit from the placebo effect. T...
Jun 15, 2024•15 min
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. As an AI researcher who wants to do technical work that helps humanity, there is a strong drive to find a research area that is definitely helpful somehow, so that you don’t have to worry about how your work will be applied, and thus you don’t have to worry about things like corporate ethics or geopolitics to make sure your work benefits humanity. Unfortunately, no such field exists. In particular, technical AI...
Jun 14, 2024•9 min
Preamble: Delta vs Crux. This section is redundant if you already read My AI Model Delta Compared To Yudkowsky. I don’t natively think in terms of cruxes. But there's a similar concept which is more natural for me, which I’ll call a delta. Imagine that you and I each model the world (or some part of it) as implementing some program. Very oversimplified example: if I learn that e.g. it's cloudy today, that means the “weather” variable in my program at a particular time[1] takes on the value “cloud...
Jun 13, 2024•7 min
Preamble: Delta vs Crux. I don’t natively think in terms of cruxes. But there's a similar concept which is more natural for me, which I’ll call a delta. Imagine that you and I each model the world (or some part of it) as implementing some program. Very oversimplified example: if I learn that e.g. it's cloudy today, that means the “weather” variable in my program at a particular time[1] takes on the value “cloudy”. Now, suppose your program and my program are exactly the same, except that somewher...
Jun 10, 2024•7 min
(Cross-posted from Twitter.) My take on Leopold Aschenbrenner's new report: I think Leopold gets it right on a bunch of important counts. Three that I especially care about: Full AGI and ASI soon. (I think his arguments for this have a lot of holes, but he gets the basic point that superintelligence looks 5 or 15 years off rather than 50+.) This technology is an overwhelmingly huge deal, and if we play our cards wrong we're all dead. Current developers are indeed fundamentally unserious about th...
Jun 07, 2024•5 min
Last month I posted about humming as a cheap and convenient way to flood your nose with nitric oxide (NO), a known antiviral. Alas, the economists were right, and the benefits were much smaller than I estimated. The post contained one obvious error and one complication. Both were caught by Thomas Kwa, for which he has my gratitude. When he initially pointed out the error I awarded him a $50 bounty; now that the implications are confirmed I’ve upped that to $250. In two weeks an additional $750 w...
Jun 07, 2024•5 min
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. We are pleased to announce ILIAD: a 5-day conference bringing together 100+ researchers to build strong scientific foundations for AI alignment. Apply to attend by June 30! When: Aug 28 - Sep 3, 2024. Where: @Lighthaven (Berkeley, US). What: A mix of topic-specific tracks, and unconference style programming, 100+ attendees. Topics will include Singular Learning Theory, Agent Foundations, Causal Incentives,...
Jun 06, 2024•4 min
Since at least 2017, OpenAI has asked departing employees to sign offboarding agreements which legally bind them to permanently—that is, for the rest of their lives—refrain from criticizing OpenAI, or from otherwise taking any actions which might damage its finances or reputation.[1] If they refused to sign, OpenAI threatened to take back (or make unsellable) all of their already-vested equity—a huge portion of their overall compensation, which often amounted to millions of dollars. Given this i...
May 31, 2024•5 min
As we explained in our MIRI 2024 Mission and Strategy update, MIRI has pivoted to prioritize policy, communications, and technical governance research over technical alignment research. This follow-up post goes into detail about our communications strategy. The Objective: Shut it Down[1]. Our objective is to convince major powers to shut down the development of frontier AI systems worldwide before it is too late. We believe that nothing less than this will prevent future misaligned smarter-than-h...
May 30, 2024•14 min
Previously: OpenAI: Exodus (contains links at top to earlier episodes), Do Not Mess With Scarlett Johansson. We have learned more since last week. It's worse than we knew. How much worse? In which ways? With what exceptions? That's what this post is about. The Story So Far. For years, employees who left OpenAI consistently had their vested equity explicitly threatened with confiscation and the lack of ability to sell it, and were given short timelines to sign documents or else. Those documents con...
May 28, 2024•1 hr 6 min
Contact: patreon.com/lwcurated or [perrin dot j dot walker plus lesswrong fnord gmail]. All of Solenoid's narration work can be found here.
May 28, 2024•1 min
Crossposted from AI Lab Watch. Introduction. Anthropic has an unconventional governance mechanism: an independent "Long-Term Benefit Trust" elects some of its board. Anthropic sometimes emphasizes that the Trust is an experiment, but mostly points to it to argue that Anthropic will be able to promote safety and benefit-sharing over profit.[1] But the Trust's details have not been published and some information Anthropic has shared is concerning. In particular, Anthropic's ...
May 28, 2024•5 min