Nielsen's notes on ASI xrisk
Created: September 29, 2023
Modified: September 29, 2023

This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.

NameRedacted wrote some nice thoughts about existential risk from artificial superintelligence (ASI): https://michaelnotebook.com/xrisk/index.html

My attempt to summarize some main points.

Talking about p(doom) is misleading because it denies agency.

I don't love the framing of this objection because it proves too much: by the same logic we should never talk about probabilities of any future event we have the capacity to affect. It's also valid to try to model society's likely behavior, rather than pretending that the intentional stance makes it unknowable.

Still, I don't think p(doom) as a scalar value tells us much of anything. I'm more interested in what kinds of doom we're worried about, and what the plausible paths to avoiding it are.

Practical alignment work is accelerationist, because:

  • it enables us to develop and deploy capabilities that wouldn't be commercially viable if not for the alignment work.

  • alignment as a post-processing step means that unaligned models are still created and easy to obtain.

    This creates an "alignment dilemma": on the one hand, working on alignment is a concrete and practical avenue towards potentially building safe AI systems. On the other hand, it accelerates the development of unsafe systems. There is no clearly right way to proceed.

I've long thought the dichotomy between safety and capabilities research is not as well defined as we'd like. But I'm not neutral on whether AI capabilities get developed - I want it to happen, so I prefer the risk of combined capability + alignment research to the dull "safety" of no research at all (if that were even an option, which of course it is not).

There are "persuasion paradoxes" that make it difficult to form a compelling argument for or against ASI existential risk. Namely:

  1. The most direct way to make a strong argument for xrisk is to convincingly describe a detailed concrete pathway to extinction. The more concretely you describe the steps, the better the case for xrisk. But of course, any "progress" in improving such an argument actually creates xrisk.
  2. Any sufficiently strong argument for xrisk will likely alter human actions in ways that avert xrisk. The stronger the argument, paradoxically, the more likely it is to avert xrisk.
  3. By definition, any pathway to xrisk which we can describe in detail doesn't require superhuman intelligence.

I find (3) the most compelling of the group. Nielsen develops this into a discussion of the "recipe for ruin": does there exist a plan to easily destroy the world, regardless of whether we or an ASI can think of it?

I suspect there probably does (nuclear weapons already come close!).

Three barriers to ASI speeding up science and technology:

  • intellectual barriers, including some that restrict the capabilities of any entity even in principle (e.g., if P != NP, certain problems remain intractable for any intelligence)
  • resource barriers
  • experimental barriers

Experimental barriers are particularly interesting because they cut deep into the philosophy of science. Experiment is most important in areas where we don't already have good predictive models: we don't expect to solve medicine just by thinking about it, because we are constantly learning new things about how the body works. By contrast, we developed nuclear bombs mostly just by thinking about the physics.