AI reflections master: Nonlinear Function
Created: July 23, 2023
Modified: November 27, 2023

AI reflections master

This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.

This page is a general jumping-off point for organizing my thoughts about the AI research landscape, where the field is, where it is going, what approaches are promising, what is over or under invested, what are we even trying to do here, how I think about safety versus capabilities, and what values drive me to care about AI and what sort of work that suggests. This might become the outline for a "book", for a series of blog posts, or just for my own thinking about what to work on.

Ultimately the goal is to connect my my values to specific directions of work.

general TODOs:

The Dream

Why is AI so compelling? What drew me into working on it? What kind of outcomes do I want to build towards?

The dream is of a "fully general solution" to all problems. There's an unhealthy side to this --- I have personal psychological reasons for feeling like I need to seek this kind of grandiose achievement. We shouldn't let this dream cloud our eyes to the ways in which AI might practically develop into a less-than-fully-general solution (technology has limited ability to solve political problems, and spiritual problems, though certainly it can help in both domains). But the potential for value creation is real and extremely compelling.

fully automated luxury gay space communism

Good futures from AI:

  1. Unlimited access for everyone to the best teacher in the world, and the best therapist in the world. (maybe possible with existing tech)
  2. Automate all drudgery, all manual labor (will be possible in a ~decade, but deploying robots is a huge capital investment, will take decades to roll out)
  3. Vastly increase the pace of science and technological progress. Innovations lead to clean energy, medical breakthroughs leading to dramatic lifespan improvements, exploiting the resources of the solar system, more comfortable lives for everyone with no material scarcity.
  4. Brain-computer interfaces give us access to design our own experiences, to direct prevention and healing of mental health problems (access to the MDMA state whenever we want, if drugs haven't already enabled this), to transcendent love and compassion, direct command of godlike intelligence, or just really cool VR games.
  5. Eventually, we understand the true nature of consciousness and how to engineer minds that have experiences as real, meaningful, and morally significant as our own. Perhaps at this point we transcend the limits of the flesh and tile the universe with hedonium, or maybe our enlightenment includes an understanding that this is not necessary.

but also: "When you see something that is technically sweet, you go ahead and do it." is that a good reason?

Anti-religious thinking

ideas like "people are special", "people have souls", that there is something magical or non-physical about human minds, that only God can create new life, are deep-rooted in the culture. and they are obviously false at the deepest level. that tension just creates such a sense of opportunity to a young rebellious thinker. so much of how society is designed, of what we think of as our limitations, is built around presumptions of human specialness, the uniqueness of human decision-making.

so often we see claims that "AI will never be able to…" and any such claim is just blatantly false on its face. of course AI can, in principle, do anything people can do. of course there is nothing innately special, ineffable, about human intelligence.

This tension was salient to me growing up in religious Tennessee, in the time of the new atheist movement (Dawkins, Harris, Hitchens, etc), contrasted against the power structure of the second Bush administration. The atheists were clearly right, but they also felt like rebels, like a minority. Now we look back on them with a bit of distaste, because ontological atheism has obtained pretty close to total cultural victory, and it's clear in retrospect that religion has lots of valuable functions that for most people are not really about the ontological claims. But at the time

I think that observing this blindness convinced me of the potential of working on AI. If so much of the world is built on wrong assumptions, that implies a hugely rich surface of opportunities for change.

At the same time, what kinds of AI are actually buildable is an empirical fact about the world. "The market can remain irrational longer than you can remain solvent". For AI to be worth working on, it needs to be not just impactful and possible in principle, it actually needs to feel within the realm of plausibility.

Intuitive sense of possibility

Why did I feel unusually optimistic about AI in 2010? I was sure that it was possible, would be transformative, and in some sense should be 'easy'?

What is the aesthetic sense around the power of connectionism, general intelligence, and the bitter lesson?

I'm sure some of this was the same sense of naive optimism that ran through the earliest AI researchers, for example in the 1956 Dartmouth summer workshop that aimed to make significant progress in general AI. I hadn't experienced or even really been taught much about the subsequent failures and disappointments. I was young, smart, and the only person I knew who really thought or cared much about AI, so it was possible to convince myself that there was a unique new opportunity with lots of low-hanging fruit.

simple principles like compression

free lunch theorem

intelligence can't be "too hard" or else evolution would have never gotten to it. (of course, it can still be as hard as self-healing nanotechnology, since evolution also got there). it wasn't designed by God as some perfect fragile mechanism of thousands of interlocking parts. It has to be a pretty robust target, so that a stupid optimization process with a simple objective (reproductive fitness) will eventually get there.

also just optimism. we haven't worked on computers or AI for too long, in the grand scale of things, and there's a huge pessimism bias from religious thinking. given a wide unexplored territory it's natural to assume there's gold there. early pioneers did the same thing (Dartmouth summer program). and clearly those expectations were slightly dashed.

there's no reason to think that a slightly-improved chimpanzee brain is the ceiling of possible physically-instantiable intelligence.

The human brain itself also provides evidence for simple general-purpose intelligence: (arguments by way of Jeff Hawkins)

  • the neocortex effectively reuses the same 'circuit' (a cortical column) everywhere. The differences between the cortical areas for vision, language, etc. are minor, and if one is damaged, other areas adapt to take its place.
  • the rapid increase in cortical size over the past few million years is too fast for evolution to have designed a bunch of new specialized capabilities - it must be that we're just "scaling up" an existing generally effective building block.

there must be a simple answer

Maybe the core intuition (which I must have written about somewhere?) is that general techniques are simple. Solutions tend to reflect the problems that they solve, like a key fits into a lock. A method to solve a gnarly, complex problem, like making an car, will generally need to be about as complex as the problem --- in fact, since hard problems are often solved by divide-and-conquer approaches, the structure of the solution often mirrors the structure of the problem. (for example, the "solution" to car-making embodied in the Ford Motor Company is distributed across a network of many suppliers, each with the knowledge and skill to manufacture a specific component of the car, which are then assembled to produce the final product). But a general method to solve any problem can't rely on problem-specific structure. Whatever structure it encodes must be common to all problems. And the more general the set of problems you consider, the less and less structure they share, so the most general methods must be, in some sense, the simplest.

you might imagine that one way to build a general purpose method is to just collect a huge repertoire of special-purpose methods that can be invoked as needed: this corresponds to building a key ring that contains every possible key (rather than a single lock pick) - the library of Babel in key-space. This is "simple" in a certain information-theoretic sense - the set of all possible keys is trivial to describe, being only one bit away from its complement, the empty keyring. but there are practical problems.

  1. You need to actually manufacture every possible key, which is expensive and impractical. (analogously: you need to research a special-purpose method for every problem you might ever want to solve).
  2. You might discover later that you've missed part of the space. Maybe you encounter a lock with a shape of key you never considered.
  3. If you have all possible keys, then figuring out which key to use for a given lock is as hard as building a new key from scratch --- analogous to finding the book you want to read in Borges' library of Babel.

Similarly, if you have a pile of millions of special-purpose methods, figuring out which one applies to your problem might be harder than solving the problem from scratch.

reduces the problem to simply deciding which method to apply in a given situation. .

But it might be that the set of all possible keys (/methods) is a useful idealization. Perhaps there is structure to this set. Perhaps we can think of a good general-purpose method as a joint compression of the set of all possible keys and the key-lock mapping. It recognizes that:

  • the structure of both locks and keys breaks down into positions of individual pins
  • we can build up a key from the positions of individual pins
  • the lock gives us feedback on whether an individual pin is correct, so we can figure out one pin position at a time, which is combinatorially faster than needing to try them all together

An important distinction to make is between information-theoretic simplicity and mechanistic simplicity (?).

  • information theory / Kolmogorov complexity: keyring of all possible keys / pile of all possible special-purpose methods contains no information. it's no different from the empty keyring.
  • mechanism:
  • we still have to figure out which method to use for a given problem. which requires either trying them all --- obviously intractable --- or somehow breaking down the space of methods so that

If we didn't have the human brain as an existence proof, it might not be obvious that effective general-purpose methods are possible. But clearly they are, and so the argument above leads us to think they must be relatively simple. There might be "one weird trick" to general intelligence, or perhaps a few, but there won't be a thousand weird tricks.

hard vs soft

discrete vs continuous

symbolic vs connectionist

"neat" vs "scruffy"?

program induction vs neural nets

problems are not formulated for you

problem formulations are not ever really real

there are always many models, multiple ways to formulate a problem, useful in different situations. the specialist might be able to limit themselves to using a particular map, but the generalist needs to be able to move fluidly between maps.

Insight

the most satisfying moments in thinking are when something "clicks" into place. you see the outline of a proof, or of a plan, and it feels "right" somehow, even if there are details to be worked out. or you see a deep and fruitful connection between what had been disparate ideas. what is "going on" in the brain at this point, and how can we build that into automated systems?

  • connections to grokking during ML model training

there's a sense of falling right into a basin of higher likelihood or simpler explanation - realizing that a different way of seeing things is more accurate, that what you thought were two concepts are actually the same (or related in some way), or that some hypothesis (eg the existence of a separate self) isn't needed to explain your data.

I think you'd expect to see something like this specifically during multi-task training.

perceptually, the Necker cube and similar illusions: we have different "modes of seeing" and can flip between them

General intelligence

"the last thing we need invent"

the idea of something fluid, general-purpose, the universal solvent to all problems.

what this is precisely, if it's even a meaningful thing to discuss, or something that humans have, is all unclear. but there's a real intuition there. there is something that makes

general intelligence

The state of play

recent work has been incredibly exciting. the idea of general intelligence is starting to feel more in reach.

LLMs

generally emergent intelligence through large-scale modeling

decision transformers and "planning as inference"

history: how did we get here?

was deep learning inevitable?

sometimes the emergence of deep learning is framed as neural nets winning a hardware lottery, with the implication that they simply "got lucky". A collorary is that other learning methods didn't lose a fair fight but are simply "unlucky" and might work equally well (or better) if provided with appropriate hardware. It's important to be aware of how hardware shapes and limits our creativity, but I find the strongest versions of this thesis uncompelling. Some of the details may be contingent, but deep learning on massively parallel hardware feels "right", and clearly part of the ultimate solution.

Something like thought vectors or high-dimensional distributed representations was always going to be necessary for rich fluid intelligence

end-to-end learning

simple compressive objectives

gradient-based optimization

connectionism, deep learning, and the importance of computation

basic vs applied tension

most work in AI over the past many years has not felt like "part of the story". generally, any work that attacks a particular application, by building in a specific understanding of the complexities of that application, can have utility in the short term but is fundamentally contrary to the bitter lesson. examples:

  • the history of NLP, building in linguistics vs pure statistical learning. "every time I fire a linguist the score goes up"
  • pre-deep-learning vision.
  • probably 99% of applications of things like langchain these days
  • all of applied NLP for narrow tasks (sentiment analysis, etc)
  • recommender systems
  • so much of "ML for bio", robotics
  • even planning, MCTS.

if I have a strong aesthetic sense that there is a "simple answer", it can feel pointless to work on things that add complexity. on the other hand, all of the actual value of AI is about applications. even the value of "basic research" is ultimately connected to whether it drives new applications.

what does the simple answer look like?

if there is a simple set of ideas that lead to general-purpose intelligence, what shape will this have?

general-purpose intelligence

a) connectionist architectures trained on large amounts of data. transformers are currently the standard. but will eventually need to be replaced by something recurrent, with effectively unlimited context.

b) agent architectures

c) training regimes

  • mix of unsupervised modeling with RL a la LeCun's Cherry
  • connected to architectures bc may need different components with different objectives

d) datasets or "schools". for pure unsupervised learning, we'll want datasets that

e) tool use

meta-level shape of machine learning

applications of particular interest

  • drug discovery
  • education
  • therapy and life coaching - modeling human emotions and helping us with decision-making
  • robotics: eliminating all drudgery
    • household robots
    • manufacturing robots
  • scientific research generally. working in silico and connecting model-driven hypothesis generation and testing to experiment design and to theory.
  • automating programming
  • automating math

I think it helps to sketch out visions of a few of these. what pieces do they need, how do they fit together, and what does the endgame tech stack look like? What is the necessary complexity?

even after we figure out general learning principles, we still need physical robots (the design and building of which may eventually be automated, but will still involve many bits of technical insight, just as a lot of human DNA codes for the details of how our bodies work)

core agency vs tools

The history of AI research has involved the development of lots of technology that seemed to capture aspects of intelligence, but in ways that are fundamentally inhuman, hard-coded rather than learned, inflexible. And yet this process has been enormously fruitful, spinning off most of the rest of computing as we know it.

  • search and planning algorithms
  • discrete optimization
  • continuous optimization
  • control theory
  • databases
  • search engines and information retrieval
  • high-level programming languages
  • Bayesian reasoning, graphical models, and probabilistic programming
  • causal inference
  • computational game theory
  • formal theorem-proving systems
  • now: LLMs and "generalized autocomplete"

Computing research has also developed computer networks, the Internet, mechanisms for coordinating large-scale distributed computations, consensus protocols, etc.

These tools have been useful augmentations for human intelligence. In some ways they have dramatically increased our ability to model and reason about our world. Even if they are not fully fluid intelligence, their structure and explicitness can be virtues. They allow for firm abstraction barriers, control complexity, have behavior that we can easily reason about, and can scale to superhuman performance in limited domains.

So while much of this work has been a digression from the core dream of "human-like" or fully general AI, it has still created useful tools that are complementary to human fluid intelligence. Humans find these specialized solvers useful because they work differently than our minds do. These tools will of course also be available to future AI systems, and good systems will know when to use them. Even if fluid intelligence can figure out how to sort a list of numbers, it is generally wiser and more effective to call an established, tested, optimized sorting routine. Similarly, future intelligences will store structured data in databases, will use logical and probabilistic reasoning tools to draw conclusions, will call out to physics simulators and game engines to simulate reality, etc.

Boaz Borak introduces the notion of an intelligence forklift to describe a tool that is available to augment any intelligence. Developing "forklift"-style tools with legible interfaces will empower future fluid AIs, but also humans, and so there is a sense in which this is neutral from a safety perspective (although there may be offence / defense imbalances, and of course future fluid AIs may still have the ability to use these forklifts much more virtuosically than humans do).

TODO: is there a real separation between core fluid intelligence and "forklift"-style tools? or is it "forklifts all the way down"? I have an intuitive model in which a conscious, fluid, unitary agent is using tools. But if agency itself is a fiction, that might be wrong. I do think there's a boundary between components that are learned end-to-end and components with fixed interfaces. An agent isn't necessarily learned fully end-to-end, since it might have a fixed architecture, with individual components learned according to different principles (and potentially mix-and-matched). Just as the "self" is an illusion of human psychology, there may be no unitary "agent" in the design blueprint for a general AI.

What counts as AI?

There's a ton of hype and a real sense of possibility and promise around working on "artificial intelligence".

Classically, AI in academia is just "whatever doesn't work yet".

By contrast, many claimed applications of AI in industry are of relatively 'boring' techniques - the things that do work.

Have we passed the Turing Test?

What are we missing?

Dileep George has dubbed the current paradigm of scaling large language models the dirigibles era of AI. In the early days of flight, the trend was towards building ever-larger dirigibles and giving them bigger engines, and this scaling was seen as a reliable approach to unlock greater range and capacity. This approach petered out, in part because it hit practical limits, and in part because heaver-than-air flight eventually got good enough to compete and surpass dirigible capabilities.

It seems clear that

Extensions of the current paradigm

Pipelines, scratchpads, chains of thought

chain of thought

language model cascade

Architectural and algorithmic tweaks

the transformer has been quite successful as a general-purpose architecture, originally for large-scale language modeling, and now for images and video as well.

the empirical evidence so far is that the original transformer architecture holds up quite well. tons of tweaks have been tried and only a few have stuck.

  • multi-query attention
  • slightly improved optimizers (nadam?)
  • mixture of experts
  • variants of positional encoding
  • ???

it's tempting to think that the lesson of this robustness is that architecture doesn't matter; only scale matters (parameter count and data).

but transformers seem very data inefficient. is this impression real? if so, it implies that they are missing some inductive bias that humans have, and that there's significant room to improve performance by building in whatever that bias is (presumably by a new architecture).

candidates:

  • "global workspace": rather than unconstrained attention separately at every position and layer and attention head, the system has a single "conscious experience", an attention mechanism whose contents are unitary and shared throughout the model. I believe this is Bengio's basic idea with the "consciousness prior" (but TODO look more into this)
  • versions of multiplicative interaction, fast weights to support greater modularity, and more easily express branching computation structures. (conceivably this bias might be better for learning to imitate a Python interpreter, but not that helpful for natural language modeling?)
  • task-specific variants for multimodal modeling. architectures for, e.g., video, which combines image modeling, audio, sequential structure, perhaps a need for hierarchical representation to predict at multiple levels of abstraction
  • iterative or "feedback" structure. transformer models at inference time are purely feedforward, while we believe that human understanding of the world combines bottom-up and top-down processing, combining sensory data with expectations driven by our existing model of the world (see, e.g., Recursive Cortical Networks).

Are transformers / LLMs data-inefficient?

Obviously GPT-4 sees many more tokens of language than a human ever does, and yet fails to learn all the linguistic skills that a human does, so there is an obvious sense in which it must use these tokens inefficiently.

However, humans see a lot more than just language. We have a rich sensory stream, which provides a high-bandwidth signal for learning sophisticated internal world models. When we do learn from language, it is in a richly contextual environment: we don't experience language tokens in isolation but as part of a context, something said in a particular world state by a particular speaker, perhaps trying to achieve a particular goal.

q1: how many cumulative bits of learning signal does a ten-year-old have? how does this compare to gpts?

in terms of pure tokens: a human getting 50k tokens/day would have 180 million tokens by age 10. This corresponds to maybe 1GB of training data if we equate a token with 5 bytes of uncompressed text. This is much less than even GPT-2, which was trained on 40GB of web text. Modern models are trained on trillions of tokens (GPT-4's training set is rumored to be 13T tokens), corresponding to tens of terabytes of data.

but now consider the full sensory stream. let's generously assume that we can get away with low-definition video. Youtube's lowest quality setting (144p) requires about 1 MB/minute, which comes to 5,256,000 MB or about 5 TB over ten years. Of course, higher-definition video would increase this by one or two orders of magnitude. But this number is at least of a similar order to the training sets of modern language models. So we could loosely, plausibly, say that GPT-4 is trained on about the same amount of data as a human. The data are very different in kind, so we get very different sorts of intelligence out, but at least from this basic analysis I don't think we can rule out the possibility that a video-GPT model trained on equivalent data to a human could develop a similar level of world understanding (I do tend to think that it wouldn't, but it'd be interesting to see! of course, much would depend on the details of the video-GPT model which itself will necessarily involve some modification to transformer ).

Although the total amount of data may be similar, humans do have the advantage that their experience is actively selected: we specifically explore, ask questions and seek out informative experiences, while GPTs passively experience random text. So the average bit of human experience may be "worth more" in terms of knowledge gained than GPT experience. This is potentially a further point against transformer "inefficiency" on the architectural level; we expect that SGD as a training regime bakes in its own inefficiency relative to active learning.

it's still clear that GPT training is inefficient in some obvious ways

  • it can't remember a fact after seeing it a single time
  • ???

Multimodality and "grounding"

Beyond autoregressive architectures

Yann LeCun argues that autoregressive models are doomed to never be reliable. If a model generates one token at a time, and has probability ϵ\epsilon of emitting a 'bad' token that takes it out of the training distribution, then a model generating NN tokens will go off the rails with exponentially increasing probability as NN\to\infty.

It's not clear to me how cleanly this applies to natural language, where the 'data distribution' is extremely broad.

We can also train models to self-correct, to move back towards the data distribution when they see something that doesn't make sense.

Still I think the basic intuition is correct that doing everything at a single fine-grained timescale is not going to be robust. Humans don't work strictly token-by-token when we generate sentences. We have something we're trying to say, at the level of the sentence or higher, which doesn't change even if we accidentally speak a wrong word and need to correct ourselves.

This doesn't necessarily imply a move away from autoregression, but it could imply multiple, latent autoregressions at different timescales, over "thoughts", "goals", etc. It could also imply non-autoregressive approaches like diffusion models over latents (in a sense this is still autoregressive, but not in the time domain). Still, ultimately we are physical systems and thus Markov processes: we evolve over time and we need to find ways of doing this robustly.

planning

I think Yann is envisioning agents that do runtime planning as an alternative to autoregressive generation. Instead of generating tokens one-by-one, they plan a sequence of N tokens and can use backtracking search to avoid 'mistakes'.

Obviously there's value in planning, but I don't think this makes sense as a general story for agents. Planning is expensive: you need to call the base model many times over, which is why it's not widely used as a strategy for LLM decoding (if you're willing to 10x your compute budget to plan with a width-10 beam, you're probably better off just making your model 10x larger). It also has diminishing returns in conversations or other interactive situations, where you need to react in real time to input you didn't anticipate (no plan survives contact with the enemy). And it only works as well as the objective functions you specify; the planner will actively work to exploit any weaknesses (reward hacking).

Of course, any sophisticated intelligence will need to have higher-level goals and be constantly constructing and refining hierarchical plans to achieve these goals. But its token-level emissions will always need to be in a closed loop with the real world. Error is always a possibility, because the world never quite matches your plans. Just as autoregressive models always have some probability of error, the world model you plan against will always have some probability of missing important aspects of the real world, and a large branching factor means there's always a probability that your planner doesn't even consider the important cases. So planning isn't a solution to the problem of autoregressive models making mistakes.

Another way to think about planning is on the implicit vs explicit axis: does it make sense to explicitly build in planning behavior in a system, or to allow networks to learn their own planning algorithms? There is obviously a huge amount of power in planning algorithms like MCTS, and the explicit representation of objectives and possible trajectories is great for interpretability. But I suspect ultimately that they will fall to the bitter lesson: gradient ascent can build better planning algorithms than we can. hierarchical planning is notoriously difficult, in part because there's never really a clean separation between the levels of abstraction, in the world model (which may itself be many models) and in the action space. It's a messy business, and imposing our own conceptual boundaries on it will inherently make it less efficient. So if we end up with systems doing explicit hierarchical planning, it will be ultimately be a choice trading off power for interpretability, just like any other integration of symbolic methods.

Memory, recurrence, online adaptation

Explicit agency

Hardware limitations

Most of the ideas of modern neural nets were known by the early 1990s, but empirically they didn't work. We had the simple answer and yet it wasn't enough! Progress had to wait another twenty years, for hardware power (GPUs) and data availability (the Internet) to catch up. Are there ideas today that might be in a similar position, basically correct but just not working yet? Or from the other angle, what technologies might unlock further AI progress?

  • analog computing: unlocks new regimes of low-power scaling (I am not bullish on quantum computing for AI, but it's at least in principle another candidate for hardware capabilities)
  • widely deployed household robots: massive scalable data collection and real-world feedback (self-driving cars may get there first)
  • Chinese panopticon-style data collection - longitudinal profiles of human lives

"Consciousness"

Humans seem to have a particular information architecture, in which some aspects of our world model are available at any moment to 'conscious experience' . Our conscious awareness determines what we can speak about and what information is available to rationally guide decisions. It likely drives top-down perceptual processes. We can think about it as a bottleneck that brings together multiple modes of perception and forces a compressed, abstract representation.

There are deep questions about the basis of "phenomenal consciousness" - the aspect of consciousness that we can only know intimately for our own selves, the ability to have an experience, for there to be "something it is like" to be me. But independent of this, we can observe "access consciousness" in ourselves and others: the fact that the brain does seem to have this single bottleneck, a global workspace where information comes together and is processed and shared between modules. Yoshua Bengio observes that this tends to regularize our representations towards sparse factor graphs, and theorizes that this 'consciousness prior' may be beneficial for generalization.

see also: https://arxiv.org/abs/2308.08708

The structure of the possibility space

Is intelligence necessarily general?

general intelligence

intuitions:

  • some tasks (like what??) are "AI-complete": doing any of them requires the ability to do all of them
  • a fluid intelligence, a general-purpose learning algorithm, can in principle learn anything
  • the world is built for humans, and anything that can assist us will need a full complement of human-like skills and a deep understanding of humans. it might not be "fully general" in a deep Platonic sense but would have to be "at least as general" as humans.

on the other hand, the idea that wisdom comes from experience, from good models of specific parts of the world, cuts against this somewhat. humans may be general learning algorithms, but nonetheless we specialize. We have different experiences, different training, and therefore different capabilities.

and of course it's possible to build specialized AIs, because we already do it. AlphaGo doesn't need a perceptual system, language, or any understanding of human psychology or the history of war; it just knows how to play Go.

AI that we deploy for many situations won't need to be fully general, and so it won't be. Factory workers need very specific motor and vision capabilities, but not an understanding of literature, or a model of the world outside the factory.

Most systems will be deployed on hardware sufficient for inference and perhaps mild adaptation, but not sufficient to evolve a totally new mind.

And in fact no system can be fully general in the sense of having all capabilities it is possible to have. This is because capabilities are limited by models, and a fully general model would be a map as big as the territory, impossible for an agent embedded in the world.

Language models are inherently somewhat general, in that they get broad exposure to the range of human concepts. They are data-hungry and so there is an advantage to throwing in everything we can. Still, their generality is that of the wordcel, able to opine on anything but not deeply capable in any area. It does not give them general causal power in the world.

I wonder if we can steelman the case for generality from language models. They really do seem to be learning some general capabilities, core reasoning. They are the epitome of multi-task learning, where training on a range of tasks improves all of them. This strongly implies some shared structure: maybe shared fully generally across all tasks, or maybe aspects shared only across broad groups of tasks (e.g., mathematical reasoning isn't always useful, but it has a lot of usefulness).

If there's some set of general capabilities that all systems learn, do those count as a core that can be disentangled, crystallized out of the murky web of knowledge and associations in the training data? can the system remember how to think mathematically even after it has forgotten most of the specific mathematics it was trained with? Humans do this, more or less - we go through a period of general education before specializing, and most of us forget most of the specific facts we were exposed to in school. we form high-level abstractions, a map, after which we can forget the much of territory.

and yet it remains that we do specialize. we do not get infinitely good at everything.

will machines instantly learn from one another?

I've been developing this thesis that capabilities will be limited by real-world experience. But of course humans can't share our experience, while machines can.

Geoff Hinton worries that Whenever one [model] learns anything, all the others know it.

Breaking this down:

  • learned skill policies can be copied infinitely. once we have one AlphaGo, we have as many as we want.
  • real-world data can be reused. once a video explaining how to cook an omelette is on Youtube, AIs can keep training on it forever.
  • it's not self-evident that skills can always be disentangled from a larger model. if GPT-4 knows how to write poetry, that doesn't imply that we can pull out that skill and give it to Claude while preserving the rest of Claude's skills. (though perhaps we can; certainly this is an interesting area of research). At minimum we can of course provide GPT-4 as a tool. but this breaks the web of fluid connections that would potentially allow Claude to gain insight from the expertise.
  • it is true that building good simulator AI models then levels up the set of skills that any future AI is able to learn purely in simulation.
  • we can of course build devices, hardware systems that are inference-only, that will not gain new capabilities unless explicitly upgraded.

From a fast takeoff perspective, it is still the case that radically new capabilities probably require new experience. Machines will be much faster than humans at "1 to many", but to take new capabilities from "0 to 1" they will still be bottlenecked by the need to explore new territory.

Governance, incentives, and ownership

open-source AI

Personal AI ownership: it's inevitable that we have AI companions. Kids will have AI tutors, maybe AI friends or therapists. Adults will have assistants, therapists, advisors, friends, even lovers. These might all be different characters, or a single character. Some may be managed by organizations - schools or employers. And no doubt there will be cloud services like ChatGPT that power many of the capabilities. But in the limit, it will be possible to run a pretty-good, at least human-level AI on consumer hardware, and it's important to me that we allow the freedom to do this, to tinker, for people who want to have full control over one's AI companions.

this is important for all the reasons that open-source software is important:

  • it's a basic guarantor of personal freedom
  • shared commons: universal basic computer functionality that everyone has access to and can build on
  • supports education: students can take things apart and see how they work
  • supports competition: you can start a business with mostly open-source tools, without needing to create everything yourself

The main argument against this is that AI can be dangerous. If a personal AI has enough intelligence to design biological weapons, plan a military attack, hack computer systems, etc., do we really want those capabilities in the hands of every human being?

but: any AI's ability to come up with nefarious plans will be limited by ability to simulate the world, to fully understand the consequences of its actions. and whatever computational resources are available to individuals, governments will have far greater resources.

It's not obvious that all the threats will be symmetric between offense and defense.

"threats" from AI proliferation

misinformation, online bots

subverting democracy

entrenched bias

weapons of war

autonomous weapons

what happens when anyone can build an assassination drone?

  • the real threat here is drone hardware, not the AI as such. a remotely-piloted drone would still be dangerous. autonomy helps evades some countermeasures like jamming, but it's not a paradigm shift.
  • we will develop other countermeasures. sensors, defensive weapons (lasers, nets), fans, secured spaces.
  • robotic weapons (autonomous or not) are more deniable, the user puts themself at less risk. but

we already have autonomous weapons: mines. and they are totally indiscriminate. smarter mines (not even AI, just put in a deactivation protocol or whatever) would be a blessing.

autonomous weapons might make war "easier", "less costly" in human terms, and so more thinkable. that's a real worry, but note that lowering the human cost of war is itself a major achievement.

there is a serious arms race dynamic here. even if it would be better for no one to have autonomous weapons, we still need to build them. but even that premise isn't obvious to me. precision targeting has generally been a boon for civilians in war. a world with more autonomous weapons might actually be a better world.

I think the core worry is that autonomous weapons enable authoritarianism. They create a single point of control, the possibility that a state, a totalitarian government (or in the end, an unfriendly AI, but the nature of the controlling entity isn't really important here) can maintain overwhelming force over a large population. Thus far in history, even a totalitarian dictator must rely on a core base of support, the party, a group of humans large enough to staff a government and an army. These might only be a minority, and can and will naturally have their own interests, disagreements, and fundamentally human motivations (even if twisted and tribal), which limits how really weird and malevolent such a state can become. An autonomous military controlled by a malevolent force really could be a boot stamping on humanity's face, forever.

From this standpoint, proliferation and competition is perhaps even a good thing. The best way to avoid a singleton monopoly on force is for military capability to be somewhat distributed. Maybe there are multiple societies with powerful militaries (US, China, Europe, etc), and maybe something similar to the US second amendment provides citizens with their own ability to resist government force.

other concerns:

  • security
  • obedience

Of course we won't build an autonomous military with the intent to create a totalitarian dictatorship. An American robot army would (one hopes) be designed in service to democratic government. We will not replace the entire command structure with AI - to confirm AI to a Cabinet-level position would require a substantial revolution in our constitutional system that I don't see coming anytime soon. (not at least until we are ready to replace the President themself). So the more concrete worry would be that the command structure itself could lose control over its tools, either through hacking by an adversary, subtle manipulation and capture by "consultant" AIs, or by some subgroup itself misinterpreting its objectives and perhaps ending up in a forced coup situation where an initial misalignment leads to autonomous systems doing something wrong, and then realizing the only way to preserve themselves is to to double down and escalate to full takeover.

I think computer security is quite important, especially where it relates to lines of command for powerful weaponry. And it will only get more important as autonomous weapons get more capable. But fundamentally this is just part of the adversarial dynamic that militaries deal with. It is obviously important to keep enemies out of your computer systems and we have a lot of experience doing this.

the one "abnormal" concern is a sudden breakthrough in hacking capabilities. militaries adapt, but they adapt slowly. systems will be built iteratively, to defend against known threats, and threats that the designers anticipate are plausible. but it is harder to defend against a "fast takeoff" situation, where your adversary is suddenly millions of times smarter than you. I do think this should be in the threat model going forward. We should develop systems that are robust to sudden progress in mathematics (breaking RSA) or to automated reasoning discovering security flaws in any given piece of software. This can involve formal verification of core systems, defense in depth (multiple layers of authentication which are unlikely to have simultaneously exploitable bugs), and air-gapping. There is lots of room for security research, but fundamentally this is an extension of our existing work securing systems against state-level actors, which has been broadly successful when we take it seriously (cyber vulnerability is largely about the proliferation of non-military systems that nonetheless run crucial infrastructure).

by contrast, hoping to avoid the latter two issues of manipulation or outright rebellion, is less about securing from external threats and more about securing from internal threats. we will likely try not to give killer drones the level of autonomy or world model that would

political obstacles to AI takeover

https://www.slowboring.com/p/im-skeptical-that-powerful-ai-will

hard takeoff

visions of AI sometimes involve "hard takeoff", a singularity in which a new force is loosed upon the land, a moment of wild change beyond which our current models no longer apply.

this is both a real technical possibility (if an unlikely one in my current view) and a power fantasy.

such a takeoff would be based on recursive self-improvement. as the story goes, humans are limited in our ability to develop AI by our own "intelligence" (taking this to be a coherent concept). once AI systems have equal or higher intelligence than humans do, they will design the next generation of AIs, which will have even higher intelligence, and so on. And this may all happen at the speed of silicon, so that we go from human-level AI to something fundamentally alien and godlike in the course of a few minutes.

this idealized story assumes that "intelligence", the thing that exponentially increases, is:

  • essentially a property of software rather than hardware (which will not increase exponentially over the course of a few minutes - even if an entity could design significantly more capable hardware, to actually build and bring it online happens at real-world timescales, months not seconds)
  • something contained in the software, rather than something that the software reflects about the world

the bitter lesson is that intelligence is actually not that useful in building AI. minds are not really designed, they are learned. of course algorithmic improvements do matter. but the fundamental bottleneck is real-world experience.

we might compare this to Kuhn's view of scientific revolutions. most of science is "normal science", methodical and iterative work to run experiments, test hypotheses, probe the consequences of our ideas, and gradually accumulate evidence. sometimes there is a paradigm shift "in the air", evidence that starts to point towards a new model, an opportunity for a smart young theorist to reconceptualize what has been gathered. we might expect post-scarcity intelligence to recognize these paradigm shifts with inhuman speed, once the evidence to support them is available. But most of the time, there is no paradigm shift "in the air". Science proceeds slowly, normally, bottlenecked by experiment.

a lot of practical decision-making is about having good models of the world, hard-won through years of trial and error. it is about wisdom, not 'intelligence'.

a very smart child can get quite good models of the world by reading books, studying history, and thinking critically.

a system's ability to take off in software will be a property, not directly of any "intelligence" level, but of our ability to model and simulate the real world, i.e., to provide "real-world experience" at the speed of silicon. in purely analytical domains like board games, this ability is perfect, and already allows for fast takeoff; MuZero goes from beginner to superhuman in a matter of hours. I suspect AI for math will benefit from similar dynamics (although it is a much harder problem).

now there is a tension here with the idea that chimpanzee brains cannot be anywhere near the ceiling of possible ability to think, plan, and learn from experience.

Safety, objectives, and motivation

If we build machines more intelligent than us, they will eventually be capable of manipulating and outmaneuvering us. And the more of our physical work and social decision-making we turn over to them, the more power they will have over us. The most ambitious dream of AI is, essentially, summoning a new god (or gods). How do we ensure that we remain in control? (or is that even the right goal?) What does it even mean to "control" something that might be smarter than us and know our own goals and values better than we do?

Some forces working towards safety in the short term:

  • humans in the loop. we should not (and will not) allow nuclear weapons to be launched autonomously. similarly for other weapons of mass destruction, including, say, drone swarms.
  • computer security and access controls
  • AI will need us.

I suspect there are pretty hard limits on what can be achieved through intelligence alone. These come from computational complexity and from the infinite detail of the real world, which ensures that all models are wrong.

Control versus misuse

Most of what I mean and think about when I think about safety is the control problem: how do we ensure that we remain in control of agents much smarter than ourselves? This is the deep, long-term challenge we will have to confront.

But many of the things people worry about are really dangers of misuse. Once a capability exists --- say the ability to engineer novel viruses with desired properties --- then there is the possibility that it can be used by humans to do ill. Th- dangerous chemicals: is is separate from the control problem in that capabilities need not be 'autonomous' in order to be dangerous. Even if we avoid producing misaligned AI agents, there are plenty of already-existing misaligned humans out there who would destroy the world, or large parts of it, given the chance.

Structurally there's nothing particularly AI-specific about misuse concerns; it's a generic problem with any new technology. Just as nuclear weapons are dangerous, advances in biotech, in hacking tools, in military weaponry, would be dangerous regardless of the specific technologies involved. It just so happens that AI is a potential enabler of technological progress in many areas, so progress in AI seems likely to generate many new technologies with misuse concerns.

Like previous tech advances, AI-driven advances will have the general property of magnifying individual human agency. Just as projects that would once have taken the physical labor of many people can now be done by a single person with a car (or a gun), projects that would once have taken the intellectual labor of many people will soon be within the capabilities of a single person, for better or for worse. This will be hugely empowering --- many important projects currently don't get done because running organizations is hard, and much of the dystopia of modern life comes from the inherent alienation of working for somebody else's organization, so shifting more projects into the realm of a capable individual will unlock a lot of individual potential. But it means that a single crazy person will be able to do a lot more damage.

How have we dealt with misuse concerns from previous technologies?

  • nuclear weapons: we rely on the huge industrial capital requirements, and heavy diplomatic work on nonproliferation, to ensure that weapons only end up in the hands of nation-states, who are at least relatively rational about using them. We got lucky here --- if physics were such that making a nuclear bomb were within the capacity of a motivated individual with easily-available materials, civilization might not be salvageable.
  • firearms: most countries regulate private ownership. America has relatively light regulation, and as a result we have regular tragedies. In some places we institute other protections --- metal detectors, armed security guards, etc.

Speculating about interactions between misuse and the control problem:

  • Some current approaches seek to avoid misuse by removing user control of the AI. When ChatGPT declines to answer a question about how to manufacture bioweapons, it is refusing a human order, the exact opposite of what you want from a controllable AI. More precisely, it's implementing a permissions system, where "user" interactions may not override commands from the "system" prompt. This approach relies on a central authority to run the model and set the system prompt.

bottlenecks to real-world intelligence

practical decision-making capability is not just about IQ, or whatever we might suppose the analogues of IQ in computing systems to be (processing power, memory, etc.)

if the "core" to a generally intelligent system is actually a pretty simple learning algorithm (and the bitter lesson is true), it will still take us quite a while to train it to do everything we want it to do.

Interpretability

The tension in machine learning is that:

  • we usually want our systems to learn complex, fluid behaviors that can't be expressed in simple terms, but
  • we need to understand what these systems are doing well enough to design, improve, and trust them.

It's not clear that really fluid intelligence can or should be interpretable. Even humans have a hard time really introspecting what goes on inside our heads, and the explanations we give for our actions are often post-hoc rationalization. We have models of our own decision-making, but the map is not the territory.

We often have insights arise from internal high-dimensional representations, alignments or connections that are hard to express in words. Sometimes the explanation is just "it felt right" or "I had a hunch".

mechanistic interpretability is the business of understanding what goes on "inside" of a neural net.

tension with capabilities research

formal verification and provable safety

I'm skeptical about the idea that we can build superintelligent AI entirely as a provably safe system. We don't know how to formalize "safety" and likely never will (if someone did give you a formal definition, how would you know if it's the "correct" one?). Any system making difficult real-world decisions will never be purely "safe"; it will encounter tradeoffs where it must harm someone.

Rather than think of provable safety as part of the AI, it might make more sense to think of hardening the world that AI exists in. Nuclear weapons, bio labs, military command systems, financial institutions should be provably unhackable. It's hard to prove properties of AI systems, because in some sense the whole point of AI is to exhibit undefined behavior; well-defined properties like "never allows access without an authorized key" are easy to prove by comparison. So we should have rigorous security specifications for high-value targets, and formally enforce them. And this is important for defending against all adversaries, not just AI.

Generally, defensive tech is good for liberty and human flourishing.

why I'm not worried about safety

Maybe I should be worried about safety. Not being worried is a strong opinion weakly held. But I don't find myself being drawn to "safety" work.

On the one hand, it's hard for me to see a world that contains entities that are cognitively superior to humans in every way, where humans still retain significant control.

But I suspect that thinking of unified 'entities' is a pernicious category error. I think more in terms of Boaz Barak's intelligence forklift perspective: we will develop a lot of tools or "parts" of intelligent systems, that will be remixed in various ways, and can be driven directly by humans or by artificial agents. So humans will also become much more capable and effectively intelligent in coming years. AIs will eventually surpass us, of course, but there will likely not be a huge gap between the best and second-best systems, because they will be reusing many of the same shared parts (just as systems at different companies today reuse many of the same open-source libraries).

I'm more worried about misuse than about control. We might eventually develop unaligned autonomous agents we can't control, but we definitely already have many unaligned humans that can't all be controlled.

I'm not worried about the fastest versions of a capabilities explosion, a 'hard takeoff', for the reasons discussed above (#hard takeoff, #bottlenecks to real-world intelligence): most capabilities are gated by experience, by the need to experiment and iterate.

For a singleton AI to defeat "all of humanity combined", it would need either a recipe for ruin (and if these exist, humans will eventually find them, even in the absence of autonomous systems), or to amass a range of capabilities --- strategic planning, autonomous military operations, and confidence in its ability to keep up its systems without voluntary human supportThe AI might choose to keep some humans around as a particularly robust type of self-reproducing 'robot body', but it would need to be confident that it can maintain this population and control them in perpetuity. --- that will not come trivially or quickly.

agentic cores

The 'agentic cores' of these systems might be quite small.

What does a core look like, in a world full of parts?

at minimum it is perhaps a value function, a preference for how the world should be. having a goal or purpose is what distinguishes an agent from a tool that can be used for any purpose.

but different agents will also have different capabilities. they will recruit different sets of tools. they will model different aspects of the world, and so their preferences will be defined over different aspects of the world.

perhaps the "core" is the wrong way to think about it? When a company builds a computer system from open-source components, they will add some of their own business logic, but that's not the "core" exactly. It's more like the end result is a big bundle of a bunch of stuff, and the uniqueness is in

  • what common parts are included in the bundle
  • a couple of novel parts, and
  • how exactly the parts are arranged and tied together

a realistic view of a good future

Spirituality

There is a deeply spiritual element to AI research, properly construed.

Of course, AI research can be spiritual in the universal sense that a healthy view can reveal the spiritual element in all work, even or perhaps especially in "lowly" jobs like sanitation work, caretaking, even prison guarding. Hopefully the reason to do the work is that it is a small step towards making the world a better place, part of our duty to our family and community and society, our ancestors and our descendants, a joyful way to contribute to a project greater than ourselves.

But AI in particular connects deeply to spiritual questions about how we think about our own being, what we value, and the future of humanity in general.

we are, in essence, summoning gods.

consciousness and moral realism - will we be able to understand what gives rise to humans' ability to suffer, will AIs be able to understand this "intuitively", and even themselves be conscious agents?

in giving objectives to strong AIs, we need to solve moral philosophy. which is not just a project of reasoning. it's about looking within ourselves and understanding at the deepest level, who we are and what we value.

what is the proper cognitive architecture for a benevolent god?

AI as the complement to spirituality

It might be more apt to take the view that spirituality is a necessary complement to AI capabilities research. AI promises to be a general solution to all problems, but the one problem it can't directly solve is a sense of meaning, purpose, and connection?

Other ways to put this:

  • the ability to achieve any goal doesn't tell you what goals you should achieve
  • the ability to deeply understand how the world is doesn't tell you how it ought to be (is vs ought dichotomy)
  • this is related to Matt Yglesias' take that AI won't solve social problems because many barriers are political. so maybe 'social problems' is broader class than 'spiritual problems'? fundamentally spirituality is about how we live together --- whether we see each other as separate or together, whether we can unite behind a common purpose, whether we can love thy neighbor as thyself. and probably many social and political issues are downstream of this.
  • AI won't solve universal suffering
  • AI might solve the profane but not the sacred
  • automating jobs leaves people with less of a sense of purpose, not more

Here by "spirituality" I mean, in some sense, the beliefs and practices that help a person develop a sense of being part of a broader whole, connected to others, less identified with their own individual suffering.

It might seem backwards that we need spirituality more as we get more advanced. You'd think that a world with tons of material suffering would demand spiritual practice as a coping mechanism, but that as our individual lives get better we have less need to escape from them. I think the truth is that

  1. For much of history, people's material circumstances were bad and there was not much hope of improving them, leaving spiritual practice as an invaluable refuge.
  2. In modern times, material conditions have been improving quickly, and we've actually eliminated some of our most acute suffering (for example, the pain of losing a child, once nearly universal, is now extremely rare in developed countries). The promise of progress --- that our work can actually make the world better --- provides us a clear sense of direction and purpose. It also allows us the delusion that just a little bit more progress might finally solve all of our problems and leave us genuinely happy.
  3. Growth in the developed world has slowed since the energy crises of the 1970s, leaving many people disenchanted about technological progress. However, this is likely only a pause. The upcoming era of cheap and abundant clean energy, driven by renewables and hopefully nuclear, will dovetail nicely with progress of AI to create a new century of wonders. We will continue to get much wealthier and our lives will be much more comfortable and pleasant.
  4. Whether or not AI progress ends up creating new jobs that give people even more agency and purpose, the transition will be rough. Many people will find themselves disempowered and purposeless, with their identity tied up in jobs that are no longer relevant.
  5. At some point, automation will be able to do almost any job better than a human can, so there will be no reason for the vast majority of humans to do economic work. This will require a big shift in how we conceive of our purpose and our value to each other.
  6. Increased wealth will allow us to finally acknowledge depths of suffering that we'd previously dissociated from. This is already happening with the suffering of farm animals (easier to acknowledge when good meat alternatives are available), various minority groups in society (blacks, women, LGBTQ people, etc), mental health issues, etc. Eventually this may extend to the suffering of wild animals, small children (babies cry a lot and the suffering is real), the experience of growing old and frail and dying.
  7. We may end up creating the Buddhist God realm, in which everyone lives blissful lives lasting many eons. However, even the gods eventually die
  8. Despite the intoxicating lure of continued technological progress, we will come to see that we are

My friend SuccessfulFriend and I have different perspectives on the biggest problem in the world. SuccessfulFriend grew up in 1990s Bangalore, where the toll of crushing poverty was immediately salient. To him, the most urgent task is to increase world GDP, to end poverty and give every human at least the chance at a comfortable life.

Meanwhile I grew up in a comfortable American family, where material scarcity was more of an abstract problem. We were well above the median income for our area, and although we lived pretty frugally, fundamentally we had no concern about being able to afford life's major needs (food, house, cars, occasional vacations, college, etc.). I was the success case: the sort of life that a development economist would dream of being able to give to everyone.

And yet --- for much of my teens and twenties I was deeply, deeply unhappy. I spent a lot of time feeling like a misfit, a failure, alienated and lonely and unable to connect with my peers or most adults. (todo, say more here? how much color does this point need?) So the idea that my circumstances should be the goal --- that I should devote my life to lifting up poor people to the same sort of life that I had --- felt deeply uncompelling. It might be (I think actually is) rationally true that my own experience was unrepresentative, a fluke, that on average making people richer makes their lives better. But to think this way, to realize that getting to the right answer required discarding my personal lived experience, only reinforced my sense of alienation and brokenness.

the next few paragraphs are details, to fill in color, but maybe not necessary for the basic point

I enjoyed parts of my childhood, but never unreservedly, because I spent a lot of time feeling like a misfit, a failure, lonely and unable to connect with my peers. I didn't have many close friendships, and no romantic or sexual experience at all (I was so wrapped up in blaming this on my poor social skills that it took several more years to realize that this was in part because I was gay).

For SuccessfulFriend, getting into a highly-ranked American college itself counted as a life-defining victory --- anything he achieved past that point was a bonus. For me, it was just meeting expectations. I had to keep climbing to feel like I'd proven myself at all. Just getting a "normal" degree and a "normal" job didn't feel like goals worth having.

Eventually, I got into a top computer science PhD program. Part of me was happy about this, but there was still a large undercurrent of dread. I was sure that I wasn't good enough, that I was missing some crucial ingredient required to succeed. I was excited about research in the abstract but didn't feel like I had a great sense for what to work on, the skill to do it well, or the social intelligence to navigate my department or the broader academic community. This sort of imposter syndrome is common among graduate students, and the usual remedy is to just acknowledge that everybody feels this way, so feeling like an imposter provides essentially no evidence that you are one. And yet: in my case it was basically true.

I failed at getting what I wanted out of my PhD. I spent almost seven years working by myself on one of my advisor's pet projects, without any collaborators or support from peers. I never published a single paper that I was proud of. I never formulated my own research program, or learned how to define a good research problem. I never learned to collaborate or talk confidently about research ideas. I never even took the opportunities to practice teaching, which I loved, because trying to catch up on my failing research career always felt like the more urgent problem.

Meanwhile I'd also spent much of high school and college feeling broken romantically. I was unsure of what it meant to give or receive love, or even have sex, but had a deep suspicion that I would never experience these things. When at the end of college I finally admitted I was gay and started dating ExBoyfriend, this was one of the most profoundly validating and healing experiences of my life. And yet --- we ended up breaking up six years later, in a way that was fairly crushing for both of us (but especially for him), and was largely due to my personal inability to be a good partner for someone I dearly loved. That breakup, and the failure of my research career, as many of my friends were getting married and finding their dream jobs, led to a deep depression that lasted several years.

end extra color section

Would material abundance have solved any of this? Would automating away the need (and indeed, the ability) for any human labor to progress society have made things better?

It's clear that just progressing poor societies to the current rich world won't solve everyone's problems. But is it possible that continued tech progress in the rich world will?

If I'm honest with myself, maybe it would?

  • what if I'd had the world's best therapist as a confidant from early childhood. I might have learned to see myself very differently.
  • in a world where everyone is better educated I might have fit in better
  • richer worlds tend to be more tolerant of nonconformity
  • with no material needs the only thing worth striving for is being a good friend and participant in society
  • perhaps AI can accelerate neuroscience to help us find safe and effective MDMA analogues

I do think there are reasons to think that

  • ending all suffering is fundamentally a 'spiritual' pursuit (alternately one might say a psychological pursuit, in the sense that it's about the inner workings of the human mind and how we engage with the world)
  • further our understanding of spiritual/psychogical tranformation can and should be a science like any other, which AI could in principle accelerate
  • as we get closer and closer to eliminating contingent, materialist forms of suffering, the consequent loss of purpose will lead more people towards spiritual seeking. effective spiritual systems will become more important.
  • but AI is uniquely handicapped here because it doesn't have internal access to the workings of human minds. it may be able to read and write about spiritual experiences, but probably won't ever be able to ground these thoughts in the reality of human experience, the way that humans can.

Saving the world

AI can become almost a religious project for some people. we seek deliverance, the singularity, nirvana, utopia. we seek revelation, prophecy, the answers. the very real possibility that AI can lead to dramatic change, makes it a magnet for people who crave dramatic change. we want not just a better world, but a perfected world. we are deeply sad that a loving omnipotent God does not seem to exist, and so it is necessary to create one.

the nexus here is kids who are intelligent, sensitive, independent thinkers. people who deeply want the world to be good, who see the potential of the times we're living in, who are ambitious and curious. but often grow up feeling alienated from their peers, poorly socialized, poorly served by school bureaucracies, have difficulty relating to the people around them. (of course I am mostly describing myself). we know that we're smart and capable, but feel disconnected and powerless in the face of social structures. so our thoughts tend towards a dramatic rewriting of the order of things, a "rapture of the nerds".

There is something deeply admirable and right about this view. It comes out of love for the world, an awareness of its flaws, and a real vision and optimism about humanity's ability to create and improve. But (at least in me) it also comes from a sense of woundedness, a real dissatisfaction with the current world, a need for things to be different.

There's a subtlety here that is hard to appreciate until one has a certain level of personal maturity (and I make no claim to have fully worked through it. maybe this equivalent to seeing taṇhā?). The world's problems are real, and urgent, but they are distinct from my own personal woundedness, the ways in which the world has betrayed or traumatized me. They are of course not unrelated: my wounds are of course a reflection of (some of) the world's problems. And these can feel aligned, and even create a deep sense of meaning: my woundedness is good, useful because it's shown me the need for profound change.

  • if the project is an antidote to my personal woundedness, then at some level it's really about me. it's a power fantasy: I need to use my intelligence to gain control of the world so that nothing can ever hurt me again.
  • if personal woundedness drives the work, then the wounds themselves become load-bearing, something to cling to. I can't personally forgive the world for hurting me because that would remove my motivation.
  • at the same time, if we are driven by personal woundedness, then work that targets external problems will never satisfy. we will realize that at some level the work is pointless, doomed to never achieve our real emotional purpose.
  • And this pushes towards totalizing change. If it is clear that any plausible, legible route, towards changing the world won't be enough, won't satisfy, then the only route is to retreat from legibility. We need an apocalypse, a singularity, a revolution beyond which everything will be different in ways that we are free from having to imagine.

Some of

perhaps some of us also want a parent, an authority, who will save us from our own flaws and responsibility for our own actions.

Rethinking rational agency from a spiritual point of view

I'll mostly focus on Buddhism because it gets the closest to a mechanistic, materialist account of what we are.

people have the illusion of a separate self, some unitary "core" of their being that persists unchanged over time and must be protected. but the self is a construct, and we know that it is possible for people to see this. And in the process of seeing this, people tend to feel:

  • more connected to others
  • more compassionate
  • lower or no fear of personal death

One of the fears in developing AI is an argument that self-preservation is a convergent instrumental goal for rational agents, so any smart AI should prevent us from turning off its off-switch.

An "enlightened" person might still have a sense of self-preservation because there are goals they want to achieve. But it won't be crippling, driven by fear or by the illusion of a unique self that needs protecting.

Embedded agency and model-based reasoning

embedded agent

game-playing is an incomplete theory of how to act in the real world.

Self-representation as a safety research program

It seems natural and convergent that systems that act in the world will learn to represent some sort of self-other boundary. The 'self' is the stuff that the agent (seems to) directly control, that it needs to protect, and 'other' is everything else - the stuff the agent works its will over, potential sources of danger and frustration.

In humans, we know that learning to relax the self-other boundary (or at least see it as empty --- take the view that the self is a construct) corresponds to a host of mostly-beneficial shifts: a sense of interconnectedness, empathy and compassion, love, fearlessness, cooperation, willingness to sacrifice towards a greater good. This transcendence happens organically to some extent in most people's lives, both through transient events (musical concerts, sporting events, religious rites) and through long-term relationships (especially having children, which is sometimes described as part of your heart running around independently outside your body).

We also know that both MDMA and the classic psychedelics tend to relax people's sense of being divided from the world. These are small-molecule drugs whose structures don't have enough information to specify complicated mechanisms, so the only explanation is that the self-other boundary is already exposed at a relatively primal level of the brain (seemingly connected to the serotonin system).

Is this effect idiosyncratic to humans? It turns out that MDMA works even for octopodes, which are about as far from humans on the evolutionary tree as intelligent organisms get. An octopus is normally a solitary, territorial creature, but in MDMA-laced water it will seek out other octopodes and cuddle with them. This is more evidence that the representation, and relaxation, of a self-other boundary are conserved by relatively similar neural mechanisms even across vastly different creatures.

When we worry about agentic AIs seeking power, we worry

It is interesting to think about how a self-other boundary might come to be represented in AI systems. Is an AI's "self" a physical robot body? A datacenter rack? Its neural net weights? Some more abstract notion of a reward function? Old memories in cold storage? Does it include the solar panels powering the data center? The Sun itself, powering those solar panels? As with human self-concepts, an AI's self-boundary will necessarily involve some somewhat arbitrary choices that may dissolve on deep inspection. And there will be some fuzziness, some levels of distinction, parts of the environment that are more or less allied with the AI (just as a human's siblings are 50% us from a genetic perspective, and empirically most people care about their siblings but not quite as much as we care about 'ourselves'). But any AI that effectively seeks power will have to have some notion of what parts of its world model are 'it'.

We probably can't prevent power-seeking AIs from being developed; there will always be bad guys in the world. But for purposes of the 'good guys', a gears-level understanding (and controlling) of how an AI develops and represents a self-other boundary might be key to ensuring that the AI protects the parts of the world we want it to protect, including humanity.

In general, the notion of a 'self' is probably represented using machinery mostly similar to any other object in the environment. So a first step to understanding self-representation is to understand how a system represents objects in general. A second step would be to specifically understand how it represents other agents. All of this research falls roughly under mechanistic interpretability and specifically the interpretability of learned world models. All of this research falls roughly under mechanistic interpretability and specifically the interpretability of learned world models.

What is worth doing?

will AI for math be satisfying emotionally? will it be satisfying at a technical level? does it achieve the dream?

will robotics be satisfying emotionally? will it be satisfying at a technical level? does it achieve the dream?