Created: September 14, 2022
Modified: September 14, 2022

superposition

This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.

A $d$ -dimensional vector can represent $d$ distinct orthogonal features, but due to the weirdness of high-dimensional geometry, it can represent $O(\exp(d))$ almost-orthogonal features, which we define as feature vectors having a dot product of at most $\epsilon$ . This allows for networks to represent a 'superposition' of exponentially many potential features in a single vector.

https://transformer-circuits.pub/2022/toy_model/index.html

https://en.wikipedia.org/wiki/Johnson%E2%80%93Lindenstrauss_lemma

An argument I like is that random vectors in high-dimensional space are orthogonal with high probability. (TODO: this argument is currently broken, need to remember/look up what's missing).

If we treat each dimension as standard normal with variance $1/d$ , then in high dimensions we'll get unit vectors with high probability.
The dot product $\langle x, y\rangle = \sum_{i=1}^d x_i y_i$ of two independent normal vectors is a sum of $d$ terms, each of which has mean zero and variance $1/d^2$ , so the sum itself has mean zero and variance $1/d$ . The probability of the sum being far from zero can be bounded by various concentration inequalities; naively we appeal to the central limit theorem to show that the sum is approximately normally distributed and thus $\mathbb{P}\left(|\langle x, y\rangle| > \epsilon\right) \approx \exp\left(-\frac{d\epsilon^2}{2}\right)$
We want to make this event so unlikely that, given an exponential number of vectors, and considering all $(e^{2d})$ pairs, a union bound still gives a low probability that any of the dot products has magnitude $\ge \epsilon$ . Summing over all $e^{2d}$ events gives $\exp(2d) \cdot \exp\left(-\frac{d\epsilon^2}{2}\right) = \exp\left(\frac{d}{2}\left(4 - \epsilon^2\right)\right)$ but this is not a useful bound since we are likely to want to take $\epsilon$ much smaller than 2. So something in this argument needs to be strengthened.

superposition

Meta