macrostate
Created: April 12, 2022
Modified: April 13, 2022


This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.

A macrostate in statistical mechanics is a collection of base-level states; equivalently, a subset of phase space. It's what you see when you blur your eyes so that you can no longer distinguish between some set of states.

A representation or abstraction of a system is a partition of its microstates into macrostates.

What makes a good partition? A few observations:

  1. All else equal, a coarse partition is easier to reason about than a fine partition: otherwise we wouldn't bother with macrostates in the first place!
  2. We are generally constrained by our ability to observe distinctions between states. For example, I can observe other people's facial expressions but not the content of their thoughts, so it's natural for me to use macrostates like 'smiling person' and 'frowning person'. Before the discovery of radioactivity it was possible to partition objects into 'glowing' or 'not glowing', but not into the categories of 'radioactive' and 'not radioactive', which more neatly carve the world at the joints.
  3. A representation in which the induced dynamics are predictable and Markovian is typically more useful than one in which they're unpredictable (at least from the point of view of an agent trying to reason about the future; in other contexts, such as the design of pseudorandom number generators, we may prefer dynamics that are maximally unpredictable).

Why not then use the coarsest of all possible partitions: the trivial partition that collapses all states into a single macrostate? It requires no special observations and has the easiest dynamics of all to predict. By our criteria so far it seems pretty much perfect.

In fact we do use this representation exceedingly often, for the vast majority of (sub)systems which we have no need to model further. Presumably the Andromeda galaxy has many microstates, some aspects of which are measured and distinguished by astronomers, but I ignore these distinctions in my everyday cognition because they're totally irrelevant to anything I care about.

Induced dynamics

Any dynamics on the microstates will induce a pushforward dynamics on macrostates.
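
To make this concrete, here's a minimal sketch (the transition matrix and partition are made up for illustration) of computing the induced macro-level transition probabilities: condition a microstate distribution on each cell and push it through the micro-level dynamics.

```python
import numpy as np

# Hypothetical example: a 4-microstate chain P, coarse-grained by the
# partition {0, 1} -> macrostate 0 and {2, 3} -> macrostate 1.
P = np.array([[0.9, 0.1, 0.0, 0.0],
              [0.0, 0.5, 0.5, 0.0],
              [0.0, 0.0, 0.2, 0.8],
              [0.7, 0.0, 0.0, 0.3]])
cell = np.array([0, 0, 1, 1])  # macrostate label of each microstate
n_macro = cell.max() + 1

def pushforward(P, cell, mu):
    """Induced macro transition matrix, given microstate distribution mu.

    Entry [i, j] is Pr(macro_{t+1} = j | macro_t = i), computed by
    conditioning mu on cell i and pushing the result through P. When this
    depends on mu, the induced macro-level process is not Markov.
    """
    Q = np.zeros((n_macro, n_macro))
    for i in range(n_macro):
        w = mu * (cell == i)   # restrict mu to cell i...
        w = w / w.sum()        # ...and renormalize within the cell
        succ = w @ P           # one-step pushforward on microstates
        for j in range(n_macro):
            Q[i, j] = succ[cell == j].sum()
    return Q

# Different within-cell distributions give different macro dynamics:
print(pushforward(P, cell, np.full(4, 0.25)))
print(pushforward(P, cell, np.array([0.4, 0.1, 0.1, 0.4])))
```

The two printed matrices differ, which is the point: without fixing a distribution within each cell, 'the' macro-level transition probabilities aren't even well defined.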

are not always Markovian, but 'should' be

The induced dynamics on macrostates will not in general be Markovian just because the base dynamics are. For example, let our system's microstates be the positive integers, with the 'constantly increasing' dynamics $x_{t+1} = x_t + 1$. But suppose we blur our eyes so that we observe only $y_t = \lfloor x_t / 2 \rfloor$, i.e., our observations collapse together each adjacent even/odd pair. Then given the observation $y_t = 1$, we can only say that $x_t$ is equally likely to be $2$ or $3$, thus $x_{t+1}$ is equally likely to be $3$ or $4$, and thus that $y_{t+1}$ is equally likely to be $1$ or $2$. But given observations of both $y_{t-1}$ and $y_t$ we can always predict $y_{t+1}$ exactly, since together these fully disambiguate $x_t$ and thus $x_{t+1}$.
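
This is easy to check empirically; a minimal sketch (the trajectory length is arbitrary):

```python
from collections import Counter, defaultdict

# Simulate x_{t+1} = x_t + 1 and observe y_t = floor(x_t / 2).
xs = list(range(1, 2001))   # one long trajectory
ys = [x // 2 for x in xs]

next_given_y = defaultdict(Counter)     # y_t            -> counts of y_{t+1}
next_given_hist = defaultdict(Counter)  # (y_{t-1}, y_t) -> counts of y_{t+1}
for t in range(1, len(ys) - 1):
    next_given_y[ys[t]][ys[t + 1]] += 1
    next_given_hist[(ys[t - 1], ys[t])][ys[t + 1]] += 1

print(next_given_y[1])          # Counter({1: 1, 2: 1}): y_t = 1 leaves y_{t+1} ambiguous
print(next_given_hist[(0, 1)])  # Counter({1: 1}): the pair pins down y_{t+1} = 1
print(next_given_hist[(1, 1)])  # Counter({2: 1}): ...and here y_{t+1} = 2
```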

A very relevant paper, "What Is a Macrostate? Subjective Observations and Objective Dynamics" (Shalizi and Moore, 2000), argues that:

  1. We generally begin with some set of macro-level observables, which induce a partition on the microstates, but do not necessarily have Markov dynamics.
  2. We can always construct a Markov process on equivalence classes of histories of these observables, where two histories are equivalent if they imply the same conditional distribution over future observations. Call these equivalence classes 'causal states' (a toy reconstruction is sketched after this list). The partition of the system into causal states is unique.
  3. If this partition is not a refinement of the original partition, it means that some of our observations are of unpredictable or causally irrelevant variables, and can be dropped.
  4. The causal state partition then constitutes a refinement of our observation partition, and in particular an 'optimal' refinement, since any further refinement would by definition not help in predicting our future observations.
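
Here is a toy version of that reconstruction: group length-$L$ histories by their empirical conditional distribution over the next observation. The test process (the 'golden mean' process) and the rounding-based equivalence check are my own illustrative choices, not from the paper, which works with exact distributions; a serious implementation would use a statistical test instead.

```python
import random
from collections import Counter, defaultdict

random.seed(0)

# Toy test process ('golden mean'): after emitting a 1 the next symbol is
# always 0; after a 0 the next symbol is 0 or 1 with equal probability.
# Its causal states are exactly 'last symbol was 0' and 'last symbol was 1'.
seq = [0]
for _ in range(100_000):
    seq.append(0 if seq[-1] == 1 else random.randint(0, 1))

def causal_states(seq, L):
    """Group length-L histories by their empirical next-symbol distribution.

    Two histories land in the same class iff they predict (approximately)
    the same conditional distribution over the next observation -- the
    defining equivalence behind causal states. Rounding the empirical
    probabilities is a crude stand-in for a proper statistical test.
    """
    nxt = defaultdict(Counter)
    for t in range(L, len(seq)):
        nxt[tuple(seq[t - L:t])][seq[t]] += 1
    classes = defaultdict(list)
    for hist, counts in nxt.items():
        total = sum(counts.values())
        sig = tuple(sorted((sym, round(n / total, 1)) for sym, n in counts.items()))
        classes[sig].append(hist)
    return list(classes.values())

print(causal_states(seq, L=2))
# Histories (0, 0) and (1, 0) merge into one causal state (next symbol is
# 50/50); (0, 1) forms its own (next symbol is always 0); (1, 1) never occurs.
```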

This recipe doesn't tell us what partition to begin with (we have to 'seed' the process with the observables we care about), but from that point on it says we should refine our observations, or look at longer histories, until the macro-level dynamics are Markov.

are 'more stochastic' than the base?

Q: If the base-level dynamics are deterministic, are the induced macro-level dynamics also necessarily deterministic?

A: In the idealized framework of the Shalizi & Moore paper, which assumes stationarity, a deterministic process on microstates would always induce a deterministic process on causal states.

To see this: suppose we have identified the causal state of the system at time $t$; that is, we have narrowed down the microstate to a partition cell that tells us as much as possible about future observations. Now if there is any time in the future where some of the states in our current partition cell would (deterministically) lead us to observe A and others to observe B, then this would allow us to retroactively refine our current partition in a predictively useful way. But this contradicts the assumption that we'd observed enough history to identify the 'causal state'.

However, in real situations we may be very far from identifying the true causal state, due to a limited observation history or computational capacity. In this case the macro-level dynamics will have 'excess stochasticity' resulting from our epistemic uncertainty.

More thoughts (TODO organize)

Settings to consider:

  • dynamic: we have dynamics $p(x_{t+1} \mid x_t)$ defined on microstates.
  • static: we have a distribution $p(x)$ on microstates. This could arise, e.g., as the stationary distribution of some dynamics.
  • interactive: we have controllable dynamics $p(x_{t+1} \mid x_t, a_t)$, where $a_t$ is an action we take that influences the future state of the system. (Rough type signatures for all three are sketched below.)
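
As a rough sketch, the three settings written as type signatures; the Dist protocol and the alias names are hypothetical, just to pin down the shapes:

```python
from typing import Callable, Protocol, TypeVar

X = TypeVar("X")  # microstate
A = TypeVar("A")  # action

class Dist(Protocol[X]):
    """Stand-in for 'a distribution over X': anything we can sample from."""
    def sample(self) -> X: ...

DynamicModel = Callable[[X], Dist[X]]         # p(x_{t+1} | x_t)
StaticModel = Dist[X]                         # p(x)
InteractiveModel = Callable[[X, A], Dist[X]]  # p(x_{t+1} | x_t, a_t)
```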

Objectives to consider:

  • unsupervised representation learning: we want a representation of limited size that is maximally informative about the underlying state.
  • self-supervised learning: we want a representation that is maximally informative about future states or observables.
  • supervised learning: we want a representation that is maximally predictive of one or more target quantities, at either the current or future time.
  • model-based reinforcement learning: we want the representation that maximizes cumulative reward obtained by a planning algorithm.
  • unsupervised RL / intrinsic motivation: we want a representation that distinguishes the results of different actions while being otherwise as coarse-grained as possible. (Note that the trivial representation fails here, since it sees every action as a no-op.)