macrostate
Created: April 12, 2022
Modified: April 13, 2022


This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.

A macrostate in statistical mechanics is a collection of base-level states; equivalently, a subset of phase space. It's what you see when you blur your eyes so that you can no longer distinguish between some set of states.

A representation or abstraction of a system is a partition of its microstates into macrostates.

What makes a good partition? A few observations:

  1. All else equal, a coarse partition is easier to reason about than a fine partition: otherwise we wouldn't bother with macrostates in the first place!
  2. We are generally constrained by our ability to observe distinctions between states. For example, I can observe other people's facial expressions but not the content of their thoughts, so it's natural for me to use macrostates like 'smiling person' and 'frowning person'. Before the discovery of radioactivity it was possible to partition objects into 'glowing' or 'not glowing', but not into the categories of 'radioactive' and 'not radioactive', which more neatly carve the world at the joints.
  3. A representation in which the induced dynamics are predictable and Markovian is typically more useful than one in which they're unpredictable (at least from the point of view of an agent trying to reason about the future; in other contexts, such as the design of pseudorandom number generators, we may prefer dynamics that are maximally unpredictable).

Why not then use the coarsest of all possible partitions: the trivial partition that collapses all states into a single macrostate? It requires no special observations and has the easiest dynamics of all to predict. By our criteria so far it seems pretty much perfect.

In fact we do use this representation exceedingly often, for the vast majority of (sub)systems which we have no need to model further. Presumably the Andromeda galaxy has many microstates, some aspects of which are measured and distinguished by astronomers, but I ignore these distinctions in my everyday cognition because they're totally irrelevant to anything I care about.

Induced dynamics

Any dynamics on the microstates will induce a pushforward dynamics on macrostates.
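
To make this concrete, here's a minimal sketch (the transition matrix and partition are made up for illustration) of computing the induced macro-level transition probabilities: condition a microstate distribution on each cell and push it through the micro-level dynamics.

```python
import numpy as np

# Hypothetical example: a 4-microstate chain P, coarse-grained by the
# partition {0, 1} -> macrostate 0 and {2, 3} -> macrostate 1.
P = np.array([[0.9, 0.1, 0.0, 0.0],
              [0.0, 0.5, 0.5, 0.0],
              [0.0, 0.0, 0.2, 0.8],
              [0.7, 0.0, 0.0, 0.3]])
cell = np.array([0, 0, 1, 1])  # macrostate label of each microstate
n_macro = cell.max() + 1

def pushforward(P, cell, mu):
    """Induced macro transition matrix, given microstate distribution mu.

    Entry [i, j] is Pr(macro_{t+1} = j | macro_t = i), computed by
    conditioning mu on cell i and pushing the result through P. When this
    depends on mu, the induced macro-level process is not Markov.
    """
    Q = np.zeros((n_macro, n_macro))
    for i in range(n_macro):
        w = mu * (cell == i)   # restrict mu to cell i...
        w = w / w.sum()        # ...and renormalize within the cell
        succ = w @ P           # one-step pushforward on microstates
        for j in range(n_macro):
            Q[i, j] = succ[cell == j].sum()
    return Q

# Different within-cell distributions give different macro dynamics:
print(pushforward(P, cell, np.full(4, 0.25)))
print(pushforward(P, cell, np.array([0.4, 0.1, 0.1, 0.4])))
```

The two printed matrices differ, which is the point: without fixing a distribution within each cell, 'the' macro-level transition probabilities aren't even well defined.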

are not always Markovian, but 'should' be

The induced dynamics on macrostates will not in general be Markovian just because the base dynamics are. For example, let our system's microstates be the positive integers, with the 'constantly increasing' dynamics $x_{t+1} = x_t + 1$. But suppose we blur our eyes so that we observe only $y_t = \lfloor x_t / 2 \rfloor$, i.e., our observations collapse together each adjacent even/odd pair. Then given the observation $y_t = 1$, we can only say that $x_t$ is equally likely to be $2$ or $3$, thus $x_{t+1}$ is equally likely to be $3$ or $4$, and thus that $y_{t+1}$ is equally likely to be $1$ or $2$. But given observations of both $y_{t-1}$ and $y_t$ we can always predict $y_{t+1}$ exactly, since together these fully disambiguate $x_t$ and thus $x_{t+1}$.
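
This is easy to check empirically; a minimal sketch (the trajectory length is arbitrary):

```python
from collections import Counter, defaultdict

# Simulate x_{t+1} = x_t + 1 and observe y_t = floor(x_t / 2).
xs = list(range(1, 2001))   # one long trajectory
ys = [x // 2 for x in xs]

next_given_y = defaultdict(Counter)     # y_t            -> counts of y_{t+1}
next_given_hist = defaultdict(Counter)  # (y_{t-1}, y_t) -> counts of y_{t+1}
for t in range(1, len(ys) - 1):
    next_given_y[ys[t]][ys[t + 1]] += 1
    next_given_hist[(ys[t - 1], ys[t])][ys[t + 1]] += 1

print(next_given_y[1])          # Counter({1: 1, 2: 1}): y_t = 1 leaves y_{t+1} ambiguous
print(next_given_hist[(0, 1)])  # Counter({1: 1}): the pair pins down y_{t+1} = 1
print(next_given_hist[(1, 1)])  # Counter({2: 1}): ...and here y_{t+1} = 2
```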

A very relevant paper, "What Is a Macrostate? Subjective Observations and Objective Dynamics" (Shalizi and Moore, 2000), argues that:

  1. We generally begin with some set of macro-level observables, which induce a partition on the microstates, but do not necessarily have Markov dynamics.
  2. We can always construct a Markov process on equivalence classes of histories of these observables, where two histories are equivalent if they imply the same conditional distribution over future observations. Call these equivalence classes 'causal states' (a toy reconstruction is sketched after this list). The partition of the system into causal states is unique.
  3. If this partition is not a refinement of the original partition, it means that some of our observations are of unpredictable or causally irrelevant variables, and can be dropped.
  4. The causal state partition then constitutes a refinement of our observation partition, and in particular an 'optimal' refinement, since any further refinement would by definition not help in predicting our future observations.
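
Here is a toy version of that reconstruction: group length-$L$ histories by their empirical conditional distribution over the next observation. The test process (the 'golden mean' process) and the rounding-based equivalence check are my own illustrative choices, not from the paper, which works with exact distributions; a serious implementation would use a statistical test instead.

```python
import random
from collections import Counter, defaultdict

random.seed(0)

# Toy test process ('golden mean'): after emitting a 1 the next symbol is
# always 0; after a 0 the next symbol is 0 or 1 with equal probability.
# Its causal states are exactly 'last symbol was 0' and 'last symbol was 1'.
seq = [0]
for _ in range(100_000):
    seq.append(0 if seq[-1] == 1 else random.randint(0, 1))

def causal_states(seq, L):
    """Group length-L histories by their empirical next-symbol distribution.

    Two histories land in the same class iff they predict (approximately)
    the same conditional distribution over the next observation -- the
    defining equivalence behind causal states. Rounding the empirical
    probabilities is a crude stand-in for a proper statistical test.
    """
    nxt = defaultdict(Counter)
    for t in range(L, len(seq)):
        nxt[tuple(seq[t - L:t])][seq[t]] += 1
    classes = defaultdict(list)
    for hist, counts in nxt.items():
        total = sum(counts.values())
        sig = tuple(sorted((sym, round(n / total, 1)) for sym, n in counts.items()))
        classes[sig].append(hist)
    return list(classes.values())

print(causal_states(seq, L=2))
# Histories (0, 0) and (1, 0) merge into one causal state (next symbol is
# 50/50); (0, 1) forms its own (next symbol is always 0); (1, 1) never occurs.
```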

This recipe doesn't tell us what partition to begin with (we have to 'seed' the process with the observables we care about), but from that point on it says we should refine our observations, or look at longer histories, until the macro-level dynamics are Markov.

are 'more stochastic' than the base?

Q: If the base-level dynamics are deterministic, are the induced macro-level dynamics also necessarily deterministic?

A: In the idealized framework of the Shalizi & Moore paper, which assumes stationarity, a deterministic process on microstates would always induce a deterministic process on causal states.

To see this: suppose we have identified the causal state of the system at time $t$; that is, we have narrowed down the microstate to a partition cell that tells us as much as possible about future observations. Now if there is any time in the future where some of the states in our current partition cell would (deterministically) lead us to observe A and others to observe B, then this would allow us to retroactively refine our current partition in a predictively useful way. But this contradicts the assumption that we'd observed enough history to identify the 'causal state'.

However, in real situations we may be very far from identifying the true causal state, due to a limited observation history or computational capacity. In this case the macro-level dynamics will have 'excess stochasticity' resulting from our epistemic uncertainty.

More thoughts (TODO organize)

Settings to consider:

  • dynamic: we have dynamics $p(x_{t+1} \mid x_t)$ defined on microstates.
  • static: we have a distribution $p(x)$ on microstates. This could arise, e.g., as the stationary distribution of some dynamics.
  • interactive: we have controllable dynamics $p(x_{t+1} \mid x_t, a_t)$, where $a_t$ is an action we take that influences the future state of the system. (Rough type signatures for all three are sketched below.)
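
As a rough sketch, the three settings written as type signatures; the Dist protocol and the alias names are hypothetical, just to pin down the shapes:

```python
from typing import Callable, Protocol, TypeVar

X = TypeVar("X")  # microstate
A = TypeVar("A")  # action

class Dist(Protocol[X]):
    """Stand-in for 'a distribution over X': anything we can sample from."""
    def sample(self) -> X: ...

DynamicModel = Callable[[X], Dist[X]]         # p(x_{t+1} | x_t)
StaticModel = Dist[X]                         # p(x)
InteractiveModel = Callable[[X, A], Dist[X]]  # p(x_{t+1} | x_t, a_t)
```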

Objectives to consider:

  • unsupervised representation learning: we want a representation of limited size that is maximally informative about the underlying state.
  • self-supervised learning: we want a representation that is maximally informative about future states or observables.
  • supervised learning: we want a representation that is maximally predictive of one or more target quantities, at either the current or future time.
  • model-based reinforcement learning: we want the representation that maximizes cumulative reward obtained by a planning algorithm.
  • unsupervised RL / intrinsic motivation: we want a representation that distinguishes the results of different actions while being otherwise as coarse-grained as possible. (Note that the trivial representation fails here, since it sees every action as a no-op.)