Modified: April 15, 2022
entropy
This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.

Measures uncertainty, disorder, or randomness. The (Shannon) entropy of a probability distribution $p$ is:

$$H(p) = \mathbb{E}_{x \sim p}\left[-\log p(x)\right] = -\sum_x p(x) \log p(x)$$
The quantity inside the expectation is sometimes called the surprisal, so entropy can be seen as the expected surprisal.
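A minimal numeric check of the "expected surprisal" reading, using a made-up three-outcome distribution (the numbers are purely illustrative):

```python
import math

# Toy distribution (illustrative numbers only).
p = {"a": 0.5, "b": 0.25, "c": 0.25}

surprisal = {x: -math.log2(px) for x, px in p.items()}  # -log p(x), in bits
H = sum(px * surprisal[x] for x, px in p.items())       # expected surprisal

print(H)  # 1.5 bits
```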
John von Neumann to Claude Shannon:
'You should call it entropy, for two reasons. In the first place your uncertainty function has been used in statistical mechanics under that name, so it already has a name. In the second place, and more important, no one really knows what entropy really is, so in a debate you will always have the advantage.'
Shannon entropy describes distributions
Something that confused me when first learning about entropy is that the entropy used in probability theory is a property of a probability distribution, not of any specific realization or sample from that distribution. That is: among all distributions over, say, image pixels, some are low-entropy (meaning that all samples are basically the same) and some are high-entropy (samples can be wildly different), but any particular sampled image just is; the image has no entropy in and of itself. The trivial image in which every pixel is black may seem more ordered and less 'random' than an image in which the pixels were sampled uniformly at random. But properly speaking this is a statement about sampling processes, not the images themselves.

The inherent 'disorder' of the images themselves is better captured by the notion of Kolmogorov complexity: the length of the shortest program that produces each image. Equivalently this is the negative log-probability of each image under the Solomonoff prior, which gives probability $2^{-n}$ to all objects that can be generated by an $n$-bit program.
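A small sketch of the distinction, using a made-up space of 4-pixel binary 'images' (my own toy example): the all-black image is a possible sample from both distributions below, but only the distributions have entropies.

```python
import itertools
import math

# Every 4-pixel binary "image".
images = list(itertools.product((0, 1), repeat=4))

def entropy_bits(p):
    return -sum(px * math.log2(px) for px in p.values() if px > 0)

# One distribution always produces the all-black image; the other is uniform.
always_black = {img: (1.0 if img == (0, 0, 0, 0) else 0.0) for img in images}
uniform = {img: 1.0 / len(images) for img in images}

print(entropy_bits(always_black))  # 0.0 bits: every sample is identical
print(entropy_bits(uniform))       # 4.0 bits: samples can be anything
```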
Thermodynamic entropy describes (macro)states
By contrast, Cosma Shalizi observes that the 'Boltzmann' entropy of statistical mechanics is a property of a specific macrostate of a physical system. It is defined as

$$S_B = k_B \ln W$$

where $W$ is the number of microstates consistent with the macrostate. The constant of proportionality is classically taken to be Boltzmann's constant $k_B$, but it is debatably more natural to use the rescaled dimensionless entropy $S = \ln W$, in which the constant is just $1$. This views the macrostate as implicitly representing a uniform distribution over all microstates consistent with the macro-level observables. The Boltzmann entropy of this macrostate is then just the Shannon entropy of this distribution.
tl;dr: the macrostate is a collection of microstates, and its entropy is the log of the size of that collection, i.e., the log of its volume in phase space.
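A toy spin-system check of this equivalence (my own example; the macrostate here is just "number of up spins"):

```python
import math
from itertools import product

# 12 two-state spins; the macrostate "4 spins up" is the set of all
# configurations with exactly four 1s.
N, k = 12, 4
microstates = [s for s in product((0, 1), repeat=N) if sum(s) == k]
W = len(microstates)
assert W == math.comb(N, k)

# Boltzmann entropy of the macrostate (dimensionless, constant = 1):
S_boltzmann = math.log(W)

# Shannon entropy of the uniform distribution over those microstates:
S_shannon = -sum((1 / W) * math.log(1 / W) for _ in microstates)

print(S_boltzmann, S_shannon)  # both equal ln C(12, 4) = ln 495 ≈ 6.20
```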
Entropy is subjective
Under this view, thermodynamic entropy is a subjective property of physical states: the same system in the same microstate may have different entropies to different observers, based on the distinctions that each observer uses to define macrostates.
Jaynes (in The Gibbs Paradox) has a thought experiment demonstrating the subjectivity of thermodynamic entropy:
- Consider a chamber bisected by a partition, in which one half is filled with argon gas and the other half is filled with xenon gas. If the partition is removed, allowing the gases to mix, we all agree that entropy will increase.
- On the other hand, suppose the same chamber is filled with argon gas on both sides of the partition. Since the two sides are already the same, removing the partition will make no difference to entropy or to anything else.
- Unknown to us, there are actually two types of argon, A1 and A2; these are indistinguishable except that A2 is soluble in Whifnium, a rare element that will not be discovered until the next century. To a modern-day observer, a chamber that contains A1 and A2 separated by a partition is therefore indistinguishable from the previous argon/argon case, but to a future observer, the same setting will be analogous to the argon/xenon case. The partitioned system will appear more 'ordered' and lower-entropy to the latter observer than to the former observer.
This seems paradoxical because entropy is used to define physical quantities such as the Gibbs free energy that describe seemingly objective phenomena, such as the speed and direction of a reaction. But there is no contradiction: anyone who can observe a reaction driven by the mixing of A1 and A2, or who can extract useful work from this process, is by definition in the position of being able to distinguish A1 from A2. On the other hand, it is genuinely impossible for a present-day observer without access to Whifnium to extract work from the system.
Physical law works as expected for both observers, and in both cases describes the same underlying system, but since entropy is defined in terms of macrostates it will appear differently to observers using different abstractions.
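A discrete sketch of the two observers' bookkeeping, using a lattice-gas caricature of Jaynes's setup (the model and numbers are my own, chosen only to make the counting exact; this is not Jaynes's calculation): each side of the partition has M cells, n1 A1 particles start on the left and n2 A2 particles on the right, and a microstate is an assignment of particles to distinct cells.

```python
import math

M, n1, n2 = 10**6, 1000, 1000   # cells per side; particles of each type
n = n1 + n2

def ln_comb(a, b):
    """Log of the binomial coefficient C(a, b)."""
    return math.lgamma(a + 1) - math.lgamma(b + 1) - math.lgamma(a - b + 1)

# Future observer (distinguishes A1 from A2): counts placements of each type.
S_before_fine = ln_comb(M, n1) + ln_comb(M, n2)
S_after_fine = ln_comb(2 * M, n1) + ln_comb(2 * M - n1, n2)

# Present-day observer (sees only "argon"): counts occupied-cell patterns.
S_before_coarse = ln_comb(M, n1) + ln_comb(M, n2)  # same count as above
S_after_coarse = ln_comb(2 * M, n)                 # n identical atoms anywhere

print((S_after_fine - S_before_fine) / n)      # ~0.69 ≈ ln 2 per particle
print((S_after_coarse - S_before_coarse) / n)  # ~0.002 ≈ 0 per particle
```

The future observer sees the standard entropy of mixing, about ln 2 per particle; the present-day observer, who cannot distinguish the types, sees essentially no change.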
Subjective but not Bayesian
(notes on Shalizi 2004, The Backwards Arrow of Time of the Coherently Bayesian Statistical Mechanic)
Shalizi argues that thermodynamic entropy does not correspond to the entropy of a Bayesian posterior on microstates. The argument is very simple: a Bayesian's uncertainty, and thus their subjective entropy, decreases as more information about the system is observed. But this contradicts the law that entropy increases over time!
In this framework, when I start with two gases separated by a partition and then remove the partition, I do not become more uncertain about the microstate. The system proceeds to microstates in which the gases are mixed, but as long as the dynamics are deterministic, the number of reachable microstates is no larger than the number we started with. If we continue to observe macro-level properties of the system over time, these will restrict the set of possible microstates even further! So our Bayesian entropy decreases, even as thermodynamic entropy goes up.
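A toy simulation of this divergence (entirely my own construction: a random bijection on a finite state space stands in for deterministic reversible dynamics, and an arbitrary coarse observable defines the macrostates):

```python
import math
import random

random.seed(0)

# Finite toy state space; a random bijection plays the role of deterministic,
# reversible dynamics.
N = 4096
perm = list(range(N))
random.shuffle(perm)                # microstate x evolves as x -> perm[x]

def macro(x):
    return bin(x % 16).count("1")   # coarse observable defining macrostates

cell_size = {m: sum(1 for x in range(N) if macro(x) == m) for m in range(5)}
true_x = 0                          # start in a small (low-entropy) macrocell

# The Bayesian observer tracks every microstate consistent with all
# macro-observations made so far.
posterior = {x for x in range(N) if macro(x) == macro(true_x)}

for t in range(5):
    print(f"t={t}  Boltzmann S = {math.log(cell_size[macro(true_x)]):.2f}  "
          f"Bayesian S = {math.log(len(posterior)):.2f}")
    # Deterministic evolution: the true state and the posterior set are both
    # pushed forward by the same bijection, so the posterior cannot grow...
    true_x = perm[true_x]
    posterior = {perm[x] for x in posterior}
    # ...and each new macro-observation only shrinks it further.
    posterior = {x for x in posterior if macro(x) == macro(true_x)}
```

Since most macrocells in this toy model are large, the Boltzmann column (the log of the current macrocell's size) typically goes up, while the Bayesian column cannot go up: exactly the tension Shalizi is pointing at.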
Responses to this:
- Maybe we can't keep track of the full distribution over microstates? The starting distribution is easily representable by a simple rule, but if we can't simulate the system to infinite precision, then our simulations will introduce noise, and ultimately we'll become uncertain about which states are legal.
- Shalizi proposes this, but points out that the resulting subjective entropy depends on the exact computational constraints we specify, so it's not clear why thermodynamic entropy should match it.
- Maybe doing the computation at sufficient precision, fast enough to keep up with the system, is itself guaranteed to release enough heat that entropy increases?
- Or the dynamics might just be stochastic.
- Shalizi frames this in terms of reversibility vs. irreversibility, and claims that denying reversibility 'has all the advantages of theft over honest toil', since the whole point of stat mech is to derive irreversible macro behavior from reversible micro behavior.
Shalizi's preferred resolution is to simply reject the identification of thermodynamic entropy with Bayesian posterior entropy. We can instead just define thermodynamic entropy to be the log of the phase-space volume consistent with the current macrostate of the system (as we in fact did above). This is still subjective, because it depends on our choice of macro-level observables, but there's no notion of keeping track of beliefs over time; the entropy is simply defined anew at each timestep.