Created: August 04, 2022
Modified: August 04, 2022

Wasserstein

This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.

The $p$-Wasserstein distance between probability distributions $\mu(u), \nu(v)$ on a space with metric $d$ is defined as

W_p(\mu, \nu) = \inf_{r \in \Gamma_{\mu\nu}} \left(\mathbb{E}_{u, v\sim r} \left[d(u, v)^p\right]\right)^{1/p}

where the infimum is over the set $\Gamma_{\mu\nu}$ of all joint distributions $r(u, v)$ having marginal distributions $\mu$ and $\nu$ over the first and second arguments respectively.

Any such joint distribution can be viewed as a 'transport plan'. The conditional distributions $r(v | u)$ tell us how the probability mass at a sample $u \sim \mu$ should be 'transported' in order to produce a sample $v \sim \nu$, and the expected distance between such pairs is the cost of the plan. So with $p=1$, the Wasserstein distance is the cost of the lowest-cost transport plan (the optimal transport) from $\mu$ to $\nu$, or vice versa, since the definition is symmetric in its two arguments.
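To make the 'transport plan' picture concrete, here is a minimal sketch in plain Python. The point sets, the weights, and the helper names `plan_cost` and `marginals` are all made up for illustration. It builds two joint distributions with the same marginals and compares their costs under $d(u, v) = |u - v|$.

```python
def plan_cost(plan, xs, ys, p=1):
    """(E_{u,v ~ plan}[|u - v|^p])^(1/p) for a discrete joint distribution.

    plan[i][j] is the probability mass moved from xs[i] to ys[j].
    """
    total = 0.0
    for i, u in enumerate(xs):
        for j, v in enumerate(ys):
            total += plan[i][j] * abs(u - v) ** p
    return total ** (1.0 / p)


def marginals(plan):
    """Row sums (marginal over u) and column sums (marginal over v)."""
    rows = [sum(row) for row in plan]
    cols = [sum(col) for col in zip(*plan)]
    return rows, cols


# Hypothetical example: mu is uniform on {0, 2}, nu is uniform on {1, 3}.
xs, mu = [0.0, 2.0], [0.5, 0.5]
ys, nu = [1.0, 3.0], [0.5, 0.5]

# Plan A: the independent coupling r(u, v) = mu(u) * nu(v).
independent = [[m * n for n in nu] for m in mu]

# Plan B: send each point to its neighbour on the right (0 -> 1, 2 -> 3).
monotone = [[0.5, 0.0],
            [0.0, 0.5]]

cost_independent = plan_cost(independent, xs, ys, p=1)
cost_monotone = plan_cost(monotone, xs, ys, p=1)
```

Both plans are valid members of $\Gamma_{\mu\nu}$, but the independent one costs 1.5 while the monotone one costs 1.0. Since $\mathbb{E}|u - v| \ge |\mathbb{E}[u] - \mathbb{E}[v]| = 1$ for any plan, the monotone plan is optimal here, so $W_1(\mu, \nu) = 1$ in this toy case.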

For univariate probability distributions on the real line with $d(u, v) = |u - v|$, the Wasserstein metric is equivalently the $L_p$ metric on their inverse cumulative distribution functions (quantile functions),

W_p(\mu, \nu) = \left(\int_0^1 |F_\mu^{-1}(\omega) - F_\nu^{-1}(\omega)|^p \, d\omega\right)^{1/p}
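For empirical distributions this inverse-CDF form becomes very simple. With $n$ equally weighted samples, the inverse CDF is a step function that takes the $k$-th smallest sample on the interval $((k-1)/n, k/n]$, so the integral reduces to an average over sorted pairs. A minimal sketch, assuming both sample sets have the same size (the function name `wasserstein_1d` is my own):

```python
def wasserstein_1d(xs, ys, p=1):
    """p-Wasserstein distance between two equal-size empirical distributions on R.

    Assumes every sample carries weight 1/n, so matching the k-th order
    statistics of xs and ys evaluates the inverse-CDF integral exactly.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch only handles equal sample sizes")
    if not xs:
        raise ValueError("need at least one sample")
    n = len(xs)
    # Sorting gives the values of the two empirical quantile functions.
    pairs = zip(sorted(xs), sorted(ys))
    mean_cost = sum(abs(a - b) ** p for a, b in pairs) / n
    return mean_cost ** (1.0 / p)


# Uniform on {0, 2} vs uniform on {1, 3}; input order does not matter.
w1 = wasserstein_1d([0.0, 2.0], [3.0, 1.0], p=1)
w2 = wasserstein_1d([0.0, 0.0], [1.0, 3.0], p=2)
```

Here `w1` is 1.0, and `w2` is $\sqrt{(1^2 + 3^2)/2} = \sqrt{5}$. Unequal sample sizes or non-uniform weights would need a merge over the breakpoints of both step functions, which this sketch does not attempt.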

Links to this note

distributional RL
