Wasserstein: Nonlinear Function
Created: August 04, 2022
Modified: August 04, 2022

Wasserstein

This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.

The $p$-Wasserstein distance between probability distributions $\mu(u), \nu(v)$ is defined as

$$W_p(\mu, \nu) = \inf_{r \in \Gamma_{\mu\nu}} \left(\mathbb{E}_{u, v\sim r} \left[d(u, v)^p\right]\right)^{1/p}$$

where the infimum is over the set $\Gamma_{\mu\nu}$ of all joint distributions $r(u, v)$ having marginal distributions $\mu$ and $\nu$ over the first and second arguments respectively.

Any such joint distribution can be viewed as a 'transport plan'. The conditional distributions $r(v \mid u)$ tell us how any probability mass at a sample $u \sim \mu$ should be 'transported' in order to produce a sample $v \sim \nu$, and the expected distance between such pairs is the cost of the plan. So with $p=1$, the Wasserstein distance represents the cost of the lowest-cost transport plan (the optimal transport) from $\mu$ to $\nu$ or vice versa.
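To make the 'transport plan' reading concrete, here is a minimal sketch that evaluates the cost of two candidate plans between small discrete distributions. The points, weights, the choice $d(u, v) = |u - v|$, and the helper name `plan_cost` are all illustrative assumptions, not part of the definition. It only evaluates given plans; it does not search for the infimum.

```python
import numpy as np

def plan_cost(r, u, v, p=1):
    """Cost (E_{(u,v)~r}[|u - v|^p])^(1/p) of a discrete joint r[i, j] on points u[i], v[j]."""
    d = np.abs(u[:, None] - v[None, :])  # pairwise distances, shape (len(u), len(v))
    return float(np.sum(r * d ** p) ** (1 / p))

# Hypothetical example: mu lives on {0, 2}, nu on {1, 3}, both uniform.
u = np.array([0.0, 2.0]); mu = np.array([0.5, 0.5])
v = np.array([1.0, 3.0]); nu = np.array([0.5, 0.5])

# Two valid plans: the independent coupling, and one that moves each u[i] to v[i].
independent = np.outer(mu, nu)
shift = np.diag(mu)

for name, r in [("independent", independent), ("shift", shift)]:
    # Both must have mu and nu as marginals to be members of Gamma_{mu nu}.
    assert np.allclose(r.sum(axis=1), mu) and np.allclose(r.sum(axis=0), nu)
    # Row i of the conditional is r(v | u = u[i]): where the mass at u[i] is sent.
    conditional = r / r.sum(axis=1, keepdims=True)
    print(name, plan_cost(r, u, v, p=1), conditional.tolist())
```

Every plan's cost is an upper bound on $W_p$. In this example the 'shift' plan is cheaper than the independent coupling for $p=1$, which illustrates why the infimum matters.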

For univariate probability distributions in Euclidean space, the Wasserstein metric is equivalently the $L_p$ metric on their inverse cumulative distribution functions (quantile functions),

$$W_p(\mu, \nu) = \left(\int_0^1 \left|F_\mu^{-1}(\omega) - F_\nu^{-1}(\omega)\right|^p \, d\omega \right)^{1/p}$$
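For two empirical samples of equal size, the quantile functions are step functions whose values are the sorted samples, so the integral reduces to an average over matched order statistics. A minimal sketch under that equal-size assumption (the function name `wasserstein_1d` is made up for illustration):

```python
import numpy as np

def wasserstein_1d(x, y, p=1):
    """Empirical p-Wasserstein distance between two equal-size 1-D samples."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    # Sorting pairs the k-th smallest of x with the k-th smallest of y,
    # i.e. it compares the two empirical quantile functions level by level.
    return float(np.mean(np.abs(x - y) ** p) ** (1 / p))

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
# Translating a sample by a constant c moves every quantile by c,
# so the distance should be |c| for any p (up to floating-point error).
print(wasserstein_1d(x, x + 3.0, p=1), wasserstein_1d(x, x + 3.0, p=2))
```

Unequal sample sizes would need the quantile functions evaluated on a common grid of levels $\omega$, which this sketch does not attempt.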