convex dual: Nonlinear Function
Created: September 07, 2020
Modified: June 25, 2022


This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.

For any function $f(x)$ we define its convex conjugate (or Legendre transform, or Fenchel dual) to be

$$f^*(y) = \sup_x \langle y, x\rangle - f(x).$$

This $f^*$ is always convex, because it is the supremum of a family of affine (linear-plus-constant) functions of $y$.
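As a quick illustration (a minimal sketch, assuming NumPy; `conjugate` is just an illustrative helper name), the conjugate can be approximated by brute force over a grid of $x$ values:

```python
import numpy as np

def conjugate(f, xs):
    """Approximate f*(y) = sup_x <y, x> - f(x) by maximizing over a grid xs."""
    return lambda y: np.max(y * xs - f(xs))

xs = np.linspace(-10, 10, 100001)
f = lambda x: 0.5 * x**2       # f(x) = x^2/2 is self-dual: f*(y) = y^2/2
f_star = conjugate(f, xs)

print(f_star(3.0))             # approximately 3^2/2 = 4.5
```

This only works when the true maximizer lies inside the grid; for functions whose conjugate is infinite somewhere, the grid maximum just grows with the grid width.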

We can view this as the analogue of the Fourier transform in the $(\max, +)$ semiring instead of the $(+, \times)$ semiring. That is, instead of $\hat{f}(y) = \int f(x) e^{ixy}\,dx$, where we:

  • add a bunch of elements, each consisting of the product of $f(x)$ with $e^{ixy}$, we
  • take the max of a bunch of elements, each consisting of the sum of $-f(x)$ with $yx$
  • where $yx$ is multiplication, viewable as repeated addition; analogously, $e^{ixy} = (e^{ix})^y$ is viewable as repeated multiplication. (I don't quite understand if/how the imaginary numbers play into this analogy.)

A shared intuition is that we are 'projecting' a function onto a set of components; the transform tells us how much of each component there is. In the Fourier transform, the components are sinusoids, and the transform tells us how much content there is at each frequency $y$. In the convex conjugate, the components are slopes; the transform tells us how much of the original function is at each slope $y$.

For example, the line $f(x) = 5x$ has conjugate $f^*(y) = \sup_x yx - 5x = \sup_x (y - 5)x$. At $y=5$ this is zero; for all other $y$ it is infinity.
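We can watch this happen numerically (a sketch, assuming NumPy): on any finite grid the supremum at $y=5$ is exactly zero, while at any other $y$ it grows without bound as the grid widens.

```python
import numpy as np

f = lambda x: 5 * x

for half_width in (10, 100, 1000):
    xs = np.linspace(-half_width, half_width, 20001)
    sup_at_5 = np.max(5.0 * xs - f(xs))   # always 0: the line 5x matches f exactly
    sup_at_6 = np.max(6.0 * xs - f(xs))   # (6-5)*half_width: grows with the grid
    print(half_width, sup_at_5, sup_at_6)
```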

As Wikipedia says, "This definition can be interpreted as an encoding of the convex hull of the function's epigraph in terms of its supporting hyperplanes". TODO: is there a convex conjugate equivalent of the fast Fourier transform?

Note that $f^*$ is defined over the space of linear functionals $y$ operating on vectors $x$. For a given linear functional $y$, it tells us the maximum amount by which that functional exceeds $f(x)$ (equivalently, the amount by which it must be shifted down in order to never exceed $f$, i.e., to be a supporting hyperplane). If $f$ grows superlinearly, then no fixed linear functional can exceed $f$ by an arbitrarily large amount; eventually, $f$ will catch up, so $f^*$ is finite everywhere. (Strict convexity alone is not enough for this: $\sqrt{1+x^2}$ is strictly convex but asymptotically linear, so its conjugate is infinite for $|y| > 1$.)
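The shift-down reading can be checked concretely (a sketch, assuming NumPy): for $f(x) = x^2$ and slope $y = 2$, the line $2x$ exceeds $f$ by at most $f^*(2) = 1$, and the shifted line $2x - 1$ is a supporting hyperplane touching $f$ at $x = 1$.

```python
import numpy as np

f = lambda x: x**2
xs = np.linspace(-10, 10, 200001)

y = 2.0
shift = np.max(y * xs - f(xs))    # f*(y): max amount the line y*x exceeds f
gap = f(xs) - (y * xs - shift)    # shifted line y*x - f*(y) never exceeds f
print(shift, gap.min())           # gap is ~0 at the touching point x = 1
```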

If $f$ is itself convex and differentiable, then setting the gradient of the inner term to zero:

$$y = \nabla f(x)$$

we can solve exactly for the conjugate evaluated at the gradient of some point $x$:

$$f^*(\nabla f(x)) = \langle \nabla f(x), x \rangle - f(x)$$
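This identity can be sanity-checked numerically (a sketch, assuming NumPy), here with the convex function $f(x) = x^4$, whose gradient is $4x^3$:

```python
import numpy as np

f = lambda x: x**4                       # convex, with gradient f'(x) = 4 x^3
grad_f = lambda x: 4 * x**3

xs = np.linspace(-3, 3, 600001)
conj = lambda y: np.max(y * xs - f(xs))  # brute-force f*(y)

x0 = 1.5
lhs = conj(grad_f(x0))                   # f*(grad f(x0)) via the grid
rhs = grad_f(x0) * x0 - f(x0)            # <grad f(x0), x0> - f(x0)
print(lhs, rhs)                          # both equal 3 * x0^4 = 15.1875
```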

Convex duality establishes a relationship between Lipschitz-continuous gradients and strong convexity.

  • Recall that a function is $\ell$-Lipschitz iff $|f(y)-f(x)| \le \ell \|y - x\|$. For an everywhere-differentiable function this is true if and only if the gradients have bounded norm.
  • Let $\nabla f$ be $\ell$-Lipschitz (i.e., let $f$ be $\ell$-smooth). What can we say about its convex dual $f^*$?
  • It's apparently true that the convex dual will be $1/\ell$-strongly convex, and conversely, the convex dual of a $\lambda$-strongly convex function has a $1/\lambda$-Lipschitz gradient.
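A quick numerical sanity check of this duality (a sketch, assuming NumPy), using $f(x) = \frac{\ell}{2} x^2$, whose gradient $\ell x$ is exactly $\ell$-Lipschitz: its conjugate works out to $f^*(y) = y^2 / (2\ell)$, which has curvature $1/\ell$, i.e., is $(1/\ell)$-strongly convex.

```python
import numpy as np

ell = 4.0
f = lambda x: 0.5 * ell * x**2           # gradient ell*x is exactly ell-Lipschitz

xs = np.linspace(-50, 50, 1_000_001)
conj = lambda y: np.max(y * xs - f(xs))  # brute-force f*(y)

# Analytically f*(y) = y^2 / (2*ell): curvature 1/ell, (1/ell)-strongly convex.
for y in (-2.0, 0.0, 2.0):
    print(y, conj(y), y**2 / (2 * ell))  # grid value matches y^2 / (2*ell)
```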