convex dual: Nonlinear Function
Created: September 07, 2020
Modified: June 25, 2022


This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.

For any function $f(x)$ we define its convex conjugate (or Legendre transform, or Fenchel dual) to be

$$f^*(y) = \sup_x \langle y, x\rangle - f(x).$$

This $f^*$ is always convex, because it is the supremum of a family of affine (linear-plus-constant) functions of $y$.
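As a quick illustration (a minimal sketch, assuming NumPy; `conjugate` is just an illustrative helper name), the conjugate can be approximated by brute force over a grid of $x$ values:

```python
import numpy as np

def conjugate(f, xs):
    """Approximate f*(y) = sup_x <y, x> - f(x) by maximizing over a grid xs."""
    return lambda y: np.max(y * xs - f(xs))

xs = np.linspace(-10, 10, 100001)
f = lambda x: 0.5 * x**2       # f(x) = x^2/2 is self-dual: f*(y) = y^2/2
f_star = conjugate(f, xs)

print(f_star(3.0))             # approximately 3^2/2 = 4.5
```

This only works when the true maximizer lies inside the grid; for functions whose conjugate is infinite somewhere, the grid maximum just grows with the grid width.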

We can view this as the analogue of the Fourier transform in the $(\max, +)$ semiring instead of the $(+, \times)$ semiring. That is, instead of $\hat{f}(y) = \int f(x) e^{ixy}\,dx$, where we:

  • add a bunch of elements, each consisting of the product of $f(x)$ with $e^{ixy}$, we
  • take the max of a bunch of elements, each consisting of the sum of $-f(x)$ with $yx$
  • where $yx$ is multiplication, viewable as repeated addition; analogously, $e^{ixy} = (e^{ix})^y$ is viewable as repeated multiplication. (I don't quite understand if/how the imaginary numbers play into this analogy.)

A shared intuition is that we are 'projecting' a function onto a set of components; the transform tells us how much of each component there is. In the Fourier transform, the components are sinusoids, and the transform tells us how much content there is at each frequency $y$. In the convex conjugate, the components are slopes; the transform tells us how much of the original function is at each slope $y$.

For example, the line $f(x) = 5x$ has conjugate $f^*(y) = \sup_x yx - 5x = \sup_x (y - 5)x$. At $y=5$ this is zero; for all other $y$ it is infinity.
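We can watch this happen numerically (a sketch, assuming NumPy): on any finite grid the supremum at $y=5$ is exactly zero, while at any other $y$ it grows without bound as the grid widens.

```python
import numpy as np

f = lambda x: 5 * x

for half_width in (10, 100, 1000):
    xs = np.linspace(-half_width, half_width, 20001)
    sup_at_5 = np.max(5.0 * xs - f(xs))   # always 0: the line 5x matches f exactly
    sup_at_6 = np.max(6.0 * xs - f(xs))   # (6-5)*half_width: grows with the grid
    print(half_width, sup_at_5, sup_at_6)
```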

As Wikipedia says, "This definition can be interpreted as an encoding of the convex hull of the function's epigraph in terms of its supporting hyperplanes". TODO: is there a convex conjugate equivalent of the fast Fourier transform?

Note that $f^*$ is defined over the space of linear functionals $y$ operating on vectors $x$. For a given linear functional $y$, it tells us the maximum amount by which that functional exceeds $f(x)$ (equivalently, the amount by which it must be shifted down in order to never exceed $f$, i.e., to be a supporting hyperplane). If $f$ grows superlinearly, then no fixed linear functional can exceed $f$ by an arbitrarily large amount; eventually, $f$ will catch up, so $f^*$ is finite everywhere. (Strict convexity alone is not enough for this: $\sqrt{1+x^2}$ is strictly convex but asymptotically linear, so its conjugate is infinite for $|y| > 1$.)
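The shift-down reading can be checked concretely (a sketch, assuming NumPy): for $f(x) = x^2$ and slope $y = 2$, the line $2x$ exceeds $f$ by at most $f^*(2) = 1$, and the shifted line $2x - 1$ is a supporting hyperplane touching $f$ at $x = 1$.

```python
import numpy as np

f = lambda x: x**2
xs = np.linspace(-10, 10, 200001)

y = 2.0
shift = np.max(y * xs - f(xs))    # f*(y): max amount the line y*x exceeds f
gap = f(xs) - (y * xs - shift)    # shifted line y*x - f*(y) never exceeds f
print(shift, gap.min())           # gap is ~0 at the touching point x = 1
```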

If $f$ is itself convex and differentiable, then setting the gradient of the inner term to zero:

$$y = \nabla f(x)$$

we can solve exactly for the conjugate evaluated at the gradient of some point $x$:

$$f^*(\nabla f(x)) = \langle \nabla f(x), x \rangle - f(x)$$
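This identity can be sanity-checked numerically (a sketch, assuming NumPy), here with the convex function $f(x) = x^4$, whose gradient is $4x^3$:

```python
import numpy as np

f = lambda x: x**4                       # convex, with gradient f'(x) = 4 x^3
grad_f = lambda x: 4 * x**3

xs = np.linspace(-3, 3, 600001)
conj = lambda y: np.max(y * xs - f(xs))  # brute-force f*(y)

x0 = 1.5
lhs = conj(grad_f(x0))                   # f*(grad f(x0)) via the grid
rhs = grad_f(x0) * x0 - f(x0)            # <grad f(x0), x0> - f(x0)
print(lhs, rhs)                          # both equal 3 * x0^4 = 15.1875
```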

Convex duality establishes a relationship between Lipschitz-continuous gradients and strong convexity.

  • Recall that a function is $\ell$-Lipschitz iff $|f(y)-f(x)| \le \ell \|y - x\|$. For an everywhere-differentiable function this is true if and only if the gradients have bounded norm.
  • Let $\nabla f$ be $\ell$-Lipschitz (i.e., let $f$ be $\ell$-smooth). What can we say about its convex dual $f^*$?
  • It's apparently true that the convex dual will be $1/\ell$-strongly convex, and conversely, the convex dual of a $\lambda$-strongly convex function has a $1/\lambda$-Lipschitz gradient.
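A quick numerical sanity check of this duality (a sketch, assuming NumPy), using $f(x) = \frac{\ell}{2} x^2$, whose gradient $\ell x$ is exactly $\ell$-Lipschitz: its conjugate works out to $f^*(y) = y^2 / (2\ell)$, which has curvature $1/\ell$, i.e., is $(1/\ell)$-strongly convex.

```python
import numpy as np

ell = 4.0
f = lambda x: 0.5 * ell * x**2           # gradient ell*x is exactly ell-Lipschitz

xs = np.linspace(-50, 50, 1_000_001)
conj = lambda y: np.max(y * xs - f(xs))  # brute-force f*(y)

# Analytically f*(y) = y^2 / (2*ell): curvature 1/ell, (1/ell)-strongly convex.
for y in (-2.0, 0.0, 2.0):
    print(y, conj(y), y**2 / (2 * ell))  # grid value matches y^2 / (2*ell)
```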