Legendre transform: Nonlinear Function
Created: July 08, 2022
Modified: July 15, 2022

Legendre transform

This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.


The Legendre transform of a function $f(x)$ is defined to be

$$g(y) = \sup_x\, \langle y, x\rangle - f(x).$$

An alternative definition for convex functions is that two convex functions $f$ and $g$ are Legendre transforms of each other if their derivatives are inverses: $g' = (f')^{-1}$, or equivalently $g'(f'(x)) = x$. There's a nice geometric argument for this in (among other places) Jess Riedel's post. This definition is nice because it's symmetric; it's immediately clear how this gives rise to convex duality.
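As a quick numerical sanity check of the sup definition (a sketch, using the hypothetical example $f(x) = \frac{1}{2}x^2$, which is its own Legendre transform since $f'(x) = x$ is its own inverse):

```python
import numpy as np

def legendre(f, y, xs):
    """Numerically approximate g(y) = sup_x <y, x> - f(x) over a grid xs."""
    return np.max(y * xs - f(xs))

f = lambda x: 0.5 * x**2           # f'(x) = x, so (f')^{-1}(y) = y
xs = np.linspace(-10, 10, 100001)  # grid wide enough to contain the maximizer

# f(x) = x^2/2 is its own Legendre transform: g(y) = y^2/2.
for y in [-2.0, 0.0, 1.5, 3.0]:
    assert abs(legendre(f, y, xs) - 0.5 * y**2) < 1e-4
```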

What is the inverse function of a derivative? Well, the function $f'(x)$ gives us the derivative of $f$ at a point $x$, so its inverse $(f')^{-1}(y)$ must take a derivative $y$ and give back the point $x^*$ at which $f$ achieves that derivative (it's clear that this is uniquely defined only when $f$ is convex).

If that's what $g'$ is doing, then what is $g$ doing? Seen as the antiderivative

$$g(y) = \int_{0}^y g'(\tilde{y})\, d\tilde{y},$$

we could view $g(y)$ as the result of the following computation: for all slopes up to $y$, find the point $x$ at which $f$ has that slope, then sum those points.

This sounds really weird. I don't think I really understand it, but maybe a hint comes from the connection to physics: summing up increments corresponds to re-expressing a function as a dynamics, as a sequence of incremental changes. This connects to the fact that the Legendre transform converts a Lagrangian to a Hamiltonian.

I think it helps to connect back to the geometric view: $g'(y)$ is the point where $f$ achieves slope $y$, and $g(y)$ is (minus) the intercept of the tangent line at that point. Now as we change $y$ slightly, how much does the intercept change for a line with slope $y$ that passes through $(x, f(x))$? Essentially we are trying to move a lever with a fixed attachment at distance $x$: if $x$ is really far away, then even a small change in the inclination of the lever will have a large impact on its intercept! The value of $g$ is the result of all the incremental changes to the lever's intercept as we maneuver it to have slope $y$, and each of those changes has magnitude proportional to the distance of the attachment point for the corresponding value of $y$.
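The tangent-intercept picture can be checked directly (a sketch, using $f(x) = \frac{1}{4}x^4$, the example worked through next): the tangent line to $f$ at the point where the slope equals $y$ has intercept $f(x^*) - y x^*$, which is exactly $-g(y)$.

```python
f = lambda x: 0.25 * x**4
fprime = lambda x: x**3

def tangent_intercept(x):
    """y-intercept of the tangent line to f at x: f(x) - f'(x) * x."""
    return f(x) - fprime(x) * x

# The tangent line with slope y touches f at x* = y^{1/3}, and its
# intercept equals -g(y), where g(y) = (3/4) y^{4/3} in closed form.
for y in [0.5, 1.0, 2.0, 8.0]:
    x_star = y ** (1 / 3)
    g_y = 0.75 * y ** (4 / 3)
    assert abs(tangent_intercept(x_star) + g_y) < 1e-12
```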

Working through an example:

Say $f(x) = \frac{1}{4}x^4$, so $f'(x) = x^3$, which is inverted as $g'(y) = y^{1/3}$. Then

$$g(y) = \int_{0}^y \tilde{y}^{1/3}\, d\tilde{y} = \frac{3}{4}\,y^{4/3}.$$

Compare this to the official definition

$$\begin{align*} g(y) &= \sup_{x}\, yx - f(x) \\ &= \Big[yx - f(x)\Big]_{x = (f')^{-1}(y)}\\ &= y \cdot y^{1/3} - f(y^{1/3})\\ &= y^{4/3} - \frac{1}{4}y^{4/3}\\ &= \frac{3}{4}y^{4/3}. \end{align*}$$
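The integral and sup computations can be checked against each other numerically (a sketch; the grid and its bounds are arbitrary choices):

```python
import numpy as np

f = lambda x: 0.25 * x**4
xs = np.linspace(-5, 5, 200001)

for y in [0.1, 1.0, 3.0]:
    sup_val = np.max(y * xs - f(xs))   # g(y) via the sup definition
    closed_form = 0.75 * y ** (4 / 3)  # g(y) = (3/4) y^{4/3}
    assert abs(sup_val - closed_form) < 1e-4
```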

What does the 'sup' in the definition correspond to? It's the general encoding of 'just evaluate at the inverse derivative'. Notably, that inverse derivative $x^*$ ends up also being the derivative of $g$:

$$\frac{d}{dy} g(y) = x^*(y) = \arg\max_x\, xy - f(x).$$

To see this, we compute

$$\begin{align*} \frac{d}{dy} g &= \frac{d}{dy} \left(y \cdot x^*(y) - f(x^*(y))\right)\\ &= x^*(y) + y \frac{dx^*}{dy} - \frac{df(x^*)}{dx^*}\frac{dx^*}{dy}, \end{align*}$$

where the last two terms cancel: since $x^* = (f')^{-1}(y)$, we have

$$\frac{df(x^*)}{dx^*} = f'(x^*) = f'\!\left((f')^{-1}(y)\right) = y,$$

so

$$y \frac{dx^*}{dy} - \frac{df(x^*)}{dx^*}\frac{dx^*}{dy} = (y - y)\frac{dx^*}{dy} = 0,$$

leaving $\frac{d}{dy} g = x^*(y)$. (Note that we never needed the value of $\frac{dx^*}{dy}$ itself.)
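The two identities above, $g'(y) = x^*(y)$ and $x^*(y) = \arg\max_x\, xy - f(x)$, can both be checked numerically (a sketch, again with $f(x) = \frac{1}{4}x^4$, so $x^*(y) = y^{1/3}$ and $g(y) = \frac{3}{4}y^{4/3}$):

```python
import numpy as np

f = lambda x: 0.25 * x**4
g = lambda y: 0.75 * y ** (4 / 3)
x_star = lambda y: y ** (1 / 3)

xs = np.linspace(-5, 5, 200001)
h = 1e-6
for y in [0.5, 1.0, 4.0]:
    # g'(y) via central finite difference should equal x*(y):
    dg_dy = (g(y + h) - g(y - h)) / (2 * h)
    assert abs(dg_dy - x_star(y)) < 1e-6
    # the maximizer of xy - f(x), found by grid search, should also be x*(y):
    maximizer = xs[np.argmax(y * xs - f(xs))]
    assert abs(maximizer - x_star(y)) < 1e-3
```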

In optimization

What is the relationship between minimax duality, duality in constrained optimization, and the duality of the Legendre transform?

For an optimization problem with objective $f$ and constraint $h$, we write the Lagrangian as $L(x, \lambda) = f(x) - \lambda h(x)$. Now minimizing this with respect to $x$ is (up to sign) maximizing $\lambda h(x) - f(x)$ with respect to $x$, so given a $\lambda$ we can write $d(\lambda) = \max_{x}\, \lambda h(x) - f(x)$ as the optimization dual function, the thing that emerges from minimax duality when we put the constraints on the 'outside' rather than the 'inside' of the optimization. If the constraint $h(x)$ is linear in $x$ then we have (waves hands) $d(\lambda) = \max_x\, \lambda x - f(x)$, which also just happens to be the Legendre transform of the objective $f(x)$, in which the conjugate variable $\lambda$ turns out to also be the Lagrange multiplier on the constraint.
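A concrete instance of this (a sketch with hypothetical choices $f(x) = x^2$ and the linear constraint function $h(x) = x$): the dual function is $\max_x\, \lambda x - x^2 = \lambda^2/4$, which is the Legendre transform of $f$ evaluated at $\lambda$.

```python
import numpy as np

f = lambda x: x**2
xs = np.linspace(-10, 10, 100001)

for lam in [-3.0, 0.0, 1.0, 4.0]:
    d_lam = np.max(lam * xs - f(xs))   # dual function via max over x
    assert abs(d_lam - lam**2 / 4) < 1e-4
```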