Modified: July 15, 2022
Legendre transform
This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.References:
- Jess Riedel on the Legendre transform in physics
- Stack Overflow discussion
- Prof. V. Balakrishnan on Hamiltonian dynamics: https://www.youtube.com/watch?v=GOkZs2RZMQY
- Making sense of the Legendre transform
The Legendre transform of a function is defined to be
An alternative definition for convex functions is that two convex functions and are Legendre transforms of each other if their derivatives are inverses: or equivalently . There's a nice geometric argument for this in (among other places) Jess Riedel's post. This definition is nice because it's symmetric; it's immediately clear how this gives rise to convex duality.
What is the inverse function of a derivative? Well, the function gives us the derivative of at a point , so its inverse must take a derivative and give back the point at which achieves that derivative (it's clear that this is uniquely defined only when is convex).
If that's what is doing, then what is doing? Seen as the antiderivative
we could view as the result of the following computation: for all slopes up to , find the point at which has that slope, then sum those points.
This sounds really weird. I don't think I really understand it, but maybe a hint comes from the connection to physics: summing up increments corresponds to re-expressing a function as a dynamics, as a sequence of incremental changes. This connects to the fact that the Legendre transform converts a Lagrangian to a Hamiltonian.
I think it helps to connect back to the geometric view: is the point where achieves slope , and is the intercept of the tangent line of that point. Now as we change slightly, how much does the intercept change for a line with slope that passes through ? Essentially we are trying to move a lever with a fixed attachment at distance - if is really far away, then even a small change in the inclination of the lever will have a large impact on its intercept! The value of is the result of all the incremental changes to the lever's intercept as we maneuver it to have slope , and each of those changes will have magnitude proportional to the distance of the attachment point for that corresponding value of .
Working through an example:
Say has , which is inverted as . Then
Compare this to the official definition
What does the 'sup' in the definition correspond to? It's the general encoding of 'just evaluate at the inverse derivative'. Notably that inverse derivative ends up also being the derivative of :
To see this, we compute
where
which plugging back in,
where .
In optimization
what is the relationship between minimax duality, duality in constrained optimization, and the duality of the Legendre transform?
for an optimization problem with objective and constraint , we write the Lagrangian as . Now minimizing this with respect to is maximizing with respect to x, so given a lambda we can write as the optimization dual function, the thing that emerges from minimax duality when we put the constraints on the 'outside' rather than the 'inside' of the optimization. If the constant is linear in then we have (waves hands) which also just happens to be the Legendre transform of the objective , in which the conjugate variable turns out to also be the Lagrange multiplier on the constraint.