Modified: August 13, 2022
fixed point
This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.

We say that $x^*$ is a fixed point of an update rule $f$ if $f(x^*) = x^*$. Update rules can often (though not necessarily) be seen as defining an optimization process, as in gradient descent, Newton's method, or the Bellman equations in reinforcement learning, in which case the fixed point is a (candidate) solution to the optimization problem.
Not all update rules have fixed points, and even when fixed point(s) exist, iterative updates may fail to reach them. The Banach fixed-point theorem guarantees that the iterates $x_{t+1} = f(x_t)$ converge to a unique fixed point if $f$ is a contraction mapping.
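To make this concrete, here is a small toy example of my own (the map $f(x) = Ax + b$ and the particular $A$, $b$ are made up for illustration): because $A$ has norm below one, $f$ is a contraction, and simply iterating $x \leftarrow f(x)$ settles at the unique fixed point.

```python
# Toy illustration (assumed example, not from the notes): an affine contraction
# mapping iterated to its unique fixed point.
import jax.numpy as jnp

A = jnp.array([[0.3, 0.1],
               [0.0, 0.4]])   # norm well below 1, so f is a contraction
b = jnp.array([1.0, -2.0])

def f(x):
    return A @ x + b          # update rule f(x) = A x + b

x = jnp.zeros(2)
for _ in range(100):          # plain fixed-point iteration x <- f(x)
    x = f(x)

print(x)                      # approximately (I - A)^{-1} b, the unique fixed point
print(jnp.allclose(f(x), x))  # True: f(x*) = x* up to floating-point tolerance
```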
Derivatives of fixed points
Sometimes we want to understand how the solution to an optimization problem (or other fixed-point iteration) would change if we changed certain parameters $\theta$ of the problem. One approach is to differentiate through the unrolled sequence of fixed-point iterations explicitly (e.g., by applying autodiff to the unrolled computation). However, the fixed-point structure allows for a shortcut which can be more efficient.
The fixed point $x^*(\theta)$ is defined implicitly by
$$x^*(\theta) = f(x^*(\theta), \theta),$$
so implicit differentiation implies a relation for the Jacobians
$$\frac{\partial x^*}{\partial \theta} = \frac{\partial f}{\partial x}\bigg|_{x^*(\theta)} \frac{\partial x^*}{\partial \theta} + \frac{\partial f}{\partial \theta}\bigg|_{x^*(\theta)},$$
in which the first term on the right-hand side describes the dependence on $\theta$ via the optimal point $x^*(\theta)$, and the second term the dependence via the mapping $f$ itself. Solving for $\partial x^*/\partial \theta$ we find
$$\left(I - \frac{\partial f}{\partial x}\bigg|_{x^*(\theta)}\right) \frac{\partial x^*}{\partial \theta} = \frac{\partial f}{\partial \theta}\bigg|_{x^*(\theta)},
\qquad\text{i.e.}\qquad
\frac{\partial x^*}{\partial \theta} = \left(I - \frac{\partial f}{\partial x}\bigg|_{x^*(\theta)}\right)^{-1} \frac{\partial f}{\partial \theta}\bigg|_{x^*(\theta)},$$
which implies that we can differentiate the solution of a fixed-point iteration by
- Solving the iteration (e.g., by running it or any other means) for the optimal $x^*(\theta)$, then
- Evaluating the gradients/Jacobians of the map $f$ at the fixed point with respect to both of its inputs $x$ and $\theta$, so that we obtain $\partial f/\partial x|_{x^*}$ and $\partial f/\partial \theta|_{x^*}$, and then
- Solving the linear system given above.
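Here is a sketch of what these three steps can look like in code. This is my own toy illustration in JAX; the map `f`, the parameters `theta`, and the fixed iteration count are all made up for the example, not taken from any library's implicit-differentiation API.

```python
# Toy sketch of the three steps above, assuming JAX and a made-up parameterised map f.
import jax
import jax.numpy as jnp

def f(x, theta):
    # A contraction in x: the 0.5 * tanh factor keeps the Jacobian norm below 1.
    return 0.5 * jnp.tanh(x) + theta

theta = jnp.array([0.2, -1.0])

# Step 1: solve the iteration for the fixed point x* (here simply by running it).
x_star = jnp.zeros_like(theta)
for _ in range(100):
    x_star = f(x_star, theta)

# Step 2: Jacobians of f at the fixed point with respect to both of its inputs.
df_dx = jax.jacobian(f, argnums=0)(x_star, theta)      # df/dx     at (x*, theta)
df_dtheta = jax.jacobian(f, argnums=1)(x_star, theta)  # df/dtheta at (x*, theta)

# Step 3: solve the linear system (I - df/dx) dx*/dtheta = df/dtheta.
I = jnp.eye(x_star.shape[0])
dxstar_dtheta = jnp.linalg.solve(I - df_dx, df_dtheta)

# Sanity check: differentiating the unrolled iteration gives the same answer.
def unrolled_solve(theta):
    x = jnp.zeros_like(theta)
    for _ in range(100):
        x = f(x, theta)
    return x

print(jnp.allclose(dxstar_dtheta, jax.jacobian(unrolled_solve)(theta), atol=1e-4))
```

For larger problems one would typically avoid forming $\partial f/\partial x$ densely and instead solve the linear system with an iterative method built from vector-Jacobian products, which is (as far as I know) how implicit-differentiation utilities usually handle this step.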