fixed point: Nonlinear Function
Created: August 13, 2022
Modified: August 13, 2022


This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.

We say that $x^*(\theta)$ is a fixed point of an update rule

$$x \leftarrow f_\theta(x)$$

if $x^* = f_\theta(x^*)$. Update rules can often (though not always) be seen as defining an optimization process, as in gradient descent, Newton's method, or the Bellman updates of reinforcement learning, in which case the fixed point is a (candidate) solution to the optimization problem.

Not all update rules have fixed points, and even when fixed points exist, iterative updates may fail to reach them. The Banach fixed-point theorem guarantees that the iterates converge to a unique fixed point if $f_\theta$ is a contraction mapping.
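
To make this concrete, here is a minimal sketch (in JAX, using an assumed example update) of running a fixed-point iteration to convergence. The Babylonian square-root step $x \leftarrow \tfrac{1}{2}(x + \theta/x)$, which is Newton's method for $x^2 = \theta$, is a contraction near its fixed point $x^* = \sqrt{\theta}$:

```python
import jax.numpy as jnp

# Assumed illustrative update rule: the Babylonian square-root step,
# i.e. Newton's method for x**2 = theta, with fixed point x* = sqrt(theta).
def f(theta, x):
    return 0.5 * (x + theta / x)

theta, x = 2.0, 1.0
for _ in range(20):          # iterate the update rule to (near) convergence
    x = f(theta, x)

print(x, jnp.sqrt(theta))    # both approximately 1.41421
```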

Derivatives of fixed points

Sometimes we want to understand how the solution to an optimization problem (or other fixed-point iteration) would change if we changed certain parameters of the problem. One approach is to differentiate explicitly through the unrolled sequence of fixed-point iterations (e.g., using automatic differentiation). However, the fixed-point structure allows for a shortcut which can be more efficient.
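
As a baseline, here is a minimal sketch of that unrolled approach, reusing the assumed square-root update from above: autodiff simply backpropagates through every iteration, so the cost and memory of the derivative grow with the number of steps.

```python
import jax
import jax.numpy as jnp

def f(theta, x):
    return 0.5 * (x + theta / x)   # assumed illustrative update, x* = sqrt(theta)

def solve_by_unrolling(theta, x0=1.0, n_steps=30):
    x = x0
    for _ in range(n_steps):       # autodiff traces through every step
        x = f(theta, x)
    return x

# Differentiate through the unrolled iterations; for x* = sqrt(theta)
# the exact derivative is 1 / (2 sqrt(theta)).
print(jax.grad(solve_by_unrolling)(2.0))   # ~0.35355
print(0.5 / jnp.sqrt(2.0))                 # ~0.35355
```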

The fixed point $x^*(\theta)$ is defined implicitly by

$$x^*(\theta) = f_\theta(x^*(\theta)),$$

so implicit differentiation implies a relation for the Jacobians

$$J_{\theta\to x^*} = J_{\theta\to x^*} J_{x^*\to f_\theta(x^*)} + J_{\theta \to f_\theta(x^*)}$$

in which the first term on the right-hand side describes the dependence on $\theta$ via the optimal point $x^*(\theta)$, and the second term the dependence via the mapping $f_\theta$. Solving for $J_{\theta\to x^*}$ we find

$$J_{\theta\to x^*} = J_{\theta \to f_\theta(x^*)} \left(\mathcal{I} - J_{x^*\to f_\theta(x^*)}\right)^{-1}$$

which implies that we can differentiate the solution of a fixed-point iteration as follows (a worked example and code sketch follow the list):

  1. Solving the iteration (e.g., by running it, or by any other means) for the optimal $x^*$, then
  2. Evaluating the gradients/Jacobians of the map $f_\theta(x^*)$ at the fixed point with respect to both of its inputs $\theta$ and $x^*$, so that we obtain $J_{\theta\to f_\theta(x^*)}$ and $J_{x^* \to f_\theta(x^*)}$, and then
  3. Solving the linear system given above.
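
As a scalar sanity check of the formula, consider the (assumed, illustrative) damped update $f_\theta(x) = x + \alpha(\theta - x^2)$ with $\alpha > 0$, whose positive fixed point is $x^* = \sqrt{\theta}$. There $J_{x^*\to f_\theta(x^*)} = 1 - 2\alpha\sqrt{\theta}$ and $J_{\theta\to f_\theta(x^*)} = \alpha$, so the formula gives

$$J_{\theta\to x^*} = \alpha\left(1 - (1 - 2\alpha\sqrt{\theta})\right)^{-1} = \frac{1}{2\sqrt{\theta}},$$

which matches differentiating $\sqrt{\theta}$ directly. The sketch below runs the same three steps in JAX for a vectorized version of this update; the function name `f`, the step size `alpha`, and the choice of example are all assumptions for illustration. Note that `jax.jacobian` returns Jacobians in the standard layout (rows index outputs, columns index inputs), so the linear solve applies the inverse on the left rather than on the right as in the composition order used above.

```python
import jax
import jax.numpy as jnp

# Assumed illustrative update rule: a damped residual step for solving
# x**2 = theta elementwise, with fixed point x* = sqrt(theta).
alpha = 0.25

def f(theta, x):
    return x + alpha * (theta - x**2)

theta = jnp.array([2.0, 3.0, 5.0])

# Step 1: solve the fixed-point iteration for x* (here, just by running it).
x = jnp.ones_like(theta)
for _ in range(100):
    x = f(theta, x)

# Step 2: Jacobians of the map at the fixed point w.r.t. both of its inputs.
J_x = jax.jacobian(f, argnums=1)(theta, x)      # J_{x* -> f_theta(x*)}
J_theta = jax.jacobian(f, argnums=0)(theta, x)  # J_{theta -> f_theta(x*)}

# Step 3: solve the linear system (I - J_x) @ dxstar_dtheta = J_theta
# (standard Jacobian layout, hence the inverse acts on the left).
dxstar_dtheta = jnp.linalg.solve(jnp.eye(x.size) - J_x, J_theta)

# The update acts elementwise, so the Jacobian is diagonal; compare its
# diagonal with the analytic derivative d sqrt(theta) / d theta.
print(jnp.diag(dxstar_dtheta))   # ~ [0.3536, 0.2887, 0.2236]
print(0.5 / jnp.sqrt(theta))
```

In larger problems one would typically not form these Jacobians explicitly; the same linear system can be attacked with matrix-free methods built on Jacobian-vector products, but the structure of the computation is the one sketched here.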