fixed point: Nonlinear Function
Created: August 13, 2022
Modified: August 13, 2022


This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.

We say that $x^*(\theta)$ is a fixed point of an update rule

$$x \leftarrow f_\theta(x)$$

if $x^* = f_\theta(x^*)$. Update rules can often (though not always) be seen as defining an optimization process, as in gradient descent, Newton's method, or the Bellman updates of reinforcement learning, in which case the fixed point is a (candidate) solution to the optimization problem.

Not all update rules have fixed points, and even when fixed points exist, iterative updates may fail to reach them. The Banach fixed-point theorem guarantees that the iterates converge to a unique fixed point if $f_\theta$ is a contraction mapping.
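
To make this concrete, here is a minimal sketch (in JAX, using an assumed example update) of running a fixed-point iteration to convergence. The Babylonian square-root step $x \leftarrow \tfrac{1}{2}(x + \theta/x)$, which is Newton's method for $x^2 = \theta$, is a contraction near its fixed point $x^* = \sqrt{\theta}$:

```python
import jax.numpy as jnp

# Assumed illustrative update rule: the Babylonian square-root step,
# i.e. Newton's method for x**2 = theta, with fixed point x* = sqrt(theta).
def f(theta, x):
    return 0.5 * (x + theta / x)

theta, x = 2.0, 1.0
for _ in range(20):          # iterate the update rule to (near) convergence
    x = f(theta, x)

print(x, jnp.sqrt(theta))    # both approximately 1.41421
```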

Derivatives of fixed points

Sometimes we want to understand how the solution to an optimization problem (or other fixed-point iteration) would change if we changed certain parameters of the problem. One approach is to differentiate explicitly through the unrolled sequence of fixed-point iterations (e.g., using automatic differentiation). However, the fixed-point structure allows for a shortcut which can be more efficient.
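
As a baseline, here is a minimal sketch of that unrolled approach, reusing the assumed square-root update from above: autodiff simply backpropagates through every iteration, so the cost and memory of the derivative grow with the number of steps.

```python
import jax
import jax.numpy as jnp

def f(theta, x):
    return 0.5 * (x + theta / x)   # assumed illustrative update, x* = sqrt(theta)

def solve_by_unrolling(theta, x0=1.0, n_steps=30):
    x = x0
    for _ in range(n_steps):       # autodiff traces through every step
        x = f(theta, x)
    return x

# Differentiate through the unrolled iterations; for x* = sqrt(theta)
# the exact derivative is 1 / (2 sqrt(theta)).
print(jax.grad(solve_by_unrolling)(2.0))   # ~0.35355
print(0.5 / jnp.sqrt(2.0))                 # ~0.35355
```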

The fixed point $x^*(\theta)$ is defined implicitly by

$$x^*(\theta) = f_\theta(x^*(\theta)),$$

so implicit differentiation implies a relation for the Jacobians

$$J_{\theta\to x^*} = J_{\theta\to x^*} J_{x^*\to f_\theta(x^*)} + J_{\theta \to f_\theta(x^*)}$$

in which the first term on the right-hand side describes the dependence on $\theta$ via the optimal point $x^*(\theta)$, and the second term the dependence via the mapping $f_\theta$. Solving for $J_{\theta\to x^*}$ we find

$$J_{\theta\to x^*} = J_{\theta \to f_\theta(x^*)} \left(\mathcal{I} - J_{x^*\to f_\theta(x^*)}\right)^{-1}$$

which implies that we can differentiate the solution of a fixed-point iteration as follows (a worked example and code sketch follow the list):

  1. Solving the iteration (e.g., by running it, or by any other means) for the optimal $x^*$, then
  2. Evaluating the gradients/Jacobians of the map $f_\theta(x^*)$ at the fixed point with respect to both of its inputs $\theta$ and $x^*$, so that we obtain $J_{\theta\to f_\theta(x^*)}$ and $J_{x^* \to f_\theta(x^*)}$, and then
  3. Solving the linear system given above.
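
As a scalar sanity check of the formula, consider the (assumed, illustrative) damped update $f_\theta(x) = x + \alpha(\theta - x^2)$ with $\alpha > 0$, whose positive fixed point is $x^* = \sqrt{\theta}$. There $J_{x^*\to f_\theta(x^*)} = 1 - 2\alpha\sqrt{\theta}$ and $J_{\theta\to f_\theta(x^*)} = \alpha$, so the formula gives

$$J_{\theta\to x^*} = \alpha\left(1 - (1 - 2\alpha\sqrt{\theta})\right)^{-1} = \frac{1}{2\sqrt{\theta}},$$

which matches differentiating $\sqrt{\theta}$ directly. The sketch below runs the same three steps in JAX for a vectorized version of this update; the function name `f`, the step size `alpha`, and the choice of example are all assumptions for illustration. Note that `jax.jacobian` returns Jacobians in the standard layout (rows index outputs, columns index inputs), so the linear solve applies the inverse on the left rather than on the right as in the composition order used above.

```python
import jax
import jax.numpy as jnp

# Assumed illustrative update rule: a damped residual step for solving
# x**2 = theta elementwise, with fixed point x* = sqrt(theta).
alpha = 0.25

def f(theta, x):
    return x + alpha * (theta - x**2)

theta = jnp.array([2.0, 3.0, 5.0])

# Step 1: solve the fixed-point iteration for x* (here, just by running it).
x = jnp.ones_like(theta)
for _ in range(100):
    x = f(theta, x)

# Step 2: Jacobians of the map at the fixed point w.r.t. both of its inputs.
J_x = jax.jacobian(f, argnums=1)(theta, x)      # J_{x* -> f_theta(x*)}
J_theta = jax.jacobian(f, argnums=0)(theta, x)  # J_{theta -> f_theta(x*)}

# Step 3: solve the linear system (I - J_x) @ dxstar_dtheta = J_theta
# (standard Jacobian layout, hence the inverse acts on the left).
dxstar_dtheta = jnp.linalg.solve(jnp.eye(x.size) - J_x, J_theta)

# The update acts elementwise, so the Jacobian is diagonal; compare its
# diagonal with the analytic derivative d sqrt(theta) / d theta.
print(jnp.diag(dxstar_dtheta))   # ~ [0.3536, 0.2887, 0.2236]
print(0.5 / jnp.sqrt(theta))
```

In larger problems one would typically not form these Jacobians explicitly; the same linear system can be attacked with matrix-free methods built on Jacobian-vector products, but the structure of the computation is the one sketched here.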