Created: September 19, 2020
Modified: December 28, 2022
ill-conditioned
This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.
- Multiple senses:
- An 'ill-conditioned matrix' has a large ratio between its largest and smallest eigenvalue (strictly, singular value, though these coincide for the symmetric positive semi-definite matrices discussed here; more generally, see what is a condition number?).
- An ill-conditioned Gaussian has an ill-conditioned covariance matrix.
- In general, a real-valued function is ill-conditioned at some input if its Hessian there is an ill-conditioned matrix: one where a move in the fastest-changing direction matters significantly more than an equal-sized move in the slowest-changing direction.
- gradient descent works poorly on ill-conditioned loss functions, because it gets greedy and prioritizes the fastest-changing direction, when actually it's the slowest-changing direction that requires (and doesn't penalize) big moves: any step size small enough to be stable in the fast direction barely moves the slow one.
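A minimal numeric sketch of this (the particular Hessian and step size are my own example, not from the note): on a quadratic with condition number 100, the largest stable step size is set by the fast direction, and the slow coordinate then decays painfully slowly.

```python
import numpy as np

# Hypothetical setup: minimize f(x) = 1/2 x^T H x for a diagonal Hessian
# with eigenvalues 100 and 1, so the condition number is 100.
H = np.diag([100.0, 1.0])
print(np.linalg.cond(H))  # ratio of largest to smallest singular value, ~100

x = np.array([1.0, 1.0])
lr = 1.0 / 100.0  # step size is dictated by the FAST (curvature-100) direction
for _ in range(100):
    x = x - lr * (H @ x)  # gradient of f is H x

# The fast coordinate is gone after one step, but the slow coordinate has
# only contracted by (1 - 0.01)^100, roughly 0.37.
print(x)
```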
- Computing a well-conditioned function can still fail if you represent it as a composition of ill-conditioned steps. For example, the identity function is obviously well-conditioned, but if we tried to compute it as f^{-1}(f(x)) for some ill-conditioned invertible function f, then we may run out of precision for the intermediate quantity f(x).
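A tiny floating-point illustration of this failure mode (the specific f is my own choice, not from the note): compute the identity as g(f(x)) with f(x) = x + 1e16 and g(y) = y - 1e16. Each step is harmless on its own, but the intermediate quantity cannot hold x's low-order digits, because doubles near 1e16 are spaced 2 apart.

```python
# Identity computed as a composition of two badly-scaled steps.
x = 0.1
intermediate = x + 1e16   # rounds to exactly 1e16; the 0.1 is lost here
result = intermediate - 1e16
print(result)  # 0.0, not 0.1 -- the "identity" came back wrong
```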
- How to think about conditioning in optimization?
- Conditioning in optimization is inherently a 'second-order' quantity. It's not about what happens to the output if you change the input by epsilon (that's the gradient); it's about what happens at order epsilon squared, i.e., the Hessian term.
- I think the connection here is that in optimization, your 'errors' are the approximation error from running a discrete-time algorithm. Although I think you can probably say something more inherent: that the convergence time of even continuous gradient flow depends on the condition number?
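A rough numeric check of the discrete-time half of that musing (the setup is mine): for gradient descent on f(x) = ½ xᵀ diag(λ) x with step size 1/λ_max, each coordinate contracts by a factor (1 − λ_i/λ_max) per step, so driving the slowest coordinate below a tolerance takes on the order of κ·log(1/tol) steps. Doubling the condition number should roughly double the iteration count.

```python
import numpy as np

def steps_to_converge(kappa, tol=1e-6):
    """Count gradient-descent steps until all coordinates fall below tol,
    on a 2-D quadratic whose Hessian has condition number kappa."""
    lam = np.array([kappa, 1.0])   # Hessian eigenvalues
    x = np.ones(2)
    lr = 1.0 / lam.max()           # step size set by the fast direction
    steps = 0
    while np.abs(x).max() > tol:
        x = x - lr * (lam * x)     # gradient is diag(lam) @ x
        steps += 1
    return steps

a = steps_to_converge(10.0)
b = steps_to_converge(100.0)
print(a, b, b / a)  # the ratio of step counts is roughly 10, tracking kappa
```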