ill-conditioned: Nonlinear Function
Created: September 19, 2020
Modified: December 28, 2022


This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.
  • Multiple senses:
    • An 'ill-conditioned matrix' has a large ratio $\frac{\lambda_\text{max}}{\lambda_\text{min}}$ between its largest and smallest eigenvalues (more generally, see what is a condition number?); see the numeric example after this list.
    • An ill-conditioned Gaussian has an ill-conditioned covariance matrix.
    • In general, a real-valued function is ill-conditioned at some input if its Hessian at that input is an ill-conditioned matrix: one where a move along the fastest-changing direction affects the output far more than an equally sized move along the slowest-changing direction.
  • gradient descent works poorly on ill-conditioned loss functions, because it gets greedy and moves mostly along the fastest-changing direction, when actually it's the slowest-changing direction that really requires (and barely penalizes) big moves (see the gradient-descent sketch after this list).
  • Computing a well-conditioned function can still fail if you represent it as a composition of ill-conditioned steps. For example, the identity function $f(x) = x$ is obviously well-conditioned, but if we tried to compute it as $f(x) = g^{-1}(g(x))$ for some ill-conditioned function $g$, then we may run out of precision for the intermediate quantity $g(x)$ (see the precision sketch after this list).
  • How to think about conditioning in optimization?
    • Conditioning in optimization is inherently a 'second-order' quantity. It's not about how the output changes when you move the input by epsilon (that's the gradient); it's about the epsilon-squared terms, i.e. how the gradient itself changes as you move (the Hessian).
    • I think the connection here is that in an optimization, your 'errors' are the approximation error from running a discrete-time algorithm with finite steps. Although you can probably say something more inherent: the convergence time of continuous gradient flow, measured on the problem's natural time scale, also depends on the condition number (see the derivation after this list).
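A quick numeric illustration of the first sense above (the matrix is made up; this just leans on numpy's built-in condition number):

```python
import numpy as np

A = np.diag([1000.0, 1.0])   # eigenvalues 1000 and 1
print(np.linalg.cond(A))     # ≈ 1000: ratio of the largest to the smallest singular value
```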
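A minimal sketch of the gradient-descent point: a two-dimensional quadratic whose Hessian has eigenvalues 100 and 1 (condition number 100). The curvatures, step size, and iteration count here are illustrative choices, not from anything above.

```python
import numpy as np

# Quadratic loss f(x) = 0.5 * x^T H x with an ill-conditioned Hessian.
H = np.diag([100.0, 1.0])      # condition number 100

def loss(x):
    return 0.5 * x @ H @ x

def grad(x):
    return H @ x

x = np.array([1.0, 1.0])
eta = 1.0 / 100.0              # step size is capped by the stiff direction (~1/lambda_max)

for _ in range(100):
    x = x - eta * grad(x)

# The stiff coordinate is solved almost immediately, but the flat coordinate has
# barely moved: 0.99^100 ≈ 0.37 of its initial error remains after 100 steps.
print(x)        # roughly [0.0, 0.37]
print(loss(x))
```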
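A small sketch of the composition point: computing the identity through an ill-conditioned intermediate step. Both decompositions below ($g(x) = x + 10^{16}$ and $g(x) = e^x$) are illustrative choices.

```python
import numpy as np

x = 0.1

# Decomposition 1: g(x) = x + 1e16, g_inv(y) = y - 1e16.
# The intermediate 1e16 + 0.1 can't represent the 0.1 in float64,
# so the round trip loses it entirely.
y = np.float64(x) + 1e16
print(y - 1e16)                 # 0.0, not 0.1

# Decomposition 2: g(x) = exp(x), g_inv(y) = log(y).
# For large x the intermediate exp(x) overflows to inf (numpy warns about this),
# and log(inf) is still inf.
big = 800.0
print(np.log(np.exp(big)))      # inf, not 800.0
```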
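To make the last point a bit more precise, here is a short derivation for the special case of a quadratic loss; the notation ($H$, $\kappa$, $\eta$) is mine, not from anything above.

```latex
% Quadratic loss f(x) = (1/2) x^T H x, with H symmetric positive definite and
% condition number \kappa = \lambda_{\max} / \lambda_{\min}.

% Continuous gradient flow:
\[
  \dot{x}(t) = -\nabla f(x(t)) = -H x(t)
  \qquad\Longrightarrow\qquad
  x(t) = e^{-tH} x_0 .
\]
% The component along the eigenvector with eigenvalue \lambda_i decays like e^{-\lambda_i t},
% so shrinking the slowest component by a factor \varepsilon takes
\[
  t \approx \frac{\log(1/\varepsilon)}{\lambda_{\min}}
    = \kappa \cdot \frac{\log(1/\varepsilon)}{\lambda_{\max}} ,
\]
% i.e. about \kappa \log(1/\varepsilon) multiples of the natural time scale 1/\lambda_{\max}
% set by the fastest direction.

% Discrete gradient descent x_{k+1} = (I - \eta H) x_k contracts the \lambda_i component
% by |1 - \eta \lambda_i|; the best fixed step size \eta = 2 / (\lambda_{\max} + \lambda_{\min})
% gives worst-case per-step contraction
\[
  \rho = \frac{\kappa - 1}{\kappa + 1} ,
\]
% so you need roughly (\kappa / 2) \log(1/\varepsilon) iterations.
```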