Created: September 19, 2020
Modified: December 28, 2022
ill-conditioned
This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.
- Multiple senses:
- An 'ill-conditioned matrix' has a large ratio between its largest and smallest eigenvalue (strictly, singular value, though these coincide for the symmetric positive semi-definite matrices discussed here; more generally, see what is a condition number?).
- An ill-conditioned Gaussian has an ill-conditioned covariance matrix.
- In general, a real-valued function is ill-conditioned at some input if its Hessian there is an ill-conditioned matrix: one where a move in the fastest-changing direction matters significantly more than an equal-sized move in the slowest-changing direction.
- gradient descent works poorly on ill-conditioned loss functions, because it gets greedy and prioritizes the fastest-changing direction, when actually it's the slowest-changing direction that requires (and doesn't penalize) big moves: any step size small enough to be stable in the fast direction barely moves the slow one.
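A minimal numeric sketch of this (the particular Hessian and step size are my own example, not from the note): on a quadratic with condition number 100, the largest stable step size is set by the fast direction, and the slow coordinate then decays painfully slowly.

```python
import numpy as np

# Hypothetical setup: minimize f(x) = 1/2 x^T H x for a diagonal Hessian
# with eigenvalues 100 and 1, so the condition number is 100.
H = np.diag([100.0, 1.0])
print(np.linalg.cond(H))  # ratio of largest to smallest singular value, ~100

x = np.array([1.0, 1.0])
lr = 1.0 / 100.0  # step size is dictated by the FAST (curvature-100) direction
for _ in range(100):
    x = x - lr * (H @ x)  # gradient of f is H x

# The fast coordinate is gone after one step, but the slow coordinate has
# only contracted by (1 - 0.01)^100, roughly 0.37.
print(x)
```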
- Computing a well-conditioned function can still fail if you represent it as a composition of ill-conditioned steps. For example, the identity function is obviously well-conditioned, but if we tried to compute it as f^{-1}(f(x)) for some ill-conditioned invertible function f, then we may run out of precision for the intermediate quantity f(x).
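A tiny floating-point illustration of this failure mode (the specific f is my own choice, not from the note): compute the identity as g(f(x)) with f(x) = x + 1e16 and g(y) = y - 1e16. Each step is harmless on its own, but the intermediate quantity cannot hold x's low-order digits, because doubles near 1e16 are spaced 2 apart.

```python
# Identity computed as a composition of two badly-scaled steps.
x = 0.1
intermediate = x + 1e16   # rounds to exactly 1e16; the 0.1 is lost here
result = intermediate - 1e16
print(result)  # 0.0, not 0.1 -- the "identity" came back wrong
```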
- How to think about conditioning in optimization?
- Conditioning in optimization is inherently a 'second-order' quantity. It's not about what happens to the output if you change the input by epsilon (that's the gradient); it's about what happens at order epsilon squared, i.e., the Hessian term.
- I think the connection here is that in optimization, your 'errors' are the approximation error from running a discrete-time algorithm. Although I think you can probably say something more inherent: that the convergence time of even continuous gradient flow depends on the condition number?
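A rough numeric check of the discrete-time half of that musing (the setup is mine): for gradient descent on f(x) = ½ xᵀ diag(λ) x with step size 1/λ_max, each coordinate contracts by a factor (1 − λ_i/λ_max) per step, so driving the slowest coordinate below a tolerance takes on the order of κ·log(1/tol) steps. Doubling the condition number should roughly double the iteration count.

```python
import numpy as np

def steps_to_converge(kappa, tol=1e-6):
    """Count gradient-descent steps until all coordinates fall below tol,
    on a 2-D quadratic whose Hessian has condition number kappa."""
    lam = np.array([kappa, 1.0])   # Hessian eigenvalues
    x = np.ones(2)
    lr = 1.0 / lam.max()           # step size set by the fast direction
    steps = 0
    while np.abs(x).max() > tol:
        x = x - lr * (lam * x)     # gradient is diag(lam) @ x
        steps += 1
    return steps

a = steps_to_converge(10.0)
b = steps_to_converge(100.0)
print(a, b, b / a)  # the ratio of step counts is roughly 10, tracking kappa
```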