Created: August 13, 2022
Modified: August 13, 2022
Modified: August 13, 2022
chain rule
This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.There are two major 'chain rules' relevant to machine learning: the chain rule of probability theory and the chain rule from calculus.
Probability chain rule: Any joint density factors as the product of conditional densities
Calculus chain rule: the derivative of a function composition is given by the product of 'local' derivatives
or in the multivariate case, the (transposed) Jacobian matrix is the product of local (transposed) Jacobians,
This fact is the foundation of limitations of autodiff.