Created: November 01, 2020
Modified: November 01, 2020
From Automatic Differentiation to Message Passing
This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.

- Talk from Tom Minka: https://tminka.github.io/papers/acmll2019/
- Part 1: Autodiff.
- Recommended reading: Evaluating Derivatives (Griewank & Walther).
- Can think of AD as having an 'Execution' phase followed by an 'Accumulation' phase. In 'Execution', every operation is replaced with a linear operation; in 'Accumulation', we collect the linear coefficients. (See the dual-number sketch after this list.)
- This relates to Tom's approach for matrix derivatives. In a linear function, the gradient with respect to $x$ is just the coefficient on $x$.
- So in AD in general, we 'linearize' every function around the current point $x_0$, as $f(x) \approx f(x_0) + f'(x_0)(x - x_0)$ (where we imagine $x = x_0 + \epsilon$ for some small $\epsilon$).
- We then keep track of the coefficient on the $\epsilon$ term.
- The 'backwards message' in reverse-mode autodiff is a sum over paths to the output (the law of total derivatives). (See the reverse-mode sketch below.)
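A minimal sketch of the 'linearize, then track coefficients' view, using forward-mode dual numbers in Python. This is my own illustration rather than code from the talk; the class and function names are made up:

```python
# Forward-mode AD via dual numbers: a value plus the coefficient on the
# epsilon term of the linearization f(x0) + f'(x0)*eps (with eps^2 = 0).
class Dual:
    def __init__(self, value, coeff):
        self.value = value  # f(x0): the function value at the current point
        self.coeff = coeff  # f'(x0): the coefficient on the eps term

    def __add__(self, other):
        # (a + b*eps) + (c + d*eps) = (a + c) + (b + d)*eps
        return Dual(self.value + other.value, self.coeff + other.coeff)

    def __mul__(self, other):
        # (a + b*eps)(c + d*eps) = a*c + (a*d + b*c)*eps, dropping eps^2
        return Dual(self.value * other.value,
                    self.value * other.coeff + self.coeff * other.value)

def f(x):
    return x * x + x  # f(x) = x^2 + x, so f'(x) = 2x + 1

x = Dual(3.0, 1.0)  # seed coefficient 1.0: differentiate with respect to x
print(f(x).coeff)   # 7.0 = f'(3)
```

Every operation ('Execution') applies the linear rule in its comment; the coefficient arithmetic inside those rules is the 'Accumulation'.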
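And a small sketch of the reverse-mode 'sum over paths' view, again my own illustration with made-up names: each node's gradient accumulates one backwards message per path from that node to the output, which is the law of total derivatives in action.

```python
# Reverse-mode AD on a tiny expression graph. Each node's grad sums the
# backwards messages arriving along every path to the output.
class Node:
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (input_node, local_derivative)
        self.grad = 0.0

def mul(a, b):
    return Node(a.value * b.value, [(a, b.value), (b, a.value)])

def add(a, b):
    return Node(a.value + b.value, [(a, 1.0), (b, 1.0)])

def backward(output):
    output.grad = 1.0
    # Naive stack walk; fine here because the only shared node (x) is a
    # leaf. A general implementation visits nodes in reverse topological
    # order so each grad is fully accumulated before it is propagated.
    stack = [output]
    while stack:
        node = stack.pop()
        for inp, local in node.parents:
            inp.grad += node.grad * local  # sum over paths
            stack.append(inp)

x = Node(3.0)
y = add(mul(x, x), x)  # y = x^2 + x: two paths through mul, one through add
backward(y)
print(x.grad)  # 7.0 = 2*3 + 1
```

The three `+=` contributions into `x.grad` (one per path) are exactly the backwards messages being summed.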