Created: November 01, 2020
Modified: November 01, 2020
From Automatic Differentiation to Message Passing
This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.

- Talk from Tom Minka: https://tminka.github.io/papers/acmll2019/
- Part 1: Autodiff.
- Recommended reading: Evaluating Derivatives (Griewank & Walther).
- Can think of AD as having an 'Execution' phase followed by an 'Accumulation' phase. In 'Execution', every operation is replaced with a linear operation; in 'Accumulation', we collect the linear coefficients. (See the dual-number sketch after this list.)
- This relates to Tom's approach for matrix derivatives. In a linear function, the gradient with respect to $x$ is just the coefficient on $x$.
- So in AD in general, we 'linearize' every function around the current point $x_0$, as $f(x) \approx f(x_0) + f'(x_0)(x - x_0)$ (where we imagine $x = x_0 + \epsilon$ for some small $\epsilon$).
- We then keep track of the coefficient on the $\epsilon$ term.
- The 'backwards message' in reverse-mode autodiff is a sum over paths to the output (the law of total derivatives). (See the reverse-mode sketch below.)
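A minimal sketch of the 'linearize, then track coefficients' view, using forward-mode dual numbers in Python. This is my own illustration rather than code from the talk; the class and function names are made up:

```python
# Forward-mode AD via dual numbers: a value plus the coefficient on the
# epsilon term of the linearization f(x0) + f'(x0)*eps (with eps^2 = 0).
class Dual:
    def __init__(self, value, coeff):
        self.value = value  # f(x0): the function value at the current point
        self.coeff = coeff  # f'(x0): the coefficient on the eps term

    def __add__(self, other):
        # (a + b*eps) + (c + d*eps) = (a + c) + (b + d)*eps
        return Dual(self.value + other.value, self.coeff + other.coeff)

    def __mul__(self, other):
        # (a + b*eps)(c + d*eps) = a*c + (a*d + b*c)*eps, dropping eps^2
        return Dual(self.value * other.value,
                    self.value * other.coeff + self.coeff * other.value)

def f(x):
    return x * x + x  # f(x) = x^2 + x, so f'(x) = 2x + 1

x = Dual(3.0, 1.0)  # seed coefficient 1.0: differentiate with respect to x
print(f(x).coeff)   # 7.0 = f'(3)
```

Every operation ('Execution') applies the linear rule in its comment; the coefficient arithmetic inside those rules is the 'Accumulation'.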
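And a small sketch of the reverse-mode 'sum over paths' view, again my own illustration with made-up names: each node's gradient accumulates one backwards message per path from that node to the output, which is the law of total derivatives in action.

```python
# Reverse-mode AD on a tiny expression graph. Each node's grad sums the
# backwards messages arriving along every path to the output.
class Node:
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (input_node, local_derivative)
        self.grad = 0.0

def mul(a, b):
    return Node(a.value * b.value, [(a, b.value), (b, a.value)])

def add(a, b):
    return Node(a.value + b.value, [(a, 1.0), (b, 1.0)])

def backward(output):
    output.grad = 1.0
    # Naive stack walk; fine here because the only shared node (x) is a
    # leaf. A general implementation visits nodes in reverse topological
    # order so each grad is fully accumulated before it is propagated.
    stack = [output]
    while stack:
        node = stack.pop()
        for inp, local in node.parents:
            inp.grad += node.grad * local  # sum over paths
            stack.append(inp)

x = Node(3.0)
y = add(mul(x, x), x)  # y = x^2 + x: two paths through mul, one through add
backward(y)
print(x.grad)  # 7.0 = 2*3 + 1
```

The three `+=` contributions into `x.grad` (one per path) are exactly the backwards messages being summed.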