From Automatic Differentiation to Message Passing: Nonlinear Function
Created: November 01, 2020
Modified: November 01, 2020

From Automatic Differentiation to Message Passing

This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.
  • Talk from Tom Minka: https://tminka.github.io/papers/acmll2019/
  • Part 1: Autodiff.
    • Recommended reading: Evaluating Derivatives.
    • Can think of AD as having an 'Execution' phase followed by an 'Accumulation' phase. In the Execution phase, every operation is replaced with a linear operation; in the Accumulation phase, we collect the linear coefficients.
      • This relates to Tom's approach for matrix derivatives. In a linear function, the gradient wrt $x_k$ is just the coefficient on $x_k$.
      • So in AD in general, we 'linearize' every function around the current point $x$, as $f(x + dx) = f(x) + f'(x)\,dx$ (where we imagine $dx = x^* - x$ for some $x^*$).
      • We then keep track of the coefficient on the $dx$ term (see the first sketch after this list).
    • The 'backwards message' in reverse-mode autodiff is a sum over paths to the output (the law of total derivatives); see the second sketch after this list.
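
Below is a minimal sketch (my own illustration, not code from the talk) of the 'track the coefficient on $dx$' idea: a tiny forward-mode AD using dual numbers, where every value carries the pair $(f(x), f'(x))$ and every operation updates both. The `Dual` class and the example function `f` are hypothetical names I've introduced for this sketch.

```python
# A minimal sketch: forward-mode AD via "dual numbers". Each value carries
# (value, coefficient on dx), i.e. the linearization f(x + dx) ≈ f(x) + f'(x) dx
# evaluated at the current point.

class Dual:
    def __init__(self, value, deriv=0.0):
        self.value = value    # f(x)
        self.deriv = deriv    # coefficient on dx, i.e. f'(x)

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (fg)' = f'g + fg'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

def f(x):
    return x * x + x * 3.0 + 1.0   # f(x) = x^2 + 3x + 1

x = Dual(2.0, deriv=1.0)           # seed: the input's dx coefficient is 1
y = f(x)
print(y.value, y.deriv)            # 11.0 7.0  (f(2) = 11, f'(2) = 2*2 + 3 = 7)
```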
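And a second sketch (again my own illustration) of the 'sum over paths' view of the backwards message: the adjoint of each node is computed by literally enumerating every path from that node to the output and summing the products of the local derivatives along each path. Real reverse-mode shares this work by accumulating adjoints in reverse topological order, but the quantity it computes is the same sum. The `Node` class and helpers are hypothetical names for this sketch.

```python
# A minimal sketch: reverse-mode AD where d(output)/d(node) is computed as a
# sum over all paths from the node to the output, each path contributing the
# product of the local derivatives along its edges (the law of total derivatives).

class Node:
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = list(parents)   # list of (parent_node, local_derivative)
        self.adjoint = 0.0             # accumulates d(output)/d(this node)

def add(a, b):
    return Node(a.value + b.value, [(a, 1.0), (b, 1.0)])

def mul(a, b):
    return Node(a.value * b.value, [(a, b.value), (b, a.value)])

def backward(output):
    # Enumerate every path from the output back to each input; each path's
    # contribution is the running product of local derivatives along it.
    def propagate(node, path_product):
        node.adjoint += path_product
        for parent, local in node.parents:
            propagate(parent, path_product * local)
    propagate(output, 1.0)

x = Node(3.0)
z = add(mul(x, x), x)    # z = x^2 + x
backward(z)
print(x.adjoint)         # 7.0 = 2*3 + 1: three paths contribute 3, 3, and 1
```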