Modified: August 13, 2022
Jacobian
This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.

The partial derivatives of a multivariate function $f: \mathbb{R}^n \to \mathbb{R}^m$ form its Jacobian matrix

$$J_f \in \mathbb{R}^{m \times n}, \qquad (J_f)_{ij} = \frac{\partial f_i}{\partial x_j}.$$
The convention here (matching Wikipedia, and I believe also used by Jax and other implementations of autodiff systems) is that the Jacobian is the matrix whose rows are output coordinates and columns are input coordinates. This matches the notation for function composition, in which $(g \circ f)(x) = g(f(x))$ denotes a right-to-left flow from the inputs $x$ to the final output from $g$.
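As a concrete check of this convention, here is a minimal sketch using `jax.jacobian` (assuming JAX is installed; the function `f` below is an illustrative example, not from the original notes):

```python
import jax
import jax.numpy as jnp

def f(x):
    # f: R^3 -> R^2 (made-up example function)
    return jnp.array([x[0] * x[1], jnp.sin(x[2])])

x = jnp.array([1.0, 2.0, 3.0])
J = jax.jacobian(f)(x)   # rows = output coordinates, columns = input coordinates
print(J.shape)           # (2, 3): an m-by-n matrix, matching the convention above
```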
Transpose notation
I don't love the usual convention, because I generally find it more intuitive to think of input-to-output flow as a left-to-right mapping. So I sometimes prefer to work explicitly in terms of the transpose $J_f^\top$, using the notation

$$\frac{\partial f}{\partial x} := J_f^\top, \qquad \left(\frac{\partial f}{\partial x}\right)_{ij} = \frac{\partial f_j}{\partial x_i},$$

to represent an $n \times m$ matrix, with rows indexed by inputs and columns by outputs.
In this notation, the chain rule for the derivative of a composition $h = g \circ f$ becomes a left-to-right computation graph,

$$\frac{\partial h}{\partial x} = \frac{\partial f}{\partial x} \, \frac{\partial g}{\partial f},$$
which I find more intuitive than the equivalent $J_h = J_g J_f$ in the usual notation.
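To sanity-check the transposed chain rule numerically, here is a sketch in JAX (again assuming JAX is available; `f`, `g`, and `h` are made-up illustrative functions):

```python
import jax
import jax.numpy as jnp

def f(x):                      # f: R^3 -> R^2
    return jnp.array([x[0] * x[1], jnp.sin(x[2])])

def g(y):                      # g: R^2 -> R^2
    return jnp.array([y[0] + y[1], y[0] * y[1]])

h = lambda x: g(f(x))          # h = g ∘ f

x = jnp.array([1.0, 2.0, 3.0])
Jf = jax.jacobian(f)(x)        # shape (2, 3)
Jg = jax.jacobian(g)(f(x))     # shape (2, 2), evaluated at f(x)
Jh = jax.jacobian(h)(x)        # shape (2, 3)

# Usual convention composes right-to-left; the transposes compose left-to-right.
print(jnp.allclose(Jh, Jg @ Jf))         # True
print(jnp.allclose(Jh.T, Jf.T @ Jg.T))   # True
```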
I also like that this notation specializes nicely to the gradient of a scalar function $f: \mathbb{R}^n \to \mathbb{R}$,

$$\frac{\partial f}{\partial x} = J_f^\top = \nabla f,$$
under the usual convention that the gradient is a column vector.

Debatably, gradients should be thought of as row vectors to indicate that they are really dual vectors, aka linear functionals on the underlying vector space. Under this view, the conventional definition of the Jacobian is more natural. However, machine learning usually ignores this, implicitly using the isomorphism between a vector space and its dual induced by whatever basis we're using to do the computations.
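Concretely, for a scalar-valued function the (transposed) Jacobian and the gradient coincide; here's a quick JAX sketch (illustrative, not from the original notes):

```python
import jax
import jax.numpy as jnp

def f(x):                   # f: R^3 -> R (scalar-valued example)
    return jnp.sum(x ** 2)

x = jnp.array([1.0, 2.0, 3.0])
J = jax.jacobian(f)(x)      # shape (3,): the 1-by-n Jacobian row, as a flat array
g = jax.grad(f)(x)          # shape (3,): the gradient

print(jnp.allclose(J, g))   # True: the gradient is the transposed Jacobian
```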