Modified: October 20, 2022
Kronecker product
This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.

Given two matrices $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{p \times q}$, the Kronecker product $A \otimes B$ is the $mp \times nq$ block matrix that 'tiles' $B$ across the entries of $A$:

$$A \otimes B = \begin{bmatrix} a_{11} B & \cdots & a_{1n} B \\ \vdots & \ddots & \vdots \\ a_{m1} B & \cdots & a_{mn} B \end{bmatrix}.$$
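A quick sanity check of the tiling structure, using NumPy's `np.kron` (the values are arbitrary):

```python
import numpy as np

A = np.array([[1., 2.],
              [3., 4.]])
B = np.array([[0., 5.],
              [6., 7.]])

K = np.kron(A, B)  # shape (4, 4)

# The (i, j) block of A ⊗ B is a_ij * B; e.g. the top-right block is a_12 * B.
assert np.allclose(K[0:2, 2:4], A[0, 1] * B)
```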
It can be generalized to block (partitioned) matrices in several ways, e.g., the Tracy-Singh and Khatri-Rao products.
Kronecker products seem to come up in ML. Examples:
- The matrix Gaussian distribution over $m \times n$ matrices is a Gaussian whose covariance matrix is a Kronecker product $U \otimes V$ of two covariance matrices of shape $m \times m$ and $n \times n$ respectively, allowing for fast computations (see the sketch after this list). This can be used as a model of neural network weights.
- In K-FAC and practical Gauss-Newton methods, each layer's block of the Fisher (or Gauss-Newton) matrix is approximated as a Kronecker product of two much smaller matrices, which makes storing and inverting the curvature approximation tractable.
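To make the 'fast computations' point in the first example concrete, here is a minimal sketch (my own code, with row covariance $U$ and column covariance $V$; the function name is mine) of sampling from a matrix Gaussian without ever materializing the $mn \times mn$ covariance $U \otimes V$:

```python
import numpy as np

def sample_matrix_normal(M, U, V, rng):
    """Sample X with vec_r(X) ~ N(vec_r(M), U ⊗ V).

    Only needs Cholesky factors of the small m x m and n x n covariances,
    never the full (mn) x (mn) matrix U ⊗ V.
    """
    Lu = np.linalg.cholesky(U)        # U = Lu @ Lu.T
    Lv = np.linalg.cholesky(V)        # V = Lv @ Lv.T
    Z = rng.standard_normal(M.shape)  # i.i.d. standard normal entries
    # vec_r(Lu Z Lv^T) = (Lu ⊗ Lv) vec_r(Z), whose covariance is
    # (Lu ⊗ Lv)(Lu ⊗ Lv)^T = U ⊗ V by the mixed-product property.
    return M + Lu @ Z @ Lv.T

rng = np.random.default_rng(0)
m, n = 4, 3
U = np.eye(m) + 0.1  # example row covariance (diagonally dominant, so PD)
V = np.eye(n) + 0.1  # example column covariance
X = sample_matrix_normal(np.zeros((m, n)), U, V, rng)
```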
Relation to tensor products
The Kronecker product can be seen as a concrete representation of the tensor product in the special case where the vectors being combined are themselves linear maps on vector spaces, i.e., matrices (noting that the space of linear maps from one vector space to another is itself a vector space). Remember that the tensor product $A \otimes B$ just denotes the formal pair $(A, B)$ 'up to bilinearity', i.e., as an element of an abstract space in which

$$(\alpha A) \otimes B = A \otimes (\alpha B) = \alpha (A \otimes B),$$
$$(A_1 + A_2) \otimes B = A_1 \otimes B + A_2 \otimes B,$$
$$A \otimes (B_1 + B_2) = A \otimes B_1 + A \otimes B_2.$$
In general, tensor products are defined on any pair of vector spaces --- the elements don't themselves have to be anything in particular. The fact that the vectors in question happen to themselves be linear maps is additional structure, not 'visible' at the abstraction level of the tensor product.
Meanwhile, the Kronecker product of matrices also represents the pair $(A, B)$, but as a new, concrete matrix, which itself encodes a new linear map. Because the Kronecker product is bilinear, the algebra of Kronecker products (which deals with concrete matrices) is isomorphic to the algebra of tensor products (which deals with abstract formal pairs). But the Kronecker product implies additional structure, the 'mixed-product' property (for matrices of compatible shape):

$$(A \otimes B)(C \otimes D) = (AC) \otimes (BD),$$
which falls out directly from composition of linear maps.
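A quick numeric check of the mixed-product property; note the four factors all have different shapes, and all that matters is that $AC$ and $BD$ are defined:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))
C = rng.standard_normal((3, 4))  # A @ C is defined
B = rng.standard_normal((5, 2))
D = rng.standard_normal((2, 3))  # B @ D is defined

lhs = np.kron(A, B) @ np.kron(C, D)
rhs = np.kron(A @ C, B @ D)
assert np.allclose(lhs, rhs)
```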
Sometimes the Kronecker product of two matrices is identified with the tensor product. This is playing fast and loose, because a tensor product space doesn't inherently define any notion of multiplication; we could in principle attach any multiplication rule we want. Note that in general the mixed-product rule even allows us to multiply tensors from different spaces: the shapes don't need to be identical, just to compose properly, and having identical shapes is neither necessary nor sufficient for composition. But the rule implied by the Kronecker product is a very natural one to consider.
Vectorization
The Kronecker product can be used to simplify certain matrix equations.
Define the row¹ vectorization of an $m \times n$ matrix $X$ as the $mn$-vector $\mathrm{vec_r}(X)$ that simply flattens $X$ by concatenating its rows into a single vector:

$$\mathrm{vec_r}(X) = \begin{bmatrix} x_{11} & \cdots & x_{1n} & x_{21} & \cdots & x_{2n} & \cdots & x_{m1} & \cdots & x_{mn} \end{bmatrix}^\top.$$
This can equivalently be expressed in terms of a Kronecker product with basis vectors $e_i \in \mathbb{R}^m$:

$$\mathrm{vec_r}(X) = \sum_{i=1}^m e_i \otimes (X^\top e_i),$$

where $X^\top e_i$ is the $i$th row of $X$.
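A small check of this expression, with `np.eye(m)[i]` playing the role of $e_i$:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 4
X = rng.standard_normal((m, n))

# vec_r(X) = sum_i e_i ⊗ (X^T e_i): each term drops row i of X into block i.
vec_r = sum(np.kron(np.eye(m)[i], X[i]) for i in range(m))
assert np.allclose(vec_r, X.reshape(-1))  # C-order reshape concatenates rows
```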
One can prove mechanically^[On the left, the $\big((i-1)p + k\big)$th row of $A \otimes B$ is defined as $\begin{bmatrix} a_{i1} b_{k:} & \cdots & a_{in} b_{k:} \end{bmatrix}$, where $b_{k:}$ denotes the $k$th row of $B$, and multiplying this by $\mathrm{vec_r}(X)$ we get $\sum_{j} \sum_{l} a_{ij} b_{kl} x_{jl}$. On the right, we have the $(i, k)$th element of $A X B^\top$ equal to $\sum_{j} a_{ij} \sum_{l} x_{jl} b_{kl}$, which is equivalent.] that

$$(A \otimes B)\, \mathrm{vec_r}(X) = \mathrm{vec_r}\!\left(A X B^\top\right),$$
i.e., the Kronecker product $A \otimes B$ represents the multilinear operation $X \mapsto A X B^\top$ as a single matrix acting on the (flattened contents of the) matrix $X$.
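This identity is easy to verify numerically; note that NumPy's default C-order `reshape(-1)` is exactly the row vectorization used here:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))  # m x n
B = rng.standard_normal((4, 5))  # p x q
X = rng.standard_normal((3, 5))  # n x q, so A @ X @ B.T is defined

lhs = np.kron(A, B) @ X.reshape(-1)  # (A ⊗ B) vec_r(X)
rhs = (A @ X @ B.T).reshape(-1)      # vec_r(A X B^T)
assert np.allclose(lhs, rhs)
```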
References
- Harville, D. A. (1997). "Kronecker Products and the Vec and Vech Operators." In *Matrix Algebra From a Statistician's Perspective*. Springer.
- https://en.wikipedia.org/wiki/Kronecker_product
- https://research.wmz.ninja/articles/2017/12/vectorization-kronecker-product-and-khatri-rao-product.html
¹ The mathematical literature seems to have adopted the convention that $\mathrm{vec}(X)$ denotes column vectorization, equivalent to `np.reshape(X, [-1], order='F')`. I think that row-vectorization gives more natural formulas, so I will use it, and prefer the notation $\mathrm{vec_r}(X)$ to avoid ambiguity.