Created: October 25, 2020
Modified: October 20, 2022

Kronecker product

This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.

Given two matrices $\mathbf{A}, \mathbf{B}$, the Kronecker product is the block matrix that 'tiles' $\mathbf{B}$ across the entries of $\mathbf{A}$:

$$\mathbf{A} \otimes \mathbf{B} = \begin{pmatrix}a_{11} \mathbf{B} & a_{12} \mathbf{B} & \ldots & a_{1n} \mathbf{B} \\ \vdots & & & \vdots\\ a_{m1} \mathbf{B} & a_{m2} \mathbf{B} & \ldots & a_{mn} \mathbf{B} \end{pmatrix}$$
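For concreteness, a tiny NumPy example (`np.kron` implements exactly this tiling; the matrices are arbitrary):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[ 0, 10],
              [20, 30]])

# Each entry a_ij scales a full copy of B in the corresponding block.
print(np.kron(A, B))
# [[  0  10   0  20]
#  [ 20  30  40  60]
#  [  0  30   0  40]
#  [ 60  90  80 120]]
```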

It can be generalized to block matrices in several ways.

Kronecker products seem to come up in ML. Examples:

  • The matrix Gaussian distribution is a Gaussian whose $mn\times mn$ covariance matrix is a Kronecker product of two covariance matrices of shape $n\times n$ and $m\times m$ respectively, allowing for fast computations. This can be used as a model of neural network weights.
  • In K-FAC and practical Gauss-Newton methods, each layer's block of the curvature (Fisher or Gauss-Newton) matrix is approximated as a Kronecker product of two much smaller matrices, which makes it tractable to store and invert (see the sketch below).
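The computational payoff comes from identities that let us work with the small factors instead of the big product. A minimal NumPy sketch (the covariance factors here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two small SPD covariance factors; Sigma = A kron B is (mn x mn).
m, n = 3, 4
A = rng.normal(size=(m, m)); A = A @ A.T + m * np.eye(m)   # m x m
B = rng.normal(size=(n, n)); B = B @ B.T + n * np.eye(n)   # n x n
Sigma = np.kron(A, B)  # formed here only to check the identities

# Inversion factorizes, so we never invert the mn x mn matrix:
# (A kron B)^{-1} = A^{-1} kron B^{-1}.
assert np.allclose(np.linalg.inv(Sigma),
                   np.kron(np.linalg.inv(A), np.linalg.inv(B)))

# The log-determinant (needed for Gaussian log-densities) also factorizes:
# log det(A kron B) = n log det(A) + m log det(B).
assert np.isclose(np.linalg.slogdet(Sigma)[1],
                  n * np.linalg.slogdet(A)[1] + m * np.linalg.slogdet(B)[1])
```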

Relation to tensor products

The Kronecker product can be seen as a concrete representation of the tensor product in the special case where the vectors being combined are themselves linear maps on vector spaces, i.e., matrices (noting that the space of linear maps from one vector space to another is itself a vector space). Remember that the tensor product $\mathbf{A} \otimes_T \mathbf{B}$ just denotes the formal pair $(\mathbf{A}, \mathbf{B})$ 'up to bilinearity', i.e., as an element of an abstract space in which

$$\mathbf{A} \otimes_T \mathbf{B} + \mathbf{A} \otimes_T \mathbf{C} = \mathbf{A} \otimes_T (\mathbf{B} + \mathbf{C})$$
$$\mathbf{A} \otimes_T \lambda \mathbf{B} = \lambda (\mathbf{A} \otimes_T \mathbf{B}) = \lambda \mathbf{A} \otimes_T \mathbf{B}.$$

In general, tensor products are defined on any pair of vector spaces --- the elements don't themselves have to be anything in particular. The fact that the vectors in question happen to themselves be linear maps is additional structure, not 'visible' at the abstraction level of the tensor product.

Meanwhile, the Kronecker product of matrices $\mathbf{A} \otimes_K \mathbf{B}$ also represents the pair $(\mathbf{A}, \mathbf{B})$, but as a new, concrete matrix, which itself encodes a new linear map. Because the Kronecker product is bilinear, the algebra of Kronecker products (which deals with concrete matrices) is isomorphic to the algebra of tensor products (which deals with abstract formal pairs). But the Kronecker product implies additional structure, the 'mixed-product' property (for matrices of compatible shape):

$$(A \otimes_K B)(C \otimes_K D) = (AC) \otimes_K (BD)$$

which falls out directly from composition of linear maps.
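A quick NumPy check of the mixed-product property (shapes chosen arbitrarily; note the factors only need to compose, not to match):

```python
import numpy as np

rng = np.random.default_rng(0)

# A (2x3) composes with C (3x4); B (5x2) composes with D (2x3).
A, C = rng.normal(size=(2, 3)), rng.normal(size=(3, 4))
B, D = rng.normal(size=(5, 2)), rng.normal(size=(2, 3))

lhs = np.kron(A, B) @ np.kron(C, D)  # (10x6) @ (6x12)
rhs = np.kron(A @ C, B @ D)          # (2x4) kron (5x3) = (10x12)
assert np.allclose(lhs, rhs)
```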

Sometimes the Kronecker product of two matrices is identified with the tensor product. This is playing fast and loose, because a tensor product space doesn't inherently define any notion of multiplication; we could in principle attach any multiplication rule we want.^[Note that the mixed-product rule even allows us to multiply tensors from different spaces: the shapes don't need to be identical, just to compose properly, and having identical shapes is neither necessary nor sufficient for composition.] But the rule implied by the Kronecker product is a very natural one to consider.

Vectorization

The Kronecker product can be used to simplify certain matrix equations.

Define the row^1 vectorization of an $[n, m]$ matrix $X$ as the $nm$-vector that simply flattens $X$ by concatenating its rows into a single vector:

$$\begin{align*} \text{rvec}(X) &= [x_{11}, x_{12}, \ldots, x_{1m}, x_{21}, \ldots, x_{2m},\ldots, x_{nm}]^T\\ &=\texttt{np.reshape(X, [-1])} \end{align*}$$

This can equivalently be expressed in terms of a Kronecker product with basis vectors $\mathbf{e}_i \in \mathbb{R}^{n \times 1}$:

$$\text{rvec}(X) = \sum_{i=1}^n \mathbf{e}_i \otimes X^T \mathbf{e}_i$$
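This is easy to verify numerically (a minimal NumPy sketch; $X^T \mathbf{e}_i$ just picks out the $i$th row of $X$):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 4
X = rng.normal(size=(n, m))
I = np.eye(n)

# e_i kron (i-th row of X) places that row in the i-th block of the result.
rvec = sum(np.kron(I[:, i], X.T @ I[:, i]) for i in range(n))
assert np.allclose(rvec, X.reshape(-1))
```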

One can prove mechanically^[Suppose for simplicity that $A$ is $n\times n$ and $B$ is $m\times m$, and use zero-based indices; the rectangular case is analogous. On the left, the $(i\cdot m + j)$th row of $A\otimes B$ is $[a_{i1} B_j, \ldots, a_{in} B_j]$, writing $B_j$ for the $j$th row of $B$, and multiplying this by $\text{rvec}(X)$ we get $\sum_k \sum_l a_{ik}\cdot b_{jl}\cdot \text{rvec}(X)_{mk + l} = \sum_k \sum_l a_{ik}\cdot b_{jl} \cdot x_{kl}$. On the right, we have the $(i\cdot m + j)$th element of $\text{rvec}(AXB^T)$ equal to

$$(AXB^T)_{ij} = \sum_{k}\sum_l a_{ik}x_{kl}b_{jl},$$

which is equivalent.] that

$$(A \otimes B) \cdot \text{rvec}(X) = \text{rvec}(AXB^T)$$

i.e., the Kronecker product represents the linear operation $X \mapsto AXB^T$ on the (flattened contents of the) matrix $X$.
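A quick NumPy check of the identity (shapes chosen arbitrarily; NumPy's default row-major reshape implements $\text{rvec}$):

```python
import numpy as np

rng = np.random.default_rng(0)

# A (p x n), B (q x m), X (n x m), so A X B^T is (p x q).
p, n, q, m = 2, 3, 4, 5
A, B = rng.normal(size=(p, n)), rng.normal(size=(q, m))
X = rng.normal(size=(n, m))

lhs = np.kron(A, B) @ X.reshape(-1)  # (pq x nm) @ (nm,)
rhs = (A @ X @ B.T).reshape(-1)      # rvec of the (p x q) result
assert np.allclose(lhs, rhs)
```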

References


  1. The mathematical literature seems to have adopted the convention that $\text{vec}(X)$ denotes column vectorization, equivalent to np.reshape(X, [-1], order='F'). I think that row-vectorization gives more natural formulas, so I use the $\text{rvec}$ notation to avoid ambiguity.