matrix notation
Created: November 12, 2013
Modified: March 16, 2022


This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.

Notation for Matrix Multiplication

Let $A = (a_{ij})$ and $B = (b_{jk})$. Then

$$(AB)_{ik} = \sum_j a_{ij} b_{jk}$$

just by the definition of matrix multiplication (the summation over $j$ is performing the dot product of the $i$th row of $A$ with the $k$th column of $B$). Furthermore, if we have $C = (c_{kl})$, then

$$\begin{align} (ABC)_{il} &= \sum_k (AB)_{ik} c_{kl}\\ &= \sum_k \left(\sum_j a_{ij} b_{jk} \right) c_{kl}\\ &= \sum_{j,k} a_{ij} b_{jk} c_{kl} \end{align}$$

and it's easy to see by induction how this pattern generalizes: we can write a product of matrices as a sum over the product of their entries, where the sum is taken over all of the "inner" indices.
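
This contraction pattern is exactly what `numpy.einsum` computes. As a quick sanity check, here is a minimal sketch (shapes are arbitrary choices) comparing ordinary matrix multiplication against the explicit sum over the inner indices $j, k$:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))  # entries a_ij
B = rng.standard_normal((3, 4))  # entries b_jk
C = rng.standard_normal((4, 5))  # entries c_kl

# Ordinary matrix multiplication...
prod = A @ B @ C

# ...versus the explicit contraction (ABC)_il = sum_{j,k} a_ij b_jk c_kl,
# where the "inner" indices j and k are summed away.
expanded = np.einsum("ij,jk,kl->il", A, B, C)

assert np.allclose(prod, expanded)
```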

Function Composition

Say we have matrices $A: \mathbb{R}^v \to \mathbb{R}^w$ and $B: \mathbb{R}^w \to \mathbb{R}^s$. We can decompose

$$A = \sum_{ij} w_j v_i^T$$
$$B = \sum_{k\ell} s_\ell \hat{w}_k^T$$

for some sets of vectors $(v), (w), (\hat{w}), (s)$ that exist by the isomorphism between $\text{Hom}$ and tensor products (i.e., the vectors $(v), (w)$ correspond to the pure tensor decomposition of $A$, and similarly for $B$). Then we can write the composite map $BA: \mathbb{R}^v \to \mathbb{R}^s$ as

$$BA = \sum_{i\ell} s_\ell \left(\sum_{jk} \hat{w}_k^T w_j\right) v_i^T$$

where $\sum_{jk} \hat{w}_k^T w_j$ is the trace of the matrix $W = \sum_{jk} w_j \hat{w}_k^T$. That last fact follows from the general relation

$$x^T y = \text{tr}(yx^T)$$

which holds since the $(i,j)$th entry of $yx^T$ is $y_i x_j$, so the sum of the diagonal is the sum over $i$ of $y_i x_i$, which is exactly the inner product. This shows very cleanly a relationship between outer and inner products by way of the trace. We then used this to express the composition of $A, B$ in terms of the trace of an (implicit) operator $W$ on the in-between space $\mathbb{R}^w$.
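
Both facts are easy to check numerically. Here is a NumPy sketch (all dimensions and vector counts are arbitrary choices) verifying first the trace identity and then the composite-map formula, with the double-indexed decompositions used above:

```python
import numpy as np

rng = np.random.default_rng(1)

# The trace identity: x^T y = tr(y x^T).
x, y = rng.standard_normal(5), rng.standard_normal(5)
assert np.isclose(x @ y, np.trace(np.outer(y, x)))

# Double-indexed decompositions, with v = 4, w = 3, s = 2.
vs = rng.standard_normal((2, 4))     # rows are the v_i in R^v
ws = rng.standard_normal((3, 3))     # rows are the w_j in R^w
whats = rng.standard_normal((2, 3))  # rows are the w-hat_k in R^w
ss = rng.standard_normal((3, 2))     # rows are the s_l in R^s

A = np.einsum("ja,ib->ab", ws, vs)     # A = sum_ij w_j v_i^T      (w x v)
B = np.einsum("la,kb->ab", ss, whats)  # B = sum_kl s_l w-hat_k^T  (s x w)

W = np.einsum("ja,kb->ab", ws, whats)   # W = sum_jk w_j w-hat_k^T
inner = np.einsum("ka,ja->", whats, ws) # sum_jk w-hat_k^T w_j ...
assert np.isclose(inner, np.trace(W))   # ... equals tr(W)

# BA = sum_il s_l (sum_jk w-hat_k^T w_j) v_i^T
BA = inner * np.einsum("la,ib->ab", ss, vs)
assert np.allclose(B @ A, BA)
```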

Vector/Matrix Notation

It seems like a good notational convention in general to think of ${}^T$ as equivalent to ${}^*$, i.e., to transpose a vector is to move from thinking of it as a (column) vector to thinking of it as a linear functional, expressed as a row vector. So when we write design matrices $X$, it makes sense to think of the data points as columns, since they are explicitly vectors rather than functionals. Then $X^TX$ has the nice interpretation as a matrix of dot products (generalized from ordinary vectors), and the covariance $XX^T$ similarly has an interpretation in terms of outer products.
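
A small NumPy sketch of this convention (shapes arbitrary): with the $n$ data points stored as columns of a $d \times n$ matrix $X$, the entries of $X^TX$ are pairwise dot products, and $XX^T$ is a sum of outer products (the covariance up to centering and normalization):

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 3, 5
X = rng.standard_normal((d, n))  # column j is the j-th data point

# (X^T X)_{ij} = x_i . x_j: the Gram matrix of dot products.
gram = X.T @ X
assert np.isclose(gram[1, 2], X[:, 1] @ X[:, 2])

# X X^T = sum_j x_j x_j^T: a sum of outer products (centering the
# columns and dividing by n would give the usual covariance matrix).
scatter = X @ X.T
assert np.allclose(scatter, sum(np.outer(X[:, j], X[:, j]) for j in range(n)))
```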