trace: Nonlinear Function
Created: November 12, 2013
Modified: March 16, 2022


This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.

Trace of a Linear Operator

We define the trace of a square matrix as the sum of its diagonal elements:

$$\text{tr}(A) = \sum_{i} a_{ii}.$$
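In NumPy this is just np.trace, which we can sanity-check against the definition:

```python
import numpy as np

A = np.arange(9.0).reshape(3, 3)
# The trace is the sum of the diagonal entries a_ii: here 0 + 4 + 8.
assert np.trace(A) == A[0, 0] + A[1, 1] + A[2, 2] == 12.0
```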

Lemma: If $A$ and $B$ are square, then $\text{tr}(AB) = \text{tr}(BA)$. Proof: Using the notation above, we write

$$\text{tr}(AB) = \sum_i \sum_j a_{ij} b_{ji} = \sum_j \sum_i b_{ji} a_{ij} = \text{tr}(BA).$$
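A quick NumPy sanity check of the lemma:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

# tr(AB) and tr(BA) agree even though AB != BA in general.
assert np.isclose(np.trace(A @ B), np.trace(B @ A))
```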

Applying the lemma to the grouped product $(AB)$ and $C$ gives $\text{tr}(ABC) = \text{tr}((AB)C) = \text{tr}(C(AB)) = \text{tr}(CAB)$, and the same argument shows that in general trace is invariant to cyclic permutation, i.e., it holds that $\text{tr}(ABC) = \text{tr}(CAB) = \text{tr}(BCA)$. Note that trace is not invariant to arbitrary permutations (or even just swapping of adjacent pairs). For example, $\text{tr}(ABC) \ne \text{tr}(BAC)$ in general.
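And a similar check for the cyclic property, along with its failure under a non-cyclic swap:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))
C = rng.standard_normal((4, 4))

# Cyclic permutations preserve the trace...
t = np.trace(A @ B @ C)
assert np.isclose(t, np.trace(C @ A @ B))
assert np.isclose(t, np.trace(B @ C @ A))

# ...but swapping adjacent factors (a non-cyclic permutation) generally does not.
print(t, np.trace(B @ A @ C))  # almost surely different
```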

Now we can prove an important result.

Theorem: Trace is preserved under change of basis. Proof: Let $B$ be an invertible matrix. Then $\text{tr}(B^{-1}AB) = \text{tr}(BB^{-1}A) = \text{tr}(A)$ by the cyclic permutation property above. This shows that for any change-of-basis matrix $B$, the trace of $A$ is preserved in the new basis.
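Again easy to check numerically:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))  # a random matrix is almost surely invertible

# The trace of A is unchanged after a change of basis by B.
assert np.isclose(np.trace(np.linalg.inv(B) @ A @ B), np.trace(A))
```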

From this we get an easy but important corollary.

Theorem: The trace of a linear operator is the sum of its eigenvalues. Proof: Let $A$ be a diagonalizable linear operator, i.e., one with a full set of $n$ linearly independent eigenvectors. Then we can write $A$ in its eigenbasis as

$$A = V D V^{-1},$$

where $D$ is the diagonal matrix of eigenvalues $D_{ii} = \lambda_i$ and the columns of $V$ are the eigenvectors $(v_1, \ldots, v_n)$, so that $D = V^{-1} A V$. Now by the theorem above, we have

$$\text{tr}(A) = \text{tr}(D) = \sum_i \lambda_i.$$

Note this theorem is actually true even for non-diagonalizable operators (i.e., where we can't write an explicit eigendecomposition because there aren't $n$ linearly independent eigenvectors), provided the eigenvalues are counted with algebraic multiplicity, but you need a different proof, e.g., via the Schur decomposition or the coefficients of the characteristic polynomial.
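Both cases are easy to check numerically, e.g., with a random matrix (diagonalizable with probability 1) and a Jordan block (not diagonalizable):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5))
# Eigenvalues of a real matrix may come in complex conjugate pairs,
# but their sum still equals the (real) trace.
assert np.isclose(np.linalg.eigvals(A).sum(), np.trace(A))

# A Jordan block has only one eigenvector, yet the identity still holds.
J = np.array([[2.0, 1.0],
              [0.0, 2.0]])
assert np.isclose(np.linalg.eigvals(J).sum(), np.trace(J))
```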

Randomized Approximation

Suppose we want to compute $\text{tr}(A^{-1})$, but we don't want to invert $A$, since a dense inversion takes $O(n^3)$ time. There's a randomized trick which gets around this. Let $x \sim \mathcal{N}(0, I)$, so that $E[xx^T] = I$. Then

$$\text{tr}(A^{-1}) = \text{tr}(A^{-1} I) = \text{tr}(A^{-1} E[x x^T]) = E\left[\text{tr}(A^{-1} x x^T)\right] = E\left[\text{tr}(x^T A^{-1} x)\right] = E[x^T A^{-1} x],$$

where the second-to-last step uses cyclic invariance and the last uses the fact that the trace of a $1 \times 1$ matrix is just its single entry.

So we can approximate the trace by sampling a bunch of vectors $x$ and averaging the resulting values of $x^T A^{-1} x$. Each value can be computed without inverting $A$: solve the linear system $Ay = x$, then take $x^T y$. With an iterative solver such as conjugate gradient, each solve needs only matrix-vector products with $A$, which cost $O(n^2)$ each (or less if $A$ is sparse or structured). As long as the samples collectively cost less than the $O(n^3)$ of a full inversion, we save time.
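Here's a minimal sketch of this Monte Carlo scheme (usually called a Hutchinson-type trace estimator) in NumPy. For simplicity it uses a dense np.linalg.solve and compares against the exact answer, which defeats the computational point; in practice you'd swap in an iterative solver and skip the comparison. The symmetric positive definite test matrix is just an arbitrary choice to keep the solves well behaved.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200

# An arbitrary symmetric positive definite test matrix.
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)

# Average x^T A^{-1} x over Gaussian probe vectors x.
num_samples = 1000
total = 0.0
for _ in range(num_samples):
    x = rng.standard_normal(n)
    y = np.linalg.solve(A, x)  # stand-in for an iterative solve of Ay = x
    total += x @ y
estimate = total / num_samples

print("estimate:", estimate)
print("exact:   ", np.trace(np.linalg.inv(A)))  # for comparison only
```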

There's presumably some theory about how many samples we actually need to get a good approximation; I think http://scgroup.hpclab.ceid.upatras.gr/faculty/stratis/Papers/Sardenia_Talk.pdf has more info.