tensor product: Nonlinear Function
Created: November 12, 2013
Modified: July 18, 2022

tensor product

This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.

The tensor product $V \otimes W$ of two vector spaces $V, W$ (defined over the same scalar field, which we'll assume is $\mathbb{R}$) is the vector space of formal sums of formal pairs of vectors, where the formal sums are defined bilinearly. We call the elements of this space tensors, and will write them interchangeably as $(v, w)$ or $v \otimes w$. That is, if we have $v_i, v_j \in V$ and $w_i, w_j \in W$, so that the tensor space contains

$$(v_i, w_i),\ (v_i, w_j),\ (v_j, w_i),\ (v_j, w_j) \in V \otimes W,$$

then we can add elements with the same left component or same right component:

$$(v_i, w_i) + (v_i, w_j) = (v_i, w_i + w_j)$$
$$(v_i, w_i) + (v_j, w_i) = (v_i + v_j, w_i)$$

but if both components are different, the result is only a formal sum, and doesn't simplify:

$$(v_i, w_i) + (v_j, w_j) \ne (v_i + v_j, w_i + w_j).$$

We call this limited addition bilinearity. This is in contrast to the direct sum of $V$ and $W$, in which we would allow componentwise addition. Note that a tensor product of sums behaves very similarly to an ordinary product of sums, in that it expands out in a FOIL-type operation:

$$\begin{aligned} (v_i+v_j, w_i+w_j) &= (v_i, w_i + w_j) + (v_j, w_i + w_j)\\ &= (v_i, w_i) + (v_i, w_j) + (v_j, w_i) + (v_j, w_j). \end{aligned}$$

We can also interchange scalar multipliers between the two sides (switching here to use $\otimes$ to indicate the tensor product):

$$v \otimes \lambda w = \lambda v \otimes w = \lambda (v \otimes w)$$
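As a concrete sanity check, NumPy's Kronecker product `np.kron` gives one coordinate model of $v \otimes w$, and it satisfies exactly these identities. A minimal sketch (assuming NumPy; the dimensions are arbitrary):

```python
import numpy as np

# One concrete model: for coordinate vectors, v ⊗ w can be represented by the
# Kronecker product, a length-(n*m) vector containing all pairwise products.
rng = np.random.default_rng(0)
v_i, v_j = rng.standard_normal(3), rng.standard_normal(3)   # elements of V = R^3
w_i, w_j = rng.standard_normal(4), rng.standard_normal(4)   # elements of W = R^4

tensor = np.kron  # tensor(v, w) plays the role of (v, w) = v ⊗ w

# Bilinearity: we may add on one side at a time.
assert np.allclose(tensor(v_i, w_i) + tensor(v_i, w_j), tensor(v_i, w_i + w_j))
assert np.allclose(tensor(v_i, w_i) + tensor(v_j, w_i), tensor(v_i + v_j, w_i))

# FOIL-style expansion of a tensor product of sums.
foil = (tensor(v_i, w_i) + tensor(v_i, w_j)
        + tensor(v_j, w_i) + tensor(v_j, w_j))
assert np.allclose(tensor(v_i + v_j, w_i + w_j), foil)

# Scalar multipliers move freely between the two sides.
assert np.allclose(tensor(2.0 * v_i, w_i), tensor(v_i, 2.0 * w_i))
assert np.allclose(tensor(2.0 * v_i, w_i), 2.0 * tensor(v_i, w_i))
```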

We can view the tensor product operator $\otimes$ as a bilinear map (multilinear in the general case) from the direct product space into the formal space of tensor products:

$$\otimes: V \times W \to V \otimes W.$$

Tensors that fall in the range of $\otimes$, i.e., those that can be written as $v \otimes w$, are called pure tensors. Note that the sum of pure tensors is not in general a pure tensor, e.g., we cannot in general simplify a sum $v_i \otimes w_i + v_j \otimes w_j$ to $(v_i + v_j) \otimes (w_i + w_j)$ or any other pure tensor.
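We can make this concrete by storing a tensor in $V \otimes W$ as its array of coefficients: a pure tensor is then a rank-one array, while a generic sum of two pure tensors has rank two, so it can't be pure. A quick NumPy sketch (illustrative names only):

```python
import numpy as np

rng = np.random.default_rng(1)
v_i, v_j = rng.standard_normal(3), rng.standard_normal(3)
w_i, w_j = rng.standard_normal(4), rng.standard_normal(4)

# Store a tensor in V ⊗ W by its 3x4 array of coefficients; the pure tensor
# v ⊗ w corresponds to the rank-one array np.outer(v, w).
pure = np.outer(v_i, w_i)
assert np.linalg.matrix_rank(pure) == 1

# A generic sum of two pure tensors has rank 2: it is not itself pure.
mixed = np.outer(v_i, w_i) + np.outer(v_j, w_j)
assert np.linalg.matrix_rank(mixed) == 2
```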

Multilinear maps

Tensors are the 'gatekeepers' of multilinear maps. Formally, they satisfy the universal property: any bilinear map $g: V \times W \to \mathbb{R}$ can be written as a composition that 'goes through' the tensor product space. That is, for any such $g$, there is a unique linear map $\hat{g}: V \otimes W \to \mathbb{R}$ such that

$$g(\mathbf{v}, \mathbf{w}) = \hat{g}(\mathbf{v} \otimes \mathbf{w})$$

for all vectors $\mathbf{v} \in V$ and $\mathbf{w} \in W$. This makes tensors something analogous to a sufficient statistic for multilinear maps.
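A quick numerical illustration of this factoring, assuming NumPy and storing tensors in $V \otimes W$ as coefficient arrays (the names `M`, `g`, and `g_hat` are just for this sketch): any bilinear map $g(v, w) = v^T M w$ is a linear functional $\hat g$ applied to the pure tensor $v \otimes w$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 3, 4
M = rng.standard_normal((n, m))   # coefficients defining a bilinear map g(v, w) = vᵀ M w
v, w = rng.standard_normal(n), rng.standard_normal(m)

def g(v, w):
    """The bilinear map itself."""
    return v @ M @ w

def g_hat(t):
    """A *linear* functional on the tensor space (tensors stored as n x m coefficient arrays)."""
    return np.sum(M * t)

# The bilinear map factors through the tensor product: g(v, w) = ĝ(v ⊗ w).
assert np.isclose(g(v, w), g_hat(np.outer(v, w)))
```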

Isomorphism with linear transformations

(see also tensor for an independently written version of this argument)

Consider the tensor product $V^* \otimes V$ of a space with its dual space, i.e. the space of linear functionals on $V$. We will think of elements of $V$ as column vectors, and elements of $V^*$ as row vectors, so we can represent the application of a functional $f^* \in V^*$ to a vector $v \in V$ by $f^* v$. Note that under this notation we have $v^T = v^*$.

Call an element of $V^* \otimes V$ a pure tensor if it can be written as $(f^*, v)$, i.e., as a single formal pair of functional and vector, as opposed to a sum of several such pairs (recall from above that a formal sum will not necessarily simplify to a pure element). Clearly we can write any element of the tensor space as a sum of pure elements. An obvious thing to do with a pure tensor $(f^*, v)$ is to apply $f^*$ to $v$, yielding a real number. Generalized to an arbitrary tensor in the obvious way, by taking the sum of the results from the pure elements, this is called the evaluation map or trace map. We will see later why it deserves the name 'trace'. For now, just think of the trace map as having the flavor of an 'inner product', since on a pure tensor it is literally the dot product $\langle f^*, v \rangle$.
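A minimal sketch of the evaluation map, assuming NumPy and representing a tensor in $V^* \otimes V$ as a list of its pure terms (this representation and the name `evaluate` are just for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
f_star, g_star = rng.standard_normal(4), rng.standard_normal(4)  # functionals (row vectors)
v, u = rng.standard_normal(4), rng.standard_normal(4)            # vectors (column vectors)

def evaluate(pure_terms):
    """Evaluation ('trace') map: apply each functional to its vector and add up the scalars."""
    return sum(f @ x for f, x in pure_terms)

# Consistent with the bilinear identifications: (f*, v) + (f*, u) = (f*, v + u).
assert np.isclose(evaluate([(f_star, v), (f_star, u)]), evaluate([(f_star, v + u)]))

# On a sum of pure tensors it is just the sum of the dot products.
assert np.isclose(evaluate([(f_star, v), (g_star, u)]), f_star @ v + g_star @ u)
```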

For the moment, we're going to move in the other direction and think instead about outer products. Recall that the outer product of two $n$-dimensional column vectors $v_i$, $v_j$ is defined as the $n \times n$ matrix given by $v_i v_j^T$. One way to view the outer product is as giving a linear map from pairs of vectors in $V$ to linear transformations on $V$. Let's explore this using the machinery of tensor products. In particular, we'll show that the tensor space $V \otimes V^*$ is isomorphic to the space of linear transformations on $V$, also known as the endomorphisms of $V$, written $\mathrm{End}(V)$.

To see this, we'll start by defining a linear operator $A$ corresponding to the pure tensor $(v, f^*)$. Formally this map from tensors to linear operators will be called the coevaluation map. For any vector $w \in V$, let $Aw = v (f^* w)$. That is, we apply $f^*$ to $w$, yielding a scalar, and then return the vector $v$ scaled by that quantity. Note that this is a rank-one map, since it only returns vectors in the one-dimensional subspace spanned by $v$. It should be obvious that this is a linear operator.
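In coordinates, this operator is just the outer product of $v$ with the row vector $f^*$. A quick check, assuming NumPy (variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
v = rng.standard_normal(3)        # the vector v
f_star = rng.standard_normal(3)   # the functional f*, as a row vector
w = rng.standard_normal(3)        # an arbitrary input vector

# Coevaluation: the pure tensor (v, f*) acts as the rank-one matrix A = v f*.
A = np.outer(v, f_star)

# Aw = v (f* w): apply the functional to w, then scale v by the resulting number.
assert np.allclose(A @ w, v * (f_star @ w))
assert np.linalg.matrix_rank(A) == 1
```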

Next, let's generalize $A$ to arbitrary tensors. For a tensor represented as a sum of pure tensors, we define the resulting linear operator to be the sum of the linear operators generated by the pure tensors. In particular, let $(e_1, \ldots, e_n)$ be a basis for $V$ (with $(e_i^T)$ the corresponding basis for $V^*$); for intuition, we'll imagine $V = \mathbb{R}^n$ with the coordinate basis $e_i = [0 \ \ldots \ 1 \ \ldots \ 0]^T$. Then we can represent a tensor $t$ as the sum of pure tensors given by basis elements, i.e.,

$$t = \sum_{ij} a_{ij}(e_i, e_j^*).$$

This follows because a tensor is a formal sum of pure tensors, and each pure tensor can be decomposed into basis elements. Now consider the linear operator defined by $t$ under our construction above. For any vector $w \in V$, the $ij$th term of the sum picks out the $j$th coordinate of $w$'s representation, and returns $e_i$ scaled by that coordinate, multiplied by $a_{ij}$. This is equivalent to multiplying $w$ by the matrix containing all zeros except for its $ij$th entry, which contains $a_{ij}$. Thus we see that the linear operator defined by $t$ is just the matrix $A$ with entries $a_{ij}$. This means we can generate any matrix $A$ just by constructing the tensor with the appropriate coefficients. Thus we have a bijection between tensors and linear operators: for any tensor $t$ we can generate a matrix $A$, and for any matrix $A$ we can recover the corresponding tensor $t$. It is not hard to see that this bijection is actually an isomorphism (it respects addition and scalar multiplication). This establishes the general theorem that $V \otimes V^* \cong \mathrm{End}(V)$.

Now we can motivate the name of the trace map as given above. Consider writing a matrix $A$ in its tensor form, as a linear combination of pure basis tensors. Now apply the trace map to this tensor. All elements of the form $(e_i, e_j^*)$ for $i \ne j$ disappear, and we're left with just the "diagonal" elements $(e_i, e_i^*)$, which each evaluate to 1. So the trace map returns the sum of their coefficients, $\sum_i a_{ii}$, which is exactly the trace of $A$.
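Both facts are easy to check numerically; here's a sketch assuming NumPy and the coordinate basis (the coefficient array `a` is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 4
a = rng.standard_normal((n, n))   # coefficients a_ij of the tensor t
eye = np.eye(n)                   # coordinate basis e_1, ..., e_n (rows of the identity)

# The operator defined by t: each basis pure tensor (e_i, e_j*) contributes
# the rank-one matrix e_i e_jᵀ, weighted by a_ij.
A = sum(a[i, j] * np.outer(eye[i], eye[j]) for i in range(n) for j in range(n))
assert np.allclose(A, a)          # the operator is exactly the matrix of coefficients

# Evaluation map on t: e_j* e_i is 1 when i == j and 0 otherwise, so only the
# "diagonal" terms survive, and we recover the trace of A.
evaluation = sum(a[i, j] * (eye[j] @ eye[i]) for i in range(n) for j in range(n))
assert np.isclose(evaluation, np.trace(a))
```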

The big, overarching point of all of this machinery is to allow us to define operations on vector spaces (e.g. outer products) without needing to choose a basis.

Some other quick facts:

  • In general, given two vector spaces $V$ and $W$ of dimensions $n$ and $m$ respectively, the tensor product space $V \otimes W$ has dimension $nm$. This is a consequence of the isomorphism with linear maps $V^* \to W$, and thus with $m \times n$ matrices.
  • Weirdly, there are lots of zeros in a tensor product space. In particular, $(v_i, 0) = 0$ for any $v_i$, since by bilinearity we have
$$(v_i, 0) = (v_i, 0+0) = (v_i, 0) + (v_i, 0).$$

The general technique to show an element is not zero is to construct a linear transformation from the tensor space to some nicer space, say $\mathbb{R}$. For example, we can show $(v_i, v_i^*) \ne 0$ (for $v_i \ne 0$) by applying the trace map, which is a linear map taking $(v_i, v_i^*) \to \|v_i\|^2 > 0$ (equal to 1 when $v_i$ is a unit vector).

  • The outer product / coevaluation map has a "lambda calculus" sort of flavor, in that it has a "bare functional" hanging off of its back end waiting to be evaluated. In general we can do a kind of "currying": for spaces $A, B, C$, linear maps $A \otimes B \to C$ correspond to linear maps $A \to (B \to C)$, i.e., maps that take an element of $A$ and return a linear map from $B$ to $C$.
  • A symmetric matrix can be represented as the sum of outer products of its (orthonormal) eigenvectors, weighted by its eigenvalues:
    $$A = \sum_{i} \lambda_i v_i v_i^T.$$
    This doesn't need any of the machinery developed above to prove. It's easy to see that the linear transformation given by the right side agrees with $A$ on any eigenvector $v_j$: by orthonormality, $v_i^T v_j = \delta_{ij}$, so the sum returns $\lambda_j v_j$. Since we have two linear operators that agree in their actions on a full set of basis elements, they must be the same operator. (A numerical check appears in the sketch below.)
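A quick numerical check of this last point, assuming NumPy (`np.linalg.eigh` is used since the matrix is symmetric, so its eigenvectors come out orthonormal):

```python
import numpy as np

rng = np.random.default_rng(6)
B = rng.standard_normal((4, 4))
A = B + B.T                       # a symmetric matrix, so it has orthonormal eigenvectors

lam, V = np.linalg.eigh(A)        # columns of V are orthonormal eigenvectors of A

# Rebuild A as the eigenvalue-weighted sum of outer products of its eigenvectors.
A_rebuilt = sum(lam[i] * np.outer(V[:, i], V[:, i]) for i in range(len(lam)))
assert np.allclose(A, A_rebuilt)
```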