tensor product: Nonlinear Function
Created: November 12, 2013
Modified: July 18, 2022

tensor product

This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.

The tensor product $V \otimes W$ of two vector spaces $V, W$ (defined over the same scalar field, which we'll assume is $\mathbb{R}$) is the vector space of formal sums of formal pairs of vectors, where the formal sums are defined bilinearly. We call the elements of this space tensors, and will write them interchangeably as $(v, w)$ or $v \otimes w$. That is, if we have $v_i, v_j \in V$ and $w_i, w_j \in W$, so that the tensor space contains

$$(v_i, w_i),\ (v_i, w_j),\ (v_j, w_i),\ (v_j, w_j) \in V \otimes W,$$

then we can add elements with the same left component or same right component:

$$(v_i, w_i) + (v_i, w_j) = (v_i, w_i + w_j)$$
$$(v_i, w_i) + (v_j, w_i) = (v_i + v_j, w_i)$$

but if both components are different, the result is only a formal sum, and doesn't simplify:

$$(v_i, w_i) + (v_j, w_j) \ne (v_i + v_j, w_i + w_j).$$

We call this limited addition bilinearity. This is in contrast to the direct sum of $V$ and $W$, in which we would allow componentwise addition. Note that a tensor product of sums behaves very similarly to an ordinary product of sums, in that it expands out in a FOIL-type operation:

$$\begin{aligned} (v_i+v_j, w_i+w_j) &= (v_i, w_i + w_j) + (v_j, w_i + w_j)\\ &= (v_i, w_i) + (v_i, w_j) + (v_j, w_i) + (v_j, w_j). \end{aligned}$$

We can also interchange scalar multipliers between the two sides (switching here to use $\otimes$ to indicate the tensor product):

$$v \otimes \lambda w = \lambda v \otimes w = \lambda (v \otimes w)$$
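As a concrete sanity check, NumPy's Kronecker product `np.kron` gives one coordinate model of $v \otimes w$, and it satisfies exactly these identities. A minimal sketch (assuming NumPy; the dimensions are arbitrary):

```python
import numpy as np

# One concrete model: for coordinate vectors, v ⊗ w can be represented by the
# Kronecker product, a length-(n*m) vector containing all pairwise products.
rng = np.random.default_rng(0)
v_i, v_j = rng.standard_normal(3), rng.standard_normal(3)   # elements of V = R^3
w_i, w_j = rng.standard_normal(4), rng.standard_normal(4)   # elements of W = R^4

tensor = np.kron  # tensor(v, w) plays the role of (v, w) = v ⊗ w

# Bilinearity: we may add on one side at a time.
assert np.allclose(tensor(v_i, w_i) + tensor(v_i, w_j), tensor(v_i, w_i + w_j))
assert np.allclose(tensor(v_i, w_i) + tensor(v_j, w_i), tensor(v_i + v_j, w_i))

# FOIL-style expansion of a tensor product of sums.
foil = (tensor(v_i, w_i) + tensor(v_i, w_j)
        + tensor(v_j, w_i) + tensor(v_j, w_j))
assert np.allclose(tensor(v_i + v_j, w_i + w_j), foil)

# Scalar multipliers move freely between the two sides.
assert np.allclose(tensor(2.0 * v_i, w_i), tensor(v_i, 2.0 * w_i))
assert np.allclose(tensor(2.0 * v_i, w_i), 2.0 * tensor(v_i, w_i))
```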

We can view the tensor product operator $\otimes$ as a bilinear map (multilinear in the general case) from the direct product space into the formal space of tensor products:

$$\otimes: V \times W \to V \otimes W.$$

Tensors that fall in the range of $\otimes$, i.e., those that can be written as $v \otimes w$, are called pure tensors. Note that the sum of pure tensors is not in general a pure tensor, e.g., we cannot in general simplify a sum $v_i \otimes w_i + v_j \otimes w_j$ to $(v_i + v_j) \otimes (w_i + w_j)$ or any other pure tensor.
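We can make this concrete by storing a tensor in $V \otimes W$ as its array of coefficients: a pure tensor is then a rank-one array, while a generic sum of two pure tensors has rank two, so it can't be pure. A quick NumPy sketch (illustrative names only):

```python
import numpy as np

rng = np.random.default_rng(1)
v_i, v_j = rng.standard_normal(3), rng.standard_normal(3)
w_i, w_j = rng.standard_normal(4), rng.standard_normal(4)

# Store a tensor in V ⊗ W by its 3x4 array of coefficients; the pure tensor
# v ⊗ w corresponds to the rank-one array np.outer(v, w).
pure = np.outer(v_i, w_i)
assert np.linalg.matrix_rank(pure) == 1

# A generic sum of two pure tensors has rank 2: it is not itself pure.
mixed = np.outer(v_i, w_i) + np.outer(v_j, w_j)
assert np.linalg.matrix_rank(mixed) == 2
```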

Multilinear maps

Tensors are the 'gatekeepers' of multilinear maps. Formally, they satisfy the universal property: any bilinear map $g: V \times W \to \mathbb{R}$ can be written as a composition that 'goes through' the tensor product space. That is, for any such $g$, there is a unique linear map $\hat{g}: V \otimes W \to \mathbb{R}$ such that

$$g(\mathbf{v}, \mathbf{w}) = \hat{g}(\mathbf{v} \otimes \mathbf{w})$$

for all vectors $\mathbf{v} \in V$ and $\mathbf{w} \in W$. This makes tensors something analogous to a sufficient statistic for multilinear maps.
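A quick numerical illustration of this factoring, assuming NumPy and storing tensors in $V \otimes W$ as coefficient arrays (the names `M`, `g`, and `g_hat` are just for this sketch): any bilinear map $g(v, w) = v^T M w$ is a linear functional $\hat g$ applied to the pure tensor $v \otimes w$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 3, 4
M = rng.standard_normal((n, m))   # coefficients defining a bilinear map g(v, w) = vᵀ M w
v, w = rng.standard_normal(n), rng.standard_normal(m)

def g(v, w):
    """The bilinear map itself."""
    return v @ M @ w

def g_hat(t):
    """A *linear* functional on the tensor space (tensors stored as n x m coefficient arrays)."""
    return np.sum(M * t)

# The bilinear map factors through the tensor product: g(v, w) = ĝ(v ⊗ w).
assert np.isclose(g(v, w), g_hat(np.outer(v, w)))
```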

Isomorphism with linear transformations

(see also tensor for an independently written version of this argument)

Consider the tensor product $V^* \otimes V$ of a space with its dual space, i.e. the space of linear functionals on $V$. We will think of elements of $V$ as column vectors, and elements of $V^*$ as row vectors, so we can represent the application of a functional $f^* \in V^*$ to a vector $v \in V$ by $f^* v$. Note that under this notation we have $v^T = v^*$.

Call an element of $V^* \otimes V$ a pure tensor if it can be written as $(f^*, v)$, i.e., as a single formal pair of functional and vector, as opposed to a sum of several such pairs (recall from above that a formal sum will not necessarily simplify to a pure element). Clearly we can write any element of the tensor space as a sum of pure elements. An obvious thing to do with a pure tensor $(f^*, v)$ is to apply $f^*$ to $v$, yielding a real number. Generalized to an arbitrary tensor in the obvious way, by taking the sum of the results from the pure elements, this is called the evaluation map or trace map. We will see later why it deserves the name 'trace'. For now, just think of the trace map as having the flavor of an 'inner product', since on a pure tensor it is literally the dot product $\langle f^*, v \rangle$.
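A minimal sketch of the evaluation map, assuming NumPy and representing a tensor in $V^* \otimes V$ as a list of its pure terms (this representation and the name `evaluate` are just for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
f_star, g_star = rng.standard_normal(4), rng.standard_normal(4)  # functionals (row vectors)
v, u = rng.standard_normal(4), rng.standard_normal(4)            # vectors (column vectors)

def evaluate(pure_terms):
    """Evaluation ('trace') map: apply each functional to its vector and add up the scalars."""
    return sum(f @ x for f, x in pure_terms)

# Consistent with the bilinear identifications: (f*, v) + (f*, u) = (f*, v + u).
assert np.isclose(evaluate([(f_star, v), (f_star, u)]), evaluate([(f_star, v + u)]))

# On a sum of pure tensors it is just the sum of the dot products.
assert np.isclose(evaluate([(f_star, v), (g_star, u)]), f_star @ v + g_star @ u)
```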

For the moment, we're going to move in the other direction and think instead about outer products. Recall that the outer product of two $n$-dimensional column vectors $v_i$, $v_j$ is defined as the $n \times n$ matrix given by $v_i v_j^T$. One way to view the outer product is as giving a linear map from pairs of vectors in $V$ to linear transformations on $V$. Let's explore this using the machinery of tensor products. In particular, we'll show that the tensor space $V \otimes V^*$ is isomorphic to the space of linear transformations on $V$, also known as the endomorphisms of $V$, written $\mathrm{End}(V)$.

To see this, we'll start by defining a linear operator $A$ corresponding to the pure tensor $(v, f^*)$. Formally this map from tensors to linear operators will be called the coevaluation map. For any vector $w \in V$, let $Aw = v (f^* w)$. That is, we apply $f^*$ to $w$, yielding a scalar, and then return the vector $v$ scaled by that quantity. Note that this is a rank-one map, since it only returns vectors in the one-dimensional subspace spanned by $v$. It should be obvious that this is a linear operator.
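In coordinates, this operator is just the outer product of $v$ with the row vector $f^*$. A quick check, assuming NumPy (variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
v = rng.standard_normal(3)        # the vector v
f_star = rng.standard_normal(3)   # the functional f*, as a row vector
w = rng.standard_normal(3)        # an arbitrary input vector

# Coevaluation: the pure tensor (v, f*) acts as the rank-one matrix A = v f*.
A = np.outer(v, f_star)

# Aw = v (f* w): apply the functional to w, then scale v by the resulting number.
assert np.allclose(A @ w, v * (f_star @ w))
assert np.linalg.matrix_rank(A) == 1
```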

Next, let's generalize $A$ to arbitrary tensors. For a tensor represented as a sum of pure tensors, we define the resulting linear operator to be the sum of the linear operators generated by the pure tensors. In particular, let $(e_1, \ldots, e_n)$ be a basis for $V$ (with $(e_i^T)$ the corresponding basis for $V^*$); for intuition, we'll imagine $V = \mathbb{R}^n$ with the coordinate basis $e_i = [0 \ \ldots \ 1 \ \ldots \ 0]^T$. Then we can represent a tensor $t$ as the sum of pure tensors given by basis elements, i.e.,

$$t = \sum_{ij} a_{ij}(e_i, e_j^*).$$

This follows because a tensor is a formal sum of pure tensors, and each pure tensor can be decomposed into basis elements. Now consider the linear operator defined by $t$ under our construction above. For any vector $w \in V$, the $ij$th term of the sum picks out the $j$th coordinate of $w$'s representation, and returns $e_i$ scaled by that coordinate, multiplied by $a_{ij}$. This is equivalent to multiplying $w$ by the matrix containing all zeros except for its $ij$th entry, which contains $a_{ij}$. Thus we see that the linear operator defined by $t$ is just the matrix $A$ with entries $a_{ij}$. This means we can generate any matrix $A$ just by constructing the tensor with the appropriate coefficients. Thus we have a bijection between tensors and linear operators: for any tensor $t$ we can generate a matrix $A$, and for any matrix $A$ we can recover the corresponding tensor $t$. It is not hard to see that this bijection is actually an isomorphism (it respects addition and scalar multiplication). This establishes the general theorem that $V \otimes V^* \cong \mathrm{End}(V)$.

Now we can motivate the name of the trace map as given above. Consider writing a matrix $A$ in its tensor form, as a linear combination of pure basis tensors. Now apply the trace map to this tensor. All elements of the form $(e_i, e_j^*)$ for $i \ne j$ disappear, and we're left with just the "diagonal" elements $(e_i, e_i^*)$, which each evaluate to 1. So the trace map returns the sum of their coefficients, $\sum_i a_{ii}$, which is exactly the trace of $A$.
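Both facts are easy to check numerically; here's a sketch assuming NumPy and the coordinate basis (the coefficient array `a` is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 4
a = rng.standard_normal((n, n))   # coefficients a_ij of the tensor t
eye = np.eye(n)                   # coordinate basis e_1, ..., e_n (rows of the identity)

# The operator defined by t: each basis pure tensor (e_i, e_j*) contributes
# the rank-one matrix e_i e_jᵀ, weighted by a_ij.
A = sum(a[i, j] * np.outer(eye[i], eye[j]) for i in range(n) for j in range(n))
assert np.allclose(A, a)          # the operator is exactly the matrix of coefficients

# Evaluation map on t: e_j* e_i is 1 when i == j and 0 otherwise, so only the
# "diagonal" terms survive, and we recover the trace of A.
evaluation = sum(a[i, j] * (eye[j] @ eye[i]) for i in range(n) for j in range(n))
assert np.isclose(evaluation, np.trace(a))
```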

The big, overarching point of all of this machinery is to allow us to define operations on vector spaces (e.g. outer products) without needing to choose a basis.

Some other quick facts:

  • In general, given two vector spaces $V$ and $W$ of dimensions $n$ and $m$ respectively, the tensor product space $V \otimes W$ has dimension $nm$. This is a consequence of the isomorphism with linear maps $V^* \to W$, and thus with $m \times n$ matrices.
  • Weirdly, there are lots of zeros in a tensor product space. In particular, $(v_i, 0) = 0$ for any $v_i$, since by bilinearity we have
$$(v_i, 0) = (v_i, 0+0) = (v_i, 0) + (v_i, 0).$$

The general technique to show an element is not zero is to construct a linear transformation from the tensor space to some nicer space, say $\mathbb{R}$. For example, we can show $(v_i, v_i^*) \ne 0$ (for $v_i \ne 0$) by applying the trace map, which is a linear map taking $(v_i, v_i^*) \to \|v_i\|^2 > 0$ (equal to 1 when $v_i$ is a unit vector).

  • The outer product / coevaluation map has a "lambda calculus" sort of flavor, in that it has a "bare functional" hanging off of its back end waiting to be evaluated. In general we can do a kind of "currying": for spaces $A, B, C$, linear maps $A \otimes B \to C$ correspond to linear maps $A \to (B \to C)$, i.e., maps that take an element of $A$ and return a linear map from $B$ to $C$.
  • A symmetric matrix can be represented as the sum of outer products of its (orthonormal) eigenvectors, weighted by its eigenvalues:
    $$A = \sum_{i} \lambda_i v_i v_i^T.$$
    This doesn't need any of the machinery developed above to prove. It's easy to see that the linear transformation given by the right side agrees with $A$ on any eigenvector $v_j$: by orthonormality, $v_i^T v_j = \delta_{ij}$, so the sum returns $\lambda_j v_j$. Since we have two linear operators that agree in their actions on a full set of basis elements, they must be the same operator. (A numerical check appears in the sketch below.)
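A quick numerical check of this last point, assuming NumPy (`np.linalg.eigh` is used since the matrix is symmetric, so its eigenvectors come out orthonormal):

```python
import numpy as np

rng = np.random.default_rng(6)
B = rng.standard_normal((4, 4))
A = B + B.T                       # a symmetric matrix, so it has orthonormal eigenvectors

lam, V = np.linalg.eigh(A)        # columns of V are orthonormal eigenvectors of A

# Rebuild A as the eigenvalue-weighted sum of outer products of its eigenvectors.
A_rebuilt = sum(lam[i] * np.outer(V[:, i], V[:, i]) for i in range(len(lam)))
assert np.allclose(A, A_rebuilt)
```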