tensor: Nonlinear Function
Created: March 10, 2022
Modified: July 18, 2022


This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.

Everyone in machine learning talks about tensors, but no one really understands what they are. This page collects several definitions and attempts to shed light on the underlying connections.

Multidimensional array (the 'TensorFlow' definition): A tensor is an array of values indexed by some fixed number of integer coordinates. For example, an array indexed by one coordinate is a vector, and an array indexed by two coordinates is a matrix; both are special cases of tensors.
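
To make this concrete, here is a minimal numpy sketch (the example and variable names are mine, not part of the original definition) of arrays of increasing order:

```python
import numpy as np

scalar = np.float64(3.0)        # order 0: no coordinates needed
vector = np.zeros(4)            # order 1: indexed by one coordinate
matrix = np.zeros((4, 5))       # order 2: indexed by two coordinates
tensor3 = np.zeros((4, 5, 6))   # order 3: indexed by three coordinates

# The 'order' of the tensor is the number of coordinates needed to
# index a single value, i.e., the length of the shape tuple.
for t in (vector, matrix, tensor3):
    print(t.ndim, t.shape)
```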

Multilinear map: A tensor is a map $V \times W \times \ldots \to \mathbb{R}$ --- where $V$, $W$, etc. are vector spaces --- that is linear in each of its arguments.

This definition is closely related to the previous one, because any multidimensional array defines a multilinear map through the process of tensor 'contraction'. For example, a matrix $M$ defines the map

$$f_M(\mathbf{v}, \mathbf{w}) = \mathbf{v}^T M \mathbf{w} = \sum_{i,j} M_{ij} v_i w_j,$$

a three-dimensional array $(A)_{ijk}$ defines the map

$$f_A(\mathbf{v}, \mathbf{w}, \mathbf{x}) = \sum_{i,j,k} A_{ijk} v_i w_j x_k,$$

and so on. Conversely, any multilinear map can be represented as a multidimensional array for a particular choice of basis.
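
As a sanity check on the contraction picture, here is a small numpy sketch (illustrative, not from the original notes; the `einsum` index strings spell out the sums above) that builds $f_M$ and $f_A$ and verifies linearity in one argument:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(3, 4))       # matrix: defines a bilinear map f_M(v, w)
A = rng.normal(size=(3, 4, 5))    # 3-d array: defines a trilinear map f_A(v, w, x)

def f_M(v, w):
    return np.einsum('ij,i,j->', M, v, w)       # sum_{i,j} M_ij v_i w_j

def f_A(v, w, x):
    return np.einsum('ijk,i,j,k->', A, v, w, x) # sum_{i,j,k} A_ijk v_i w_j x_k

v, v2 = rng.normal(size=3), rng.normal(size=3)
w, x = rng.normal(size=4), rng.normal(size=5)

# Linearity in the first argument:
# f_A(2v + 3v', w, x) = 2 f_A(v, w, x) + 3 f_A(v', w, x)
lhs = f_A(2.0 * v + 3.0 * v2, w, x)
rhs = 2.0 * f_A(v, w, x) + 3.0 * f_A(v2, w, x)
assert np.isclose(lhs, rhs)
print(f_M(v, w), lhs, rhs)
```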

Elements of a tensor product space: given two vector spaces $V$ and $W$, their tensor product $V \otimes W$ is the space of formal pairs of vectors $v \otimes w$ and their sums, up to bilinearity (multilinearity in the general case where we have $>2$ vector spaces). A tensor is then just an element of the tensor product space: a formal pair of vectors, or the sum of several such formal pairs.
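
For finite-dimensional spaces, one concrete (illustrative, not canonical) way to realize these formal pairs is the outer product. A short numpy sketch showing that scalars move freely between factors, as bilinearity requires:

```python
import numpy as np

v = np.array([1.0, 2.0])
w = np.array([3.0, 4.0, 5.0])

pure = np.outer(v, w)                             # the pure tensor v ⊗ w
assert np.allclose(np.outer(2 * v, w), 2 * pure)  # (2v) ⊗ w = 2 (v ⊗ w)
assert np.allclose(np.outer(v, 2 * w), 2 * pure)  # v ⊗ (2w) = 2 (v ⊗ w)

# A generic element of the tensor product space is a *sum* of such pairs:
u, x = np.array([0.5, -1.0]), np.array([1.0, 0.0, 2.0])
element = np.outer(v, w) + np.outer(u, x)         # v ⊗ w + u ⊗ x
```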

How does this relate to the previous viewpoints? By the singular value decomposition we can always write a matrix $M$ as a sum of rank-1 matrices, i.e., as a sum of outer products:

$$M = \sum_{i=1}^n \sigma_i \mathbf{u}_i \mathbf{v}_i^T.$$

It can be shown (as an exercise for the reader) that the tensor product space $V \otimes V^*$ --- here $V^*$ denotes the dual space, i.e., the space of linear functionals on $V$ --- is isomorphic to the space of outer products and their sums, since both are defined by taking pairs of vectors $(\mathbf{u}, \mathbf{v}^T)$ and their sums up to multilinearity. Thus, the space of bilinear maps (matrices) on $V$ is isomorphic to the tensor product space $V \otimes V^*$.
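Here is a short numerical check of this correspondence (my sketch, assuming numpy's SVD conventions, where the rows of `Vt` are the $\mathbf{v}_i^T$):

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.normal(size=(4, 4))
U, s, Vt = np.linalg.svd(M)

# Rebuild M term by term: M = sum_i sigma_i u_i v_i^T
reconstruction = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(len(s)))
assert np.allclose(M, reconstruction)
```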

A short aside on the notion of "rank": the number of nonzero singular values $\sigma_i$ in the above sum is the matrix rank of $M$. The rank of a tensor in general is the minimum number of pure tensors that must be summed to express it, where a pure tensor is a single formal tuple of vectors: $u \otimes v$ is a pure tensor, but $u \otimes v + w \otimes x$ is (in general) not. In the matrix example, we would equate a 'pure' tensor with a single outer product, or equivalently a rank-1 matrix. This generalizes the notion of matrix rank, with the notable distinction that matrix rank is efficiently computable, while finding the rank of a higher-order tensor is NP-complete.
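
For the matrix case, the efficient computation is just an SVD plus a tolerance threshold; the sketch below (illustrative) builds a sum of two pure tensors and recovers rank 2. Nothing analogous exists for higher-order tensors.

```python
import numpy as np

rng = np.random.default_rng(2)
u, v, w, x = (rng.normal(size=5) for _ in range(4))
M = np.outer(u, v) + np.outer(w, x)    # sum of two pure tensors

singular_values = np.linalg.svd(M, compute_uv=False)
rank = int(np.sum(singular_values > 1e-10))  # tolerance for numerical zero
print(rank)  # 2, generically: random u, v, w, x are almost surely independent
```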

Higher-order tensors:

  • suppose we have triples $V \otimes V^* \otimes V$. what can we say about these?
  • well, the basis set has size $n^3$, so any element (tensor) in this space is a three-dimensional array
  • we can define a multilinear map $f: V^* \times V \times V^* \to \mathbb{R}$ by tensor contraction: given a pure tensor $\mathbf{u} \otimes \mathbf{v}^T \otimes \mathbf{w}$ and an input tuple $(\mathbf{a}^T, \mathbf{b}, \mathbf{c}^T)$, the result is just $f(\mathbf{a}^T, \mathbf{b}, \mathbf{c}^T) = \mathbf{a}^T\mathbf{u} \cdot \mathbf{v}^T \mathbf{b} \cdot \mathbf{c}^T \mathbf{w}$ --- that is, we just apply all of the functionals to their corresponding vectors and multiply the results --- where the extension to general tensors follows by linearity; see the sketch below
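
Below is a small numerical check of this factorization, using numpy's `einsum` for the full contraction (the variable names and the check itself are mine, for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
u, v, w = (rng.normal(size=n) for _ in range(3))
a, b, c = (rng.normal(size=n) for _ in range(3))

T = np.einsum('i,j,k->ijk', u, v, w)   # the pure tensor u ⊗ v^T ⊗ w

# Contracting T against (a^T, b, c^T) factors into three inner products:
full_contraction = np.einsum('ijk,i,j,k->', T, a, b, c)
factored = (a @ u) * (v @ b) * (c @ w)  # a^T u · v^T b · c^T w
assert np.isclose(full_contraction, factored)
```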
