Tensor methods
Introduction to Tensors
A tensor is like a matrix, but with more dimensions.
Definition 1.1: An $n_1 \times n_2 \times \cdots \times n_k$ tensor is a set of $n_1 \cdot n_2 \cdots n_k$ numbers, arranged in a $k$-dimensional hypercube. With this $k$-tensor, $A$, a specific element can be referred to via $A_{i_1, i_2, \dots, i_k}$.
A 2-tensor is simply a matrix, with $A_{i,j}$ referring to the $i,j$th entry. An $n \times m \times \ell$ 3-tensor is a stack of $\ell$ matrices, where each matrix has size $n \times m$. The entry $A_{i,j,k}$ of a 3-tensor refers to the $i,j$th entry of the $k$th matrix in the stack.
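To make the indexing concrete, here is a minimal numpy sketch of a 3-tensor as a stack of matrices indexed along the last axis; the sizes and values are arbitrary, chosen only for illustration.

```python
import numpy as np

# A 4 x 3 x 2 3-tensor: two stacked 4 x 3 matrices (hypothetical sizes).
# We index the stack along the last axis, so A[:, :, k] is the k-th matrix.
A = np.zeros((4, 3, 2))
A[:, :, 0] = np.arange(12).reshape(4, 3)   # first matrix in the stack
A[:, :, 1] = np.ones((4, 3))               # second matrix in the stack

# Entry A_{i,j,k} is the (i, j) entry of the k-th matrix in the stack.
i, j, k = 2, 1, 0
print(A[i, j, k], A[:, :, k][i, j])        # same value: 7.0 7.0
```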
Remark 1.2: Tensors are also useful in physics, where they are viewed as geometric objects.
Examples of Tensors
Example 1.3 (k-grams): Given a body of text with $n$ distinct words, and some ordering of the set of words, like alphabetical ordering, a $k$-tensor is defined by setting entry $A_{i_1,\dots,i_k}$ equal to the number of times the sequence of words $i_1, i_2, \dots, i_k$ occurs in the text. With $n$ distinct words, we can extract an $n \times n \times n$ 3-tensor where $A_{i,j,k}$ represents the number of times in the corpus that the $i$th, $j$th, and $k$th words occur in sequence.
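As a concrete illustration of Example 1.3, the sketch below builds the 3-gram count tensor for a tiny made-up corpus; the corpus and variable names are assumptions for the example, not from the notes.

```python
import numpy as np

# Toy corpus and a fixed (alphabetical) ordering of its distinct words.
corpus = "the cat sat on the mat the cat ran".split()
words = sorted(set(corpus))
index = {w: i for i, w in enumerate(words)}
n = len(words)

# A[i, j, k] = number of times words i, j, k appear consecutively in the corpus.
A = np.zeros((n, n, n))
for a, b, c in zip(corpus, corpus[1:], corpus[2:]):
    A[index[a], index[b], index[c]] += 1

print(A[index["the"], index["cat"], index["sat"]])  # 1.0
```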
Example 1.4 (The Moment Tensor): Suppose we have some data $x_1, \dots, x_m$, representing independent draws from a high-dimensional distribution $D$, so $x_i \in \mathbb{R}^d$. The mean of the data is a vector of length $d$. The covariance matrix of this data is a $d \times d$ matrix, where the $i,j$th entry is an empirical estimate of $\mathbb{E}[x(i)\, x(j)]$, where $x(i)$ denotes the $i$th coordinate of a sample from the distribution. The 3-tensor representing the third order moments is the $d \times d \times d$ tensor whose $i,j,k$th entry is an empirical estimate of $\mathbb{E}[x(i)\, x(j)\, x(k)]$. This generalizes to $k$th order moments, giving a $k$-tensor.
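The empirical moment tensors of Example 1.4 can be computed directly with numpy; the sketch below uses synthetic Gaussian data purely as a stand-in for the samples $x_1, \dots, x_m$.

```python
import numpy as np

# Empirical moment tensors for m samples in R^d (toy data, assumed here).
rng = np.random.default_rng(0)
m, d = 1000, 5
X = rng.normal(size=(m, d))                      # rows are samples x_1, ..., x_m

mean = X.mean(axis=0)                            # length-d vector of E[x(i)] estimates
second = np.einsum("si,sj->ij", X, X) / m        # d x d matrix of E[x(i) x(j)] estimates
third = np.einsum("si,sj,sk->ijk", X, X, X) / m  # d x d x d tensor of E[x(i) x(j) x(k)]

print(mean.shape, second.shape, third.shape)     # (5,) (5, 5) (5, 5, 5)
```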
The Rank of a Tensor
The rank of a tensor is similar to the rank of a matrix. A matrix $M$ has rank $r$ if it can be written as $M = U V^T$, where $U$ has $r$ columns and $V$ has $r$ columns (and $r$ is the smallest number for which this is possible). Letting $u_1, \dots, u_r$ and $v_1, \dots, v_r$ denote these columns, we have:

$$M = \sum_{i=1}^{r} u_i v_i^T.$$
Each term $u_i v_i^T$ is the outer product of two vectors. $M$ is represented as a sum of $r$ rank one matrices, where the $i$th matrix has $j,k$th entry $u_i(j) \cdot v_i(k)$: the product of the $j$th entry of the vector $u_i$ and the $k$th entry of the vector $v_i$.
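A quick numerical check of this view of matrix rank, using randomly generated factors (the sizes are arbitrary):

```python
import numpy as np

# Check that M = U V^T equals the sum of the r rank-one outer products u_i v_i^T.
rng = np.random.default_rng(1)
r, n, m = 3, 6, 4
U = rng.normal(size=(n, r))                  # columns u_1, ..., u_r
V = rng.normal(size=(m, r))                  # columns v_1, ..., v_r

M = U @ V.T
M_from_outers = sum(np.outer(U[:, i], V[:, i]) for i in range(r))

print(np.allclose(M, M_from_outers))         # True
print(np.linalg.matrix_rank(M))              # 3 (with probability 1 over the random draw)
```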
A tensor has rank 1 if it can be expressed as the outer product of a set of vectors, and has rank $r$ if it can be written as a sum of $r$ rank 1 tensors (and no fewer).
Definition 1.5 With vectors $u$, $v$, $w$ of lengths $n$, $m$, and $\ell$, their outer product is the rank one $n \times m \times \ell$ 3-tensor denoted $u \otimes v \otimes w$, with entries

$$(u \otimes v \otimes w)_{i,j,k} = u(i) \cdot v(j) \cdot w(k).$$
Or for higher dimensional tensors:
Definition 1.6 Given $k$ vectors $v_1, \dots, v_k$ of lengths $n_1, \dots, n_k$, their outer product is denoted $v_1 \otimes v_2 \otimes \cdots \otimes v_k$, and represents the $n_1 \times \cdots \times n_k$ $k$-tensor whose $(i_1, \dots, i_k)$th entry is $v_1(i_1) \cdot v_2(i_2) \cdots v_k(i_k)$.
Example 1.7 Given vectors $u \in \mathbb{R}^3$, $v \in \mathbb{R}^2$, and $w \in \mathbb{R}^2$, the outer product $u \otimes v \otimes w$ is a $3 \times 2 \times 2$ 3-tensor, which can be viewed as a stack of two $3 \times 2$ matrices: the first is $w(1) \cdot u v^T$ and the second is $w(2) \cdot u v^T$.
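A small numpy sketch of Example 1.7, with hypothetical values for $u$, $v$, and $w$, confirming that the slices of $u \otimes v \otimes w$ are scaled copies of $u v^T$:

```python
import numpy as np

# Outer product of three vectors (hypothetical values): u in R^3, v in R^2, w in R^2
# give a 3 x 2 x 2 rank-one 3-tensor.
u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0])
w = np.array([6.0, 7.0])

T = np.einsum("i,j,k->ijk", u, v, w)     # T[i, j, k] = u[i] * v[j] * w[k]

# Viewed as a stack of two 3 x 2 matrices: the k-th slice is w[k] * (u v^T).
print(np.allclose(T[:, :, 0], w[0] * np.outer(u, v)))   # True
print(np.allclose(T[:, :, 1], w[1] * np.outer(u, v)))   # True
```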
Definition 1.8 A 3-tensor $A$ has rank $r$ if there exist 3 sets of vectors $u_1, \dots, u_r$, $v_1, \dots, v_r$, and $w_1, \dots, w_r$ where

$$A = \sum_{i=1}^{r} u_i \otimes v_i \otimes w_i,$$

and $r$ is the smallest number for which such a representation exists.
The definition for general $k$-tensors is analogous.
Differences between Matrices and Tensors
Most operations on matrices do not carry over to $k$-tensors for $k \geq 3$. Some differences:
- For matrices, the best rank-$k$ approximation can be found by iteratively finding the best rank-1 approximation, and then subtracting it off. Thus, for a matrix $M$, the best rank 1 approximation of $M$ is the same as the best rank 1 approximation of the matrix $M_2$ defined as the best rank 2 approximation of $M$; equivalently, if $M_1$ is the best rank 1 approximation of $M$, then $M_2 = M_1 + N$, where $N$ is the best rank 1 approximation of $M - M_1$. For $k$-tensors with $k \geq 3$, this is not the case (see the sketch after this list).
- There is no known explicit construction of $n \times n \times n$ tensors whose rank is superlinear in $n$.
- Computing the rank of a matrix is easy, but computing the rank of a 3-tensor is NP-hard.
- Despite this NP-hardness, if the rank of a 3-tensor is small, then:
- It can be efficiently computed.
- Its low-rank representation is unique.
- It can be efficiently recovered.
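The sketch below illustrates the first difference above for matrices: by the SVD (Eckart–Young), the best rank 1 approximation of the best rank 2 approximation of $M$ is again the best rank 1 approximation of $M$. The data is random and the variable names are illustrative; no analogous guarantee holds for 3-tensors.

```python
import numpy as np

# For matrices, greedily peeling off the best rank-1 piece is optimal:
# the best rank-1 approximation of M (from the SVD) is also the first term
# of the best rank-2 approximation of M.
rng = np.random.default_rng(2)
M = rng.normal(size=(8, 6))

U, s, Vt = np.linalg.svd(M, full_matrices=False)
best_rank1 = s[0] * np.outer(U[:, 0], Vt[0, :])
best_rank2 = best_rank1 + s[1] * np.outer(U[:, 1], Vt[1, :])

# The best rank-1 approximation of the rank-2 matrix best_rank2 is again best_rank1.
U2, s2, Vt2 = np.linalg.svd(best_rank2, full_matrices=False)
rank1_of_rank2 = s2[0] * np.outer(U2[:, 0], Vt2[0, :])
print(np.allclose(best_rank1, rank1_of_rank2))   # True
```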
Low-Rank Tensors
A low-rank representation of a matrix is not unique. One pioneer of low-rank matrix approximation was Spearman. He gave a number of academic tests to a number of students, and formed a matrix $M$ in which entry $M_{i,j}$ represented the performance of the $i$th student on the $j$th test. He observed that $M$ was close to a rank 2 matrix, and gave the following explanation. Assume the $i$th student is characterized by two numbers, $m_i$ and $v_i$, denoting mathematical and verbal ability. Suppose that each test $j$ can likewise be reduced to two numbers, $a_j$ and $b_j$, denoting the test's mathematical and verbal components. If this were correct, then $M_{i,j} \approx m_i a_j + v_i b_j$, and so $M$ would be close to the rank 2 matrix $U V^T$, where the two columns of $U$ are the vectors of students' math/verbal abilities, and the two columns of $V$ are the tests' math/verbal components. Unfortunately, low-rank matrix factorizations are not unique, so even if this model were true, a rank 2 representation recovered from $M$ would not necessarily correspond to it.
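The non-uniqueness is easy to see in code: any invertible change of basis applied to the factors yields the same matrix. The factor sizes and the mixing matrix $R$ below are arbitrary choices for illustration.

```python
import numpy as np

# Non-uniqueness of low-rank matrix factorizations: if M = U V^T, then for any
# invertible 2 x 2 matrix R, M = (U R)(V R^{-T})^T as well, so the recovered
# "ability"/"component" columns need not match the true ones. (Toy factors assumed.)
rng = np.random.default_rng(3)
U = rng.normal(size=(10, 2))         # students' (math, verbal) abilities
V = rng.normal(size=(4, 2))          # tests' (math, verbal) components
M = U @ V.T

R = np.array([[1.0, 1.0], [0.0, 1.0]])            # any invertible mixing matrix
U2, V2 = U @ R, V @ np.linalg.inv(R).T
print(np.allclose(M, U2 @ V2.T))                  # True: same matrix, different factors
```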
Theorem 3.1 Suppose a 3-tensor $A$ has rank $k$, and that there exist three sets of linearly independent vectors, $(u_1,\dots,u_k)$, $(v_1,\dots,v_k)$, $(w_1,\dots,w_k)$, where:

$$A = \sum_{i=1}^{k} u_i \otimes v_i \otimes w_i.$$
Then this rank decomposition is unique (up to reordering the terms and rescaling the vectors within each term), and thus the factors can be efficiently recovered.
Take Spearman's experiment with an extra dimension, where students take the tests in three different settings (classical music playing, distracting music playing, control). Then, let $A$ denote the corresponding 3-tensor, with $A_{i,j,k}$ being the performance of student $i$ on test $j$ in setting $k$.
Suppose the true model of the world is the following: every student has two numbers representing their math and verbal ability, every test has some mix of a math and a verbal component, and each setting scales the math and verbal performance of every student. Thus, $A_{i,j,k}$ can be approximated by the math ability of student $i$ times the math component of test $j$ times the math boost-factor of setting $k$, plus the analogous product for the verbal components. In other words, $A$ is approximately a rank 2 tensor.
Assuming that the vector of student math abilities is not a scalar multiple of the vector of student verbal abilities, the two vectors of math/verbal test components are not scalar multiples of each other, and the two vectors of math/verbal setting boosts are not scalar multiples of each other, this is the unique rank 2 factorization of the tensor, and we can recover the exact factors.
Quick Discussion
As long as the factors used to represent a tensor are linearly independent, they can be recovered uniquely. Thus, tensor decompositions can be used to extract meaningful features from data in an unsupervised setting.
The Algorithm
The algorithm establishing Theorem 3.1 is often called Jennrich's algorithm.
Algorithm 1: Tensor Decomposition
Given an $n \times m \times \ell$ 3-tensor $A = \sum_{i=1}^{k} u_i \otimes v_i \otimes w_i$, with $(u_1,\dots,u_k)$, $(v_1,\dots,v_k)$, and $(w_1,\dots,w_k)$ each being linearly independent, the following algorithm outputs the lists of $u_i$'s, $v_i$'s, and $w_i$'s.
- Choose two random unit vectors $x, y \in \mathbb{R}^{\ell}$.
- Define the matrices $A_x$ and $A_y$ as follows: consider $A$ as a stack of $\ell$ matrices of size $n \times m$; $A_x$ is the weighted sum of these matrices, where the weight for the $i$th matrix is $x(i)$. $A_y$ is analogously defined, with the $i$th matrix being scaled by $y(i)$.
- Compute the eigen-decompositions of $A_x A_y^{+}$ and $(A_x^{+} A_y)^{T}$, where $^{+}$ denotes the pseudoinverse.
- With probability 1 over the choice of $x$ and $y$, the eigenvalues of $A_x A_y^{+}$ are distinct, and they are the inverses of the eigenvalues of $(A_x^{+} A_y)^{T}$. The vectors $u_1,\dots,u_k$ are the eigenvectors of $A_x A_y^{+}$ (up to scaling), and $v_i$ is the eigenvector of $(A_x^{+} A_y)^{T}$ whose eigenvalue is the reciprocal of the eigenvalue of $u_i$.
- Given the $u_i$'s and $v_i$'s, we can solve a linear system to find the $w_i$'s.
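Below is a minimal numpy sketch of the algorithm above, with the final linear-system step written as a least-squares solve. The function name `jennrich`, the pairing-by-reciprocal-eigenvalue bookkeeping, and the random test tensor in the usage snippet are implementation choices for illustration, not prescribed by the notes.

```python
import numpy as np

def jennrich(A, k, seed=0):
    """Recover factors (u_i), (v_i), (w_i) with A = sum_i u_i (x) v_i (x) w_i,
    for an n x m x l 3-tensor A of rank k, following the steps above."""
    n, m, l = A.shape
    rng = np.random.default_rng(seed)

    # Step 1: choose random unit vectors x, y of length l.
    x = rng.normal(size=l); x /= np.linalg.norm(x)
    y = rng.normal(size=l); y /= np.linalg.norm(y)

    # Step 2: A_x, A_y are weighted sums of the l stacked n x m matrices.
    A_x = np.einsum("ijk,k->ij", A, x)
    A_y = np.einsum("ijk,k->ij", A, y)

    # Step 3: eigendecompose A_x A_y^+ and (A_x^+ A_y)^T.
    B = A_x @ np.linalg.pinv(A_y)
    C = (np.linalg.pinv(A_x) @ A_y).T
    eval_B, evec_B = np.linalg.eig(B)
    eval_C, evec_C = np.linalg.eig(C)

    # Step 4: the k largest-magnitude eigenvalues of B correspond to the u_i's;
    # v_i is the eigenvector of C whose eigenvalue is closest to 1 / lambda_i.
    top = np.argsort(-np.abs(eval_B))[:k]
    U = np.real(evec_B[:, top])                              # n x k, columns ~ u_i
    V = np.zeros((m, k))
    for t, lam in enumerate(eval_B[top]):
        match = np.argmin(np.abs(eval_C - 1.0 / lam))
        V[:, t] = np.real(evec_C[:, match])                  # column ~ v_i

    # Step 5: with the u_i's and v_i's fixed, each w_i solves the linear system
    # A[i, j, :] = sum_t U[i, t] V[j, t] w_t, solved here by least squares.
    design = np.stack([np.outer(U[:, t], V[:, t]).ravel() for t in range(k)], axis=1)
    W, *_ = np.linalg.lstsq(design, A.reshape(n * m, l), rcond=None)
    return U, V, W.T                                         # each has k columns


# Usage: build a random rank-3 tensor and check that it is recovered exactly
# (the scalings of u_i and v_i are absorbed into the recovered w_i).
rng = np.random.default_rng(4)
n, m, l, k = 7, 6, 5, 3
U0, V0, W0 = (rng.normal(size=(d, k)) for d in (n, m, l))
A = np.einsum("it,jt,kt->ijk", U0, V0, W0)

U, V, W = jennrich(A, k)
print(np.allclose(A, np.einsum("it,jt,kt->ijk", U, V, W)))   # True
```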