Matrix Factorization based algorithms

Basic matrix factorization based algorithms were developed for solving the motion segmentation problem in [BB91][Gea98][CK98][Kan01]. These algorithms are primarily algebraic in nature. See here for the motivation from motion segmentation problem.

The following derivation is applicable if the subspaces are linear and independent.

We start with the equation:

\[Y^* = Y \Gamma = \begin{bmatrix} y_1 & \dots & y_S \end{bmatrix} \Gamma = \begin{bmatrix} Y_1 & \dots & Y_K \end{bmatrix}.\]

Under the independence assumption, we have

\[\Rank (Y) = \Rank(Y^*) = \sum_{k=1}^K \Rank(Y_k).\]

Note that each \(Y_k \in \RR^{M \times S_k}\) can be factorized via SVD as

\[Y_k = U_k \Sigma_k V_k^T\]

where \(U_k \in \RR^{M \times D_k}\), \(\Sigma_k = \text{diag}(\sigma_{p 1}, \dots, \sigma_{p D_k}) \in \RR^{D_k \times D_k}\) and \(V_k \in \RR^{S_k \times D_k}\). Columns of \(U_k\) form an orthonormal basis for the subspace \(\UUU_k\). Columns of \(\Sigma_k V_k^T\) give the coordinates of points in \(Y_k\) in the orthonormal basis \(U_k\). Singular values are non-zero since \(Y_k\) spans \(\UUU_k\). Alternatively, \(D_k\) can be obtained by counting the non-zero singular values in the SVD of Y. Denoting:

\[\begin{split}\hat{U} = \begin{bmatrix} U_1 & \dots & U_K \end{bmatrix}\\ \hat{\Sigma} = \begin{bmatrix} \Sigma_1 & \dots & 0 \\ \vdots & \ddots & \vdots\\ 0 & \dots & \Sigma_K \end{bmatrix}\\ \hat{V} = \begin{bmatrix} V_1 & \dots & 0 \\ \vdots & \ddots & \vdots\\ 0 & \dots & V_K \end{bmatrix},\end{split}\]

we can write

\[Y^* = \hat{U} \hat{\Sigma} \hat{V}^T.\]

This is a valid SVD of \(Y^*\) if the subspaces \(\UUU_k\) are independent. This differs from the standard SVD of \(Y^*\) only in the permutation of singular values in \(\Sigma\) as the standard SVD of \(Y^*\) will require them to be ordered in decreasing order. Nevertheless,

\[Y = Y^* \Gamma^{-1} = Y^* \Gamma^T = \hat{U} \hat{\Sigma} \hat{V}^T \Gamma^T = \hat{U} \hat{\Sigma} (\Gamma \hat{V})^T.\]

It is clear that both \(Y\) and \(Y^*\) share the same singular values. Let the SVD of \(Y\) be \(Y = U \Sigma V^T\). Let \(\Sigma = \hat{\Sigma}\hat{\Gamma}\) where \(\hat{\Gamma}\) permutes the singular values in \(\hat{\Sigma}\) in decreasing order. Then \(\hat{\Sigma} = \Sigma \hat{\Gamma}^T\) and

\[Y = \hat{U} \hat{\Sigma} (\Gamma \hat{V})^T = \hat{U} \Sigma \hat{\Gamma}^T (\Gamma \hat{V})^T = \hat{U} \Sigma (\Gamma \hat{V} \hat{\Gamma})^T.\]

Matching terms, we see that \(U = \hat{U}\) and \(V = \Gamma \hat{V} \hat{\Gamma}\). Thus \(\hat{V}\) is obtained by permuting the rows and columns of \(V\) where \(\Gamma\) and \(\hat{\Gamma}\) are unknown permutations.

Let \(W = VV^T\) and \(\hat{W} = \hat{V} \hat{V}^T\). Then

\[W = VV^T = \Gamma \hat{V} \hat{\Gamma} \hat{\Gamma}^T \hat{V}^T \Gamma^T = \Gamma \hat{V} \hat{V}^T \Gamma^T = \Gamma \hat{W} \Gamma^T.\]

Alternatively

\[\hat{W} = \Gamma^T W \Gamma.\]

Thus, \(\hat{W}\) can be obtained by identical row and column permutations of \(W\) given by \(\Gamma\).

The matrix \(W\) is very useful. But first let’s check out \(\hat{W}\). Note that \(\hat{V}\) can be considered as a \(P \times P\) block matrix with diagonal matrix elements. Thus

\[\begin{split}\hat{V} \hat{V}^T = \begin{bmatrix} V_1 & \dots & 0 \\ \vdots & \ddots & \vdots\\ 0 & \dots & V_K \end{bmatrix} \begin{bmatrix} V_1^T & \dots & 0 \\ \vdots & \ddots & \vdots\\ 0 & \dots & V_K^T \end{bmatrix}.\end{split}\]

Simplifying, we obtain

\[\begin{split}\hat{W} = \begin{bmatrix} V_1 V_1^T & \dots & 0 \\ \vdots & \ddots & \vdots\\ 0 & \dots & V_K V_K^T \end{bmatrix}.\end{split}\]

\(V_k V_k^T\) is a \(S_k \times S_k\) non-zero matrix. \(\hat{W}\) is an \(S \times S\) matrix. Clearly, \(\hat{W}_{i j} = 0\) if \(i\)-th and \(j\)-th columns in \(Y^*\) belong to the different subspaces. Since \(W\) is obtained by permuting the rows and columns of \(\hat{W}\) by \(\Gamma\), hence \(W_{ij} = 0\) if \(i\)-th and \(j\)-th columns in the unsorted data matrix \(Y\) come from different subspaces. A simple algorithm for data segmentation is thus obtained which puts the \(i\)-th and \(j\)-th columns in \(Y\) in same cluster if the corresponding entry \(W_{ij}\) is non-zero.