Matrix Factorizations

Singular Value Decomposition

A non-negative real number \(\sigma\) is a singular value of a matrix \(A \in \RR^{m \times n}\) if and only if there exist unit-length vectors \(u \in \RR^m\) and \(v \in \RR^n\) such that \(A v = \sigma u\) and \(A^T u = \sigma v\). The vectors \(u\) and \(v\) are called a left singular vector and a right singular vector for \(\sigma\), respectively.

For every \(A \in \RR^{m \times n}\) with \(k = \min(m, n)\), there exist orthogonal matrices \(U \in \RR^{m \times m}\) and \(V \in \RR^{n \times n}\) and real numbers \(\sigma_1 \geq \dots \geq \sigma_k \geq 0\) such that \(U^T A V = \Sigma\), where \(\Sigma = \text{diag}(\sigma_1, \dots, \sigma_k) \in \RR^{m \times n}\) (all entries off the main diagonal, including any extra rows or columns, are zero). The resulting decomposition \(A = U \Sigma V^T\) is called the singular value decomposition (SVD) of \(A\). The first \(k\) columns of \(U\) and \(V\) are left and right singular vectors of \(A\) corresponding to the singular values \(\sigma_1, \dots, \sigma_k\).

The rank of \(A\) equals the number of non-zero singular values, which is the rank of \(\Sigma\). The eigenvalues of the positive semi-definite matrices \(A^T A\) and \(A A^T\) are \(\sigma_1^2, \dots, \sigma_k^2\) (any remaining eigenvalues are 0); specifically, \(A^T A = V \Sigma^T \Sigma V^T\) and \(A A^T = U \Sigma \Sigma^T U^T\).

We can rewrite \(A = \sum_{i=1}^k \sigma_i u_i v_i^T\). The leading term \(\sigma_1 u_1 v_1^T\) is the best rank-1 approximation of \(A\) in the Frobenius norm; more generally, truncating the sum after \(r\) terms gives the best rank-\(r\) approximation (the Eckart–Young theorem). The \(2\)-norm (spectral norm) of \(A\) is its largest singular value, \(\|A\|_2 = \sigma_1\); note that the spectral radius of a square matrix is bounded above by \(\sigma_1\) and coincides with it only for normal matrices.

The Moore–Penrose pseudo-inverse of \(\Sigma\) is obtained by transposing \(\Sigma\) and inverting its non-zero singular values; then \(A^{\dag} = V \Sigma^{\dag} U^T\). Consequently, the non-zero singular values of \(A^{\dag}\) are the reciprocals of the non-zero singular values of \(A\).

Geometrically, the singular values of \(A\) are precisely the lengths of the semi-axes of the hyper-ellipsoid \(E = \{ A x \mid \| x \|_2 = 1 \}\), i.e. the image of the unit sphere under \(A\). Thus, if \(A\) is a data matrix, the SVD of \(A\) is closely connected to principal component analysis (PCA): the right singular vectors of the centered data matrix are the principal directions, and the squared singular values are proportional to the variances along them. Numerical sketches of these facts follow below.
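As a numerical illustration of the decomposition theorem, here is a minimal sketch using NumPy (the choice of NumPy is an assumption; the text prescribes no particular tool). It verifies \(A = U \Sigma V^T\), the ordering of the singular values, and the relation between the eigenvalues of \(A^T A\) and the \(\sigma_i^2\).

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))  # m = 5, n = 3, so k = min(m, n) = 3

# full_matrices=True returns the square orthogonal U (m x m) and V (n x n).
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Rebuild the rectangular Sigma (m x n); the extra rows are zero.
Sigma = np.zeros((5, 3))
np.fill_diagonal(Sigma, s)

assert np.allclose(U @ Sigma @ Vt, A)              # A = U Sigma V^T
assert np.all(np.diff(s) <= 0) and np.all(s >= 0)  # sigma_1 >= ... >= sigma_k >= 0

# Eigenvalues of A^T A are the squared singular values (in descending order).
eigvals = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
assert np.allclose(eigvals, s**2)
```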
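The best-approximation property of the truncated sum can be checked the same way. The sketch below (same NumPy assumption) forms the rank-1 term \(\sigma_1 u_1 v_1^T\) and confirms the Frobenius error formula \(\|A - A_r\|_F = (\sum_{i > r} \sigma_i^2)^{1/2}\), along with the spectral-norm identity \(\|A\|_2 = \sigma_1\).

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 4))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

def truncate(r):
    """Sum of the first r rank-1 terms sigma_i u_i v_i^T."""
    return (U[:, :r] * s[:r]) @ Vt[:r, :]

A1 = truncate(1)                     # rank-1 approximation sigma_1 u_1 v_1^T
assert np.linalg.matrix_rank(A1) == 1

# Frobenius error of the rank-r truncation: sqrt of the sum of dropped sigma_i^2.
err = np.linalg.norm(A - A1, ord='fro')
assert np.allclose(err, np.sqrt(np.sum(s[1:] ** 2)))

# The spectral norm ||A||_2 equals the largest singular value sigma_1.
assert np.allclose(np.linalg.norm(A, ord=2), s[0])
```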
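The pseudo-inverse formula \(A^{\dag} = V \Sigma^{\dag} U^T\) lends itself to a similar check. The sketch below builds \(\Sigma^{\dag}\) by inverting the non-zero singular values, compares the result against NumPy's built-in `np.linalg.pinv`, and confirms the reciprocal relation between the singular values of \(A\) and \(A^{\dag}\).

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Sigma^+: transpose and invert the non-zero singular values.
s_inv = np.where(s > 1e-12, 1.0 / s, 0.0)
A_pinv = Vt.T @ np.diag(s_inv) @ U.T   # A^+ = V Sigma^+ U^T

assert np.allclose(A_pinv, np.linalg.pinv(A))

# Non-zero singular values of A^+ are reciprocals of those of A.
s_pinv = np.linalg.svd(A_pinv, compute_uv=False)
assert np.allclose(np.sort(s_pinv), np.sort(s_inv))
```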
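Finally, the PCA connection can be made concrete. For a centered data matrix with rows as samples (a layout assumption, since the text does not fix a convention), the eigenvectors of the sample covariance are the right singular vectors, and its eigenvalues are \(\sigma_i^2 / (n_{\text{samples}} - 1)\):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((100, 4)) @ rng.standard_normal((4, 4))  # correlated data
Xc = X - X.mean(axis=0)              # center each feature (column)

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
cov = Xc.T @ Xc / (Xc.shape[0] - 1)  # sample covariance matrix

# Eigenvalues of the covariance are sigma_i^2 / (n_samples - 1);
# its eigenvectors are the right singular vectors (principal directions).
eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]
assert np.allclose(eigvals, s**2 / (Xc.shape[0] - 1))
```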