Singular values¶
In the previous section we saw that diagonalization of a square matrix leads to its eigenvalue decomposition. This matrix factorization is very useful, yet it is not applicable in all situations. In particular, the eigenvalue decomposition does not exist if the square matrix is not diagonalizable, and it is not even defined if the matrix is not square. Moreover, the decomposition is particularly convenient only for real symmetric or Hermitian matrices, for which the diagonalizing matrix is \(\FF\)-unitary (see here). Otherwise, one also has to work with the inverse of the diagonalizing matrix.
Fortunately, there is another decomposition which applies to all matrices and involves only \(\FF\)-unitary matrices.
A non-negative real number \(\sigma\) is a singular value for a matrix \(A \in \FF^{m \times n}\) if and only if there exist unit-length vectors \(u \in \FF^m\) and \(v \in \FF^n\) such that

\[A v = \sigma u\]

and

\[A^H u = \sigma v\]

hold. The vectors \(u\) and \(v\) are called left-singular and right-singular vectors for \(\sigma\) respectively.
We first present the basic result of singular value decomposition. We will not prove this result completely although we will present proofs of some aspects.
For every \(A \in \FF^{m \times n}\) with \(k = \min(m , n)\), there exist two \(\FF\)-unitary matrices \(U \in \FF^{m \times m}\) and \(V \in \FF^{n \times n}\) and a sequence of real numbers

\[\sigma_1 \geq \sigma_2 \geq \dots \geq \sigma_k \geq 0\]

such that

\[U^H A V = \Sigma\]

where

\[\Sigma = \Diag(\sigma_1, \sigma_2, \dots, \sigma_k) \in \FF^{m \times n}.\]
The non-negative real numbers \(\sigma_i\) are the singular values of \(A\) as per here.
The sequence of real numbers \(\sigma_i\) doesn’t depend on the particular choice of \(U\) and \(V\).
\(\Sigma\) is rectangular with the same size as \(A\). The singular values of \(A\) lie on the principal diagonal of \(\Sigma\). All other entries in \(\Sigma\) are zero.
It is certainly possible that some of the singular values are 0 themselves.
Since \(U^H A V = \Sigma\), we have

\[A = U \Sigma V^H.\]

The decomposition of a matrix \(A \in \FF^{m \times n}\) given by

\[A = U \Sigma V^H\]

is known as its singular value decomposition.
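As a numerical sanity check (not part of the original text), the decomposition can be computed with NumPy's `numpy.linalg.svd`, which returns \(U\), the singular values as a vector, and \(V^H\):

```python
import numpy as np

# A small real 2 x 3 matrix, so k = min(2, 3) = 2
A = np.array([[3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])

# U is m x m, s holds the singular values (descending), Vh is V^H (n x n)
U, s, Vh = np.linalg.svd(A, full_matrices=True)

# Build the rectangular Sigma of the same size as A
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)

# A = U Sigma V^H and the singular values appear in descending order
assert np.allclose(U @ Sigma @ Vh, A)
assert np.all(s[:-1] >= s[1:])
```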
When \(\FF\) is \(\RR\), the decomposition simplifies to

\[U^T A V = \Sigma\]

and

\[A = U \Sigma V^T.\]
We can also write

\[A V = U \Sigma \quad \text{and} \quad A^H U = V \Sigma^H.\]

Let us expand

\[A = U \Sigma V^H = \sum_{i=1}^{k} \sigma_i u_i v_i^H.\]

Alternatively, let us expand

\[A^H = V \Sigma^H U^H = \sum_{i=1}^{k} \sigma_i v_i u_i^H.\]

This gives us

\[A v_i = \sigma_i u_i \quad \text{and} \quad A^H u_i = \sigma_i v_i \quad \text{for } 1 \leq i \leq k.\]
The following lemma verifies that \(\Sigma\) indeed consists of singular values of \(A\) as per here.
We have

\[A V = U \Sigma.\]

Let us expand the R.H.S.:

\[U \Sigma = \begin{bmatrix} \sigma_1 u_1 & \sigma_2 u_2 & \dots & \sigma_k u_k & 0 & \dots & 0 \end{bmatrix}\]

where the \(0\) columns at the end appear \(n - k\) times. Expanding the L.H.S. we get

\[A V = \begin{bmatrix} A v_1 & A v_2 & \dots & A v_n \end{bmatrix}.\]

Thus by comparing both sides we get

\[A v_i = \sigma_i u_i \quad \text{for } 1 \leq i \leq k\]

and

\[A v_i = 0 \quad \text{for } k < i \leq n.\]

Now let us start with

\[A^H U = V \Sigma^H.\]

Let us expand the R.H.S.:

\[V \Sigma^H = \begin{bmatrix} \sigma_1 v_1 & \sigma_2 v_2 & \dots & \sigma_k v_k & 0 & \dots & 0 \end{bmatrix}\]

where the \(0\) columns appear \(m - k\) times. Expanding the L.H.S. we get

\[A^H U = \begin{bmatrix} A^H u_1 & A^H u_2 & \dots & A^H u_m \end{bmatrix}.\]

Thus by comparing both sides we get

\[A^H u_i = \sigma_i v_i \quad \text{for } 1 \leq i \leq k\]

and

\[A^H u_i = 0 \quad \text{for } k < i \leq m.\]
We now consider the three cases.
For \(m = n\), we have \(k = m = n\). And we get

\[A v_i = \sigma_i u_i, \quad A^H u_i = \sigma_i v_i \quad \text{for } 1 \leq i \leq k.\]

Thus \(\sigma_i\) is a singular value of \(A\); \(u_i\) is a left singular vector and \(v_i\) is a right singular vector.
For \(m < n\), we have \(k = m\). For the first \(m\) vectors in \(V\) we get

\[A v_i = \sigma_i u_i, \quad A^H u_i = \sigma_i v_i \quad \text{for } 1 \leq i \leq m.\]

Finally, for the remaining \(n - m\) vectors in \(V\), we can write

\[A v_i = 0 \quad \text{for } m < i \leq n.\]

They belong to the null space of \(A\).
For \(m > n\), we have \(k = n\). For the first \(n\) vectors in \(U\) we get

\[A v_i = \sigma_i u_i, \quad A^H u_i = \sigma_i v_i \quad \text{for } 1 \leq i \leq n.\]

Finally, for the remaining \(m - n\) vectors in \(U\), we can write

\[A^H u_i = 0 \quad \text{for } n < i \leq m.\]
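These column-wise relations can be checked numerically; the following NumPy sketch (not part of the original text) covers the case \(m < n\):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))       # m = 3 < n = 5, so k = m = 3
U, s, Vh = np.linalg.svd(A)
V = Vh.conj().T
k = min(A.shape)

# First k columns of V and U: A v_i = sigma_i u_i and A^H u_i = sigma_i v_i
for i in range(k):
    assert np.allclose(A @ V[:, i], s[i] * U[:, i])
    assert np.allclose(A.conj().T @ U[:, i], s[i] * V[:, i])

# The remaining n - k columns of V lie in the null space of A
for i in range(k, A.shape[1]):
    assert np.allclose(A @ V[:, i], np.zeros(A.shape[0]))
```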
\(\Sigma \Sigma^H\) is an \(m \times m\) matrix given by

\[\Sigma \Sigma^H = \Diag(\sigma_1^2, \sigma_2^2, \dots, \sigma_k^2, 0, \dots, 0)\]

where the number of \(0\)'s following \(\sigma_k^{2}\) is \(m - k\).
\(\Sigma^H \Sigma\) is an \(n \times n\) matrix given by

\[\Sigma^H \Sigma = \Diag(\sigma_1^2, \sigma_2^2, \dots, \sigma_k^2, 0, \dots, 0)\]

where the number of \(0\)'s following \(\sigma_k^{2}\) is \(n - k\).
Let \(A \in \FF^{m \times n}\) have a singular value decomposition given by

\[A = U \Sigma V^H.\]

Then

\[\Rank(A) = \Rank(\Sigma).\]

In other words, the rank of \(A\) is the number of non-zero singular values of \(A\). Since the singular values are ordered in descending order in \(\Sigma\), the first \(r\) singular values \(\sigma_1, \dots, \sigma_r\) are the non-zero ones.
Let \(r = \Rank(A)\). Then \(\Sigma\) can be split as a block matrix

\[\Sigma = \begin{bmatrix} \Sigma_r & 0 \\ 0 & 0 \end{bmatrix}\]

where \(\Sigma_r = \Diag(\sigma_1, \sigma_2, \dots, \sigma_r)\) is an \(r \times r\) diagonal matrix of the non-zero singular values. All other sub-matrices in \(\Sigma\) are \(0\).
We note that \(A^H A\) is Hermitian. Hence \(A^H A\) is diagonalized by \(V\) and the diagonalization of \(A^H A\) is \(\Sigma^H \Sigma\). Thus the eigenvalues of \(A^H A\) are \(\sigma_1^2, \sigma_2^2, \dots, \sigma_k^{2}, 0, 0, \dots, 0\) with \(n - k\) \(0\)'s after \(\sigma_k^{2}\).
Clearly

\[A^H A = V \Sigma^H \Sigma V^H \implies \left(A^H A\right) V = V \Sigma^H \Sigma,\]

thus the columns of \(V\) are the eigenvectors of \(A^H A\).
We note that \(A A^H\) is Hermitian. Hence \(A A^H\) is diagonalized by \(U\) and the diagonalization of \(A A^H\) is \(\Sigma \Sigma^H\). Thus the eigenvalues of \(A A^H\) are \(\sigma_1^2, \sigma_2^2, \dots, \sigma_k^{2}, 0, 0, \dots, 0\) with \(m - k\) \(0\)'s after \(\sigma_k^{2}\).
Clearly

\[A A^H = U \Sigma \Sigma^H U^H \implies \left(A A^H\right) U = U \Sigma \Sigma^H,\]

thus the columns of \(U\) are the eigenvectors of \(A A^H\).
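A quick NumPy check (not part of the original text) of this connection between the SVD and the eigendecomposition of \(A^H A\):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))
U, s, Vh = np.linalg.svd(A)
V = Vh.T

# Eigenvalues of the Hermitian matrix A^H A, sorted in descending order,
# are the squared singular values of A
evals = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
assert np.allclose(evals, s**2)

# Columns of V are eigenvectors of A^H A with eigenvalues sigma_i^2
for i in range(len(s)):
    assert np.allclose(A.T @ A @ V[:, i], s[i]**2 * V[:, i])
```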
The largest singular value¶
For all \(u \in \FF^n\) the following holds:

\[\| \Sigma u \|_2 \leq \sigma_1 \| u \|_2.\]

Moreover, for all \(u \in \FF^m\) the following holds:

\[\| \Sigma^H u \|_2 \leq \sigma_1 \| u \|_2.\]
Let us expand the term \(\Sigma u\):

\[\Sigma u = \begin{bmatrix} \sigma_1 u_1 & \sigma_2 u_2 & \dots & \sigma_k u_k & 0 & \dots & 0 \end{bmatrix}^T.\]

Now since \(\sigma_1\) is the largest singular value, hence

\[\sigma_i |u_i| \leq \sigma_1 |u_i| \quad \text{for } 1 \leq i \leq k.\]

Thus

\[\| \Sigma u \|_2^2 = \sum_{i=1}^{k} \sigma_i^2 |u_i|^2 \leq \sigma_1^2 \sum_{i=1}^{k} |u_i|^2 \leq \sigma_1^2 \| u \|_2^2\]

or

\[\| \Sigma u \|_2 \leq \sigma_1 \| u \|_2.\]

The result follows.
A simpler representation of \(\Sigma u\) can be given using here. Let \(r = \Rank(A)\). Thus

\[\sigma_i = 0 \quad \text{for } i > r.\]

We split the entries in \(u\) as \(u = \begin{bmatrix} u_1 & \dots & u_r & u_{r+1} & \dots & u_n \end{bmatrix}^T\). Then

\[\Sigma u = \begin{bmatrix} \sigma_1 u_1 & \dots & \sigma_r u_r & 0 & \dots & 0 \end{bmatrix}^T.\]

Thus

\[\| \Sigma u \|_2^2 = \sum_{i=1}^{r} \sigma_i^2 |u_i|^2.\]
The second result can be proven similarly.
Let \(\sigma_1\) be the largest singular value of an \(m \times n\) matrix \(A\). Then

\[\| A u \|_2 \leq \sigma_1 \| u \|_2 \quad \forall u \in \FF^n.\]

Moreover

\[\| A^H u \|_2 \leq \sigma_1 \| u \|_2 \quad \forall u \in \FF^m.\]

We have

\[\| A u \|_2 = \| U \Sigma V^H u \|_2 = \| \Sigma V^H u \|_2\]

since \(U\) is unitary. Now from the previous lemma we have

\[\| \Sigma V^H u \|_2 \leq \sigma_1 \| V^H u \|_2 = \sigma_1 \| u \|_2\]

since \(V^H\) is also unitary. Thus we get the result

\[\| A u \|_2 \leq \sigma_1 \| u \|_2.\]

Similarly

\[\| A^H u \|_2 = \| V \Sigma^H U^H u \|_2 = \| \Sigma^H U^H u \|_2\]

since \(V\) is unitary. Now from the previous lemma we have

\[\| \Sigma^H U^H u \|_2 \leq \sigma_1 \| U^H u \|_2 = \sigma_1 \| u \|_2\]

since \(U^H\) is also unitary. Thus we get the result

\[\| A^H u \|_2 \leq \sigma_1 \| u \|_2.\]
There is a direct connection between the largest singular value and the \(2\)-norm of a matrix (see here).
The largest singular value of \(A\) is nothing but its \(2\)-norm, i.e.

\[\sigma_1 = \| A \|_2.\]
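This identity is easy to verify with NumPy (a check added here, not part of the original text), where `np.linalg.norm(A, 2)` computes the operator \(2\)-norm:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3))
sigma_1 = np.linalg.svd(A, compute_uv=False)[0]

# The operator 2-norm equals the largest singular value
assert np.isclose(np.linalg.norm(A, 2), sigma_1)

# ...and it bounds ||A u||_2 for any u, as in the lemma above
u = rng.standard_normal(3)
assert np.linalg.norm(A @ u) <= sigma_1 * np.linalg.norm(u) + 1e-12
```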
SVD and pseudo inverse¶
Let \(A = U \Sigma V^H\) and let \(r = \Rank (A)\). Let \(\sigma_1, \dots, \sigma_r\) be the \(r\) non-zero singular values of \(A\). Then the Moore-Penrose pseudo-inverse of \(\Sigma\) is an \(n \times m\) matrix \(\Sigma^{\dag}\) given by

\[\Sigma^{\dag} = \begin{bmatrix} \Sigma_r^{-1} & 0 \\ 0 & 0 \end{bmatrix}\]

where \(\Sigma_r = \Diag(\sigma_1, \dots, \sigma_r)\).
Essentially \(\Sigma^{\dag}\) is obtained by transposing \(\Sigma\) and inverting all its non-zero (positive real) values.
The ranks of \(\Sigma\) and its pseudo-inverse \(\Sigma^{\dag}\) are the same, i.e.

\[\Rank(\Sigma) = \Rank(\Sigma^{\dag}).\]
Let \(A\) be an \(m \times n\) matrix and let \(A = U \Sigma V^H\) be its singular value decomposition. Let \(\Sigma^{\dag}\) be the pseudo-inverse of \(\Sigma\) as per here. Then the Moore-Penrose pseudo-inverse of \(A\) is given by

\[A^{\dag} = V \Sigma^{\dag} U^H.\]
As usual we verify the requirements for a Moore-Penrose pseudo-inverse as per here. We note that since \(\Sigma^{\dag}\) is the pseudo-inverse of \(\Sigma\), it already satisfies the necessary criteria.
First requirement:

\[A A^{\dag} A = U \Sigma V^H V \Sigma^{\dag} U^H U \Sigma V^H = U \Sigma \Sigma^{\dag} \Sigma V^H = U \Sigma V^H = A.\]

Second requirement:

\[A^{\dag} A A^{\dag} = V \Sigma^{\dag} U^H U \Sigma V^H V \Sigma^{\dag} U^H = V \Sigma^{\dag} \Sigma \Sigma^{\dag} U^H = V \Sigma^{\dag} U^H = A^{\dag}.\]

We now consider

\[A A^{\dag} = U \Sigma V^H V \Sigma^{\dag} U^H = U \Sigma \Sigma^{\dag} U^H.\]

Thus

\[\left(A A^{\dag}\right)^H = U \left(\Sigma \Sigma^{\dag}\right)^H U^H = U \Sigma \Sigma^{\dag} U^H = A A^{\dag}\]

since \(\Sigma \Sigma^{\dag}\) is Hermitian.
Finally we consider

\[A^{\dag} A = V \Sigma^{\dag} U^H U \Sigma V^H = V \Sigma^{\dag} \Sigma V^H.\]

Thus

\[\left(A^{\dag} A\right)^H = V \left(\Sigma^{\dag} \Sigma\right)^H V^H = V \Sigma^{\dag} \Sigma V^H = A^{\dag} A\]

since \(\Sigma^{\dag} \Sigma\) is also Hermitian.
This completes the proof.
Finally we can connect the singular values of \(A\) with the singular values of its pseudo-inverse.
The ranks of any \(m \times n\) matrix \(A\) and its pseudo-inverse \(A^{\dag}\) are the same, i.e.

\[\Rank(A) = \Rank(A^{\dag}).\]
Let \(A\) be an \(m \times n\) matrix and let \(A^{\dag}\) be its \(n \times m\) pseudo-inverse as per here. Let \(k = \min(m, n)\) denote the number of singular values of \(A\) and let \(r = \Rank(A)\) denote the number of non-zero singular values. Let \(\sigma_1, \dots, \sigma_r\) be the non-zero singular values of \(A\). Then the number of singular values of \(A^{\dag}\) is the same as that of \(A\), and the non-zero singular values of \(A^{\dag}\) are

\[\frac{1}{\sigma_1}, \dots, \frac{1}{\sigma_r}\]

while all other \(k - r\) singular values of \(A^{\dag}\) are zero.
\(k = \min(m, n)\) denotes the number of singular values for both \(A\) and \(A^{\dag}\). Since the ranks of \(A\) and \(A^{\dag}\) are the same, the number of non-zero singular values is the same. Now look at

\[A^{\dag} = V \Sigma^{\dag} U^H\]

where

\[\Sigma^{\dag} = \begin{bmatrix} \Sigma_r^{-1} & 0 \\ 0 & 0 \end{bmatrix}.\]

Clearly \(\Sigma_r^{-1} = \Diag\left(\frac{1}{\sigma_1}, \dots, \frac{1}{\sigma_r}\right)\).
Thus expanding the R.H.S. we can get

\[A^{\dag} = \sum_{i=1}^{r} \frac{1}{\sigma_i} v_i u_i^H\]

where \(v_i\) and \(u_i\) are the first \(r\) columns of \(V\) and \(U\) respectively. If we reverse the order of the first \(r\) columns of \(U\) and \(V\) and reverse the first \(r\) diagonal entries of \(\Sigma^{\dag}\), the R.H.S. remains the same while \(A^{\dag}\) is expressed in the standard singular value decomposition form. Thus \(\frac{1}{\sigma_1}, \dots, \frac{1}{\sigma_r}\) are indeed the non-zero singular values of \(A^{\dag}\).
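A NumPy sketch of this construction (added for illustration; the tolerance `1e-10` for detecting the numerical rank is an assumption), compared against `np.linalg.pinv`:

```python
import numpy as np

rng = np.random.default_rng(3)
# A 4 x 3 matrix of rank 2, so one singular value is (numerically) zero
A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 3))
U, s, Vh = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))            # numerical rank

# Sigma^dagger: transpose Sigma and invert its non-zero diagonal entries
Sigma_pinv = np.zeros((A.shape[1], A.shape[0]))
Sigma_pinv[:r, :r] = np.diag(1.0 / s[:r])

# A^dagger = V Sigma^dagger U^H matches NumPy's pseudo-inverse
A_pinv = Vh.conj().T @ Sigma_pinv @ U.conj().T
assert np.allclose(A_pinv, np.linalg.pinv(A))

# Non-zero singular values of A^dagger are 1/sigma_1, ..., 1/sigma_r
s_pinv = np.linalg.svd(A_pinv, compute_uv=False)
assert np.allclose(np.sort(s_pinv[:r]), np.sort(1.0 / s[:r]))
```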
Full column rank matrices¶
In this subsection we consider some specific results related to singular value decomposition of a full column rank matrix.
We will consider \(A\) to be an \(m \times n\) matrix in \(\FF^{m \times n}\) with \(m \geq n\) and \(\Rank(A) = n\). Let \(A = U \Sigma V^H\) be its singular value decomposition. From here we observe that there are \(n\) non-zero singular values of \(A\). We will call these singular values \(\sigma_1, \sigma_2, \dots, \sigma_n\). We define

\[\Sigma_n = \Diag(\sigma_1, \sigma_2, \dots, \sigma_n).\]
Clearly \(\Sigma\) is a \(2 \times 1\) block matrix given by

\[\Sigma = \begin{bmatrix} \Sigma_n \\ 0 \end{bmatrix}\]

where the lower \(0\) is an \((m - n) \times n\) zero matrix. From here we obtain that \(\Sigma^H \Sigma\) is an \(n \times n\) matrix given by

\[\Sigma^H \Sigma = \Sigma_n^H \Sigma_n = \Sigma_n^2\]

where

\[\Sigma_n^2 = \Diag(\sigma_1^2, \sigma_2^2, \dots, \sigma_n^2).\]

Since all singular values are non-zero, \(\Sigma_n^2\) is invertible. Thus

\[\left(\Sigma^H \Sigma\right)^{-1} = \Sigma_n^{-2} = \Diag\left(\frac{1}{\sigma_1^2}, \frac{1}{\sigma_2^2}, \dots, \frac{1}{\sigma_n^2}\right).\]
Let \(A\) be a full column rank matrix with singular value decomposition \(A = U \Sigma V^H\). Let \(\sigma_1\) be its largest singular value and \(\sigma_n\) be its smallest singular value. Then

\[\sigma_n \| x \|_2 \leq \| A x \|_2 \leq \sigma_1 \| x \|_2 \quad \forall x \in \FF^n.\]
Let \(x \in \FF^n\). We have

\[\| A x \|_2^2 = \| U \Sigma V^H x \|_2^2 = \| \Sigma V^H x \|_2^2 = \sum_{i=1}^{n} \sigma_i^2 |(V^H x)_i|^2\]

since \(U\) is unitary. Now since

\[\sigma_n \leq \sigma_i \leq \sigma_1 \quad \text{for } 1 \leq i \leq n,\]

hence

\[\sigma_n^2 \sum_{i=1}^{n} |(V^H x)_i|^2 \leq \| A x \|_2^2 \leq \sigma_1^2 \sum_{i=1}^{n} |(V^H x)_i|^2,\]

thus, using \(\| V^H x \|_2 = \| x \|_2\) (as \(V\) is unitary),

\[\sigma_n^2 \| x \|_2^2 \leq \| A x \|_2^2 \leq \sigma_1^2 \| x \|_2^2.\]

Applying square roots, we get

\[\sigma_n \| x \|_2 \leq \| A x \|_2 \leq \sigma_1 \| x \|_2.\]
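These bounds can be checked numerically for a random full column rank matrix (an illustrative check, not part of the original text):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((6, 3))       # full column rank with probability 1
s = np.linalg.svd(A, compute_uv=False)
sigma_1, sigma_n = s[0], s[-1]

# sigma_n ||x||_2 <= ||A x||_2 <= sigma_1 ||x||_2 for every x
for _ in range(100):
    x = rng.standard_normal(3)
    nx, nAx = np.linalg.norm(x), np.linalg.norm(A @ x)
    assert sigma_n * nx <= nAx + 1e-12
    assert nAx <= sigma_1 * nx + 1e-12
```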
We recall from here that the Gram matrix of the columns of \(A\), \(G = A^H A\), is full rank and invertible.
Let \(A\) be a full column rank matrix with singular value decomposition \(A = U \Sigma V^H\). Let \(\sigma_1\) be its largest singular value and \(\sigma_n\) be its smallest singular value. Then

\[\sigma_n^2 \| x \|_2^2 \leq x^H A^H A x \leq \sigma_1^2 \| x \|_2^2 \quad \forall x \in \FF^n.\]
Let \(x \in \FF^n\). Let

\[y = A x.\]

Then from the previous lemma we have

\[\sigma_n \| x \|_2 \leq \| y \|_2 \leq \sigma_1 \| x \|_2.\]

Finally

\[\sigma_n^2 \| x \|_2^2 \leq \| y \|_2^2 \leq \sigma_1^2 \| x \|_2^2.\]

Thus

\[\| y \|_2^2 = y^H y = x^H A^H A x.\]

Substituting we get

\[\sigma_n^2 \| x \|_2^2 \leq x^H A^H A x \leq \sigma_1^2 \| x \|_2^2.\]
There are bounds for the inverse of the Gram matrix also. First let us obtain an expression for the inverse.
Let \(A\) be a full column rank matrix with singular value decomposition \(A = U \Sigma V^H\). Let the singular values of \(A\) be \(\sigma_1, \dots, \sigma_n\). Let the Gram matrix of the columns of \(A\) be \(G = A^H A\). Then

\[G^{-1} = V \Sigma_n^{-2} V^H\]

where

\[\Sigma_n^{-2} = \Diag\left(\frac{1}{\sigma_1^2}, \dots, \frac{1}{\sigma_n^2}\right).\]

We have

\[G = A^H A = V \Sigma^H U^H U \Sigma V^H = V \Sigma^H \Sigma V^H = V \Sigma_n^2 V^H.\]

Thus

\[G^{-1} = V \left(\Sigma_n^2\right)^{-1} V^H = V \Sigma_n^{-2} V^H.\]

From here we have

\[\Sigma_n^{-2} = \Diag\left(\frac{1}{\sigma_1^2}, \dots, \frac{1}{\sigma_n^2}\right).\]

This completes the proof.
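A short NumPy verification of this expression for the inverse of the Gram matrix (an added illustration, not part of the original text):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((5, 3))       # full column rank with probability 1
U, s, Vh = np.linalg.svd(A)
V = Vh.T

# G^{-1} = V Sigma_n^{-2} V^H with reciprocal squared singular values
G = A.T @ A
G_inv = V @ np.diag(1.0 / s**2) @ V.T
assert np.allclose(G_inv, np.linalg.inv(G))
```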
We can now state the bounds:
Let \(A\) be a full column rank matrix with singular value decomposition \(A = U \Sigma V^H\). Let \(\sigma_1\) be its largest singular value and \(\sigma_n\) be its smallest singular value. Then

\[\frac{1}{\sigma_1^2} \| x \|_2^2 \leq x^H G^{-1} x \leq \frac{1}{\sigma_n^2} \| x \|_2^2 \quad \forall x \in \FF^n.\]
From here we have

\[G^{-1} = V \Sigma_n^{-2} V^H\]

where

\[\Sigma_n^{-2} = \Diag\left(\frac{1}{\sigma_1^2}, \dots, \frac{1}{\sigma_n^2}\right).\]

Let \(x \in \FF^n\). Let

\[y = V^H x.\]

Then

\[x^H G^{-1} x = y^H \Sigma_n^{-2} y = \sum_{i=1}^{n} \frac{|y_i|^2}{\sigma_i^2}.\]

Thus

\[\frac{1}{\sigma_1^2} \sum_{i=1}^{n} |y_i|^2 \leq x^H G^{-1} x \leq \frac{1}{\sigma_n^2} \sum_{i=1}^{n} |y_i|^2.\]

Finally, since \(V\) is unitary,

\[\| y \|_2 = \| x \|_2.\]

Substituting we get the result.
Low rank approximation of a matrix¶
An \(m \times n\) matrix \(A\) is called low rank if

\[\Rank(A) \ll \min(m, n).\]
The following is a simple procedure for making a low rank approximation of a given matrix \(A\).
- Perform the singular value decomposition of \(A\) given by \(A = U \Sigma V^H\).
- Identify the singular values of \(A\) in \(\Sigma\).
- Keep the first \(r\) singular values (where \(r \ll \min(m, n)\) is the rank of the approximation) and set all other singular values to 0 to obtain \(\widehat{\Sigma}\).
- Compute \(\widehat{A} = U \widehat{\Sigma} V^H\).
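The steps above can be sketched in NumPy as follows (the function name `low_rank_approx` is illustrative). By the Eckart-Young theorem, this truncation is in fact the best rank-\(r\) approximation in the \(2\)-norm, with error \(\sigma_{r+1}\):

```python
import numpy as np

def low_rank_approx(A, r):
    """Rank-r approximation of A obtained by truncating its SVD."""
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    s_hat = s.copy()
    s_hat[r:] = 0.0                   # keep only the first r singular values
    return U @ np.diag(s_hat) @ Vh

rng = np.random.default_rng(6)
A = rng.standard_normal((8, 6))
A_hat = low_rank_approx(A, 2)
s = np.linalg.svd(A, compute_uv=False)

assert np.linalg.matrix_rank(A_hat) == 2
# The 2-norm error of the truncation equals sigma_{r+1}
assert np.isclose(np.linalg.norm(A - A_hat, 2), s[2])
```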