Matrix norms¶
This section reviews various matrix norms on the vector space of complex matrices over the field of complex numbers \((\CC^{m \times n}, \CC)\).
We know \((\CC^{m \times n}, \CC)\) is a finite dimensional vector space with dimension \(m n\). We will usually refer to it as \(\CC^{m \times n}\).
Matrix norms will follow the usual definition of norms for a vector space.
A function \(\| \cdot \| : \CC^{m \times n} \to \RR\) is called a matrix norm on \(\CC^{m \times n}\) if for all \(A, B \in \CC^{m \times n}\) and all \(\alpha \in \CC\) it satisfies the following
[Positivity]
\[\| A \| \geq 0\]with \(\| A \| = 0 \iff A = 0\).
[Homogeneity]
\[\| \alpha A \| = | \alpha | \| A \|.\][Triangle inequality]
\[\| A + B \| \leq \| A \| + \| B \|.\]
We recall some of the standard results on normed vector spaces.
All matrix norms are equivalent. Let \(\| \cdot \|\) and \(\| \cdot \|'\) be two different matrix norms on \(\CC^{m \times n}\). Then there exist two constants \(a\) and \(b\) such that the following holds
A matrix norm is a continuous function \(\| \cdot \| : \CC^{m \times n} \to \RR\).
Norms like \(\ell_p\) on complex vector space¶
Following norms are quite like \(\ell_p\) norms on finite dimensional complex vector space \(\CC^n\). They are developed by the fact that the matrix vector space \(\CC^{m\times n}\) has one to one correspondence with the complex vector space \(\CC^{m n}\).
Let \(A \in \CC^{m\times n}\) and \(A = [a_{ij}]\).
Matrix sum norm is defined as
Let \(A \in \CC^{m\times n}\) and \(A = [a_{ij}]\).
Matrix Frobenius norm is defined as
Let \(A \in \CC^{m\times n}\) and \(A = [a_{ij}]\).
Matrix Max norm is defined as
Properties of Frobenius norm¶
We now prove some elementary properties of Frobenius norm.
The Frobenius norm of a matrix is equal to the Frobenius norm of its Hermitian transpose.
Let
Then
Now
Let \(A \in \CC^{m \times n}\) be written as a row of column vectors
Then
We note that
Now
We thus showed that that the square of the Frobenius norm of a matrix is nothing but the sum of squares of \(\ell_2\) norms of its columns.
Let \(A \in \CC^{m \times n}\) be written as a column of row vectors
Then
We note that
Now
We now consider how the Frobenius norm is affected with the action of unitary matrices.
Let \(A\) be any arbitrary matrix in \(\CC^{m \times n}\). Let \(U\) be some unitary matrices in \(\CC^{m \times m}\). Let \(V\) be some unitary matrices in \(\CC^{n \times n}\).
We present our first result that multiplication with unitary matrices doesn’t change Frobenius norm of a matrix.
The Frobenius norm of a matrix is invariant to pre or post multiplication by a unitary matrix. i.e.
and
We can write \(A\) as
So
Then applying here clearly
But we know that unitary matrices are norm preserving. Hence
Thus
which implies
Similarly writing \(A\) as
we have
Then applying here clearly
But we know that unitary matrices are norm preserving. Hence
Thus
which implies
An alternative approach for the 2nd part of the proof using the first part is just one line
In above we use here and the fact that \(V\) is a unitary matrix implies that \(V^H\) is also a unitary matrix. We have already shown that pre multiplication by a unitary matrix preserves Frobenius norm.
Let \(A \in \CC^{m \times n}\) and \(B \in \CC^{n \times P}\) be two matrices. Then the Frobenius norm of their product is less than or equal to the product of Frobenius norms of the matrices themselves. i.e.
We can write \(A\) as
where \(a_i\) are \(m\) column vectors corresponding to rows of \(A\). Similarly we can write B as
where \(b_i\) are column vectors corresponding to columns of \(B\). Then
Now looking carefully
Applying the Cauchy-Schwartz inequality we have
Now
which implies
by taking square roots on both sides.
Let \(A \in \CC^{m \times n}\) and let \(x \in \CC^n\). Then
We note that Frobenius norm for a column matrix is same as \(\ell_2\) norm for corresponding column vector. i.e.
Now applying here we have
It turns out that Frobenius norm is intimately related to the singular value decomposition of a matrix.
Let \(A \in \CC^{m \times n}\). Let the singular value decomposition of \(A\) be given by
Let the singular value of \(A\) be \(\sigma_1, \dots, \sigma_n\). Then
But
since \(U\) and \(V\) are unitary matrices (see here ).
Now the only non-zero terms in \(\Sigma\) are the singular values. Hence
Consistency of a matrix norm¶
A matrix norm \(\| \cdot \|\) is called consistent on \(\CC^{n \times n}\) if
holds true for all \(A, B \in \CC^{n \times n}\). A matrix norm \(\| \cdot \|\) is called consistent if it is defined on \(\CC^{m \times n}\) for all \(m, n \in \Nat\) and eq (1) holds for all matrices \(A, B\) for which the product \(AB\) is defined.
A consistent matrix norm is also known as a sub-multiplicative norm.
With this definition and results in here we can see that Frobenius norm is consistent.
Subordinate matrix norm¶
A matrix operates on vectors from one space to generate vectors in another space. It is interesting to explore the connection between the norm of a matrix and norms of vectors in the domain and co-domain of a matrix.
Let \(m, n \in \Nat\) be given. Let \(\| \cdot \|_{\alpha}\) be some norm on \(\CC^m\) and \(\| \cdot \|_{\beta}\) be some norm on \(\CC^n\). Let \(\| \cdot \|\) be some norm on matrices in \(\CC^{m \times n}\). We say that \(\| \cdot \|\) is subordinate to the vector norms \(\| \cdot \|_{\alpha}\) and \(\| \cdot \|_{\beta}\) if
for all \(A \in \CC^{m \times n}\) and for all \(x \in \CC^n\). In other words the length of the vector doesn’t increase by the operation of \(A\) beyond a factor given by the norm of the matrix itself.
If \(\| \cdot \|_{\alpha}\) and \(\| \cdot \|_{\beta}\) are same then we say that \(\| \cdot \|\) is subordinate to the vector norm \(\| \cdot \|_{\alpha}\).
We have shown earlier in here that Frobenius norm is subordinate to Euclidean norm.
Operator norm¶
We now consider the maximum factor by which a matrix \(A\) can increase the length of a vector.
Let \(m, n \in \Nat\) be given. Let \(\| \cdot \|_{\alpha}\) be some norm on \(\CC^n\) and \(\| \cdot \|_{\beta}\) be some norm on \(\CC^m\). For \(A \in \CC^{m \times n}\) we define
\(\frac{\| A x \|_{\beta}}{\| x \|_{\alpha}}\) represents the factor with which the length of \(x\) increased by operation of \(A\). We simply pick up the maximum value of such scaling factor.
The norm as defined above is known as :math:`(alpha to beta)` operator norm, the \((\alpha \to \beta)\)-norm, or simply the \(\alpha\)-norm if \(\alpha = \beta\).
Of course we need to verify that this definition satisfies all properties of a norm.
Clearly if \(A= 0\) then \(A x = 0\) always, hence \(\| A \| = 0\).
Conversely, if \(\| A \| = 0\) then \(\| A x \|_{\beta} = 0 \Forall x \in \CC^n\). In particular this is true for the unit vectors \(e_i \in \CC^n\). The \(i\)-th column of \(A\) is given by \(A e_i\) which is 0. Thus each column in \(A\) is 0. Hence \(A = 0\).
Now consider \(c \in \CC\).
We now present some useful observations on operator norm before we can prove triangle inequality for operator norm.
For any \(x \in \Kernel(A)\), \(A x = 0\) hence we only need to consider vectors which don’t belong to the kernel of \(A\).
Thus we can write
We also note that
Thus, it is sufficient to find the maximum on unit norm vectors:
Note that since \(\|x \|_{\alpha} = 1\) hence the term in denominator goes away.
The \((\alpha \to \beta)\)-operator norm is subordinate to vector norms \(\| \cdot \|_{\alpha}\) and \(\| \cdot \|_{\beta}\). i.e.
For \(x = 0\) the inequality is trivially satisfied. Now for \(x \neq 0\) by definition, we have
There exists a vector \(x^* \in \CC^{n}\) with unit norm (\(\| x^* \|_{\alpha} = 1\)) such that
Let \(x' \neq 0\) be some vector which maximizes the expression
Then
Now consider \(x^* = \frac{x'}{\| x' \|_{\alpha}}\). Thus \(\| x^* \|_{\alpha} = 1\). We know that
Hence
We are now ready to prove triangle inequality for operator norm.
Let \(A\) and \(B\) be some matrices in \(\CC^{m \times n}\). Consider the operator norm of matrix \(A+B\). From previous remarks, there exists some vector \(x^* \in \CC^n\) with \(\| x^* \|_{\alpha} = 1\) such that
Now
From another remark we have
and
since \(\| x^* \|_{\alpha} = 1\).
Hence we have
It turns out that operator norm is also consistent under certain conditions.
Let \(\| \cdot \|_{\alpha}\) be defined over all \(m \in \Nat\). Let \(\| \cdot \|_{\beta} = \| \cdot \|_{\alpha}\). Then the operator norm
is consistent.
We need to show that
Now
We note that if \(Bx = 0\), then \(A B x = 0\). Hence we can rewrite as
Now if \(Bx \neq 0\) then \(\| Bx \|_{\alpha} \neq 0\). Hence
and
Clearly
Furthermore
Thus we have
p-norm for matrices¶
We recall the definition of \(\ell_p\) norms for vectors \(x \in \CC^n\) from (2)
The operator norms \(\| \cdot \|_p\) defined from \(\ell_p\) vector norms are of specific interest.
The \(p\)-norm for a matrix \(A \in \CC^{m \times n}\) is defined as
where \(\| x \|_p\) is the standard \(\ell_p\) norm for vectors in \(\CC^m\) and \(\CC^n\).
Special cases are considered for \(p=1,2\) and \(\infty\).
Let \(A \in \CC^{m \times n}\).
For \(p=1\) we have
This is also known as max column sum norm.
For \(p=\infty\) we have
This is also known as max row sum norm.
Finally for \(p=2\) we have
where \(\sigma_1\) is the largest singular value of \(A\). This is also known as spectral norm.
Let
Then
Thus,
which the maximum column sum. We need to show that this upper bound is indeed an equality.
Indeed for any \(x=e_j\) where \(e_j\) is a unit vector with \(1\) in \(j\)-th entry and 0 elsewhere,
Thus
Combining the two, we see that
For \(p=\infty\), we proceed as follows:
where \(\underline{a}^i\) are the rows of \(A\).
This shows that
We need to show that this is indeed an equality.
Fix an \(i = k\) and choose \(x\) such that
Clearly \(\| x \|_{\infty} = 1\).
Then
Thus,
Combining the two inequalities we get:
Remaining case is for \(p=2\).
For any vector \(x\) with \(\| x \|_2 = 1\),
since \(\ell_2\) norm is invariant to unitary transformations.
Let \(v = V^H x\). Then \(\|v\|_2 = \| V^H x \|_2 = \| x \|_2 = 1\).
Now
This shows that
Now consider some vector \(x\) such that \(v = (1, 0, \dots, 0)\). Then
Thus
Combining the two, we get that \(\| A \|_2 = \sigma_1\).
The 2-norm¶
Let \(A\in \CC^{n \times n}\) has singular values \(\sigma_1 \geq \sigma_2 \geq \dots \geq \sigma_n\). Let the eigen values for \(A\) be \(\lambda_1, \lambda_2, \dots, \lambda_n\) with \(|\lambda_1| \geq |\lambda_2| \geq \dots \geq |\lambda_n|\). Then the following hold
and if \(A\) is non-singular
If \(A\) is symmetric and positive definite, then
and if \(A\) is non-singular
If \(A\) is normal then
and if \(A\) is non-singular
Unitary invariant norms¶
We have already seen in here that Frobenius norm is unitary invariant.
It turns out that spectral norm is also unitary invariant.
More properties of operator norms¶
In this section we will focus on operator norms connecting normed linear spaces \((\CC^n, \| \cdot \|_{p})\) and \((\CC^m, \| \cdot \|_{q})\). Typical values of \(p, q\) would be in \(\{1, 2, \infty\}\).
We recall that
The following table (based on [TRO04]) shows how to compute different \((p, q)\) norms. Some can be computed easily while others are NP-hard to compute.
p | q | \(\| A \|_{p \to q}\) | Calculation |
---|---|---|---|
1 | 1 | \(\| A \|_{1 }\) | Maximum \(\ell_1\) norm of a column |
1 | 2 | \(\| A \|_{1 \to 2}\) | Maximum \(\ell_2\) norm of a column |
1 | \(\infty\) | \(\| A \|_{1 \to \infty}\) | Maximum absolute entry of a matrix |
2 | 1 | \(\| A \|_{2 \to 1}\) | NP hard |
2 | 2 | \(\| A \|_{2}\) | Maximum singular value |
2 | \(\infty\) | \(\| A \|_{2 \to \infty}\) | Maximum \(\ell_2\) norm of a row |
\(\infty\) | 1 | \(\| A \|_{\infty \to 1}\) | NP hard |
\(\infty\) | 2 | \(\| A \|_{\infty \to 2}\) | NP hard |
\(\infty\) | \(\infty\) | \(\| A \|_{\infty}\) | Maximum \(\ell_1\)-norm of a row |
The topological dual of the finite dimensional normed linear space \((\CC^n, \| \cdot \|_{p})\) is the normed linear space \((\CC^n, \| \cdot \|_{p'})\) where
\(\ell_2\)-norm is dual of \(\ell_2\)-norm. It is a self dual. \(\ell_1\) norm and \(\ell_{\infty}\)-norm are dual of each other.
When a matrix \(A\) maps from the space \((\CC^n, \| \cdot \|_{p})\) to the space \((\CC^m, \| \cdot \|_{q})\), we can view its conjugate transpose \(A^H\) as a mapping from the space \((\CC^m, \| \cdot \|_{q'})\) to \((\CC^n, \| \cdot \|_{p'})\).
Operator norm of a matrix always equals the operator norm of its conjugate transpose. i.e.
where
Specific applications of this result are:
This is obvious since the maximum singular value of a matrix and its conjugate transpose are same.
This is also obvious since max column sum of \(A\) is same as the max row sum norm of \(A^H\) and vice versa.
We now need to show the result for the general case (arbitrary \(1 \leq p, q \leq \infty\)).
where
Thus,
We need to show that this upper bound is indeed an equality.
Indeed for any \(x=e_j\) where \(e_j\) is a unit vector with \(1\) in \(j\)-th entry and 0 elsewhere,
Thus
Combining the two, we see that
where
For two matrices \(A\) and \(B\) and \(p \geq 1\), we have
We start with
From here, we obtain
Thus,
For two matrices \(A\) and \(B\) and \(p \geq 1\), we have
We start with
From here, we obtain
Thus,
In particular
Choosing \(q = \infty\) and \(s = p\) and applying here
But \(\| I \|_{p \to \infty}\) is the maximum \(\ell_p\) norm of any row of \(I\) which is \(1\). Thus
Consider the expression
\(z \in \ColSpace(A^H), z \neq 0\) means there exists some vector \(u \notin \Kernel(A^H)\) such that \(z = A^H u\).
This expression measures the factor by which the non-singular part of \(A\) can decrease the length of a vector.
The following bound holds for every matrix \(A\):
If \(A\) is surjective (onto), then the equality holds. When \(A\) is bijective (one-one onto, square, invertible), then the result implies
The spaces \(\ColSpace(A^H)\) and \(\ColSpace(A)\) have same dimensions given by \(\Rank(A)\). We recall that \(A^{\dag} A\) is a projector onto the column space of \(A\).
As a result we can write
whenever \(z \in \ColSpace(A^H)\). Now
When \(A\) is surjective, then \(\ColSpace(A) = \CC^m\). Hence
Thus, the inequality changes into equality. Finally
which completes the proof.
Row column norms¶
Let \(A\) be an \(m\times n\) matrix with rows \(\underline{a}^i\) as
Then we define
where \(1 \leq p < \infty\). i.e. we take \(p\)-norms of all row vectors and then find the maximum.
We define
This is equivalent to taking \(\ell_{\infty}\) norm on each row and then taking the maximum of all the norms.
For \(1 \leq p , q < \infty\), we define the norm
i.e., we compute \(p\)-norm of all the row vectors to form another vector and then take \(q\)-norm of that vector.
Note that the norm \(\| A \|_{p, \infty}\) is different from the operator norm \(\| A \|_{p \to \infty}\). Similarly \(\| A \|_{p, q}\) is different from \(\| A \|_{p \to q}\).
where
From here we get
This is exactly the definition of \(\| A \|_{p, \infty}\).
From here
For any two matrices \(A, B\), we have
Let \(q\) be such that \(\frac{1}{p} + \frac{1}{q} = 1\). From here, we have
From here
and
Thus
Relations between \((p, q)\) norms and \((p \to q)\) norms
Block diagonally dominant matrices and generalized Gershgorin disc theorem¶
In [FV+62] the idea of diagonally dominant matrices (see here) has been generalized to block matrices using matrix norms. We consider the specific case with spectral norm.
Let \(A\) be a square matrix in \(\CC^{n \times n}\) which is partitioned in following manner
where each of the submatrices \(A_{i j}\) is a square matrix of size \(m \times m\). Thus \(n = k m\).
\(A\) is called block diagonally dominant if
holds true for all \(1 \leq i \leq n\). If the inequality satisfies strictly for all \(i\), then \(A\) is called block strictly diagonally dominant matrix.
For proof see [FV+62].
This leads to the generalized Gershgorin disc theorem.
Let \(A\) be a square matrix in \(\CC^{n \times n}\) which is partitioned in following manner
where each of the submatrices \(A_{i j}\) is a square matrix of size \(m \times m\). Then each eigenvalue \(\lambda\) of \(A\) satisfies
For proof see [FV+62].
Since the \(2\)-norm of a positive semidefinite matrix is nothing but its largest eigen value, the theorem directly applies.
Let \(A\) be a Hermitian positive semidefinite matrix. Let \(A\) be partitioned as in here. Then its \(2\)-norm \(\| A \|_2\) satisfies