Matrix norms¶

This section reviews various matrix norms on the vector space of complex matrices over the field of complex numbers \((\CC^{m \times n}, \CC)\).

We know \((\CC^{m \times n}, \CC)\) is a finite dimensional vector space with dimension \(m n\). We will usually refer to it as \(\CC^{m \times n}\).

Matrix norms will follow the usual definition of norms for a vector space.

Definition

A function \(\| \cdot \| : \CC^{m \times n} \to \RR\) is called a matrix norm on \(\CC^{m \times n}\) if for all \(A, B \in \CC^{m \times n}\) and all \(\alpha \in \CC\) it satisfies the following

[Positivity]

\[\| A \| \geq 0\]

with \(\| A \| = 0 \iff A = 0\).
[Homogeneity]

\[\| \alpha A \| = | \alpha | \| A \|.\]
[Triangle inequality]

\[\| A + B \| \leq \| A \| + \| B \|.\]

We recall some of the standard results on normed vector spaces.

All matrix norms are equivalent. Let \(\| \cdot \|\) and \(\| \cdot \|'\) be two different matrix norms on \(\CC^{m \times n}\). Then there exist two constants \(a\) and \(b\) such that the following holds

\[a \| A \| \leq \| A \|' \leq b \|A \| \Forall A \in \CC^{m \times n}.\]

A matrix norm is a continuous function \(\| \cdot \| : \CC^{m \times n} \to \RR\).

Norms like \(\ell_p\) on complex vector space¶

Following norms are quite like \(\ell_p\) norms on finite dimensional complex vector space \(\CC^n\). They are developed by the fact that the matrix vector space \(\CC^{m\times n}\) has one to one correspondence with the complex vector space \(\CC^{m n}\).

Definition

Let \(A \in \CC^{m\times n}\) and \(A = [a_{ij}]\).

Matrix sum norm is defined as

\[\| A \|_S = \sum_{i=1}^{m} \sum_{j=1}^n | a_{ij} |\]

Definition

Let \(A \in \CC^{m\times n}\) and \(A = [a_{ij}]\).

Matrix Frobenius norm is defined as

\[\| A \|_F = \left ( \sum_{i=1}^{m} \sum_{j=1}^n | a_{ij} |^2 \right )^{\frac{1}{2}}.\]

Definition

Let \(A \in \CC^{m\times n}\) and \(A = [a_{ij}]\).

Matrix Max norm is defined as

\[\begin{split}\| A \|_M = \underset{\substack{ 1 \leq i \leq m \\ 1 \leq j \leq n}}{\max} | a_{ij} |.\end{split}\]

Properties of Frobenius norm¶

We now prove some elementary properties of Frobenius norm.

Lemma

The Frobenius norm of a matrix is equal to the Frobenius norm of its Hermitian transpose.

\[\| A^H \|_F = \| A \|_F.\]

Proof

Let

\[A = [a_{ij}].\]

Then

\[A^H = [\overline{a_{j i}}]\]

\[\begin{split}\| A^H \|_F^2 = \left ( \sum_{j=1}^n \sum_{i=1}^{m} | \overline{a_{ij}} |^2 \right ) = \left ( \sum_{i=1}^{m} \\ \sum_{j=1}^n | a_{ij} |^2 \right ) = \| A \|_F^2.\end{split}\]

Now

\[\| A^H \|_F^2 = \| A \|_F^2 \implies \| A^H \|_F = \| A \|_F\]

Lemma

Let \(A \in \CC^{m \times n}\) be written as a row of column vectors

\[A = \begin{bmatrix} a_1 & \dots & a_n \end{bmatrix}.\]

Then

\[\| A \|_F^2 = \sum_{j=1}^{n} \| a_j \|_2^2.\]

Proof

We note that

\[\| a_j \|_2^2 = \sum_{i=1}^m \| a_{i j} \|_2^2.\]

Now

\[\| A \|_F^2 = \left ( \sum_{i=1}^{m} \sum_{j=1}^n | a_{ij} |^2 \right ) = \left ( \sum_{j=1}^n \left ( \sum_{i=1}^{m} | a_{ij} |^2 \right ) \right ) = \left (\sum_{j=1}^n \| a_j \|_2^2 \right).\]

We thus showed that that the square of the Frobenius norm of a matrix is nothing but the sum of squares of \(\ell_2\) norms of its columns.

Lemma

Let \(A \in \CC^{m \times n}\) be written as a column of row vectors

\[\begin{split}A = \begin{bmatrix} \underline{a}^1 \\ \vdots \\ \underline{a}^m \end{bmatrix}.\end{split}\]

Then

\[\| A \|_F^2 = \sum_{i=1}^{m} \| \underline{a}^i \|_2^2.\]

Proof

We note that

\[\| \underline{a}^i \|_2^2 = \sum_{j=1}^n \| a_{i j} \|_2^2.\]

Now

\[\| A \|_F^2 = \left ( \sum_{i=1}^{m} \sum_{j=1}^n | a_{ij} |^2 \right ) = \sum_{i=1}^{m} \| \underline{a}^i \|_2^2.\]

We now consider how the Frobenius norm is affected with the action of unitary matrices.

Let \(A\) be any arbitrary matrix in \(\CC^{m \times n}\). Let \(U\) be some unitary matrices in \(\CC^{m \times m}\). Let \(V\) be some unitary matrices in \(\CC^{n \times n}\).

We present our first result that multiplication with unitary matrices doesn’t change Frobenius norm of a matrix.

Theorem

The Frobenius norm of a matrix is invariant to pre or post multiplication by a unitary matrix. i.e.

\[\| UA \|_F = \| A \|_F\]

and

\[\| AV \|_F = \| A \|_F.\]

Proof

We can write \(A\) as

\[A = \begin{bmatrix} a_1 & \dots & a_n \end{bmatrix}.\]

So

\[UA = \begin{bmatrix} Ua_1 & \dots & Ua_n \end{bmatrix}.\]

Then applying here clearly

\[\| UA \|_F^2 = \sum_{j=1}^{n} \|U a_j \|_2^2.\]

But we know that unitary matrices are norm preserving. Hence

\[\|U a_j \|_2^2 = \|a_j \|_2^2.\]

Thus

\[\| UA \|_F^2 = \sum_{j=1}^{n} \|a_j \|_2^2 = \| A \|_F^2\]

which implies

\[\| UA \|_F = \| A \|_F.\]

Similarly writing \(A\) as

\[\begin{split}A = \begin{bmatrix} r_1 \\ \vdots \\ r_m \end{bmatrix}.\end{split}\]

we have

\[\begin{split}AV = \begin{bmatrix} r_1 V\\ \vdots \\ r_m V \end{bmatrix}.\end{split}\]

Then applying here clearly

\[\| AV \|_F^2 = \sum_{i=1}^{m} \| r_i V \|_2^2.\]

But we know that unitary matrices are norm preserving. Hence

\[\|r_i V \|_2^2 = \|r_i \|_2^2.\]

Thus

\[\| AV \|_F^2 = \sum_{i=1}^{m} \| r_i \|_2^2 = \| A \|_F^2\]

which implies

\[\| AV \|_F = \| A \|_F.\]

An alternative approach for the 2nd part of the proof using the first part is just one line

\[\| AV \|_F = \| (AV)^H \|_F = \| V^H A^H \|_F = \| A^H \|_F = \| A \|_F.\]

In above we use here and the fact that \(V\) is a unitary matrix implies that \(V^H\) is also a unitary matrix. We have already shown that pre multiplication by a unitary matrix preserves Frobenius norm.

Theorem

Let \(A \in \CC^{m \times n}\) and \(B \in \CC^{n \times P}\) be two matrices. Then the Frobenius norm of their product is less than or equal to the product of Frobenius norms of the matrices themselves. i.e.

\[\| AB \|_F \leq \|A \|_F \| B \|_F.\]

Proof

We can write \(A\) as

\[\begin{split}A = \begin{bmatrix} a_1^T \\ \vdots \\ a_m^T \end{bmatrix}\end{split}\]

where \(a_i\) are \(m\) column vectors corresponding to rows of \(A\). Similarly we can write B as

\[B = \begin{bmatrix} b_1 & \dots & b_P \end{bmatrix}\]

where \(b_i\) are column vectors corresponding to columns of \(B\). Then

\[\begin{split}A B = \begin{bmatrix} a_1^T \\ \vdots \\ a_m^T \end{bmatrix} \begin{bmatrix} b_1 & \dots & b_P \end{bmatrix} = \begin{bmatrix} a_1^T b_1 & \dots & a_1^T b_P\\ \vdots & \ddots & \vdots \\ a_m^T b_1 & \dots & a_m^T b_P \end{bmatrix} = \begin{bmatrix} a_i^T b_j \end{bmatrix} .\end{split}\]

Now looking carefully

\[a_i^T b_j = \langle a_i, \overline{b_j} \rangle\]

Applying the Cauchy-Schwartz inequality we have

\[| \langle a_i, \overline{b_j} \rangle |^2 \leq \| a_i \|_2^2 \| \overline{b_j} \|_2^2 = \| a_i \|_2^2 \| b_j \|_2^2\]

Now

\[\begin{split}\| A B \|_F^2 &= \sum_{i=1}^{m} \sum_{j=1}^{P} | a_i^T b_j |^2\\ &\leq \sum_{i=1}^{m} \sum_{j=1}^{P} \| a_i \|_2^2 \| b_j \|_2^2\\ &= \left ( \sum_{i=1}^{m} \| a_i \|_2^2 \right ) \left ( \sum_{j=1}^{P} \| b_j \|_2^2\right )\\ &= \| A \|_F^2 \| B \|_F^2\end{split}\]

which implies

\[\| A B \|_F \leq \| A \|_F \| B \|_F\]

by taking square roots on both sides.

Corollary

Let \(A \in \CC^{m \times n}\) and let \(x \in \CC^n\). Then

\[\| A x \|_2 \leq \| A \|_F \| x \|_2.\]

Proof

We note that Frobenius norm for a column matrix is same as \(\ell_2\) norm for corresponding column vector. i.e.

\[\| x \|_F = \| x \|_2 \Forall x \in \CC^n.\]

Now applying here we have

\[\| A x \|_2 = \| A x \|_F \leq \| A \|_F \| x \|_F = \| A \|_F \| x \|_2 \Forall x \in \CC^n.\]

It turns out that Frobenius norm is intimately related to the singular value decomposition of a matrix.

Lemma

Let \(A \in \CC^{m \times n}\). Let the singular value decomposition of \(A\) be given by

\[A = U \Sigma V^H.\]

Let the singular value of \(A\) be \(\sigma_1, \dots, \sigma_n\). Then

\[\| A \|_F = \sqrt {\sum_{i=1}^n \sigma_i^2}.\]

Proof

\[A = U \Sigma V^H \implies \|A \|_F = \| U \Sigma V^H \|_F.\]

But

\[\| U \Sigma V^H \|_F = \| \Sigma V^H \|_F = \| \Sigma \|_F\]

since \(U\) and \(V\) are unitary matrices (see here ).

Now the only non-zero terms in \(\Sigma\) are the singular values. Hence

\[\| A \|_F = \| \Sigma \|_F = \sqrt {\sum_{i=1}^n \sigma_i^2}.\]

Consistency of a matrix norm¶

Definition

A matrix norm \(\| \cdot \|\) is called consistent on \(\CC^{n \times n}\) if

(1)¶\[\| A B \| \leq \| A \| \| B \|\]

holds true for all \(A, B \in \CC^{n \times n}\). A matrix norm \(\| \cdot \|\) is called consistent if it is defined on \(\CC^{m \times n}\) for all \(m, n \in \Nat\) and eq (1) holds for all matrices \(A, B\) for which the product \(AB\) is defined.

A consistent matrix norm is also known as a sub-multiplicative norm.

With this definition and results in here we can see that Frobenius norm is consistent.

Subordinate matrix norm¶

A matrix operates on vectors from one space to generate vectors in another space. It is interesting to explore the connection between the norm of a matrix and norms of vectors in the domain and co-domain of a matrix.

Definition

Let \(m, n \in \Nat\) be given. Let \(\| \cdot \|_{\alpha}\) be some norm on \(\CC^m\) and \(\| \cdot \|_{\beta}\) be some norm on \(\CC^n\). Let \(\| \cdot \|\) be some norm on matrices in \(\CC^{m \times n}\). We say that \(\| \cdot \|\) is subordinate to the vector norms \(\| \cdot \|_{\alpha}\) and \(\| \cdot \|_{\beta}\) if

\[\| A x \|_{\alpha} \leq \| A \| \| x \|_{\beta}\]

for all \(A \in \CC^{m \times n}\) and for all \(x \in \CC^n\). In other words the length of the vector doesn’t increase by the operation of \(A\) beyond a factor given by the norm of the matrix itself.

If \(\| \cdot \|_{\alpha}\) and \(\| \cdot \|_{\beta}\) are same then we say that \(\| \cdot \|\) is subordinate to the vector norm \(\| \cdot \|_{\alpha}\).

We have shown earlier in here that Frobenius norm is subordinate to Euclidean norm.

Operator norm¶

We now consider the maximum factor by which a matrix \(A\) can increase the length of a vector.

Definition

Let \(m, n \in \Nat\) be given. Let \(\| \cdot \|_{\alpha}\) be some norm on \(\CC^n\) and \(\| \cdot \|_{\beta}\) be some norm on \(\CC^m\). For \(A \in \CC^{m \times n}\) we define

\[\| A \| \triangleq \| A \|_{\alpha \to \beta} \triangleq \underset{x \neq 0}{\max } \frac{\| A x \|_{\beta}}{\| x \|_{\alpha}}.\]

\(\frac{\| A x \|_{\beta}}{\| x \|_{\alpha}}\) represents the factor with which the length of \(x\) increased by operation of \(A\). We simply pick up the maximum value of such scaling factor.

The norm as defined above is known as :math:`(alpha to beta)` operator norm, the \((\alpha \to \beta)\)-norm, or simply the \(\alpha\)-norm if \(\alpha = \beta\).

Of course we need to verify that this definition satisfies all properties of a norm.

Clearly if \(A= 0\) then \(A x = 0\) always, hence \(\| A \| = 0\).

Conversely, if \(\| A \| = 0\) then \(\| A x \|_{\beta} = 0 \Forall x \in \CC^n\). In particular this is true for the unit vectors \(e_i \in \CC^n\). The \(i\)-th column of \(A\) is given by \(A e_i\) which is 0. Thus each column in \(A\) is 0. Hence \(A = 0\).

Now consider \(c \in \CC\).

\[\| c A \| = \underset{x \neq 0}{\max } \frac{\| c A x \|_{\beta}}{\| x \|_{\alpha}} = | c | \underset{x \neq 0}{\max } \frac{\| A x \|_{\beta}}{\| x \|_{\alpha}} = | c | \|A \|.\]

We now present some useful observations on operator norm before we can prove triangle inequality for operator norm.

For any \(x \in \Kernel(A)\), \(A x = 0\) hence we only need to consider vectors which don’t belong to the kernel of \(A\).

Thus we can write

\[\| A \|_{\alpha \to \beta} = \underset{x \notin \Kernel(A)} {\max } \frac{\| A x \|_{\beta}}{\| x \|_{\alpha}}.\]

We also note that

\[\frac{\| A c x \|_{\beta}}{\| c x \|_{\alpha}} = \frac{| c | \| A x \|_{\beta}}{ | c | \| x \|_{\alpha}} = \frac{\| A x \|_{\beta}}{\| x \|_{\alpha}} \Forall c \neq 0, x \neq 0.\]

Thus, it is sufficient to find the maximum on unit norm vectors:

\[\| A \|_{\alpha \to \beta} = \underset{\| x \|_{\alpha} = 1} {\max } \| A x \|_{\beta}.\]

Note that since \(\|x \|_{\alpha} = 1\) hence the term in denominator goes away.

Lemma

The \((\alpha \to \beta)\)-operator norm is subordinate to vector norms \(\| \cdot \|_{\alpha}\) and \(\| \cdot \|_{\beta}\). i.e.

\[\| A x \|_{\beta} \leq \| A \|_{\alpha \to \beta } \| x \|_{\alpha}.\]

Proof

For \(x = 0\) the inequality is trivially satisfied. Now for \(x \neq 0\) by definition, we have

\[\| A \|_{\alpha \to \beta } \geq \frac{\| A x \|_{\beta}}{\| x \|_{\alpha}} \implies \| A \|_{\alpha \to \beta } \| x \|_{\alpha} \geq \| A x \|_{\beta}.\]

Remark

There exists a vector \(x^* \in \CC^{n}\) with unit norm (\(\| x^* \|_{\alpha} = 1\)) such that

\[\| A \|_{\alpha \to \beta} = \| A x^* \|_{\beta}.\]

Proof

Let \(x' \neq 0\) be some vector which maximizes the expression

\[\frac{\| A x \|_{\beta}}{\| x \|_{\alpha}}.\]

Then

\[\| A\|_{\alpha \to \beta} = \frac{\| A x' \|_{\beta}}{\| x' \|_{\alpha}}.\]

Now consider \(x^* = \frac{x'}{\| x' \|_{\alpha}}\). Thus \(\| x^* \|_{\alpha} = 1\). We know that

\[\frac{\| A x' \|_{\beta}}{\| x' \|_{\alpha}} = \| A x^* \|_{\beta}.\]

Hence

\[\| A\|_{\alpha \to \beta} = \| A x^* \|_{\beta}.\]

We are now ready to prove triangle inequality for operator norm.

Lemma

Operator norm as defined in here satisfies triangle inequality.

Proof

Let \(A\) and \(B\) be some matrices in \(\CC^{m \times n}\). Consider the operator norm of matrix \(A+B\). From previous remarks, there exists some vector \(x^* \in \CC^n\) with \(\| x^* \|_{\alpha} = 1\) such that

\[\| A + B \| = \| (A+B) x^* \|_{\beta}.\]

Now

\[\| (A+B) x^* \|_{\beta} = \| Ax^* + B x^* \|_{\beta} \leq \| Ax^*\|_{\beta} + \| Bx^*\|_{\beta}.\]

From another remark we have

\[\| Ax^*\|_{\beta} \leq \| A \| \|x^*\|_{\alpha} = \|A \|\]

and

\[\| Bx^*\|_{\beta} \leq \| B \| \|x^*\|_{\alpha} = \|B \|\]

since \(\| x^* \|_{\alpha} = 1\).

Hence we have

\[\| A + B \| \leq \| A \| + \| B \|.\]

It turns out that operator norm is also consistent under certain conditions.

Lemma

Let \(\| \cdot \|_{\alpha}\) be defined over all \(m \in \Nat\). Let \(\| \cdot \|_{\beta} = \| \cdot \|_{\alpha}\). Then the operator norm

\[\| A \|_{\alpha} = \underset{x \neq 0}{\max } \frac{\| A x \|_{\alpha}}{\| x \|_{\alpha}}\]

is consistent.

Proof

We need to show that

\[\| A B \|_{\alpha} \leq \| A \|_{\alpha} \| B \|_{\alpha}.\]

Now

\[\| A B \|_{\alpha} = \underset{x \neq 0}{\max } \frac{\| A B x \|_{\alpha}}{\| x \|_{\alpha}}.\]

We note that if \(Bx = 0\), then \(A B x = 0\). Hence we can rewrite as

\[\| A B \|_{\alpha} = \underset{Bx \neq 0}{\max } \frac{\| A B x \|_{\alpha}}{\| x \|_{\alpha}}.\]

Now if \(Bx \neq 0\) then \(\| Bx \|_{\alpha} \neq 0\). Hence

\[\frac{\| A B x \|_{\alpha}}{\| x \|_{\alpha}} = \frac{\| A B x \|_{\alpha}}{\|B x \|_{\alpha}} \frac{\| B x \|_{\alpha}}{\| x \|_{\alpha}}\]

and

\[\underset{Bx \neq 0}{\max } \frac{\| A B x \|_{\alpha}}{\| x \|_{\alpha}} \leq \underset{Bx \neq 0}{\max } \frac{\| A B x \|_{\alpha}}{\|B x \|_{\alpha}} \underset{Bx \neq 0}{\max } \frac{\| B x \|_{\alpha}}{\| x \|_{\alpha}}.\]

Clearly

\[\| B \|_{\alpha} = \underset{Bx \neq 0}{\max } \frac{\| B x \|_{\alpha}}{\| x \|_{\alpha}}.\]

Furthermore

\[\underset{Bx \neq 0}{\max } \frac{\| A B x \|_{\alpha}}{\|B x \|_{\alpha}} \leq \underset{y \neq 0}{\max } \frac{\| A y \|_{\alpha}}{\|y \|_{\alpha}} = \|A \|_{\alpha}.\]

Thus we have

\[\| A B \|_{\alpha} \leq \| A \|_{\alpha} \| B \|_{\alpha}.\]

p-norm for matrices¶

We recall the definition of \(\ell_p\) norms for vectors \(x \in \CC^n\) from (2)

\[\begin{split}\| x \|_p = \begin{cases} \left ( \sum_{i=1}^{n} | x |_i^p \right ) ^ {\frac{1}{p}} & p \in [1, \infty)\\ \underset{1 \leq i \leq n}{\max} |x_i| & p = \infty \end{cases}.\end{split}\]

The operator norms \(\| \cdot \|_p\) defined from \(\ell_p\) vector norms are of specific interest.

Definition

The \(p\)-norm for a matrix \(A \in \CC^{m \times n}\) is defined as

\[\| A \|_p \triangleq \underset{x \neq 0}{\max } \frac{\| A x \|_p}{\| x \|_p} = \underset{\| x \|_p = 1}{\max } \| A x \|_p\]

where \(\| x \|_p\) is the standard \(\ell_p\) norm for vectors in \(\CC^m\) and \(\CC^n\).

Remark

As per here \(p\)-norms for matrices are consistent norms. They are also sub-ordinate to \(\ell_p\) vector norms.

Special cases are considered for \(p=1,2\) and \(\infty\).

Theorem

Let \(A \in \CC^{m \times n}\).

For \(p=1\) we have

\[\| A \|_1 \triangleq \underset{1\leq j \leq n}{\max} \sum_{i=1}^m | a_{ij}|.\]

This is also known as max column sum norm.

For \(p=\infty\) we have

\[\| A \|_{\infty} \triangleq \underset{1\leq i \leq m}{\max} \sum_{j=1}^n | a_{ij}|.\]

This is also known as max row sum norm.

Finally for \(p=2\) we have

\[\| A \|_2 \triangleq \sigma_1\]

where \(\sigma_1\) is the largest singular value of \(A\). This is also known as spectral norm.

Proof

Let

\[A = \begin{bmatrix} a^1 & \dots, & a^n \end{bmatrix}.\]

Then

\[\begin{split}\begin{aligned} \| A x \|_1 &= \left \| \sum_{j=1}^n x_j a^j \right \|_1 \\ &\leq \sum_{j=1}^n \left \| x_j a^j \right \|_1 \\ &= \sum_{j=1}^n |x_j| \left \| a^j \right \|_1 \\ &\leq \underset{1 \leq j \leq n}{\max}\| a^j \|_1 \sum_{j=1}^n |x_j| \\ &= \underset{1 \leq j \leq n}{\max}\| a^j \|_1 \| x \|_1. \end{aligned}\end{split}\]

Thus,

\[\| A \|_1 = \underset{x \neq 0}{\max } \frac{\| A x \|_1}{\| x \|_1} \leq \underset{1 \leq j \leq n}{\max}\| a^j \|_1\]

which the maximum column sum. We need to show that this upper bound is indeed an equality.

Indeed for any \(x=e_j\) where \(e_j\) is a unit vector with \(1\) in \(j\)-th entry and 0 elsewhere,

\[\| A e_j \|_1 = \| a^j \|_1.\]

Thus

\[\| A \|_1 \geq \| a^j \|_1 \quad \Forall 1 \leq j \leq n.\]

Combining the two, we see that

\[\| A \|_1 = \underset{1 \leq j \leq n}{\max}\| a^j \|_1.\]

For \(p=\infty\), we proceed as follows:

\[\begin{split}\begin{aligned} \| A x \|_{\infty} &= \underset{1 \leq i \leq m}{\max} \left | \sum_{j=1}^n a_{ij } x_j \right | \\ & \leq \underset{1 \leq i \leq m}{\max} \sum_{j=1}^n | a_{ij } | | x_j |\\ & \leq \underset{1 \leq j \leq n}{\max} | x_j | \underset{1 \leq i \leq m}{\max} \sum_{j=1}^n | a_{ij } |\\ &= \| x \|_{\infty} \underset{1 \leq i \leq m}{\max}\| \underline{a}^i \|_1 \end{aligned}\end{split}\]

where \(\underline{a}^i\) are the rows of \(A\).

This shows that

\[\| A x \|_{\infty} \leq \underset{1 \leq i \leq m}{\max}\| \underline{a}^i \|_1.\]

We need to show that this is indeed an equality.

Fix an \(i = k\) and choose \(x\) such that

\[x_j = \sgn (a_{k j}).\]

Clearly \(\| x \|_{\infty} = 1\).

Then

\[\begin{split}\begin{aligned} \| A x \|_{\infty} &= \underset{1 \leq i \leq m}{\max} \left | \sum_{j=1}^n a_{ij } x_j \right | \\ &\geq \left | \sum_{j=1}^n a_{k j } x_j \right | \\ &= \left | \sum_{j=1}^n | a_{k j } | \right | \\ &= \sum_{j=1}^n | a_{k j } |\\ &= \| \underline{a}^k \|_1. \end{aligned}\end{split}\]

Thus,

\[\| A \|_{\infty} \geq \underset{1 \leq i \leq m}{\max}\| \underline{a}^i \|_1\]

Combining the two inequalities we get:

\[\| A \|_{\infty} = \underset{1 \leq i \leq m}{\max}\| \underline{a}^i \|_1.\]

Remaining case is for \(p=2\).

For any vector \(x\) with \(\| x \|_2 = 1\),

\[\| A x \|_2 = \| U \Sigma V^H x \|_2 = \| U (\Sigma V^H x )\|_2 = \| \Sigma V^H x \|_2\]

since \(\ell_2\) norm is invariant to unitary transformations.

Let \(v = V^H x\). Then \(\|v\|_2 = \| V^H x \|_2 = \| x \|_2 = 1\).

Now

\[\begin{split}\begin{aligned} \| A x \|_2 &= \| \Sigma v \|_2\\ &= \left ( \sum_{j=1}^n | \sigma_j v_j |^2 \right )^{\frac{1}{2}}\\ &\leq \sigma_1 \left ( \sum_{j=1}^n | v_j |^2 \right )^{\frac{1}{2}}\\ &= \sigma_1 \| v \|_2 = \sigma_1. \end{aligned}\end{split}\]

This shows that

\[\| A \|_2 \leq \sigma_1.\]

Now consider some vector \(x\) such that \(v = (1, 0, \dots, 0)\). Then

\[\| A x \|_2 = \| \Sigma v \|_2 = \sigma_1.\]

Thus

\[\| A \|_2 \geq \sigma_1.\]

Combining the two, we get that \(\| A \|_2 = \sigma_1\).

The 2-norm¶

Theorem

Let \(A\in \CC^{n \times n}\) has singular values \(\sigma_1 \geq \sigma_2 \geq \dots \geq \sigma_n\). Let the eigen values for \(A\) be \(\lambda_1, \lambda_2, \dots, \lambda_n\) with \(|\lambda_1| \geq |\lambda_2| \geq \dots \geq |\lambda_n|\). Then the following hold

\[\| A \|_2 = \sigma_1\]

and if \(A\) is non-singular

\[\| A^{-1} \|_2 = \frac{1}{\sigma_n}.\]

If \(A\) is symmetric and positive definite, then

\[\| A \|_2 = \lambda_1\]

and if \(A\) is non-singular

\[\| A^{-1} \|_2 = \frac{1}{\lambda_n}.\]

If \(A\) is normal then

\[\| A \|_2 = |\lambda_1|\]

and if \(A\) is non-singular

\[\| A^{-1} \|_2 = \frac{1}{|\lambda_n|}.\]

Unitary invariant norms¶

Definition

A matrix norm \(\| \cdot \|\) on \(\CC^{m \times n}\) is called unitary invariant if \(\| U A V \| = \|A \|\) for any \(A \in \CC^{m \times n}\) and any unitary matrices \(U \in \CC^{m \times m}\) and \(V \in \CC^{n \times n}\).

We have already seen in here that Frobenius norm is unitary invariant.

It turns out that spectral norm is also unitary invariant.

More properties of operator norms¶

In this section we will focus on operator norms connecting normed linear spaces \((\CC^n, \| \cdot \|_{p})\) and \((\CC^m, \| \cdot \|_{q})\). Typical values of \(p, q\) would be in \(\{1, 2, \infty\}\).

We recall that

\[\| A \|_{p \to q } = \underset{x \neq 0}{\max} \frac{\| A x \|_q}{\| x \|_p} = \underset{ \| x \|_p = 1}{\max} \| A x \|_q = \underset{\| x \|_p \leq 1}{\max} \| A x \|_q.\]

The following table (based on [TRO04]) shows how to compute different \((p, q)\) norms. Some can be computed easily while others are NP-hard to compute.

Typical \((p \to q)\) norms¶
p	q	\(\\| A \\|_{p \to q}\)	Calculation
1	1	\(\\| A \\|_{1 }\)	Maximum \(\ell_1\) norm of a column
1	2	\(\\| A \\|_{1 \to 2}\)	Maximum \(\ell_2\) norm of a column
1	\(\infty\)	\(\\| A \\|_{1 \to \infty}\)	Maximum absolute entry of a matrix
2	1	\(\\| A \\|_{2 \to 1}\)	NP hard
2	2	\(\\| A \\|_{2}\)	Maximum singular value
2	\(\infty\)	\(\\| A \\|_{2 \to \infty}\)	Maximum \(\ell_2\) norm of a row
\(\infty\)	1	\(\\| A \\|_{\infty \to 1}\)	NP hard
\(\infty\)	2	\(\\| A \\|_{\infty \to 2}\)	NP hard
\(\infty\)	\(\infty\)	\(\\| A \\|_{\infty}\)	Maximum \(\ell_1\)-norm of a row

The topological dual of the finite dimensional normed linear space \((\CC^n, \| \cdot \|_{p})\) is the normed linear space \((\CC^n, \| \cdot \|_{p'})\) where

\[\frac{1}{p} + \frac{1}{p'} = 1.\]

\(\ell_2\)-norm is dual of \(\ell_2\)-norm. It is a self dual. \(\ell_1\) norm and \(\ell_{\infty}\)-norm are dual of each other.

When a matrix \(A\) maps from the space \((\CC^n, \| \cdot \|_{p})\) to the space \((\CC^m, \| \cdot \|_{q})\), we can view its conjugate transpose \(A^H\) as a mapping from the space \((\CC^m, \| \cdot \|_{q'})\) to \((\CC^n, \| \cdot \|_{p'})\).

Theorem

Operator norm of a matrix always equals the operator norm of its conjugate transpose. i.e.

\[\| A \|_{p \to q} = \| A^H \|_{q' \to p'}\]

where

\[\frac{1}{p} + \frac{1}{p'} = 1, \frac{1}{q} + \frac{1}{q'} = 1.\]

Specific applications of this result are:

\[\| A \|_2 = \| A^H \|_2.\]

This is obvious since the maximum singular value of a matrix and its conjugate transpose are same.

\[\| A \|_1 = \| A^H \|_{\infty}, \quad \| A \|_{\infty} = \| A^H \|_1.\]

This is also obvious since max column sum of \(A\) is same as the max row sum norm of \(A^H\) and vice versa.

\[\| A \|_{1 \to \infty} = \| A^H \|_{1 \to \infty}.\]

\[\| A \|_{1 \to 2} = \| A^H \|_{2 \to \infty}.\]

\[\| A \|_{\infty \to 2} = \| A^H \|_{2 \to 1}.\]

We now need to show the result for the general case (arbitrary \(1 \leq p, q \leq \infty\)).

Proof

TODO

Theorem

\[\| A \|_{1 \to p} = \underset{1 \leq j \leq n}{\max}\| a^j \|_p.\]

where

\[A = \begin{bmatrix} a^1 & \dots, & a^n \end{bmatrix}.\]

Proof

\[\begin{split}\begin{aligned} \| A x \|_p &= \left \| \sum_{j=1}^n x_j a^j \right \|_p \\ &\leq \sum_{j=1}^n \left \| x_j a^j \right \|_p \\ &= \sum_{j=1}^n |x_j| \left \| a^j \right \|_p \\ &\leq \underset{1 \leq j \leq n}{\max}\| a^j \|_p \sum_{j=1}^n |x_j| \\ &= \underset{1 \leq j \leq n}{\max}\| a^j \|_p \| x \|_1. \end{aligned}\end{split}\]

Thus,

\[\| A \|_{1 \to p} = \underset{x \neq 0}{\max } \frac{\| A x \|_p}{\| x \|_1} \leq \underset{1 \leq j \leq n}{\max}\| a^j \|_p.\]

We need to show that this upper bound is indeed an equality.

Indeed for any \(x=e_j\) where \(e_j\) is a unit vector with \(1\) in \(j\)-th entry and 0 elsewhere,

\[\| A e_j \|_p = \| a^j \|_p.\]

Thus

\[\| A \|_{1 \to p} \geq \| a^j \|_p \quad \Forall 1 \leq j \leq n.\]

Combining the two, we see that

\[\| A \|_{1 \to p} = \underset{1 \leq j \leq n}{\max}\| a^j \|_p.\]

Theorem

\[\| A \|_{p \to \infty} = \underset{1 \leq i \leq m}{\max}\| \underline{a}^i \|_q\]

where

\[\frac{1}{p} + \frac{1}{q} = 1.\]

Proof

Using here, we get

\[\| A \|_{p \to \infty} = \| A^H \|_{1 \to q}.\]

Using here, we get

\[\| A^H \|_{1 \to q} = \underset{1 \leq i \leq m}{\max}\| \underline{a}^i \|_q.\]

This completes the proof.

Theorem

For two matrices \(A\) and \(B\) and \(p \geq 1\), we have

\[\| A B \|_{p \to q} \leq \| B \|_{p \to s} \| A \|_{s \to q}.\]

Proof

We start with

\[\| A B \|_{p \to q} = \underset{\| x \|_p = 1}{\max} \| A ( B x) \|_q.\]

From here, we obtain

\[\| A ( B x) \|_q \leq \| A \|_{s \to q} \| ( B x) \|_s.\]

Thus,

\[\| A B \|_{p \to q} \leq \| A \|_{s \to q} \underset{\| x \|_p = 1}{\max} \| ( B x) \|_s = \| A \|_{s \to q} \| B \|_{p \to s}.\]

Theorem

For two matrices \(A\) and \(B\) and \(p \geq 1\), we have

\[\| A B \|_{p \to \infty} \leq \| A \|_{\infty \to \infty} \| B \|_{p \to \infty}.\]

Proof

We start with

\[\| A B \|_{p \to \infty} = \underset{\| x \|_p = 1}{\max} \| A ( B x) \|_{\infty}.\]

From here, we obtain

\[\| A ( B x) \|_{\infty} \leq \| A \|_{\infty \to \infty} \| ( B x) \|_{\infty}.\]

Thus,

\[\| A B \|_{p \to \infty} \leq \| A \|_{\infty \to \infty} \underset{\| x \|_p = 1}{\max} \| ( B x) \|_{\infty} = \| A \|_{\infty \to \infty} \| B \|_{p \to \infty}.\]

Theorem

\[\| A \|_{p \to \infty} \leq \| A \|_{p \to p}.\]

In particular

\[\| A \|_{1 \to \infty} \leq \| A \|_{1}.\]

\[\| A \|_{2 \to \infty} \leq \| A \|_{2}.\]

Proof

Choosing \(q = \infty\) and \(s = p\) and applying here

\[\| I A \|_{p \to \infty} \leq \| A \|_{p \to p} \| I \|_{p \to \infty}.\]

But \(\| I \|_{p \to \infty}\) is the maximum \(\ell_p\) norm of any row of \(I\) which is \(1\). Thus

\[\| A \|_{p \to \infty} \leq \| A \|_{p \to p}.\]

Consider the expression

\[\begin{split}\underset{ \substack{z \in \ColSpace(A^H) \\ z \neq 0}}{\min} \frac{\| A z \|_{q}}{\| z \|_p}.\end{split}\]

\(z \in \ColSpace(A^H), z \neq 0\) means there exists some vector \(u \notin \Kernel(A^H)\) such that \(z = A^H u\).

This expression measures the factor by which the non-singular part of \(A\) can decrease the length of a vector.

Theorem

The following bound holds for every matrix \(A\):

\[\begin{split}\underset{\substack{z \in \ColSpace(A^H) \\ z \neq 0}}{\min} \frac{\| A z \|_{q}}{\| z \|_p} \geq \| A^{\dag}\|_{q, p}^{-1}.\end{split}\]

If \(A\) is surjective (onto), then the equality holds. When \(A\) is bijective (one-one onto, square, invertible), then the result implies

\[\begin{split}\underset{\substack{z \in \ColSpace(A^H) \\ z \neq 0}}{\min} \frac{\| A z \|_{q}}{\| z \|_p} = \| A^{-1}\|_{q, p}^{-1}.\end{split}\]

Proof

The spaces \(\ColSpace(A^H)\) and \(\ColSpace(A)\) have same dimensions given by \(\Rank(A)\). We recall that \(A^{\dag} A\) is a projector onto the column space of \(A\).

\[w = A z \iff z = A^{\dag} w = A^{\dag} A z \Forall z \in \ColSpace (A^H).\]

As a result we can write

\[\frac{\| z \|_p}{ \| A z \|_q} = \frac{\| A^{\dag} w \|_p}{ \| w \|_q}\]

whenever \(z \in \ColSpace(A^H)\). Now

\[\begin{split} \left [ \underset{\substack{z \in \ColSpace(A^H)\\z \neq 0}}{\min} \frac{\| A z \|_q}{\| z \|_p}\right ]^{-1} = \underset{\substack{z \in \ColSpace(A^H)\\z \neq 0}}{\max} \frac{\| z \|_p}{ \| A z \|_q} = \underset{\substack{w \in \ColSpace(A) \\ w \neq 0}}{\max} \frac{\| A^{\dag} w \|_p}{ \| w \|_q} \leq \underset{w \neq 0}{\max} \frac{\| A^{\dag} w \|_p}{ \| w \|_q}.\end{split}\]

When \(A\) is surjective, then \(\ColSpace(A) = \CC^m\). Hence

\[\begin{split}\underset{\substack{w \in \ColSpace(A)\\w \neq 0}}{\max} \frac{\| A^{\dag} w \|_p}{ \| w \|_q} = \underset{w \neq 0}{\max} \frac{\| A^{\dag} w \|_p}{ \| w \|_q}.\end{split}\]

Thus, the inequality changes into equality. Finally

\[\underset{w \neq 0}{\max} \frac{\| A^{\dag} w \|_p}{ \| w \|_q} = \| A^{\dag} \|_{q \to p}\]

which completes the proof.

Row column norms¶

Definition

Let \(A\) be an \(m\times n\) matrix with rows \(\underline{a}^i\) as

\[\begin{split}A = \begin{bmatrix} \underline{a}^1\\ \vdots \\ \underline{a}^m \end{bmatrix}\end{split}\]

Then we define

\[\| A \|_{p, \infty} \triangleq \underset{1 \leq i \leq m}{\max} \| \underline{a}^i \|_p = \underset{1 \leq i \leq m}{\max} \left ( \sum_{j=1}^n |\underline{a}^i_j |^p \right )^{\frac{1}{p}}\]

where \(1 \leq p < \infty\). i.e. we take \(p\)-norms of all row vectors and then find the maximum.

We define

\[\| A \|_{\infty, \infty} = \underset{i, j}{\max} |a_{i j}|.\]

This is equivalent to taking \(\ell_{\infty}\) norm on each row and then taking the maximum of all the norms.

For \(1 \leq p , q < \infty\), we define the norm

\[\| A \|_{p, q} \triangleq \left [ \sum_{i=1}^m \left ( \| \underline{a}^i \|_p \right )^q \right ]^{\frac{1}{q}}.\]

i.e., we compute \(p\)-norm of all the row vectors to form another vector and then take \(q\)-norm of that vector.

Note that the norm \(\| A \|_{p, \infty}\) is different from the operator norm \(\| A \|_{p \to \infty}\). Similarly \(\| A \|_{p, q}\) is different from \(\| A \|_{p \to q}\).

Theorem

\[\| A \|_{p, \infty} = \| A \|_{q \to \infty}\]

where

\[\frac{1}{p} + \frac{1}{q} = 1.\]

Proof

From here we get

\[\| A \|_{q \to \infty} = \underset{1 \leq i \leq m}{\max}\| \underline{a}^i \|_p.\]

This is exactly the definition of \(\| A \|_{p, \infty}\).

Theorem

\[\| A \|_{1 \to p} = \| A \|_{p, \infty}.\]

Proof

\[\| A \|_{1 \to p} = \| A^H \|_{q \to \infty}.\]

From here

\[\| A^H \|_{q \to \infty} = \| A^H \|_{p, \infty}.\]

Theorem

For any two matrices \(A, B\), we have

\[\frac{\|A B \|_{p, \infty}}{\| B\|_{p, \infty}} \leq \| A \|_{\infty \to \infty}.\]

Proof

Let \(q\) be such that \(\frac{1}{p} + \frac{1}{q} = 1\). From here, we have

\[\| A B \|_{q \to \infty} \leq \| A \|_{\infty \to \infty} \| B \|_{q \to \infty}.\]

From here

\[\| A B \|_{q \to \infty} = \| A B\|_{p, \infty}\]

and

\[\| B \|_{q \to \infty} = \| B\|_{p, \infty}.\]

Thus

\[\| A B\|_{p, \infty} \leq \| A \|_{\infty \to \infty} \| B\|_{p, \infty}.\]

Theorem

Relations between \((p, q)\) norms and \((p \to q)\) norms

\[\| A \|_{1, \infty} = \| A \|_{\infty \to \infty}\]

\[\| A \|_{2, \infty} = \| A \|_{2 \to \infty}\]

\[\| A \|_{\infty, \infty} = \| A \|_{1 \to \infty}\]

\[\| A \|_{1 \to 1} = \| A^H \|_{1, \infty}\]

\[\| A \|_{1 \to 2} = \| A^H \|_{2, \infty}\]

\[\]

Proof

The first three are straight forward applications of here. The next two are applications of here. See also here.

Block diagonally dominant matrices and generalized Gershgorin disc theorem¶

In [FV+62] the idea of diagonally dominant matrices (see here) has been generalized to block matrices using matrix norms. We consider the specific case with spectral norm.

Definition

Let \(A\) be a square matrix in \(\CC^{n \times n}\) which is partitioned in following manner

\[\begin{split}A = \begin{bmatrix} A_{11} & A_{12} & \dots & A_{1 k}\\ A_{21} & A_{22} & \dots & A_{2 k}\\ \vdots & \vdots & \ddots & \vdots\\ A_{k 1} & A_{k 2} & \dots & A_{k k}\\ \end{bmatrix}\end{split}\]

where each of the submatrices \(A_{i j}\) is a square matrix of size \(m \times m\). Thus \(n = k m\).

\(A\) is called block diagonally dominant if

\[\| A_{ii}\|_2 \geq \sum_{j \neq i } \|A_{ij} \|_2.\]

holds true for all \(1 \leq i \leq n\). If the inequality satisfies strictly for all \(i\), then \(A\) is called block strictly diagonally dominant matrix.

Theorem

If the partitioned matrix \(A\) of here is block strictly diagonally dominant matrix, then it is nonsingular.

For proof see [FV+62].

This leads to the generalized Gershgorin disc theorem.

Theorem

Let \(A\) be a square matrix in \(\CC^{n \times n}\) which is partitioned in following manner

\[\begin{split}A = \begin{bmatrix} A_{11} & A_{12} & \dots & A_{1 k}\\ A_{21} & A_{22} & \dots & A_{2 k}\\ \vdots & \vdots & \ddots & \vdots\\ A_{k 1} & A_{k 2} & \dots & A_{k k}\\ \end{bmatrix}\end{split}\]

where each of the submatrices \(A_{i j}\) is a square matrix of size \(m \times m\). Then each eigenvalue \(\lambda\) of \(A\) satisfies

\[\| \lambda I - A_{ii}\|_2 \leq \sum_{j\neq i} \|A_{ij} \| \text{ for some } i \in \{1,2, \dots, n \}.\]

For proof see [FV+62].

Since the \(2\)-norm of a positive semidefinite matrix is nothing but its largest eigen value, the theorem directly applies.

Corollary

Let \(A\) be a Hermitian positive semidefinite matrix. Let \(A\) be partitioned as in here. Then its \(2\)-norm \(\| A \|_2\) satisfies

\[| \| A \|_2 - \|A_{ii}\|_2 | \leq \sum_{j\neq i} \|A_{ij} \| \text{ for some } i \in \{1,2, \dots, n \}.\]