Eigen values

Much of the discussion in this section will be equally applicable to real as well as complex matrices. We will use the complex notation mostly and make specific remarks for real matrices wherever needed.

Definition

A scalar \(\lambda\) is an eigen value of an \(n \times n\) matrix \(A = [ a_{ij} ]\) if there exists a non null vector \(x\) such that

(1)\[Ax = \lambda x.\]

A non null vector \(x\) which satisfies this equation is called an eigen vector of \(A\) for the eigen value \(\lambda\).

An eigen value is also known as a characteristic value, proper value or a latent value.

We note that (1) can be written as

\[Ax = \lambda I_n x \implies (A - \lambda I_n) x = 0.\]

Thus \(\lambda\) is an eigen value of \(A\) if and only if the matrix \(A - \lambda I\) is singular.

Definition
The set comprising the eigen values of a matrix \(A\) is known as its spectrum.
Remark
For each eigen vector \(x\) of a matrix \(A\), the corresponding eigen value \(\lambda\) is unique.
Proof

Assume that for \(x\) there are two eigen values \(\lambda_1\) and \(\lambda_2\), then

\[A x = \lambda_1 x = \lambda_2 x \implies (\lambda_1 - \lambda_2 ) x = 0.\]

This can happen only when either \(x = 0\) or \(\lambda_1 = \lambda_2\). Since \(x\) is an eigen vector, it cannot be 0. Thus \(\lambda_1 = \lambda_2\).

Remark

If \(x\) is an eigen vector for \(A\), then the corresponding eigen value is given by

\[\lambda = \frac{x^H A x }{x^H x}.\]
Proof
\[A x = \lambda x \implies x^H A x = \lambda x^H x \implies \lambda = \frac{x^H A x }{x^H x}.\]

since \(x\) is non-zero.
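
As a quick numerical illustration (a minimal sketch assuming NumPy is available; the matrix below is an arbitrary example), the Rayleigh quotient of an eigen vector recovers its eigen value:

```python
import numpy as np

# An arbitrary example matrix (assumption: any square matrix works here).
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

# Compute eigen values and eigen vectors numerically.
eigvals, eigvecs = np.linalg.eig(A)

x = eigvecs[:, 0]                 # an eigen vector of A
lam = eigvals[0]                  # its eigen value

# The Rayleigh quotient x^H A x / x^H x recovers the eigen value.
rayleigh = (x.conj() @ A @ x) / (x.conj() @ x)
print(np.isclose(rayleigh, lam))  # True
```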

Remark

An eigen vector \(x\) of \(A\) for eigen value \(\lambda\) belongs to the null space of \(A - \lambda I\), i.e.

\[x \in \NullSpace(A - \lambda I).\]

In other words \(x\) is a nontrivial solution to the homogeneous system of linear equations given by

\[(A - \lambda I) z = 0.\]
Definition
Let \(\lambda\) be an eigen value for a square matrix \(A\). Then its eigen space is the null space of \(A - \lambda I\) i.e. \(\NullSpace(A - \lambda I)\).
Remark

The set comprising all the eigen vectors of \(A\) for an eigen value \(\lambda\) is given by

\[\NullSpace(A - \lambda I) \setminus \{ 0 \}\]

since \(0\) cannot be an eigen vector.

Definition
Let \(\lambda\) be an eigen value for a square matrix \(A\). The dimension of its eigen space \(\NullSpace(A - \lambda I)\) is known as the geometric multiplicity of the eigen value \(\lambda\).
Remark

Clearly

\[\dim (\NullSpace(A - \lambda I)) = n - \Rank(A - \lambda I).\]
Remark

A scalar \(\lambda\) can be an eigen value of a square matrix \(A\) if and only if

\[\det (A - \lambda I) = 0.\]

\(\det (A - \lambda I)\) is a polynomial in \(\lambda\) of degree \(n\).

Remark
\[\det (A - \lambda I) = p(\lambda) = \alpha_n \lambda^n + \alpha_{n-1} \lambda^{n-1} + \dots + \alpha_1 \lambda + \alpha_0\]

where \(\alpha_i\) depend on entries in \(A\).

In this sense, an eigen value of \(A\) is a root of the equation

\[p(\lambda) = 0.\]

It is easy to show that \(\alpha_n = (-1)^n\).

Definition

For any square matrix \(A\), the polynomial given by \(p(\lambda) = \det(A - \lambda I )\) is known as its characteristic polynomial. The equation given by

\[p(\lambda) = 0\]

is known as its characteristic equation. The eigen values of \(A\) are the roots of its characteristic polynomial or solutions of its characteristic equation.
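
For a concrete illustration (a minimal sketch assuming NumPy; note that `np.poly` returns the coefficients of \(\det(\lambda I - A)\), which differs from \(p(\lambda) = \det(A - \lambda I)\) only by the factor \((-1)^n\)):

```python
import numpy as np

# An arbitrary example matrix (assumption).
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

# Coefficients of det(lambda*I - A); same roots as p(lambda) = det(A - lambda*I).
coeffs = np.poly(A)
print(coeffs)                             # [ 1. -7. 10.]  ->  lambda^2 - 7 lambda + 10

# The eigen values are the roots of the characteristic polynomial.
roots = np.roots(coeffs)
eigvals = np.linalg.eigvals(A)
print(np.sort(roots), np.sort(eigvals))   # both give [2. 5.]
```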

Lemma

For real square matrices, if we restrict eigen values to real values, then the characteristic polynomial can be factored as

\[p(\lambda) = (-1)^n (\lambda - \lambda_1)^{r_1} \dots (\lambda - \lambda_k)^{r_k} q(\lambda).\]

The polynomial has \(k\) distinct real roots. For each root \(\lambda_i\), \(r_i\) is a positive integer indicating how many times the root appears. \(q(\lambda)\) is a polynomial that has no real roots. The following is true

\[r_1 + \dots + r_k + \deg(q(\lambda)) = n.\]

Clearly \(k \leq n\).

For complex square matrices where eigen values can be complex (including real square matrices), the characteristic polynomial can be factored as

\[p(\lambda) = (-1)^n (\lambda - \lambda_1)^{r_1} \dots (\lambda - \lambda_k)^{r_k}.\]

The polynomial can be completely factorized into first degree polynomials. There are \(k\) distinct roots or eigen values. The following is true

\[r_1 + \dots + r_k = n.\]

Thus including the duplicates there are exactly \(n\) eigen values for a complex square matrix.

Remark
It is quite possible that a real square matrix doesn’t have any real eigen values.
Definition
The number of times an eigen value appears in the factorization of the characteristic polynomial of a square matrix \(A\) is known as its algebraic multiplicity. In other words \(r_i\) is the algebraic multiplicity for \(\lambda_i\) in above factorization.
Remark
In the above factorization, the set \(\{\lambda_1, \dots, \lambda_k \}\) forms the spectrum of \(A\).

Let us consider the sum of the \(r_i\), which gives the total number of roots of \(p(\lambda)\).

\[m = \sum_{i=1}^k r_i.\]

With this there are \(m\) not-necessarily distinct roots of \(p(\lambda)\). Let us write \(p(\lambda)\) as

\[p(\lambda) = (-1)^n (\lambda - c_1) (\lambda - c_2)\dots (\lambda - c_m)q(\lambda).\]

where \(c_1, c_2, \dots, c_m\) are \(m\) scalars (not necessarily distinct) of which \(r_1\) scalars are \(\lambda_1\), \(r_2\) are \(\lambda_2\) and so on. Obviously for the complex case \(q(\lambda)=1\).

We will refer to the set (allowing repetitions) \(\{c_1, c_2, \dots, c_m \}\) as the eigen values of the matrix \(A\) where \(c_i\) are not necessarily distinct. In contrast the spectrum of \(A\) refers to the set of distinct eigen values of \(A\). The symbol \(c\) has been chosen based on the other name for eigen values (the characteristic values).

We can put together the eigen vectors of a matrix into another matrix. This can be a very useful tool. We start with a simple idea.

Lemma

Let \(A\) be an \(n \times n\) matrix. Let \(u_1, u_2, \dots, u_r\) be \(r\) non-zero vectors from \(\FF^n\). Let us construct an \(n \times r\) matrix

\[U = \begin{bmatrix} u_1 & u_2 & \dots & u_r \end{bmatrix}.\]

Then all the \(r\) vectors are eigen vectors of \(A\) if and only if there exists a diagonal matrix \(D = \Diag(d_1, \dots, d_r)\) such that

\[A U = U D.\]
Proof

Expanding the equation, we can write

\[\begin{bmatrix} A u_1 & A u_2 & \dots & A u_r \end{bmatrix} = \begin{bmatrix} d_1 u_1 & d_2 u_2 & \dots & d_r u_r \end{bmatrix}.\]

Clearly we want

\[A u_i = d_i u_i\]

where \(u_i\) are non-zero. This is possible only when \(d_i\) is an eigen value of \(A\) and \(u_i\) is an eigen vector for \(d_i\).

Converse: Assume that \(u_i\) are eigen vectors. Choose \(d_i\) to be corresponding eigen values. Then the equation holds.
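
A small numerical check of this lemma (a sketch assuming NumPy; the example matrix is an arbitrary choice):

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [1.0, 3.0]])

# Columns of U are eigen vectors, D holds the corresponding eigen values.
d, U = np.linalg.eig(A)
D = np.diag(d)

# A U = U D holds precisely because the columns of U are eigen vectors.
print(np.allclose(A @ U, U @ D))   # True
```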

Lemma
\(0\) is an eigen value of a square matrix \(A\) if and only if \(A\) is singular.
Proof

Let \(0\) be an eigen value of \(A\). Then there exists \(u \neq 0\) such that

\[A u = 0 u = 0.\]

Thus \(u\) is a non-trivial solution of the homogeneous linear system. Thus \(A\) is singular.

Converse: Assuming that \(A\) is singular, there exists \(u \neq 0\) s.t.

\[A u = 0 = 0 u.\]

Thus \(0\) is an eigen value of \(A\).

Lemma
If a square matrix \(A\) is singular, then \(\NullSpace(A)\) is the eigen space for the eigen value \(\lambda = 0\).
Proof
This is straight forward from the definition of eigen space (see here).
Remark
Clearly the geometric multiplicity of \(\lambda=0\) equals \(\Nullity(A) = n - \Rank(A)\).
Lemma
Let \(A\) be a square matrix. Then \(A\) and \(A^T\) have same eigen values.
Proof

The eigen values of \(A^T\) are given by

\[\det (A^T - \lambda I) = 0.\]

But

\[A^T - \lambda I = A^T - (\lambda I )^T = (A - \lambda I)^T.\]

Hence (using here)

\[\det (A^T - \lambda I) = \det \left ( (A - \lambda I)^T \right ) = \det (A - \lambda I).\]

Thus the characteristic polynomials of \(A\) and \(A^T\) are the same. Hence the eigen values are the same. In other words, the spectra of \(A\) and \(A^T\) are the same.
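
This can be verified numerically (a minimal sketch assuming NumPy; the non-symmetric matrix below is an arbitrary example):

```python
import numpy as np

# An arbitrary non-symmetric example matrix (assumption).
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 3.0, 4.0],
              [5.0, 0.0, 6.0]])

ev_A  = np.sort_complex(np.linalg.eigvals(A))
ev_At = np.sort_complex(np.linalg.eigvals(A.T))

# The spectra agree (the eigen vectors, in general, do not).
print(np.allclose(ev_A, ev_At))   # True
```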

Remark

If \(x\) is an eigen vector of \(A\) with a non-zero real eigen value \(\lambda\), then \(Ax\) and \(x\) are collinear.

In other words, the angle between \(Ax\) and \(x\) is \(0^{\circ}\) when \(\lambda\) is positive and \(180^{\circ}\) when \(\lambda\) is negative. Let us look at the inner product:

\[\langle Ax, x \rangle = x^H A x = x^H \lambda x = \lambda \| x\|_2^2.\]

Meanwhile

\[\| A x \|_2 = \| \lambda x \|_2 = |\lambda| \| x \|_2.\]

Thus

\[|\langle Ax, x \rangle | = \| Ax \|_2 \| x \|_2.\]

The angle \(\theta\) between \(Ax\) and \(x\) is given by

\[\cos \theta = \frac{\langle Ax, x \rangle}{\| Ax \|_2 \| x \|_2} = \frac{\lambda \| x\|_2^2}{|\lambda| \| x \|_2^2} = \pm 1.\]
Lemma
Let \(A\) be a square matrix and \(\lambda\) be an eigen value of \(A\). Let \(p \in \Nat\). Then \(\lambda^p\) is an eigen value of \(A^{p}\).
Proof

For \(p=1\) the statement holds trivially since \(\lambda^1\) is an eigen value of \(A^1\). Let \(u\) be an eigen vector of \(A\) for the eigen value \(\lambda\). Assume that the statement holds for some value of \(p\) with the same eigen vector, i.e. \(A^{p} u = \lambda^p u\). Now

\[A^{p + 1} u = A^ p ( A u) = A^{p} \lambda u = \lambda A^{p} u = \lambda \lambda^p u = \lambda^{p + 1} u.\]

Thus \(\lambda^{p + 1}\) is an eigen value for \(A^{p + 1}\) with the same eigen vector \(u\). With the principle of mathematical induction, the proof is complete.
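
A quick numerical check (a sketch assuming NumPy; the matrix and the power \(p\) are arbitrary choices):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])
d, V = np.linalg.eig(A)
lam, u = d[0], V[:, 0]

p = 4
Ap = np.linalg.matrix_power(A, p)

# The same eigen vector u works for A^p, with eigen value lambda^p.
print(np.allclose(Ap @ u, (lam ** p) * u))   # True
```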

Lemma
Let a square matrix \(A\) be non singular and let \(\lambda \neq 0\) be some eigen value of \(A\). Then \(\lambda^{-1}\) is an eigen value of \(A^{-1}\). Moreover, all eigen values of \(A^{-1}\) are obtained by taking inverses of eigen values of \(A\) i.e. if \(\mu \neq 0\) is an eigen value of \(A^{-1}\) then \(\frac{1}{\mu}\) is an eigen value of \(A\) also. Also, \(A\) and \(A^{-1}\) share the same set of eigen vectors.
Proof

Let \(u \neq 0\) be an eigen vector of \(A\) for the eigen value \(\lambda\). Then

\[A u = \lambda u \implies u = A^{-1} \lambda u \implies \frac{1}{\lambda} u = A^{-1} u.\]

Thus \(u\) is also an eigen vector of \(A^{-1}\) for the eigen value \(\frac{1}{\lambda}\).

Now let \(B = A^{-1}\). Then \(B^{-1} = A\). Thus if \(\mu\) is an eigen value of \(B\) then \(\frac{1}{\mu}\) is an eigen value of \(B^{-1} = A\).

Thus if \(A\) is invertible, then the eigen values of \(A\) and \(A^{-1}\) are in one-to-one correspondence.

This result is very useful: if it can be shown that a matrix \(A\) is similar to a diagonal or triangular matrix whose eigen values are easy to obtain, then the determination of the eigen values of \(A\) becomes straightforward.
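
A short numerical illustration of this correspondence (a sketch assuming NumPy; the matrix is an arbitrary non-singular example):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [0.0, 2.0]])        # non-singular, eigen values 3 and 2

d, V = np.linalg.eig(A)
d_inv = np.linalg.eigvals(np.linalg.inv(A))

# Eigen values of A^{-1} are the reciprocals of the eigen values of A.
print(np.allclose(np.sort(1.0 / d), np.sort(d_inv)))   # True

# The eigen vectors are shared: A^{-1} v = (1/lambda) v.
v, lam = V[:, 0], d[0]
print(np.allclose(np.linalg.inv(A) @ v, v / lam))       # True
```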

Invariant subspaces

Definition

Let \(A\) be a square \(n\times n\) matrix and let \(\WW\) be a subspace of \(\FF^n\), i.e. \(\WW \leq \FF^n\). Then \(\WW\) is invariant relative to \(A\) if

\[A w \in \WW \Forall w \in \WW.\]

i.e. \(A (\WW) \subseteq \WW\), or for every vector \(w \in \WW\) its image \(A w\) is also in \(\WW\). Thus the action of \(A\) on \(\WW\) doesn't take us outside of \(\WW\).

We also say that \(\WW\) is \(A\)-invariant.

Eigen vectors are generators of invariant subspaces.

Lemma

Let \(A\) be an \(n \times n\) matrix. Let \(x_1, x_2, \dots, x_r\) be \(r\) eigen vectors of \(A\). Let us construct an \(n \times r\) matrix

\[X = \begin{bmatrix} x_1 & x_2 & \dots & x_r \end{bmatrix}.\]

Then the column space of \(X\) i.e. \(\ColSpace(X)\) is invariant relative to \(A\).

Proof

Let us assume that \(c_1, c_2, \dots, c_r\) are the eigen values corresponding to \(x_1, x_2, \dots, x_r\) (not necessarily distinct).

Let any vector \(x \in \ColSpace(X)\) be given by

\[x = \sum_{i=1}^r \alpha_i x_i.\]

Then

\[A x= A \sum_{i=1}^r \alpha_i x_i = \sum_{i=1}^r \alpha_i A x_i = \sum_{i=1}^r \alpha_i c_i x_i.\]

Clearly \(Ax\) is also a linear combination of the \(x_i\), hence it belongs to \(\ColSpace(X)\). Thus \(\ColSpace(X)\) is invariant relative to \(A\), i.e. \(\ColSpace(X)\) is \(A\)-invariant.

Triangular matrices

Lemma
Let \(A\) be an \(n\times n\) upper or lower triangular matrix. Then its eigen values are the entries on its main diagonal.
Proof

If \(A\) is triangular then \(A - \lambda I\) is also triangular with its diagonal entries being \((a_{i i} - \lambda)\). Using here, we have

\[p(\lambda) = \det (A - \lambda I) = \prod_{i=1}^n (a_{i i} - \lambda).\]

Clearly the roots of the characteristic polynomial are the diagonal entries \(a_{i i}\).
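
A numerical check (a sketch assuming NumPy; the upper triangular matrix is an arbitrary example):

```python
import numpy as np

# An arbitrary upper triangular example (assumption).
A = np.array([[5.0, 2.0, 7.0],
              [0.0, 1.0, 3.0],
              [0.0, 0.0, 4.0]])

# Eigen values of a triangular matrix are its diagonal entries.
print(np.allclose(np.sort(np.linalg.eigvals(A)), np.sort(np.diag(A))))   # True
```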

Several small results follow from this lemma.

Corollary

Let \(A = [a_{i j}]\) be an \(n \times n\) triangular matrix.

  1. The characteristic polynomial of \(A\) is \(p(\lambda) = (-1)^n (\lambda - a_{1 1}) (\lambda - a_{2 2}) \dots (\lambda - a_{n n})\).
  2. A scalar \(\lambda\) is an eigen value of \(A\) iff it is one of the diagonal entries of \(A\).
  3. The algebraic multiplicity of an eigen value \(\lambda\) is equal to the number of times it appears on the main diagonal of \(A\).
  4. The spectrum of \(A\) is given by the distinct entries on the main diagonal of \(A\).

A diagonal matrix is naturally both an upper triangular matrix as well as a lower triangular matrix. Similar results hold for the eigen values of a diagonal matrix also.

Lemma

Let \(A = [a_{i j}]\) be an \(n \times n\) diagonal matrix.

  1. Its eigen values are the entries on its main diagonal.
  2. The characteristic polynomial of \(A\) is \(p(\lambda) = (-1)^n (\lambda - a_{1 1}) (\lambda - a_{2 2}) \dots (\lambda - a_{n n})\).
  3. A scalar \(\lambda\) is an eigen value of \(A\) iff it is one of the diagonal entries of \(A\).
  4. The algebraic multiplicity of an eigen value \(\lambda\) is equal to the number of times it appears on the main diagonal of \(A\).
  5. The spectrum of \(A\) is given by the distinct entries on the main diagonal of \(A\).

There is also a result for the geometric multiplicity of eigen values for a diagonal matrix.

Lemma
Let \(A = [a_{i j}]\) be an \(n \times n\) diagonal matrix. The geometric multiplicity of an eigen value \(\lambda\) is equal to the number of times it appears on the main diagonal of \(A\).
Proof

The unit vectors \(e_i\) are eigen vectors for \(A\) since

\[A e_i = a_{i i } e_i.\]

They are linearly independent. Thus if a particular eigen value appears \(r\) times on the main diagonal, then there are \(r\) linearly independent eigen vectors for that eigen value. Thus its geometric multiplicity is equal to its algebraic multiplicity.

Similar matrices

Some very useful results are available for similar matrices.

Lemma
Similar matrices have the same characteristic polynomial and hence the same spectrum.
Proof

Let \(B\) be similar to \(A\). Thus there exists an invertible matrix \(C\) such that

\[B = C^{-1} A C.\]

Now

\[B - \lambda I = C^{-1} A C - \lambda I = C^{-1} A C - \lambda C^{-1} C = C^{-1} ( AC - \lambda C) = C^{-1} (A - \lambda I) C.\]

Thus \(B - \lambda I\) is similar to \(A - \lambda I\). Hence, due to here, their determinants are equal, i.e.

\[\det(B - \lambda I ) = \det(A - \lambda I ).\]

This means that the characteristic polynomials of \(A\) and \(B\) are the same. Since the eigen values are nothing but the roots of the characteristic polynomial, they are the same too. In particular, the spectrum (the set of distinct eigen values) is the same.
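
A numerical illustration (a minimal sketch assuming NumPy; the matrices \(A\) and \(C\) below are arbitrary examples):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
C = np.array([[1.0, 2.0],
              [0.0, 1.0]])            # any invertible matrix (assumption)

B = np.linalg.inv(C) @ A @ C          # B is similar to A

# Characteristic polynomials (and hence spectra) coincide.
print(np.allclose(np.poly(A), np.poly(B)))                # True
print(np.allclose(np.sort(np.linalg.eigvals(A)),
                  np.sort(np.linalg.eigvals(B))))         # True
```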

Corollary

If \(A\) and \(B\) are similar to each other then

  1. An eigen value has the same algebraic and geometric multiplicity for both \(A\) and \(B\).
  2. The (not necessarily distinct) eigen values of \(A\) and \(B\) are the same.

Although the eigen values are the same, the eigen vectors are, in general, different.

Lemma

Let \(A\) and \(B\) be similar with

\[B = C^{-1} A C\]

for some invertible matrix \(C\). If \(u\) is an eigen vector of \(A\) for an eigen value \(\lambda\), then \(C^{-1} u\) is an eigen vector of \(B\) for the same eigen value.

Proof

\(u\) is an eigen vector of \(A\) for an eigen value \(\lambda\). Thus we have

\[A u = \lambda u.\]

Thus

\[B C^{-1} u = C^{-1} A C C^{-1} u = C^{-1} A u = C^{-1} \lambda u = \lambda C^{-1} u.\]

Now \(u \neq 0\) and \(C^{-1}\) is non singular. Thus \(C^{-1} u \neq 0\). Thus \(C^{-1}u\) is an eigen vector of \(B\).

Theorem
Let \(\lambda\) be an eigen value of a square matrix \(A\). Then the geometric multiplicity of \(\lambda\) is less than or equal to its algebraic multiplicity.
Corollary
If an \(n \times n\) matrix \(A\) has \(n\) distinct eigen values, then each of them has a geometric (and algebraic) multiplicity of \(1\).
Proof
The algebraic multiplicity of an eigen value is greater than or equal to 1, and the sum of the algebraic multiplicities cannot exceed \(n\). Since there are \(n\) distinct eigen values, each of them has an algebraic multiplicity of \(1\). Now the geometric multiplicity of an eigen value is greater than or equal to 1 and less than or equal to its algebraic multiplicity; hence it too equals \(1\).
Corollary

Let an \(n \times n\) matrix \(A\) have \(k\) distinct eigen values \(\lambda_1, \lambda_2, \dots, \lambda_k\) with algebraic multiplicities \(r_1, r_2, \dots, r_k\) and geometric multiplicities \(g_1, g_2, \dots, g_k\) respectively. Then

\[\sum_{i=1}^k g_i \leq \sum_{i=1}^k r_i \leq n.\]

Moreover if

\[\sum_{i=1}^k g_i = \sum_{i=1}^k r_i\]

then

\[g_i = r_i \quad \text{for each } i.\]

Linear independence of eigen vectors

Theorem
Let \(A\) be an \(n\times n\) square matrix. Let \(x_1, x_2, \dots , x_k\) be any \(k\) eigen vectors of \(A\) for distinct eigen values \(\lambda_1, \lambda_2, \dots, \lambda_k\) respectively. Then \(x_1, x_2, \dots , x_k\) are linearly independent.
Proof

We first prove the simpler case with 2 eigen vectors \(x_1\) and \(x_2\) and corresponding eigen values \(\lambda_1\) and \(\lambda_2\) respectively.

Let there be a linear relationship between \(x_1\) and \(x_2\) given by

\[\alpha_1 x_1 + \alpha_2 x_2 = 0.\]

Multiplying both sides with \((A - \lambda_1 I)\) we get

\[\begin{split}\begin{aligned} & \alpha_1 (A - \lambda_1 I) x_1 + \alpha_2(A - \lambda_1 I) x_2 = 0\\ \implies & \alpha_1 (\lambda_1 - \lambda_1) x_1 + \alpha_2(\lambda_1 - \lambda_2) x_2 = 0\\ \implies & \alpha_2(\lambda_1 - \lambda_2) x_2 = 0. \end{aligned}\end{split}\]

Since \(\lambda_1 \neq \lambda_2\) and \(x_2 \neq 0\) , hence \(\alpha_2 = 0\).

Similarly by multiplying with \((A - \lambda_2 I)\) on both sides, we can show that \(\alpha_1 = 0\). Thus \(x_1\) and \(x_2\) are linearly independent.

Now for the general case, consider a linear relationship between \(x_1, x_2, \dots , x_k\) given by

\[\alpha_1 x_1 + \alpha_2 x_2 + \dots + \alpha_k x_k = 0.\]

Multiplying by \(\prod_{i \neq j, i=1}^k (A - \lambda_i I)\) and using the fact that \(\lambda_i \neq \lambda_j\) if \(i \neq j\), we get \(\alpha_j = 0\). Thus the only linear relationship is the trivial relationship. This completes the proof.

For an eigen value with geometric multiplicity greater than \(1\), there are multiple linearly independent eigen vectors corresponding to it. In this context, the above theorem can be generalized further.

Theorem
Let \(\lambda_1, \lambda_2, \dots, \lambda_k\) be \(k\) distinct eigen values of \(A\). Let \(\{x_1^j, x_2^j, \dots x_{g_j}^j\}\) be any \(g_j\) linearly independent eigen vectors from the eigen space of \(\lambda_j\) where \(g_j\) is the geometric multiplicity of \(\lambda_j\). Then the combined set of eigen vectors given by \(\{x_1^1, \dots x_{g_1}^1, \dots x_1^k, \dots x_{g_k}^k\}\) consisting of \(\sum_{j=1}^k g_j\) eigen vectors is linearly independent.

This result puts an upper limit on the number of linearly independent eigen vectors of a square matrix.

Lemma

Let \(\{ \lambda_1, \dots, \lambda_k \}\) represent the spectrum of an \(n \times n\) matrix \(A\). Let \(g_1, \dots, g_k\) be the geometric multiplicities of \(\lambda_1, \dots, \lambda_k\) respectively. Then the maximum number of linearly independent eigen vectors of \(A\) is

\[\sum_{i=1}^k g_i.\]

Moreover if

\[\sum_{i=1}^k g_i = n\]

then a set of \(n\) linearly independent eigen vectors of \(A\) can be found which forms a basis for \(\FF^n\).

Diagonalization

Diagonalization is one of the fundamental operations in linear algebra. This section discusses diagonalization of square matrices in depth.

Definition
An \(n \times n\) matrix \(A\) is said to be diagonalizable if it is similar to a diagonal matrix. In other words there exists an \(n\times n\) non-singular matrix \(P\) such that \(D = P^{-1} A P\) is a diagonal matrix. If this happens then we say that \(P\) diagonalizes \(A\) or \(A\) is diagonalized by \(P\).
Remark
\[D = P^{-1} A P \iff P D = A P \iff P D P^{-1} = A.\]

We note that if we restrict to real matrices, then \(P\) and \(D\) should also be real. If \(A \in \CC^{n \times n}\) (it may still be real) then \(P\) and \(D\) can be complex.
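
As a concrete sketch (assuming NumPy; the matrix is an arbitrary example with distinct eigen values, so it is certainly diagonalizable), `np.linalg.eig` directly produces a diagonalizing matrix \(P\):

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])            # an arbitrary diagonalizable example

# Columns of P are eigen vectors; D collects the corresponding eigen values.
d, P = np.linalg.eig(A)
D = np.diag(d)

# D = P^{-1} A P, equivalently A = P D P^{-1}.
print(np.allclose(np.linalg.inv(P) @ A @ P, D))   # True
print(np.allclose(P @ D @ np.linalg.inv(P), A))   # True
```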

The next theorem is the culmination of a variety of results studied so far.

Theorem

Let \(A\) be a diagonalizable matrix with \(D = P^{-1} A P\) being its diagonalization. Let \(D = \Diag(d_1, d_2, \dots, d_n)\). Then the following hold

  1. \(\Rank(A) = \Rank(D)\) which equals the number of non-zero entries on the main diagonal of \(D\).

  2. \(\det(A) = d_1 d_2 \dots d_n\).

  3. \(\Trace(A) = d_1 + d_2 + \dots + d_n\).

  4. The characteristic polynomial of \(A\) is

    \[p(\lambda) = (-1)^n (\lambda - d_1) (\lambda -d_2) \dots (\lambda - d_n).\]
  5. The spectrum of \(A\) comprises the distinct scalars on the diagonal entries in \(D\).

  6. The (not necessarily distinct) eigenvalues of \(A\) are the diagonal elements of \(D\).

  7. The columns of \(P\) are (linearly independent) eigenvectors of \(A\).

  8. The algebraic and geometric multiplicities of an eigenvalue \(\lambda\) of \(A\) equal the number of diagonal elements of \(D\) that equal \(\lambda\).

Proof

From here we note that \(D\) and \(A\) are similar. Due to here

\[\det(A) = \det(D).\]

Due to here

\[\det(D) = \prod_{i=1}^n d_i.\]

Now due to here

\[\Trace(A) = \Trace(D) = \sum_{i=1}^n d_i.\]

Further, due to here, the characteristic polynomial and spectrum of \(A\) and \(D\) are the same. Due to here, the eigen values of \(D\) are nothing but its diagonal entries. Hence they are also the eigen values of \(A\).

\[D = P^{-1} A P \implies A P = P D.\]

Now writing

\[P = \begin{bmatrix} p_1 & p_2 & \dots & p_n \end{bmatrix}\]

we have

\[A P = \begin{bmatrix} A p_1 & A p_2 & \dots & A p_n \end{bmatrix} = P D = \begin{bmatrix} d_1 p_1 & d_2 p_2 & \dots & d_n p_n \end{bmatrix}.\]

Thus \(p_i\) are eigen vectors of \(A\).

Since the characteristic polynomials of \(A\) and \(D\) are same, hence the algebraic multiplicities of eigen values are same.

From here we get that there is a one to one correspondence between the eigen vectors of \(A\) and \(D\) through the change of basis given by \(P\). Thus the linear independence relationships between the eigen vectors remain the same. Hence the geometric multiplicities of individual eigenvalues are also the same.

This completes the proof.

So far we have verified various results which are available if a matrix \(A\) is diagonalizable. We haven’t yet identified the conditions under which \(A\) is diagonalizable. We note that not every matrix is diagonalizable. The following theorem gives necessary and sufficient conditions under which a matrix is diagonalizable.

Theorem
An \(n \times n\) matrix \(A\) is diagonalizable by an \(n \times n\) non-singular matrix \(P\) if and only if the columns of \(P\) are (linearly independent) eigenvectors of \(A\).
Proof

We note that since \(P\) is non-singular hence columns of \(P\) have to be linearly independent.

The necessary condition part was proven in here. We now show that if \(P\) consists of \(n\) linearly independent eigen vectors of \(A\) then \(A\) is diagonalizable.

Let the columns of \(P\) be \(p_1, p_2, \dots, p_n\) and corresponding (not necessarily distinct) eigen values be \(d_1, d_2, \dots , d_n\). Then

\[A p_i = d_i p_i.\]

Thus by letting \(D = \Diag (d_1, d_2, \dots, d_n)\), we have

\[A P = P D.\]

Now since columns of \(P\) are linearly independent, hence \(P\) is invertible. This gives us

\[D = P^{-1} A P.\]

Thus \(A\) is similar to a diagonal matrix \(D\). This validates the sufficient condition.

A corollary follows.

Corollary
An \(n \times n\) matrix is diagonalizable if and only if there exists a linearly independent set of \(n\) eigenvectors of \(A\).

Now we know that geometric multiplicities of eigen values of \(A\) provide us information about linearly independent eigenvectors of \(A\).

Corollary

Let \(A\) be an \(n \times n\) matrix. Let \(\lambda_1, \lambda_2, \dots, \lambda_k\) be its \(k\) distinct eigen values (comprising its spectrum). Let \(g_j\) be the geometric multiplicity of \(\lambda_j\). Then \(A\) is diagonalizable if and only if

\[\sum_{j=1}^k g_j = n.\]

Symmetric matrices

This subsection is focused on real symmetric matrices.

Following is a fundamental property of real symmetric matrices.

Theorem
Every real symmetric matrix has a real eigen value.

The proof of this result is beyond the scope of this book.

Lemma
Let \(A\) be an \(n \times n\) real symmetric matrix. Let \(\lambda_1\) and \(\lambda_2\) be any two distinct eigen values of \(A\) and let \(x_1\) and \(x_2\) be any two corresponding eigen vectors. Then \(x_1\) and \(x_2\) are orthogonal.
Proof

By definition we have \(A x_1 = \lambda_1 x_1\) and \(A x_2 = \lambda_2 x_2\). Thus

\[\begin{split}\begin{aligned} & x_2^T A x_1 = \lambda_1 x_2^T x_1\\ \implies & x_1^T A^T x_2 = \lambda_1 x_1^T x_2 \\ \implies & x_1^T A x_2 = \lambda_1 x_1^T x_2\\ \implies & x_1^T \lambda_2 x_2 = \lambda_1 x_1^T x_2\\ \implies & (\lambda_1 - \lambda_2) x_1^T x_2 = 0 \\ \implies & x_1^T x_2 = 0. \end{aligned}\end{split}\]

Thus \(x_1\) and \(x_2\) are orthogonal. In between we took transpose on both sides, used the fact that \(A= A^T\) and \(\lambda_1 - \lambda_2 \neq 0\).

Definition

A real \(n \times n\) matrix \(A\) is said to be orthogonally diagonalizable if there exists an orthogonal matrix \(U\) which can diagonalize \(A\), i.e.

\[D = U^T A U\]

is a real diagonal matrix.

Lemma
Every orthogonally diagonalizable matrix \(A\) is symmetric.
Proof

We have a diagonal matrix \(D\) such that

\[A = U D U^T.\]

Taking transpose on both sides we get

\[A^T = U D^T U^T = U D U^T = A.\]

Thus \(A\) is symmetric.

Theorem
Every real symmetric matrix \(A\) is orthogonally diagonalizable.

We skip the proof of this theorem.
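
A numerical illustration of orthogonal diagonalization (a sketch assuming NumPy; `np.linalg.eigh` is intended for symmetric/Hermitian matrices and returns orthonormal eigen vectors; the matrix below is an arbitrary real symmetric example):

```python
import numpy as np

# An arbitrary real symmetric example matrix (assumption).
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

# eigh returns real eigen values and an orthogonal matrix of eigen vectors.
d, U = np.linalg.eigh(A)

print(np.allclose(U.T @ U, np.eye(3)))         # U is orthogonal
print(np.allclose(U.T @ A @ U, np.diag(d)))    # D = U^T A U is diagonal
```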

Hermitian matrices

Following is a fundamental property of Hermitian matrices.

Theorem
Every Hermitian matrix has an eigen value.

The proof of this result is beyond the scope of this book.

Lemma
The eigenvalues of a Hermitian matrix are real.
Proof

Let \(A\) be a Hermitian matrix and let \(\lambda\) be an eigen value of \(A\). Let \(u\) be a corresponding eigen vector. Then

\[\begin{split}\begin{aligned} & A u = \lambda u\\ \implies & u^H A^H = u^H \overline{\lambda} \\ \implies & u^H A^H u = u^H \overline{\lambda} u\\ \implies & u^H A u = \overline{\lambda} u^H u \\ \implies & u^H \lambda u = \overline{\lambda} u^H u \\ \implies &\|u\|_2^2 (\lambda - \overline{\lambda}) = 0\\ \implies & \lambda = \overline{\lambda} \end{aligned}\end{split}\]

thus \(\lambda\) is real. We used the facts that \(A = A^H\) and \(u \neq 0 \implies \|u\|_2 \neq 0\).
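
A quick numerical check (a sketch assuming NumPy; the Hermitian matrix below is an arbitrary example):

```python
import numpy as np

# A small Hermitian example (assumption): A equals its conjugate transpose.
A = np.array([[2.0, 1.0 + 1.0j],
              [1.0 - 1.0j, 3.0]])
assert np.allclose(A, A.conj().T)

# Even via the general complex eigen solver, the eigen values come out real
# up to numerical round-off.
eigvals = np.linalg.eigvals(A)
print(np.allclose(eigvals.imag, 0.0))   # True
```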

Lemma
Let \(A\) be an \(n \times n\) complex Hermitian matrix. Let \(\lambda_1\) and \(\lambda_2\) be any two distinct eigen values of \(A\) and let \(x_1\) and \(x_2\) be any two corresponding eigen vectors. Then \(x_1\) and \(x_2\) are orthogonal.
Proof

By definition we have \(A x_1 = \lambda_1 x_1\) and \(A x_2 = \lambda_2 x_2\). Thus

\[\begin{split}\begin{aligned} & x_2^H A x_1 = \lambda_1 x_2^H x_1\\ \implies & x_1^H A^H x_2 = \lambda_1 x_1^H x_2 \\ \implies & x_1^H A x_2 = \lambda_1 x_1^H x_2\\ \implies & x_1^H \lambda_2 x_2 = \lambda_1 x_1^H x_2\\ \implies & (\lambda_1 - \lambda_2) x_1^H x_2 = 0 \\ \implies & x_1^H x_2 = 0. \end{aligned}\end{split}\]

Thus \(x_1\) and \(x_2\) are orthogonal. In between we took conjugate transpose on both sides, used the fact that \(A= A^H\) and \(\lambda_1 - \lambda_2 \neq 0\).

Definition

A complex \(n \times n\) matrix \(A\) is said to be unitary diagonalizable if there exists a unitary matrix \(U\) which can diagonalize \(A\), i.e.

\[D = U^H A U\]

is a complex diagonal matrix.

Lemma
Let \(A\) be a unitary diagonalizable matrix whose diagonalization \(D\) is real. Then \(A\) is Hermitian.
Proof

We have a real diagonal matrix \(D\) such that

\[A = U D U^H.\]

Taking conjugate transpose on both sides we get

\[A^H = U D^H U^H = U D U^H = A.\]

Thus \(A\) is Hermitian. We used the fact that \(D^H = D\) since \(D\) is real.

Theorem
Every Hermitian matrix \(A\) is unitary diagonalizable.

We skip the proof of this theorem. The theorem means that if \(A\) is Hermitian, then \(A = U \Lambda U^H\) where \(U\) is a unitary matrix and \(\Lambda\) is a real diagonal matrix.

Definition

Let \(A\) be an \(n \times n\) Hermitian matrix. Let \(\lambda_1, \dots \lambda_n\) be its eigen values such that \(|\lambda_1| \geq |\lambda_2| \geq \dots \geq |\lambda_n |\). Let

\[\Lambda = \Diag(\lambda_1, \dots, \lambda_n).\]

Let \(U\) be a unitary matrix whose columns are orthonormal eigen vectors corresponding to \(\lambda_1, \dots, \lambda_n\). Then the eigen value decomposition of \(A\) is defined as

\[A = U \Lambda U^H.\]

If the \(\lambda_i\) are distinct, then the decomposition is unique. If they are not distinct, then the choice of eigen vectors, and hence the decomposition, is not unique.

Remark

Let \(\Lambda\) be a diagonal matrix as in here. Consider some vector \(x \in \CC^n\).

\[x^H \Lambda x = \sum_{i=1}^n \lambda_i | x_i |^2.\]

Now if \(\lambda_i \geq 0\) then

\[x^H \Lambda x \leq \lambda_1 \sum_{i=1}^n | x_i |^2 = \lambda_1 \| x \|_2^2.\]

Also

\[x^H \Lambda x \geq \lambda_n \sum_{i=1}^n | x_i |^2 = \lambda_n \| x \|_2^2.\]
Lemma

Let \(A\) be a Hermitian matrix with non-negative eigen values. Let \(\lambda_1\) be its largest and \(\lambda_n\) its smallest eigen value. Then

\[\lambda_n \| x\|_2^2 \leq x^H A x \leq \lambda_1 \|x \|_2^2 \Forall x \in \CC^n.\]
Proof

\(A\) has an eigen value decomposition given by

\[A = U \Lambda U^H.\]

Let \(x \in \CC^n\) and let \(v = U^H x\). Clearly \(\| x \|_2 = \| v \|_2\). Then

\[x^H A x = x^H U \Lambda U^H x = v^H \Lambda v.\]

From previous remark we have

\[\lambda_n \| v \|_2^2 \leq v^H \Lambda v \leq \lambda_1 \| v \|_2^2.\]

Thus we get

\[\lambda_n \| x \|_2^2 \leq x^H A x \leq \lambda_1 \| x \|_2^2.\]
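
A numerical check of these bounds (a sketch assuming NumPy; \(A = B^H B\) is used only as a convenient way to obtain a Hermitian matrix with non-negative eigen values):

```python
import numpy as np

rng = np.random.default_rng(0)

# A Hermitian matrix with non-negative eigen values (assumption: B^H B suffices).
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
A = B.conj().T @ B

eigvals = np.linalg.eigvalsh(A)          # returned in ascending order
lam_min, lam_max = eigvals[0], eigvals[-1]

x = rng.standard_normal(4) + 1j * rng.standard_normal(4)
quad = (x.conj() @ A @ x).real           # x^H A x is real for Hermitian A
norm2 = np.linalg.norm(x) ** 2

print(lam_min * norm2 <= quad <= lam_max * norm2)   # True
```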

Miscellaneous properties

This subsection lists some miscellaneous properties of eigen values of a square matrix.

Lemma
\(\lambda\) is an eigen value of \(A\) if and only if \(\lambda + k\) is an eigen value of \(A + k I\). Moreover \(A\) and \(A + kI\) share the same eigen vectors.
Proof
\[\begin{split}\begin{aligned} & A x = \lambda x \\ \iff & A x + k x = \lambda x + k x \\ \iff & (A + k I ) x = (\lambda + k) x. \end{aligned}\end{split}\]

Thus \(\lambda\) is an eigen value of \(A\) with an eigen vector \(x\) if and only if \(\lambda + k\) is an eigen value of \(A + kI\) with the same eigen vector \(x\).

Diagonally dominant matrices

Definition

Let \(A = [a_{ij}]\) be a square matrix in \(\CC^{n \times n}\). \(A\) is called diagonally dominant if

\[| a_{ii} | \geq \sum_{j \neq i } |a_{ij}|\]

holds true for all \(1 \leq i \leq n\), i.e. the absolute value of the diagonal element is greater than or equal to the sum of the absolute values of all the off-diagonal elements in that row.

Definition

Let \(A = [a_{ij}]\) be a square matrix in \(\CC^{n \times n}\). \(A\) is called strictly diagonally dominant if

\[| a_{ii} | > \sum_{j \neq i } |a_{ij}|\]

holds true for all \(1 \leq i \leq n\), i.e. the absolute value of the diagonal element is strictly greater than the sum of the absolute values of all the off-diagonal elements in that row.

ExampleStrictly diagonally dominant matrix

Let us consider

\[\begin{split}A = \begin{bmatrix} -4 & -2 & -1 & 0\\ -4 & 7 & 2 & 0\\ 3 & -4 & 9 & 1\\ 2 & -1 & -3 & 15 \end{bmatrix}\end{split}\]

We can see that the strict diagonal dominance condition is satisfied for each row as follows:

\[\begin{split}\begin{aligned} & \text{ row 1}: \quad & |-4| > |-2| + |-1| + |0| = 3 \\ & \text{ row 2}: \quad & |7| > |-4| + |2| + |0| = 6 \\ & \text{ row 3}: \quad & |9| > |3| + |-4| + |1| = 8 \\ & \text{ row 4}: \quad & |15| > |2| + |-1| + |-3| = 6 \end{aligned}\end{split}\]

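The row-wise condition is easy to check programmatically (a minimal sketch assuming NumPy; the helper function name is our own choice):

```python
import numpy as np

def is_strictly_diagonally_dominant(A):
    """Check |a_ii| > sum_{j != i} |a_ij| for every row of a square matrix."""
    A = np.asarray(A)
    abs_A = np.abs(A)
    diag = np.diag(abs_A)
    off_diag_sums = abs_A.sum(axis=1) - diag
    return bool(np.all(diag > off_diag_sums))

A = np.array([[-4, -2, -1,  0],
              [-4,  7,  2,  0],
              [ 3, -4,  9,  1],
              [ 2, -1, -3, 15]])

print(is_strictly_diagonally_dominant(A))   # True
```
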
Strictly diagonally dominant matrices have a very special property. They are always non-singular.

Theorem
Strictly diagonally dominant matrices are non-singular.
Proof

Suppose, for contradiction, that \(A\) is strictly diagonally dominant and singular. Then there exists a vector \(u \in \CC^n\) with \(u\neq 0\) such that

(2)\[A u = 0.\]

Let

\[u = \begin{bmatrix}u_1 & u_2 & \dots & u_n \end{bmatrix}^T.\]

We first show that the entries of \(u\) cannot all be equal in magnitude. Assume to the contrary that they are, i.e.

\[c = | u_1 | = | u_2 | = \dots = | u_n|.\]

Since \(u \neq 0\), we have \(c \neq 0\). Now for any row \(i\) in (2), we have

\[\begin{split}\begin{aligned} & \sum_{j=1}^n a_{ij} u_j = 0\\ \implies & \sum_{j=1}^n \pm a_{ij} c = 0\\ \implies & \sum_{j=1}^n \pm a_{ij} = 0\\ \implies & \mp a_{ii} = \sum_{j \neq i} \pm a_{ij}\\ \implies & |a_{ii}| = | \sum_{j \neq i} \pm a_{ij}|\\ \implies & |a_{ii}| \leq \sum_{j \neq i} |a_{ij}| \quad {\text{ using triangle inequality}} \end{aligned}\end{split}\]

but this contradicts our assumption that \(A\) is strictly diagonally dominant. Thus the entries of \(u\) cannot all be equal in magnitude.

Let us now assume that the largest entry in \(u\) lies at index \(i\) with \(|u_i| = c\). Without loss of generality we can scale down \(u\) by \(c\) to get another vector in which all entries are less than or equal to 1 in magnitude while \(i\)-th entry is \(\pm 1\). i.e. \(u_i = \pm 1\) and \(|u_j| \leq 1\) for all other entries.

Now from (2) we get for the \(i\)-th row

\[\begin{split}\begin{aligned} & \sum_{j=1}^n a_{ij} u_j = 0\\\ \implies & \pm a_{ii} = \sum_{j \neq i} u_j a_{ij}\\ \implies & |a_{ii}| \leq \sum_{j \neq i} |u_j a_{ij}| \leq \sum_{j \neq i} |a_{ij}| \end{aligned}\end{split}\]

which again contradicts our assumption that \(A\) is strictly diagonally dominant.

Hence strictly diagonally dominant matrices are non-singular.

Gershgorin’s theorem

We are now ready to examine Gershgorin's theorem which provides very useful bounds on the spectrum of a square matrix.

Theorem

Every eigen value \(\lambda\) of a square matrix \(A \in \CC^{n\times n}\) satisfies

(3)\[| \lambda - a_{ii}| \leq \sum_{j\neq i} |a_{ij}| \text{ for some } i \in \{1,2, \dots, n \}.\]
Proof

The proof is a straightforward application of the non-singularity of strictly diagonally dominant matrices.

We know that for an eigen value \(\lambda\), \(\det(\lambda I - A) = 0\) i.e. the matrix \((\lambda I - A)\) is singular. Hence it cannot be strictly diagonally dominant due to here.

Thus looking at each row \(i\) of \((\lambda I - A)\) we can say that

\[| \lambda - a_{ii}| > \sum_{j\neq i} |a_{ij}|\]

cannot be true for all rows simultaneously. i.e. it must fail at least for one row. This means that there exists at least one row \(i\) for which

\[| \lambda - a_{ii}| \leq \sum_{j\neq i} |a_{ij}|\]

holds true.

What this theorem means is pretty simple. Consider a disc in the complex plane for the \(i\)-th row of \(A\) whose center is given by \(a_{ii}\) and whose radius is given by \(r = \sum_{j\neq i} |a_{ij}|\) i.e. the sum of magnitudes of all non-diagonal entries in \(i\)-th row.

There are \(n\) such discs corresponding to \(n\) rows in \(A\). (3) means that every eigen value must lie within the union of these discs. It cannot lie outside.

This idea is crystallized in the following definition.

Definition

For the \(i\)-th row of the matrix \(A\) we define the radius \(r_i = \sum_{j\neq i} |a_{ij}|\) and the center \(c_i = a_{ii}\). Then the set given by

\[D_i = \{z \in \CC : | z - a_{ii} | \leq r_i \}\]

is called the \(i\)-th Gershgorin’s disc of \(A\).

We note that the definition is equally valid for real as well as complex matrices. For real matrices, the centers of the discs lie on the real line. For complex matrices, the centers may lie anywhere in the complex plane.
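
A small numerical illustration (a sketch assuming NumPy; the helper function name is our own choice and the matrix reuses the earlier strictly diagonally dominant example):

```python
import numpy as np

def gershgorin_discs(A):
    """Return (center, radius) pairs, one per row of A."""
    A = np.asarray(A)
    centers = np.diag(A)
    radii = np.abs(A).sum(axis=1) - np.abs(centers)
    return list(zip(centers, radii))

A = np.array([[-4, -2, -1,  0],
              [-4,  7,  2,  0],
              [ 3, -4,  9,  1],
              [ 2, -1, -3, 15]], dtype=float)

discs = gershgorin_discs(A)
eigvals = np.linalg.eigvals(A)

# Every eigen value lies in at least one Gershgorin disc.
for lam in eigvals:
    print(any(abs(lam - c) <= r for c, r in discs))   # True for each eigen value
```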

Clearly there is nothing magical about the rows of \(A\). We can as well consider the columns of \(A\).

Theorem

Every eigen value of a matrix \(A\) must lie in a Gershgorin disc corresponding to the columns of \(A\) where the Gershgorin disc for \(j\)-th column is given by

\[D_j = \{z \in \CC : | z - a_{jj} | \leq r_j \}\]

with

\[r_j = \sum_{i \neq j} |a_{ij}|.\]
Proof
We know that the eigen values of \(A\) are the same as those of \(A^T\), and the columns of \(A\) are nothing but the rows of \(A^T\). Hence the eigen values of \(A\) must satisfy the conditions in here w.r.t. the matrix \(A^T\). This completes the proof.