Eigen values¶
Much of the discussion in this section will be equally applicable to real as well as complex matrices. We will use the complex notation mostly and make specific remarks for real matrices wherever needed.
A scalar \(\lambda\) is an eigen value of an \(n \times n\) matrix \(A = [ a_{ij} ]\) if there exists a non null vector \(x\) such that
\[A x = \lambda x.\]
A non null vector \(x\) which satisfies this equation is called an eigen vector of \(A\) for the eigen value \(\lambda\).
An eigen value is also known as a characteristic value, proper value or a latent value.
We note that (1) can be written as
\[(A - \lambda I) x = 0.\]
Thus \(\lambda\) is an eigen value of \(A\) if and only if the matrix \(A - \lambda I\) is singular.
Assume that for \(x\) there are two eigen values \(\lambda_1\) and \(\lambda_2\), then
\[A x = \lambda_1 x = \lambda_2 x \implies (\lambda_1 - \lambda_2) x = 0.\]
This can happen only when either \(x = 0\) or \(\lambda_1 = \lambda_2\). Since \(x\) is an eigen vector, it cannot be 0. Thus \(\lambda_1 = \lambda_2\).
If \(x\) is an eigen vector for \(A\), then the corresponding eigen value is given by
\[\lambda = \frac{x^H A x}{x^H x}\]
since \(x\) is non-zero.
An eigen vector \(x\) of \(A\) for eigen value \(\lambda\) belongs to the null space of \(A - \lambda I\), i.e.
\[x \in \NullSpace(A - \lambda I).\]
In other words \(x\) is a nontrivial solution to the homogeneous system of linear equations given by
\[(A - \lambda I) z = 0.\]
The set comprising all the eigen vectors of \(A\) for an eigen value \(\lambda\) is given by
\[\NullSpace(A - \lambda I) \setminus \{ 0 \}\]
since \(0\) cannot be an eigen vector.
Clearly
\[\NullSpace(A - \lambda I) \neq \{ 0 \}.\]
A scalar \(\lambda\) can be an eigen value of a square matrix \(A\) if and only if
\[\det (A - \lambda I) = 0.\]
\(\det (A - \lambda I)\) is a polynomial in \(\lambda\) of degree \(n\). In fact
\[\det (A - \lambda I) = \alpha_n \lambda^n + \alpha_{n-1} \lambda^{n-1} + \dots + \alpha_1 \lambda + \alpha_0\]
where \(\alpha_i\) depend on entries in \(A\).
In this sense, an eigen value of \(A\) is a root of the equation
\[\alpha_n \lambda^n + \alpha_{n-1} \lambda^{n-1} + \dots + \alpha_1 \lambda + \alpha_0 = 0.\]
It is easy to show that \(\alpha_n = (-1)^n\).
For any square matrix \(A\), the polynomial given by \(p(\lambda) = \det(A - \lambda I )\) is known as its characteristic polynomial. The equation given by
\[p(\lambda) = 0\]
is known as its characteristic equation. The eigen values of \(A\) are the roots of its characteristic polynomial or solutions of its characteristic equation.
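As a numerical illustration (a minimal NumPy sketch; the matrix is an arbitrary choice), the roots of the characteristic equation coincide with the eigen values returned by a standard solver:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

# np.poly returns the coefficients of det(lambda I - A),
# the (monic) characteristic polynomial of A.
coeffs = np.poly(A)
print(np.sort(np.roots(coeffs)))      # roots of the characteristic equation
print(np.sort(np.linalg.eigvals(A)))  # eigen values computed directly; same
```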
For real square matrices, if we restrict eigen values to real values, then the characteristic polynomial can be factored as
\[p(\lambda) = (-1)^n (\lambda - \lambda_1)^{r_1} (\lambda - \lambda_2)^{r_2} \dots (\lambda - \lambda_k)^{r_k} q(\lambda).\]
The polynomial has \(k\) distinct real roots. For each root \(\lambda_i\), \(r_i\) is a positive integer indicating how many times the root appears. \(q(\lambda)\) is a polynomial that has no real roots. The following is true
\[r_1 + r_2 + \dots + r_k + \deg q(\lambda) = n.\]
Clearly \(k \leq n\).
For complex square matrices where eigen values can be complex (including real square matrices), the characteristic polynomial can be factored as
\[p(\lambda) = (-1)^n (\lambda - \lambda_1)^{r_1} (\lambda - \lambda_2)^{r_2} \dots (\lambda - \lambda_k)^{r_k}.\]
The polynomial can be completely factorized into first degree polynomials. There are \(k\) distinct roots or eigen values. The following is true
\[r_1 + r_2 + \dots + r_k = n.\]
Thus including the duplicates there are exactly \(n\) eigen values for a complex square matrix.
Let us consider the sum of \(r_i\) which gives the count of total number of roots of \(p(\lambda)\):
\[m = \sum_{i=1}^k r_i.\]
With this there are \(m\) not-necessarily distinct roots of \(p(\lambda)\). Let us write \(p(\lambda)\) as
\[p(\lambda) = (-1)^n (\lambda - c_1) (\lambda - c_2) \dots (\lambda - c_m) q(\lambda)\]
where \(c_1, c_2, \dots, c_m\) are \(m\) scalars (not necessarily distinct) of which \(r_1\) scalars are \(\lambda_1\), \(r_2\) are \(\lambda_2\) and so on. Obviously for the complex case \(q(\lambda)=1\).
We will refer to the set (allowing repetitions) \(\{c_1, c_2, \dots, c_m \}\) as the eigen values of the matrix \(A\) where \(c_i\) are not necessarily distinct. In contrast the spectrum of \(A\) refers to the set of distinct eigen values of \(A\). The symbol \(c\) has been chosen based on the other name for eigen values (the characteristic values).
We can put together eigen vectors of a matrix into another matrix by itself. This can be a very useful tool. We start with a simple idea.
Let \(A\) be an \(n \times n\) matrix. Let \(u_1, u_2, \dots, u_r\) be \(r\) non-zero vectors from \(\FF^n\). Let us construct an \(n \times r\) matrix
\[U = \begin{bmatrix} u_1 & u_2 & \dots & u_r \end{bmatrix}.\]
Then all the \(r\) vectors are eigen vectors of \(A\) if and only if there exists a diagonal matrix \(D = \Diag(d_1, \dots, d_r)\) such that
\[A U = U D.\]
Expanding the equation, we can write
\[A U = \begin{bmatrix} A u_1 & A u_2 & \dots & A u_r \end{bmatrix}
\quad \text{and} \quad
U D = \begin{bmatrix} d_1 u_1 & d_2 u_2 & \dots & d_r u_r \end{bmatrix}.\]
Clearly we want
\[A u_i = d_i u_i \quad \forall \; 1 \leq i \leq r\]
where \(u_i\) are non-zero. This is possible only when \(d_i\) is an eigen value of \(A\) and \(u_i\) is an eigen vector for \(d_i\).
Converse: Assume that \(u_i\) are eigen vectors. Choose \(d_i\) to be corresponding eigen values. Then the equation holds.
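The relation \(AU = UD\) can be checked numerically; in this sketch (illustrative matrix, NumPy assumed) the columns of \(U\) are eigen vectors:

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

# Columns of U are eigen vectors of A; d holds the corresponding eigen values.
d, U = np.linalg.eig(A)
D = np.diag(d)

# A U = U D holds if and only if every column of U is an eigen vector.
print(np.allclose(A @ U, U @ D))  # True
```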
Let \(0\) be an eigen value of \(A\). Then there exists \(u \neq 0\) such that
\[A u = 0 u = 0.\]
Thus \(u\) is a non-trivial solution of the homogeneous linear system \(A u = 0\). Thus \(A\) is singular.
Converse: Assuming that \(A\) is singular, there exists \(u \neq 0\) s.t.
\[A u = 0 = 0 \cdot u.\]
Thus \(0\) is an eigen value of \(A\).
The eigen values of \(A^T\) are given by
\[\det (A^T - \lambda I) = 0.\]
But
\[A^T - \lambda I = A^T - (\lambda I)^T = (A - \lambda I)^T.\]
Hence, using the fact that the determinant of a matrix equals the determinant of its transpose,
\[\det (A^T - \lambda I) = \det \left( (A - \lambda I)^T \right) = \det (A - \lambda I).\]
Thus the characteristic polynomials of \(A\) and \(A^T\) are the same. Hence the eigen values are the same. In other words, the spectra of \(A\) and \(A^T\) are the same.
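A quick numerical check with a randomly generated matrix (NumPy assumed):

```python
import numpy as np

A = np.random.default_rng(0).random((4, 4))

# A and its transpose share the characteristic polynomial,
# hence the same eigen values.
print(np.sort_complex(np.linalg.eigvals(A)))
print(np.sort_complex(np.linalg.eigvals(A.T)))  # same up to rounding
```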
If \(x\) is an eigen vector with a non-zero eigen value \(\lambda\) for \(A\) then \(Ax\) and \(x\) are collinear.
In other words, the angle between \(Ax\) and \(x\) is either \(0^{\circ}\) when \(\lambda\) is positive or \(180^{\circ}\) when \(\lambda\) is negative. Let us look at the inner product:
\[\langle A x, x \rangle = \langle \lambda x, x \rangle = \lambda \| x \|_2^2.\]
Meanwhile
\[\| A x \|_2 = \| \lambda x \|_2 = |\lambda| \| x \|_2.\]
Thus the angle \(\theta\) between \(Ax\) and \(x\) is given by
\[\cos \theta = \frac{\langle A x, x \rangle}{\| A x \|_2 \| x \|_2} = \frac{\lambda \| x \|_2^2}{|\lambda| \| x \|_2^2} = \frac{\lambda}{|\lambda|} = \pm 1.\]
For \(p=1\) the statement holds trivially since \(\lambda^1\) is an eigen value of \(A^1\). Assume that the statement holds for some value of \(p\). Thus let \(\lambda^p\) be an eigen value of \(A^{p}\) and let \(u\) be a corresponding eigen vector. Now
\[A^{p+1} u = A (A^p u) = A (\lambda^p u) = \lambda^p (A u) = \lambda^p \lambda u = \lambda^{p+1} u.\]
Thus \(\lambda^{p + 1}\) is an eigen value for \(A^{p + 1}\) with the same eigen vector \(u\). With the principle of mathematical induction, the proof is complete.
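This can be verified numerically; the matrix and the power \(p\) below are illustrative choices:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])
lam = np.linalg.eigvals(A)
p = 4

# Eigen values of A^p are the p-th powers of the eigen values of A.
print(np.sort(np.linalg.eigvals(np.linalg.matrix_power(A, p))))
print(np.sort(lam ** p))
```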
Let \(u \neq 0\) be an eigen vector of \(A\) for the eigen value \(\lambda\). Then
\[A u = \lambda u \implies u = \lambda A^{-1} u \implies A^{-1} u = \frac{1}{\lambda} u\]
where \(\lambda \neq 0\) since \(A\) is invertible.
Thus \(u\) is also an eigen vector of \(A^{-1}\) for the eigen value \(\frac{1}{\lambda}\).
Now let \(B = A^{-1}\). Then \(B^{-1} = A\). Thus if \(\mu\) is an eigen value of \(B\) then \(\frac{1}{\mu}\) is an eigen value of \(B^{-1} = A\).
Thus if \(A\) is invertible then eigen values of \(A\) and \(A^{-1}\) have one to one correspondence.
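A small sketch verifying the reciprocal correspondence (illustrative matrix, NumPy assumed):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

# Eigen values of the inverse are the reciprocals 1/lambda.
print(np.sort(np.linalg.eigvals(np.linalg.inv(A))))
print(np.sort(1.0 / np.linalg.eigvals(A)))
```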
This result is very useful. If it can be shown that a matrix \(A\) is similar to a diagonal or a triangular matrix whose eigen values are easy to obtain, then the determination of the eigen values of \(A\) becomes straightforward.
Invariant subspaces¶
Let \(A\) be a square \(n\times n\) matrix and let \(\WW\) be a subspace of \(\FF^n\), i.e. \(\WW \leq \FF^n\). Then \(\WW\) is invariant relative to \(A\) if
\[A w \in \WW \quad \forall \; w \in \WW,\]
i.e. \(A (\WW) \subseteq \WW\): for every vector \(w \in \WW\), its mapping \(A w\) is also in \(\WW\). Thus the action of \(A\) on \(\WW\) does not take us outside of \(\WW\).
We also say that \(\WW\) is \(A\)-invariant.
Eigen vectors are generators of invariant subspaces.
Let \(A\) be an \(n \times n\) matrix. Let \(x_1, x_2, \dots, x_r\) be \(r\) eigen vectors of \(A\). Let us construct an \(n \times r\) matrix
\[X = \begin{bmatrix} x_1 & x_2 & \dots & x_r \end{bmatrix}.\]
Then the column space of \(X\) i.e. \(\ColSpace(X)\) is invariant relative to \(A\).
Let us assume that \(c_1, c_2, \dots, c_r\) are the eigen values corresponding to \(x_1, x_2, \dots, x_r\) (not necessarily distinct).
Let any vector \(x \in \ColSpace(X)\) be given by
\[x = \sum_{i=1}^r \alpha_i x_i.\]
Then
\[A x = A \sum_{i=1}^r \alpha_i x_i = \sum_{i=1}^r \alpha_i A x_i = \sum_{i=1}^r \alpha_i c_i x_i.\]
Clearly \(Ax\) is also a linear combination of \(x_i\), hence it belongs to \(\ColSpace(X)\). Thus \(\ColSpace(X)\) is invariant relative to \(A\), i.e. \(\ColSpace(X)\) is \(A\)-invariant.
Triangular matrices¶
If \(A\) is triangular then \(A - \lambda I\) is also triangular with its diagonal entries being \((a_{i i} - \lambda)\). Since the determinant of a triangular matrix is the product of its diagonal entries, we have
\[p(\lambda) = \det (A - \lambda I) = \prod_{i=1}^n (a_{i i} - \lambda).\]
Clearly the roots of the characteristic polynomial are \(a_{i i}\).
Several small results follow from this lemma.
Let \(A = [a_{i j}]\) be an \(n \times n\) triangular matrix.
- The characteristic polynomial of \(A\) is \(p(\lambda) = (-1)^n \prod_{i=1}^n (\lambda - a_{i i})\).
- A scalar \(\lambda\) is an eigen value of \(A\) iff it is one of the diagonal entries of \(A\).
- The algebraic multiplicity of an eigen value \(\lambda\) is equal to the number of times it appears on the main diagonal of \(A\).
- The spectrum of \(A\) is given by the distinct entries on the main diagonal of \(A\).
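The lemma is easy to verify numerically; the triangular matrix below is an arbitrary example:

```python
import numpy as np

# An upper triangular matrix: eigen values are the diagonal entries.
A = np.array([[1.0, 5.0, -2.0],
              [0.0, 4.0,  7.0],
              [0.0, 0.0,  4.0]])

print(np.sort(np.linalg.eigvals(A)))  # [1. 4. 4.]
print(np.sort(np.diag(A)))            # same; 4 has algebraic multiplicity 2
```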
A diagonal matrix is naturally both an upper triangular matrix as well as a lower triangular matrix. Similar results hold for the eigen values of a diagonal matrix also.
Let \(A = [a_{i j}]\) be an \(n \times n\) diagonal matrix.
- Its eigen values are the entries on its main diagonal.
- The characteristic polynomial of \(A\) is \(p(\lambda) = (-1)^n \prod_{i=1}^n (\lambda - a_{i i})\).
- A scalar \(\lambda\) is an eigen value of \(A\) iff it is one of the diagonal entries of \(A\).
- The algebraic multiplicity of an eigen value \(\lambda\) is equal to the number of times it appears on the main diagonal of \(A\).
- The spectrum of \(A\) is given by the distinct entries on the main diagonal of \(A\).
There is also a result for the geometric multiplicity of eigen values for a diagonal matrix.
The unit vectors \(e_i\) are eigen vectors for \(A\) since
\[A e_i = a_{i i} e_i.\]
They are linearly independent. Thus if a particular eigen value appears \(r\) times on the diagonal, then there are \(r\) linearly independent eigen vectors for that eigen value. Thus its geometric multiplicity is equal to the algebraic multiplicity.
Similar matrices¶
Some very useful results are available for similar matrices.
Let \(B\) be similar to \(A\). Thus there exists an invertible matrix \(C\) such that
\[B = C^{-1} A C.\]
Now
\[B - \lambda I = C^{-1} A C - \lambda I = C^{-1} A C - \lambda C^{-1} C = C^{-1} (A - \lambda I) C.\]
Thus \(B - \lambda I\) is similar to \(A - \lambda I\). Hence, since similar matrices have equal determinants,
\[\det (B - \lambda I) = \det (A - \lambda I).\]
This means that the characteristic polynomials of \(A\) and \(B\) are the same. Since eigen values are nothing but roots of the characteristic polynomial, they are the same too. This means that the spectrum (the set of distinct eigen values) is the same.
If \(A\) and \(B\) are similar to each other then
- An eigen value has the same algebraic and geometric multiplicity for both \(A\) and \(B\).
- The (not necessarily distinct) eigen values of \(A\) and \(B\) are the same.
Although the eigen values are the same, the eigen vectors are different.
Let \(A\) and \(B\) be similar with
\[B = C^{-1} A C\]
for some invertible matrix \(C\). If \(u\) is an eigen vector of \(A\) for an eigen value \(\lambda\), then \(C^{-1} u\) is an eigen vector of \(B\) for the same eigen value.
\(u\) is an eigen vector of \(A\) for an eigen value \(\lambda\). Thus we have
\[A u = \lambda u.\]
Thus
\[B (C^{-1} u) = (C^{-1} A C) (C^{-1} u) = C^{-1} A u = \lambda (C^{-1} u).\]
Now \(u \neq 0\) and \(C^{-1}\) is non singular. Thus \(C^{-1} u \neq 0\). Thus \(C^{-1}u\) is an eigen vector of \(B\).
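A sketch of this correspondence with randomly generated \(A\) and \(C\) (NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.random((3, 3))
C = rng.random((3, 3))        # invertible with probability one
B = np.linalg.inv(C) @ A @ C  # B is similar to A

lam, u = np.linalg.eig(A)
v = np.linalg.inv(C) @ u[:, 0]
# C^{-1} u is an eigen vector of B for the same eigen value.
print(np.allclose(B @ v, lam[0] * v))  # True
```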
Let an \(n \times n\) matrix \(A\) have \(k\) distinct eigen values \(\lambda_1, \lambda_2, \dots, \lambda_k\) with algebraic multiplicities \(r_1, r_2, \dots, r_k\) and geometric multiplicities \(g_1, g_2, \dots g_k\) respectively. Then
\[1 \leq g_i \leq r_i \leq n \quad \forall \; 1 \leq i \leq k.\]
Moreover if
\[\sum_{i=1}^k g_i = n\]
then
\[g_i = r_i \quad \forall \; 1 \leq i \leq k.\]
Linear independence of eigen vectors¶
We first prove the simpler case with 2 eigen vectors \(x_1\) and \(x_2\) and corresponding eigen values \(\lambda_1\) and \(\lambda_2\) respectively.
Let there be a linear relationship between \(x_1\) and \(x_2\) given by
\[\alpha_1 x_1 + \alpha_2 x_2 = 0.\]
Multiplying both sides with \((A - \lambda_1 I)\) we get
\[\alpha_1 (\lambda_1 - \lambda_1) x_1 + \alpha_2 (\lambda_2 - \lambda_1) x_2 = \alpha_2 (\lambda_2 - \lambda_1) x_2 = 0.\]
Since \(\lambda_1 \neq \lambda_2\) and \(x_2 \neq 0\), hence \(\alpha_2 = 0\).
Similarly by multiplying with \((A - \lambda_2 I)\) on both sides, we can show that \(\alpha_1 = 0\). Thus \(x_1\) and \(x_2\) are linearly independent.
Now for the general case, consider a linear relationship between \(x_1, x_2, \dots , x_k\) given by
\[\sum_{i=1}^k \alpha_i x_i = 0.\]
Multiplying by \(\prod_{i=1, i \neq j}^k (A - \lambda_i I)\) and using the fact that \(\lambda_i \neq \lambda_j\) if \(i \neq j\), we get \(\alpha_j = 0\) for each \(j\). Thus the only linear relationship is the trivial relationship. This completes the proof.
For eigen values with geometric multiplicity greater than \(1\), there are multiple linearly independent eigen vectors corresponding to the eigen value. In this context, the above theorem can be generalized further.
This result puts an upper limit on the number of linearly independent eigen vectors of a square matrix.
Let \(\{ \lambda_1, \dots, \lambda_k \}\) represent the spectrum of an \(n \times n\) matrix \(A\). Let \(g_1, \dots, g_k\) be the geometric multiplicities of \(\lambda_1, \dots \lambda_k\) respectively. Then the number of linearly independent eigen vectors for \(A\) is
\[\sum_{i=1}^k g_i.\]
Moreover if
\[\sum_{i=1}^k g_i = n\]
then a set of \(n\) linearly independent eigen vectors of \(A\) can be found which forms a basis for \(\FF^n\).
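Geometric multiplicity can be computed as the dimension of the null space of \(A - \lambda I\); the sketch below uses the standard example of a defective Jordan block:

```python
import numpy as np

# A defective matrix: the eigen value 2 has algebraic multiplicity 2
# but geometric multiplicity 1 (a Jordan block).
A = np.array([[2.0, 1.0],
              [0.0, 2.0]])
n = A.shape[0]

# Geometric multiplicity = dimension of the null space of (A - 2 I).
g = n - np.linalg.matrix_rank(A - 2.0 * np.eye(n))
print(g)  # 1 < n, so A has no basis of eigen vectors
```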
Diagonalization¶
Diagonalization is one of the fundamental operations in linear algebra. This section discusses diagonalization of square matrices in depth.
We note that if we restrict to real matrices, then \(P\) and \(D\) should also be real. If \(A \in \CC^{n \times n}\) (it may still be real) then \(P\) and \(D\) can be complex.
The next theorem is the culmination of a variety of results studied so far.
Let \(A\) be a diagonalizable matrix with \(D = P^{-1} A P\) being its diagonalization. Let \(D = \Diag(d_1, d_2, \dots, d_n)\). Then the following hold
- \(\Rank(A) = \Rank(D)\) which equals the number of non-zero entries on the main diagonal of \(D\).
- \(\det(A) = d_1 d_2 \dots d_n\).
- \(\Trace(A) = d_1 + d_2 + \dots + d_n\).
- The characteristic polynomial of \(A\) is
\[p(\lambda) = (-1)^n (\lambda - d_1) (\lambda -d_2) \dots (\lambda - d_n).\]
- The spectrum of \(A\) comprises the distinct scalars on the diagonal entries in \(D\).
- The (not necessarily distinct) eigen values of \(A\) are the diagonal elements of \(D\).
- The columns of \(P\) are (linearly independent) eigen vectors of \(A\).
- The algebraic and geometric multiplicities of an eigen value \(\lambda\) of \(A\) equal the number of diagonal elements of \(D\) that equal \(\lambda\).
From \(D = P^{-1} A P\) we note that \(D\) and \(A\) are similar. Since similar matrices have the same rank,
\[\Rank(A) = \Rank(D)\]
and the rank of a diagonal matrix equals the number of non-zero entries on its main diagonal. Since similar matrices have the same determinant,
\[\det(A) = \det(D) = d_1 d_2 \dots d_n.\]
Since similar matrices have the same trace,
\[\Trace(A) = \Trace(D) = d_1 + d_2 + \dots + d_n.\]
Further, since similar matrices have the same characteristic polynomial and spectrum, the characteristic polynomial and spectrum of \(A\) and \(D\) coincide. The eigen values of a diagonal matrix \(D\) are nothing but its diagonal entries. Hence they are also the eigen values of \(A\).
Now writing
\[P = \begin{bmatrix} p_1 & p_2 & \dots & p_n \end{bmatrix}\]
we have
\[A P = P D \implies A p_i = d_i p_i \quad \forall \; 1 \leq i \leq n.\]
Thus \(p_i\) are eigen vectors of \(A\).
Since the characteristic polynomials of \(A\) and \(D\) are the same, the algebraic multiplicities of the eigen values are the same.
Finally, there is a one to one correspondence between the eigen vectors of \(A\) and \(D\) through the change of basis given by \(P\). Thus the linear independence relationships between the eigen vectors remain the same. Hence the geometric multiplicities of individual eigen values are also the same.
This completes the proof.
So far we have verified various results which are available if a matrix \(A\) is diagonalizable. We haven’t yet identified the conditions under which \(A\) is diagonalizable. We note that not every matrix is diagonalizable. The following theorem gives necessary and sufficient conditions under which a matrix is diagonalizable.
We note that since \(P\) is non-singular hence columns of \(P\) have to be linearly independent.
The necessary condition part was proven above. We now show that if \(P\) consists of \(n\) linearly independent eigen vectors of \(A\) then \(A\) is diagonalizable.
Let the columns of \(P\) be \(p_1, p_2, \dots, p_n\) and the corresponding (not necessarily distinct) eigen values be \(d_1, d_2, \dots , d_n\). Then
\[A p_i = d_i p_i \quad \forall \; 1 \leq i \leq n.\]
Thus by letting \(D = \Diag (d_1, d_2, \dots, d_n)\), we have
\[A P = P D.\]
Now since the columns of \(P\) are linearly independent, \(P\) is invertible. This gives us
\[D = P^{-1} A P.\]
Thus \(A\) is similar to a diagonal matrix \(D\). This validates the sufficient condition.
A corollary follows.
Now we know that geometric multiplicities of eigen values of \(A\) provide us information about linearly independent eigenvectors of \(A\).
Let \(A\) be an \(n \times n\) matrix. Let \(\lambda_1, \lambda_2, \dots, \lambda_k\) be its \(k\) distinct eigen values (comprising its spectrum). Let \(g_j\) be the geometric multiplicity of \(\lambda_j\). Then \(A\) is diagonalizable if and only if
\[\sum_{j=1}^k g_j = n.\]
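A minimal sketch of diagonalization (illustrative matrix; `np.linalg.eig` returns the eigen values and the matrix of eigen vectors \(P\)):

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

d, P = np.linalg.eig(A)  # columns of P are eigen vectors of A
D = np.diag(d)

# With n linearly independent eigen vectors, P is invertible
# and D = P^{-1} A P is the diagonalization of A.
print(np.allclose(np.linalg.inv(P) @ A @ P, D))  # True
```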
Symmetric matrices¶
This subsection is focused on real symmetric matrices.
Following is a fundamental property of real symmetric matrices.
The proof of this result is beyond the scope of this book.
By definition we have \(A x_1 = \lambda_1 x_1\) and \(A x_2 = \lambda_2 x_2\). Thus
\[\lambda_1 x_1^T x_2 = (\lambda_1 x_1)^T x_2 = (A x_1)^T x_2 = x_1^T A^T x_2 = x_1^T A x_2 = \lambda_2 x_1^T x_2
\implies (\lambda_1 - \lambda_2) x_1^T x_2 = 0 \implies x_1^T x_2 = 0.\]
Thus \(x_1\) and \(x_2\) are orthogonal. In between we took the transpose on both sides and used the facts that \(A = A^T\) and \(\lambda_1 - \lambda_2 \neq 0\).
A real \(n \times n\) matrix \(A\) is said to be orthogonally diagonalizable if there exists an orthogonal matrix \(U\) which can diagonalize \(A\), i.e.
\[D = U^T A U\]
is a real diagonal matrix.
We have a diagonal matrix \(D\) such that
\[A = U D U^T.\]
Taking the transpose on both sides we get
\[A^T = (U D U^T)^T = U D^T U^T = U D U^T = A.\]
Thus \(A\) is symmetric.
We skip the proof of this theorem.
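A short sketch of orthogonal diagonalization (illustrative symmetric matrix; `np.linalg.eigh` is NumPy's solver for the symmetric/Hermitian case):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])  # real symmetric

# eigh is specialized for symmetric matrices: it returns real eigen
# values and orthonormal eigen vectors.
lam, U = np.linalg.eigh(A)
print(np.allclose(U.T @ U, np.eye(2)))         # U is orthogonal
print(np.allclose(U.T @ A @ U, np.diag(lam)))  # U^T A U is diagonal
```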
Hermitian matrices¶
Following is a fundamental property of Hermitian matrices.
The proof of this result is beyond the scope of this book.
Let \(A\) be a Hermitian matrix and let \(\lambda\) be an eigen value of \(A\). Let \(u\) be a corresponding eigen vector. Then
\[\lambda \| u \|_2^2 = \lambda u^H u = u^H (\lambda u) = u^H A u = u^H A^H u = (A u)^H u = (\lambda u)^H u = \overline{\lambda} u^H u = \overline{\lambda} \| u \|_2^2,\]
thus \(\lambda = \overline{\lambda}\), i.e. \(\lambda\) is real. We used the facts that \(A = A^H\) and \(u \neq 0 \implies \|u\|_2 \neq 0\).
By definition we have \(A x_1 = \lambda_1 x_1\) and \(A x_2 = \lambda_2 x_2\). Thus
\[\lambda_1 x_1^H x_2 = (\lambda_1 x_1)^H x_2 = (A x_1)^H x_2 = x_1^H A^H x_2 = x_1^H A x_2 = \lambda_2 x_1^H x_2
\implies (\lambda_1 - \lambda_2) x_1^H x_2 = 0 \implies x_1^H x_2 = 0.\]
Thus \(x_1\) and \(x_2\) are orthogonal. In between we took the conjugate transpose on both sides and used the facts that \(A = A^H\), that the eigen values of a Hermitian matrix are real, and \(\lambda_1 - \lambda_2 \neq 0\).
A complex \(n \times n\) matrix \(A\) is said to be unitarily diagonalizable if there exists a unitary matrix \(U\) which can diagonalize \(A\), i.e.
\[D = U^H A U\]
is a complex diagonal matrix.
We have a real diagonal matrix \(D\) such that
\[A = U D U^H.\]
Taking the conjugate transpose on both sides we get
\[A^H = (U D U^H)^H = U D^H U^H = U D U^H = A.\]
Thus \(A\) is Hermitian. We used the fact that \(D^H = D\) since \(D\) is real.
We skip the proof of this theorem. The theorem means that if \(A\) is Hermitian then \(A = U \Lambda U^H\) where \(U\) is unitary and \(\Lambda\) is a real diagonal matrix.
Let \(A\) be an \(n \times n\) Hermitian matrix. Let \(\lambda_1, \dots \lambda_n\) be its eigen values such that \(|\lambda_1| \geq |\lambda_2| \geq \dots \geq |\lambda_n |\). Let
\[\Lambda = \Diag(\lambda_1, \lambda_2, \dots, \lambda_n).\]
Let \(U\) be a unitary matrix consisting of orthonormal eigen vectors corresponding to \(\lambda_1, \dots, \lambda_n\). Then the eigen value decomposition of \(A\) is defined as
\[A = U \Lambda U^H.\]
If \(\lambda_i\) are distinct, then the decomposition is unique. If they are not distinct, then it need not be unique.
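A sketch of the eigen value decomposition for an illustrative Hermitian matrix; note that `np.linalg.eigh` orders eigen values in ascending order rather than by magnitude:

```python
import numpy as np

# A Hermitian matrix: A equals its conjugate transpose.
A = np.array([[2.0, 1.0 - 1.0j],
              [1.0 + 1.0j, 3.0]])

lam, U = np.linalg.eigh(A)  # eigen values of a Hermitian matrix are real
print(lam)
print(np.allclose(U @ np.diag(lam) @ U.conj().T, A))  # A = U Lambda U^H
```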
Let \(\Lambda\) be a diagonal matrix as above. Consider some vector \(x \in \CC^n\). Then
\[x^H \Lambda x = \sum_{i=1}^n \lambda_i | x_i |^2.\]
Now if \(\lambda_i \geq 0\) then
\[x^H \Lambda x \geq 0.\]
Also
\[\lambda_n \| x \|_2^2 = \lambda_n \sum_{i=1}^n |x_i|^2 \leq x^H \Lambda x \leq \lambda_1 \sum_{i=1}^n |x_i|^2 = \lambda_1 \| x \|_2^2\]
where \(\lambda_1\) is the largest and \(\lambda_n\) is the smallest eigen value.
Let \(A\) be a Hermitian matrix with non-negative eigen values. Let \(\lambda_1\) be its largest and \(\lambda_n\) be its smallest eigen values.
\(A\) has an eigen value decomposition given by
\[A = U \Lambda U^H.\]
Let \(x \in \CC^n\) and let \(v = U^H x\). Clearly \(\| x \|_2 = \| v \|_2\). Then
\[x^H A x = x^H U \Lambda U^H x = v^H \Lambda v.\]
From the previous remark we have
\[\lambda_n \| v \|_2^2 \leq v^H \Lambda v \leq \lambda_1 \| v \|_2^2.\]
Thus we get
\[\lambda_n \| x \|_2^2 \leq x^H A x \leq \lambda_1 \| x \|_2^2.\]
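These bounds can be checked numerically; the matrix below is randomly generated so that its eigen values are positive:

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.random((3, 3))
A = B @ B.T + 3.0 * np.eye(3)  # symmetric with positive eigen values

lam = np.linalg.eigvalsh(A)    # ascending: lam[0] smallest, lam[-1] largest
x = rng.random(3)

q = x @ A @ x                  # x^H A x for a real vector x
print(lam[0] * (x @ x) <= q <= lam[-1] * (x @ x))  # True
```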
Miscellaneous properties¶
This subsection lists some miscellaneous properties of eigen values of a square matrix.
If \(A x = \lambda x\) then
\[(A + k I) x = A x + k x = \lambda x + k x = (\lambda + k) x.\]
Thus \(\lambda\) is an eigen value of \(A\) with an eigen vector \(x\) if and only if \(\lambda + k\) is an eigen value of \(A + kI\) with the same eigen vector \(x\).
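A quick check of this shift property (illustrative matrix and shift):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 1.0]])
k = 5.0

# The spectrum of A + kI is the spectrum of A shifted by k.
print(np.sort(np.linalg.eigvals(A)))                      # [-1.  3.]
print(np.sort(np.linalg.eigvals(A + k * np.eye(2))) - k)  # same values
```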
Diagonally dominant matrices¶
Let \(A = [a_{ij}]\) be a square matrix in \(\CC^{n \times n}\). \(A\) is called diagonally dominant if
\[| a_{ii} | \geq \sum_{j \neq i} | a_{ij} |\]
holds true for all \(1 \leq i \leq n\), i.e. the absolute value of the diagonal element is greater than or equal to the sum of absolute values of all the off diagonal elements on that row.
Let \(A = [a_{ij}]\) be a square matrix in \(\CC^{n \times n}\). \(A\) is called strictly diagonally dominant if
\[| a_{ii} | > \sum_{j \neq i} | a_{ij} |\]
holds true for all \(1 \leq i \leq n\), i.e. the absolute value of the diagonal element is bigger than the sum of absolute values of all the off diagonal elements on that row.
Let us consider an example. The strict diagonal dominance condition can be verified for each row, as in the sketch below.
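A small sketch with an illustrative \(3 \times 3\) matrix (chosen here for demonstration):

```python
import numpy as np

def is_strictly_diagonally_dominant(A):
    """Check |a_ii| > sum_{j != i} |a_ij| for every row i."""
    A = np.asarray(A)
    diag = np.abs(np.diag(A))
    off = np.sum(np.abs(A), axis=1) - diag
    return bool(np.all(diag > off))

A = np.array([[ 4.0, -1.0,  2.0],
              [ 1.0,  6.0, -3.0],
              [-2.0,  1.0,  5.0]])

print(is_strictly_diagonally_dominant(A))  # True: 4 > 3, 6 > 4, 5 > 3
print(np.linalg.det(A))                    # non-zero, so A is non-singular
```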
Strictly diagonally dominant matrices have a very special property. They are always non-singular.
Suppose that \(A\) is strictly diagonally dominant and singular. Then there exists a vector \(u \in \CC^n\) with \(u \neq 0\) such that
\[A u = 0. \tag{2}\]
Let
\[c = \max \{ |u_j| \; : \; 1 \leq j \leq n \}.\]
We first show that every entry in \(u\) cannot be equal in magnitude. Let us assume that this is so, i.e.
\[|u_j| = c \quad \forall \; 1 \leq j \leq n.\]
Since \(u \neq 0\), hence \(c \neq 0\). Now for any row \(i\) in (2), we have
\[\sum_{j=1}^n a_{ij} u_j = 0 \implies a_{ii} u_i = - \sum_{j \neq i} a_{ij} u_j
\implies |a_{ii}| c = \left| \sum_{j \neq i} a_{ij} u_j \right| \leq \sum_{j \neq i} |a_{ij}| |u_j| = c \sum_{j \neq i} |a_{ij}|
\implies |a_{ii}| \leq \sum_{j \neq i} |a_{ij}|\]
but this contradicts our assumption that \(A\) is strictly diagonally dominant. Thus all entries in \(u\) are not equal in magnitude.
Let us now assume that the largest entry in \(u\) lies at index \(i\) with \(|u_i| = c\). Without loss of generality we can scale down \(u\) by \(c\) to get another vector in which all entries are less than or equal to \(1\) in magnitude while the \(i\)-th entry has magnitude \(1\), i.e. \(|u_i| = 1\) and \(|u_j| \leq 1\) for all other entries.
Now from (2) we get for the \(i\)-th row
\[a_{ii} u_i = - \sum_{j \neq i} a_{ij} u_j
\implies |a_{ii}| = \left| \sum_{j \neq i} a_{ij} u_j \right| \leq \sum_{j \neq i} |a_{ij}| |u_j| \leq \sum_{j \neq i} |a_{ij}|\]
which again contradicts our assumption that \(A\) is strictly diagonally dominant.
Hence strictly diagonally dominant matrices are non-singular.
Gershgorin’s theorem¶
We are now ready to examine Gershgorin's theorem which provides very useful bounds on the spectrum of a square matrix.
Every eigen value \(\lambda\) of a square matrix \(A \in \CC^{n\times n}\) satisfies
\[| \lambda - a_{ii} | \leq \sum_{j \neq i} |a_{ij}| \quad \text{for some } i \in \{1, 2, \dots, n \}. \tag{3}\]
The proof is a straightforward application of the non-singularity of strictly diagonally dominant matrices.
We know that for an eigen value \(\lambda\), \(\det(\lambda I - A) = 0\), i.e. the matrix \((\lambda I - A)\) is singular. Hence it cannot be strictly diagonally dominant, due to the result above.
Thus looking at each row \(i\) of \((\lambda I - A)\) we can say that
\[| \lambda - a_{ii} | > \sum_{j \neq i} |a_{ij}|\]
cannot be true for all rows simultaneously, i.e. it must fail at least for one row. This means that there exists at least one row \(i\) for which
\[| \lambda - a_{ii} | \leq \sum_{j \neq i} |a_{ij}|\]
holds true.
What this theorem means is pretty simple. Consider a disc in the complex plane for the \(i\)-th row of \(A\) whose center is given by \(a_{ii}\) and whose radius is given by \(r = \sum_{j\neq i} |a_{ij}|\) i.e. the sum of magnitudes of all non-diagonal entries in \(i\)-th row.
There are \(n\) such discs corresponding to \(n\) rows in \(A\). (3) means that every eigen value must lie within the union of these discs. It cannot lie outside.
This idea is crystallized in following definition.
For the \(i\)-th row of matrix \(A\) we define the radius \(r_i = \sum_{j\neq i} |a_{ij}|\) and the center \(c_i = a_{ii}\). Then the set given by
\[D_i = \{ z \in \CC \; : \; | z - c_i | \leq r_i \}\]
is called the \(i\)-th Gershgorin’s disc of \(A\).
We note that the definition is equally valid for real as well as complex matrices. For real matrices, the centers of disks lie on the real line. For complex matrices, the centers may lie anywhere in the complex plane.
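The discs are easy to compute numerically; the sketch below (illustrative matrix) verifies that each eigen value falls inside at least one disc:

```python
import numpy as np

A = np.array([[ 4.0, -1.0,  2.0],
              [ 1.0,  6.0, -3.0],
              [-2.0,  1.0,  5.0]])

centers = np.diag(A)
radii = np.sum(np.abs(A), axis=1) - np.abs(centers)

# Every eigen value lies in at least one Gershgorin disc.
for lam in np.linalg.eigvals(A):
    print(lam, bool(np.any(np.abs(lam - centers) <= radii)))
```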
Clearly there is nothing magical about the rows of \(A\). We can as well consider the columns of \(A\).
Every eigen value of a matrix \(A\) must lie in a Gershgorin disc corresponding to the columns of \(A\), where the Gershgorin disc for the \(j\)-th column is given by
\[D_j = \{ z \in \CC \; : \; | z - c_j | \leq r_j \}\]
with
\[r_j = \sum_{i \neq j} |a_{ij}|\]
and center \(c_j = a_{jj}\). This follows by applying the theorem above to \(A^T\), which has the same eigen values as \(A\).