Introduction to compressive sensing

In this section we formally define the problem of compressed sensing.

Compressive sensing refers to the idea that for sparse or compressible signals, a small number of nonadaptive measurements carries sufficient information to approximate the signal well. In the literature it is also known as compressed sensing and compressive sampling; different authors prefer different names.

In this section we will denote a signal dictionary, as well as its synthesis matrix, by \(\DD\).

We recall the definition of sparse signals. A signal \(x \in \CC^N\) is \(K\)-sparse in \(\DD\) if there exists a representation \(\alpha\) for \(x\) which has at most \(K\) non-zero entries, i.e.

\[x = \DD \alpha\]

and

\[\| \alpha \|_0 \leq K.\]

The dictionary could be the standard basis, a Fourier basis, a wavelet basis, a wavelet packet dictionary, a multi-ONB or even a randomly generated dictionary.

Real-life signals are rarely exactly sparse, yet they are compressible in the sense that their entries decay rapidly when sorted by magnitude. As a result, compressible signals are well approximated by sparse signals. Note that we are talking about the sparsity or compressibility of the signal in a suitable dictionary. Thus, we mean that the signal \(x\) has a representation \(\alpha\) in \(\DD\) in which the coefficients decay rapidly when sorted by magnitude.
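
As a quick illustration, here is a minimal numpy sketch (the signal length, decay exponent and value of \(K\) are arbitrary choices made for this example) showing how the best \(K\)-term approximation of a compressible signal captures most of its energy:

    import numpy as np

    N, K = 1000, 50
    rng = np.random.default_rng(0)

    # Coefficients whose magnitudes decay like i^{-1.5}, with random signs and order
    magnitudes = np.arange(1, N + 1, dtype=float) ** -1.5
    alpha = rng.permutation(magnitudes) * rng.choice([-1.0, 1.0], size=N)

    # Best K-term approximation: keep only the K largest-magnitude entries
    alpha_K = np.zeros_like(alpha)
    idx = np.argsort(np.abs(alpha))[-K:]
    alpha_K[idx] = alpha[idx]

    rel_error = np.linalg.norm(alpha - alpha_K) / np.linalg.norm(alpha)
    print(f"relative l2 error of best {K}-term approximation: {rel_error:.3e}")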

Definition

In compressed sensing, a measurement is a linear functional applied to a signal

\[y = \langle x, f \rangle.\]

The compressed sensor makes multiple such linear measurements. This can best be represented by the action of a sensing matrix \(\Phi\) on the signal \(x\) given by

\[y = \Phi x\]

where \(\Phi \in \CC^{M \times N}\) represents \(M\) different measurements made on the signal \(x\) by the sensing process. Each row of \(\Phi\) represents one linear measurement.

The vector \(y \in \CC^M\) is known as the measurement vector.

\(\CC^N\) forms the signal space while \(\CC^M\) forms the measurement space.

We also note that the above can be written as

\[y = \Phi x = \Phi \DD \alpha = (\Phi \DD) \alpha.\]

It is assumed that the signal \(x\) is \(K\)-sparse or \(K\)-compressible in \(\DD\) and \(K \ll N\).

The objective is to recover \(x\) from \(y\) given that \(\Phi\) and \(\DD\) are known.

We do this by first recovering the sparse representation \(\alpha\) from \(y\) and then computing \(x = \DD \alpha\) .

If \(M \geq N\) then the problem is a straightforward least squares problem, so we don’t consider it here.

The more interesting case is when \(K < M \ll N\), i.e. the number of measurements is much smaller than the dimension of the ambient signal space yet larger than the sparsity level \(K\) of the signal.

We note that once \(\alpha\) has been found, computing \(x\) is straightforward. We can therefore remove the dictionary from consideration and look at the simplified problem: recover \(x\) from \(y\) with

\[y = \Phi x\]

where \(x \in \CC^N\) itself is assumed to be \(K\)-sparse or \(K\)-compressible and \(\Phi \in \CC^{M \times N}\) is the sensing matrix.
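
The measurement process is easy to simulate. The following sketch uses illustrative values of \(N\), \(M\) and \(K\), and a random Gaussian \(\Phi\) chosen purely for concreteness, to construct a \(K\)-sparse signal and its measurement vector:

    import numpy as np

    N, M, K = 1000, 100, 10
    rng = np.random.default_rng(0)

    # A K-sparse signal (real-valued here for simplicity)
    x = np.zeros(N)
    support = rng.choice(N, size=K, replace=False)
    x[support] = rng.standard_normal(K)

    # Random Gaussian sensing matrix and the measurement vector y = Phi x
    Phi = rng.standard_normal((M, N)) / np.sqrt(M)
    y = Phi @ x

    print(y.shape)  # (M,) -- M measurements of an N-dimensional signal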

Note

The definition above doesn’t consider the noise introduced while taking the measurements. We will introduce noise later.

The sensing matrix

There are two ways to look at the sensing matrix. The first view is in terms of its columns

(1)\[\Phi = \begin{bmatrix} \phi_1 & \phi_2 & \dots & \phi_N \end{bmatrix}\]

where \(\phi_i \in \CC^M\) are the columns of the sensing matrix. In this view we see that

\[y = \sum_{i=1}^{N} x_i \phi_i\]

i.e. \(y\) belongs to the column span of \(\Phi\) and one representation of \(y\) in \(\Phi\) is given by \(x\).

This view looks very similar to a dictionary and its atoms, but there is a difference. In a dictionary, we require each atom to be unit norm. We don’t require the columns of the sensing matrix \(\Phi\) to be unit norm.

The second view of the sensing matrix \(\Phi\) is in terms of its rows. We write

(2)\[\begin{split}\Phi = \begin{bmatrix} \chi_1^H \\ \chi_2^H \\ \vdots \\ \chi_M^H \end{bmatrix}\end{split}\]

where \(\chi_i \in \CC^N\) are the conjugate transposes of the rows of \(\Phi\). This view gives us the following result

\[\begin{split}\begin{bmatrix} y_1\\ y_2 \\ \vdots \\ y_M \end{bmatrix} = \begin{bmatrix} \chi_1^H \\ \chi_2^H \\ \vdots \\ \chi_M^H \end{bmatrix} x = \begin{bmatrix} \chi_1^H x\\ \chi_2^H x\\ \vdots \\ \chi_M^H x \end{bmatrix} = \begin{bmatrix} \langle x , \chi_1 \rangle \\ \langle x , \chi_2 \rangle \\ \vdots \\ \langle x , \chi_M \rangle \end{bmatrix}\end{split}\]

In this view \(y_i\) is a measurement given by the inner product of \(x\) with \(\chi_i\) \((\langle x , \chi_i \rangle = \chi_i^H x)\).

We will call \(\chi_i\) a sensing vector. There are \(M\) such sensing vectors in \(\CC^N\) comprising \(\Phi\), corresponding to \(M\) measurements in the measurement space \(\CC^M\).
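
The two views are, of course, numerically identical. A small sanity check (purely illustrative sizes):

    import numpy as np

    M, N = 4, 8
    rng = np.random.default_rng(1)
    Phi = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
    x = rng.standard_normal(N) + 1j * rng.standard_normal(N)

    y = Phi @ x

    # Column view: y is a combination of the columns phi_i weighted by x_i
    y_cols = sum(x[i] * Phi[:, i] for i in range(N))

    # Row view: y_i = chi_i^H x, where chi_i^H is the i-th row of Phi
    y_rows = np.array([Phi[i, :] @ x for i in range(M)])

    assert np.allclose(y, y_cols) and np.allclose(y, y_rows)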

Note

Dictionary design focuses on creating the sparsest possible representations of signals in a particular domain. Sensing matrix design focuses on reducing the number of measurements as much as possible while still being able to recover the sparse representation from the measurements.

Number of measurements

A fundamental question of the compressed sensing framework is: how many measurements are necessary to acquire \(K\)-sparse signals? By necessary we mean that \(y\) carries enough information about \(x\) so that \(x\) can be recovered from \(y\).

If \(M < K\) then recovery is not possible.

We further note that the sensing matrix \(\Phi\) should not map two different \(K\) -sparse signals to the same measurement vector. Thus, we will need \(M \geq 2K\) and each collection of \(2K\) columns in \(\Phi\) must be non-singular.

Think
Why do we need 2K or more measurements? What happens if 2K or fewer columns in \(\Phi\) form a linearly dependent set?

If the \(2K\)-column submatrices of \(\Phi\) are badly conditioned, then two distinct \(K\)-sparse signals can get mapped to very similar measurement vectors, making recovery of the signal numerically unstable. Moreover, if noise is present, stability degrades further.

In [CT06] Candès and Tao showed that the geometry of sparse signals should be preserved under the action of a sensing matrix. In particular, the distance between two sparse signals shouldn’t change by much during sensing.

They quantified this idea in the form of the restricted isometry constant of a matrix \(\Phi\), defined as the smallest number \(\delta_K\) for which the following holds

\[(1 - \delta_K) \| x \|_2^2 \leq \| \Phi x \|_2^2 \leq (1 + \delta_K) \| x \|_2^2 \Forall x : \| x \|_0 \leq K.\]

We will study more about this property known as restricted isometry property (RIP) later. Here we just sketch the implications of RIP for compressed sensing.

When \(\delta_K < 1\), the inequalities imply that every collection of \(K\) columns from \(\Phi\) is non-singular. Since we need every collection of \(2K\) columns to be non-singular, we actually need \(\delta_{2K} < 1\), which is the minimum requirement for recovery of \(K\)-sparse signals.

Further, if \(\delta_{2K} \ll 1\), then the sensing operator very nearly maintains the \(l_2\) distance between any two \(K\)-sparse signals. In consequence, it is possible to invert the sensing process stably.
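
Computing \(\delta_K\) exactly requires examining every \(K\)-column submatrix, which is combinatorial. The sketch below (an assumed Monte Carlo approach, not part of the original text) samples random supports to obtain a lower bound on \(\delta_K\) for a Gaussian matrix:

    import numpy as np

    M, N, K = 64, 256, 8
    rng = np.random.default_rng(0)
    Phi = rng.standard_normal((M, N)) / np.sqrt(M)

    # For each sampled support S, the squared singular values of Phi[:, S]
    # bound ||Phi x||^2 / ||x||^2 for x supported on S; track the worst deviation from 1.
    delta_lower = 0.0
    for _ in range(2000):
        S = rng.choice(N, size=K, replace=False)
        s = np.linalg.svd(Phi[:, S], compute_uv=False)
        delta_lower = max(delta_lower, abs(s.max() ** 2 - 1), abs(1 - s.min() ** 2))

    # This only lower-bounds delta_K; the exact constant needs all supports.
    print(f"estimated lower bound on delta_{K}: {delta_lower:.3f}")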

It is now known that many randomly generated matrices have excellent RIP behavior. One can show that if \(\delta_{2K} \leq 0.1\), then with

\[M = \bigO{K \ln ^{\alpha} N}\]

measurements, one can recover \(x\) with high probability.

Some of the typical random matrices which have suitable RIP properties are

  • Gaussian sensing matrices
  • Partial Fourier matrices
  • Rademacher sensing matrices
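
For concreteness, here is one way such matrices might be generated with numpy (normalization conventions vary across the literature; the scalings below are one common choice):

    import numpy as np

    M, N = 64, 256
    rng = np.random.default_rng(0)

    # Gaussian: i.i.d. entries with variance 1/M
    Phi_gauss = rng.standard_normal((M, N)) / np.sqrt(M)

    # Partial Fourier: M rows of the N x N unitary DFT matrix, chosen at random
    F = np.fft.fft(np.eye(N)) / np.sqrt(N)
    rows = rng.choice(N, size=M, replace=False)
    Phi_fourier = F[rows, :] * np.sqrt(N / M)

    # Rademacher: i.i.d. +/- 1/sqrt(M) entries
    Phi_rademacher = rng.choice([-1.0, 1.0], size=(M, N)) / np.sqrt(M)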

Signal recovery

The second fundamental problem in compressed sensing is: given the compressed measurements \(y\), how do we recover the signal \(x\)? This is known as the SPARSE-RECOVERY problem.

A simple formulation of the problem as: minimize \(\| x \|_0\) subject to \(y = \Phi x\) is hopeless, since it entails a combinatorial search over all possible supports; e.g. for \(N = 1000\) and \(K = 10\) there are \(\binom{1000}{10} \approx 2.6 \times 10^{23}\) candidate supports.

Over the years, people have developed a number of algorithms to tackle the sparse recovery problem.

The algorithms can be broadly classified into the following categories

  • [Greedy pursuits] These algorithms attempt to build the approximation of the signal iteratively by making locally optimal choices at each step. Examples of such algorithms include OMP (orthogonal matching pursuit), stage-wise OMP, regularized OMP, CoSaMP (compressive sampling matching pursuit) and IHT (iterative hard thresholding); a bare-bones OMP sketch appears after this list.
  • [Convex relaxation] These techniques relax the \(l_0\) “norm” minimization problem into a suitable convex optimization problem. This relaxation is valid for a large class of signals of interest. Once the problem has been formulated as a convex optimization problem, a number of solvers are available, e.g. interior point methods, projected gradient methods and iterative thresholding.
  • [Combinatorial algorithms] These methods are based on research in group testing and are specifically suited for situations where highly structured measurements of the signal are taken. This class includes algorithms like Fourier sampling, chaining pursuit, and HHS pursuit.
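
As an illustration of the greedy approach, here is a bare-bones sketch of OMP (assuming a real sensing matrix; production implementations handle stopping criteria, normalization and complex data more carefully):

    import numpy as np

    def omp(Phi, y, K):
        """Greedily recover a K-sparse x from y = Phi x."""
        M, N = Phi.shape
        support = []
        residual = y.copy()
        x_hat = np.zeros(N)
        for _ in range(K):
            # Pick the column most correlated with the current residual
            j = int(np.argmax(np.abs(Phi.T @ residual)))
            if j not in support:
                support.append(j)
            # Least squares fit on the current support, then update the residual
            coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
            residual = y - Phi[:, support] @ coef
        x_hat[support] = coef
        return x_hat

With the matrices from the earlier measurement sketch, omp(Phi, y, K) typically recovers x exactly once M is sufficiently large relative to K.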

A major emphasis of the following chapters will be the study of these sparse recovery algorithms.

In the following we present examples of real-life problems which can be modeled as compressed sensing problems.

Error correction in linear codes

The classical error correction problem was discussed in one of the seminal founding papers on compressed sensing [CT05].

Let \(f \in \RR^N\) be a “plaintext” message being sent over a communication channel.

In order to make the message robust against errors in the communication channel, we encode the message with an error correcting code.

We consider \(A \in \RR^{D \times N}\) with \(D > N\) as a linear code. \(A\) is essentially a collection of codewords given by

\[A = \begin{bmatrix} a_1 & a_2 & \dots & a_N \end{bmatrix}\]

where \(a_i \in \RR^D\) are the codewords.

We construct the “ciphertext”

\[x = A f\]

where \(x \in \RR^D\) is sent over the communication channel. \(x\) is a redundant representation of \(f\) which is expected to be robust against small errors during transmission.

\(A\) is assumed to have full column rank. Thus \(A^T A\) is invertible and we can easily see that

\[f = A^{\dag} x\]

where

\[A^{\dag} = (A^T A)^{-1}A^T\]

is the left pseudo-inverse of \(A\). The communication channel is going to add some error. What we actually receive is

\[y = x + e = A f + e\]

where \(e \in \RR^D\) is the error being introduced by the channel.

The least squares solution by minimizing the error \(l_2\) norm is given by

\[f' = A^{\dag} y = A^{\dag} (A f + e) = f + A^{\dag} e.\]

Since \(A^{\dag} e\) is usually non-zero (we cannot assume that \(A^{\dag}\) will annihilate \(e\)), \(f'\) is not an exact replica of \(f\).

What is needed is an exact reconstruction of \(f\). To achieve this, a common assumption in the literature is that the error vector \(e\) is in fact sparse, i.e.

\[\| e \|_0 \leq K \ll D.\]

To reconstruct \(f\) it is sufficient to reconstruct \(e\) since once \(e\) is known we can get

\[x = y -e\]

and from there \(f\) can be faithfully reconstructed.

The question is: for a given sparsity level \(K\) of the error vector \(e\), can one reconstruct \(e\) via practical algorithms? By practical we mean algorithms whose running time is polynomial in the length \(D\) of the “ciphertext”.

The approach in [CT05] is as follows.

We construct a matrix \(F \in \RR^{M \times D}\) which annihilates \(A\), i.e.

\[FA = 0.\]

We then apply \(F\) to \(y\) giving us

\[\tilde{y} = F (A f + e) = Fe.\]

Therefore the decoding problem is reduced to that of reconstructing a sparse vector \(e \in \RR^D\) from the measurements \(Fe \in \RR^M\) where we would like to have \(M \ll D\) .
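
The following end-to-end sketch puts the pieces together. The sizes are illustrative, \(F\) is built here from the left singular vectors of \(A\) as one possible annihilator, and the omp function from the signal recovery sketch above is assumed to be in scope:

    import numpy as np

    N, D, K = 128, 256, 10
    rng = np.random.default_rng(0)

    A = rng.standard_normal((D, N))              # random code with full column rank
    f = rng.standard_normal(N)                   # plaintext
    x = A @ f                                    # ciphertext

    e = np.zeros(D)                              # K-sparse channel error
    e[rng.choice(D, size=K, replace=False)] = rng.standard_normal(K)
    y = x + e

    # Annihilator F: its rows span the left null space of A, so F A = 0.
    U, _, _ = np.linalg.svd(A)
    F = U[:, N:].T                               # shape (D - N, D), i.e. M = D - N
    assert np.allclose(F @ A, 0, atol=1e-8)

    y_tilde = F @ y                              # equals F e since F A f = 0
    e_hat = omp(F, y_tilde, K)                   # recover the sparse error (greedy sketch)
    f_hat = np.linalg.lstsq(A, y - e_hat, rcond=None)[0]

    rel = np.linalg.norm(f - f_hat) / np.linalg.norm(f)
    print(f"relative recovery error: {rel:.2e}")  # small when the sparse recovery succeeds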

With this, the problem of finding \(e\) can be cast as the problem of finding a sparse solution to the under-determined system given by

(3)\[\begin{split}\begin{aligned} & \underset{e \in \Sigma_K}{\text{minimize}} & & \| e \|_0 \\ & \text{subject to} & & \tilde{y} = F e\\ \end{aligned}\end{split}\]

This now becomes the compressed sensing problem. The natural questions are

  • How many measurements \(M\) are necessary (in \(F\) ) to be able to recover \(e\) exactly?
  • How should \(F\) be constructed?
  • How do we recover \(e\) from \(\tilde{y}\) ?

These questions are discussed in upcoming sections.