Recovery in presence of measurement noise¶

Measurement vector in the presence of noise is given by

\[y =\Phi x + e\]

where \(e\) is the measurement noise or error. \(\| e \|_2\) is the \(l_2\) size of measurement error.

Recovery error as usual is given by

\[\| \Delta (y) - x \|_2 = \| \Delta (\Phi x + e) - x \|_2\]

Stability of a recovery algorithm is characterized by comparing variation of recovery error w.r.t. measurement error.

NSP is both necessary and sufficient for establishing guarantees of the form:

\[\| \Delta (\Phi x) - x \|_2 \leq C \frac{\sigma_K (x)_1}{\sqrt{K}}\]

These guarantees do not account for presence of noise during measurement. We need stronger conditions for handling noise. The restricted isometry property for sensing matrices comes to our rescue.

Restricted isometry property¶

A matrix \(\Phi\) satisfies the restricted isometry property (RIP) of order \(K\) if there exists \(\delta_K \in (0,1)\) such that

(1)¶\[ (1- \delta_K) \| x \|^2_2 \leq \| \Phi x \|^2_2 \leq (1 + \delta_K) \| x \|^2_2\]

holds for all \(x \in \Sigma_K = \{ x : \| x\|_0 \leq K \}\) .

If a matrix satisfies RIP of order \(K\) , then we can see that it approximately preserves the size of a \(K\)-sparse vector.
If a matrix satisfies RIP of order \(2K\) , then we can see that it approximately preserves the distance between any two \(K\)-sparse vectors since difference vectors would be \(2K\) sparse.
We say that the matrix is nearly orthonormal for sparse vectors.
If a matrix satisfies RIP of order \(K\) with a constant \(\delta_K\) , it automatically satisfies RIP of any order \(K' < K\) with a constant \(\delta_{K'} \leq \delta_{K}\) .

Stability¶

Informally, a recovery algorithm is stable if recovery error is small in the presence of small measurement error.

Is RIP necessary and sufficient for sparse signal recovery from noisy measurements?

Let us look at the necessary part. We will define a notion of stability of the recovery algorithm.

Definition

Let \(\Phi : \RR^N \rightarrow \RR^M\) be a sensing matrix and \(\Delta : \RR^M \rightarrow \RR^N\) be a recovery algorithm. We say that the pair \((\Phi, \Delta)\) is \(C\)-stable if for any \(x \in \Sigma_K\) and any \(e \in \RR^M\) we have that

\[\| \Delta(\Phi x + e) - x\|_2 \leq C \| e\|_2.\]

Error is added to the measurements.
LHS is \(l_2\) norm of recovery error.
RHS consists of scaling of the \(l_2\) norm of measurement error.
The definition says that recovery error is bounded by a multiple of the measurement error.
Thus, adding a small amount of measurement noise shouldn’t be causing arbitrarily large recovery error.

It turns out that \(C\)-stability requires \(\Phi\) to satisfy RIP.

Theorem

If a pair \((\Phi, \Delta)\) is \(C\)-stable then

\[\frac{1}{C} \| x\|_2 \leq \| \Phi x \|_2\]

for all \(x \in \Sigma_{2K}\) .

Proof

Any \(x \in \Sigma_{2K}\) can be written in the form of \(x = y - z\) where \(y, z \in \Sigma_K\) .

So let \(x \in \Sigma_{2K}\) . Split it in the form of \(x = y -z\) with \(y, z \in \Sigma_{K}\) .

Define

\[e_y = \frac{\Phi (z - y)}{2} \quad \text{and} \quad e_z = \frac{\Phi (y - z)}{2}\]

Thus

\[e_y - e_z = \Phi (z - y) \implies \Phi y + e_y = \Phi z + e_z\]

We have

\[\Phi y + e_y = \Phi z + e_z = \frac{\Phi (y + z)}{2}.\]

Also, we have

\[\| e_y \|_2 = \| e_z \|_2 = \frac{\| \Phi (y - z) \|_2}{2} = \frac{\| \Phi x \|_2}{2}\]

Let

\[y' = \Delta (\Phi y + e_y) = \Delta (\Phi z + e_z)\]

Since \((\Phi, \Delta)\) is \(C\)-stable, hence we have

\[\| y'- y\|_2 \leq C \| e_y\|_2.\]

also

\[\| y'- z\|_2 \leq C \| e_z\|_2.\]

Using the triangle inequality

\[\begin{split}\| x \|_2 &= \| y - z\|_2 = \| y - y' + y' - z \|_2\\ &\leq \| y - y' \|_2 + \| y' - z\|_2\\ &\leq C \| e_y \|_2 + C \| e_z \|_2 = C (\| e_y \|_2 + \| e_z \|_2) = C \| \Phi x \|_2\end{split}\]

Thus we have \(\forall x \in \Sigma_{2K}\)

\[\frac{1}{C}\| x \|_2 \leq \| \Phi x \|_2\]

This theorem gives us the lower bound for RIP property of order \(2K\) in (1) with \(\delta_{2K} = 1 - \frac{1}{C^2}\) as a necessary condition for \(C\)-stable recovery algorithms.

Note that smaller the constant \(C\) , lower is the bound on recovery error (w.r.t. measurement error). But as \(C \to 1\) , \(\delta_{2K} \to 0\), thus reducing the impact of measurement noise requires sensing matrix \(\Phi\) to be designed with tighter RIP constraints.

\(C\)-stability doesn’t require an upper bound on the RIP property in (1).

It turns out that If \(\Phi\) satisfies RIP, then this is also sufficient for a variety of algorithms to be able to successfully recover a sparse signal from noisy measurements. We will discuss this later.

Measurement bounds¶

As stated in previous section, for a \((\Phi, \Delta)\) pair to be \(C\)-stable we require that \(\Phi\) satisfies RIP of order \(2K\) with a constant \(\delta_{2K}\).

Let us ignore \(\delta_{2K}\) for the time being and look at relationship between \(M\) , \(N\) and \(K\).

We have a sensing matrix \(\Phi\) of size \(M\times N\) and expect it to provide RIP of order \(2K\) .

How many measurements \(M\) are necessary?

We will assume that \(K < N / 2\). This assumption is valid for approximately sparse signals.

Before we start figuring out the bounds, let us develop a special subset of \(\Sigma_K\) sets.

Consider the set

\[U = \{ x \in \{0, +1, -1\}^N : \| x\|_0 = K \}\]

Some explanation: By \(A^N\) we mean \(A \times A \times \dots \times A\) i.e. \(N\) times Cartesian product of \(A\) .

When we say \(\| x\|_0 = K\) , we mean that only \(K\) terms in each member of \(U\) can be non-zero (i.e. \(-1\) or \(+1\) ).

So \(U\) is a set of signal vectors \(x\) of length \(N\) where each sample takes values from \(\{0, +1, -1\}\) and number of allowed non-zero samples is fixed at \(K\) .

An example below explains it further.

ExampleU for N=6 and K=2

Each vector in \(U\) will have 6 elements out of which \(2\) can be non zero. There are \(\binom{6}{2}\) ways of choosing the non-zero elements. Some of those sets are listed below as examples:

\[\begin{split}&(+1,+1,0,0,0,0)\\ &(+1,-1,0,0,0,0)\\ &(0,-1,0,+1,0,0)\\ &(0,-1,0,+1,0,0)\\ &(0,0,0,0,-1,+1)\\ &(0,0,-1,-1,0,0)\end{split}\]

\(U\) is a grid in the union of subspaces \(\Sigma_K\).

Revisiting

\[U = \{ x \in \{0, +1, -1\}^N : \| x\|_0 = K \}\]

It’s now obvious that

\[\| x \|_2^2 = K \quad \forall x \in U.\]

Since there are \(\binom{N}{K}\) ways of choosing \(K\) non-zero elements and each non zero element can take either of the two values \(+1\) or \(-1\) , hence the cardinality of set \(U\) is given by:

\[|U| = \binom{N}{K} 2^K\]

By definition

\[U \subset \Sigma_K.\]

Further Let \(x, y \in U\) .

Then \(x - y\) will have a maximum of \(2K\) non-zero elements. The non-zero elements would have values \(\in \{-2,-1,1,2\}\) .

Thus \(\| x - y \|_0 = R \leq 2K\).

Further, \(\| x - y \|_2^2 \geq R\).

Hence

\[\| x - y \|_0 \leq \| x - y \|_2^2 \quad \forall x, y \in U.\]

We now state a lemma which will help us in getting to the bounds.

Lemma

Let \(K\) and \(N\) satisfying \(K < \frac{N}{2}\) be given. There exists a set \(X \subset \Sigma_K\) such that for any \(x \in X\) we have \(\| x \|_2 \leq \sqrt{K}\) and for any \(x, y \in X\) with \(x \neq y\) ,

\[\| x - y \|_2 \geq \sqrt{\frac{K}{2}}.\]

and

\[\ln | X | \geq \frac{K}{2} \ln \left( \frac{N}{K} \right) .\]

The lemma establishes the existence of a set in the union of subspaces \(\Sigma_K\) within a sphere of radius \(\sqrt{K}\) whose points are sufficiently apart and whose size is sufficiently large.

Proof

We just need to find one set \(X\) which satisfies the requirements of this lemma. We have to construct a set \(X\) such that

\(\| x \|_2 \leq \sqrt{K} \quad \forall x \in X.\)
\(\| x - y \|_2 \geq \sqrt{\frac{K}{2}} \quad \forall x, y \in X.\)
\(\ln | X | \geq \frac{K}{2} \ln \left( \frac{N}{K} \right)\) or equivalently \(|X| \geq \left( \frac{N}{K} \right)^{\frac{K}{2}}\) .

We will construct \(X\) by picking vectors from \(U\) . Thus \(X \subset U\) .

Since \(x \in X \subset U\) hence \(\| x \|_2 = \sqrt{K} \leq \sqrt{K} \quad \forall x \in X\) .

Consider any fixed \(x \in U\) .

How many elements \(y\) are there in \(U\) such that \(\|x - y\|_2^2 < \frac{K}{2}\) ?

Define

\[U_x^2 = \left \{ y \in U : \|x - y\|_2^2 < \frac{K}{2} \right \}\]

Clearly by requirements in the lemma, if \(x \in X\) then \(U_x^2 \cap X = \phi\) . i.e. no vector in \(U_x^2\) belongs to \(X\) .

How many elements are there in \(U_x^2\) ?

Let us find an upper bound. \(\forall x, y \in U\) we have \(\|x - y\|_0 \leq \|x - y\|_2^2\) .

If \(x\) and \(y\) differ in \(\frac{K}{2}\) or more places, then naturally \(\|x - y\|_2^2 \geq \frac{K}{2}\) .

Hence if \(\|x - y\|_2^2 < \frac{K}{2}\) then \(\|x - y\|_0 < \frac{K}{2}\) hence \(\|x - y\|_0 \leq \frac{K}{2}\) for any \(x, y \in U_x^2\) .

So define

\[U_x^0 = \left \{ y \in U : \|x - y\|_0 \leq \frac{K}{2} \right \}\]

We have

\[U_x^2 \subseteq U_x^0\]

Thus we have an upper bound given by

\[| U_x^2 | \leq | U_x^0 |.\]

Let us look at \(U_x^0\) carefully.

We can choose \(\frac{K}{2}\) indices where \(x\) and \(y\) may differ in \(\binom{N}{\frac{K}{2}}\) ways.

At each of these \(\frac{K}{2}\) indices, \(y_i\) can take value as one of \((0, +1, -1)\) .

Thus We have an upper bound

\[| U_x^2 | \leq | U_x^0 | \leq \binom {N}{\frac{K}{2}} 3^{\frac{K}{2}}.\]

We now describe an iterative process for building \(X\) from vectors in \(U\) .

Say we have added \(j\) vectors to \(X\) namely \(x_1, x_2,\dots, x_j\) .

Then

\[(U^2_{x_1} \cup U^2_{x_2} \cup \dots \cup U^2_{x_j}) \cap X = \phi\]

Number of vectors in \(U^2_{x_1} \cup U^2_{x_2} \cup \dots \cup U^2_{x_j}\) is bounded by \(j \binom {N}{ \frac{K}{2}} 3^{\frac{K}{2}}\) .

Thus we have at least

\[\binom{N}{K} 2^K - j \binom {N}{ \frac{K}{2}} 3^{\frac{K}{2}}\]

vectors left in \(U\) to choose from for adding in \(X\) .

We can keep adding vectors to \(X\) till there are no more suitable vectors left.

So we can construct a set of size \(|X|\) provided

(2)¶\[|X| \binom {N}{ \frac{K}{2}} 3^{\frac{K}{2}} \leq \binom{N}{K} 2^K\]

Now

\[\frac{\binom{N}{K}}{\binom{N}{\frac{K}{2}}} = \frac {\left ( \frac{K}{2} \right ) ! \left (N - \frac{K}{2} \right ) ! } {K! (N-K)!} = \prod_{i=1}^{\frac{K}{2}} \frac{N - K + i}{ K/ 2 + i}\]

Note that \(\frac{N - K + i}{ K/ 2 + i}\) is a decreasing function of \(i\) .

Its minimum value is achieved for \(i=\frac{K}{2}\) as \((\frac{N}{K} - \frac{1}{2})\) .

So we have

\[\begin{split}&\frac{N - K + i}{ K/ 2 + i} \geq \frac{N}{K} - \frac{1}{2}\\ &\implies \prod_{i=1}^{\frac{K}{2}} \frac{N - K + i}{ K/ 2 + i} \geq \left ( \frac{N}{K} - \frac{1}{2} \right )^{\frac{K}{2}}\\ &\implies \frac{\binom{N}{K}}{\binom{N}{\frac{K}{2}}} \geq \left ( \frac{N}{K} - \frac{1}{2} \right )^{\frac{K}{2}}\end{split}\]

Rephrasing (2) we have

\[|X| \left( \frac{3}{4} \right )^{\frac{K}{2}} \leq \frac{\binom{N}{K}}{\binom{N}{\frac{K}{2}}}\]

So if

\[|X| \left( \frac{3}{4} \right ) ^{\frac{K}{2}} \leq \left ( \frac{N}{K} - \frac{1}{2} \right )^{\frac{K}{2}}\]

then (2) will be satisfied.

Now it is given that \(K < \frac{N}{2}\) . So we have:

\[\begin{split}& K < \frac{N}{2}\\ &\implies \frac{N}{K} > 2\\ &\implies \frac{N}{4K} > \frac{1}{2}\\ &\implies \frac{N}{K} - \frac{N}{4K} < \frac{N}{K} - \frac{1}{2}\\ &\implies \frac{3N}{4K} < \frac{N}{K} - \frac{1}{2}\\ &\implies \left( \frac{3N}{4K} \right) ^ {\frac{K}{2}}< \left ( \frac{N}{K} - \frac{1}{2} \right )^{\frac{K}{2}}\\\end{split}\]

Thus we have

\[\left( \frac{N}{K} \right) ^ {\frac{K}{2}} \left( \frac{3}{4} \right) ^ {\frac{K}{2}} < \frac{\binom{N}{K}}{\binom{N}{\frac{K}{2}}}\]

Choose

\[|X| = \left( \frac{N}{K} \right) ^ {\frac{K}{2}}\]

Clearly, this value of \(|X|\) satisfies (2). Hence \(X\) can have at least these many elements. Thus

\[\begin{split}&|X| \geq \left( \frac{N}{K} \right) ^ {\frac{K}{2}}\\ &\implies \ln |X| \geq \frac{K}{2} \ln \left( \frac{N}{K} \right)\end{split}\]

which completes the proof.

We can now establish following bound on the required number of measurements to satisfy RIP.

At this moment, we won’t worry about exact value of \(\delta_{2K}\) . We will just assume that \(\delta_{2K}\) is small in range \((0, \frac{1}{2}]\) .

Theorem

Let \(\Phi\) be an \(M \times N\) matrix that satisfies RIP of order \(2K\) with constant \(\delta_{2K} \in (0, \frac{1}{2}]\) . Then

\[M \geq C K \ln \left ( \frac{N}{K} \right )\]

where \(C = \frac{1}{2 \ln (\sqrt{24} + 1)} \approx 0.28173\) .

Proof

Since \(\Phi\) satisfies RIP of order \(2K\) we have

\[\begin{split}& (1 - \delta_{2K}) \| x \|^2_2 \leq \| \Phi x \|^2_2 \leq (1 + \delta_{2K}) \| x\|^2_2 \quad \forall x \in \Sigma_{2K}.\\ & \implies (1 - \delta_{2K}) \| x - y \|^2_2 \leq \| \Phi x - \Phi y\|^2_2 \leq (1 + \delta_{2K}) \| x - y\|^2_2 \quad \forall x, y \in \Sigma_K.\end{split}\]

Also

\[\delta_{2K} \leq \frac{1}{2} \implies 1 - \delta_{2K} > \frac{1}{2} \text{ and } 1 + \delta_{2K} \leq \frac{3}{2}\]

Consider the set \(X \subset U \subset \Sigma_K\) developed in above.

We have

\[\begin{split}&\| x - y\|^2_2 \geq \frac{K}{2} \quad \forall x, y \in X\\ &\implies (1 - \delta_{2K}) \| x - y \|^2_2 \geq \frac{K}{4}\\ &\implies \| \Phi x - \Phi y\|^2_2 \geq \frac{K}{4}\\ &\implies \| \Phi x - \Phi y\|_2 \geq \sqrt{\frac{K}{4}} \quad \forall x, y \in X\end{split}\]

Also

\[\begin{split}&\| \Phi x \|^2_2 \leq (1 + \delta_{2K}) \| x\|^2_2 \leq \frac{3}{2} \| x\|^2_2 \quad \forall x \in X \subset \Sigma_K \subset \Sigma_{2K}\\ &\implies \| \Phi x \|_2 \leq \sqrt {\frac{3}{2}} \| x\|_2 \leq \sqrt {\frac{3K}{2}} \quad \forall x \in X.\end{split}\]

since \(\| x\|_2 \leq \sqrt{K} \quad \forall x \in X\) .

So we have a lower bound:

(3)¶\[\| \Phi x - \Phi y\|_2 \geq \sqrt{\frac{K}{4}} \quad \forall x, y \in X.\]

and an upper bound:

(4)¶\[\| \Phi x \|_2 \leq \sqrt {\frac{3K}{2}} \quad \forall x \in X.\]

What do these bounds mean? Let us start with the lower bound. \(\Phi x\) and \(\Phi y\) are projections of \(x\) and \(y\) in \(\RR^M\) (measurement space).

Construct \(l_2\) balls of radius \(\sqrt{\frac{K}{4}} / 2= \sqrt{\frac{K}{16}}\) in \(\RR^M\) around \(\Phi x\) and \(\Phi y\) .

Lower bound says that these balls are disjoint. Since \(x, y\) are arbitrary, this applies to every \(x \in X\).

Upper bound tells us that all vectors \(\Phi x\) lie in a ball of radius \(\sqrt {\frac{3K}{2}}\) around origin in \(\RR^M\) .

Thus, the set of all balls lies within a larger ball of radius \(\sqrt {\frac{3K}{2}} + \sqrt{\frac{K}{16}}\) around origin in \(\RR^M\) .

So we require that the volume of the larger ball MUST be greater than the sum of volumes of \(|X|\) individual balls.

Since volume of an \(l_2\) ball of radius \(r\) is proportional to \(r^M\) , we have:

\[\begin{split}&\left ( \sqrt {\frac{3K}{2}} + \sqrt{\frac{K}{16}} \right )^M \geq |X| . \left ( \sqrt{\frac{K}{16}} \right )^M\\. & \implies (\sqrt {24} + 1)^M \geq |X| \\ & \implies M \geq \frac{\ln |X| }{\ln (\sqrt {24} + 1) }\end{split}\]

Again from above we have

\[\ln |X| \geq \frac{K}{2} \ln \left ( \frac{N}{K} \right ).\]

Putting back we get

\[M \geq \frac{\frac{K}{2} \ln \left ( \frac{N}{K} \right ) }{\ln (\sqrt {24} + 1) }\]

which establishes a lower bound on the number of measurements \(M\) .

ExampleLower bounds on M for RIP of order 2K

\(N=1000, K=100 \implies M \geq 65\) .
\(N=1000, K=200 \implies M \geq 91\) .
\(N=1000, K=400 \implies M \geq 104\) .

Some remarks are in order:

The theorem only establishes a necessary lower bound on \(M\) . It doesn’t mean that if we choose an \(M\) larger than the lower bound then \(\Phi\) will have RIP of order \(2K\) with any constant \(\delta_{2K} \in (0, \frac{1}{2}]\) .
The restriction \(\delta_{2K} \leq \frac{1}{2}\) is arbitrary and is made for convenience. In general, we can work with \(0 < \delta_{2K} \leq \delta_{\text{max}} < 1\) and develop the bounds accordingly.
This result fails to capture dependence of \(M\) on the RIP constant \(\delta_{2K}\) directly. Johnson-Lindenstrauss lemma helps us resolve this which concerns embeddings of finite sets of points in low-dimensional spaces.
We haven’t made significant efforts to optimize the constants. Still they are quite reasonable.