Recovery of exactly sparse signals

The null space of a matrix \(\Phi\) is denoted as

\[\NullSpace(\Phi) = \{ v \in \RR^N :\Phi v = 0\}.\]

The set of \(K\) -sparse signals is defined as

\[\Sigma_K = \{ x \in \RR^N : \|x\|_0 \leq K\}.\]
Example: K-sparse signals

Let \(N=10\) .

  • \(x=(1, 2, 1, -1, 2, -3, 4, -2, 2, -2) \in \RR^{10}\) is not a sparse signal.
  • \(x=(0,0,0,0,1,0,0,-1,0,0)\in \RR^{10}\) is a 2-sparse signal. It is also a 4-sparse signal. (Both claims are checked numerically in the sketch below.)
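These claims are easy to verify numerically. Below is a minimal sketch in Python/NumPy; the helper is_k_sparse is ours and simply counts non-zero entries.

```python
import numpy as np

def is_k_sparse(x, k):
    """Check whether ||x||_0 <= k, i.e. x has at most k non-zero entries."""
    return np.count_nonzero(x) <= k

x1 = np.array([1, 2, 1, -1, 2, -3, 4, -2, 2, -2])
x2 = np.array([0, 0, 0, 0, 1, 0, 0, -1, 0, 0])

print(np.count_nonzero(x1))   # 10 non-zero entries: x1 is not sparse
print(is_k_sparse(x2, 2))     # True: x2 is 2-sparse
print(is_k_sparse(x2, 4))     # True: hence also 4-sparse
```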
Lemma
If \(a\) and \(b\) are two \(K\)-sparse signals, then \(a - b\) is a \(2K\)-sparse signal.
Proof
\((a - b)_i\) is non-zero only if at least one of \(a_i\) and \(b_i\) is non-zero. Since \(a\) and \(b\) each have at most \(K\) non-zero entries, the number of non-zero components of \(a - b\) cannot exceed \(2K\). Hence \(a - b\) is a \(2K\)-sparse signal.
Example: Difference of K-sparse signals

Let N = 5.

  • Let \(a = (0,1,-1,0, 0)\) and \(b = (0,2,0,-1, 0)\). Then \(a - b = (0,-1,-1,1, 0)\) is a 3-sparse and hence also a 4-sparse signal.
  • Let \(a = (0,1,-1,0, 0)\) and \(b = (0,2,-1,0, 0)\). Then \(a - b = (0,-1,0,0, 0)\) is a 1-sparse and hence also a 4-sparse signal.
  • Let \(a = (0,1,-1,0, 0)\) and \(b = (0,0,0,1, -1)\). Then \(a - b = (0,1,-1,-1, 1)\) is a 4-sparse signal. (The lemma is also verified on random \(K\)-sparse pairs in the sketch below.)
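The lemma can be checked empirically on randomly generated \(K\)-sparse pairs. A minimal sketch (the random construction is purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 10, 3

def random_k_sparse(n, k):
    """Vector of length n with exactly k non-zero Gaussian entries."""
    x = np.zeros(n)
    support = rng.choice(n, size=k, replace=False)
    x[support] = rng.standard_normal(k)
    return x

# The difference of two K-sparse vectors never has more than 2K non-zeros.
for _ in range(1000):
    a, b = random_k_sparse(N, K), random_k_sparse(N, K)
    assert np.count_nonzero(a - b) <= 2 * K
```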
Lemma
A sensing matrix \(\Phi\) uniquely represents all \(x \in \Sigma_K\) if and only if \(\NullSpace(\Phi) \cap \Sigma_{2K} = \{ 0 \}\), i.e. \(\NullSpace(\Phi)\) contains no non-zero vectors in \(\Sigma_{2K}\).
Proof

Let \(a\) and \(b\) be two distinct \(K\)-sparse signals, and let \(\Phi a\) and \(\Phi b\) be the corresponding measurements. If \(\Phi\) uniquely represents all \(K\)-sparse signals, then \(\Phi a \neq \Phi b\). Thus \(\Phi (a - b) \neq 0\), and hence \(a - b \notin \NullSpace(\Phi)\).

Conversely, let \(x\) be a non-zero vector in \(\NullSpace(\Phi) \cap \Sigma_{2K}\). Thus \(\Phi x = 0\) and \(\|x\|_0 \leq 2K\). Splitting the support of \(x\) into two disjoint sets of at most \(K\) indices each, we can find distinct \(y, z \in \Sigma_K\) such that \(x = z - y\). Then \(\Phi z = \Phi y\), so \(\Phi\) does not uniquely represent \(y, z \in \Sigma_K\).
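The construction in the second part of the proof can be made concrete. In the following minimal sketch the matrix Phi is a hypothetical example chosen so that its null space contains a non-zero \(2K\)-sparse vector (with \(K = 1\)); splitting that vector's support yields two \(K\)-sparse signals with identical measurements.

```python
import numpy as np

K = 1
Phi = np.array([[1.0, 1.0, 0.0],
                [0.0, 0.0, 1.0]])

# A non-zero 2K-sparse vector in the null space of Phi.
x = np.array([1.0, -1.0, 0.0])
assert np.allclose(Phi @ x, 0) and np.count_nonzero(x) <= 2 * K

# Split the support of x into two K-sparse signals z and y with x = z - y.
z = np.array([1.0, 0.0, 0.0])
y = np.array([0.0, 1.0, 0.0])
assert np.allclose(x, z - y)

# Both K-sparse signals produce the same measurement, so no decoder can
# distinguish them from the measurement alone.
print(Phi @ z, Phi @ y)   # identical measurements
```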

There are many equivalent ways of characterizing the above condition.

The spark

Recall that the spark of a matrix \(\Phi\) is defined as the smallest number of columns of \(\Phi\) that are linearly dependent.
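For small matrices the spark can be computed by brute force, by searching for the smallest set of linearly dependent columns. A minimal sketch (the helper name spark is ours; the search is exponential in \(N\) and only practical for tiny examples):

```python
import itertools
import numpy as np

def spark(Phi):
    """Smallest number of linearly dependent columns of Phi."""
    M, N = Phi.shape
    for k in range(1, N + 1):
        for cols in itertools.combinations(range(N), k):
            if np.linalg.matrix_rank(Phi[:, cols]) < k:   # dependent columns
                return k
    return N + 1  # convention when all columns are independent (needs N <= M)

Phi = np.array([[1.0, 0.0, -1.0, 1.0],
                [-1.0, -2.0, 2.0, 3.0]])
print(spark(Phi))   # 3: no pair of columns is dependent, but any 3 are
```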

Definition
A signal \(x \in \RR^N\) is called an explanation of a measurement \(y \in \RR^M\) with respect to the sensing matrix \(\Phi\) if \(y = \Phi x\).
Theorem
For any measurement \(y \in \RR^M\), there exists at most one signal \(x \in \Sigma_K\) such that \(y = \Phi x\) if and only if \(\spark(\Phi) > 2K\) .
Proof

We need to show

  • If for every measurement there is at most one \(K\)-sparse explanation, then \(\spark(\Phi) > 2K\).
  • If \(\spark(\Phi) > 2K\), then for every measurement there is at most one \(K\)-sparse explanation.

Assume that for every \(y \in \RR^M\) there exists at most one \(K\)-sparse signal \(x \in \RR^N\) such that \(y = \Phi x\).

Now assume, for contradiction, that \(\spark(\Phi) \leq 2K\). Then there exists a set of at most \(2K\) columns of \(\Phi\) which are linearly dependent.

Thus there exists a non-zero \(v \in \Sigma_{2K}\) such that \(\Phi v = 0\), i.e. \(v \in \NullSpace(\Phi)\).

Thus \(\NullSpace(\Phi) \cap \Sigma_{2K} \neq \{ 0 \}\).

Hence, by the previous lemma, \(\Phi\) does not uniquely represent every signal \(x \in \Sigma_K\). A contradiction.

Hence \(\spark(\Phi) > 2K\) .

Now suppose that \(\spark(\Phi) > 2K\) .

Assume that for some \(y\) there exist two distinct \(K\)-sparse explanations \(x \neq x'\) such that \(y = \Phi x = \Phi x'\).

Thus \(\Phi (x - x') = 0\) with \(x - x' \neq 0\). Thus \(x - x' \in \NullSpace(\Phi)\) and \(x - x' \in \Sigma_{2K}\), so some set of at most \(2K\) columns of \(\Phi\) is linearly dependent.

Thus \(\spark(\Phi) \leq 2K\). A contradiction.

Since \(\spark(\Phi) \in [2, M+1]\) and we require \(\spark(\Phi) > 2K\), we must have \(M \geq 2K\).
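As a numerical illustration of the theorem, \(\spark(\Phi) > 2K\) is equivalent to every set of \(2K\) columns of \(\Phi\) being linearly independent, which can be checked directly for a small matrix. A minimal sketch with a random Gaussian matrix (a typical but by no means mandatory choice):

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
M, N, K = 6, 12, 3                      # note M >= 2K, as required above

Phi = rng.standard_normal((M, N))

# spark(Phi) > 2K  <=>  every set of 2K columns is linearly independent.
full_spark = all(np.linalg.matrix_rank(Phi[:, cols]) == 2 * K
                 for cols in itertools.combinations(range(N), 2 * K))
print(full_spark)   # True with probability one for a Gaussian matrix
```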

Recovery of approximately sparse signals

The spark is a useful criterion for characterizing sensing matrices for exactly sparse signals, but it does not work well for approximately sparse signals. We need more restrictive criteria on \(\Phi\) to ensure recovery of approximately sparse signals from compressed measurements.

In this context we will deal with two types of errors:

  • [Approximation error] Let us approximate a signal \(x\) using only \(K\) coefficients, and call the approximation \(\widehat{x}\). Then \(e_a = x - \widehat{x}\) is the approximation error.
  • [Recovery error] Let \(\Phi\) be a sensing matrix and \(\Delta\) a recovery algorithm. Then \(x'= \Delta(\Phi x)\) is the recovered signal vector, and \(e_r = x - x'\) is the recovery error.

In this section we will

  • Formalize the notion of null space property (NSP) of a matrix \(\Phi\) .
  • Describe a measure for performance of an arbitrary recovery algorithm \(\Delta\) .
  • Establish the connection between NSP and performance guarantee for recovery algorithms.

If we approximate \(x\) by a \(K\)-sparse signal \(\widehat{x} \in \Sigma_K\), then the minimum \(l_p\)-norm approximation error is given by

\[\sigma_K(x)_p = \min_{\widehat{x} \in \Sigma_K} \| x - \widehat{x}\|_p.\]

The specific \(\widehat{x} \in \Sigma_K\) for which this minimum is achieved is called the best \(K\)-term approximation.
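For \(p \geq 1\) (indeed for any \(p > 0\)) the best \(K\)-term approximation is obtained simply by keeping the \(K\) largest-magnitude entries of \(x\) and zeroing out the rest. A minimal sketch (the helper names best_k_term and sigma_k are ours):

```python
import numpy as np

def best_k_term(x, k):
    """Keep the k largest-magnitude entries of x; zero out the rest."""
    xhat = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]      # indices of the k largest entries
    xhat[idx] = x[idx]
    return xhat

def sigma_k(x, k, p=1):
    """Best K-term approximation error sigma_K(x)_p."""
    return np.linalg.norm(x - best_k_term(x, k), ord=p)

x = np.array([0.1, 3.0, -0.2, 0.05, -2.5, 0.01])
print(best_k_term(x, 2))    # keeps 3.0 and -2.5
print(sigma_k(x, 2, p=1))   # 0.36, the sum of the remaining magnitudes
```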

In the following, we will need some new notation.

Let \(I = \{1,2,\dots, N\}\) be the set of indices for signal \(x \in \RR^N\) .

Let \(\Lambda \subset I\) be a subset of indices.

Let \(\Lambda^c = I \setminus \Lambda\) .

\(x_{\Lambda}\) will denote a signal vector obtained by setting the entries of \(x\) indexed by \(\Lambda^c\) to zero.

Example

Let N = 4. Then \(I = \{1,2,3,4\}\) . Let \(\Lambda = \{1,3\}\) . Then \(\Lambda^c = \{2, 4\}\) .

Now let \(x = (-1,1,2,-4)\) . Then \(x_{\Lambda} = (-1, 0, 2, 0)\) .

\(\Phi_{\Lambda}\) will denote the \(M\times N\) matrix obtained by setting the columns of \(\Phi\) indexed by \(\Lambda^c\) to zero.

Example

Let N = 4. Then \(I = \{1,2,3,4\}\) . Let \(\Lambda = \{1,3\}\) . Then \(\Lambda^c = \{2, 4\}\) .

Now let \(x = (-1,1,2,-4)\). Then, as before, \(x_{\Lambda} = (-1, 0, 2, 0)\).

Now let

\[\begin{split}\Phi = \begin{pmatrix} 1 & 0 & -1 & 1\\ -1 & -2 & 2 & 3 \end{pmatrix}\end{split}\]

Then

\[\begin{split}\Phi_{\Lambda} = \begin{pmatrix} 1 & 0 & -1 & 0\\ -1 & 0 & 2 & 0 \end{pmatrix}\end{split}\]
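Both restrictions are simple masking operations. A minimal sketch reproducing the example above (zero-based indexing, so \(\Lambda = \{1, 3\}\) becomes {0, 2} in the code):

```python
import numpy as np

N = 4
Lam = np.array([0, 2])                    # Lambda = {1, 3} in 1-based indexing

x = np.array([-1.0, 1.0, 2.0, -4.0])
Phi = np.array([[1.0, 0.0, -1.0, 1.0],
                [-1.0, -2.0, 2.0, 3.0]])

mask = np.zeros(N, dtype=bool)
mask[Lam] = True

x_Lam = np.where(mask, x, 0.0)            # (-1, 0, 2, 0)
Phi_Lam = Phi * mask                      # columns 2 and 4 zeroed out

print(x_Lam)
print(Phi_Lam)
```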
Definition

A matrix \(\Phi\) satisfies the null space property (NSP) of order \(K\) if there exists a constant \(C > 0\) such that,

\[\| h_{\Lambda}\|_2 \leq C \frac{\| h_{{\Lambda}^c}\|_1 }{\sqrt{K}}\]

holds \(\forall h \in \NullSpace (\Phi)\) and \(\forall \Lambda\) such that \(|\Lambda| \leq K\) .

  • Let \(h\) be \(K\)-sparse. Choosing \(\Lambda\) to be the set of indices on which \(h\) is non-zero, we have \(|\Lambda| \leq K\) and \(h_{{\Lambda}^c} = 0\), so \(\| h_{{\Lambda}^c}\|_1 = 0\). The NSP condition then forces \(\| h_{\Lambda}\|_2 = 0\), i.e. \(h = 0\). Thus no non-zero \(K\)-sparse vector can belong to \(\NullSpace(\Phi)\) if \(\Phi\) satisfies the NSP of order \(K\).
  • Essentially, vectors in \(\NullSpace (\Phi)\) should not be concentrated on a small subset of indices.
  • If \(\Phi\) satisfies the NSP of order \(K\), then the only \(K\)-sparse vector in \(\NullSpace(\Phi)\) is \(h = 0\). (A numerical sanity check follows below.)
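Verifying the NSP exactly is combinatorially hard in general, but for a small matrix we can at least sanity-check the inequality on a basis of the null space, using the worst-case \(\Lambda\) (the \(K\) largest-magnitude entries of \(h\), which maximizes the left-hand side and minimizes the right-hand side). The minimal sketch below only inspects particular null space vectors and reports the smallest constant \(C\) that works for each; it does not prove the property.

```python
import numpy as np
from scipy.linalg import null_space

def nsp_ratio(h, K):
    """Smallest C for which ||h_Lam||_2 <= C ||h_Lam^c||_1 / sqrt(K) holds
    for the worst-case Lambda (the K largest-magnitude entries of h)."""
    idx = np.argsort(np.abs(h))[-K:]
    h_lam = h[idx]
    h_lam_c = np.delete(h, idx)
    return np.linalg.norm(h_lam, 2) * np.sqrt(K) / np.linalg.norm(h_lam_c, 1)

rng = np.random.default_rng(2)
Phi = rng.standard_normal((6, 12))
K = 2

# Null space basis vectors of a Gaussian matrix are non-sparse with
# probability one, so the denominator is non-zero.
for h in null_space(Phi).T:
    print(nsp_ratio(h, K))
```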

Measuring the performance of a recovery algorithm

Let \(\Delta : \RR^M \rightarrow \RR^N\) represent a recovery method to recover approximately sparse \(x\) from \(y\) .

The \(l_2\) recovery error is given by

\[\| \Delta (\Phi x) - x \|_2.\]

The \(l_1\) error for \(K\) -term approximation is given by \(\sigma_K(x)_1\) .

We will be interested in guarantees of the form

(1)\[ \| \Delta (\Phi x) - x \|_2 \leq C \frac{\sigma_K (x)_1}{\sqrt{K}}\]

Why this particular recovery guarantee formulation?

  • Exact recovery of \(K\)-sparse signals: \(\sigma_K (x)_1 = 0\) if \(x \in \Sigma_K\), so the bound forces \(\Delta(\Phi x) = x\).
  • Robust recovery of non-sparse signals.
  • The recovery error depends on how well the signal is approximated by \(K\)-sparse vectors.
  • Such guarantees are known as instance optimal guarantees.
  • They are also known as uniform guarantees, since they hold uniformly over all \(x\).

Why the specific choice of norms?

  • Different choices of \(l_p\) norms lead to different guarantees.
  • The \(l_2\) norm on the LHS is the usual least squares error.
  • An \(l_2\) norm on the RHS would require a prohibitively large number of measurements.
  • The \(l_1\) norm on the RHS allows the number of measurements to remain small.

If an algorithm \(\Delta\) provides instance optimal guarantees as defined above, what kind of requirements does it place on the sensing matrix \(\Phi\) ?

We now show that the NSP of order \(2K\) is a necessary condition for providing such uniform guarantees.

Theorem
Let \(\Phi : \RR^N \rightarrow \RR^M\) denote a sensing matrix and \(\Delta : \RR^M \rightarrow \RR^N\) denote an arbitrary recovery algorithm. If the pair \((\Phi, \Delta)\) satisfies the instance optimal guarantee (1), then \(\Phi\) satisfies the NSP of order \(2K\).
Proof

We are given that

  • \((\Phi, \Delta)\) form an encoder-decoder pair.
  • Together, they satisfy the instance optimal guarantee (1).
  • Thus they are able to recover all \(K\)-sparse signals exactly.
  • For non-sparse signals, they are able to recover their \(K\) -sparse approximation with bounded recovery error.

We need to show that if \(h \in \NullSpace(\Phi)\), then \(h\) satisfies

\[\| h_{\Lambda}\|_2 \leq C \frac{\| h_{{\Lambda}^c}\|_1 }{\sqrt{2K}}\]

where \(\Lambda\) corresponds to the \(2K\) largest magnitude entries of \(h\). It suffices to consider this particular \(\Lambda\): among all index sets of size at most \(2K\), it maximizes \(\| h_{\Lambda}\|_2\) and minimizes \(\| h_{{\Lambda}^c}\|_1\), so if the inequality holds for it, it holds for every \(\Lambda\) with \(|\Lambda| \leq 2K\).

Note that we use \(2K\) in this expression since we need to show that \(\Phi\) satisfies the NSP of order \(2K\).

Let \(h \in \NullSpace(\Phi)\) .

Let \(\Lambda\) be the set of indices corresponding to the \(2K\) largest magnitude entries of \(h\). Thus

\[h = h_{\Lambda} + h_{\Lambda^c}.\]

Split \(\Lambda\) into \(\Lambda_0\) and \(\Lambda_1\) such that \(|\Lambda_0| = |\Lambda_1| = K\) . Now

\[h_{\Lambda} = h_{\Lambda_0} + h_{\Lambda_1}.\]

Let

\[x = h_{\Lambda_0} + h_{\Lambda^c}.\]

Let

\[x' = - h_{\Lambda_1}.\]

Then

\[h = x - x'.\]

By assumption, \(h \in \NullSpace(\Phi)\).

Thus

\[\Phi h = \Phi(x - x') = 0 \implies \Phi x = \Phi x'.\]

Since \(x' \in \Sigma_K\) (recall that \(\Lambda_1\) indexes only \(K\) entries) and \(\Delta\) recovers all \(K\)-sparse signals exactly, we have

\[x' = \Delta (\Phi x').\]

Thus

\[\Delta (\Phi x) = \Delta (\Phi x') = x'.\]

i.e. when given the measurement \(\Phi x\), the recovery algorithm \(\Delta\) returns \(x'\), not \(x\). Note that \(x\) itself is in general not \(K\)-sparse, so we cannot expect exact recovery of \(x\); instead we invoke the instance optimal guarantee.

Finally, since \(h_{\Lambda}\) is obtained from \(h\) by zeroing out some entries, we also have

\[\| h_{\Lambda} \|_2 \leq \| h \|_2 = \| x - x'\|_2 = \| x - \Delta (\Phi x)\| _2.\]

But as per the instance optimal recovery guarantee (1) for the \((\Phi, \Delta)\) pair, we have

\[\| \Delta (\Phi x) - x \|_2 \leq C \frac{\sigma_K (x)_1}{\sqrt{K}}.\]

Thus

\[\| h_{\Lambda} \|_2 \leq C \frac{\sigma_K (x)_1}{\sqrt{K}}.\]

But

\[\sigma_K (x)_1 = \min_{\widehat{x} \in \Sigma_K} \| x - \widehat{x}\|_1.\]

Recall that \(x = h_{\Lambda_0} + h_{\Lambda^c}\), where \(\Lambda_0\) indexes \(K\) entries of \(h\) which are larger in magnitude than all entries indexed by \(\Lambda^c\). Thus the best \(l_1\)-norm \(K\)-term approximation of \(x\) is given by \(h_{\Lambda_0}\).

Hence

\[\sigma_K (x)_1 = \| h_{\Lambda^c} \|_1.\]

Thus we finally have

\[\| h_{\Lambda} \|_2 \leq C \frac{\| h_{\Lambda^c} \|_1}{\sqrt{K}} = \sqrt{2}C \frac{\| h_{\Lambda^c} \|_1}{\sqrt{2K}} \quad \forall h \in \NullSpace(\Phi).\]

Thus \(\Phi\) satisfies the NSP of order \(2K\) .

It turns out that the NSP of order \(2K\) is also sufficient to establish a guarantee of the form above for a practical recovery algorithm.
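A standard example of such a practical algorithm is \(l_1\) minimization (basis pursuit): recover \(x\) by solving \(\min \|x\|_1\) subject to \(\Phi x = y\). The following is a minimal sketch of this idea, posed as a linear program over the stacked variables \([x; t]\) with \(-t \leq x \leq t\); it is an illustration, not an optimized solver.

```python
import numpy as np
from scipy.optimize import linprog

def l1_recover(Phi, y):
    """Basis pursuit: min ||x||_1 subject to Phi x = y."""
    M, N = Phi.shape
    c = np.concatenate([np.zeros(N), np.ones(N)])        # minimise sum(t)
    A_ub = np.block([[np.eye(N), -np.eye(N)],             #  x - t <= 0
                     [-np.eye(N), -np.eye(N)]])           # -x - t <= 0
    b_ub = np.zeros(2 * N)
    A_eq = np.hstack([Phi, np.zeros((M, N))])              # Phi x = y
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=y,
                  bounds=[(None, None)] * (2 * N))
    return res.x[:N]

rng = np.random.default_rng(3)
M, N, K = 15, 30, 2
Phi = rng.standard_normal((M, N))

x = np.zeros(N)
x[rng.choice(N, K, replace=False)] = rng.standard_normal(K)

x_rec = l1_recover(Phi, Phi @ x)
print(np.linalg.norm(x_rec - x))   # typically near zero: exact recovery of a K-sparse x
```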